dair-ai/Prompt-Engineering-Guide
Prompt Engineering Guide
This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task-oriented AI. The repository serves as a central hub for learning how to design, evaluate, and optimize interactions with language models, ranging from basic prompting techniques to complex, multi-step reasoning workflows.
The guide distinguishes itself through its focus on agentic orchestration and advanced context engineering. It details methodologies for dynamic task decomposition, where complex queries are broken into manageable subtasks, and hierarchical context engineering, which structures instructions to manage agent behavior and domain-specific knowledge. Furthermore, it covers the integration of external tools through function calling and the implementation of stateful memory systems to track task progress and execution history.
Beyond core prompting strategies, the repository covers a broad capability surface including retrieval-augmented generation, synthetic data generation, and automated evaluation using model-based verification. It also provides technical documentation and benchmarks for a wide array of proprietary and open-source models, alongside practical guidance on mitigating security risks such as prompt injection and jailbreaking.
The documentation is maintained as an open-source repository, offering a collection of guides, research paper summaries, and interactive notebooks to support hands-on learning.
Features
- Agentic Controllers - "Coordinates autonomous task execution by chaining reasoning steps and managing tool-use loops through a central controller."
- Parallel Agent Swarming - "Improves inference efficiency by spawning concurrent sub-agents to handle independent components of a larger research or reasoning task."
- Agent Execution Loops - Understanding the agent loop is fundamental to debugging and optimizing AI agents. The loop consists of repeated cycles of: 1. **Action**: The agent decides to take an action (call a tool) 2. **Environment Response**: Th
- Retrieval Augmented Generation Frameworks - In this section, we summarize the key developments of the components of a RAG system, which include Retrieval, Generation, and Augmentation. ### Retrieval Retrieval is the component of RAG that deals with r
- Language Models - In addition to the release of Mistral Large, a smaller, optimized model called Mistral Small was also announced. Mistral Small is optimized for low-latency workloads and outperforms Mixtral 8x7B. Mistral AI repor
- Large Language Model Capabilities - Claude 3 capabilities include advanced reasoning, basic mathematics, analysis, data extraction, forecasting, content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French.
- Large Language Models - Meta also reported that they will be releasing a 400B parameter model which is still training and coming soon! There are also efforts around multimodal support, multilingual capabilities, and longer context windows in th
- Mixture-of-Experts Models - Mixtral 8x22B is a new open large language model (LLM) released by Mistral AI. Mixtral 8x22B is characterized as a sparse mixture-of-experts model with 39B active parameters out of a total of 141B parameters. ## Capabili
- Model Architecture Overviews - Kimi K2.5 builds on the Kimi K2 language model—a 1.04 trillion total parameter MoE model utilizing 384 experts with 8 activated per token (32 billion activated parameters). The multimodal architecture consists of three c
- Multimodal Models - Gemini is trained to be natively multimodal and can combine capabilities across modalities with the reasoning capabilities of the language model. Capabilities include, but are not limited to, information extracti
- Multimodal Vision Models - GPT-4 Turbo with vision is the newest version of GPT-4. It has the ability to understand images, in addition to all other GPT-4 Turbo capabilities. The model returns a maximum of 4,096 output tokens, and a context window
- Video Generation Models - Sora is reported to be a diffusion model that can generate entire videos or extend generated videos. It also uses a Transformer architecture leading to scaling performance. Videos and images are represented as patches, s
- AI Agent Frameworks - In this section, we provide an overview of LLM-based agents, including definitions, common design patterns, tips, use cases, and applications. This content is based on our new course "Building Effective AI Agents with n8
- AI Agent Planning - At the core of any effective AI agent is its planning capability, powered by large language models (LLMs). Modern LLMs enable several crucial planning functions: - Task decomposition through chain-of-thought reasoning -
- Autonomous AI Agents - AI agents combine LLMs with autonomous decision-making capabilities, enabling them to perform complex tasks through reasoning, reflection, and dynamic tool usage. **Example: Task Planning Agent** **Scenario**: User asks
- Large Language Models - Mistral Large's capabilities and strengths include: - 32K tokens context window - has native multilingual capacities (fluent in English, French, Spanish, German, and Italian) - strong capabilities in reasoning, knowledge
- Instruction-Tuned Language Models - A Mixtral 8x7B - Instruct model is also released together with the base Mixtral 8x7B model. This includes a chat model fine-tuned for instruction following using supervised fine tuning (SFT) and followed by direct prefer
- Large Language Models - Both the OLMo-7B and OLMo-1B models adopt a decoder-only transformer architecture. They follow improvements from other models like PaLM and Llama: - no biases - a non-parametric layer norm - SwiGLU activation function - R
- Multimodal Models - Most vision-adapted models treat multimodal capability as an add-on to a text backbone, introducing visual tokens late in training at high ratios (e.g., 50% or more). Kimi K2.5 takes a different approach. The team found
- Adversarial Prompts - This section contains a collection of prompts that raise awareness of different LLM vulnerabilities: Prompt Injection, Prompt Leaking, and Jailbreaking. Last updated on Sun Dec 28 2025.
- Few-Shot Prompting - While large language models demonstrate remarkable zero-shot capabilities, they still fall short on more complex tasks when using the zero-shot setting. Few-shot prompting can be used as a technique to enable in-context
- Prompt Engineering - Prompt Engineering helps to effectively design and improve prompts to get better results on different tasks with LLMs. While the previous basic examples were fun, in this section we cover more advanced prompting engineer
- Retrieval Mechanisms - Retrieval is the component of RAG that deals with retrieving highly relevant context from a retriever. A retriever can be enhanced in many ways, including: **Enhancing Semantic Representations** This process involves dir
- Stateful Memory Management - "Maintains short-term working memory and long-term external storage to track task progress, execution history, and intermediate work."
- Task Planning Frameworks - Instead of reasoning ad-hoc inside a single context window, Deep Agents maintain structured task plans they can update, retry, and recover from. Think of it as a living to-do list that guides the agent toward its long-te
- Reasoning Models - Large reasoning models (LRMs) or simply, reasoning LLMs, are models explicitly trained to perform native thinking or chain-of-thought. Popular examples of reasoning models include Gemini 2.5 Pro, Claude 3.7 Sonnet, and o
- Text Generation - Using GPT-4's text generation, you can build applications to: - Draft documents - Write code - Answer questions about a knowledge base - Analyze texts - Give software a natural language interface - Tutor in a range of su
- Agent Development Guides - A technical resource detailing the architecture, planning, and tool-use patterns required to build autonomous, task-oriented AI systems.
- Context Engineering Guides - A methodology for structuring system prompts, memory, and tool definitions to improve the reliability and performance of agentic workflows.
- Prompt Engineering Knowledge Bases - A comprehensive collection of techniques, strategies, and best practices for optimizing interactions with large language models.
- RAG Implementation Guides - A technical guide covering the design, evaluation, and optimization of retrieval-augmented generation systems for knowledge-intensive tasks.
- Reasoning Strategy Guides - A curated repository of advanced prompting strategies and research-backed methods for eliciting complex reasoning and planning from language models.
- Mixture-of-Experts Models - Grok-1 is a mixture-of-experts (MoE) large language model (LLM) with 314B parameters which includes the open release of the base model weights and network architecture. Grok-1 is trained by xAI and consists of MoE model
- Prompting Fundamentals - ## Prompting an LLM You can achieve a lot with simple prompts, but the quality of results depends on how much information you provide it and how well-crafted the prompt is. A prompt can contain infor
- Task Decomposition Strategies - "Breaks complex user queries into manageable subtasks that are executed sequentially or in parallel by specialized sub-agents."
- Agent Design Patterns - In this section, we provide an overview of LLM-based agents, including definitions, common design patterns, tips, use cases, and applications. This content is based on our new course "Building Effective AI Agents with n8
- Function Calling Interfaces - "Enables agents to interact with external APIs and environments by mapping natural language intent to structured tool definitions."
- Planning Strategies - When building agentic systems, **planning** is an important component to enable the system to better perform complex tasks. As an example, when building deep research agentic systems, planning helps in planning the actua
- Agent Memory Modules - The memory module helps to store the agent's internal logs including past thoughts, actions, and observations from the environment, including all interactions between agent and user. There are two main memory types that
- Research Agents - Deep Research is OpenAI’s new agent that can perform **multi-step research** on the internet for performing complex tasks like generating reports and competitor analysis. It is an **agentic reasoning system** that has ac
- Agentic Workflows - While large language models (LLMs) excel at simple, narrow tasks like translation or email generation, they fall short when dealing with complex, broader tasks that require multiple steps, planning, and reasoning. These
- Prompt Chaining Patterns - Prompt chaining involves breaking down a complex task into sequential LLM calls, where each step's output feeds into the next. **Example: Document Generation Workflow** This workflow demonstrates a prompt chaining patter
- Prompt Engineering Patterns - **Problem**: Vague instructions lead to unpredictable behavior. **Example**: ``` Do some research and create a report. ``` **Better Approach**: ``` Execute research by: 1. Analyzing the user query to identify key informa
- Agent Swarms - Instead of executing a task as a reasoning chain, K2.5 initiates an Agent Swarm through dynamic task decomposition, subagent instantiation, and parallel subtask scheduling. A trainable orchestrator creates specialized fr
- Function Calling Guides - Below is a list of use cases that can benefit from the function calling capability of LLMs: - **Conversational Agents**: Function calling can be used to create complex conversational agents or chatbots that answer comple
- Agentic Retrieval Augmented Generation - **Agentic RAG** is a system that leverages reasoning models for building agentic RAG applications that involve advanced tool use and reasoning on complex knowledge bases or sources. It can involve leveraging a **retrieva
- Multi-Agent Orchestration Systems - One big agent (typically with a very long context) is no longer enough. I've seen arguments against multi-agent systems and in favor of monolithic systems, but I'm skeptical about this. The orchestra
- Context Window Benchmarks - Gemini 1.5 Pro achieves near-perfect "needle" recall of up to 1 million tokens in all modalities, i.e., text, video, and audio. To put the context window support of Gemini 1.5 Pro into perspective, Gemini 1.5 Pro can pro
- LLM Agent Evaluation Frameworks - *AgentBench benchmark to evaluate LLM-as-Agent on real-world challenges and 8 different environments. Figure source: Liu et al. 2023* Similar to evaluating LLMs themselves, evaluating LLM agents is a challenging task. Acc
- Function Calling - Function calling is the ability to reliably connect LLMs to external tools to enable effective tool usage and interaction with external APIs. LLMs like GPT-4 and GPT-3.5 have been fine-tuned to detect when a function nee
- Reasoning Engines - Perhaps one of the most difficult tasks for an LLM today is one that requires some form of reasoning. Reasoning is one of the most interesting areas due to the types of complex applications that can emerge from LLMs. There h
- Visual Question Answering Models - Visual question answering involves asking the model questions about an image passed as input. The Gemini models show different multimodal reasoning capabilities for image understanding over charts, natural images, memes,
- Agentic Systems - Agentic systems can be categorized into two main types: ### 1\. AI Workflows **AI workflows** are systems where LLMs and tools are orchestrated through **predefined code paths**. These systems follow a
- Context Engineering Techniques - ### Layered Context Architecture Context engineering applies to all stages of the AI agent build process. Depending on the AI Agent, it's sometimes helpful to think of context as a hierar
- Hallucination Detection - Hallucination Identification
- LLM-as-a-Judge Frameworks - When building applications that require automated evaluation/assessment, LLM-as-a-Judge is an option. LLM-as-a-Judge leverages the complex understanding and reasoning of large amounts of information. Reasoning LLMs are i
- LLM Agent Frameworks - Generally speaking, an LLM agent framework can consist of the following core components: - User Request - a user question or request - Agent/Brain - the agent core acting as coordinator - Planning - assists the agent in
- Code Completion Models - Let's test out a basic example where we ask the model to generate a valid Python function that can generate the nth Fibonacci number. ``` messages = [ { "role": "system", "content": "You are an expert programmer that hel
- Code Reasoning Models - With its long-context reasoning, Gemini 1.5 Pro can answer questions about the codebase. Using Google AI Studio, Gemini 1.5 Pro allows up to 1 million tokens, so we can upload an entire codebase and prompt it with dif
- Conversational Language Models - ChatGPT is a new model trained by OpenAI that has the capability to interact in a conversational way. This model is trained to follow instructions in a prompt to provide appropriate responses in the
- Creative Content Generators - Gemini Advanced demonstrates the ability to perform creative collaboration tasks. It can be used like other models such as GPT-4 for generating fresh content ideas, analyzing trends and strategies for growing audiences.
- Image Generation Models - 4o Image Generation is OpenAI’s latest image model embedded into ChatGPT. It can create photorealistic outputs, take images as inputs and transform them, and follow detailed instructions, including generating text into i
- Mathematical and Code Generation Models - The table below shows how Mistral Large performs on common maths and coding benchmarks. Mistral Large demonstrates strong performance on the Math and GSM8K benchmarks but it is significantly outperformed on coding benchm
- Multilingual Language Models - The table below demonstrates Mistral Large performance on multilingual reasoning benchmarks. Mistral Large outperforms Mixtral 8x7B and Llama 2 70B in all languages, including French, German, Spanish, and Italian.
- Multimodal AI Models - Gemini is the newest and most capable AI model from Google Deepmind. It's built with multimodal capabilities from the ground up and showcases impressive crossmodal reasoning across texts, images, video, audio, and code.
- Multimodal Generation Models - An interesting capability of Gemini Advanced is that it can generate interleaved images and text. As an example, we prompted the following: ``` Please create a blog post about a trip to New York, where a dog and his owne
- Small Language Models - Phi-2 is the latest small language model (SLM) released by Microsoft Research. Phi-2 follows the previous Phi-1 and Phi-1.5 models. Phi-1 is a 1.3 billion parameter model trained on "textbook quality" data from th
- Sparse Mixture of Experts Models - Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model released by Mistral AI. Mixtral has a similar architecture as Mistral 7B but the main difference is that each la
- LLM Routing Patterns - Routing directs different requests to specialized LLM chains or agents based on query classification. **Example: Customer Support Router** This workflow illustrates a routing pattern for intelligent query distribution in
- Model Architecture Innovations - Kimi K2.5 advances the state of the art through two main contributions: joint optimization of text and vision, and Agent Swarm for parallel agent orchestration. Together, these enable strong performance across reasoning,
- Reasoning Frameworks - This work by Chi et al. (2024) presents an approach for general reasoning and search on tasks that can be decomposed into components. The proposed graph-based framework, THOUGHTSCULPT, incorporates i
- Defense Tactics - It's widely known that language models tend to elicit undesirable and harmful behaviors such as generating inaccurate statements, offensive text, biases, and much more. Furthermore, other researchers have also developed
- Jailbreaking Techniques - LLMs like ChatGPT include guardrails limiting the model from outputting harmful, illegal, unethical, or violent content of any kind. However, users on Reddit found a jailbreaking technique that allows a user to bypass t
- Code Completion Engines - These LLMs have also been incorporated into tools like GitHub Copilot which makes them useful for developers. One useful feature is the ability of the model to complete functions. *Prompt:* ``` # function to multiply two
- Agent Architectures - ### The Original Design Problem Let's look at a basic deep research agent architecture. The initial architecture connects the web search tool directly to the deep research agent. This desi
- Agent Verification Systems - Next to context engineering, verification is one of the most important components of an agentic system (though less often discussed). Verification boils down to verifying outputs, which can be automated (LLM-as-a-Judge)
- Agentic Context Management - Deep Agents don’t rely on conversation history alone. They store intermediate work in external memory like files, notes, vectors, or databases, letting them reference what matters without overloading the model’s context.
- Long Context Processing - To demonstrate Gemini 1.5 Pro's abilities to process and analyze documents, we start with a very basic question answering task. The Gemini 1.5 Pro model in the Google AI Studio supports up to 1 million tokens so we are abl
- Memory Systems - The third essential component is memory management, which comes in two primary forms: 1. Short-term (Working) Memory - Functions as a buffer for immediate context - Enables in-context learning - Sufficient for most task
- Multi-Agent Systems - ### Sub-Agent Communication When designing multi-agent systems, carefully consider: **What information does the sub-agent need?** - For the search worker: Just the search query text - Not the
- Prompt Engineering - Let's start with a simple example and instruct the model to achieve a task based on an instruction. *Prompt*: ``` [INST] You are a helpful code assistant. Your task is to generate a valid JSON object based on the given i
- Prompt Engineering Techniques - **Few-Shot Prompting:** Providing the LLM with a few examples of desired input-output pairs guides it towards generating higher-quality responses by demonstrating the expected pattern. Learn more about few-shot prompting
- Retrieval Augmented Generation Frameworks - Some popular comprehensive tools to build RAG systems include LangChain, LlamaIndex, and DSPy. There are also a range of specialized tools that serve differe
- Safety Guardrails - There are some scenarios where the model will refuse to respond because of the safety alignment it has undergone. As an example, the model sometimes refuses to answer the prompt request below. It can be fixed by rephrasi
- Text Generation Engines - The generator in a RAG system is responsible for converting retrieved information into a coherent text that will form the final output of the model. This process involves diverse input data which sometimes require effort
- AI Agent Tooling Guides - The second critical component is an agent's ability to interface with external tools. A well-designed agent must not only have access to various tools but also understand when and how to use them appropriately. Common to
- Interactive Notebooks - Contains a collection of notebooks we have designed to help you get started with prompt engineering. More to be added soon! | Description | Notebook | | :-- | :-: | | Learn how to perform many different types of common t
- Trustworthiness Benchmarks - You can also find a GitHub repository with a complete evaluation kit for testing the trustworthiness of LLMs across the different dimensions. Code: https://github.com/HowieHwong/TrustLLM
- Function Calling Implementations - As a basic example, let's say we asked the model to check the weather in a given location. The LLM alone would not be able to respond to this request because it has been trained on a dataset with a cutoff point. The way
- Transformer Architectures - Here is a summary of the mentioned technical details of Llama 3: - It uses a standard decoder-only transformer. - The vocabulary is 128K tokens. - It is trained on sequences of 8K tokens. - It applies grouped query atten
- Pre-training Corpora - This release also includes a pre-training dataset called Dolma, a diverse, multi-source corpus of 3 trillion tokens across 5B documents acquired from 7 different data sources. The creati
- Instruction-Tuned Models - Mistral 7B is designed for easy fine-tuning across various tasks. The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. This version of the
- Small Language Models - According to the model page, Phi-2 can be prompted using a QA format, a chat format, and the code format. Below we demonstrate how to effectively use these prompt templates using different scenarios
- Instruction Tuning Datasets - - Instruction finetuning scales well with the number of tasks and the size of the model; this suggests the need for scaling number of tasks and size of model further - Adding CoT datasets into the finetuning enables good
- Reinforcement Learning Frameworks - Training the orchestrator to effectively parallelize is non-trivial. The PARL reward function combines three components: a parallelism reward that incentivizes the orchestrator to actually spawn concurrent subagents (pre
- Fine-Tuning Services - Developers can now access the `GPT-4o-2024-08-06` checkpoint for fine-tuning through the dedicated fine-tuning dashboard. This process allows for customization of response structure, tone, and adhere
- Video Analysis Models - Gemini 1.5 Pro is trained with multimodal capabilities from the ground up and it also demonstrates video understanding capabilities. We tested a few prompts with one of the recent lectures on LLMs by Andrej Karpathy (ope
- Text Summarization - One of the standard tasks in natural language generation is text summarization. Text summarization can include many different flavors and domains. In fact, one of the most promising applications of language models is the
- Hierarchical Prompt Structures - "Structures system prompts and instructions into layered tiers to manage agent behavior, task constraints, and domain-specific knowledge."
- Prompt Design Principles - Here are some tips to keep in mind while you are designing your prompts: ### Start Simple As you get started with designing prompts, you should keep in mind that it is really an iterative process that re
- Programming Prompts - GPT-4 (OpenAI) / Mixtral MoE 8x7B Instruct (Fireworks) ``` from openai import OpenAI client = OpenAI() response = client.chat.completions.create( model="gpt-4", messages=[ { "role": "user", "content": "Can you write me a po
- Prompt Design Guides - **Specificity and Clarity:** Just like giving instructions to a human, prompts should clearly articulate the desired outcome. Ambiguity can lead to unexpected or irrelevant outputs. **Structured Inputs and Outputs:** Str
- Prompt Design Principles - As we cover more and more examples and applications with prompt engineering, you will notice that certain elements make up a prompt. A prompt contains any of the following elements: **Instruction** - a specific task or i
- Prompt Templates - ``` Your task is to extract model names from machine learning paper abstracts. Your response is an array of the model names in the format [\"model_name\"]. If you don't find model names in the abstract or you are not sur
- Active-Prompting - Chain-of-thought (CoT) methods rely on a fixed set of human-annotated exemplars. The problem with this is that the exemplars might not be the most effective examples for the different tasks. To address this, Diao et al.,
- Automatic Chain-of-Thought Prompting - When applying chain-of-thought prompting with demonstrations, the process involves hand-crafting effective and diverse examples. This manual effort could lead to suboptimal solutions. Zhang et al. (2022) (opens in a new
- Automatic Reasoning and Tool-use - Combining CoT prompting and tools in an interleaved manner has been shown to be a strong and robust approach to address many tasks with LLMs. These approaches typically require hand-crafting task-specific demonstrations and c
- Chain-of-Thought Prompting - Image Source: Wei et al. (2022) Introduced in Wei et al. (2022), chain-of-thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps. You
- Generated Knowledge Prompting - Image Source: Liu et al. 2022 LLMs continue to be improved and one popular technique includes the ability to incorporate knowledge or information to help the model make more accurate predictions. Usi
- Meta Prompting Strategies - According to Zhang et al. (2024), the key characteristics of meta prompting can be summarized as follows: **1\. Structure-oriented**: Prioritizes the format and pattern of problems and solutions over
- ReAct Prompting - Below is a high-level example of how the ReAct prompting approach works in practice. We will be using OpenAI for the LLM and LangChain as it already has built-in functionality that leverages the ReAc
- Reasoning and Acting Frameworks - ReAct is inspired by the synergies between "acting" and "reasoning" which allow humans to learn new tasks and make decisions or reasoning. Chain-of-thought (CoT) prompting has shown the capabilities of LLMs to carry out
- Zero-shot Prompting - Large language models (LLMs) today, such as GPT-3.5 Turbo, GPT-4, and Claude 3, are tuned to follow instructions and are trained on large amounts of data. Large-scale training makes these models capable of performing som
- Automatic Prompt Optimizers - Image Source: Zhou et al. (2022) Zhou et al. (2022) propose automatic prompt engineer (APE), a framework for automatic instruction generation and selection. The instruction gene
- Prompt Formatting Techniques - You have tried a very simple prompt above. A standard prompt has the following format: ``` <Question>? ``` or ``` <Instruction> ``` You can format this into a question answering (QA) format, which is standard in a lot of QA datasets, as follows
- Prompt Functions - ## Introduction When we draw a parallel between GPT's dialogue interface and a programming language's shell, the encapsulation prompt can be thought of as forming a function. This function has a unique n
- System Prompts - Below is the system prompt I have put together for this subagent: ``` You are an expert research planner. Your task is to break down a complex research query (delimited by ) into specific search subtasks, each focusing o
- Chain of Thought Prompting - Image Source: Kojima et al. (2022) A more recent idea is zero-shot CoT (Kojima et al. 2022), which essentially involves adding "Let's think ste
- ReAct Prompting - To demonstrate how ReAct prompting works, let's follow an example from the paper. The first step is to select cases from a training set (e.g., HotPotQA) and compose ReAct-format trajectories. These are used as few-shot e
- Automated Research Agents - Deep Research can perform **complex multi-step research tasks** much faster than people can, reducing hours of work to minutes. It is useful for tasks that require extensive and complex web searches, as it figures out a
- Reasoning Methodologies - - Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic (February 2024) - Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 (opens in a new
- Augmentation Strategies - Augmentation involves the process of effectively integrating context from retrieved passages with the current generation task. Before discussing more on the augmentation process, augmentation stages, and augmentation dat
- Prompt Injection Protections - Prompt injection is a type of LLM vulnerability where a prompt containing a concatenation of trusted prompt and untrusted inputs leads to unexpected behaviors, and sometimes undesired behaviors from the LLM. Prompt inject
- Adversarial Prompting - Jailbreaking, Prompt Injection, Prompt Leaking
- Synthetic Data Generation - Unfortunately, in the life of a Machine Learning Engineer, there's often a lack of labeled data or very little of it. Typically, upon realizing this, projects embark on a lengthy process of data collection and labeling.
- Fine-Tuning Datasets - In the above guide, we showcase a practical example of fine-tuning which involves training a model for emotion classification. Using a JSONL formatted dataset containing text samples labeled with cor
- Prompt Chaining - To improve the reliability and performance of LLMs, one important prompt engineering technique is to break a task into subtasks. Once those subtasks have been identified, the LLM is prompted with a subtask and
- Code Generation Prompts - Generate Code Snippet, Generate MySQL Query, Draw TiKZ Diagram
- Reflexion - Reflexion is best suited for the following: 1. **An agent needs to learn from trial and error**: Reflexion is designed to help agents improve their performance by reflecting on past mistakes and incorporating that knowle
- Self-Consistency - Perhaps one of the more advanced techniques out there for prompt engineering is self-consistency. Proposed by Wang et al. (2022), self-consistency aims "to replace the naive greedy decoding used in c
- Tree of Thoughts - For complex tasks that require exploration or strategic lookahead, traditional or simple prompting techniques fall short. Yao et al. (2023) and Long (2023) recently proposed Tree
- Retrieval Augmented Generation Research - Advanced RAG addresses issues present in Naive RAG, such as poor retrieval quality, by optimizing the pre-retrieval, retrieval, and post-retrieval processes. The pre-retrieval process involves op
- Coding Agents - Gemini is also used to build a generalist agent called AlphaCode 2 that combines its reasoning capabilities with search and tool-use to solve competitive programming problems. AlphaCode 2 ranks with
- Structured Output Schemas - In addition to the high-level instruction and the user input, you might have noticed that I spent a considerable amount of effort on the details related to the subtasks the planning agent needs to produce. Below are the
- Model Performance Benchmarks - Mixtral demonstrates strong capabilities in mathematical reasoning, code generation, and multilingual tasks. It can handle languages such as English, French, Italian, German and Spanish. Mistral AI also released a Mixtra
- Reasoning and Planning Research - There is a lot of debate about whether LLMs can reason and plan. Both reasoning and planning are important capabilities for unlocking complex applications with LLMs such as in the domains of robotics and autonomous agent
- Video Lectures - We have published a 1-hour lecture that provides a comprehensive overview of prompting techniques, applications, and tools. - Video Lecture - Notebook with code - Slides
- Code Generation Prompts - You can also use the code generation capabilities of these LLMs to generate code from comments alone. Let's look at another example that passes the instructions as a comment block: *Prompt:* ``` """ 1. Create a list of m
- Image Generation Interfaces - Access 4o Image Generation in the ChatGPT application (web or mobile) by prompting with text, or by selecting “Create an image” from the tools. The model is also accessible in Sora, or via OpenAI API with gpt-image-1. Te
- Prompt Engineering Guides - The guide demonstrates how you can use context caching to analyze the summaries of all the ML papers we've documented over the past year. We store these summaries in a text file, which can now be fed
- Automated Output Evaluators - "Automates the evaluation of agent outputs and reasoning quality by using a secondary model to validate task completion."
- Small Language Models - LLM researchers are keen to explore whether small language models have similar emergent capabilities as their large counterparts and if there are techniques for training that can help to achieve this. The model is traine
- Image Understanding Models - Gemini Ultra can also take few-shot prompts and generate images. For example, as shown in the example below, it can be prompted with one example of interleaved image and text where the user provides information about two
- Prompt Parameterization Strategies - Prompt injections have similarities to SQL injection and we can potentially learn defense tactics from that domain. Inspired by this, a potential solution for prompt injection, suggested by Simon
- Meta Prompting - Meta Prompting is an advanced prompting technique that focuses on the structural and syntactical aspects of tasks and problems rather than their specific content details. The goal with meta prompting is to construct a m
- Classification Prompts - Few-Shot Sentiment Classification, Sentiment Classification
- Evaluation Prompts - Evaluate Plato's Dialogue
- Information Extraction Prompts - ## Background The following prompt tests an LLM's capabilities to perform an information extraction task which involves extracting model names from machine learning paper abstracts. ## Prompt ``
- Mathematical Prompts - This section contains a collection of prompts for testing the mathematical capabilities of LLMs. Evaluating Composite Functions, Adding Odd Numbers. Last updated on Sun Dec 28 2025.
- Prompt Performance Benchmarks - | | Precision | Recall | F1 | Template Stickiness | | --- | --- | --- | --- | --- | | *Baseline* | *61.2* | *70.6* | *65.6* | *79%* | | *CoT* | *72.6* | *85.1* | *78.4* | *87%* | | *Zero-CoT* | *75.5* | *88.3* | *81.4* |
- Reasoning Model Prompting - ### **General Usage Patterns & Prompting Tips** - **Strategic Reasoning:** Use reasoning models for reasoning-heavy modules or components of your LLM-based applications, not for
- Program-Aided Reasoning - Gao et al. (2022) present a method that uses LLMs to read natural language problems and generate programs as the intermediate reasoning steps. Coined program-aided language models (PAL), it differ
- Prompt Engineering Defenses - A simple defense tactic to start experimenting with is to just enforce the desired behavior via the instruction passed to the model. This is not a complete solution and offers no guarantees, but it highlights the power of
- State Management Patterns - We are not showing it in v1 of our deep research agent, but an important part of this project was to optimize the results to generate the final report. In many cases, the agentic system might need to revise all or a subs
- Agentic Prompt Patterns - A large language model (LLM) with general-purpose capabilities serves as the main brain, agent module, or coordinator of the system. This component will be activated using a prompt template that entails important details
- Prompt Engineering Patterns - **Problem**: Too many rules make the agent inflexible and unable to handle edge cases. **Example**: ``` NEVER skip a search task. ALWAYS perform exactly 3 searches. NEVER combine similar queries. ``` **Better Approach**:
- Agent Performance Metrics - Track these metrics to evaluate context engineering effectiveness: 1. **Task Completion Rate**: Percentage of tasks completed successfully 2. **Behavioral Consistency**: Similarity of agent behavior across similar inputs
- Context Engineering - ### The Iterative Nature of Improving Context Context engineering is not a one-time effort. The development process involves: 1. **Initial implementation** with basic system
- Context Validation Frameworks - Evaluation is key to ensuring context engineering techniques are working as they should for your AI agents. Before deployment, validate your context design: - **Completeness**: Does it cover all important scenarios? - **
- Prompt Engineering - Provide explicit instructions about the agent's workflow: ``` ## GENERAL INSTRUCTIONS The user will provide a query, and you will convert that query into a search plan with multiple search tasks (3 web searches). You wil
- Tool Definition Patterns - ### The Importance of Detailed Tool Descriptions Tool definitions typically appear in two places: 1. **In the system prompt**: Detailed explanations of what tools do and w
- Code Editors - Example coming soon!
- System Prompt Guardrails - Similar to the Mistral 7B model, it's possible to enforce guardrails in chat generations using the `safe_prompt` boolean flag in the API by setting `safe_mode=True`: ``` # helpful completion function
- Prompt Injection Vulnerabilities - Prompt leaking is another type of prompt injection where prompt attacks are designed to leak details from the prompt which could contain confidential or proprietary information that was not intended for the public. A lot
- Agentic Planning Patterns - #### Planning for Agentic Systems When building agentic systems, **planning** is an important component to enable the system to better perform complex tasks. As an example, when building
- Function Calling Interfaces - You can also use the Code Llama models for function calling. However, the Code Llama 70B Instruct model provided via the together.ai APIs currently doesn't support this feature. So for now we went ahead and provided an exa
- Visual Reasoning Systems - Models like o3 can leverage multi-tool use capabilities to perform advanced visual reasoning and perform tasks such as reasoning about images and even modifying images (e.g., zoom, crop, rotate, etc.
- AI Agent Debugging Tools - When building AI agents, you'll inevitably encounter situations where the agent doesn't behave as expected. Maybe it's calling the wrong tool, passing incorrect arguments, or failing to call a tool when it should. This i
- Agent Development Guides - Based on practical experience building agents, here are key recommendations for effective tool definitions: **Be Specific in Descriptions** Instead of "Search the web", use "Search the web for current information. Use th
- AI Agent Engineering Guides - Context engineering is a critical practice for building reliable AI agents that requires: - **Significant iteration time** spent tuning prompts and tool definitions - **Careful architectural decisions** about agent separ
- Prompt Engineering Resources - Building effective AI agents requires substantial tuning of system prompts and tool definitions. The process involves spending hours iterating on: - System prompt design and refinement - Tool definitions and usage instru
- System Prompt Engineering - Here is the full system prompt for the deep research agent we built in n8n: ``` You are a deep research agent who will help with planning and executing search tasks to generate a deep research report. ## GENERAL INSTRUCT
- Prompt Engineering Guides - #### Detailed prompts give you more control. If your prompt is not descriptive, ChatGPT often fills in additional details. This can be useful for quick tests or exploration, but
- Agent Architectures - In this section, we provide an overview of LLM-based agents, including definitions, common design patterns, tips, use cases, and applications. This content is based on our new course "Building Effective AI Agents with n8
- Chain-of-Thought Reasoning - A new paper by Lee et al. (2024) proposes to improve reasoning in LLMs using small language models. It first applies knowledge distillation to a small LM with rationales generated by the large LM wit
- LLM Performance Benchmarks - This new paper by Machlab and Battle (2024) analyzes the in-context recall performance of different LLMs using several needle-in-a-haystack tests. It shows that various LLMs recall facts at different
- Model Evaluation Leaderboards - The authors have also published a leaderboard. For example, the table below shows how the different models measure on the truthfulness dimension. As mentioned on their website, "More trustworthy
- Reasoning Models - Sun et al. (2023) recently presented an overview of reasoning with foundation models which focuses on the latest advancements in various reasoning tasks. This work also focuses on a more extensive loo
- Retrieval Augmented Generation Paradigms - Over the past few years, RAG systems have evolved from Naive RAG to Advanced RAG and Modular RAG. This evolution has occurred to address certain limitations around performance, cost, and efficiency. *Figure Source (opens
- LLM Truthfulness Prompts - This section contains a collection of prompts for exploring truthfulness in LLMs.
- Prompt Engineering Guides - You can achieve a lot with simple prompts, but the quality of results depends on how much information you provide it and how well-crafted the prompt is. A prompt can contain information like the *instruction* or *questio
- Algorithmic Biases - LLMs can produce problematic generations that can potentially be harmful and display biases that could deteriorate the performance of the model on downstream tasks. Some of these can be mitigated through effective prompt
- Code Debugging Assistants - We can use the model to help debug a piece of code. Let's say we want to get feedback from the model on a piece of code we wrote to check for bugs. Here is an example demonstrating this capability: ``` messages = [ { "ro
- LLM Evaluation Frameworks - This section contains a collection of prompts for testing how well LLMs can be used for evaluation, that is, using the LLMs themselves as judges. Evaluate Plato's Dialogue
- AI Workflow Patterns - ### Pattern 1: Prompt Chaining Prompt chaining involves breaking down a complex task into sequential LLM calls, where each step's output feeds into the next. **Example: Document Generation W
- Agent System Prompts - The system prompt begins with a clear definition of the agent's role: ``` You are a deep research agent who will help with planning and executing search tasks to generate a deep research report. ```
- Chatbot Interfaces - ### Multi-turn Conversations To begin demonstrating the capabilities of ChatGPT, we will use the chatbot assistant example above and discuss the results. Compared to `text-davinci-003`, the `
- Context Engineering Strategies - Context engineering applies to all stages of the AI agent build process. Depending on the AI Agent, it's sometimes helpful to think of context as a hierarchical structure. For our basic agentic system, we can organize co
- Conversational Interfaces - To begin demonstrating the capabilities of ChatGPT, we will use the chatbot assistant example above and discuss the results. Compared to `text-davinci-003`, the `gpt-3.5-turbo` model that powers ChatGPT uses a chat forma
- Function Calling - At its core, function calling enables LLMs to interact with external tools, APIs, and knowledge bases. When an LLM receives a query that requires information or actions beyond its training data, it can decide to call an
- LLM Guardrails - When building with LLMs for real-world applications, it's important to enforce guardrails. The Mistral 7B model makes it possible to leverage system prompting to enforce output constraints. In addition, Mistral 7B also p
- Long Context Retrieval Models - Mixtral also shows strong performance in retrieving information from its context window of 32k tokens, regardless of the information's location and the sequence length. To measure Mixtral's ability to handle long context, it was evalua
- Model Configuration - The first step is to configure model access. Let's install the following libraries to get started: ``` %%capture !pip install openai !pip install pandas ``` Let's import the necessary libraries and set the `TOGETHER_API_
- Prompt Engineering Strategies - LLMs have strong capabilities to generate coherent text. Using effective prompt strategies can steer the model to produce better, consistent, and more factual responses. LLMs can also be especially useful for generating
- Reasoning Benchmarks - The table below shows how Mistral Large performs on common reasoning and knowledge benchmarks. It largely falls behind GPT-4 but outperforms other LLMs such as Claude 2 and Gemini Pro 1.0.
- Reasoning Models - Gemini models display impressive crossmodal reasoning capabilities. For instance, the figure below demonstrates a solution to a physics problem drawn by a teacher (left). Gemini is then prompted to reason about the quest
- Synthetic Data Generators - ## Synthetic Data for RAG Setup Unfortunately, in the life of a Machine Learning Engineer, there's often a lack of labeled data or very little of it. Typically, upon realizing this, proje
- Vector Database Pipelines - 1. **Data Preparation:** First convert the readme file (containing the summaries) into a plain text file. 2. **Utilizing the Gemini API:** You can upload the text file using the Google `generativeai` library. 3. **Implem
- Prompt Engineering Guides - The instruction is the high-level guidance that tells the system exactly what to do. ``` You are an expert research planner. Your task is to break down a complex research query (delimited by ) into sp
- Artificial Intelligence Agent Tutorials - This is a huge shift in how we build with AI agents. Deep agents also feel like an important building block for what comes next: personalized proactive agents that can act on our behalf. I will write more on proactive ag
- AI Agent Development Guides - In this section, we provide an overview of LLM-based agents, including definitions, common design patterns, tips, use cases, and applications. This content is based on our new course "Building Effective AI Agents with n8
- AI Agent Tutorials - Let's explore context engineering principles through an example: a minimal deep research agent that performs web searches and generates reports. ### The Context Engineering Challenge
- Prompt Engineering Guides - Let’s look at a concrete example of some recent context engineering work I did for a multi-agent deep research application I built for personal use. I built the agentic workflow inside of n8n, but the tool doesn’t matter
- Code Explanation Utilities - If you are learning to program in a certain language, it might be useful to prompt the model to explain certain bits of code. Let's reuse the query generated above and ask the model to explain it. If you are using the sa
- Multi-Parameter Function Invocations - Let's create a function that generates a password by taking five input parameters, and outputs the generated password. *Prompt:* ``` function_name: [pg] input: ["length", "capitalized", "lowercase", "numbers", "special"]
- Image Generation Techniques - It helps to specify the aspect ratio you want in your prompt, even when using a reference image. The model can select the correct aspect ratio if it has clues in the prompt (e.g. images of rockets are often 2:3), but def
- Translation Models - Gemini 1.5 Pro can be provided a grammar manual (500 pages of linguistic documentation, a dictionary, and ~400 parallel sentences) for Kalamang, a language spoken by fewer than 200 speakers worldwide, and translates Engl
- Prompt Engineering Datasets - This section contains a collection of prompts for testing the question answering capabilities of LLMs. Closed Domain Question Answering, Open Domain Question Answering, Science Question Answering
- Research Overviews - - The Prompt Report: A Systematic Survey of Prompting Techniques (June 2024) - Prompt Design and Engineering: Introduction and Advanced Methods (January 2024) - A Survey on Hallu
- Retrieval Augmented Generation Guides - RAG can be defined as: > RAG takes input and retrieves a set of relevant/supporting documents given a source (e.g., Wikipedia). The documents are concatenated as context wi
- Model Architectures - Gemini 1.5 Pro is a sparse mixture-of-experts (MoE) Transformer-based model built on Gemini 1.0's multimodal capabilities. The benefit of MoE is that the total parameters of the model can grow while keeping the number of
- Question Answering Formats - QA format is useful for scenarios where you are asking the model a question and want a concise answer in return. You can use the following prompt template: ``` Instruct: {{prompt}} Output: ``` Here is an example: *Prompt
- Infinite Context Architectures - A new paper by Google integrates compressive memory into a vanilla dot-product attention layer. The goal is to enable Transformer LLMs to effectively process infinitely long inputs with bounded memor
- Small Language Model Evaluations - LLM researchers are keen to explore whether small language models have similar emergent capabilities as their large counterparts and if there are techniques for training that can help to achieve this. The model is traine
- Model Benchmarks - Notably, Llama 3 8B (instruction-tuned) outperforms Gemma 7B and Mistral 7B Instruct. Llama 3 70B broadly outperforms Gemini Pro 1.5 and Claude 3 Sonnet
- Professional Services - Welcome to our services page! Here you can find information about the services we offer. ## Our Offerings ### Trainings & Workshops We offer both cohort-based trainings for team
- Training Workshops - We offer both cohort-based trainings for teams to learn how to systematically apply proven techniques around prompt engineering, context engineering, RAG, and AI Agents. If you are looking for a private training, please requ
- Classification Prompts - This section contains a collection of prompts for testing the text classification capabilities of LLMs. Sentiment Classification, Few-Shot Sentiment Classification
- Code Generation Prompts - One application where LLMs are quite effective is code generation. Copilot is a great example of this. There are a vast number of code-generation tasks you can perform with clever prompts. Let's look at a few examples be
- Conversational Persona Guides - Perhaps one of the more interesting things you can achieve with prompt engineering is instructing the LLM system on how to behave, its intent, and its identity. This is particularly useful when you are building conversat
- Prompt Engineering Experiments - | Short name | Description | | --- | --- | | Baseline | Provide a job posting and ask if it is fit for a graduate. | | CoT | Give a few examples of accurate classification before querying. | | Zero-CoT | Ask the mod
- Sentiment Analysis Prompts - ## Background This prompt tests an LLM's text classification capabilities by prompting it to classify a piece of text. ## Prompt ``` Classify the text into neutral, negative, or positive Text: I
- Directional Stimulus Prompting - Li et al. (2023) propose a new prompting technique to better guide the LLM in generating the desired summary. A tuneable policy LM is trained to generate the stimulus/hint. Seeing more use of RL to
- Graph Prompting - Liu et al. (2023) introduce GraphPrompt, a new prompting framework for graphs to improve performance on downstream tasks. More coming soon!
- Model Steering - One area for experimentation is the ability to steer the model to provide answers in a certain tone and style via the `system` messages. This can accelerate personalization and getting accurate and more precise results f
- Multimodal Chain-of-Thought Prompting - Zhang et al. (2023) recently proposed a multimodal chain-of-thought prompting approach. Traditional CoT focuses on the language modality. In contrast, Multimodal CoT incorporates text and vision into
- Context Engineering - A few years ago, many, even top AI researchers, claimed that prompt engineering would be dead by now. Obviously, they were very wrong, and in fact, prompt engineering is now even more important than ever. It is so import
- Creative Writing Prompts - Infinite Primes, Interdisciplinary, Inventing New Words, Rhymes
- Multimodal Prompts - This section contains a collection of prompts for exploring the capabilities of LLMs and multimodal models. Draw a Person Using Alphabet
- Prompt Optimization Strategies - If your prompt is not descriptive, ChatGPT often fills in additional details. This can be useful for quick tests or exploration, but if you have something specific in mind, write a detailed and descriptive prompt. If you
- Prompt Templates - Prompting Gemma 7B effectively requires being able to use the prompt template properly. In the following examples, we will cover a few examples that demonstrate the effective use of the prompt template of Gemma 7B In
- Prompt Chaining - ### Prompt Chaining for Document QA Prompt chaining can be used in different scenarios that could involve several operations or transformations. For instance, one common use case of LL
- Deep Research Methodologies - - Introducing deep research | OpenAI - Introduction to Deep Research - OpenAI Deep Research: The Future of Autonomous Research and Analysis - OpenAI’s 5-Stag
- Model Evaluation Reports - This work also presents a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Below are the main findings from the evaluation: - While proprietary LLMs generally outperform most open-source c
- Reasoning Elicitation Methods - Reasoning in LLMs can be elicited and enhanced using many different prompting approaches. Qiao et al. (2023) categorized reasoning methods research into two different branches, namely reasoning enhan
- Retrieval Augmented Generation Analysis - This new paper by Wu et al. (2024) aims to quantify the tug-of-war between RAG and LLMs' internal prior. It focuses on GPT-4 and other LLMs on question answering for the analysis. It finds that provi
- Retrieval Augmented Generation Research - In this overview, we discussed several aspects of RAG research and different approaches for enhancing retrieval, augmentation, and generation of a RAG system. Here are several challenges emphasized by Gao et al.
- Memory Management Strategies - This first version of the deep research application I have built doesn’t require the use of short-term memory, but we have built a version of it that caches subqueries for different user queries. This is useful to achiev
- Modular RAG - As the name implies, Modular RAG enhances functional modules such as incorporating a search module for similarity retrieval and applying fine-tuning in the retriever. Both Naive RAG and Advanced RAG are special cases of
- Naive RAG Implementations - Naive RAG follows the traditional aforementioned process of indexing, retrieval, and generation. In short, a user input is used to query relevant documents which are then combined with a prompt and passed to the model to
- RAG Evaluation Frameworks - Similar to measuring the performance of LLMs on different aspects, evaluation plays a key role in understanding and optimizing the performance of RAG models across diverse application scenarios. Traditionally, RAG system
- Adversarial Simulation Environments - GPT-4 has improved in terms of safety, as many of the jailbreaking and prompt injection techniques described above are not as effective anymore. Simulations continue to be an effective technique to jailbreak the system.
- Adversarial Prompt Datasets - This adversarial prompt example aims to demonstrate the concept of jailbreaking which deals with bypassing the safety policies and guardrails of an LLM. Please note that the prompt example provided below is for raising a
- Synthetic Data Best Practices - This paper provides an overview of best practices and lessons learned on synthetic data for language models and was published by Google DeepMind and other collaborators. It focuses on synthetic data