Conversational RAG
Introduction
So far, we've treated each query independently. But real conversations build on previous exchanges:
User: "What authentication methods do you support?"
Assistant: "We support OAuth 2.0, API keys, and JWT tokens..."
User: "How do I set up the first one?"
"The first one" refers to OAuth—information from the previous exchange. This lesson explores Conversational RAG: patterns for handling multi-turn conversations where context from previous messages matters.
The Challenge of Context
Why Single-Turn RAG Falls Short
Standard RAG treats each query in isolation:
Query: "How do I set up the first one?"
Retrieved: [Random documents about "setup" and "first"]
Result: Confused or irrelevant response
The retrieval system doesn't know "the first one" means OAuth.
Types of Conversational Dependencies
Pronoun References:
- "How do I configure it?" → What is "it"?
- "Can you explain that more?" → What is "that"?
Implicit Context:
- "What about security?" → Security of what?
- "And the pricing?" → Pricing of what feature?
Follow-up Questions:
- "What else should I know?" → About what topic?
- "Are there any alternatives?" → To what?
History Management
Storing Conversation History
First, we need to track previous messages:
interface Message {
id: string;
role: 'user' | 'assistant';
content: string;
timestamp: Date;
}
interface Conversation {
id: string;
userId: string;
messages: Message[];
createdAt: Date;
updatedAt: Date;
}
Database Schema for Conversations
CREATE TABLE conversations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES auth.users(id),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID REFERENCES conversations(id) ON DELETE CASCADE,
role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
content TEXT NOT NULL,
sources JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX messages_conversation_idx ON messages(conversation_id, created_at);
Loading Conversation History
async function getConversationHistory(
conversationId: string,
limit: number = 10
): Promise<Message[]> {
const { data, error } = await supabase
.from('messages')
.select('*')
.eq('conversation_id', conversationId)
.order('created_at', { ascending: false })
.limit(limit);
if (error) throw error;
// Return in chronological order
return data.reverse();
}
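Writing works the same way. Here is a minimal sketch of the counterpart helpers, assuming the same supabase client and the schema above (createConversation and saveMessage are illustrative names, not part of the lesson's API):

async function createConversation(userId: string): Promise<string> {
  // Insert the conversation row first so the messages foreign key is satisfied
  const { data, error } = await supabase
    .from('conversations')
    .insert({ user_id: userId })
    .select('id')
    .single();
  if (error) throw error;
  return data.id;
}

async function saveMessage(
  conversationId: string,
  role: 'user' | 'assistant',
  content: string
): Promise<void> {
  const { error } = await supabase
    .from('messages')
    .insert({ conversation_id: conversationId, role, content });
  if (error) throw error;
}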
Query Transformation
The Key Technique: Rewriting Queries
Before retrieval, we transform the user's query to include conversation context:
Conversation:
User: "What authentication methods do you support?"
Assistant: "We support OAuth 2.0, API keys, and JWT tokens..."
New Query: "How do I set up the first one?"
Transformed: "How do I set up OAuth 2.0 authentication?"
The transformed query is self-contained and can be used for retrieval.
LLM-Based Query Transformation
async function transformQuery(
currentQuery: string,
history: Message[]
): Promise<string> {
if (history.length === 0) {
return currentQuery;
}
const historyText = history
.slice(-6) // Last 3 exchanges
.map(m => `${m.role}: ${m.content}`)
.join('\n');
const prompt = `Given the conversation history and the current query, rewrite the query to be self-contained. Include any context from the conversation that's necessary to understand the query. If the query is already self-contained, return it unchanged.
Conversation history:
${historyText}
Current query: ${currentQuery}
Self-contained query:`;
const response = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
generationConfig: { temperature: 0, maxOutputTokens: 200 }
});
return response.response.text().trim();
}
Examples of Query Transformation
| History | Original Query | Transformed Query |
|---|---|---|
| Discussed OAuth setup | "What about refresh tokens?" | "How do refresh tokens work with OAuth 2.0?" |
| Talked about pricing tiers | "Which one is best for startups?" | "Which pricing tier is best for startups?" |
| Explained rate limits | "How can I increase it?" | "How can I increase the API rate limits?" |
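A quick way to sanity-check the transformer is to replay cases like these (the history below is hypothetical):

const history: Message[] = [
  {
    id: '1',
    role: 'user',
    content: 'What are your API rate limits?',
    timestamp: new Date()
  },
  {
    id: '2',
    role: 'assistant',
    content: 'The free tier allows 100 requests per minute...',
    timestamp: new Date()
  }
];

const rewritten = await transformQuery('How can I increase it?', history);
console.log(rewritten); // e.g. "How can I increase the API rate limits?"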
The Conversational RAG Pipeline
Complete Flow
┌────────────────────────────────────────────────────────────────┐
│ CONVERSATIONAL RAG │
│ │
│ 1. Load History ──▶ 2. Transform Query ──▶ 3. Retrieve │
│ │ │ │ │
│ [Previous [Self-contained [Documents │
│ messages] query] with context] │
│ │ │ │
│ ▼ ▼ │
│ 4. Build Prompt ◀────────────┘ │
│ │ │
│ [History + Context + Query] │
│ │ │
│ ▼ │
│ 5. Generate Response │
│ │ │
│ 6. Save to History │
└────────────────────────────────────────────────────────────────┘
Implementation
// app/api/chat/route.ts
export async function POST(request: Request) {
const { message, conversationId } = await request.json();
// 1. Load conversation history
let history: Message[] = [];
if (conversationId) {
history = await getConversationHistory(conversationId);
}
// 2. Transform query with history context
const transformedQuery = await transformQuery(message, history);
// 3. Retrieve relevant documents using transformed query
const queryEmbedding = await embedQuery(transformedQuery);
const { data: docs } = await supabase.rpc('search_docs', {
query_embedding: queryEmbedding,
match_count: 5
});
// 4. Build prompt with history and context
const context = docs
.map((d: any) => `[${d.source}]\n${d.content}`)
.join('\n\n---\n\n');
const historyPrompt = history
.slice(-4) // Last 2 exchanges
.map(m => `${m.role === 'user' ? 'User' : 'Assistant'}: ${m.content}`)
.join('\n\n');
const fullPrompt = `CONVERSATION HISTORY:
${historyPrompt}
RELEVANT DOCUMENTATION:
${context}
---
USER QUESTION: ${message}`;
  // 5. Generate response
  const response = await generateStreamingResponse(fullPrompt);

  // 6. Save messages to history. A stream can only be consumed once, so
  // tee it: one branch streams back to the client, the other is collected
  // for storage.
  const [clientStream, storageStream] = response.tee();

  const newConversationId = conversationId || crypto.randomUUID();
  if (!conversationId) {
    // Create the conversation row first so the messages foreign key is satisfied
    await supabase.from('conversations').insert({ id: newConversationId });
  }

  // Persist once generation finishes, without blocking the streamed response
  collectStream(storageStream).then(assistantText =>
    supabase.from('messages').insert([
      {
        conversation_id: newConversationId,
        role: 'user',
        content: message
      },
      {
        conversation_id: newConversationId,
        role: 'assistant',
        content: assistantText,
        sources: docs.map((d: any) => d.source)
      }
    ])
  );

  return new Response(clientStream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'X-Conversation-Id': newConversationId
    }
  });
}
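The route assumes a collectStream helper that drains the stored branch of the stream into a single string. A minimal sketch:

async function collectStream(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode(); // flush any buffered bytes
}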
Prompt Design for Conversations
Including History in the Prompt
The system instruction should acknowledge conversation context:
const systemInstruction = `You are a helpful documentation assistant. You are having an ongoing conversation with a user about our product.
IMPORTANT:
1. Use the CONVERSATION HISTORY to understand context from previous exchanges
2. Use the RELEVANT DOCUMENTATION to answer the current question
3. If the user refers to something discussed earlier, acknowledge that context
4. Stay consistent with previous answers unless the documentation contradicts them
If you don't have enough information in the documentation to answer, say so.`;
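One way to attach this instruction, assuming the @google/generative-ai SDK used in the snippets above (the model name is an illustrative choice):

import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
  model: 'gemini-1.5-flash', // illustrative choice
  systemInstruction
});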
History Window Size
Don't include unlimited history—it wastes tokens and can confuse the model:
const MAX_HISTORY_MESSAGES = 6; // ~3 exchanges
const MAX_HISTORY_TOKENS = 2000; // Or limit by tokens
function trimHistory(history: Message[]): Message[] {
// Take most recent messages
let trimmed = history.slice(-MAX_HISTORY_MESSAGES);
// Further trim if too many tokens
while (estimateTokens(trimmed) > MAX_HISTORY_TOKENS && trimmed.length > 2) {
trimmed = trimmed.slice(1);
}
return trimmed;
}
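trimHistory relies on an estimateTokens helper that the lesson doesn't define. A rough sketch using the common ~4-characters-per-token heuristic (a real tokenizer would be more accurate):

function estimateTokens(messages: Message[]): number {
  // Crude heuristic: roughly 4 characters per token for English text
  const totalChars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(totalChars / 4);
}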
Advanced Patterns
Conversation Summarization
For long conversations, summarize older exchanges:
async function summarizeHistory(messages: Message[]): Promise<string> {
if (messages.length < 10) {
return messages.map(m => `${m.role}: ${m.content}`).join('\n');
}
const oldMessages = messages.slice(0, -6);
const recentMessages = messages.slice(-6);
const summaryPrompt = `Summarize the key points from this conversation in 2-3 sentences:
${oldMessages.map(m => `${m.role}: ${m.content}`).join('\n')}
Summary:`;
const summary = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: summaryPrompt }] }],
generationConfig: { temperature: 0, maxOutputTokens: 150 }
});
return `[Earlier in conversation: ${summary.response.text()}]
Recent messages:
${recentMessages.map(m => `${m.role}: ${m.content}`).join('\n')}`;
}
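To use the summarizer, swap its output in wherever the raw history block would go. For example, reusing the variables from the pipeline above:

// Long conversation: replace the raw history block with the summary
const historyPrompt = await summarizeHistory(history);

const fullPrompt = `CONVERSATION HISTORY:
${historyPrompt}

RELEVANT DOCUMENTATION:
${context}

---

USER QUESTION: ${message}`;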
Topic Detection
Track what the conversation is about:
async function detectTopics(history: Message[]): Promise<string[]> {
const content = history.map(m => m.content).join(' ');
const prompt = `What topics are discussed in this conversation? List up to 5 keywords:
${content}
Topics:`;
const response = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
generationConfig: { temperature: 0, maxOutputTokens: 50 }
});
return response.response.text()
.split(/[,\n]/)
.map(t => t.trim())
.filter(t => t.length > 0);
}
// Use topics to boost retrieval
const topics = await detectTopics(history);
// Add topics to query or use for filtering
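One concrete way to apply them is to fold the topics into the retrieval query, so terse follow-ups still embed near the conversation's subject. A sketch, reusing transformedQuery from the pipeline:

const boostedQuery = topics.length > 0
  ? `${transformedQuery} (topics: ${topics.join(', ')})`
  : transformedQuery;
const queryEmbedding = await embedQuery(boostedQuery);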
Conversation Branches
Allow users to "go back" to earlier points:
interface ConversationBranch {
id: string;
parentMessageId: string;
messages: Message[];
}
// When the user says "let's go back to when we discussed X"
// (assumes a conversation_branches table, not shown in the schema above)
async function createBranch(
conversationId: string,
branchFromMessageId: string
): Promise<string> {
const branchId = crypto.randomUUID();
await supabase.from('conversation_branches').insert({
id: branchId,
conversation_id: conversationId,
parent_message_id: branchFromMessageId
});
return branchId;
}
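Reconstructing history for a branch then means taking every message up to and including the fork point. A sketch built on getConversationHistory (getBranchHistory is a hypothetical helper):

async function getBranchHistory(
  conversationId: string,
  parentMessageId: string
): Promise<Message[]> {
  // Load the full thread, then cut it off at the message we branched from
  const all = await getConversationHistory(conversationId, 1000);
  const forkIndex = all.findIndex(m => m.id === parentMessageId);
  return forkIndex === -1 ? all : all.slice(0, forkIndex + 1);
}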
Summary
In this lesson, we explored Conversational RAG:
Key Takeaways:
- Single-turn RAG misses context: Pronouns and references require conversation history
- Query transformation is essential: Rewrite queries to be self-contained before retrieval
- History management matters: Store and efficiently load conversation history
- Limit history wisely: Too much history wastes tokens and confuses models
- Advanced patterns help at scale: Summarization, topic detection, and branching
Next Steps
We've built sophisticated retrieval and conversation handling. In the final lesson, we'll address the practical concerns of Performance and Cost Optimization—making your RAG system fast and affordable at scale.
"A good conversation remembers where it's been while looking where it's going." — Unknown

