Conversational RAG
Introduction
So far, we've treated each query independently. But real conversations build on previous exchanges:
User: "What authentication methods do you support?"
Assistant: "We support OAuth 2.0, API keys, and JWT tokens..."
User: "How do I set up the first one?"
"The first one" refers to OAuth—information from the previous exchange. This lesson explores Conversational RAG: patterns for handling multi-turn conversations where context from previous messages matters.
The Challenge of Context
Why Single-Turn RAG Falls Short
Standard RAG treats each query in isolation:
Query: "How do I set up the first one?"
Retrieved: [Random documents about "setup" and "first"]
Result: Confused or irrelevant response
The retrieval system doesn't know "the first one" means OAuth.
Types of Conversational Dependencies
Pronoun References:
- "How do I configure it?" → What is "it"?
- "Can you explain that more?" → What is "that"?
Implicit Context:
- "What about security?" → Security of what?
- "And the pricing?" → Pricing of what feature?
Follow-up Questions:
- "What else should I know?" → About what topic?
- "Are there any alternatives?" → To what?
History Management
Storing Conversation History
First, we need to track previous messages:
interface Message {
id: string;
role: 'user' | 'assistant';
content: string;
timestamp: Date;
}
interface Conversation {
id: string;
userId: string;
messages: Message[];
createdAt: Date;
updatedAt: Date;
}
Database Schema for Conversations
CREATE TABLE conversations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES auth.users(id),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID REFERENCES conversations(id) ON DELETE CASCADE,
role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
content TEXT NOT NULL,
sources JSONB,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX messages_conversation_idx ON messages(conversation_id, created_at);
Loading Conversation History
async function getConversationHistory(
conversationId: string,
limit: number = 10
): Promise<Message[]> {
const { data, error } = await supabase
.from('messages')
.select('*')
.eq('conversation_id', conversationId)
.order('created_at', { ascending: false })
.limit(limit);
if (error) throw error;
// Return in chronological order
return data.reverse();
}
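Writing works the same way. Here is a minimal sketch of the counterpart helpers, assuming the same supabase client and the schema above (createConversation and saveMessage are illustrative names, not part of the lesson's API):

async function createConversation(userId: string): Promise<string> {
  // Insert the conversation row first so the messages foreign key is satisfied
  const { data, error } = await supabase
    .from('conversations')
    .insert({ user_id: userId })
    .select('id')
    .single();
  if (error) throw error;
  return data.id;
}

async function saveMessage(
  conversationId: string,
  role: 'user' | 'assistant',
  content: string
): Promise<void> {
  const { error } = await supabase
    .from('messages')
    .insert({ conversation_id: conversationId, role, content });
  if (error) throw error;
}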
Query Transformation
The Key Technique: Rewriting Queries
Before retrieval, we transform the user's query to include conversation context:
Conversation:
User: "What authentication methods do you support?"
Assistant: "We support OAuth 2.0, API keys, and JWT tokens..."
New Query: "How do I set up the first one?"
Transformed: "How do I set up OAuth 2.0 authentication?"
The transformed query is self-contained and can be used for retrieval.
LLM-Based Query Transformation
async function transformQuery(
currentQuery: string,
history: Message[]
): Promise<string> {
if (history.length === 0) {
return currentQuery;
}
const historyText = history
.slice(-6) // Last 3 exchanges
.map(m => `${m.role}: ${m.content}`)
.join('\n');
const prompt = `Given the conversation history and the current query, rewrite the query to be self-contained. Include any context from the conversation that's necessary to understand the query. If the query is already self-contained, return it unchanged.
Conversation history:
${historyText}
Current query: ${currentQuery}
Self-contained query:`;
const response = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
generationConfig: { temperature: 0, maxOutputTokens: 200 }
});
return response.response.text().trim();
}
Examples of Query Transformation
| History | Original Query | Transformed Query |
|---|---|---|
| Discussed OAuth setup | "What about refresh tokens?" | "How do refresh tokens work with OAuth 2.0?" |
| Talked about pricing tiers | "Which one is best for startups?" | "Which pricing tier is best for startups?" |
| Explained rate limits | "How can I increase it?" | "How can I increase the API rate limits?" |
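A quick way to sanity-check the transformer is to replay cases like these (the history below is hypothetical):

const history: Message[] = [
  {
    id: '1',
    role: 'user',
    content: 'What are your API rate limits?',
    timestamp: new Date()
  },
  {
    id: '2',
    role: 'assistant',
    content: 'The free tier allows 100 requests per minute...',
    timestamp: new Date()
  }
];

const rewritten = await transformQuery('How can I increase it?', history);
console.log(rewritten); // e.g. "How can I increase the API rate limits?"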
The Conversational RAG Pipeline
Complete Flow
┌────────────────────────────────────────────────────────────────┐
│ CONVERSATIONAL RAG │
│ │
│ 1. Load History ──▶ 2. Transform Query ──▶ 3. Retrieve │
│ │ │ │ │
│ [Previous [Self-contained [Documents │
│ messages] query] with context] │
│ │ │ │
│ ▼ ▼ │
│ 4. Build Prompt ◀────────────┘ │
│ │ │
│ [History + Context + Query] │
│ │ │
│ ▼ │
│ 5. Generate Response │
│ │ │
│ 6. Save to History │
└────────────────────────────────────────────────────────────────┘
Implementation
// app/api/chat/route.ts
export async function POST(request: Request) {
const { message, conversationId } = await request.json();
// 1. Load conversation history
let history: Message[] = [];
if (conversationId) {
history = await getConversationHistory(conversationId);
}
// 2. Transform query with history context
const transformedQuery = await transformQuery(message, history);
// 3. Retrieve relevant documents using transformed query
const queryEmbedding = await embedQuery(transformedQuery);
const { data: docs } = await supabase.rpc('search_docs', {
query_embedding: queryEmbedding,
match_count: 5
});
// 4. Build prompt with history and context
const context = docs
.map((d: any) => `[${d.source}]\n${d.content}`)
.join('\n\n---\n\n');
const historyPrompt = history
.slice(-4) // Last 2 exchanges
.map(m => `${m.role === 'user' ? 'User' : 'Assistant'}: ${m.content}`)
.join('\n\n');
const fullPrompt = `CONVERSATION HISTORY:
${historyPrompt}
RELEVANT DOCUMENTATION:
${context}
---
USER QUESTION: ${message}`;
  // 5. Generate response
  const response = await generateStreamingResponse(fullPrompt);

  // 6. Save messages to history. A stream can only be consumed once, so
  // tee it: one branch streams back to the client, the other is collected
  // for storage.
  const [clientStream, storageStream] = response.tee();

  const newConversationId = conversationId || crypto.randomUUID();
  if (!conversationId) {
    // Create the conversation row first so the messages foreign key is satisfied
    await supabase.from('conversations').insert({ id: newConversationId });
  }

  // Persist once generation finishes, without blocking the streamed response
  collectStream(storageStream).then(assistantText =>
    supabase.from('messages').insert([
      {
        conversation_id: newConversationId,
        role: 'user',
        content: message
      },
      {
        conversation_id: newConversationId,
        role: 'assistant',
        content: assistantText,
        sources: docs.map((d: any) => d.source)
      }
    ])
  );

  return new Response(clientStream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'X-Conversation-Id': newConversationId
    }
  });
}
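The route assumes a collectStream helper that drains the stored branch of the stream into a single string. A minimal sketch:

async function collectStream(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode(); // flush any buffered bytes
}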
Prompt Design for Conversations
Including History in the Prompt
The system instruction should acknowledge conversation context:
const systemInstruction = `You are a helpful documentation assistant. You are having an ongoing conversation with a user about our product.
IMPORTANT:
1. Use the CONVERSATION HISTORY to understand context from previous exchanges
2. Use the RELEVANT DOCUMENTATION to answer the current question
3. If the user refers to something discussed earlier, acknowledge that context
4. Stay consistent with previous answers unless the documentation contradicts them
If you don't have enough information in the documentation to answer, say so.`;
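One way to attach this instruction, assuming the @google/generative-ai SDK used in the snippets above (the model name is an illustrative choice):

import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({
  model: 'gemini-1.5-flash', // illustrative choice
  systemInstruction
});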
History Window Size
Don't include unlimited history—it wastes tokens and can confuse the model:
const MAX_HISTORY_MESSAGES = 6; // ~3 exchanges
const MAX_HISTORY_TOKENS = 2000; // Or limit by tokens
function trimHistory(history: Message[]): Message[] {
// Take most recent messages
let trimmed = history.slice(-MAX_HISTORY_MESSAGES);
// Further trim if too many tokens
while (estimateTokens(trimmed) > MAX_HISTORY_TOKENS && trimmed.length > 2) {
trimmed = trimmed.slice(1);
}
return trimmed;
}
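trimHistory relies on an estimateTokens helper that the lesson doesn't define. A rough sketch using the common ~4-characters-per-token heuristic (a real tokenizer would be more accurate):

function estimateTokens(messages: Message[]): number {
  // Crude heuristic: roughly 4 characters per token for English text
  const totalChars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(totalChars / 4);
}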
Advanced Patterns
Conversation Summarization
For long conversations, summarize older exchanges:
async function summarizeHistory(messages: Message[]): Promise<string> {
if (messages.length < 10) {
return messages.map(m => `${m.role}: ${m.content}`).join('\n');
}
const oldMessages = messages.slice(0, -6);
const recentMessages = messages.slice(-6);
const summaryPrompt = `Summarize the key points from this conversation in 2-3 sentences:
${oldMessages.map(m => `${m.role}: ${m.content}`).join('\n')}
Summary:`;
const summary = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: summaryPrompt }] }],
generationConfig: { temperature: 0, maxOutputTokens: 150 }
});
return `[Earlier in conversation: ${summary.response.text()}]
Recent messages:
${recentMessages.map(m => `${m.role}: ${m.content}`).join('\n')}`;
}
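To use the summarizer, swap its output in wherever the raw history block would go. For example, reusing the variables from the pipeline above:

// Long conversation: replace the raw history block with the summary
const historyPrompt = await summarizeHistory(history);

const fullPrompt = `CONVERSATION HISTORY:
${historyPrompt}

RELEVANT DOCUMENTATION:
${context}

---

USER QUESTION: ${message}`;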
Topic Detection
Track what the conversation is about:
async function detectTopics(history: Message[]): Promise<string[]> {
const content = history.map(m => m.content).join(' ');
const prompt = `What topics are discussed in this conversation? List up to 5 keywords:
${content}
Topics:`;
const response = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
generationConfig: { temperature: 0, maxOutputTokens: 50 }
});
return response.response.text()
.split(/[,\n]/)
.map(t => t.trim())
.filter(t => t.length > 0);
}
// Use topics to boost retrieval
const topics = await detectTopics(history);
// Add topics to query or use for filtering
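One concrete way to apply them is to fold the topics into the retrieval query, so terse follow-ups still embed near the conversation's subject. A sketch, reusing transformedQuery from the pipeline:

const boostedQuery = topics.length > 0
  ? `${transformedQuery} (topics: ${topics.join(', ')})`
  : transformedQuery;
const queryEmbedding = await embedQuery(boostedQuery);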
Conversation Branches
Allow users to "go back" to earlier points:
interface ConversationBranch {
id: string;
parentMessageId: string;
messages: Message[];
}
// When the user says "let's go back to when we discussed X"
// (assumes a conversation_branches table, not shown in the schema above)
async function createBranch(
conversationId: string,
branchFromMessageId: string
): Promise<string> {
const branchId = crypto.randomUUID();
await supabase.from('conversation_branches').insert({
id: branchId,
conversation_id: conversationId,
parent_message_id: branchFromMessageId
});
return branchId;
}
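Reconstructing history for a branch then means taking every message up to and including the fork point. A sketch built on getConversationHistory (getBranchHistory is a hypothetical helper):

async function getBranchHistory(
  conversationId: string,
  parentMessageId: string
): Promise<Message[]> {
  // Load the full thread, then cut it off at the message we branched from
  const all = await getConversationHistory(conversationId, 1000);
  const forkIndex = all.findIndex(m => m.id === parentMessageId);
  return forkIndex === -1 ? all : all.slice(0, forkIndex + 1);
}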
Summary
In this lesson, we explored Conversational RAG:
Key Takeaways:
- Single-turn RAG misses context: Pronouns and references require conversation history
- Query transformation is essential: Rewrite queries to be self-contained before retrieval
- History management matters: Store and efficiently load conversation history
- Limit history wisely: Too much history wastes tokens and confuses models
- Advanced patterns help at scale: Summarization, topic detection, and branching
Next Steps
We've built sophisticated retrieval and conversation handling. In the final lesson, we'll address the practical concerns of Performance and Cost Optimization—making your RAG system fast and affordable at scale.
"A good conversation remembers where it's been while looking where it's going." — Unknown

