Improving Retrieval Quality
Introduction
The quality of your RAG system is only as good as its retrieval. If you retrieve irrelevant documents, even the best LLM will produce poor answers. This lesson explores techniques to improve retrieval quality: hybrid search, query expansion, and re-ranking.
These aren't just optimizations—they're often the difference between a useful assistant and a frustrating one.
The Limits of Pure Vector Search
Where Vector Search Falls Short
Vector similarity search is powerful, but it has limitations:
Exact Matches:
Query: "error code 404"
Best semantic match: "Client-side errors and troubleshooting"
Actual need: Document mentioning "404" specifically
Semantic search understands meaning but may miss exact terminology.
Rare Terms:
Query: "configure OIDC provider"
Semantic search: Finds general authentication docs
Missed: The one document that specifically mentions "OIDC"
Rare or technical terms may not be well-represented in embeddings.
Boolean Requirements:
Query: "authentication AND NOT OAuth"
Semantic search: Doesn't understand boolean logic
Users sometimes have specific inclusion/exclusion needs.
Hybrid Search: The Best of Both Worlds
Combining Vector and Full-Text Search
Hybrid search combines:
- Vector similarity: Understands meaning, finds semantically related content
- Full-text search (FTS): Matches exact keywords, handles rare terms
PostgreSQL Full-Text Search
Supabase/PostgreSQL has built-in FTS capabilities:
-- Add a full-text search column
ALTER TABLE documents
ADD COLUMN fts_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
-- Index for fast FTS
CREATE INDEX documents_fts_idx ON documents USING gin(fts_vector);
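Once the column and index are in place, you can sanity-check keyword matching from the application before wiring up hybrid search. This is a minimal sketch using the Supabase client's textSearch helper against the fts_vector column defined above; the documents table and the supabase client are assumed from the earlier setup.
// Keyword-only lookup: finds rows that literally contain "404",
// regardless of how semantically close they are to the query.
const { data: keywordHits, error } = await supabase
  .from('documents')
  .select('id, title')
  .textSearch('fts_vector', 'error code 404', {
    type: 'plain',     // plainto_tsquery under the hood
    config: 'english'
  });
if (error) console.error(error);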
Hybrid Search Function
CREATE OR REPLACE FUNCTION hybrid_search(
query_text TEXT,
query_embedding VECTOR(768),
match_count INT DEFAULT 10,
full_text_weight FLOAT DEFAULT 0.3,
semantic_weight FLOAT DEFAULT 0.7
)
RETURNS TABLE (
id UUID,
content TEXT,
source TEXT,
title TEXT,
score FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
WITH semantic_results AS (
SELECT
d.id,
d.content,
d.source,
d.title,
1 - (d.embedding <=> query_embedding) AS semantic_score,
0::float AS fts_score
FROM documents d
ORDER BY d.embedding <=> query_embedding
LIMIT match_count * 2
),
fts_results AS (
SELECT
d.id,
d.content,
d.source,
d.title,
0::float AS semantic_score,
ts_rank(d.fts_vector, plainto_tsquery('english', query_text)) AS fts_score
FROM documents d
WHERE d.fts_vector @@ plainto_tsquery('english', query_text)
ORDER BY fts_score DESC
LIMIT match_count * 2
),
combined AS (
SELECT
COALESCE(s.id, f.id) AS id,
COALESCE(s.content, f.content) AS content,
COALESCE(s.source, f.source) AS source,
COALESCE(s.title, f.title) AS title,
COALESCE(s.semantic_score, 0) AS semantic_score,
COALESCE(f.fts_score, 0) AS fts_score
FROM semantic_results s
FULL OUTER JOIN fts_results f ON s.id = f.id
)
SELECT
c.id,
c.content,
c.source,
c.title,
(c.semantic_score * semantic_weight + c.fts_score * full_text_weight) AS score
FROM combined c
ORDER BY score DESC
LIMIT match_count;
END;
$$;
Using Hybrid Search
const { data: results } = await supabase.rpc('hybrid_search', {
query_text: message,
query_embedding: embedding,
match_count: 5,
full_text_weight: 0.3,
semantic_weight: 0.7
});
When to Adjust Weights
| Content Type | Semantic Weight | FTS Weight |
|---|---|---|
| Technical docs with jargon | 0.5 | 0.5 |
| Conversational content | 0.8 | 0.2 |
| Code/API references | 0.4 | 0.6 |
| General knowledge | 0.7 | 0.3 |
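One way to act on these weights is to keep a small lookup of presets and pass the chosen pair into the hybrid_search call shown above. The preset names and the searchWithPreset helper below are illustrative, not part of the lesson's codebase; the values mirror the table.
// Hypothetical weight presets (values taken from the table above).
const WEIGHT_PRESETS = {
  technical: { semantic: 0.5, fullText: 0.5 },
  conversational: { semantic: 0.8, fullText: 0.2 },
  code: { semantic: 0.4, fullText: 0.6 },
  general: { semantic: 0.7, fullText: 0.3 }
} as const;
type ContentType = keyof typeof WEIGHT_PRESETS;
async function searchWithPreset(message: string, embedding: number[], contentType: ContentType) {
  const { semantic, fullText } = WEIGHT_PRESETS[contentType];
  return supabase.rpc('hybrid_search', {
    query_text: message,
    query_embedding: embedding,
    match_count: 5,
    full_text_weight: fullText,
    semantic_weight: semantic
  });
}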
Query Expansion
What is Query Expansion?
Query expansion enriches the user's query before retrieval:
Original: "auth setup"
Expanded: "authentication setup, configuration, OAuth, API keys, login"
This helps retrieve documents that use different terminology.
LLM-Based Query Expansion
Use the LLM to generate related terms:
async function expandQuery(query: string): Promise<string[]> {
const prompt = `Given this search query, generate 3-5 related search terms or phrases that might find relevant documentation. Return only the terms, one per line.
Query: "${query}"
Related terms:`;
const response = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
generationConfig: { temperature: 0.3, maxOutputTokens: 100 }
});
const terms = response.response.text()
.split('\n')
.map(t => t.trim())
.filter(t => t.length > 0);
return [query, ...terms];
}
Multi-Query Retrieval
Search with multiple expanded queries:
async function expandedSearch(
originalQuery: string,
matchCount: number = 5
): Promise<SearchResult[]> {
// 1. Expand the query
const queries = await expandQuery(originalQuery);
// 2. Search with each query
const allResults: SearchResult[] = [];
for (const query of queries) {
const embedding = await embedQuery(query);
const { data } = await supabase.rpc('search_docs', {
query_embedding: embedding,
match_count: matchCount
});
if (data) allResults.push(...data);
}
// 3. Deduplicate and rank
const uniqueResults = deduplicateById(allResults);
const rankedResults = rankByFrequencyAndScore(uniqueResults);
return rankedResults.slice(0, matchCount);
}
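The deduplicateById and rankByFrequencyAndScore helpers are used above but not defined in the lesson. Here is one possible sketch, assuming each SearchResult carries an id and a similarity score (the same fields the re-ranking code below relies on): deduplication keeps a hit count so that documents surfaced by several expanded queries rank first, with ties broken by their best similarity.
function deduplicateById(results: SearchResult[]): (SearchResult & { hits: number })[] {
  const seen = new Map<string, SearchResult & { hits: number }>();
  for (const r of results) {
    const existing = seen.get(r.id);
    if (!existing) {
      seen.set(r.id, { ...r, hits: 1 });
    } else {
      existing.hits += 1;
      // Keep the best similarity observed for this document.
      existing.similarity = Math.max(existing.similarity, r.similarity);
    }
  }
  return [...seen.values()];
}
function rankByFrequencyAndScore(results: (SearchResult & { hits: number })[]): SearchResult[] {
  // Documents retrieved by more expanded queries rank first; ties fall back to similarity.
  return [...results].sort((a, b) => b.hits - a.hits || b.similarity - a.similarity);
}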
Re-Ranking
The Re-Ranking Pattern
Re-ranking is a two-stage retrieval process:
- Retrieve: Get many candidates quickly (e.g., the top 20)
- Re-rank: Use a more sophisticated method to select the best
Why Re-Rank?
Initial retrieval is fast but approximate. Re-ranking allows:
- More expensive but accurate scoring
- Cross-comparison between candidates
- Use of additional signals (recency, popularity)
LLM-Based Re-Ranking
Use the LLM to judge relevance:
async function rerankWithLLM(
query: string,
candidates: SearchResult[],
topK: number = 5
): Promise<SearchResult[]> {
const candidateList = candidates
.map((c, i) => `[${i}] ${c.title}: ${c.content.slice(0, 200)}...`)
.join('\n\n');
const prompt = `Given the search query and candidate documents, rank the top ${topK} most relevant documents. Return only the document numbers in order of relevance, one per line.
Query: "${query}"
Candidates:
${candidateList}
Most relevant document numbers (most relevant first):`;
const response = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
generationConfig: { temperature: 0, maxOutputTokens: 50 }
});
const rankings = response.response.text()
.match(/\d+/g)
?.map(Number)
.filter(n => n >= 0 && n < candidates.length)
.slice(0, topK) || [];
return rankings.map(i => candidates[i]);
}
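Putting the two stages together might look like the sketch below: over-retrieve with the hybrid_search RPC from earlier, then let the LLM narrow the list to five. The function names match the snippets above; error handling is omitted for brevity.
async function retrieveAndRerank(message: string): Promise<SearchResult[]> {
  // Stage 1: cast a wide net (top 20) with fast hybrid retrieval.
  const embedding = await embedQuery(message);
  const { data: candidates } = await supabase.rpc('hybrid_search', {
    query_text: message,
    query_embedding: embedding,
    match_count: 20,
    full_text_weight: 0.3,
    semantic_weight: 0.7
  });
  // Stage 2: a slower, more careful pass picks the best 5.
  return rerankWithLLM(message, candidates ?? [], 5);
}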
Simple Score-Based Re-Ranking
Combine multiple signals without an LLM:
interface ScoredResult extends SearchResult {
semanticScore: number;
recencyScore: number;
popularityScore: number;
combinedScore: number;
}
function rerank(
results: SearchResult[],
weights: { semantic: number; recency: number; popularity: number }
): SearchResult[] {
const now = Date.now();
const scored: ScoredResult[] = results.map(r => {
// Recency: exponential decay over 90 days
const ageMs = now - new Date(r.updatedAt).getTime();
const ageDays = ageMs / (1000 * 60 * 60 * 24);
const recencyScore = Math.exp(-ageDays / 90);
// Popularity: normalize views to 0-1
const popularityScore = Math.min(r.viewCount / 1000, 1);
const combinedScore =
r.similarity * weights.semantic +
recencyScore * weights.recency +
popularityScore * weights.popularity;
return { ...r, semanticScore: r.similarity, recencyScore, popularityScore, combinedScore };
});
return scored.sort((a, b) => b.combinedScore - a.combinedScore);
}
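A usage sketch follows; the weights are arbitrary starting values, so tune them against your own evaluation set (see the next section).
// Assume `candidates` holds SearchResult rows from an earlier retrieval step.
declare const candidates: SearchResult[];
// Favor semantic relevance, with a modest boost for fresh and popular documents.
const reranked = rerank(candidates, { semantic: 0.6, recency: 0.25, popularity: 0.15 });
const topResults = reranked.slice(0, 5);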
Measuring Retrieval Quality
Key Metrics
Precision@K: Of the K documents retrieved, how many are relevant?
function precisionAtK(retrieved: string[], relevant: string[], k: number): number {
const topK = retrieved.slice(0, k);
const relevantInTopK = topK.filter(id => relevant.includes(id)).length;
return relevantInTopK / k;
}
Recall@K: Of all relevant documents, how many did we retrieve in the top K?
function recallAtK(retrieved: string[], relevant: string[], k: number): number {
const topK = retrieved.slice(0, k);
const relevantInTopK = topK.filter(id => relevant.includes(id)).length;
return relevantInTopK / relevant.length;
}
Mean Reciprocal Rank (MRR): How high is the first relevant result?
function mrr(retrieved: string[], relevant: string[]): number {
const firstRelevantIndex = retrieved.findIndex(id => relevant.includes(id));
if (firstRelevantIndex === -1) return 0;
return 1 / (firstRelevantIndex + 1);
}
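A quick worked example makes the three metrics concrete. Suppose five documents are retrieved, two of them are relevant, and the first relevant one sits at rank 2:
const retrieved = ['doc-a', 'doc-b', 'doc-c', 'doc-d', 'doc-e'];
const relevant = ['doc-b', 'doc-e', 'doc-f'];
precisionAtK(retrieved, relevant, 5); // 2/5 = 0.4   (doc-b and doc-e are in the top 5)
recallAtK(retrieved, relevant, 5);    // 2/3 ≈ 0.67  (doc-f was never retrieved)
mrr(retrieved, relevant);             // 1/2 = 0.5   (first relevant result is at rank 2)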
Building a Test Set
Create a dataset of queries with known relevant documents:
interface RetrievalTestCase {
query: string;
relevantDocIds: string[];
}
const testCases: RetrievalTestCase[] = [
{
query: "How do I authenticate API requests?",
relevantDocIds: ["auth-guide", "api-reference-auth", "quick-start"]
},
{
query: "What are the rate limits?",
relevantDocIds: ["rate-limiting", "api-limits"]
}
// ... more test cases
];
async function evaluateRetrieval(
searchFn: (query: string) => Promise<SearchResult[]>
): Promise<{ precision: number; recall: number; mrr: number }> {
let totalPrecision = 0;
let totalRecall = 0;
let totalMrr = 0;
for (const testCase of testCases) {
const results = await searchFn(testCase.query);
const retrievedIds = results.map(r => r.id);
totalPrecision += precisionAtK(retrievedIds, testCase.relevantDocIds, 5);
totalRecall += recallAtK(retrievedIds, testCase.relevantDocIds, 5);
totalMrr += mrr(retrievedIds, testCase.relevantDocIds);
}
return {
precision: totalPrecision / testCases.length,
recall: totalRecall / testCases.length,
mrr: totalMrr / testCases.length
};
}
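To see whether a change actually helps, run the same test set against each retrieval variant and compare the numbers. Here is a sketch of that comparison, reusing the search_docs and hybrid_search RPCs from this lesson:
// Baseline: pure vector search.
const baseline = await evaluateRetrieval(async (query) => {
  const embedding = await embedQuery(query);
  const { data } = await supabase.rpc('search_docs', {
    query_embedding: embedding,
    match_count: 5
  });
  return data ?? [];
});
// Variant: hybrid search with default weights.
const hybrid = await evaluateRetrieval(async (query) => {
  const embedding = await embedQuery(query);
  const { data } = await supabase.rpc('hybrid_search', {
    query_text: query,
    query_embedding: embedding,
    match_count: 5
  });
  return data ?? [];
});
console.log({ baseline, hybrid });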
Summary
In this lesson, we explored techniques to improve retrieval quality:
Key Takeaways:
- Pure vector search has limitations: Exact terms and rare vocabulary may be missed
- Hybrid search combines strengths: Vector similarity + full-text search
- Query expansion broadens reach: Multiple variations capture more relevant docs
- Re-ranking improves precision: Two-stage retrieval with sophisticated second pass
- Measure to improve: Precision, recall, and MRR track retrieval quality
Next Steps
Great retrieval is essential, but conversations require more. In the next lesson, we'll explore Conversational RAG—handling multi-turn conversations where context from previous exchanges matters.
"The best search engine is one that understands not just what you typed, but what you meant." — Unknown

