Improving Retrieval Quality
Introduction
The quality of your RAG system is only as good as its retrieval. If you retrieve irrelevant documents, even the best LLM will produce poor answers. This lesson explores techniques to improve retrieval quality: hybrid search, query expansion, and re-ranking.
These aren't just optimizations—they're often the difference between a useful assistant and a frustrating one.
The Limits of Pure Vector Search
Where Vector Search Falls Short
Vector similarity search is powerful, but it has limitations:
Exact Matches:
Query: "error code 404"
Best semantic match: "Client-side errors and troubleshooting"
Actual need: Document mentioning "404" specifically
Semantic search understands meaning but may miss exact terminology.
Rare Terms:
Query: "configure OIDC provider"
Semantic search: Finds general authentication docs
Missed: The one document that specifically mentions "OIDC"
Rare or technical terms may not be well-represented in embeddings.
Boolean Requirements:
Query: "authentication AND NOT OAuth"
Semantic search: Doesn't understand boolean logic
Users sometimes have specific inclusion/exclusion needs.
Hybrid Search: The Best of Both Worlds
Combining Vector and Full-Text Search
Hybrid search combines:
- Vector similarity: Understands meaning, finds semantically related content
- Full-text search (FTS): Matches exact keywords, handles rare terms
PostgreSQL Full-Text Search
Supabase/PostgreSQL has built-in FTS capabilities:
-- Add a full-text search column
ALTER TABLE documents
ADD COLUMN fts_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
-- Index for fast FTS
CREATE INDEX documents_fts_idx ON documents USING gin(fts_vector);
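Once the column and index are in place, you can sanity-check keyword matching from the application before wiring up hybrid search. This is a minimal sketch using the Supabase client's textSearch helper against the fts_vector column defined above; the documents table and the supabase client are assumed from the earlier setup.
// Keyword-only lookup: finds rows that literally contain "404",
// regardless of how semantically close they are to the query.
const { data: keywordHits, error } = await supabase
  .from('documents')
  .select('id, title')
  .textSearch('fts_vector', 'error code 404', {
    type: 'plain',     // plainto_tsquery under the hood
    config: 'english'
  });
if (error) console.error(error);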
Hybrid Search Function
CREATE OR REPLACE FUNCTION hybrid_search(
query_text TEXT,
query_embedding VECTOR(768),
match_count INT DEFAULT 10,
full_text_weight FLOAT DEFAULT 0.3,
semantic_weight FLOAT DEFAULT 0.7
)
RETURNS TABLE (
id UUID,
content TEXT,
source TEXT,
title TEXT,
score FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
WITH semantic_results AS (
SELECT
d.id,
d.content,
d.source,
d.title,
1 - (d.embedding <=> query_embedding) AS semantic_score,
0::float AS fts_score
FROM documents d
ORDER BY d.embedding <=> query_embedding
LIMIT match_count * 2
),
fts_results AS (
SELECT
d.id,
d.content,
d.source,
d.title,
0::float AS semantic_score,
ts_rank(d.fts_vector, plainto_tsquery('english', query_text)) AS fts_score
FROM documents d
WHERE d.fts_vector @@ plainto_tsquery('english', query_text)
ORDER BY fts_score DESC
LIMIT match_count * 2
),
combined AS (
SELECT
COALESCE(s.id, f.id) AS id,
COALESCE(s.content, f.content) AS content,
COALESCE(s.source, f.source) AS source,
COALESCE(s.title, f.title) AS title,
COALESCE(s.semantic_score, 0) AS semantic_score,
COALESCE(f.fts_score, 0) AS fts_score
FROM semantic_results s
FULL OUTER JOIN fts_results f ON s.id = f.id
)
SELECT
c.id,
c.content,
c.source,
c.title,
(c.semantic_score * semantic_weight + c.fts_score * full_text_weight) AS score
FROM combined c
ORDER BY score DESC
LIMIT match_count;
END;
$$;
Using Hybrid Search
const { data: results } = await supabase.rpc('hybrid_search', {
query_text: message,
query_embedding: embedding,
match_count: 5,
full_text_weight: 0.3,
semantic_weight: 0.7
});
When to Adjust Weights
| Content Type | Semantic Weight | FTS Weight |
|---|---|---|
| Technical docs with jargon | 0.5 | 0.5 |
| Conversational content | 0.8 | 0.2 |
| Code/API references | 0.4 | 0.6 |
| General knowledge | 0.7 | 0.3 |
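One way to act on these weights is to keep a small lookup of presets and pass the chosen pair into the hybrid_search call shown above. The preset names and the searchWithPreset helper below are illustrative, not part of the lesson's codebase; the values mirror the table.
// Hypothetical weight presets (values taken from the table above).
const WEIGHT_PRESETS = {
  technical: { semantic: 0.5, fullText: 0.5 },
  conversational: { semantic: 0.8, fullText: 0.2 },
  code: { semantic: 0.4, fullText: 0.6 },
  general: { semantic: 0.7, fullText: 0.3 }
} as const;
type ContentType = keyof typeof WEIGHT_PRESETS;
async function searchWithPreset(message: string, embedding: number[], contentType: ContentType) {
  const { semantic, fullText } = WEIGHT_PRESETS[contentType];
  return supabase.rpc('hybrid_search', {
    query_text: message,
    query_embedding: embedding,
    match_count: 5,
    full_text_weight: fullText,
    semantic_weight: semantic
  });
}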
Query Expansion
What is Query Expansion?
Query expansion enriches the user's query before retrieval:
Original: "auth setup"
Expanded: "authentication setup, configuration, OAuth, API keys, login"
This helps retrieve documents that use different terminology.
LLM-Based Query Expansion
Use the LLM to generate related terms:
async function expandQuery(query: string): Promise<string[]> {
const prompt = `Given this search query, generate 3-5 related search terms or phrases that might find relevant documentation. Return only the terms, one per line.
Query: "${query}"
Related terms:`;
const response = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
generationConfig: { temperature: 0.3, maxOutputTokens: 100 }
});
const terms = response.response.text()
.split('\n')
.map(t => t.trim())
.filter(t => t.length > 0);
return [query, ...terms];
}
Multi-Query Retrieval
Search with multiple expanded queries:
async function expandedSearch(
originalQuery: string,
matchCount: number = 5
): Promise<SearchResult[]> {
// 1. Expand the query
const queries = await expandQuery(originalQuery);
// 2. Search with each query
const allResults: SearchResult[] = [];
for (const query of queries) {
const embedding = await embedQuery(query);
const { data } = await supabase.rpc('search_docs', {
query_embedding: embedding,
match_count: matchCount
});
if (data) allResults.push(...data);
}
// 3. Deduplicate and rank
const uniqueResults = deduplicateById(allResults);
const rankedResults = rankByFrequencyAndScore(uniqueResults);
return rankedResults.slice(0, matchCount);
}
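The deduplicateById and rankByFrequencyAndScore helpers are used above but not defined in the lesson. Here is one possible sketch, assuming each SearchResult carries an id and a similarity score (the same fields the re-ranking code below relies on): deduplication keeps a hit count so that documents surfaced by several expanded queries rank first, with ties broken by their best similarity.
function deduplicateById(results: SearchResult[]): (SearchResult & { hits: number })[] {
  const seen = new Map<string, SearchResult & { hits: number }>();
  for (const r of results) {
    const existing = seen.get(r.id);
    if (!existing) {
      seen.set(r.id, { ...r, hits: 1 });
    } else {
      existing.hits += 1;
      // Keep the best similarity observed for this document.
      existing.similarity = Math.max(existing.similarity, r.similarity);
    }
  }
  return [...seen.values()];
}
function rankByFrequencyAndScore(results: (SearchResult & { hits: number })[]): SearchResult[] {
  // Documents retrieved by more expanded queries rank first; ties fall back to similarity.
  return [...results].sort((a, b) => b.hits - a.hits || b.similarity - a.similarity);
}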
Re-Ranking
The Re-Ranking Pattern
Re-ranking is a two-stage retrieval process:
- Retrieve: Get many candidates quickly (e.g., the top 20)
- Re-rank: Use a more sophisticated method to select the best
Why Re-Rank?
Initial retrieval is fast but approximate. Re-ranking allows:
- More expensive but accurate scoring
- Cross-comparison between candidates
- Use of additional signals (recency, popularity)
LLM-Based Re-Ranking
Use the LLM to judge relevance:
async function rerankWithLLM(
query: string,
candidates: SearchResult[],
topK: number = 5
): Promise<SearchResult[]> {
const candidateList = candidates
.map((c, i) => `[${i}] ${c.title}: ${c.content.slice(0, 200)}...`)
.join('\n\n');
const prompt = `Given the search query and candidate documents, rank the top ${topK} most relevant documents. Return only the document numbers in order of relevance, one per line.
Query: "${query}"
Candidates:
${candidateList}
Most relevant document numbers (most relevant first):`;
const response = await model.generateContent({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
generationConfig: { temperature: 0, maxOutputTokens: 50 }
});
const rankings = response.response.text()
.match(/\d+/g)
?.map(Number)
.filter(n => n >= 0 && n < candidates.length)
.slice(0, topK) || [];
return rankings.map(i => candidates[i]);
}
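Putting the two stages together might look like the sketch below: over-retrieve with the hybrid_search RPC from earlier, then let the LLM narrow the list to five. The function names match the snippets above; error handling is omitted for brevity.
async function retrieveAndRerank(message: string): Promise<SearchResult[]> {
  // Stage 1: cast a wide net (top 20) with fast hybrid retrieval.
  const embedding = await embedQuery(message);
  const { data: candidates } = await supabase.rpc('hybrid_search', {
    query_text: message,
    query_embedding: embedding,
    match_count: 20,
    full_text_weight: 0.3,
    semantic_weight: 0.7
  });
  // Stage 2: a slower, more careful pass picks the best 5.
  return rerankWithLLM(message, candidates ?? [], 5);
}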
Simple Score-Based Re-Ranking
Combine multiple signals without an LLM:
interface ScoredResult extends SearchResult {
semanticScore: number;
recencyScore: number;
popularityScore: number;
combinedScore: number;
}
function rerank(
results: SearchResult[],
weights: { semantic: number; recency: number; popularity: number }
): SearchResult[] {
const now = Date.now();
const scored: ScoredResult[] = results.map(r => {
// Recency: exponential decay over 90 days
const ageMs = now - new Date(r.updatedAt).getTime();
const ageDays = ageMs / (1000 * 60 * 60 * 24);
const recencyScore = Math.exp(-ageDays / 90);
// Popularity: normalize views to 0-1
const popularityScore = Math.min(r.viewCount / 1000, 1);
const combinedScore =
r.similarity * weights.semantic +
recencyScore * weights.recency +
popularityScore * weights.popularity;
return { ...r, semanticScore: r.similarity, recencyScore, popularityScore, combinedScore };
});
return scored.sort((a, b) => b.combinedScore - a.combinedScore);
}
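A usage sketch follows; the weights are arbitrary starting values, so tune them against your own evaluation set (see the next section).
// Assume `candidates` holds SearchResult rows from an earlier retrieval step.
declare const candidates: SearchResult[];
// Favor semantic relevance, with a modest boost for fresh and popular documents.
const reranked = rerank(candidates, { semantic: 0.6, recency: 0.25, popularity: 0.15 });
const topResults = reranked.slice(0, 5);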
Measuring Retrieval Quality
Key Metrics
Precision@K: Of the K documents retrieved, how many are relevant?
function precisionAtK(retrieved: string[], relevant: string[], k: number): number {
const topK = retrieved.slice(0, k);
const relevantInTopK = topK.filter(id => relevant.includes(id)).length;
return relevantInTopK / k;
}
Recall@K: Of all relevant documents, how many did we retrieve in the top K?
function recallAtK(retrieved: string[], relevant: string[], k: number): number {
const topK = retrieved.slice(0, k);
const relevantInTopK = topK.filter(id => relevant.includes(id)).length;
return relevantInTopK / relevant.length;
}
Mean Reciprocal Rank (MRR): How high is the first relevant result?
function mrr(retrieved: string[], relevant: string[]): number {
const firstRelevantIndex = retrieved.findIndex(id => relevant.includes(id));
if (firstRelevantIndex === -1) return 0;
return 1 / (firstRelevantIndex + 1);
}
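A quick worked example makes the three metrics concrete. Suppose five documents are retrieved, two of them are relevant, and the first relevant one sits at rank 2:
const retrieved = ['doc-a', 'doc-b', 'doc-c', 'doc-d', 'doc-e'];
const relevant = ['doc-b', 'doc-e', 'doc-f'];
precisionAtK(retrieved, relevant, 5); // 2/5 = 0.4   (doc-b and doc-e are in the top 5)
recallAtK(retrieved, relevant, 5);    // 2/3 ≈ 0.67  (doc-f was never retrieved)
mrr(retrieved, relevant);             // 1/2 = 0.5   (first relevant result is at rank 2)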
Building a Test Set
Create a dataset of queries with known relevant documents:
interface RetrievalTestCase {
query: string;
relevantDocIds: string[];
}
const testCases: RetrievalTestCase[] = [
{
query: "How do I authenticate API requests?",
relevantDocIds: ["auth-guide", "api-reference-auth", "quick-start"]
},
{
query: "What are the rate limits?",
relevantDocIds: ["rate-limiting", "api-limits"]
}
// ... more test cases
];
async function evaluateRetrieval(
searchFn: (query: string) => Promise<SearchResult[]>
): Promise<{ precision: number; recall: number; mrr: number }> {
let totalPrecision = 0;
let totalRecall = 0;
let totalMrr = 0;
for (const testCase of testCases) {
const results = await searchFn(testCase.query);
const retrievedIds = results.map(r => r.id);
totalPrecision += precisionAtK(retrievedIds, testCase.relevantDocIds, 5);
totalRecall += recallAtK(retrievedIds, testCase.relevantDocIds, 5);
totalMrr += mrr(retrievedIds, testCase.relevantDocIds);
}
return {
precision: totalPrecision / testCases.length,
recall: totalRecall / testCases.length,
mrr: totalMrr / testCases.length
};
}
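To see whether a change actually helps, run the same test set against each retrieval variant and compare the numbers. Here is a sketch of that comparison, reusing the search_docs and hybrid_search RPCs from this lesson:
// Baseline: pure vector search.
const baseline = await evaluateRetrieval(async (query) => {
  const embedding = await embedQuery(query);
  const { data } = await supabase.rpc('search_docs', {
    query_embedding: embedding,
    match_count: 5
  });
  return data ?? [];
});
// Variant: hybrid search with default weights.
const hybrid = await evaluateRetrieval(async (query) => {
  const embedding = await embedQuery(query);
  const { data } = await supabase.rpc('hybrid_search', {
    query_text: query,
    query_embedding: embedding,
    match_count: 5
  });
  return data ?? [];
});
console.log({ baseline, hybrid });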
Summary
In this lesson, we explored techniques to improve retrieval quality:
Key Takeaways:
- Pure vector search has limitations: Exact terms and rare vocabulary may be missed
- Hybrid search combines strengths: Vector similarity + full-text search
- Query expansion broadens reach: Multiple variations capture more relevant docs
- Re-ranking improves precision: Two-stage retrieval with sophisticated second pass
- Measure to improve: Precision, recall, and MRR track retrieval quality
Next Steps
Great retrieval is essential, but conversations require more. In the next lesson, we'll explore Conversational RAG—handling multi-turn conversations where context from previous exchanges matters.
"The best search engine is one that understands not just what you typed, but what you meant." — Unknown

