Lesson 6.3: Hybrid Queries (Semantic + Structured)
The Power of Hybrid Search
Semantic search alone (imprecise):
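-- Find the 10 nearest documents to query_embedding (the query's vector);
-- pgvector's <=> is cosine distance, so lower = more similar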
SELECT content FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 10;
-- Problem: can return documents the user doesn't have access to,
-- or ones from the wrong department or with outdated content
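To make that ordering concrete, here is a minimal pure-Python sketch of what `ORDER BY embedding <=> query_embedding LIMIT k` computes. It assumes `<=>` is pgvector's cosine distance (1 minus cosine similarity). The `docs` rows and the `semantic_top_k` helper are hypothetical stand-ins for the `documents` table. Nothing in the ranking looks at who owns a document, which is exactly the problem described above.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Same meaning as pgvector's <=>: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def semantic_top_k(docs: list[dict], query_embedding: list[float], k: int) -> list[dict]:
    # Equivalent to: ORDER BY embedding <=> query_embedding LIMIT k
    # Only vector distance is used; ownership and department are ignored.
    return sorted(docs, key=lambda d: cosine_distance(d["embedding"], query_embedding))[:k]

# Hypothetical rows: the closest match belongs to a different user (456)
docs = [
    {"id": 1, "user_id": 123, "embedding": [1.0, 0.0]},
    {"id": 2, "user_id": 456, "embedding": [0.9, 0.1]},
    {"id": 3, "user_id": 123, "embedding": [0.0, 1.0]},
]
top = semantic_top_k(docs, [0.9, 0.1], k=2)
print([d["id"] for d in top])
```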
SQL filters alone (brittle):
-- Keyword search
SELECT content FROM documents
WHERE content ILIKE '%machine learning%'
AND department = 'ai';
-- Problem: Misses synonyms, related concepts
-- "neural networks", "deep learning", "ML models" won't match
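The brittleness is easy to reproduce. This sketch approximates `ILIKE '%machine learning%'` with a case-insensitive substring test. The `ilike_contains` helper and the sample titles are hypothetical, and the helper ignores ILIKE's `_` wildcard. Two clearly relevant documents are missed because they never contain the literal phrase.

```python
def ilike_contains(text: str, pattern: str) -> bool:
    # Rough equivalent of: text ILIKE '%pattern%' (case-insensitive substring test)
    return pattern.lower() in text.lower()

# Hypothetical documents from the 'ai' department
docs = [
    "An introduction to Machine Learning pipelines",
    "Training deep neural networks on GPUs",
    "Evaluating ML models in production",
]
matches = [d for d in docs if ilike_contains(d, "machine learning")]
print(matches)
```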
Hybrid (best of both):
-- query_embedding: placeholder for the query's vector (e.g. bound as $1::vector)
SELECT
content,
title,
department,
1 - (embedding <=> query_embedding) AS semantic_score,  -- cosine similarity
ts_rank(search_vector, query, 32) AS keyword_score,     -- flag 32 scales rank into [0, 1)
-- Combined score: weighted sum of both signals, higher is better
0.7 * (1 - (embedding <=> query_embedding)) +
0.3 * ts_rank(search_vector, query, 32) AS combined_score
FROM documents,
to_tsquery('machine | learning') AS query
WHERE
-- Structured filters (fast, precise)
user_id = 123
AND department IN ('ai', 'engineering')
AND created_at > NOW() - INTERVAL '1 year'
-- Relevance: semantically close (recalls related content) OR an exact keyword match;
-- requiring both would drop synonym-only hits such as "neural networks"
AND (
(embedding <=> query_embedding) < 0.8
OR search_vector @@ query
)
ORDER BY combined_score DESC
LIMIT 20;
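The scoring logic of the hybrid query can be sketched in plain Python to show how the pieces interact. This sketch makes several assumptions:

- Semantic similarity is taken as 1 minus cosine distance.
- The keyword score is squashed into [0, 1) as rank / (rank + 1), which is what `ts_rank`'s normalization flag 32 does.
- `keyword_score` is a hypothetical toy that counts term occurrences rather than reproducing `ts_rank`.
- `hybrid_search` and its parameters are illustrative only.
- The `created_at` filter is omitted for brevity.

Document 2 survives with no keyword hit because it is semantically close.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Same meaning as pgvector's <=>: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def keyword_score(text: str, terms: list[str]) -> float:
    # Hypothetical stand-in for ts_rank(..., 32): the raw rank is just the number
    # of matching term occurrences, scaled into [0, 1) as rank / (rank + 1)
    words = text.lower().split()
    raw = sum(words.count(t) for t in terms)
    return raw / (raw + 1)

def hybrid_search(docs, query_embedding, terms, user_id, departments, k,
                  w_semantic=0.7, w_keyword=0.3, max_distance=0.8):
    results = []
    for d in docs:
        # Structured filters: ownership and department
        if d["user_id"] != user_id or d["department"] not in departments:
            continue
        dist = cosine_distance(d["embedding"], query_embedding)
        kw = keyword_score(d["content"], terms)
        # Relevance gate: semantically close OR at least one keyword hit
        if dist >= max_distance and kw == 0:
            continue
        # Weighted sum of similarity and keyword score; higher is better
        score = w_semantic * (1 - dist) + w_keyword * kw
        results.append((score, d["id"]))
    results.sort(reverse=True)
    return [doc_id for _, doc_id in results[:k]]

docs = [
    {"id": 1, "user_id": 123, "department": "ai",
     "content": "machine learning basics", "embedding": [1.0, 0.0]},
    {"id": 2, "user_id": 123, "department": "engineering",
     "content": "training neural networks", "embedding": [0.9, 0.1]},
    {"id": 3, "user_id": 456, "department": "ai",
     "content": "machine learning at scale", "embedding": [1.0, 0.0]},
    {"id": 4, "user_id": 123, "department": "ai",
     "content": "office party schedule", "embedding": [0.0, 1.0]},
]
ranked = hybrid_search(docs, [1.0, 0.0], ["machine", "learning"],
                       user_id=123, departments={"ai", "engineering"}, k=20)
print(ranked)
```

The weights (0.7/0.3) only mean something once both scores share a scale. Without the normalization, an unbounded keyword rank could swamp the semantic term.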
Pre-Filtering Strategy
Performance optimization:
-- Step 1: Filter with SQL (fast when user_id, department, created_at have B-tree indexes)
-- MATERIALIZED (PostgreSQL 12+) stops the planner from inlining the CTE, so it runs first
WITH filtered_docs AS MATERIALIZED (
SELECT id, content, title
FROM documents
WHERE user_id = 123
AND department = 'engineering'
AND created_at > NOW() - INTERVAL '6 months'
-- e.g. narrows 1M rows down to ~10k candidates
)
-- Step 2: Vector search on the subset only (embeddings live in a separate table here)
SELECT
d.content,
e.embedding <=> query_embedding AS distance  -- cosine distance: lower = more similar
FROM filtered_docs d
JOIN embeddings e ON d.id = e.document_id
WHERE (e.embedding <=> query_embedding) < 0.7
ORDER BY distance
LIMIT 10;
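The two-step pattern can be sketched in Python to show why it saves work: distances are computed only for rows that survive the structured filter. The data is hypothetical: 1,000 documents, of which only every 100th belongs to user 123. The helper `prefiltered_search` is illustrative, and the time filter is left out for brevity. The second return value counts how many distance computations were needed.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Same meaning as pgvector's <=>: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def prefiltered_search(documents, embeddings, query_embedding, user_id, department,
                       k, max_distance=0.7):
    # Step 1: cheap structured filter (the part B-tree indexes accelerate in SQL)
    candidate_ids = [d["id"] for d in documents
                     if d["user_id"] == user_id and d["department"] == department]
    # Step 2: compute vector distances only for the surviving candidates
    scored = []
    for doc_id in candidate_ids:
        dist = cosine_distance(embeddings[doc_id], query_embedding)
        if dist < max_distance:
            scored.append((dist, doc_id))
    scored.sort()  # ascending distance = most similar first
    return [doc_id for _, doc_id in scored[:k]], len(candidate_ids)

# Hypothetical data: 1,000 documents, only every 100th belongs to user 123
documents = [{"id": i, "user_id": 123 if i % 100 == 0 else 999,
              "department": "engineering"} for i in range(1000)]
embeddings = {i: [1.0, i / 1000] for i in range(1000)}

top_ids, comparisons = prefiltered_search(documents, embeddings, [1.0, 0.0],
                                          user_id=123, department="engineering", k=3)
print(top_ids, comparisons)
```

Only 10 of the 1,000 embeddings are compared. The trade-off is that an approximate vector index (HNSW, IVFFlat) is not used on the subset, which is acceptable when the filter is selective.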
Key Takeaways
- Semantic search alone is imprecise - it can return irrelevant or unauthorized results
- SQL filters alone are brittle - miss synonyms and related concepts
- Hybrid search combines semantic similarity with structured filters for best results
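- Pre-filtering with indexed SQL predicates improves performance by shrinking the set of rows the vector comparison has to scan
- Combined scoring balances semantic and keyword relevance; put both scores on comparable scales (e.g. 1 - cosine distance and ts_rank with normalization 32) before weighting them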
- Full-text search (ts_rank) complements vector search for exact keyword matches
- Production systems should combine semantic search with SQL filters, at minimum to enforce access control

