Module 9: Querying and Filtering
Beyond Basic Similarity Search
Introduction
Real-world applications need more than just "find similar vectors." You need to filter by metadata, combine multiple queries, and handle edge cases.
By the end of this module, you'll understand:
- Advanced querying techniques
- Metadata filtering strategies
- Query optimization
- Handling common edge cases
9.1 Anatomy of a Vector Query
Every vector query has these components:
```typescript
const query = {
  // Required: the query vector
  vector: [0.1, -0.2, 0.3, ...],

  // How many results to return
  topK: 10,

  // Optional: filter by metadata
  filter: { category: 'tutorial' },

  // Optional: include metadata in results
  includeMetadata: true,

  // Optional: minimum similarity threshold
  scoreThreshold: 0.7,

  // Optional: which namespace/collection to search
  namespace: 'production'
}
```
Exact field names vary by database (Pinecone's topK is Qdrant's limit and Chroma's nResults), but the components are the same everywhere.
9.2 Metadata Filtering
Why Filter?
Similarity search alone often isn't enough:
- "Find similar products in electronics category"
- "Find relevant documents from the last 30 days"
- "Find matching users in my country"
Metadata filters narrow the search space.
Pre-filtering vs Post-filtering
Pre-filtering (recommended):
- Apply metadata filters first
- Search only within the filtered subset
- Efficient: reduces the number of vectors to search

Post-filtering:
- Find the top K similar vectors
- Filter the results by metadata
- Problem: might return fewer than K results

Most vector databases use pre-filtering or a hybrid approach. If yours only supports post-filtering, over-fetching can compensate, as the sketch below shows.
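A minimal post-filtering sketch with over-fetching, assuming the same Pinecone-style index.query used throughout this module and a hypothetical matchesFilter helper that evaluates a metadata filter client-side:

```typescript
// Post-filtering: over-fetch, then filter client-side.
// `matchesFilter` is a hypothetical helper, not a library function.
async function postFilteredSearch(
  vector: number[],
  topK: number,
  filter: Record<string, unknown>
): Promise<Result[]> {
  // Over-fetch to leave room for results the filter will discard
  const raw = await index.query({ vector, topK: topK * 5 })

  // Apply the metadata filter after retrieval
  const filtered = raw.matches.filter(m => matchesFilter(m.metadata, filter))

  // May still come up short if the filter is very selective
  return filtered.slice(0, topK)
}
```

The over-fetch factor here is a guess; the more selective the filter, the larger it needs to be, which is exactly why pre-filtering is preferred.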
Filter Syntax Examples
Pinecone:
```typescript
await index.query({
  vector: queryEmbedding,
  topK: 10,
  filter: {
    category: { $eq: 'electronics' },
    price: { $lte: 1000 },
    inStock: { $eq: true }
  }
})

// Logical operators
filter: {
  $and: [
    { category: { $in: ['electronics', 'computers'] } },
    { price: { $gte: 100, $lte: 500 } }
  ]
}

// OR conditions
filter: {
  $or: [
    { category: 'sale' },
    { discount: { $gte: 20 } }
  ]
}
```
pgvector (SQL):
```sql
-- <=> is cosine distance, so 1 - distance gives cosine similarity
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE category = 'electronics'
  AND price <= 1000
  AND in_stock = true
ORDER BY embedding <=> $1
LIMIT 10;
```
Qdrant:
```typescript
await client.search('products', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    // must: every condition is required (AND)
    must: [
      { key: 'category', match: { value: 'electronics' } },
      { key: 'price', range: { lte: 1000 } }
    ],
    // should: at least one of these should match (OR)
    should: [
      { key: 'featured', match: { value: true } }
    ]
  }
})
```
Chroma:
```typescript
await collection.query({
  queryEmbeddings: [queryEmbedding],
  nResults: 10,
  where: {
    $and: [
      { category: { $eq: 'electronics' } },
      { price: { $lte: 1000 } }
    ]
  }
})
```
9.3 Multi-Vector Queries
Query with Multiple Vectors
Some use cases require querying with multiple vectors:
// Find items similar to any of these
async function multiVectorQuery(
vectors: number[][],
topK: number
): Promise<Result[]> {
const allResults = await Promise.all(
vectors.map(v => index.query({ vector: v, topK }))
)
// Merge and deduplicate
const seen = new Set()
const merged: Result[] = []
for (const results of allResults) {
for (const result of results.matches) {
if (!seen.has(result.id)) {
seen.add(result.id)
merged.push(result)
}
}
}
// Re-rank by average similarity (or other strategy)
return merged.sort((a, b) => b.score - a.score).slice(0, topK)
}
Query Expansion
Improve recall by querying with variations:
```typescript
async function expandedQuery(query: string): Promise<Result[]> {
  // Generate query variations
  const variations = await generateVariations(query)
  // e.g., ["How do I reset password", "password reset help", "forgot password"]

  // Embed all variations in one batch
  const embeddings = await embedBatch(variations)

  // Query with each variation and merge results
  const allResults = await Promise.all(
    embeddings.map(e => index.query({ vector: e, topK: 20 }))
  )

  // Merge, deduplicate, and re-rank down to the final 10
  return mergeResults(allResults, 10)
}
```
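mergeResults is left abstract above; one reasonable implementation (a sketch, not a library function) keeps the best score seen for each id across all result lists:

```typescript
// Merge several result lists, keeping the best score per id.
function mergeResults(
  allResults: { matches: Result[] }[],
  topK: number
): Result[] {
  const best = new Map<string, Result>()
  for (const { matches } of allResults) {
    for (const match of matches) {
      const existing = best.get(match.id)
      // Keep whichever occurrence of this id scored highest
      if (!existing || match.score > existing.score) {
        best.set(match.id, match)
      }
    }
  }
  return [...best.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```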
9.4 Score Thresholds
Why Use Thresholds?
Without thresholds, you always get K results back, even when none of them are relevant.

```typescript
// Problem: always returns 10 results
const results = await index.query({ vector: queryEmbed, topK: 10 })
// Might include { id: 'x', score: 0.15 }, which is not relevant!

// Solution: filter by score
const relevantResults = results.matches.filter(r => r.score >= 0.7)
```
Setting Appropriate Thresholds
Thresholds depend on your embedding model, distance metric, and use case. As a rough guide for cosine similarity:

| Embedding Model | Good Match | Moderate | Poor |
|---|---|---|---|
| OpenAI text-embedding-3 | > 0.8 | 0.6-0.8 | < 0.6 |
| Cohere embed | > 0.7 | 0.5-0.7 | < 0.5 |
| Local models | Varies | Varies | Varies |

Recommendation: test with your own data to find appropriate thresholds.
```typescript
// Analyze the score distribution across a set of test queries
async function analyzeScores(testQueries: string[]) {
  const scores: number[] = []
  for (const query of testQueries) {
    const results = await search(query, 100)
    scores.push(...results.map(r => r.score))
  }

  // Calculate percentiles
  scores.sort((a, b) => a - b)
  console.log('p10:', scores[Math.floor(scores.length * 0.1)])
  console.log('p50:', scores[Math.floor(scores.length * 0.5)])
  console.log('p90:', scores[Math.floor(scores.length * 0.9)])
}
```
9.5 Pagination
Offset-based Pagination
Simple but not ideal for vector search:
```sql
-- Page 2 with 10 results per page
SELECT * FROM documents
ORDER BY embedding <=> $1
LIMIT 10 OFFSET 10;
```
Problem: expensive for large offsets, because the database still has to compute and rank every row before the offset just to throw it away.
Cursor-based Pagination
More efficient for deep pagination:
```typescript
// First query
const page1 = await index.query({
  vector: queryEmbed,
  topK: 10
})
const last = page1.matches[page1.matches.length - 1]

// Next page: filter to results ranked below the cursor.
// Pseudo-syntax: `_score` is not a real filter field in most databases.
const page2 = await index.query({
  vector: queryEmbed,
  topK: 10,
  filter: {
    $or: [
      { _score: { $lt: last.score } },
      { $and: [
        { _score: { $eq: last.score } },
        { id: { $gt: last.id } }
      ]}
    ]
  }
})
```
Note: most vector databases don't expose the similarity score as a filterable field, so check your database's documentation before relying on this pattern.
Fetch More Strategy
Simple and often sufficient:
```typescript
async function paginatedSearch(
  query: number[],
  page: number,
  pageSize: number
): Promise<Result[]> {
  // Fetch all results up to and including the current page
  const total = (page + 1) * pageSize
  const results = await index.query({ vector: query, topK: total })

  // Return only the current page
  const start = page * pageSize
  return results.matches.slice(start, start + pageSize)
}
```
9.6 Handling Edge Cases
Empty Results
```typescript
async function searchWithFallback(query: string): Promise<Result[]> {
  const embedding = await embed(query)
  const results = await index.query({
    vector: embedding,
    topK: 10
  })

  // If no result meets the threshold, pick ONE fallback strategy:
  if (results.matches.every(r => r.score < 0.5)) {
    // Option 1: return empty
    // return []

    // Option 2: return empty plus a warning the UI can display
    // (requires a richer return type than Result[])
    // return { results: [], warning: 'No relevant results found' }

    // Option 3: retry with a broader search (used here)
    return searchBroader(query)
  }

  return results.matches
}
```
Very Long Queries
```typescript
async function handleLongQuery(query: string): Promise<Result[]> {
  // Most embedding models have token limits
  const MAX_TOKENS = 8000

  if (countTokens(query) > MAX_TOKENS) {
    // Pick ONE strategy:

    // Option 1: truncate
    // query = truncateToTokens(query, MAX_TOKENS)

    // Option 2: summarize first
    // query = await summarize(query)

    // Option 3: chunk and multi-query (used here)
    const chunks = chunkText(query, MAX_TOKENS)
    const embeddings = await embedBatch(chunks)
    return multiVectorSearch(embeddings)
  }

  return normalSearch(query)
}
```
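countTokens and truncateToTokens are stand-ins above. If you embed with OpenAI's text-embedding-3 models, a sketch using the js-tiktoken package (an assumption; use whatever tokenizer matches your model) could look like:

```typescript
import { getEncoding } from 'js-tiktoken'

// cl100k_base is the encoding used by OpenAI's text-embedding-3 models
const encoding = getEncoding('cl100k_base')

function countTokens(text: string): number {
  return encoding.encode(text).length
}

function truncateToTokens(text: string, maxTokens: number): string {
  const tokens = encoding.encode(text)
  if (tokens.length <= maxTokens) return text
  // Decode only the first maxTokens tokens back into a string
  return encoding.decode(tokens.slice(0, maxTokens))
}
```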
Duplicate Detection
```typescript
async function searchWithDedup(
  query: number[],
  topK: number
): Promise<Result[]> {
  // Fetch extra to account for duplicates
  const results = await index.query({
    vector: query,
    topK: topK * 2
  })

  // Deduplicate by content similarity
  const unique: Result[] = []
  for (const result of results.matches) {
    const isDuplicate = unique.some(
      u => contentSimilarity(u.metadata.content, result.metadata.content) > 0.95
    )
    if (!isDuplicate) {
      unique.push(result)
    }
    if (unique.length >= topK) break
  }
  return unique
}
```
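contentSimilarity is left undefined above; a cheap stand-in (a sketch, not a library function) is Jaccard overlap on word sets:

```typescript
// Jaccard similarity on lowercased word sets: 1.0 means identical vocabulary.
// Crude but fast; compare embeddings instead if you need higher quality.
function contentSimilarity(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/))
  const wordsB = new Set(b.toLowerCase().split(/\s+/))
  let intersection = 0
  for (const word of wordsA) {
    if (wordsB.has(word)) intersection++
  }
  const union = wordsA.size + wordsB.size - intersection
  return union === 0 ? 0 : intersection / union
}
```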
9.7 Query Optimization
Reduce Dimensions at Query Time
If your embedding model and database support it:

```typescript
// OpenAI's text-embedding-3 models support native dimension reduction
const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: query,
  dimensions: 512 // instead of the default 1536
})
```

Remember that the query vector must match the dimensionality the index was built with.
Batch Queries
```typescript
// Instead of sequential queries...
for (const query of queries) {
  results.push(await search(query)) // slow: one round trip at a time
}

// ...use batch queries if your database supports them
const results = await batchSearch(queries)

// ...or run the queries in parallel
const parallelResults = await Promise.all(
  queries.map(q => search(q))
)
```
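Unbounded Promise.all can overwhelm a rate-limited embedding or search API. A chunked variant (a sketch, reusing the same search helper) bounds concurrency:

```typescript
// Run searches in batches of `concurrency`: each batch runs in
// parallel, batches run one after another.
async function boundedParallelSearch(
  queries: string[],
  concurrency = 5
): Promise<Result[][]> {
  const results: Result[][] = []
  for (let i = 0; i < queries.length; i += concurrency) {
    const batch = queries.slice(i, i + concurrency)
    results.push(...await Promise.all(batch.map(q => search(q))))
  }
  return results
}
```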
Cache Common Queries
```typescript
import { LRUCache } from 'lru-cache'

const queryCache = new LRUCache<string, Result[]>({
  max: 1000,
  ttl: 1000 * 60 * 5 // 5 minutes
})

async function cachedSearch(query: string): Promise<Result[]> {
  // Consider normalizing the key (trim, lowercase) to improve hit rates
  const cached = queryCache.get(query)
  if (cached) return cached

  const results = await search(query)
  queryCache.set(query, results)
  return results
}
```
9.8 Query Patterns by Use Case
RAG (Retrieval-Augmented Generation)
```typescript
async function ragQuery(question: string): Promise<Context[]> {
  const embedding = await embed(question)

  // Higher topK gives the filter below more context options to choose from
  const results = await index.query({
    vector: embedding,
    topK: 10,
    includeMetadata: true
  })

  // Keep only results above the relevance threshold, capped at 5
  const relevant = results.matches
    .filter(r => r.score > 0.6)
    .slice(0, 5)

  return relevant.map(r => ({
    content: r.metadata.content,
    source: r.metadata.source,
    score: r.score
  }))
}
```
Recommendation System
```typescript
async function getRecommendations(
  userId: string,
  excludeIds: string[]
): Promise<Product[]> {
  // The user's preference vector (e.g. an average of liked items)
  const userVector = await getUserPreferenceVector(userId)

  const results = await index.query({
    vector: userVector,
    topK: 50,
    filter: {
      // Note: many databases can't filter on the record ID itself;
      // store the id as a metadata field if you need this
      id: { $nin: excludeIds }, // exclude already-seen items
      inStock: true
    }
  })

  // Diversify so the results aren't near-duplicates of each other
  return diversifyRecommendations(results.matches, 10)
}
```
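getUserPreferenceVector is left abstract; a simple version (a sketch, where getLikedItemVectors is a hypothetical data-access helper) averages the embeddings of items the user liked:

```typescript
// Average the user's liked-item embeddings into one preference vector.
// getLikedItemVectors is a hypothetical helper returning number[][].
async function getUserPreferenceVector(userId: string): Promise<number[]> {
  const vectors = await getLikedItemVectors(userId)
  if (vectors.length === 0) {
    throw new Error(`No liked items for user ${userId}`)
  }
  const dims = vectors[0].length
  const avg = new Array(dims).fill(0)
  for (const v of vectors) {
    for (let i = 0; i < dims; i++) avg[i] += v[i]
  }
  return avg.map(x => x / vectors.length)
}
```

Averaging is scale-sensitive for dot-product indexes but harmless under cosine similarity, which ignores vector length.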
Semantic Search with Facets
```typescript
async function searchWithFacets(
  query: string,
  filters: Record<string, any>
): Promise<{ results: Result[]; facets: Facet[] }> {
  const embedding = await embed(query)

  // Main search with the user's active filters applied
  const results = await index.query({
    vector: embedding,
    topK: 20,
    filter: buildFilter(filters)
  })

  // Facet counts come from separate queries or a database feature
  const facets = await getFacetCounts(embedding, filters)

  return { results: results.matches, facets }
}
```
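buildFilter translates the app's facet selections into the database's filter syntax. A minimal Pinecone-style sketch, where the category and price field names are assumptions:

```typescript
// Turn { category: ['a', 'b'], minPrice: 10, maxPrice: 100 } style
// selections into a Pinecone-style metadata filter.
function buildFilter(filters: Record<string, any>): Record<string, any> {
  const clauses: Record<string, any>[] = []
  if (filters.category?.length) {
    clauses.push({ category: { $in: filters.category } })
  }
  if (filters.minPrice != null || filters.maxPrice != null) {
    const range: Record<string, number> = {}
    if (filters.minPrice != null) range.$gte = filters.minPrice
    if (filters.maxPrice != null) range.$lte = filters.maxPrice
    clauses.push({ price: range })
  }
  // $and requires every clause to hold; a single clause needs no wrapper
  return clauses.length > 1 ? { $and: clauses } : clauses[0] ?? {}
}
```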
Key Takeaways
- Metadata filtering is essential for real applications
- Pre-filtering is more efficient than post-filtering
- Score thresholds prevent returning irrelevant results
- Handle edge cases gracefully
- Optimize queries with caching, batching, and dimension reduction
Exercise: Build a Filtered Search API
Create an API endpoint that supports:
- Text query with semantic search
- Category filter (single or multiple)
- Date range filter
- Minimum score threshold
- Pagination
Test with sample data and verify all filters work correctly.
Next up: Module 10 - Metadata and Hybrid Search

