Module 9: Querying and Filtering
Beyond Basic Similarity Search
Introduction
Real-world applications need more than just "find similar vectors." You need to filter by metadata, combine multiple queries, and handle edge cases.
By the end of this module, you'll understand:
- Advanced querying techniques
- Metadata filtering strategies
- Query optimization
- Handling common edge cases
9.1 Anatomy of a Vector Query
Every vector query has these components:
```typescript
const query = {
  // Required: the query vector
  vector: [0.1, -0.2, 0.3, ...],

  // How many results to return
  topK: 10,

  // Optional: filter by metadata
  filter: { category: 'tutorial' },

  // Optional: include metadata in results
  includeMetadata: true,

  // Optional: minimum similarity threshold
  scoreThreshold: 0.7,

  // Optional: which namespace/collection to search
  namespace: 'production'
}
```
Exact field names vary by database (Pinecone's topK is Qdrant's limit and Chroma's nResults), but the components are the same everywhere.
9.2 Metadata Filtering
Why Filter?
Similarity search alone often isn't enough:
- "Find similar products in electronics category"
- "Find relevant documents from the last 30 days"
- "Find matching users in my country"
Metadata filters narrow the search space.
Pre-filtering vs Post-filtering
Pre-filtering (recommended):
- Apply metadata filters first
- Search only within the filtered subset
- Efficient: reduces the number of vectors to search

Post-filtering:
- Find the top K similar vectors
- Filter the results by metadata
- Problem: might return fewer than K results

Most vector databases use pre-filtering or a hybrid approach. If yours only supports post-filtering, over-fetching can compensate, as the sketch below shows.
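A minimal post-filtering sketch with over-fetching, assuming the same Pinecone-style index.query used throughout this module and a hypothetical matchesFilter helper that evaluates a metadata filter client-side:

```typescript
// Post-filtering: over-fetch, then filter client-side.
// `matchesFilter` is a hypothetical helper, not a library function.
async function postFilteredSearch(
  vector: number[],
  topK: number,
  filter: Record<string, unknown>
): Promise<Result[]> {
  // Over-fetch to leave room for results the filter will discard
  const raw = await index.query({ vector, topK: topK * 5 })

  // Apply the metadata filter after retrieval
  const filtered = raw.matches.filter(m => matchesFilter(m.metadata, filter))

  // May still come up short if the filter is very selective
  return filtered.slice(0, topK)
}
```

The over-fetch factor here is a guess; the more selective the filter, the larger it needs to be, which is exactly why pre-filtering is preferred.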
Filter Syntax Examples
Pinecone:
```typescript
await index.query({
  vector: queryEmbedding,
  topK: 10,
  filter: {
    category: { $eq: 'electronics' },
    price: { $lte: 1000 },
    inStock: { $eq: true }
  }
})

// Logical operators
filter: {
  $and: [
    { category: { $in: ['electronics', 'computers'] } },
    { price: { $gte: 100, $lte: 500 } }
  ]
}

// OR conditions
filter: {
  $or: [
    { category: 'sale' },
    { discount: { $gte: 20 } }
  ]
}
```
pgvector (SQL):
```sql
-- <=> is cosine distance, so 1 - distance gives cosine similarity
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE category = 'electronics'
  AND price <= 1000
  AND in_stock = true
ORDER BY embedding <=> $1
LIMIT 10;
```
Qdrant:
```typescript
await client.search('products', {
  vector: queryEmbedding,
  limit: 10,
  filter: {
    // must: every condition is required (AND)
    must: [
      { key: 'category', match: { value: 'electronics' } },
      { key: 'price', range: { lte: 1000 } }
    ],
    // should: at least one of these should match (OR)
    should: [
      { key: 'featured', match: { value: true } }
    ]
  }
})
```
Chroma:
```typescript
await collection.query({
  queryEmbeddings: [queryEmbedding],
  nResults: 10,
  where: {
    $and: [
      { category: { $eq: 'electronics' } },
      { price: { $lte: 1000 } }
    ]
  }
})
```
9.3 Multi-Vector Queries
Query with Multiple Vectors
Some use cases require querying with multiple vectors:
// Find items similar to any of these
async function multiVectorQuery(
vectors: number[][],
topK: number
): Promise<Result[]> {
const allResults = await Promise.all(
vectors.map(v => index.query({ vector: v, topK }))
)
// Merge and deduplicate
const seen = new Set()
const merged: Result[] = []
for (const results of allResults) {
for (const result of results.matches) {
if (!seen.has(result.id)) {
seen.add(result.id)
merged.push(result)
}
}
}
// Re-rank by average similarity (or other strategy)
return merged.sort((a, b) => b.score - a.score).slice(0, topK)
}
Query Expansion
Improve recall by querying with variations:
```typescript
async function expandedQuery(query: string): Promise<Result[]> {
  // Generate query variations
  const variations = await generateVariations(query)
  // e.g., ["How do I reset password", "password reset help", "forgot password"]

  // Embed all variations in one batch
  const embeddings = await embedBatch(variations)

  // Query with each variation and merge results
  const allResults = await Promise.all(
    embeddings.map(e => index.query({ vector: e, topK: 20 }))
  )

  // Merge, deduplicate, and re-rank down to the final 10
  return mergeResults(allResults, 10)
}
```
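mergeResults is left abstract above; one reasonable implementation (a sketch, not a library function) keeps the best score seen for each id across all result lists:

```typescript
// Merge several result lists, keeping the best score per id.
function mergeResults(
  allResults: { matches: Result[] }[],
  topK: number
): Result[] {
  const best = new Map<string, Result>()
  for (const { matches } of allResults) {
    for (const match of matches) {
      const existing = best.get(match.id)
      // Keep whichever occurrence of this id scored highest
      if (!existing || match.score > existing.score) {
        best.set(match.id, match)
      }
    }
  }
  return [...best.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```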
9.4 Score Thresholds
Why Use Thresholds?
Without thresholds, you always get K results back, even when none of them are relevant.

```typescript
// Problem: always returns 10 results
const results = await index.query({ vector: queryEmbed, topK: 10 })
// Might include { id: 'x', score: 0.15 }, which is not relevant!

// Solution: filter by score
const relevantResults = results.matches.filter(r => r.score >= 0.7)
```
Setting Appropriate Thresholds
Thresholds depend on your embedding model, distance metric, and use case. As a rough guide for cosine similarity:

| Embedding Model | Good Match | Moderate | Poor |
|---|---|---|---|
| OpenAI text-embedding-3 | > 0.8 | 0.6-0.8 | < 0.6 |
| Cohere embed | > 0.7 | 0.5-0.7 | < 0.5 |
| Local models | Varies | Varies | Varies |

Recommendation: test with your own data to find appropriate thresholds.
```typescript
// Analyze the score distribution across a set of test queries
async function analyzeScores(testQueries: string[]) {
  const scores: number[] = []
  for (const query of testQueries) {
    const results = await search(query, 100)
    scores.push(...results.map(r => r.score))
  }

  // Calculate percentiles
  scores.sort((a, b) => a - b)
  console.log('p10:', scores[Math.floor(scores.length * 0.1)])
  console.log('p50:', scores[Math.floor(scores.length * 0.5)])
  console.log('p90:', scores[Math.floor(scores.length * 0.9)])
}
```
9.5 Pagination
Offset-based Pagination
Simple but not ideal for vector search:
```sql
-- Page 2 with 10 results per page
SELECT * FROM documents
ORDER BY embedding <=> $1
LIMIT 10 OFFSET 10;
```
Problem: expensive for large offsets, because the database still has to compute and rank every row before the offset just to throw it away.
Cursor-based Pagination
More efficient for deep pagination:
```typescript
// First query
const page1 = await index.query({
  vector: queryEmbed,
  topK: 10
})
const last = page1.matches[page1.matches.length - 1]

// Next page: filter to results ranked below the cursor.
// Pseudo-syntax: `_score` is not a real filter field in most databases.
const page2 = await index.query({
  vector: queryEmbed,
  topK: 10,
  filter: {
    $or: [
      { _score: { $lt: last.score } },
      { $and: [
        { _score: { $eq: last.score } },
        { id: { $gt: last.id } }
      ]}
    ]
  }
})
```
Note: most vector databases don't expose the similarity score as a filterable field, so check your database's documentation before relying on this pattern.
Fetch More Strategy
Simple and often sufficient:
```typescript
async function paginatedSearch(
  query: number[],
  page: number,
  pageSize: number
): Promise<Result[]> {
  // Fetch all results up to and including the current page
  const total = (page + 1) * pageSize
  const results = await index.query({ vector: query, topK: total })

  // Return only the current page
  const start = page * pageSize
  return results.matches.slice(start, start + pageSize)
}
```
9.6 Handling Edge Cases
Empty Results
```typescript
async function searchWithFallback(query: string): Promise<Result[]> {
  const embedding = await embed(query)
  const results = await index.query({
    vector: embedding,
    topK: 10
  })

  // If no result meets the threshold, pick ONE fallback strategy:
  if (results.matches.every(r => r.score < 0.5)) {
    // Option 1: return empty
    // return []

    // Option 2: return empty plus a warning the UI can display
    // (requires a richer return type than Result[])
    // return { results: [], warning: 'No relevant results found' }

    // Option 3: retry with a broader search (used here)
    return searchBroader(query)
  }

  return results.matches
}
```
Very Long Queries
```typescript
async function handleLongQuery(query: string): Promise<Result[]> {
  // Most embedding models have token limits
  const MAX_TOKENS = 8000

  if (countTokens(query) > MAX_TOKENS) {
    // Pick ONE strategy:

    // Option 1: truncate
    // query = truncateToTokens(query, MAX_TOKENS)

    // Option 2: summarize first
    // query = await summarize(query)

    // Option 3: chunk and multi-query (used here)
    const chunks = chunkText(query, MAX_TOKENS)
    const embeddings = await embedBatch(chunks)
    return multiVectorSearch(embeddings)
  }

  return normalSearch(query)
}
```
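countTokens and truncateToTokens are stand-ins above. If you embed with OpenAI's text-embedding-3 models, a sketch using the js-tiktoken package (an assumption; use whatever tokenizer matches your model) could look like:

```typescript
import { getEncoding } from 'js-tiktoken'

// cl100k_base is the encoding used by OpenAI's text-embedding-3 models
const encoding = getEncoding('cl100k_base')

function countTokens(text: string): number {
  return encoding.encode(text).length
}

function truncateToTokens(text: string, maxTokens: number): string {
  const tokens = encoding.encode(text)
  if (tokens.length <= maxTokens) return text
  // Decode only the first maxTokens tokens back into a string
  return encoding.decode(tokens.slice(0, maxTokens))
}
```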
Duplicate Detection
```typescript
async function searchWithDedup(
  query: number[],
  topK: number
): Promise<Result[]> {
  // Fetch extra to account for duplicates
  const results = await index.query({
    vector: query,
    topK: topK * 2
  })

  // Deduplicate by content similarity
  const unique: Result[] = []
  for (const result of results.matches) {
    const isDuplicate = unique.some(
      u => contentSimilarity(u.metadata.content, result.metadata.content) > 0.95
    )
    if (!isDuplicate) {
      unique.push(result)
    }
    if (unique.length >= topK) break
  }
  return unique
}
```
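contentSimilarity is left undefined above; a cheap stand-in (a sketch, not a library function) is Jaccard overlap on word sets:

```typescript
// Jaccard similarity on lowercased word sets: 1.0 means identical vocabulary.
// Crude but fast; compare embeddings instead if you need higher quality.
function contentSimilarity(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/))
  const wordsB = new Set(b.toLowerCase().split(/\s+/))
  let intersection = 0
  for (const word of wordsA) {
    if (wordsB.has(word)) intersection++
  }
  const union = wordsA.size + wordsB.size - intersection
  return union === 0 ? 0 : intersection / union
}
```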
9.7 Query Optimization
Reduce Dimensions at Query Time
If your embedding model and database support it:

```typescript
// OpenAI's text-embedding-3 models support native dimension reduction
const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: query,
  dimensions: 512 // instead of the default 1536
})
```

Remember that the query vector must match the dimensionality the index was built with.
Batch Queries
```typescript
// Instead of sequential queries...
for (const query of queries) {
  results.push(await search(query)) // slow: one round trip at a time
}

// ...use batch queries if your database supports them
const results = await batchSearch(queries)

// ...or run the queries in parallel
const parallelResults = await Promise.all(
  queries.map(q => search(q))
)
```
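Unbounded Promise.all can overwhelm a rate-limited embedding or search API. A chunked variant (a sketch, reusing the same search helper) bounds concurrency:

```typescript
// Run searches in batches of `concurrency`: each batch runs in
// parallel, batches run one after another.
async function boundedParallelSearch(
  queries: string[],
  concurrency = 5
): Promise<Result[][]> {
  const results: Result[][] = []
  for (let i = 0; i < queries.length; i += concurrency) {
    const batch = queries.slice(i, i + concurrency)
    results.push(...await Promise.all(batch.map(q => search(q))))
  }
  return results
}
```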
Cache Common Queries
```typescript
import { LRUCache } from 'lru-cache'

const queryCache = new LRUCache<string, Result[]>({
  max: 1000,
  ttl: 1000 * 60 * 5 // 5 minutes
})

async function cachedSearch(query: string): Promise<Result[]> {
  // Consider normalizing the key (trim, lowercase) to improve hit rates
  const cached = queryCache.get(query)
  if (cached) return cached

  const results = await search(query)
  queryCache.set(query, results)
  return results
}
```
9.8 Query Patterns by Use Case
RAG (Retrieval-Augmented Generation)
```typescript
async function ragQuery(question: string): Promise<Context[]> {
  const embedding = await embed(question)

  // Higher topK gives the filter below more context options to choose from
  const results = await index.query({
    vector: embedding,
    topK: 10,
    includeMetadata: true
  })

  // Keep only results above the relevance threshold, capped at 5
  const relevant = results.matches
    .filter(r => r.score > 0.6)
    .slice(0, 5)

  return relevant.map(r => ({
    content: r.metadata.content,
    source: r.metadata.source,
    score: r.score
  }))
}
```
Recommendation System
```typescript
async function getRecommendations(
  userId: string,
  excludeIds: string[]
): Promise<Product[]> {
  // The user's preference vector (e.g. an average of liked items)
  const userVector = await getUserPreferenceVector(userId)

  const results = await index.query({
    vector: userVector,
    topK: 50,
    filter: {
      // Note: many databases can't filter on the record ID itself;
      // store the id as a metadata field if you need this
      id: { $nin: excludeIds }, // exclude already-seen items
      inStock: true
    }
  })

  // Diversify so the results aren't near-duplicates of each other
  return diversifyRecommendations(results.matches, 10)
}
```
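getUserPreferenceVector is left abstract; a simple version (a sketch, where getLikedItemVectors is a hypothetical data-access helper) averages the embeddings of items the user liked:

```typescript
// Average the user's liked-item embeddings into one preference vector.
// getLikedItemVectors is a hypothetical helper returning number[][].
async function getUserPreferenceVector(userId: string): Promise<number[]> {
  const vectors = await getLikedItemVectors(userId)
  if (vectors.length === 0) {
    throw new Error(`No liked items for user ${userId}`)
  }
  const dims = vectors[0].length
  const avg = new Array(dims).fill(0)
  for (const v of vectors) {
    for (let i = 0; i < dims; i++) avg[i] += v[i]
  }
  return avg.map(x => x / vectors.length)
}
```

Averaging is scale-sensitive for dot-product indexes but harmless under cosine similarity, which ignores vector length.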
Semantic Search with Facets
```typescript
async function searchWithFacets(
  query: string,
  filters: Record<string, any>
): Promise<{ results: Result[]; facets: Facet[] }> {
  const embedding = await embed(query)

  // Main search with the user's active filters applied
  const results = await index.query({
    vector: embedding,
    topK: 20,
    filter: buildFilter(filters)
  })

  // Facet counts come from separate queries or a database feature
  const facets = await getFacetCounts(embedding, filters)

  return { results: results.matches, facets }
}
```
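buildFilter translates the app's facet selections into the database's filter syntax. A minimal Pinecone-style sketch, where the category and price field names are assumptions:

```typescript
// Turn { category: ['a', 'b'], minPrice: 10, maxPrice: 100 } style
// selections into a Pinecone-style metadata filter.
function buildFilter(filters: Record<string, any>): Record<string, any> {
  const clauses: Record<string, any>[] = []
  if (filters.category?.length) {
    clauses.push({ category: { $in: filters.category } })
  }
  if (filters.minPrice != null || filters.maxPrice != null) {
    const range: Record<string, number> = {}
    if (filters.minPrice != null) range.$gte = filters.minPrice
    if (filters.maxPrice != null) range.$lte = filters.maxPrice
    clauses.push({ price: range })
  }
  // $and requires every clause to hold; a single clause needs no wrapper
  return clauses.length > 1 ? { $and: clauses } : clauses[0] ?? {}
}
```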
Key Takeaways
- Metadata filtering is essential for real applications
- Pre-filtering is more efficient than post-filtering
- Score thresholds prevent returning irrelevant results
- Handle edge cases gracefully
- Optimize queries with caching, batching, and dimension reduction
Exercise: Build a Filtered Search API
Create an API endpoint that supports:
- Text query with semantic search
- Category filter (single or multiple)
- Date range filter
- Minimum score threshold
- Pagination
Test with sample data and verify all filters work correctly.
Next up: Module 10 - Metadata and Hybrid Search

