Supabase Search Functionality (Postgres RPC)

Introduction

We have our documents chunked and stored with embeddings. Now we need to search them. This lesson focuses on building the search functionality in Supabase using Postgres Remote Procedure Calls (RPC).

By encapsulating our search logic in a database function, we gain clean abstractions, better performance, and the ability to add sophisticated filtering without changing client code.

Why Use RPC Functions?

The Alternative: Direct Queries

You could query the database directly from your application:

// Direct query approach
const { data } = await supabase
  .from('documents')
  .select('*')
  .order('embedding', { ascending: true })
  .limit(5);

But this has limitations:

pgvector operators (<=>, <->) aren't easily expressed in the Supabase client
Complex filtering logic clutters your application code
Changes require application deployments, not just database updates

The RPC Approach

Encapsulate search logic in a Postgres function:

// RPC approach
const { data } = await supabase.rpc('search_docs', {
  query_embedding: embedding,
  match_count: 5
});

Benefits:

Clean separation of concerns
Full access to Postgres/pgvector features
Atomic updates (change the function, all calls get the update)
Better performance (less data transfer, database-level optimization)

Building the Search Function

Basic Vector Search Function

Let's start with a fundamental search function:

CREATE OR REPLACE FUNCTION search_docs(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Understanding the Function

Function Signature:

CREATE OR REPLACE FUNCTION search_docs(
  query_embedding VECTOR(768),  -- Input: the query vector
  match_count INT DEFAULT 5     -- How many results to return
)

Return Type:

RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)

Returns a table structure with the fields needed for our application.

The Core Query:

SELECT
  d.id,
  d.content,
  d.source,
  d.title,
  1 - (d.embedding <=> query_embedding) AS similarity
FROM documents d
ORDER BY d.embedding <=> query_embedding
LIMIT match_count;

Key Points:

<=> is pgvector's cosine distance operator
Distance ranges from 0 (identical) to 2 (opposite)
We convert to similarity with 1 - distance
Results are ordered by distance (ascending = most similar first)

pgvector Distance Operators

pgvector provides three distance operators:

Operator	Metric	Range	Use Case
`<=>`	Cosine distance	0 to 2	Semantic similarity (most common)
`<->`	Euclidean distance (L2)	0 to ∞	When magnitude matters
`<#>`	Inner product	-∞ to ∞	Pre-normalized vectors

For RAG applications, cosine distance (<=>) is almost always the right choice because it focuses on direction (meaning) rather than magnitude (text length).

Adding Filtering Capabilities

Filter by Source

Often you want to search within specific documents:

CREATE OR REPLACE FUNCTION search_docs_filtered(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5,
  filter_source TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE
    filter_source IS NULL OR d.source = filter_source
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Usage:

// Search all documents
await supabase.rpc('search_docs_filtered', {
  query_embedding: embedding,
  match_count: 5
});

// Search only in getting-started.md
await supabase.rpc('search_docs_filtered', {
  query_embedding: embedding,
  match_count: 5,
  filter_source: 'getting-started.md'
});

Similarity Threshold

Only return results above a minimum similarity:

CREATE OR REPLACE FUNCTION search_docs_threshold(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5,
  similarity_threshold FLOAT DEFAULT 0.5
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE
    1 - (d.embedding <=> query_embedding) >= similarity_threshold
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Why use a threshold?

Low-similarity results may be irrelevant and could confuse the LLM. A threshold of 0.5-0.7 typically filters out noise.

Multi-Tenant Search

For applications where users have their own documents:

CREATE OR REPLACE FUNCTION search_user_docs(
  query_embedding VECTOR(768),
  user_id_param UUID,
  match_count INT DEFAULT 5
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
SECURITY DEFINER  -- Runs with function owner's permissions
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE d.user_id = user_id_param
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Security Note:

The SECURITY DEFINER modifier means the function runs with the permissions of the user who created it (typically the database owner), not the calling user. This bypasses RLS for the function's internal queries.

However, we explicitly filter by user_id, ensuring users only get their own documents. This pattern is useful when:

RLS would add overhead to vector operations
You need precise control over filtering logic
You want to combine security filtering with performance optimization

The Complete Production Function

Here's a comprehensive search function suitable for production:

CREATE OR REPLACE FUNCTION search_docs(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5,
  filter_source TEXT DEFAULT NULL,
  similarity_threshold FLOAT DEFAULT 0.0
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  chunk_index INT,
  similarity FLOAT
)
LANGUAGE plpgsql
STABLE  -- Optimization hint: function doesn't modify data
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    d.chunk_index,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE
    (filter_source IS NULL OR d.source = filter_source)
    AND (1 - (d.embedding <=> query_embedding)) >= similarity_threshold
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Performance Optimizations

STABLE modifier: Tells Postgres the function doesn't modify data and returns the same result for the same inputs within a transaction. Enables query optimization.

Index utilization: The ORDER BY d.embedding <=> query_embedding clause uses our vector index for fast approximate nearest neighbor search.

Limit before filter (consideration): The WHERE clause filters after ordering, which may be suboptimal for highly selective filters. For better performance with filters, consider:

-- More efficient for selective filters
RETURN QUERY
SELECT * FROM (
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    d.chunk_index,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE filter_source IS NULL OR d.source = filter_source
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count * 2  -- Fetch extra to account for threshold filtering
) sub
WHERE sub.similarity >= similarity_threshold
LIMIT match_count;

Calling the RPC from Next.js

Basic Usage

// lib/search.ts
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

interface SearchResult {
  id: string;
  content: string;
  source: string;
  title: string;
  chunk_index: number;
  similarity: number;
}

export async function searchDocuments(
  queryEmbedding: number[],
  options: {
    matchCount?: number;
    filterSource?: string;
    similarityThreshold?: number;
  } = {}
): Promise<SearchResult[]> {
  const {
    matchCount = 5,
    filterSource = null,
    similarityThreshold = 0.5
  } = options;

  const { data, error } = await supabase.rpc('search_docs', {
    query_embedding: queryEmbedding,
    match_count: matchCount,
    filter_source: filterSource,
    similarity_threshold: similarityThreshold
  });

  if (error) {
    throw new Error(`Search failed: ${error.message}`);
  }

  return data as SearchResult[];
}

Integration with the RAG Pipeline

// app/api/chat/route.ts
export async function POST(request: Request) {
  const { message } = await request.json();

  // 1. Generate query embedding
  const queryEmbedding = await embedQuery(message);

  // 2. Search for relevant documents
  const relevantDocs = await searchDocuments(queryEmbedding, {
    matchCount: 5,
    similarityThreshold: 0.6
  });

  // 3. Check if we found relevant context
  if (relevantDocs.length === 0) {
    return Response.json({
      answer: "I couldn't find relevant information to answer your question.",
      sources: []
    });
  }

  // 4. Build context from results
  const context = relevantDocs
    .map(doc => `[Source: ${doc.source}]\n${doc.content}`)
    .join('\n\n---\n\n');

  // 5. Generate response (covered in Module 3)
  const response = await generateResponse(context, message);

  return Response.json({
    answer: response,
    sources: relevantDocs.map(d => ({ source: d.source, title: d.title }))
  });
}

Testing the Search Function

Manual Testing in Supabase

You can test the RPC function directly in the Supabase SQL editor:

-- First, get an embedding for a test query
-- (You'd normally do this via the Gemini API)

-- For testing, use an embedding from an existing document
SELECT embedding
FROM documents
WHERE source = 'getting-started.md'
LIMIT 1;

-- Then search with that embedding
SELECT * FROM search_docs(
  '[0.023, -0.145, ...]'::vector(768),  -- paste embedding here
  5,      -- match_count
  NULL,   -- filter_source
  0.5     -- similarity_threshold
);

Automated Testing

// tests/search.test.ts
describe('search_docs RPC', () => {
  it('returns relevant documents for valid embedding', async () => {
    // Create a test document with known embedding
    const testEmbedding = await generateEmbedding('test authentication setup');

    await supabase.from('documents').insert({
      content: 'Authentication guide for setting up user login',
      embedding: testEmbedding,
      source: 'test-doc.md',
      title: 'Auth Guide'
    });

    // Search with similar query
    const queryEmbedding = await generateEmbedding('how to setup auth');
    const results = await searchDocuments(queryEmbedding);

    expect(results.length).toBeGreaterThan(0);
    expect(results[0].source).toBe('test-doc.md');
    expect(results[0].similarity).toBeGreaterThan(0.7);
  });

  it('respects similarity threshold', async () => {
    const queryEmbedding = await generateEmbedding('completely unrelated query xyz');

    const results = await searchDocuments(queryEmbedding, {
      similarityThreshold: 0.9  // Very high threshold
    });

    expect(results.length).toBe(0);  // Nothing should match
  });

  it('filters by source', async () => {
    const queryEmbedding = await generateEmbedding('authentication');

    const results = await searchDocuments(queryEmbedding, {
      filterSource: 'specific-doc.md'
    });

    results.forEach(r => {
      expect(r.source).toBe('specific-doc.md');
    });
  });
});

Summary

In this lesson, we built the search functionality that powers our RAG system:

Key Takeaways:

RPC functions encapsulate search logic: Cleaner code, better performance, atomic updates
pgvector's <=> operator enables cosine similarity: Convert distance to similarity with 1 - distance
Filtering improves relevance: Source filters and similarity thresholds reduce noise
Multi-tenant search requires explicit user filtering: Use SECURITY DEFINER with care
Test your search thoroughly: Verify results, thresholds, and filters work as expected

Module 2 Complete

You've completed Module 2: The Indexing Phase. You now understand:

How to prepare and chunk documents effectively
How to generate and store embeddings in Supabase
How to build search functionality with RPC functions

In Module 3, we'll explore the RAG Core—retrieval science, prompt engineering, and the generation phase where we call Gemini to produce grounded responses.

"A good search is the beginning of a good conversation." — Unknown

Supabase Search Functionality (Postgres RPC)

Introduction

By encapsulating our search logic in a database function, we gain clean abstractions, better performance, and the ability to add sophisticated filtering without changing client code.

Why Use RPC Functions?

The Alternative: Direct Queries

You could query the database directly from your application:

// Direct query approach
const { data } = await supabase
  .from('documents')
  .select('*')
  .order('embedding', { ascending: true })
  .limit(5);

But this has limitations:

pgvector operators (<=>, <->) aren't easily expressed in the Supabase client
Complex filtering logic clutters your application code
Changes require application deployments, not just database updates

The RPC Approach

Encapsulate search logic in a Postgres function:

// RPC approach
const { data } = await supabase.rpc('search_docs', {
  query_embedding: embedding,
  match_count: 5
});

Benefits:

Clean separation of concerns
Full access to Postgres/pgvector features
Atomic updates (change the function, all calls get the update)
Better performance (less data transfer, database-level optimization)

Building the Search Function

Basic Vector Search Function

Let's start with a fundamental search function:

CREATE OR REPLACE FUNCTION search_docs(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Understanding the Function

Function Signature:

CREATE OR REPLACE FUNCTION search_docs(
  query_embedding VECTOR(768),  -- Input: the query vector
  match_count INT DEFAULT 5     -- How many results to return
)

Return Type:

RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)

Returns a table structure with the fields needed for our application.

The Core Query:

SELECT
  d.id,
  d.content,
  d.source,
  d.title,
  1 - (d.embedding <=> query_embedding) AS similarity
FROM documents d
ORDER BY d.embedding <=> query_embedding
LIMIT match_count;

Key Points:

<=> is pgvector's cosine distance operator
Distance ranges from 0 (identical) to 2 (opposite)
We convert to similarity with 1 - distance
Results are ordered by distance (ascending = most similar first)

pgvector Distance Operators

pgvector provides three distance operators:

Operator	Metric	Range	Use Case
`<=>`	Cosine distance	0 to 2	Semantic similarity (most common)
`<->`	Euclidean distance (L2)	0 to ∞	When magnitude matters
`<#>`	Inner product	-∞ to ∞	Pre-normalized vectors

For RAG applications, cosine distance (<=>) is almost always the right choice because it focuses on direction (meaning) rather than magnitude (text length).

Adding Filtering Capabilities

Filter by Source

Often you want to search within specific documents:

CREATE OR REPLACE FUNCTION search_docs_filtered(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5,
  filter_source TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE
    filter_source IS NULL OR d.source = filter_source
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Usage:

// Search all documents
await supabase.rpc('search_docs_filtered', {
  query_embedding: embedding,
  match_count: 5
});

// Search only in getting-started.md
await supabase.rpc('search_docs_filtered', {
  query_embedding: embedding,
  match_count: 5,
  filter_source: 'getting-started.md'
});

Similarity Threshold

Only return results above a minimum similarity:

CREATE OR REPLACE FUNCTION search_docs_threshold(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5,
  similarity_threshold FLOAT DEFAULT 0.5
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE
    1 - (d.embedding <=> query_embedding) >= similarity_threshold
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Why use a threshold?

Low-similarity results may be irrelevant and could confuse the LLM. A threshold of 0.5-0.7 typically filters out noise.

Multi-Tenant Search

For applications where users have their own documents:

CREATE OR REPLACE FUNCTION search_user_docs(
  query_embedding VECTOR(768),
  user_id_param UUID,
  match_count INT DEFAULT 5
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
SECURITY DEFINER  -- Runs with function owner's permissions
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE d.user_id = user_id_param
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Security Note:

However, we explicitly filter by user_id, ensuring users only get their own documents. This pattern is useful when:

RLS would add overhead to vector operations
You need precise control over filtering logic
You want to combine security filtering with performance optimization

The Complete Production Function

Here's a comprehensive search function suitable for production:

CREATE OR REPLACE FUNCTION search_docs(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5,
  filter_source TEXT DEFAULT NULL,
  similarity_threshold FLOAT DEFAULT 0.0
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  chunk_index INT,
  similarity FLOAT
)
LANGUAGE plpgsql
STABLE  -- Optimization hint: function doesn't modify data
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    d.chunk_index,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE
    (filter_source IS NULL OR d.source = filter_source)
    AND (1 - (d.embedding <=> query_embedding)) >= similarity_threshold
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Performance Optimizations

STABLE modifier: Tells Postgres the function doesn't modify data and returns the same result for the same inputs within a transaction. Enables query optimization.

Index utilization: The ORDER BY d.embedding <=> query_embedding clause uses our vector index for fast approximate nearest neighbor search.

Limit before filter (consideration): The WHERE clause filters after ordering, which may be suboptimal for highly selective filters. For better performance with filters, consider:

-- More efficient for selective filters
RETURN QUERY
SELECT * FROM (
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    d.chunk_index,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE filter_source IS NULL OR d.source = filter_source
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count * 2  -- Fetch extra to account for threshold filtering
) sub
WHERE sub.similarity >= similarity_threshold
LIMIT match_count;

Calling the RPC from Next.js

Basic Usage

// lib/search.ts
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

interface SearchResult {
  id: string;
  content: string;
  source: string;
  title: string;
  chunk_index: number;
  similarity: number;
}

export async function searchDocuments(
  queryEmbedding: number[],
  options: {
    matchCount?: number;
    filterSource?: string;
    similarityThreshold?: number;
  } = {}
): Promise<SearchResult[]> {
  const {
    matchCount = 5,
    filterSource = null,
    similarityThreshold = 0.5
  } = options;

  const { data, error } = await supabase.rpc('search_docs', {
    query_embedding: queryEmbedding,
    match_count: matchCount,
    filter_source: filterSource,
    similarity_threshold: similarityThreshold
  });

  if (error) {
    throw new Error(`Search failed: ${error.message}`);
  }

  return data as SearchResult[];
}

Integration with the RAG Pipeline

// app/api/chat/route.ts
export async function POST(request: Request) {
  const { message } = await request.json();

  // 1. Generate query embedding
  const queryEmbedding = await embedQuery(message);

  // 2. Search for relevant documents
  const relevantDocs = await searchDocuments(queryEmbedding, {
    matchCount: 5,
    similarityThreshold: 0.6
  });

  // 3. Check if we found relevant context
  if (relevantDocs.length === 0) {
    return Response.json({
      answer: "I couldn't find relevant information to answer your question.",
      sources: []
    });
  }

  // 4. Build context from results
  const context = relevantDocs
    .map(doc => `[Source: ${doc.source}]\n${doc.content}`)
    .join('\n\n---\n\n');

  // 5. Generate response (covered in Module 3)
  const response = await generateResponse(context, message);

  return Response.json({
    answer: response,
    sources: relevantDocs.map(d => ({ source: d.source, title: d.title }))
  });
}

Testing the Search Function

Manual Testing in Supabase

You can test the RPC function directly in the Supabase SQL editor:

-- First, get an embedding for a test query
-- (You'd normally do this via the Gemini API)

-- For testing, use an embedding from an existing document
SELECT embedding
FROM documents
WHERE source = 'getting-started.md'
LIMIT 1;

-- Then search with that embedding
SELECT * FROM search_docs(
  '[0.023, -0.145, ...]'::vector(768),  -- paste embedding here
  5,      -- match_count
  NULL,   -- filter_source
  0.5     -- similarity_threshold
);

Automated Testing

// tests/search.test.ts
describe('search_docs RPC', () => {
  it('returns relevant documents for valid embedding', async () => {
    // Create a test document with known embedding
    const testEmbedding = await generateEmbedding('test authentication setup');

    await supabase.from('documents').insert({
      content: 'Authentication guide for setting up user login',
      embedding: testEmbedding,
      source: 'test-doc.md',
      title: 'Auth Guide'
    });

    // Search with similar query
    const queryEmbedding = await generateEmbedding('how to setup auth');
    const results = await searchDocuments(queryEmbedding);

    expect(results.length).toBeGreaterThan(0);
    expect(results[0].source).toBe('test-doc.md');
    expect(results[0].similarity).toBeGreaterThan(0.7);
  });

  it('respects similarity threshold', async () => {
    const queryEmbedding = await generateEmbedding('completely unrelated query xyz');

    const results = await searchDocuments(queryEmbedding, {
      similarityThreshold: 0.9  // Very high threshold
    });

    expect(results.length).toBe(0);  // Nothing should match
  });

  it('filters by source', async () => {
    const queryEmbedding = await generateEmbedding('authentication');

    const results = await searchDocuments(queryEmbedding, {
      filterSource: 'specific-doc.md'
    });

    results.forEach(r => {
      expect(r.source).toBe('specific-doc.md');
    });
  });
});

Summary

In this lesson, we built the search functionality that powers our RAG system:

Key Takeaways:

RPC functions encapsulate search logic: Cleaner code, better performance, atomic updates
pgvector's <=> operator enables cosine similarity: Convert distance to similarity with 1 - distance
Filtering improves relevance: Source filters and similarity thresholds reduce noise
Multi-tenant search requires explicit user filtering: Use SECURITY DEFINER with care
Test your search thoroughly: Verify results, thresholds, and filters work as expected

Module 2 Complete

You've completed Module 2: The Indexing Phase. You now understand:

How to prepare and chunk documents effectively
How to generate and store embeddings in Supabase
How to build search functionality with RPC functions

In Module 3, we'll explore the RAG Core—retrieval science, prompt engineering, and the generation phase where we call Gemini to produce grounded responses.

"A good search is the beginning of a good conversation." — Unknown

Supabase Search Functionality (Postgres RPC)

Introduction

Why Use RPC Functions?

The Alternative: Direct Queries

The RPC Approach

Building the Search Function

Basic Vector Search Function

Understanding the Function

pgvector Distance Operators

Adding Filtering Capabilities

Filter by Source

Similarity Threshold

Multi-Tenant Search

The Complete Production Function

Performance Optimizations

Calling the RPC from Next.js

Basic Usage

Integration with the RAG Pipeline

Testing the Search Function

Manual Testing in Supabase

Automated Testing

Summary

Module 2 Complete

Quiz

Supabase Search Functionality (Postgres RPC)

Introduction

Why Use RPC Functions?

The Alternative: Direct Queries

The RPC Approach

Building the Search Function

Basic Vector Search Function

Understanding the Function

pgvector Distance Operators

Adding Filtering Capabilities

Filter by Source

Similarity Threshold

Multi-Tenant Search

The Complete Production Function

Performance Optimizations

Calling the RPC from Next.js

Basic Usage

Integration with the RAG Pipeline

Testing the Search Function

Manual Testing in Supabase

Automated Testing

Summary

Module 2 Complete

Quiz