Architectural Overview: The Next.js & Supabase Stack
Introduction
Now that we understand RAG conceptually and know how embeddings work, it's time to map these concepts to our specific technology stack. This lesson provides a comprehensive architectural overview of how Next.js, Supabase, and Gemini work together to create a production-ready RAG system.
By the end of this lesson, you'll have a clear mental model of the entire system—where each operation happens, how data flows between components, and why the architecture is designed this way.
The Full-Stack Flow
High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT BROWSER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Chat UI │ │ Message │ │ Citations │ │
│ │ Component │ │ Display │ │ Panel │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│ HTTP/WebSocket
┌────────────────────────────▼────────────────────────────────────┐
│ NEXT.JS SERVER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ API Route │ │ Context │ │ Prompt │ │
│ │ /api/chat │───▶│ Builder │───▶│ Assembler │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ │ │ ▼ │
│ │ │ ┌─────────────┐ │
│ │ │ │ Gemini │ │
│ │ │ │ Client │ │
│ │ │ └─────────────┘ │
└─────────┼──────────────────┼────────────────────────────────────┘
│ │
│ Vector Search │ Text Generation
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ SUPABASE │ │ GEMINI API │
│ ┌───────────┐ │ │ ┌───────────┐ │
│ │ pgvector │ │ │ │ Embedding │ │
│ │ search │ │ │ │ Model │ │
│ └───────────┘ │ │ └───────────┘ │
│ ┌───────────┐ │ │ ┌───────────┐ │
│ │ documents │ │ │ │ Generative│ │
│ │ table │ │ │ │ Model │ │
│ └───────────┘ │ │ └───────────┘ │
└─────────────────┘ └─────────────────┘
Component Responsibilities
Client Browser:
- Renders the chat interface
- Sends user queries to the API
- Displays streamed responses
- Shows citation/source information
Next.js Server:
- Handles all API endpoints
- Orchestrates the RAG pipeline
- Manages authentication state
- Protects sensitive API keys
Supabase:
- Stores document chunks and embeddings
- Executes vector similarity searches
- Enforces row-level security
- Manages user data
Gemini API:
- Generates embeddings for queries
- Produces text responses
- Handles streaming output
Data Flow: Query to Response
Let's trace a complete user interaction through the system:
Step 1: User Sends Query
// Client-side
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: "How do I configure authentication?"
  })
});
The query travels from the browser to the Next.js API route.
Step 2: Query Embedding
// Server-side: /api/chat/route.ts
const queryEmbedding = await embedQuery(message);
// Result: [0.023, -0.145, 0.087, ..., 0.234] (768 dimensions)
The server calls Gemini's embedding API to vectorize the user's question.
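Here's a minimal sketch of what embedQuery might look like, assuming the official @google/generative-ai SDK and the text-embedding-004 model (file path is illustrative):

// lib/embeddings.ts — a minimal sketch, assuming the @google/generative-ai SDK
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function embedQuery(text: string): Promise<number[]> {
  // text-embedding-004 produces 768-dimensional vectors
  const model = genAI.getGenerativeModel({ model: 'text-embedding-004' });
  const result = await model.embedContent(text);
  return result.embedding.values;
}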
Step 3: Vector Search
// Server-side
const { data: relevantDocs } = await supabase
  .rpc('search_docs', {
    query_embedding: queryEmbedding,
    match_count: 5
  });
The embedding is sent to Supabase, which searches for similar vectors using pgvector's cosine distance operator.
Step 4: Context Assembly
// Server-side
const context = relevantDocs
  .map(doc => `[Source: ${doc.source}]\n${doc.content}`)
  .join('\n\n---\n\n');
const prompt = `
You are a helpful assistant. Answer using ONLY the following context.
CONTEXT:
${context}
USER QUESTION: ${message}
`;
Retrieved documents are assembled into a prompt with clear instructions for the LLM.
Step 5: Generation
// Server-side
const stream = await gemini.generateContentStream({
  contents: [{ role: 'user', parts: [{ text: prompt }] }],
  systemInstruction: "You are a documentation assistant..."
});
The prompt is sent to Gemini's generative model, which produces a streamed response.
Step 6: Response Streaming
// Server-side - Return streaming response
// (the SDK stream must first be adapted to a web ReadableStream; see the sketch below)
return new Response(stream, {
  headers: { 'Content-Type': 'text/event-stream' }
});

// Client-side - Process stream
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value, { stream: true });
  // Update UI with the newly decoded tokens
}
Tokens are streamed back to the client as they're generated, providing immediate feedback.
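The snippet above glosses over one detail: the value returned by generateContentStream is an SDK result object whose .stream property is an async iterable of chunks, not a web ReadableStream. A minimal adapter, assuming each chunk exposes a text() method, might look like this:

// Adapt the SDK's async iterable of chunks into a web ReadableStream of encoded text
function toReadableStream(
  chunks: AsyncIterable<{ text(): string }>
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      for await (const chunk of chunks) {
        controller.enqueue(encoder.encode(chunk.text()));
      }
      controller.close();
    }
  });
}

// Example usage:
//   const result = await gemini.generateContentStream({ ... });
//   return new Response(toReadableStream(result.stream),
//     { headers: { 'Content-Type': 'text/event-stream' } });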
Supabase as a Vector Database
The Power of pgvector
Supabase uses PostgreSQL with the pgvector extension, which adds native vector operations to the database. This is significant because:
Unified Storage: You don't need a separate vector database. Your user data, application data, and vector embeddings all live in the same PostgreSQL instance.
SQL Integration: Vector operations integrate naturally with SQL queries. You can filter by metadata, join with other tables, and use standard PostgreSQL features.
Familiar Tools: If you know Supabase and PostgreSQL, you already know 90% of what you need. The vector operations are just new operators and functions.
The Documents Table Schema
Here's a conceptual schema for storing document chunks:
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  embedding VECTOR(768),

  -- Metadata for filtering and attribution
  source TEXT NOT NULL,
  title TEXT,
  chunk_index INTEGER,

  -- Timestamps
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),

  -- Optional: User ownership for multi-tenant systems
  user_id UUID REFERENCES auth.users(id)
);

-- Index for fast vector search
CREATE INDEX ON documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
Key Fields:
| Field | Purpose |
|---|---|
| content | The actual text chunk |
| embedding | 768-dimensional vector from Gemini |
| source | File name or URL for attribution |
| title | Human-readable title |
| chunk_index | Position in original document |
| user_id | For multi-tenant access control |
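On the application side, a matching TypeScript type keeps query results well-typed. A sketch mirroring the schema above (the type name is illustrative):

// types/documents.ts — row shape matching the documents table (sketch)
interface DocumentRow {
  id: string;                 // UUID
  content: string;            // the text chunk
  embedding: number[];        // 768-dimensional vector
  source: string;             // file name or URL
  title: string | null;
  chunk_index: number | null;
  created_at: string;         // TIMESTAMPTZ serialized as an ISO string
  updated_at: string;
  user_id: string | null;     // present in multi-tenant setups
}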
Vector Search Function (RPC)
For clean, reusable search logic, we encapsulate the vector search in a PostgreSQL function:
CREATE OR REPLACE FUNCTION search_docs(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5,
  filter_source TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE
    (filter_source IS NULL OR d.source = filter_source)
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
How it works:
- <=> is pgvector's cosine distance operator (lower = more similar)
- We convert distance to similarity with 1 - distance
- Results are ordered by similarity (ascending distance puts the best matches first)
- The optional filter_source parameter enables scoped searches
Calling from Next.js:
const { data, error } = await supabase.rpc('search_docs', {
  query_embedding: embedding,
  match_count: 5,
  filter_source: 'getting-started.md' // optional
});
Security Architecture
Why RAG Operations Belong on the Server
A critical architectural decision: all RAG operations happen server-side. This is non-negotiable for production applications.
API Key Protection:
// NEVER in client code
const GEMINI_API_KEY = process.env.GEMINI_API_KEY; // Server only
// Client never sees this
const supabaseServiceKey = process.env.SUPABASE_SERVICE_KEY; // Server only
If API keys are exposed to the client:
- Anyone can use your Gemini quota
- Anyone can bypass database security
- Your bills could skyrocket
Row-Level Security: Even with Supabase's client library, sensitive operations should go through API routes where you have full control over queries.
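A common pattern is a server-only module that instantiates the privileged client exactly once, so the service key never leaves the server. A minimal sketch (the file path and env var names are illustrative):

// lib/supabase-admin.ts — server-only; never import this from client components
import { createClient } from '@supabase/supabase-js';

export const supabaseAdmin = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!,    // bypasses RLS — keep strictly server-side
  { auth: { persistSession: false } }   // no session storage needed on the server
);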
The Security Model
┌─────────────────────────────────────────────────────┐
│ CLIENT BROWSER │
│ │
│ • User's JWT token (from Supabase Auth) │
│ • No API keys │
│ • No direct database access for vectors │
└──────────────────────┬──────────────────────────────┘
│ JWT Token
┌──────────────────────▼──────────────────────────────┐
│ NEXT.JS API ROUTES │
│ │
│ • Validates JWT │
│ • Holds GEMINI_API_KEY │
│ • Holds SUPABASE_SERVICE_KEY (for admin ops) │
│ • Enforces business logic │
└──────────────────────┬──────────────────────────────┘
│ Service Key
┌──────────────────────▼──────────────────────────────┐
│ SUPABASE │
│ │
│ • RLS policies on documents table │
│ • User can only query their own documents │
│ • Admin operations via service key │
└─────────────────────────────────────────────────────┘
Row-Level Security for Vectors
For multi-tenant applications, RLS policies control which documents each user can search:
-- Enable RLS
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Users can only read their own documents
CREATE POLICY "Users read own documents"
  ON documents FOR SELECT
  USING (auth.uid() = user_id);

-- Service key bypasses RLS (for admin/indexing)
-- This is automatic when using the service role key
With this policy:
- User A's search only returns User A's documents
- User B cannot access User A's knowledge base
- The indexing script (using service key) can write any documents
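Note that these policies only apply when queries run under the user's identity. In an API route, you can create a request-scoped client that forwards the user's JWT so auth.uid() resolves to the caller. A minimal sketch, assuming the token arrives in the Authorization header (env var names are illustrative):

import { createClient } from '@supabase/supabase-js';

function getUserScopedClient(request: Request) {
  const authHeader = request.headers.get('authorization') ?? '';
  // Anon key plus the user's JWT: queries now run under that user's RLS policies
  return createClient(
    process.env.NEXT_PUBLIC_SUPABASE_URL!,
    process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!,
    { global: { headers: { Authorization: authHeader } } }
  );
}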
The Indexing Pipeline
Offline vs. Online Operations
RAG systems have two distinct operational phases:
Offline (Indexing):
- Runs when documents are added/updated
- Can be batch processed
- Doesn't block user requests
- Often runs as a cron job or triggered by webhooks (see the sketch after these lists)
Online (Query):
- Runs when users ask questions
- Must be fast (< 2-3 seconds)
- Real-time requirement
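As a concrete example of the offline side, an indexing run can be exposed as a protected route that a scheduler or webhook calls. A minimal sketch, where INDEXING_SECRET and runIndexingPipeline are illustrative names rather than part of any SDK:

// app/api/index-docs/route.ts — a cron/webhook-triggered indexing route (sketch)
async function runIndexingPipeline(): Promise<void> {
  // load → chunk → embed → write (see the pipeline diagram below)
}

export async function POST(request: Request) {
  // Shared-secret check so only the scheduler or webhook can trigger a run
  const auth = request.headers.get('authorization');
  if (auth !== `Bearer ${process.env.INDEXING_SECRET}`) {
    return new Response('Unauthorized', { status: 401 });
  }
  await runIndexingPipeline();
  return Response.json({ ok: true });
}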
Indexing Architecture
┌─────────────────────────────────────────────────────┐
│ DOCUMENT SOURCES │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ PDFs │ │Markdown │ │ Web │ │
│ │ │ │ Files │ │ Pages │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
└───────┼────────────┼────────────┼───────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────┐
│ INGESTION PIPELINE │
│ ┌────────────────────────────────────────────┐ │
│ │ Document Loader │ │
│ │ (Parse different file formats) │ │
│ └────────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Text Chunker │ │
│ │ (Split into semantic units) │ │
│ └────────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Embedding Generator │ │
│ │ (Gemini text-embedding-004) │ │
│ └────────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Database Writer │ │
│ │ (Supabase documents table) │ │
│ └────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Batch Processing Considerations
When indexing large document sets:
Chunk Processing:
// Process in batches to avoid memory issues
const BATCH_SIZE = 100;

for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
  const batch = chunks.slice(i, i + BATCH_SIZE);
  const embeddings = await embedBatch(batch);

  await supabase.from('documents').insert(
    batch.map((chunk, j) => ({
      content: chunk.text,
      embedding: embeddings[j],
      source: chunk.source
    }))
  );
}
Rate Limiting: Respect API rate limits. Gemini has generous limits, but large-scale indexing should implement:
- Request throttling
- Exponential backoff on errors (a minimal sketch follows this list)
- Progress tracking and resumability
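Here's a minimal sketch of the backoff piece (retry counts and delays are illustrative):

// Retry a request with exponential backoff
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      const delayMs = 2 ** attempt * 1000; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: const embeddings = await withBackoff(() => embedBatch(batch));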
The Query Pipeline
Request Flow
// /app/api/chat/route.ts
export async function POST(request: Request) {
  // 1. Authentication
  const user = await getUser(request);
  if (!user) return unauthorized();

  // 2. Parse request
  const { message } = await request.json();

  // 3. Generate query embedding
  const queryEmbedding = await embedQuery(message);

  // 4. Search for relevant documents
  const { data: docs } = await supabase.rpc('search_docs', {
    query_embedding: queryEmbedding,
    match_count: 5
  });

  // 5. Build context
  const context = buildContext(docs);

  // 6. Generate response
  const stream = await generateResponse(context, message);

  // 7. Return streaming response (a standard web Response, as in Step 6 earlier)
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream' }
  });
}
Performance Optimization Points
Caching Query Embeddings: If users ask similar questions, cache embeddings:
const cacheKey = hashQuery(message);
let embedding = cache.get(cacheKey);

if (!embedding) {
  embedding = await embedQuery(message);
  cache.set(cacheKey, embedding, TTL);
}
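The cache here could be as simple as an in-memory Map with expiry. A sketch (in production, a shared store like Redis is the better choice, since in-memory state doesn't survive serverless cold starts):

// A tiny TTL cache (sketch)
const store = new Map<string, { value: number[]; expires: number }>();

const cache = {
  get(key: string): number[] | undefined {
    const hit = store.get(key);
    return hit && hit.expires > Date.now() ? hit.value : undefined;
  },
  set(key: string, value: number[], ttlMs: number): void {
    store.set(key, { value, expires: Date.now() + ttlMs });
  }
};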
Connection Pooling: Supabase handles this, but ensure your client is reused:
// lib/supabase.ts - Singleton client
let supabase: SupabaseClient;

export function getSupabase() {
  if (!supabase) {
    supabase = createClient(url, key);
  }
  return supabase;
}
Parallel Operations: When operations are independent, run them in parallel:
const [embedding, userPrefs] = await Promise.all([
  embedQuery(message),
  getUserPreferences(userId)
]);
Summary
In this lesson, we've mapped the RAG architecture to our specific technology stack:
Key Takeaways:
- Clear separation of concerns: Client handles UI, server handles orchestration, Supabase stores data, Gemini provides AI capabilities
- Server-side execution is mandatory: All sensitive operations (API calls, database queries) must run on the server
- Supabase + pgvector = unified storage: No need for a separate vector database
- RLS enables multi-tenancy: Users only see their own documents
- Indexing is offline, querying is online: Different performance characteristics and requirements
- The RPC pattern encapsulates search logic: Clean, reusable, and secure
Module 1 Complete
Congratulations! You've completed Module 1: Foundational Theory. You now understand:
- What RAG is and why it's essential
- How vector embeddings enable semantic search
- The complete architecture of our Next.js + Supabase + Gemini stack
In Module 2, we'll dive into the Indexing Phase, where you'll learn the art of document chunking, vectorization strategies, and building a searchable knowledge base.
"Architecture is about making strategic decisions that are hard to change later. Get them right, and everything else becomes easier." — Martin Fowler

