Architectural Overview: The Next.js & Supabase Stack
Introduction
Now that we understand RAG conceptually and know how embeddings work, it's time to map these concepts to our specific technology stack. This lesson provides a comprehensive architectural overview of how Next.js, Supabase, and Gemini work together to create a production-ready RAG system.
By the end of this lesson, you'll have a clear mental model of the entire system—where each operation happens, how data flows between components, and why the architecture is designed this way.
The Full-Stack Flow
High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT BROWSER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Chat UI │ │ Message │ │ Citations │ │
│ │ Component │ │ Display │ │ Panel │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│ HTTP/WebSocket
┌────────────────────────────▼────────────────────────────────────┐
│ NEXT.JS SERVER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ API Route │ │ Context │ │ Prompt │ │
│ │ /api/chat │───▶│ Builder │───▶│ Assembler │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ │ │ ▼ │
│ │ │ ┌─────────────┐ │
│ │ │ │ Gemini │ │
│ │ │ │ Client │ │
│ │ │ └─────────────┘ │
└─────────┼──────────────────┼────────────────────────────────────┘
│ │
│ Vector Search │ Text Generation
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ SUPABASE │ │ GEMINI API │
│ ┌───────────┐ │ │ ┌───────────┐ │
│ │ pgvector │ │ │ │ Embedding │ │
│ │ search │ │ │ │ Model │ │
│ └───────────┘ │ │ └───────────┘ │
│ ┌───────────┐ │ │ ┌───────────┐ │
│ │ documents │ │ │ │ Generative│ │
│ │ table │ │ │ │ Model │ │
│ └───────────┘ │ │ └───────────┘ │
└─────────────────┘ └─────────────────┘
Component Responsibilities
Client Browser:
- Renders the chat interface
- Sends user queries to the API
- Displays streamed responses
- Shows citation/source information
Next.js Server:
- Handles all API endpoints
- Orchestrates the RAG pipeline
- Manages authentication state
- Protects sensitive API keys
Supabase:
- Stores document chunks and embeddings
- Executes vector similarity searches
- Enforces row-level security
- Manages user data
Gemini API:
- Generates embeddings for queries
- Produces text responses
- Handles streaming output
Data Flow: Query to Response
Let's trace a complete user interaction through the system:
Step 1: User Sends Query
// Client-side
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: "How do I configure authentication?"
  })
});
The query travels from the browser to the Next.js API route.
Step 2: Query Embedding
// Server-side: /api/chat/route.ts
const queryEmbedding = await embedQuery(message);
// Result: [0.023, -0.145, 0.087, ..., 0.234] (768 dimensions)
The server calls Gemini's embedding API to vectorize the user's question.
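Here's a minimal sketch of what embedQuery might look like, assuming the official @google/generative-ai SDK and the text-embedding-004 model (file path is illustrative):

// lib/embeddings.ts — a minimal sketch, assuming the @google/generative-ai SDK
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function embedQuery(text: string): Promise<number[]> {
  // text-embedding-004 produces 768-dimensional vectors
  const model = genAI.getGenerativeModel({ model: 'text-embedding-004' });
  const result = await model.embedContent(text);
  return result.embedding.values;
}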
Step 3: Vector Search
// Server-side
const { data: relevantDocs } = await supabase
  .rpc('search_docs', {
    query_embedding: queryEmbedding,
    match_count: 5
  });
The embedding is sent to Supabase, which searches for similar vectors using pgvector's cosine distance operator.
Step 4: Context Assembly
// Server-side
const context = relevantDocs
  .map(doc => `[Source: ${doc.source}]\n${doc.content}`)
  .join('\n\n---\n\n');
const prompt = `
You are a helpful assistant. Answer using ONLY the following context.
CONTEXT:
${context}
USER QUESTION: ${message}
`;
Retrieved documents are assembled into a prompt with clear instructions for the LLM.
Step 5: Generation
// Server-side
const stream = await gemini.generateContentStream({
  contents: [{ role: 'user', parts: [{ text: prompt }] }],
  systemInstruction: "You are a documentation assistant..."
});
The prompt is sent to Gemini's generative model, which produces a streamed response.
Step 6: Response Streaming
// Server-side - Return streaming response
// (the SDK stream must first be adapted to a web ReadableStream; see the sketch below)
return new Response(stream, {
  headers: { 'Content-Type': 'text/event-stream' }
});

// Client-side - Process stream
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value, { stream: true });
  // Update UI with the newly decoded tokens
}
Tokens are streamed back to the client as they're generated, providing immediate feedback.
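The snippet above glosses over one detail: the value returned by generateContentStream is an SDK result object whose .stream property is an async iterable of chunks, not a web ReadableStream. A minimal adapter, assuming each chunk exposes a text() method, might look like this:

// Adapt the SDK's async iterable of chunks into a web ReadableStream of encoded text
function toReadableStream(
  chunks: AsyncIterable<{ text(): string }>
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream({
    async start(controller) {
      for await (const chunk of chunks) {
        controller.enqueue(encoder.encode(chunk.text()));
      }
      controller.close();
    }
  });
}

// Example usage:
//   const result = await gemini.generateContentStream({ ... });
//   return new Response(toReadableStream(result.stream),
//     { headers: { 'Content-Type': 'text/event-stream' } });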
Supabase as a Vector Database
The Power of pgvector
Supabase uses PostgreSQL with the pgvector extension, which adds native vector operations to the database. This is significant because:
Unified Storage: You don't need a separate vector database. Your user data, application data, and vector embeddings all live in the same PostgreSQL instance.
SQL Integration: Vector operations integrate naturally with SQL queries. You can filter by metadata, join with other tables, and use standard PostgreSQL features.
Familiar Tools: If you know Supabase and PostgreSQL, you already know 90% of what you need. The vector operations are just new operators and functions.
The Documents Table Schema
Here's a conceptual schema for storing document chunks:
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  embedding VECTOR(768),

  -- Metadata for filtering and attribution
  source TEXT NOT NULL,
  title TEXT,
  chunk_index INTEGER,

  -- Timestamps
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),

  -- Optional: User ownership for multi-tenant systems
  user_id UUID REFERENCES auth.users(id)
);

-- Index for fast vector search
CREATE INDEX ON documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
Key Fields:
| Field | Purpose |
|---|---|
| content | The actual text chunk |
| embedding | 768-dimensional vector from Gemini |
| source | File name or URL for attribution |
| title | Human-readable title |
| chunk_index | Position in original document |
| user_id | For multi-tenant access control |
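On the application side, a matching TypeScript type keeps query results well-typed. A sketch mirroring the schema above (the type name is illustrative):

// types/documents.ts — row shape matching the documents table (sketch)
interface DocumentRow {
  id: string;                 // UUID
  content: string;            // the text chunk
  embedding: number[];        // 768-dimensional vector
  source: string;             // file name or URL
  title: string | null;
  chunk_index: number | null;
  created_at: string;         // TIMESTAMPTZ serialized as an ISO string
  updated_at: string;
  user_id: string | null;     // present in multi-tenant setups
}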
Vector Search Function (RPC)
For clean, reusable search logic, we encapsulate the vector search in a PostgreSQL function:
CREATE OR REPLACE FUNCTION search_docs(
  query_embedding VECTOR(768),
  match_count INT DEFAULT 5,
  filter_source TEXT DEFAULT NULL
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  source TEXT,
  title TEXT,
  similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    d.id,
    d.content,
    d.source,
    d.title,
    1 - (d.embedding <=> query_embedding) AS similarity
  FROM documents d
  WHERE
    (filter_source IS NULL OR d.source = filter_source)
  ORDER BY d.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
How it works:
- <=> is pgvector's cosine distance operator (lower = more similar)
- We convert distance to similarity with 1 - distance
- Results are ordered by similarity (ascending distance puts the best matches first)
- The optional filter_source parameter enables scoped searches
Calling from Next.js:
const { data, error } = await supabase.rpc('search_docs', {
  query_embedding: embedding,
  match_count: 5,
  filter_source: 'getting-started.md' // optional
});
Security Architecture
Why RAG Operations Belong on the Server
A critical architectural decision: all RAG operations happen server-side. This is non-negotiable for production applications.
API Key Protection:
// NEVER in client code
const GEMINI_API_KEY = process.env.GEMINI_API_KEY; // Server only
// Client never sees this
const supabaseServiceKey = process.env.SUPABASE_SERVICE_KEY; // Server only
If API keys are exposed to the client:
- Anyone can use your Gemini quota
- Anyone can bypass database security
- Your bills could skyrocket
Row-Level Security: Even with Supabase's client library, sensitive operations should go through API routes where you have full control over queries.
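A common pattern is a server-only module that instantiates the privileged client exactly once, so the service key never leaves the server. A minimal sketch (the file path and env var names are illustrative):

// lib/supabase-admin.ts — server-only; never import this from client components
import { createClient } from '@supabase/supabase-js';

export const supabaseAdmin = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!,    // bypasses RLS — keep strictly server-side
  { auth: { persistSession: false } }   // no session storage needed on the server
);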
The Security Model
┌─────────────────────────────────────────────────────┐
│ CLIENT BROWSER │
│ │
│ • User's JWT token (from Supabase Auth) │
│ • No API keys │
│ • No direct database access for vectors │
└──────────────────────┬──────────────────────────────┘
│ JWT Token
┌──────────────────────▼──────────────────────────────┐
│ NEXT.JS API ROUTES │
│ │
│ • Validates JWT │
│ • Holds GEMINI_API_KEY │
│ • Holds SUPABASE_SERVICE_KEY (for admin ops) │
│ • Enforces business logic │
└──────────────────────┬──────────────────────────────┘
│ Service Key
┌──────────────────────▼──────────────────────────────┐
│ SUPABASE │
│ │
│ • RLS policies on documents table │
│ • User can only query their own documents │
│ • Admin operations via service key │
└─────────────────────────────────────────────────────┘
Row-Level Security for Vectors
For multi-tenant applications, RLS policies control which documents each user can search:
-- Enable RLS
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Users can only read their own documents
CREATE POLICY "Users read own documents"
  ON documents FOR SELECT
  USING (auth.uid() = user_id);

-- Service key bypasses RLS (for admin/indexing)
-- This is automatic when using the service role key
With this policy:
- User A's search only returns User A's documents
- User B cannot access User A's knowledge base
- The indexing script (using service key) can write any documents
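Note that these policies only apply when queries run under the user's identity. In an API route, you can create a request-scoped client that forwards the user's JWT so auth.uid() resolves to the caller. A minimal sketch, assuming the token arrives in the Authorization header (env var names are illustrative):

import { createClient } from '@supabase/supabase-js';

function getUserScopedClient(request: Request) {
  const authHeader = request.headers.get('authorization') ?? '';
  // Anon key plus the user's JWT: queries now run under that user's RLS policies
  return createClient(
    process.env.NEXT_PUBLIC_SUPABASE_URL!,
    process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!,
    { global: { headers: { Authorization: authHeader } } }
  );
}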
The Indexing Pipeline
Offline vs. Online Operations
RAG systems have two distinct operational phases:
Offline (Indexing):
- Runs when documents are added/updated
- Can be batch processed
- Doesn't block user requests
- Often runs as a cron job or triggered by webhooks (see the sketch after these lists)
Online (Query):
- Runs when users ask questions
- Must be fast (< 2-3 seconds)
- Real-time requirement
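As a concrete example of the offline side, an indexing run can be exposed as a protected route that a scheduler or webhook calls. A minimal sketch, where INDEXING_SECRET and runIndexingPipeline are illustrative names rather than part of any SDK:

// app/api/index-docs/route.ts — a cron/webhook-triggered indexing route (sketch)
async function runIndexingPipeline(): Promise<void> {
  // load → chunk → embed → write (see the pipeline diagram below)
}

export async function POST(request: Request) {
  // Shared-secret check so only the scheduler or webhook can trigger a run
  const auth = request.headers.get('authorization');
  if (auth !== `Bearer ${process.env.INDEXING_SECRET}`) {
    return new Response('Unauthorized', { status: 401 });
  }
  await runIndexingPipeline();
  return Response.json({ ok: true });
}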
Indexing Architecture
┌─────────────────────────────────────────────────────┐
│ DOCUMENT SOURCES │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ PDFs │ │Markdown │ │ Web │ │
│ │ │ │ Files │ │ Pages │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
└───────┼────────────┼────────────┼───────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────┐
│ INGESTION PIPELINE │
│ ┌────────────────────────────────────────────┐ │
│ │ Document Loader │ │
│ │ (Parse different file formats) │ │
│ └────────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Text Chunker │ │
│ │ (Split into semantic units) │ │
│ └────────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Embedding Generator │ │
│ │ (Gemini text-embedding-004) │ │
│ └────────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Database Writer │ │
│ │ (Supabase documents table) │ │
│ └────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Batch Processing Considerations
When indexing large document sets:
Chunk Processing:
// Process in batches to avoid memory issues
const BATCH_SIZE = 100;

for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
  const batch = chunks.slice(i, i + BATCH_SIZE);
  const embeddings = await embedBatch(batch);

  await supabase.from('documents').insert(
    batch.map((chunk, j) => ({
      content: chunk.text,
      embedding: embeddings[j],
      source: chunk.source
    }))
  );
}
Rate Limiting: Respect API rate limits. Gemini has generous limits, but large-scale indexing should implement:
- Request throttling
- Exponential backoff on errors (a minimal sketch follows this list)
- Progress tracking and resumability
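Here's a minimal sketch of the backoff piece (retry counts and delays are illustrative):

// Retry a request with exponential backoff
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      const delayMs = 2 ** attempt * 1000; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: const embeddings = await withBackoff(() => embedBatch(batch));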
The Query Pipeline
Request Flow
// /app/api/chat/route.ts
export async function POST(request: Request) {
  // 1. Authentication
  const user = await getUser(request);
  if (!user) return unauthorized();

  // 2. Parse request
  const { message } = await request.json();

  // 3. Generate query embedding
  const queryEmbedding = await embedQuery(message);

  // 4. Search for relevant documents
  const { data: docs } = await supabase.rpc('search_docs', {
    query_embedding: queryEmbedding,
    match_count: 5
  });

  // 5. Build context
  const context = buildContext(docs);

  // 6. Generate response
  const stream = await generateResponse(context, message);

  // 7. Return streaming response (a standard web Response, as in Step 6 earlier)
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream' }
  });
}
Performance Optimization Points
Caching Query Embeddings: If users ask similar questions, cache embeddings:
const cacheKey = hashQuery(message);
let embedding = cache.get(cacheKey);

if (!embedding) {
  embedding = await embedQuery(message);
  cache.set(cacheKey, embedding, TTL);
}
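The cache here could be as simple as an in-memory Map with expiry. A sketch (in production, a shared store like Redis is the better choice, since in-memory state doesn't survive serverless cold starts):

// A tiny TTL cache (sketch)
const store = new Map<string, { value: number[]; expires: number }>();

const cache = {
  get(key: string): number[] | undefined {
    const hit = store.get(key);
    return hit && hit.expires > Date.now() ? hit.value : undefined;
  },
  set(key: string, value: number[], ttlMs: number): void {
    store.set(key, { value, expires: Date.now() + ttlMs });
  }
};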
Connection Pooling: Supabase handles this, but ensure your client is reused:
// lib/supabase.ts - Singleton client
let supabase: SupabaseClient;

export function getSupabase() {
  if (!supabase) {
    supabase = createClient(url, key);
  }
  return supabase;
}
Parallel Operations: When operations are independent, run them in parallel:
const [embedding, userPrefs] = await Promise.all([
  embedQuery(message),
  getUserPreferences(userId)
]);
Summary
In this lesson, we've mapped the RAG architecture to our specific technology stack:
Key Takeaways:
- Clear separation of concerns: Client handles UI, server handles orchestration, Supabase stores data, Gemini provides AI capabilities
- Server-side execution is mandatory: All sensitive operations (API calls, database queries) must run on the server
- Supabase + pgvector = unified storage: No need for a separate vector database
- RLS enables multi-tenancy: Users only see their own documents
- Indexing is offline, querying is online: Different performance characteristics and requirements
- The RPC pattern encapsulates search logic: Clean, reusable, and secure
Module 1 Complete
Congratulations! You've completed Module 1: Foundational Theory. You now understand:
- What RAG is and why it's essential
- How vector embeddings enable semantic search
- The complete architecture of our Next.js + Supabase + Gemini stack
In Module 2, we'll dive into the Indexing Phase, where you'll learn the art of document chunking, vectorization strategies, and building a searchable knowledge base.
"Architecture is about making strategic decisions that are hard to change later. Get them right, and everything else becomes easier." — Martin Fowler

