Architectural Overview: The Next.js & Supabase Stack
Introduction
Now that we understand RAG conceptually and know how embeddings work, it's time to map these concepts to our specific technology stack. This lesson provides a comprehensive architectural overview of how Next.js, Supabase, and Gemini work together to create a production-ready RAG system.
By the end of this lesson, you'll have a clear mental model of the entire system—where each operation happens, how data flows between components, and why the architecture is designed this way.
The Full-Stack Flow
High-Level Architecture
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT BROWSER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Chat UI │ │ Message │ │ Citations │ │
│ │ Component │ │ Display │ │ Panel │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└────────────────────────────┬────────────────────────────────────┘
│ HTTP/WebSocket
┌────────────────────────────▼────────────────────────────────────┐
│ NEXT.JS SERVER │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ API Route │ │ Context │ │ Prompt │ │
│ │ /api/chat │───▶│ Builder │───▶│ Assembler │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ │ │ ▼ │
│ │ │ ┌─────────────┐ │
│ │ │ │ Gemini │ │
│ │ │ │ Client │ │
│ │ │ └─────────────┘ │
└─────────┼──────────────────┼────────────────────────────────────┘
│ │
│ Vector Search │ Text Generation
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ SUPABASE │ │ GEMINI API │
│ ┌───────────┐ │ │ ┌───────────┐ │
│ │ pgvector │ │ │ │ Embedding │ │
│ │ search │ │ │ │ Model │ │
│ └───────────┘ │ │ └───────────┘ │
│ ┌───────────┐ │ │ ┌───────────┐ │
│ │ documents │ │ │ │ Generative│ │
│ │ table │ │ │ │ Model │ │
│ └───────────┘ │ │ └───────────┘ │
└─────────────────┘ └─────────────────┘
Component Responsibilities
Client Browser:
- Renders the chat interface
- Sends user queries to the API
- Displays streamed responses
- Shows citation/source information
Next.js Server:
- Handles all API endpoints
- Orchestrates the RAG pipeline
- Manages authentication state
- Protects sensitive API keys
Supabase:
- Stores document chunks and embeddings
- Executes vector similarity searches
- Enforces row-level security
- Manages user data
Gemini API:
- Generates embeddings for queries
- Produces text responses
- Handles streaming output
Data Flow: Query to Response
Let's trace a complete user interaction through the system:
Step 1: User Sends Query
// Client-side
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: "How do I configure authentication?"
})
});
The query travels from the browser to the Next.js API route.
Step 2: Query Embedding
// Server-side: /api/chat/route.ts
const queryEmbedding = await embedQuery(message);
// Result: [0.023, -0.145, 0.087, ..., 0.234] (768 dimensions)
The server calls Gemini's embedding API to vectorize the user's question.
Step 3: Vector Search
// Server-side
const { data: relevantDocs } = await supabase
.rpc('search_docs', {
query_embedding: queryEmbedding,
match_count: 5
});
The embedding is sent to Supabase, which searches for similar vectors using pgvector's cosine similarity operator.
Step 4: Context Assembly
// Server-side
const context = relevantDocs
.map(doc => `[Source: ${doc.source}]\n${doc.content}`)
.join('\n\n---\n\n');
const prompt = `
You are a helpful assistant. Answer using ONLY the following context.
CONTEXT:
${context}
USER QUESTION: ${message}
`;
Retrieved documents are assembled into a prompt with clear instructions for the LLM.
Step 5: Generation
// Server-side
const stream = await gemini.generateContentStream({
contents: [{ role: 'user', parts: [{ text: prompt }] }],
systemInstruction: "You are a documentation assistant..."
});
The prompt is sent to Gemini's generative model, which produces a streamed response.
Step 6: Response Streaming
// Server-side - Return streaming response
return new Response(stream, {
headers: { 'Content-Type': 'text/event-stream' }
});
// Client-side - Process stream
const reader = response.body.getReader();
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Update UI with new tokens
}
Tokens are streamed back to the client as they're generated, providing immediate feedback.
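The client loop above receives raw bytes (`value` is a `Uint8Array`), which must be decoded into text before updating the UI. Here is a minimal sketch of that decoding loop; the `readStream` helper and `onToken` callback names are illustrative, not part of any framework:

```typescript
// Reads a streamed response body chunk by chunk, decoding each
// binary chunk to text and handing it to a callback for UI updates.
async function readStream(
  body: ReadableStream<Uint8Array>,
  onToken: (text: string) => void
): Promise<void> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    onToken(decoder.decode(value, { stream: true }));
  }
}
```

In a React component, `onToken` would typically append to a state variable holding the assistant's message so the text appears token by token.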
Supabase as a Vector Database
The Power of pgvector
New to Supabase? If you're unfamiliar with Supabase's architecture and PostgreSQL foundations, consider taking Supabase Fundamentals first. The Supabase Architecture Overview lesson provides essential context for understanding how pgvector fits into the platform.
Supabase uses PostgreSQL with the pgvector extension, which adds native vector operations to the database. This is significant for three reasons:
Unified Storage: You don't need a separate vector database. Your user data, application data, and vector embeddings all live in the same PostgreSQL instance.
SQL Integration: Vector operations integrate naturally with SQL queries. You can filter by metadata, join with other tables, and use standard PostgreSQL features.
Familiar Tools: If you know Supabase and PostgreSQL, you already know 90% of what you need. The vector operations are just new operators and functions.
The Documents Table Schema
Here's a conceptual schema for storing document chunks:
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding VECTOR(768),
-- Metadata for filtering and attribution
source TEXT NOT NULL,
title TEXT,
chunk_index INTEGER,
-- Timestamps
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
-- Optional: User ownership for multi-tenant systems
user_id UUID REFERENCES auth.users(id)
);
-- Index for fast vector search
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Key Fields:
| Field | Purpose |
|---|---|
| `content` | The actual text chunk |
| `embedding` | 768-dimensional vector from Gemini |
| `source` | File name or URL for attribution |
| `title` | Human-readable title |
| `chunk_index` | Position in the original document |
| `user_id` | For multi-tenant access control |
Vector Search Function (RPC)
For clean, reusable search logic, we encapsulate the vector search in a PostgreSQL function:
CREATE OR REPLACE FUNCTION search_docs(
query_embedding VECTOR(768),
match_count INT DEFAULT 5,
filter_source TEXT DEFAULT NULL
)
RETURNS TABLE (
id UUID,
content TEXT,
source TEXT,
title TEXT,
similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
SELECT
d.id,
d.content,
d.source,
d.title,
1 - (d.embedding <=> query_embedding) AS similarity
FROM documents d
WHERE
(filter_source IS NULL OR d.source = filter_source)
ORDER BY d.embedding <=> query_embedding
LIMIT match_count;
END;
$$;
How it works:
- `<=>` is pgvector's cosine distance operator (lower = more similar)
- We convert distance to similarity: `1 - distance`
- Results are ordered by similarity
- The optional `filter_source` parameter scopes the search to a single source
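To make the distance-to-similarity conversion concrete, here is a small TypeScript sketch that computes cosine similarity for plain number arrays and derives the pgvector-style cosine distance from it (function names are illustrative; in production this math runs inside PostgreSQL):

```typescript
// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// pgvector's <=> returns cosine *distance*, so similarity = 1 - distance.
const cosineDistance = (a: number[], b: number[]): number =>
  1 - cosineSimilarity(a, b);
```

Identical vectors have distance 0 (similarity 1); orthogonal vectors have distance 1 (similarity 0), which is why the RPC orders ascending by `<=>` but reports `1 - distance` as the similarity score.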
Calling from Next.js:
const { data, error } = await supabase.rpc('search_docs', {
query_embedding: embedding,
match_count: 5,
filter_source: 'getting-started.md' // optional
});
Security Architecture
Why RAG Operations Belong on the Server
A critical architectural decision: all RAG operations happen server-side. This is non-negotiable for production applications.
API Key Protection:
// NEVER in client code
const GEMINI_API_KEY = process.env.GEMINI_API_KEY; // Server only
// Client never sees this
const supabaseServiceKey = process.env.SUPABASE_SERVICE_KEY; // Server only
If API keys are exposed to the client:
- Anyone can use your Gemini quota
- Anyone can bypass database security
- Your bills could skyrocket
Row-Level Security: Even with Supabase's client library, sensitive operations should go through API routes where you have full control over queries.
The Security Model
┌─────────────────────────────────────────────────────┐
│ CLIENT BROWSER │
│ │
│ • User's JWT token (from Supabase Auth) │
│ • No API keys │
│ • No direct database access for vectors │
└──────────────────────┬──────────────────────────────┘
│ JWT Token
┌──────────────────────▼──────────────────────────────┐
│ NEXT.JS API ROUTES │
│ │
│ • Validates JWT │
│ • Holds GEMINI_API_KEY │
│ • Holds SUPABASE_SERVICE_KEY (for admin ops) │
│ • Enforces business logic │
└──────────────────────┬──────────────────────────────┘
│ Service Key
┌──────────────────────▼──────────────────────────────┐
│ SUPABASE │
│ │
│ • RLS policies on documents table │
│ • User can only query their own documents │
│ • Admin operations via service key │
└─────────────────────────────────────────────────────┘
Row-Level Security for Vectors
For multi-tenant applications, RLS policies control which documents each user can search. For a deeper understanding of RLS concepts, see Introduction to Row Level Security and Anatomy of RLS Policies in the Supabase Fundamentals course.
-- Enable RLS
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
-- Users can only read their own documents
CREATE POLICY "Users read own documents"
ON documents FOR SELECT
USING (auth.uid() = user_id);
-- Service key bypasses RLS (for admin/indexing)
-- This is automatic when using the service role key
With this policy:
- User A's search only returns User A's documents
- User B cannot access User A's knowledge base
- The indexing script (using service key) can write any documents
The Indexing Pipeline
Offline vs. Online Operations
RAG systems have two distinct operational phases:
Offline (Indexing):
- Runs when documents are added/updated
- Can be batch processed
- Doesn't block user requests
- Often runs as a cron job or triggered by webhooks
Online (Query):
- Runs when users ask questions
- Must be fast (< 2-3 seconds)
- Real-time requirement
Indexing Architecture
┌─────────────────────────────────────────────────────┐
│ DOCUMENT SOURCES │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ PDFs │ │Markdown │ │ Web │ │
│ │ │ │ Files │ │ Pages │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
└───────┼────────────┼────────────┼───────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────┐
│ INGESTION PIPELINE │
│ ┌────────────────────────────────────────────┐ │
│ │ Document Loader │ │
│ │ (Parse different file formats) │ │
│ └────────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Text Chunker │ │
│ │ (Split into semantic units) │ │
│ └────────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Embedding Generator │ │
│ │ (Gemini text-embedding-004) │ │
│ └────────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌────────────────────────────────────────────┐ │
│ │ Database Writer │ │
│ │ (Supabase documents table) │ │
│ └────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Batch Processing Considerations
When indexing large document sets:
Chunk Processing:
// Process in batches to avoid memory issues
const BATCH_SIZE = 100;
for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
const batch = chunks.slice(i, i + BATCH_SIZE);
const embeddings = await embedBatch(batch);
await supabase.from('documents').insert(
batch.map((chunk, j) => ({
content: chunk.text,
embedding: embeddings[j],
source: chunk.source
}))
);
}
Rate Limiting: Respect API rate limits. Gemini has generous limits, but large-scale indexing should implement:
- Request throttling
- Exponential backoff on errors
- Progress tracking and resumability
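The backoff idea can be sketched as a small retry wrapper. This is a hedged illustration, not a utility shipped by any Gemini SDK; the helper name, retry count, and delay schedule are all illustrative:

```typescript
// Retries a failing async operation with exponential backoff:
// waits baseDelayMs, then 2x, 4x, ... between attempts.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // Give up after the last retry
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Wrapping each embedding batch call in `withBackoff(() => embedBatch(batch))` would let the indexing script ride out transient rate-limit errors instead of failing the whole run.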
The Query Pipeline
Request Flow
// /app/api/chat/route.ts
export async function POST(request: Request) {
// 1. Authentication
const user = await getUser(request);
if (!user) return unauthorized();
// 2. Parse request
const { message } = await request.json();
// 3. Generate query embedding
const queryEmbedding = await embedQuery(message);
// 4. Search for relevant documents
const { data: docs } = await supabase.rpc('search_docs', {
query_embedding: queryEmbedding,
match_count: 5
});
// 5. Build context
const context = buildContext(docs);
// 6. Generate response
const stream = await generateResponse(context, message);
  // 7. Return streaming response (same pattern as Step 6 above)
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream' }
  });
}
Performance Optimization Points
Caching Query Embeddings: If users ask similar questions, cache embeddings:
const cacheKey = hashQuery(message);
let embedding = cache.get(cacheKey);
if (!embedding) {
embedding = await embedQuery(message);
cache.set(cacheKey, embedding, TTL);
}
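The `cache` and `hashQuery` objects above are placeholders. Here is a minimal in-memory TTL cache sketch that satisfies that interface, purely for illustration; a production deployment would more likely use Redis or an LRU package:

```typescript
// A tiny TTL cache: entries carry an expiry timestamp and are
// evicted lazily on read.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // Stale: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, ttlMs: number): void {
    this.store.set(key, { value, expiresAt: Date.now() + ttlMs });
  }
}
```

Note that a per-process `Map` resets on every deployment and isn't shared across serverless instances, which is why a shared store is the usual next step.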
Connection Pooling: Supabase handles this, but ensure your client is reused:
// lib/supabase.ts - Singleton client
let supabase: SupabaseClient;
export function getSupabase() {
if (!supabase) {
supabase = createClient(url, key);
}
return supabase;
}
Parallel Operations: When operations are independent, run them in parallel:
const [embedding, userPrefs] = await Promise.all([
embedQuery(message),
getUserPreferences(userId)
]);
Summary
In this lesson, we've mapped the RAG architecture to our specific technology stack:
Key Takeaways:
- Clear separation of concerns: Client handles UI, server handles orchestration, Supabase stores data, Gemini provides AI capabilities
- Server-side execution is mandatory: All sensitive operations (API calls, database queries) must run on the server
- Supabase + pgvector = unified storage: No need for a separate vector database
- RLS enables multi-tenancy: Users only see their own documents
- Indexing is offline, querying is online: Different performance characteristics and requirements
- The RPC pattern encapsulates search logic: Clean, reusable, and secure
Module 1 Complete
Congratulations! You've completed Module 1: Foundational Theory. You now understand:
- What RAG is and why it's essential
- How vector embeddings enable semantic search
- The complete architecture of our Next.js + Supabase + Gemini stack
In Module 2, we'll dive into the Indexing Phase, where you'll learn the art of document chunking, vectorization strategies, and building a searchable knowledge base.
"Architecture is about making strategic decisions that are hard to change later. Get them right, and everything else becomes easier." — Martin Fowler

