Module 15: Integration with LangChain & AI SDK
Connecting Vector Databases to Your AI Application
Introduction
Vector databases don't exist in isolation—they're part of a larger AI application. This module shows you how to integrate them with popular frameworks.
By the end of this module, you'll know how to:
- Use vector databases with LangChain
- Integrate with the Vercel AI SDK
- Build complete RAG pipelines
- Handle production concerns
15.1 LangChain Integration
What is LangChain?
LangChain is a framework for building LLM-powered applications. It provides:
- Abstractions for common LLM patterns (prompts, retrievers, output parsers)
- Pre-built integrations with vector stores and model providers
- Chains and agents for composing complex workflows
LangChain Vector Store Interface
LangChain provides a unified interface for vector stores (simplified here; the real class has additional methods):
import { Document } from 'langchain/document'
interface VectorStore {
addDocuments(documents: Document[]): Promise<void>
similaritySearch(query: string, k: number): Promise<Document[]>
similaritySearchWithScore(query: string, k: number): Promise<[Document, number][]>
}
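Because every integration implements this same interface, retrieval code can stay backend-agnostic. A minimal sketch (searchDocs is our own helper, not a LangChain export):
import type { VectorStore } from '@langchain/core/vectorstores'

// Works unchanged with Pinecone, pgvector, Chroma, or any other LangChain store
async function searchDocs(store: VectorStore, query: string) {
  return store.similaritySearch(query, 3)
}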
Using Pinecone with LangChain
import { PineconeStore } from '@langchain/pinecone'
import { OpenAIEmbeddings } from '@langchain/openai'
import { Pinecone } from '@pinecone-database/pinecone'
import { Document } from 'langchain/document'
// Initialize
const pinecone = new Pinecone()
const index = pinecone.index('my-index')
const embeddings = new OpenAIEmbeddings()
// Create vector store
const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
pineconeIndex: index,
namespace: 'documents'
})
// Add documents
await vectorStore.addDocuments([
new Document({
pageContent: 'Vector databases store embeddings.',
metadata: { source: 'tutorial', category: 'databases' }
}),
new Document({
pageContent: 'LangChain simplifies LLM development.',
metadata: { source: 'docs', category: 'frameworks' }
})
])
// Search
const results = await vectorStore.similaritySearch(
'How do I store vectors?',
5
)
console.log(results)
// [Document { pageContent: 'Vector databases...', metadata: {...} }, ...]
Using pgvector with LangChain
import { PGVectorStore } from '@langchain/community/vectorstores/pgvector'
import { OpenAIEmbeddings } from '@langchain/openai'
import { PoolConfig } from 'pg'
const config: PoolConfig = {
connectionString: process.env.DATABASE_URL
}
const embeddings = new OpenAIEmbeddings()
// Create or connect to vector store
const vectorStore = await PGVectorStore.initialize(embeddings, {
postgresConnectionOptions: config,
tableName: 'documents',
columns: {
idColumnName: 'id',
vectorColumnName: 'embedding',
contentColumnName: 'content',
metadataColumnName: 'metadata'
}
})
// Add and search
await vectorStore.addDocuments(documents)
const results = await vectorStore.similaritySearch(query, 5)
Using Chroma with LangChain
import { Chroma } from '@langchain/community/vectorstores/chroma'
import { OpenAIEmbeddings } from '@langchain/openai'
const embeddings = new OpenAIEmbeddings()
// Create from documents
const vectorStore = await Chroma.fromDocuments(
documents,
embeddings,
{
collectionName: 'my-collection',
url: 'http://localhost:8000' // Default if omitted; the JS client connects to a running Chroma server
}
)
// Search with filter
const results = await vectorStore.similaritySearch(
query,
5,
{ category: 'tutorial' } // Metadata filter
)
15.2 Building RAG with LangChain
Basic RAG Chain
import { ChatOpenAI } from '@langchain/openai'
import { PineconeStore } from '@langchain/pinecone'
import { createRetrievalChain } from 'langchain/chains/retrieval'
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents'
import { ChatPromptTemplate } from '@langchain/core/prompts'
// Setup
const llm = new ChatOpenAI({ model: 'gpt-4-turbo' })
const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
pineconeIndex: index
})
const retriever = vectorStore.asRetriever({ k: 5 })
// Create prompt
const prompt = ChatPromptTemplate.fromTemplate(`
Answer the question based on the following context:
{context}
Question: {input}
Answer:
`)
// Create chains
const documentChain = await createStuffDocumentsChain({
llm,
prompt
})
const retrievalChain = await createRetrievalChain({
retriever,
combineDocsChain: documentChain
})
// Use
const response = await retrievalChain.invoke({
input: 'What are vector databases used for?'
})
console.log(response.answer)
RAG with Conversation History
import { createHistoryAwareRetriever } from 'langchain/chains/history_aware_retriever'
import { MessagesPlaceholder } from '@langchain/core/prompts'
// Rephrase prompt for conversation context
const rephrasePrompt = ChatPromptTemplate.fromMessages([
new MessagesPlaceholder('chat_history'),
['user', '{input}'],
['user', 'Given the conversation, rephrase the question for search.']
])
// Create history-aware retriever
const historyAwareRetriever = await createHistoryAwareRetriever({
llm,
retriever,
rephrasePrompt
})
// Answer prompt
const answerPrompt = ChatPromptTemplate.fromMessages([
['system', 'Answer based on context:\n\n{context}'],
new MessagesPlaceholder('chat_history'),
['user', '{input}']
])
// Full conversational chain
const chain = await createRetrievalChain({
retriever: historyAwareRetriever,
combineDocsChain: await createStuffDocumentsChain({
llm,
prompt: answerPrompt
})
})
// Use with history
const response = await chain.invoke({
input: 'Tell me more about that',
chat_history: [
{ role: 'user', content: 'What are vector databases?' },
{ role: 'assistant', content: 'Vector databases store embeddings...' }
]
})
15.3 Vercel AI SDK Integration
What is the Vercel AI SDK?
The Vercel AI SDK provides:
- React hooks for streaming chat UIs (useChat, useCompletion)
- Type-safe tool calling
- A unified API across model providers
- Support for edge and serverless runtimes
Basic RAG with AI SDK
import { openai } from '@ai-sdk/openai'
import { generateText } from 'ai'
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'
const pinecone = new Pinecone()
const index = pinecone.index('my-index')
const openaiClient = new OpenAI()
async function ragQuery(question: string) {
// Generate query embedding
const embeddingResponse = await openaiClient.embeddings.create({
model: 'text-embedding-3-small',
input: question
})
// Retrieve relevant documents
const queryResult = await index.query({
vector: embeddingResponse.data[0].embedding,
topK: 5,
includeMetadata: true
})
// Assumes each chunk's text was stored in a 'content' metadata field at indexing time
const context = queryResult.matches
?.map(m => m.metadata?.content)
.join('\n\n') ?? ''
// Generate answer
const { text } = await generateText({
model: openai('gpt-4-turbo'),
system: `Answer questions based on this context:\n\n${context}`,
prompt: question
})
return text
}
Streaming RAG with AI SDK
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
export async function POST(req: Request) {
const { messages } = await req.json()
const lastMessage = messages[messages.length - 1].content
// Get context from vector database
const context = await getRelevantContext(lastMessage)
const result = streamText({
model: openai('gpt-4-turbo'),
system: `You are a helpful assistant. Use this context to answer questions:\n\n${context}`,
messages
})
return result.toDataStreamResponse()
}
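The route above assumes a getRelevantContext helper that you supply yourself. A minimal sketch, assuming a Pinecone index named 'my-index' with each chunk's text stored in a content metadata field and queries embedded with text-embedding-3-small:
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'

const pinecone = new Pinecone()
const openaiClient = new OpenAI()

async function getRelevantContext(query: string): Promise<string> {
  // Embed the user's latest message
  const embeddingResponse = await openaiClient.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  })

  // Retrieve the closest chunks (the index name is an assumption)
  const results = await pinecone.index('my-index').query({
    vector: embeddingResponse.data[0].embedding,
    topK: 5,
    includeMetadata: true
  })

  // Join the stored chunk text (assumes it lives in metadata.content)
  return results.matches
    ?.map(m => String(m.metadata?.content ?? ''))
    .filter(Boolean)
    .join('\n\n') ?? ''
}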
// Client-side with useChat
import { useChat } from 'ai/react'
export function Chat() {
const { messages, input, handleInputChange, handleSubmit } = useChat()
return (
<div>
{messages.map(m => (
<div key={m.id}>{m.role}: {m.content}</div>
))}
<form onSubmit={handleSubmit}>
<input value={input} onChange={handleInputChange} />
</form>
</div>
)
}
RAG as a Tool
import { openai } from '@ai-sdk/openai'
import { generateText, tool } from 'ai'
import { z } from 'zod'
const searchKnowledgeBase = tool({
description: 'Search the knowledge base for relevant information',
parameters: z.object({
query: z.string().describe('The search query')
}),
execute: async ({ query }) => {
const embedding = await getEmbedding(query)
const results = await index.query({
vector: embedding,
topK: 5,
includeMetadata: true
})
return results.matches?.map(m => ({
content: m.metadata?.content,
source: m.metadata?.source,
score: m.score
})) ?? []
}
})
const { text, toolResults } = await generateText({
model: openai('gpt-4-turbo'),
tools: { searchKnowledgeBase },
maxSteps: 3, // Allow multiple tool calls
prompt: 'What does our documentation say about vector databases?'
})
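getEmbedding is likewise your own helper rather than part of the AI SDK. A minimal sketch using OpenAI embeddings (the model name is an assumption; it must match whatever produced the vectors already in the index):
import OpenAI from 'openai'

const openaiClient = new OpenAI()

// Turn a query string into an embedding vector for the index lookup
async function getEmbedding(text: string): Promise<number[]> {
  const response = await openaiClient.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  })
  return response.data[0].embedding
}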
15.4 Complete RAG Application
Here's a more complete RAG implementation that can serve as a starting point for production:
// lib/rag.ts
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'
const pinecone = new Pinecone()
const openai = new OpenAI()
interface RAGResult {
answer: string
sources: Array<{
content: string
source: string
score: number
}>
}
export async function ragQuery(
question: string,
options: {
topK?: number
minScore?: number
filter?: Record<string, any>
} = {}
): Promise<RAGResult> {
const { topK = 5, minScore = 0.7, filter } = options
// Step 1: Embed the question
const embeddingResponse = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: question
})
const queryEmbedding = embeddingResponse.data[0].embedding
// Step 2: Retrieve relevant documents
const index = pinecone.index('knowledge-base')
const queryResult = await index.query({
vector: queryEmbedding,
topK,
filter,
includeMetadata: true
})
// Step 3: Filter by score
const relevantDocs = queryResult.matches?.filter(
m => (m.score ?? 0) >= minScore
) ?? []
if (relevantDocs.length === 0) {
return {
answer: "I couldn't find relevant information to answer your question.",
sources: []
}
}
// Step 4: Build context
const context = relevantDocs
.map(doc => doc.metadata?.content)
.join('\n\n---\n\n')
// Step 5: Generate answer
const completion = await openai.chat.completions.create({
model: 'gpt-4-turbo',
messages: [
{
role: 'system',
content: `You are a helpful assistant. Answer questions based on the provided context.
If the context doesn't contain enough information, say so.
Always cite your sources.
Context:
${context}`
},
{ role: 'user', content: question }
],
temperature: 0.3
})
return {
answer: completion.choices[0].message.content ?? '',
sources: relevantDocs.map(doc => ({
content: String(doc.metadata?.content ?? ''),
source: String(doc.metadata?.source ?? 'unknown'),
score: doc.score ?? 0
}))
}
}
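Calling it directly looks like this (the question, filter, and thresholds are only illustrative):
const { answer, sources } = await ragQuery('What are vector databases used for?', {
  topK: 3,
  minScore: 0.75,
  filter: { category: 'databases' }
})

console.log(answer)
console.log(sources.map(s => `${s.source} (${s.score.toFixed(2)})`))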
// app/api/chat/route.ts
import { ragQuery } from '@/lib/rag'
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
export async function POST(req: Request) {
const { messages } = await req.json()
const lastMessage = messages[messages.length - 1].content
// Retrieve context via the RAG pipeline. Note that ragQuery also generates a
// non-streamed answer that is discarded here; for a streaming route, consider
// factoring the retrieval step into its own function to avoid the extra LLM call.
const { sources } = await ragQuery(lastMessage)
const context = sources.map(s => s.content).join('\n\n')
// Stream response
const result = streamText({
model: openai('gpt-4-turbo'),
system: `Answer based on this context:\n\n${context}\n\nIf unsure, say so.`,
messages
})
return result.toDataStreamResponse()
}
15.5 Production Considerations
Error Handling
async function safeRAGQuery(question: string, attempt = 0): Promise<RAGResult> {
  try {
    return await ragQuery(question)
  } catch (error) {
    console.error('RAG query failed:', error)
    // Retry with a short backoff on rate limits, up to 3 attempts
    if (
      attempt < 3 &&
      error instanceof Error &&
      error.message.includes('rate limit')
    ) {
      await delay(1000 * (attempt + 1))
      return safeRAGQuery(question, attempt + 1)
    }
    // Otherwise return a graceful fallback instead of throwing
    return {
      answer: 'I encountered an error. Please try again.',
      sources: []
    }
  }
}

const delay = (ms: number) => new Promise(resolve => setTimeout(resolve, ms))
Caching
import { LRUCache } from 'lru-cache'
const cache = new LRUCache<string, RAGResult>({
max: 1000,
ttl: 1000 * 60 * 10 // 10 minutes
})
async function cachedRAGQuery(question: string): Promise<RAGResult> {
const cacheKey = question.toLowerCase().trim()
const cached = cache.get(cacheKey)
if (cached) {
return cached
}
const result = await ragQuery(question)
cache.set(cacheKey, result)
return result
}
Logging and Monitoring
async function monitoredRAGQuery(
question: string,
userId: string
): Promise<RAGResult> {
const startTime = Date.now()
try {
const result = await ragQuery(question)
// Log successful query
await logQuery({
userId,
question,
answerLength: result.answer.length,
sourcesCount: result.sources.length,
latencyMs: Date.now() - startTime,
status: 'success'
})
return result
} catch (error) {
// Log failed query
await logQuery({
userId,
question,
latencyMs: Date.now() - startTime,
status: 'error',
error: error instanceof Error ? error.message : 'Unknown'
})
throw error
}
}
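logQuery stands in for whatever logging or analytics backend you use. A minimal sketch that emits one structured JSON line per query to stdout (the fields mirror the call sites above; swap console.log for your observability pipeline):
interface QueryLogEntry {
  userId: string
  question: string
  answerLength?: number
  sourcesCount?: number
  latencyMs: number
  status: 'success' | 'error'
  error?: string
}

async function logQuery(entry: QueryLogEntry): Promise<void> {
  // In production, send this to your logging/analytics service instead
  console.log(JSON.stringify({ ...entry, timestamp: new Date().toISOString() }))
}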
Key Takeaways
- LangChain provides abstractions for common vector store operations
- Vercel AI SDK is great for streaming and React integration
- Always handle errors gracefully in production
- Cache when possible to reduce costs and latency
- Monitor your RAG pipeline for quality and performance
Exercise: Build a Complete RAG App
Build a document Q&A application with:
- Document ingestion (PDF or text files)
- Chunking and embedding
- Vector storage (your choice of database)
- Chat interface with streaming
- Source citations in responses
- Error handling and retry logic
Bonus:
- Add conversation history
- Implement caching
- Add logging and monitoring
Next up: Epilogue - Next Steps

