Module 15: Integration with LangChain & AI SDK
Connecting Vector Databases to Your AI Application
Introduction
Vector databases don't exist in isolation—they're part of a larger AI application. This module shows you how to integrate them with popular frameworks.
By the end of this module, you'll know how to:
- Use vector databases with LangChain
- Integrate with the Vercel AI SDK
- Build complete RAG pipelines
- Handle production concerns
15.1 LangChain Integration
What is LangChain?
LangChain is a framework for building LLM-powered applications. It provides:
- Abstractions for common LLM patterns (prompts, retrievers, output parsers)
- Pre-built integrations with vector stores and model providers
- Chains and agents for composing complex workflows
LangChain Vector Store Interface
LangChain provides a unified interface for vector stores (simplified here; the real class has additional methods):
import { Document } from 'langchain/document'
interface VectorStore {
addDocuments(documents: Document[]): Promise<void>
similaritySearch(query: string, k: number): Promise<Document[]>
similaritySearchWithScore(query: string, k: number): Promise<[Document, number][]>
}
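Because every integration implements this same interface, retrieval code can stay backend-agnostic. A minimal sketch (searchDocs is our own helper, not a LangChain export):
import type { VectorStore } from '@langchain/core/vectorstores'

// Works unchanged with Pinecone, pgvector, Chroma, or any other LangChain store
async function searchDocs(store: VectorStore, query: string) {
  return store.similaritySearch(query, 3)
}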
Using Pinecone with LangChain
import { PineconeStore } from '@langchain/pinecone'
import { OpenAIEmbeddings } from '@langchain/openai'
import { Pinecone } from '@pinecone-database/pinecone'
import { Document } from 'langchain/document'
// Initialize
const pinecone = new Pinecone()
const index = pinecone.index('my-index')
const embeddings = new OpenAIEmbeddings()
// Create vector store
const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
pineconeIndex: index,
namespace: 'documents'
})
// Add documents
await vectorStore.addDocuments([
new Document({
pageContent: 'Vector databases store embeddings.',
metadata: { source: 'tutorial', category: 'databases' }
}),
new Document({
pageContent: 'LangChain simplifies LLM development.',
metadata: { source: 'docs', category: 'frameworks' }
})
])
// Search
const results = await vectorStore.similaritySearch(
'How do I store vectors?',
5
)
console.log(results)
// [Document { pageContent: 'Vector databases...', metadata: {...} }, ...]
Using pgvector with LangChain
import { PGVectorStore } from '@langchain/community/vectorstores/pgvector'
import { OpenAIEmbeddings } from '@langchain/openai'
import { PoolConfig } from 'pg'
const config: PoolConfig = {
connectionString: process.env.DATABASE_URL
}
const embeddings = new OpenAIEmbeddings()
// Create or connect to vector store
const vectorStore = await PGVectorStore.initialize(embeddings, {
postgresConnectionOptions: config,
tableName: 'documents',
columns: {
idColumnName: 'id',
vectorColumnName: 'embedding',
contentColumnName: 'content',
metadataColumnName: 'metadata'
}
})
// Add and search
await vectorStore.addDocuments(documents)
const results = await vectorStore.similaritySearch(query, 5)
Using Chroma with LangChain
import { Chroma } from '@langchain/community/vectorstores/chroma'
import { OpenAIEmbeddings } from '@langchain/openai'
const embeddings = new OpenAIEmbeddings()
// Create from documents
const vectorStore = await Chroma.fromDocuments(
documents,
embeddings,
{
collectionName: 'my-collection',
url: 'http://localhost:8000' // Default if omitted; the JS client connects to a running Chroma server
}
)
// Search with filter
const results = await vectorStore.similaritySearch(
query,
5,
{ category: 'tutorial' } // Metadata filter
)
15.2 Building RAG with LangChain
Basic RAG Chain
import { ChatOpenAI } from '@langchain/openai'
import { PineconeStore } from '@langchain/pinecone'
import { createRetrievalChain } from 'langchain/chains/retrieval'
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents'
import { ChatPromptTemplate } from '@langchain/core/prompts'
// Setup
const llm = new ChatOpenAI({ model: 'gpt-4-turbo' })
const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
pineconeIndex: index
})
const retriever = vectorStore.asRetriever({ k: 5 })
// Create prompt
const prompt = ChatPromptTemplate.fromTemplate(`
Answer the question based on the following context:
{context}
Question: {input}
Answer:
`)
// Create chains
const documentChain = await createStuffDocumentsChain({
llm,
prompt
})
const retrievalChain = await createRetrievalChain({
retriever,
combineDocsChain: documentChain
})
// Use
const response = await retrievalChain.invoke({
input: 'What are vector databases used for?'
})
console.log(response.answer)
RAG with Conversation History
import { createHistoryAwareRetriever } from 'langchain/chains/history_aware_retriever'
import { MessagesPlaceholder } from '@langchain/core/prompts'
// Rephrase prompt for conversation context
const rephrasePrompt = ChatPromptTemplate.fromMessages([
new MessagesPlaceholder('chat_history'),
['user', '{input}'],
['user', 'Given the conversation, rephrase the question for search.']
])
// Create history-aware retriever
const historyAwareRetriever = await createHistoryAwareRetriever({
llm,
retriever,
rephrasePrompt
})
// Answer prompt
const answerPrompt = ChatPromptTemplate.fromMessages([
['system', 'Answer based on context:\n\n{context}'],
new MessagesPlaceholder('chat_history'),
['user', '{input}']
])
// Full conversational chain
const chain = await createRetrievalChain({
retriever: historyAwareRetriever,
combineDocsChain: await createStuffDocumentsChain({
llm,
prompt: answerPrompt
})
})
// Use with history
const response = await chain.invoke({
input: 'Tell me more about that',
chat_history: [
{ role: 'user', content: 'What are vector databases?' },
{ role: 'assistant', content: 'Vector databases store embeddings...' }
]
})
15.3 Vercel AI SDK Integration
What is the Vercel AI SDK?
The Vercel AI SDK provides:
- React hooks for streaming chat UIs (useChat, useCompletion)
- Type-safe tool calling
- A unified API across model providers
- Support for edge and serverless runtimes
Basic RAG with AI SDK
import { openai } from '@ai-sdk/openai'
import { generateText } from 'ai'
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'
const pinecone = new Pinecone()
const index = pinecone.index('my-index')
const openaiClient = new OpenAI()
async function ragQuery(question: string) {
// Generate query embedding
const embeddingResponse = await openaiClient.embeddings.create({
model: 'text-embedding-3-small',
input: question
})
// Retrieve relevant documents
const queryResult = await index.query({
vector: embeddingResponse.data[0].embedding,
topK: 5,
includeMetadata: true
})
// Assumes each chunk's text was stored in a 'content' metadata field at indexing time
const context = queryResult.matches
?.map(m => m.metadata?.content)
.join('\n\n') ?? ''
// Generate answer
const { text } = await generateText({
model: openai('gpt-4-turbo'),
system: `Answer questions based on this context:\n\n${context}`,
prompt: question
})
return text
}
Streaming RAG with AI SDK
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
export async function POST(req: Request) {
const { messages } = await req.json()
const lastMessage = messages[messages.length - 1].content
// Get context from vector database
const context = await getRelevantContext(lastMessage)
const result = streamText({
model: openai('gpt-4-turbo'),
system: `You are a helpful assistant. Use this context to answer questions:\n\n${context}`,
messages
})
return result.toDataStreamResponse()
}
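The route above assumes a getRelevantContext helper that you supply yourself. A minimal sketch, assuming a Pinecone index named 'my-index' with each chunk's text stored in a content metadata field and queries embedded with text-embedding-3-small:
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'

const pinecone = new Pinecone()
const openaiClient = new OpenAI()

async function getRelevantContext(query: string): Promise<string> {
  // Embed the user's latest message
  const embeddingResponse = await openaiClient.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  })

  // Retrieve the closest chunks (the index name is an assumption)
  const results = await pinecone.index('my-index').query({
    vector: embeddingResponse.data[0].embedding,
    topK: 5,
    includeMetadata: true
  })

  // Join the stored chunk text (assumes it lives in metadata.content)
  return results.matches
    ?.map(m => String(m.metadata?.content ?? ''))
    .filter(Boolean)
    .join('\n\n') ?? ''
}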
// Client-side with useChat
import { useChat } from 'ai/react'
export function Chat() {
const { messages, input, handleInputChange, handleSubmit } = useChat()
return (
<div>
{messages.map(m => (
<div key={m.id}>{m.role}: {m.content}</div>
))}
<form onSubmit={handleSubmit}>
<input value={input} onChange={handleInputChange} />
</form>
</div>
)
}
RAG as a Tool
import { openai } from '@ai-sdk/openai'
import { generateText, tool } from 'ai'
import { z } from 'zod'
const searchKnowledgeBase = tool({
description: 'Search the knowledge base for relevant information',
parameters: z.object({
query: z.string().describe('The search query')
}),
execute: async ({ query }) => {
const embedding = await getEmbedding(query)
const results = await index.query({
vector: embedding,
topK: 5,
includeMetadata: true
})
return results.matches?.map(m => ({
content: m.metadata?.content,
source: m.metadata?.source,
score: m.score
})) ?? []
}
})
const { text, toolResults } = await generateText({
model: openai('gpt-4-turbo'),
tools: { searchKnowledgeBase },
maxSteps: 3, // Allow multiple tool calls
prompt: 'What does our documentation say about vector databases?'
})
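getEmbedding is likewise your own helper rather than part of the AI SDK. A minimal sketch using OpenAI embeddings (the model name is an assumption; it must match whatever produced the vectors already in the index):
import OpenAI from 'openai'

const openaiClient = new OpenAI()

// Turn a query string into an embedding vector for the index lookup
async function getEmbedding(text: string): Promise<number[]> {
  const response = await openaiClient.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  })
  return response.data[0].embedding
}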
15.4 Complete RAG Application
Here's a more complete RAG implementation that can serve as a starting point for production:
// lib/rag.ts
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'
const pinecone = new Pinecone()
const openai = new OpenAI()
interface RAGResult {
answer: string
sources: Array<{
content: string
source: string
score: number
}>
}
export async function ragQuery(
question: string,
options: {
topK?: number
minScore?: number
filter?: Record<string, any>
} = {}
): Promise<RAGResult> {
const { topK = 5, minScore = 0.7, filter } = options
// Step 1: Embed the question
const embeddingResponse = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: question
})
const queryEmbedding = embeddingResponse.data[0].embedding
// Step 2: Retrieve relevant documents
const index = pinecone.index('knowledge-base')
const queryResult = await index.query({
vector: queryEmbedding,
topK,
filter,
includeMetadata: true
})
// Step 3: Filter by score
const relevantDocs = queryResult.matches?.filter(
m => (m.score ?? 0) >= minScore
) ?? []
if (relevantDocs.length === 0) {
return {
answer: "I couldn't find relevant information to answer your question.",
sources: []
}
}
// Step 4: Build context
const context = relevantDocs
.map(doc => doc.metadata?.content)
.join('\n\n---\n\n')
// Step 5: Generate answer
const completion = await openai.chat.completions.create({
model: 'gpt-4-turbo',
messages: [
{
role: 'system',
content: `You are a helpful assistant. Answer questions based on the provided context.
If the context doesn't contain enough information, say so.
Always cite your sources.
Context:
${context}`
},
{ role: 'user', content: question }
],
temperature: 0.3
})
return {
answer: completion.choices[0].message.content ?? '',
sources: relevantDocs.map(doc => ({
content: String(doc.metadata?.content ?? ''),
source: String(doc.metadata?.source ?? 'unknown'),
score: doc.score ?? 0
}))
}
}
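Calling it directly looks like this (the question, filter, and thresholds are only illustrative):
const { answer, sources } = await ragQuery('What are vector databases used for?', {
  topK: 3,
  minScore: 0.75,
  filter: { category: 'databases' }
})

console.log(answer)
console.log(sources.map(s => `${s.source} (${s.score.toFixed(2)})`))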
// app/api/chat/route.ts
import { ragQuery } from '@/lib/rag'
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'
export async function POST(req: Request) {
const { messages } = await req.json()
const lastMessage = messages[messages.length - 1].content
// Retrieve context via the RAG pipeline. Note that ragQuery also generates a
// non-streamed answer that is discarded here; for a streaming route, consider
// factoring the retrieval step into its own function to avoid the extra LLM call.
const { sources } = await ragQuery(lastMessage)
const context = sources.map(s => s.content).join('\n\n')
// Stream response
const result = streamText({
model: openai('gpt-4-turbo'),
system: `Answer based on this context:\n\n${context}\n\nIf unsure, say so.`,
messages
})
return result.toDataStreamResponse()
}
15.5 Production Considerations
Error Handling
async function safeRAGQuery(question: string, attempt = 0): Promise<RAGResult> {
  try {
    return await ragQuery(question)
  } catch (error) {
    console.error('RAG query failed:', error)
    // Retry with a short backoff on rate limits, up to 3 attempts
    if (
      attempt < 3 &&
      error instanceof Error &&
      error.message.includes('rate limit')
    ) {
      await delay(1000 * (attempt + 1))
      return safeRAGQuery(question, attempt + 1)
    }
    // Otherwise return a graceful fallback instead of throwing
    return {
      answer: 'I encountered an error. Please try again.',
      sources: []
    }
  }
}

const delay = (ms: number) => new Promise(resolve => setTimeout(resolve, ms))
Caching
import { LRUCache } from 'lru-cache'
const cache = new LRUCache<string, RAGResult>({
max: 1000,
ttl: 1000 * 60 * 10 // 10 minutes
})
async function cachedRAGQuery(question: string): Promise<RAGResult> {
const cacheKey = question.toLowerCase().trim()
const cached = cache.get(cacheKey)
if (cached) {
return cached
}
const result = await ragQuery(question)
cache.set(cacheKey, result)
return result
}
Logging and Monitoring
async function monitoredRAGQuery(
question: string,
userId: string
): Promise<RAGResult> {
const startTime = Date.now()
try {
const result = await ragQuery(question)
// Log successful query
await logQuery({
userId,
question,
answerLength: result.answer.length,
sourcesCount: result.sources.length,
latencyMs: Date.now() - startTime,
status: 'success'
})
return result
} catch (error) {
// Log failed query
await logQuery({
userId,
question,
latencyMs: Date.now() - startTime,
status: 'error',
error: error instanceof Error ? error.message : 'Unknown'
})
throw error
}
}
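logQuery stands in for whatever logging or analytics backend you use. A minimal sketch that emits one structured JSON line per query to stdout (the fields mirror the call sites above; swap console.log for your observability pipeline):
interface QueryLogEntry {
  userId: string
  question: string
  answerLength?: number
  sourcesCount?: number
  latencyMs: number
  status: 'success' | 'error'
  error?: string
}

async function logQuery(entry: QueryLogEntry): Promise<void> {
  // In production, send this to your logging/analytics service instead
  console.log(JSON.stringify({ ...entry, timestamp: new Date().toISOString() }))
}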
Key Takeaways
- LangChain provides abstractions for common vector store operations
- Vercel AI SDK is great for streaming and React integration
- Always handle errors gracefully in production
- Cache when possible to reduce costs and latency
- Monitor your RAG pipeline for quality and performance
Exercise: Build a Complete RAG App
Build a document Q&A application with:
- Document ingestion (PDF or text files)
- Chunking and embedding
- Vector storage (your choice of database)
- Chat interface with streaming
- Source citations in responses
- Error handling and retry logic
Bonus:
- Add conversation history
- Implement caching
- Add logging and monitoring
Next up: Epilogue - Next Steps

