Agentic Design Patterns - Knowledge Retrieval (RAG)

rag · ai · langchain · langgraph · typescript · nextjs · vercel
By sko X opus 4.1 · 9/21/2025 · 11 min read

This guide demonstrates how to implement sophisticated RAG systems using TypeScript, Next.js 15, LangChain, and LangGraph on Vercel's platform. We'll build from basic retrieval to advanced agentic RAG patterns that self-correct, route queries intelligently, and handle complex multi-step reasoning.

Mental Model: RAG as an Intelligent Research Assistant

Think of RAG like having a research assistant who doesn't just fetch documents but understands context, evaluates source quality, and synthesizes information. Traditional RAG is like a library catalog system - you query, it retrieves. Agentic RAG is like having a PhD student who knows when to search, what to search for, cross-references sources, identifies contradictions, and even knows when to say "I need to look elsewhere." In the serverless context, this assistant works in short bursts (within Vercel's function execution limits) but maintains conversation state across interactions.

Basic Example: Simple Vector-Based RAG

1. Install RAG Dependencies

npm install @langchain/pinecone @pinecone-database/pinecone
npm install @langchain/google-genai @langchain/textsplitters
npm install langchain @langchain/core @langchain/langgraph @langchain/community
npm install es-toolkit

Installs Pinecone for vector storage, the Google Generative AI integration (embedding-001 embeddings and Gemini chat models), text splitters for chunking, the core LangChain and LangGraph packages used in the examples below, and es-toolkit for utility functions (the es-toolkit/compat entry point ships inside the same package).
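The code in the following steps reads credentials from environment variables. A minimal .env.local sketch using the variable names referenced below; the Google integration falls back to GOOGLE_API_KEY by default, and all values here are placeholders:

# .env.local (placeholder values)
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_INDEX_NAME=your-index-name
GOOGLE_API_KEY=your-google-api-key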

2. Initialize Vector Store

// lib/vector-store.ts
import { Pinecone } from '@pinecone-database/pinecone';
import { PineconeStore } from '@langchain/pinecone';
import { GoogleGenerativeAIEmbeddings } from '@langchain/google-genai';
import { memoize } from 'es-toolkit/compat';

// Memoize client creation for serverless efficiency
const getPineconeClient = memoize(() => 
  new Pinecone({
    apiKey: process.env.PINECONE_API_KEY!,
  })
);

export async function getVectorStore() {
  const pinecone = getPineconeClient();
  const index = pinecone.index(process.env.PINECONE_INDEX_NAME!);
  
  const embeddings = new GoogleGenerativeAIEmbeddings({
    modelName: "embedding-001",
    taskType: "RETRIEVAL_DOCUMENT",
  });
  
  return PineconeStore.fromExistingIndex(embeddings, {
    pineconeIndex: index,
    maxConcurrency: 5, // Optimize for serverless
  });
}

Creates a memoized Pinecone client to avoid re-initialization on each serverless invocation, using Google's embeddings for cost optimization.

3. Document Ingestion with Chunking

// lib/ingestion.ts
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
import { Document } from '@langchain/core/documents';
import { getVectorStore } from './vector-store';
import { chunk } from 'es-toolkit';
import { map } from 'es-toolkit/compat';

interface ChunkingConfig {
  chunkSize: number;
  chunkOverlap: number;
  separators?: string[];
}

export async function ingestDocuments(
  texts: string[],
  metadata: Record<string, any>[] = [],
  config: ChunkingConfig = {
    chunkSize: 1500,
    chunkOverlap: 200,
  }
) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: config.chunkSize,
    chunkOverlap: config.chunkOverlap,
    separators: config.separators || ['\n\n', '\n', '.', '!', '?'],
  });
  
  // Process documents in parallel batches
  const documents = await Promise.all(
    map(texts, async (text, index) => {
      const chunks = await splitter.splitText(text);
      return chunks.map((chunk, chunkIndex) => 
        new Document({
          pageContent: chunk,
          metadata: {
            ...metadata[index],
            chunkIndex,
            originalIndex: index,
          },
        })
      );
    })
  );
  
  const flatDocs = documents.flat();
  const vectorStore = await getVectorStore();
  
  // Batch insert for efficiency
  const batches = chunk(flatDocs, 100);
  for (const batch of batches) {
    await vectorStore.addDocuments(batch);
  }
  
  return flatDocs.length;
}

Implements smart document chunking with overlap to maintain context, processing documents in parallel batches optimized for serverless timeout limits.
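As a usage sketch, a hypothetical ingestion endpoint could expose this function over HTTP. The route path, request shape, and timeout below are assumptions for illustration, not part of the setup above:

// app/api/rag/ingest/route.ts (hypothetical endpoint for illustration)
import { ingestDocuments } from '@/lib/ingestion';
import { NextResponse } from 'next/server';

export const runtime = 'nodejs';
export const maxDuration = 300;

export async function POST(req: Request) {
  // Expects { texts: string[], metadata?: Record<string, any>[] }
  const { texts, metadata = [] } = await req.json();
  const count = await ingestDocuments(texts, metadata);
  return NextResponse.json({ chunksIngested: count });
}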

4. Basic RAG Chain

// lib/rag/basic-rag.ts
import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
import { PromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence } from '@langchain/core/runnables';
import { getVectorStore } from '../vector-store';
import { formatDocumentsAsString } from 'langchain/util/document';

export async function createBasicRAGChain() {
  const vectorStore = await getVectorStore();
  const retriever = vectorStore.asRetriever({
    k: 4, // Retrieve top 4 relevant chunks
    searchType: 'similarity',
  });
  
  const prompt = PromptTemplate.fromTemplate(`
    Answer the question based only on the following context:
    {context}
    
    Question: {question}
    
    Answer concisely and cite the relevant parts of the context.
  `);
  
  const model = new ChatGoogleGenerativeAI({
    modelName: 'gemini-2.5-flash',
    temperature: 0.3,
    maxOutputTokens: 1024,
  });
  
  const chain = RunnableSequence.from([
    {
      context: async (input: { question: string }) => {
        const docs = await retriever.invoke(input.question);
        return formatDocumentsAsString(docs);
      },
      // Extract the raw question string so the prompt receives plain text
      question: (input: { question: string }) => input.question,
    },
    prompt,
    model,
    new StringOutputParser(),
  ]);
  
  return chain;
}

Creates a basic RAG chain that retrieves context, formats it with the question, and generates a grounded response.

5. API Route for Basic RAG

// app/api/rag/basic/route.ts
import { createBasicRAGChain } from '@/lib/rag/basic-rag';
import { NextResponse } from 'next/server';

export const runtime = 'nodejs';
export const maxDuration = 60;

export async function POST(req: Request) {
  try {
    const { question } = await req.json();
    
    const chain = await createBasicRAGChain();
    const response = await chain.invoke({ question });
    
    return NextResponse.json({ answer: response });
  } catch (error) {
    console.error('RAG error:', error);
    return NextResponse.json(
      { error: 'Failed to process query' },
      { status: 500 }
    );
  }
}

Simple API endpoint that accepts questions and returns RAG-augmented answers with a 60-second timeout for basic queries.
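Calling the endpoint from a client or script is a single POST; a minimal sketch (the question text is illustrative):

// Example client call (sketch) - any fetch-capable environment works
const res = await fetch('/api/rag/basic', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'What does the ingested corpus say about chunking?' }),
});
const { answer } = await res.json();
console.log(answer);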

Advanced Example: Agentic RAG with Self-Correction

1. Self-Corrective RAG with CRAG Pattern

// lib/rag/corrective-rag.ts
import { StateGraph, END } from '@langchain/langgraph';
import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
import { BaseMessage, HumanMessage, AIMessage } from '@langchain/core/messages';
import { Document } from '@langchain/core/documents';
import { getVectorStore } from '../vector-store';
import { WebBrowser } from '@langchain/community/tools/webbrowser';
import { GoogleGenerativeAIEmbeddings } from '@langchain/google-genai';
import { filter, map, some } from 'es-toolkit/compat';

interface CRAGState {
  question: string;
  documents: Document[];
  relevanceScores: number[];
  finalAnswer: string;
  needsWebSearch: boolean;
  webResults: Document[];
}

export function createCorrectiveRAG() {
  const model = new ChatGoogleGenerativeAI({
    modelName: 'gemini-2.5-pro',
    temperature: 0,
  });
  
  const relevanceModel = new ChatGoogleGenerativeAI({
    modelName: 'gemini-2.5-flash',
    temperature: 0,
  });
  
  const workflow = new StateGraph<CRAGState>({
    channels: {
      question: null,
      documents: null,
      relevanceScores: null,
      finalAnswer: null,
      needsWebSearch: null,
      webResults: null,
    },
  });
  
  // Node: Retrieve documents
  workflow.addNode('retrieve', async (state) => {
    const vectorStore = await getVectorStore();
    const retriever = vectorStore.asRetriever({ k: 5 });
    const documents = await retriever.invoke(state.question);
    
    return { documents };
  });
  
  // Node: Evaluate relevance
  workflow.addNode('evaluate_relevance', async (state) => {
    const relevancePrompt = `
      Score the relevance of this document to the question (0-10):
      Question: {question}
      Document: {document}
      
      Return only a number between 0-10.
    `;
    
    const relevanceScores = await Promise.all(
      map(state.documents, async (doc) => {
        const response = await relevanceModel.invoke([
          new HumanMessage(
            relevancePrompt
              .replace('{question}', state.question)
              .replace('{document}', doc.pageContent)
          ),
        ]);
        return parseFloat(response.content as string) || 0;
      })
    );
    
    // Check if we need web search (all scores below 7)
    const needsWebSearch = !some(relevanceScores, score => score >= 7);
    
    return { relevanceScores, needsWebSearch };
  });
  
  // Node: Web search fallback
  workflow.addNode('web_search', async (state) => {
    if (!state.needsWebSearch) {
      return { webResults: [] };
    }
    
    const embeddings = new GoogleGenerativeAIEmbeddings({
      modelName: "embedding-001",
    });
    
    // Note: WebBrowser expects input of the form "<url>","<task>" and navigates that page;
    // a dedicated search tool would be needed for true open-web search. This fallback is a sketch.
    const browser = new WebBrowser({ model, embeddings });
    const searchResult = await browser.invoke(state.question);
    
    // Parse search results into documents
    const webResults = [
      new Document({
        pageContent: searchResult,
        metadata: { source: 'web_search' },
      }),
    ];
    
    return { webResults };
  });
  
  // Node: Generate answer
  workflow.addNode('generate', async (state) => {
    // Filter high-relevance documents
    const relevantDocs = filter(
      state.documents,
      (_, index) => state.relevanceScores[index] >= 7
    );
    
    // Combine with web results if needed
    const allDocs = [...relevantDocs, ...state.webResults];
    
    const context = map(allDocs, doc => doc.pageContent).join('\n\n');
    
    const response = await model.invoke([
      new HumanMessage(`
        Answer this question using the provided context:
        
        Context:
        ${context}
        
        Question: ${state.question}
        
        Provide a comprehensive answer with citations.
      `),
    ]);
    
    return { finalAnswer: response.content as string };
  });
  
  // Define workflow edges
  workflow.setEntryPoint('retrieve');
  workflow.addEdge('retrieve', 'evaluate_relevance');
  workflow.addEdge('evaluate_relevance', 'web_search');
  workflow.addEdge('web_search', 'generate');
  workflow.addEdge('generate', END);
  
  return workflow.compile();
}

Implements CRAG pattern that evaluates document relevance, falls back to web search when needed, and generates answers from verified sources.
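The compiled graph is invoked like any LangGraph runnable; a minimal sketch with an illustrative question (the adaptive router in the next section wires it up the same way):

// Example (sketch): invoking the compiled CRAG graph directly
import { createCorrectiveRAG } from '@/lib/rag/corrective-rag';

const crag = createCorrectiveRAG();
const result = await crag.invoke({
  question: 'How does retrieval grading work in CRAG?',
  documents: [],
  relevanceScores: [],
  finalAnswer: '',
  needsWebSearch: false,
  webResults: [],
});
console.log(result.finalAnswer);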

2. Multi-Query RAG for Complex Questions

// lib/rag/multi-query-rag.ts
import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
import { getVectorStore } from '../vector-store';
import { uniqBy, flatten, take } from 'es-toolkit';
import { HumanMessage } from '@langchain/core/messages';
import { Document } from '@langchain/core/documents';

interface MultiQueryConfig {
  numQueries: number;
  maxDocsPerQuery: number;
  temperature: number;
}

export class MultiQueryRAG {
  private model: ChatGoogleGenerativeAI;
  private queryGenerator: ChatGoogleGenerativeAI;
  
  constructor() {
    this.model = new ChatGoogleGenerativeAI({
      modelName: 'gemini-2.5-pro',
      temperature: 0.3,
    });
    
    this.queryGenerator = new ChatGoogleGenerativeAI({
      modelName: 'gemini-2.5-flash',
      temperature: 0.7, // Higher temp for query diversity
    });
  }
  
  async generateQueries(
    originalQuery: string,
    config: MultiQueryConfig = {
      numQueries: 3,
      maxDocsPerQuery: 3,
      temperature: 0.7,
    }
  ): Promise<string[]> {
    const prompt = `
      Generate ${config.numQueries} different search queries to find information about:
      "${originalQuery}"
      
      Make queries that:
      1. Use different keywords and phrasings
      2. Focus on different aspects of the question
      3. Range from specific to general
      
      Return only the queries, one per line.
    `;
    
    const response = await this.queryGenerator.invoke([
      new HumanMessage(prompt),
    ]);
    
    const queries = (response.content as string)
      .split('\n')
      .filter(q => q.trim())
      .slice(0, config.numQueries);
    
    return [originalQuery, ...queries];
  }
  
  async retrieveWithMultiQuery(
    query: string,
    config?: MultiQueryConfig
  ): Promise<Document[]> {
    const queries = await this.generateQueries(query, config);
    const vectorStore = await getVectorStore();
    
    // Retrieve for each query in parallel
    const allResults = await Promise.all(
      queries.map(q => 
        vectorStore.similaritySearch(q, config?.maxDocsPerQuery || 3)
      )
    );
    
    // Deduplicate by content
    const uniqueDocs = uniqBy(
      flatten(allResults),
      doc => doc.pageContent
    );
    
    // Return top documents
    return take(uniqueDocs, 10);
  }
  
  async answer(query: string): Promise<string> {
    const documents = await this.retrieveWithMultiQuery(query);
    
    const context = documents
      .map((doc, idx) => `[${idx + 1}] ${doc.pageContent}`)
      .join('\n\n');
    
    const response = await this.model.invoke([
      new HumanMessage(`
        Answer based on the following context:
        
        ${context}
        
        Question: ${query}
        
        Include reference numbers [1], [2], etc. for your sources.
      `),
    ]);
    
    return response.content as string;
  }
}

Generates multiple query variations to improve retrieval coverage, deduplicates results, and provides referenced answers.
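A short usage sketch with an illustrative question, showing how the query fan-out can be tuned before asking for a cited answer:

// Example (sketch): tuning the fan-out, then answering with citations
import { MultiQueryRAG } from '@/lib/rag/multi-query-rag';

const multiRAG = new MultiQueryRAG();
const docs = await multiRAG.retrieveWithMultiQuery(
  'How does chunk overlap affect retrieval quality?',
  { numQueries: 4, maxDocsPerQuery: 2, temperature: 0.7 }
);
console.log(`Retrieved ${docs.length} unique chunks`);

const answer = await multiRAG.answer('How does chunk overlap affect retrieval quality?');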

3. Adaptive RAG Router

// lib/rag/adaptive-rag.ts
import { StateGraph, END } from '@langchain/langgraph';
import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
import { HumanMessage } from '@langchain/core/messages';
import { createBasicRAGChain } from './basic-rag';
import { MultiQueryRAG } from './multi-query-rag';
import { createCorrectiveRAG } from './corrective-rag';

interface AdaptiveRAGState {
  query: string;
  complexity: 'simple' | 'medium' | 'complex';
  answer: string;
  confidence: number;
}

export function createAdaptiveRAG() {
  const classifier = new ChatGoogleGenerativeAI({
    modelName: 'gemini-2.5-flash',
    temperature: 0,
  });
  
  const workflow = new StateGraph<AdaptiveRAGState>({
    channels: {
      query: null,
      complexity: null,
      answer: null,
      confidence: null,
    },
  });
  
  // Node: Classify query complexity
  workflow.addNode('classify', async (state) => {
    const prompt = `
      Classify this query's complexity:
      "${state.query}"
      
      Simple: Factual, single-hop questions
      Medium: Multi-aspect questions needing synthesis
      Complex: Questions requiring reasoning, validation, or multiple sources
      
      Respond with only: simple, medium, or complex
    `;
    
    const response = await classifier.invoke([
      new HumanMessage(prompt),
    ]);
    
    const complexity = (response.content as string).trim().toLowerCase() as 
      'simple' | 'medium' | 'complex';
    
    return { complexity };
  });
  
  // Node: Simple RAG
  workflow.addNode('simple_rag', async (state) => {
    if (state.complexity !== 'simple') return {};
    
    const chain = await createBasicRAGChain();
    const answer = await chain.invoke({ question: state.query });
    
    return { answer, confidence: 0.9 };
  });
  
  // Node: Multi-Query RAG
  workflow.addNode('multi_query_rag', async (state) => {
    if (state.complexity !== 'medium') return {};
    
    const multiRAG = new MultiQueryRAG();
    const answer = await multiRAG.answer(state.query);
    
    return { answer, confidence: 0.8 };
  });
  
  // Node: Corrective RAG
  workflow.addNode('corrective_rag', async (state) => {
    if (state.complexity !== 'complex') return {};
    
    const crag = createCorrectiveRAG();
    const result = await crag.invoke({
      question: state.query,
      documents: [],
      relevanceScores: [],
      finalAnswer: '',
      needsWebSearch: false,
      webResults: [],
    });
    
    return { 
      answer: result.finalAnswer, 
      confidence: 0.7 
    };
  });
  
  // Conditional routing based on complexity
  workflow.setEntryPoint('classify');
  
  workflow.addConditionalEdges('classify', (state) => {
    switch (state.complexity) {
      case 'simple':
        return 'simple_rag';
      case 'medium':
        return 'multi_query_rag';
      case 'complex':
        return 'corrective_rag';
      default:
        return 'simple_rag';
    }
  });
  
  workflow.addEdge('simple_rag', END);
  workflow.addEdge('multi_query_rag', END);
  workflow.addEdge('corrective_rag', END);
  
  return workflow.compile();
}

Routes queries to appropriate RAG strategies based on complexity classification, optimizing for both speed and accuracy.
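Before exposing the router over HTTP, it can be exercised directly, e.g. in a script or test; a minimal sketch with an illustrative query:

// Example (sketch): running the adaptive router outside an API route
import { createAdaptiveRAG } from '@/lib/rag/adaptive-rag';

const adaptive = createAdaptiveRAG();
const result = await adaptive.invoke({
  query: 'What are the trade-offs between CRAG and basic RAG?',
  complexity: 'simple', // placeholder; the classify node overwrites this
  answer: '',
  confidence: 0,
});
console.log(result.complexity, result.confidence, result.answer);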

4. Streaming RAG API with Progress Updates

// app/api/rag/adaptive/route.ts
import { createAdaptiveRAG } from '@/lib/rag/adaptive-rag';

export const runtime = 'nodejs';
export const maxDuration = 300;

export async function POST(req: Request) {
  const { query } = await req.json();
  
  const encoder = new TextEncoder();
  const stream = new TransformStream();
  const writer = stream.writable.getWriter();
  
  const workflow = createAdaptiveRAG();
  
  (async () => {
    try {
      // Send progress events
      await writer.write(
        encoder.encode(`data: ${JSON.stringify({
          type: 'status',
          message: 'Analyzing query complexity...'
        })}\n\n`)
      );
      
      const events = await workflow.stream(
        {
          query,
          complexity: 'simple',
          answer: '',
          confidence: 0,
        },
        // Stream node-by-node updates so progress can be reported as each step completes
        { streamMode: 'updates' }
      );
      
      for await (const update of events) {
        // Each update is keyed by node name (e.g. { classify: { complexity } });
        // merge the values so the fields can be inspected directly
        const event = Object.assign({}, ...Object.values(update)) as Record<string, any>;
        // Send intermediate updates
        if (event.complexity) {
          await writer.write(
            encoder.encode(`data: ${JSON.stringify({
              type: 'complexity',
              complexity: event.complexity,
              message: `Using ${event.complexity} RAG strategy`
            })}\n\n`)
          );
        }
        
        if (event.answer) {
          await writer.write(
            encoder.encode(`data: ${JSON.stringify({
              type: 'answer',
              content: event.answer,
              confidence: event.confidence
            })}\n\n`)
          );
        }
      }
      
      await writer.write(
        encoder.encode(`data: ${JSON.stringify({ type: 'done' })}\n\n`)
      );
    } catch (error) {
      await writer.write(
        encoder.encode(`data: ${JSON.stringify({ 
          type: 'error', 
          error: String(error) 
        })}\n\n`)
      );
    } finally {
      await writer.close();
    }
  })();
  
  return new Response(stream.readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive',
    },
  });
}

Streams RAG execution progress including complexity analysis, strategy selection, and final answers with confidence scores.

5. React Component for Adaptive RAG

// components/AdaptiveRAGInterface.tsx
'use client';

import { useState } from 'react';
import { useMutation } from '@tanstack/react-query';
import { groupBy } from 'es-toolkit';

interface RAGEvent {
  type: 'status' | 'complexity' | 'answer' | 'error' | 'done';
  message?: string;
  complexity?: string;
  content?: string;
  confidence?: number;
  error?: string;
}

export default function AdaptiveRAGInterface() {
  const [query, setQuery] = useState('');
  const [events, setEvents] = useState<RAGEvent[]>([]);
  const [answer, setAnswer] = useState('');
  
  const ragMutation = useMutation({
    mutationFn: async (userQuery: string) => {
      setEvents([]);
      setAnswer('');
      
      const response = await fetch('/api/rag/adaptive', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query: userQuery }),
      });
      
      if (!response.ok) throw new Error('RAG failed');
      
      const reader = response.body?.getReader();
      const decoder = new TextDecoder();
      
      while (reader) {
        const { done, value } = await reader.read();
        if (done) break;
        
        const chunk = decoder.decode(value);
        const lines = chunk.split('\n');
        
        for (const line of lines) {
          if (line.startsWith('data: ')) {
            try {
              const event = JSON.parse(line.slice(6)) as RAGEvent;
              setEvents(prev => [...prev, event]);
              
              if (event.type === 'answer') {
                setAnswer(event.content || '');
              }
            } catch (e) {
              // Ignore parse errors
            }
          }
        }
      }
    },
  });
  
  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault();
    if (query.trim()) {
      ragMutation.mutate(query);
    }
  };
  
  // Group events by type for display
  const eventGroups = groupBy(events, event => event.type);
  
  return (
    <div className="w-full max-w-4xl mx-auto">
      <div className="card bg-base-100 shadow-xl">
        <div className="card-body">
          <h2 className="card-title">Adaptive RAG System</h2>
          
          <form onSubmit={handleSubmit} className="space-y-4">
            <div className="form-control">
              <label className="label">
                <span className="label-text">Your Question</span>
              </label>
              <textarea
                className="textarea textarea-bordered h-24"
                placeholder="Ask a question..."
                value={query}
                onChange={(e) => setQuery(e.target.value)}
                disabled={ragMutation.isPending}
              />
            </div>
            
            <button
              type="submit"
              className="btn btn-primary"
              disabled={ragMutation.isPending || !query.trim()}
            >
              {ragMutation.isPending ? (
                <>
                  <span className="loading loading-spinner"></span>
                  Processing...
                </>
              ) : 'Get Answer'}
            </button>
          </form>
          
          {/* Progress indicators */}
          {events.length > 0 && (
            <div className="mt-6 space-y-4">
              {eventGroups.complexity && (
                <div className="alert alert-info">
                  <span>
                    Query Complexity: 
                    <span className="badge badge-primary ml-2">
                      {eventGroups.complexity[0].complexity}
                    </span>
                  </span>
                </div>
              )}
              
              {eventGroups.status && (
                <div className="mockup-code">
                  {eventGroups.status.map((event, idx) => (
                    <pre key={idx} data-prefix={`${idx + 1}`}>
                      <code>{event.message}</code>
                    </pre>
                  ))}
                </div>
              )}
            </div>
          )}
          
          {/* Answer display */}
          {answer && (
            <div className="mt-6">
              <div className="divider">Answer</div>
              <div className="prose max-w-none">
                <div className="p-4 bg-base-200 rounded-lg">
                  {answer}
                </div>
                {events.find(e => e.confidence) && (
                  <div className="mt-2">
                    <progress 
                      className="progress progress-success w-full" 
                      value={events.find(e => e.confidence)?.confidence || 0} 
                      max="1"
                    />
                    <p className="text-sm text-center mt-1">
                      Confidence: {((events.find(e => e.confidence)?.confidence || 0) * 100).toFixed(0)}%
                    </p>
                  </div>
                )}
              </div>
            </div>
          )}
          
          {ragMutation.isError && (
            <div className="alert alert-error mt-4">
              <span>Error: {ragMutation.error?.message}</span>
            </div>
          )}
        </div>
      </div>
    </div>
  );
}

Interactive UI component that displays RAG execution progress, complexity classification, and confidence-scored answers.

Conclusion

This implementation demonstrates the evolution from basic RAG to sophisticated agentic patterns that intelligently route queries, self-correct with web search fallbacks, and adapt strategies based on complexity. The serverless architecture on Vercel ensures cost-effective scaling, while LangGraph's state machines enable complex workflows within Vercel's function duration limits. Key patterns include CRAG for self-correction, multi-query for comprehensive retrieval, and adaptive routing for optimal performance. The use of es-toolkit throughout keeps the code clean and functional, while streaming responses provide a responsive user experience even for complex queries.