RAG with LangGraph: Designing Stateful Retrieval Pipelines in Python
Giovanni Romero (giovanniromero.dev)

Introduction

Retrieval-Augmented Generation (RAG) has become a standard approach for grounding large language models (LLMs) in external data. However, many implementations remain simple, following a linear pipeline: retrieve, generate, return.

This article focuses on a more robust approach: stateful RAG pipelines using LangGraph. By introducing explicit state management, evaluation loops, and conditional execution, we can build systems that are more reliable, interpretable, and production-ready.


What is a Stateful RAG Pipeline?

A stateful RAG pipeline maintains structured state across multiple steps of execution. Instead of treating each request as a stateless operation, the system tracks:

  • User query
  • Retrieved documents
  • Intermediate responses
  • Evaluation signals

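This enables iterative refinement (a weak answer can trigger another retrieval pass), easier debugging (the inputs and outputs of each step can be inspected), and tighter control over when the pipeline stops.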
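
To make the idea of tracked state concrete, here is a minimal sketch in plain Python. The `PipelineState` type, its field names, and the `apply_update` helper are hypothetical and chosen only to mirror the four items listed above; they are not part of LangGraph.

```python
from typing import List, TypedDict


class PipelineState(TypedDict, total=False):
    """Hypothetical state container mirroring the four tracked items."""
    question: str            # user query
    documents: List[str]     # retrieved documents
    drafts: List[str]        # intermediate responses
    evaluations: List[str]   # evaluation signals


def apply_update(state: PipelineState, update: PipelineState) -> PipelineState:
    """Return a new state with the update merged in; the input is not mutated."""
    merged = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]


# Each step contributes a partial update; the full history stays inspectable.
state: PipelineState = {"question": "What is LangGraph?", "drafts": [], "evaluations": []}
state = apply_update(state, {"documents": ["LangGraph is a framework for stateful workflows."]})
state = apply_update(state, {"drafts": state["drafts"] + ["first draft"]})
state = apply_update(state, {"evaluations": state["evaluations"] + ["retry"]})
```

Because every step returns only the keys it changed, the state after any step can be logged or compared with the previous one, which is where the debugging benefit comes from.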


Why Use LangGraph for RAG Workflows

LangGraph extends traditional chain-based approaches by enabling graph-based execution. This allows:

  • Conditional branching
  • Iterative loops
  • Stateful memory
  • Modular node composition

This model is particularly well-suited for advanced RAG systems where retrieval and generation may need multiple passes.
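
As a rough illustration of what graph-based execution means, the following sketch runs a tiny graph in plain Python. It is not LangGraph's implementation or API: `run_graph`, `route`, and the `END` sentinel are names invented for this example, and the two nodes are stubs.

```python
from typing import Callable, Dict

State = Dict[str, object]
END = "end"  # sentinel name chosen for this sketch


def run_graph(
    nodes: Dict[str, Callable[[State], State]],
    router: Callable[[str, State], str],
    entry: str,
    state: State,
    max_steps: int = 20,
) -> State:
    """Run nodes until the router returns END, merging each partial update."""
    current = entry
    steps = 0
    while current != END:
        if steps >= max_steps:
            raise RuntimeError("step limit reached without hitting END")
        state = {**state, **nodes[current](state)}  # stateful memory
        current = router(current, state)            # conditional branching
        steps += 1
    return state


def retrieve(state: State) -> State:
    return {"passes": int(state["passes"]) + 1}


def generate(state: State) -> State:
    return {"answer": f"draft after {state['passes']} pass(es)"}


def route(current: str, state: State) -> str:
    if current == "retrieve":
        return "generate"
    # Loop back to retrieval until two passes have been made.
    return "retrieve" if int(state["passes"]) < 2 else END


final = run_graph({"retrieve": retrieve, "generate": generate}, route, "retrieve", {"passes": 0})
```

The router is the only place where control flow is decided, so adding a branch or a loop means changing one function rather than rewiring a chain.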


RAG Pipeline Architecture

A production-grade RAG system typically includes:

Retriever

Responsible for fetching relevant documents from a vector database using similarity search.
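
For readers unfamiliar with similarity search, the sketch below ranks a toy in-memory index by cosine similarity. The three-dimensional vectors are made up for illustration; a real retriever would use embeddings from a model and an index such as FAISS rather than a Python list.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    index: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index with hand-written vectors (assumption: not real embeddings).
toy_index = [
    ("LangGraph builds stateful workflows.", [1.0, 0.0, 0.0]),
    ("RAG combines retrieval with generation.", [0.0, 1.0, 0.0]),
    ("FAISS performs vector similarity search.", [0.0, 0.7, 0.7]),
]

results = top_k([0.0, 1.0, 0.1], toy_index, k=2)
```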

Generator

An LLM that produces answers using retrieved context.

Evaluator

A validation component that determines whether the generated response is sufficient.

Controller (LangGraph)

Orchestrates execution flow, including retries and termination conditions.


Implementation in Python

Below is a simplified implementation using LangGraph:

# Requires: pip install langgraph langchain-openai langchain-community faiss-cpu
# and an OPENAI_API_KEY environment variable.
from typing import List, TypedDict

from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langgraph.graph import StateGraph

# =========================
# Base Configuration
# =========================

# LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Embeddings
embeddings = OpenAIEmbeddings()

# Example documents (replace with real data)
docs = [
    Document(page_content="LangGraph is a framework for building stateful AI workflows."),
    Document(page_content="RAG combines retrieval with generation to improve accuracy."),
]

# Vector store
vectorstore = FAISS.from_documents(docs, embeddings)

# =========================
# Graph State
# =========================

class GraphState(TypedDict):
    question: str
    documents: List[Document]
    answer: str
    iterations: int

# =========================
# Node: Retrieve
# =========================

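def retrieve(state: GraphState):
    """Fetch the top-k documents for the question and count this pass."""
    query = state["question"]

    # Note: a retry repeats the same query, so it returns the same documents.
    # A fuller pipeline would typically rewrite the query or widen k on retry.
    retrieved_docs = vectorstore.similarity_search(query, k=3)

    return {
        "documents": retrieved_docs,
        "iterations": state.get("iterations", 0) + 1
    }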

# =========================
# Node: Generate
# =========================

def generate(state: GraphState):
    docs = state["documents"]

    context = "\n\n".join([doc.page_content for doc in docs])

    prompt = f"""
You are a technical assistant.

Use ONLY the context below to answer the question.
If the answer is not contained in the context, respond with "I don't know".

Context:
{context}

Question:
{state['question']}

Answer:
"""

    response = llm.invoke(prompt)

    return {
        "answer": response.content
    }

# =========================
# Node: Evaluate
# =========================

MAX_ITERATIONS = 3

def evaluate(state: GraphState):
    # Safety limit to avoid infinite loops
    if state["iterations"] >= MAX_ITERATIONS:
        return "end"

    eval_prompt = f"""
Evaluate the quality of the answer.

Question:
{state['question']}

Answer:
{state['answer']}

Respond with:
- "retry" if the answer is incomplete, incorrect, or uncertain
- "end" if the answer is correct and sufficient
"""

    decision = llm.invoke(eval_prompt).content.strip().lower()

    if "retry" in decision:
        return "retry"

    return "end"

# =========================
# Build Graph
# =========================

builder = StateGraph(GraphState)

builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)

builder.set_entry_point("retrieve")

builder.add_edge("retrieve", "generate")

builder.add_conditional_edges(
    "generate",
    evaluate,
    {
        "retry": "retrieve",
        "end": "__end__"
    }
)

graph = builder.compile()

# =========================
# Run
# =========================

if __name__ == "__main__":
    result = graph.invoke({
        "question": "What is LangGraph?",
        "iterations": 0
    })

    print("\nFinal Answer:\n")
    print(result["answer"])

Adding Evaluation Loops

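Evaluation loops let the system improve a response iteratively: when the evaluator rejects an answer, control returns to retrieval until the answer passes or the iteration limit is reached. The example above asks the same LLM to judge its own answer, and a simpler heuristic might only check for expressions of uncertainty. Production systems often rely on one or more of: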

  • Secondary LLM evaluators
  • Confidence scoring
  • Retrieval quality metrics
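
As a point of comparison, here is a minimal sketch of the simple uncertainty heuristic mentioned above, written without any LLM call. The marker phrases and the name `heuristic_evaluate` are assumptions made for this example. It returns the same "retry" and "end" labels as the `evaluate` function in the implementation, so it should be usable as a drop-in routing function.

```python
from typing import Dict

# Hypothetical phrases that signal an uncertain answer; tune for your prompts.
UNCERTAINTY_MARKERS = ("i don't know", "not sure", "cannot determine")
MAX_ITERATIONS = 3


def heuristic_evaluate(state: Dict[str, object]) -> str:
    """Return "retry" for empty or uncertain answers, otherwise "end"."""
    # Safety limit first, so the loop always terminates.
    if int(state.get("iterations", 0)) >= MAX_ITERATIONS:
        return "end"

    answer = str(state.get("answer", "")).strip().lower()
    if not answer:
        return "retry"
    if any(marker in answer for marker in UNCERTAINTY_MARKERS):
        return "retry"
    return "end"
```

A check like this is cheap and deterministic, but it only catches answers that admit uncertainty. It says nothing about answers that are confidently wrong, which is why the options listed above are usually layered on top.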

Designing Adaptive Retrieval Strategies

Not all queries require retrieval. Adaptive RAG introduces a decision step:

  • If the query is simple, answer directly
  • If external knowledge is needed, trigger retrieval

This can reduce latency and cost for queries that do not need retrieval, provided the routing decision itself is accurate.
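
The decision step can be sketched as a routing function that inspects the question before anything else runs. The keyword list below is a deliberately crude placeholder, and `needs_retrieval`, `route_question`, and the branch names are hypothetical; in practice the decision is more often made by a small classifier or an LLM call.

```python
from typing import Dict

# Placeholder hints that a question depends on external knowledge (assumption).
RETRIEVAL_HINTS = ("according to", "in our docs", "latest", "policy", "release notes")


def needs_retrieval(question: str) -> bool:
    """Guess whether the question depends on external documents."""
    lowered = question.lower()
    return any(hint in lowered for hint in RETRIEVAL_HINTS)


def route_question(state: Dict[str, str]) -> str:
    """Return the name of the branch to run first."""
    if needs_retrieval(state["question"]):
        return "retrieve"
    return "answer_directly"
```

For example, `route_question({"question": "What does our refund policy say?"})` selects the retrieval branch, while a question such as "What is 2 + 2?" goes straight to generation.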


Scaling RAG Pipelines in Production

To move from prototype to production:

  • Use a vector index or database suited to your scale (FAISS for in-process search; Pinecone or Weaviate as dedicated vector databases)
  • Implement caching layers
  • Add observability (logging, tracing)
  • Introduce re-ranking models for better retrieval quality
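
Two of these points, caching and observability, can be sketched with the standard library alone. In the example below, `slow_search` is a stand-in for a real vector store call, and the logger name and the normalization rule (trim and lowercase) are assumptions for illustration.

```python
import logging
import time
from functools import lru_cache
from typing import Tuple

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rag.retrieval")

CALLS = {"backend": 0}  # counts backend hits, for demonstration only


def slow_search(query: str) -> Tuple[str, ...]:
    """Stand-in for a vector store query (hypothetical)."""
    CALLS["backend"] += 1
    return (f"document about {query}",)


@lru_cache(maxsize=1024)
def cached_search(query: str) -> Tuple[str, ...]:
    """Cache results per normalized query string."""
    return slow_search(query)


def retrieve_with_logging(query: str) -> Tuple[str, ...]:
    """Normalize the query, serve from cache when possible, and log timing."""
    started = time.perf_counter()
    docs = cached_search(query.strip().lower())
    elapsed_ms = (time.perf_counter() - started) * 1000
    logger.info("query=%r docs=%d elapsed_ms=%.2f", query, len(docs), elapsed_ms)
    return docs


first = retrieve_with_logging("LangGraph")
second = retrieve_with_logging("  langgraph ")  # served from the cache
```

An in-process cache like this is lost on restart and is not shared between workers, so a multi-instance deployment would more likely use an external cache. The structure of the wrapper stays the same.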

Best Practices

  • Use semantic chunking with overlap
  • Select high-quality embedding models
  • Monitor retrieval performance
  • Continuously evaluate output quality
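
To show what overlap means in the first point, here is a minimal sketch that splits text into fixed-size word windows sharing a few words with their neighbors. This is a sliding window, not semantic chunking: a semantic splitter would cut at sentence or topic boundaries instead. The function name and default sizes are assumptions.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into word windows of chunk_size that overlap by `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
```

With ten words, a window of four, and an overlap of one, each chunk starts on the last word of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk more often than it would without overlap.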

Conclusion

Stateful RAG pipelines add what the linear retrieve-generate pattern lacks: explicit state, an evaluation step, and bounded retries. With LangGraph, these pieces are expressed as nodes and conditional edges, which makes the pipeline iterative, adaptive, and easier to inspect.

This makes the approach a practical foundation for AI systems that need reliable answers grounded in external data.

Tags: rag, langgraph, retrieval-pipeline
