LangChain has evolved from an experimental framework into one of the most widely adopted frameworks for building LLM-powered applications. With its modular architecture, extensive integrations, and production-oriented features, LangChain enables developers to create sophisticated AI systems that combine language models with external data and tools. This guide covers everything from basic concepts to advanced production patterns. For model-specific integration, see our Claude API guide, and for retrieval patterns, check out our RAG implementation guide.
Understanding LangChain Architecture
LangChain's architecture is built around composable components that can be combined to create complex AI workflows. The framework has matured significantly, with LangChain Expression Language (LCEL) providing a declarative way to chain components together while maintaining streaming, async, and batch capabilities.
- Models: Interfaces to LLMs (OpenAI, Anthropic, local models) and embeddings
- Prompts: Template management, few-shot examples, and dynamic prompt construction
- Chains: Sequences of operations that process inputs and generate outputs
- Agents: Autonomous systems that use LLMs to decide which actions to take
- Memory: Conversation history and context management
- Retrievers: Document search and RAG pipeline components
- Tools: External capabilities that agents can use
Getting Started with LangChain
# Install LangChain and common integrations
pip install langchain langchain-openai langchain-anthropic langchain-community
pip install chromadb faiss-cpu # Vector stores
pip install python-dotenv # Environment management
# Set up environment variables
export OPENAI_API_KEY='your-openai-key'
export ANTHROPIC_API_KEY='your-anthropic-key'
# Basic LangChain setup and first chain
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Initialize models
openai_model = ChatOpenAI(model="gpt-4-turbo")
anthropic_model = ChatAnthropic(model="claude-3-5-sonnet-20241022")
# Create a simple chain using LCEL (LangChain Expression Language)
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant that explains concepts clearly and concisely."),
("human", "{input}")
])
# Chain components together with the | operator
chain = prompt | anthropic_model | StrOutputParser()
# Run the chain
result = chain.invoke({"input": "Explain microservices architecture in 3 sentences"})
print(result)
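# LCEL chains also expose batch and async execution with no code changes -
# a minimal sketch (the inputs below are illustrative)
results = chain.batch([
    {"input": "Define CI/CD in one sentence"},
    {"input": "Define infrastructure as code in one sentence"},
])
print(results)
# Inside an async context (e.g. a FastAPI handler) the same chain supports:
# result = await chain.ainvoke({"input": "Explain blue-green deployments"})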
# Streaming response
for chunk in chain.stream({"input": "What is containerization?"}):
print(chunk, end="", flush=True)
Building RAG Pipelines
Retrieval-Augmented Generation (RAG) is one of the most powerful patterns for grounding LLM responses in your own data. LangChain provides all the components needed to build production RAG systems, from document loading to vector storage and retrieval.
# Complete RAG Pipeline Implementation
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import (
PyPDFLoader,
TextLoader,
DirectoryLoader,
WebBaseLoader
)
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from typing import List
import os
class RAGPipeline:
"""Production-ready RAG pipeline with LangChain"""
def __init__(
self,
collection_name: str = "documents",
persist_directory: str = "./chroma_db",
chunk_size: int = 1000,
chunk_overlap: int = 200
):
self.embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
self.llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
self.text_splitter = RecursiveCharacterTextSplitter(
chunk_size=chunk_size,
chunk_overlap=chunk_overlap,
separators=["\n\n", "\n", ".", " ", ""]
)
# Initialize or load vector store
self.vectorstore = Chroma(
collection_name=collection_name,
embedding_function=self.embeddings,
persist_directory=persist_directory
)
self.retriever = self.vectorstore.as_retriever(
search_type="mmr", # Maximum Marginal Relevance
search_kwargs={"k": 5, "fetch_k": 10}
)
def load_documents(self, source: str, source_type: str = "pdf") -> List:
"""Load documents from various sources"""
if source_type == "pdf":
loader = PyPDFLoader(source)
elif source_type == "text":
loader = TextLoader(source)
elif source_type == "directory":
loader = DirectoryLoader(
source,
glob="**/*",  # note: pathlib-style globs don't support {pdf,txt,md} brace patterns
show_progress=True
)
elif source_type == "web":
loader = WebBaseLoader(source)
else:
raise ValueError(f"Unsupported source type: {source_type}")
documents = loader.load()
return self.text_splitter.split_documents(documents)
def add_documents(self, source: str, source_type: str = "pdf"):
"""Add documents to the vector store"""
chunks = self.load_documents(source, source_type)
self.vectorstore.add_documents(chunks)
print(f"Added {len(chunks)} chunks to vector store")
def create_chain(self):
"""Create the RAG chain"""
template = """You are a helpful assistant answering questions based on the provided context.
Use only the information from the context to answer. If the context doesn't contain
the answer, say "I don't have enough information to answer that question."
Context:
{context}
Question: {question}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)
def format_docs(docs):
return "\n\n".join([
f"Source: {doc.metadata.get('source', 'Unknown')}\n{doc.page_content}"
for doc in docs
])
# Build the chain
chain = (
{"context": self.retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| self.llm
| StrOutputParser()
)
return chain
def query(self, question: str) -> str:
"""Query the RAG pipeline"""
chain = self.create_chain()
return chain.invoke(question)
def query_with_sources(self, question: str) -> dict:
"""Query and return sources"""
# The chain below re-runs retrieval internally; this call only surfaces the sources
docs = self.retriever.invoke(question)
chain = self.create_chain()
answer = chain.invoke(question)
return {
"answer": answer,
"sources": [
{
"content": doc.page_content[:200] + "...",
"source": doc.metadata.get("source", "Unknown")
}
for doc in docs
]
}
# Usage example
rag = RAGPipeline(collection_name="company_docs")
# Add documents
rag.add_documents("./documents/handbook.pdf", "pdf")
rag.add_documents("./documents/policies/", "directory")
# Query
result = rag.query_with_sources("What is the vacation policy?")
print(f"Answer: {result['answer']}")
print(f"Sources: {result['sources']}")Building Agents with Tools
LangChain agents can autonomously decide which tools to use based on user input. The LangGraph integration (covered later in this guide) provides finer-grained control over agent workflows, with explicit state management and human-in-the-loop capabilities.
# Building Agents with Custom Tools
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.tools import DuckDuckGoSearchRun
import requests
from typing import Optional
import os
# Define custom tools
@tool
def get_weather(city: str) -> str:
"""Get the current weather for a city. Use this when the user asks about weather."""
# In production, use a real weather API
api_key = os.getenv("WEATHER_API_KEY")
response = requests.get(
f"https://api.weatherapi.com/v1/current.json",
params={"key": api_key, "q": city}
)
if response.ok:
data = response.json()
return f"Weather in {city}: {data['current']['temp_f']}F, {data['current']['condition']['text']}"
return f"Could not fetch weather for {city}"
@tool
def calculate(expression: str) -> str:
"""Evaluate a mathematical expression. Use this for calculations."""
try:
# Restricted eval: builtins are blocked, but this is not a full sandbox - prefer a dedicated math parser in production
allowed_names = {"abs": abs, "round": round, "min": min, "max": max}
result = eval(expression, {"__builtins__": {}}, allowed_names)
return str(result)
except Exception as e:
return f"Error calculating: {e}"
@tool
def search_database(query: str, table: Optional[str] = None) -> str:
"""Search the company database for information. Use for company-specific queries."""
# Simulated database search
return f"Database results for '{query}': [Sample results would appear here]"
# Initialize search tool
search_tool = DuckDuckGoSearchRun()
# Combine all tools
tools = [get_weather, calculate, search_database, search_tool]
# Create the agent
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
prompt = ChatPromptTemplate.from_messages([
("system", """You are a helpful assistant with access to various tools.
Use the appropriate tool based on the user's question.
Always explain your reasoning before using a tool.
If you can't find information, say so clearly."""),
("human", "{input}"),
("placeholder", "{agent_scratchpad}")
])
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True,
max_iterations=5,
handle_parsing_errors=True
)
# Run the agent
result = agent_executor.invoke({
"input": "What's the weather in San Francisco and what's 15% of 847?"
})
print(result["output"])
Memory and Conversation Management
Effective memory management is crucial for conversational AI applications. LangChain provides multiple memory types for different use cases, from simple conversation buffers to sophisticated summarization and entity extraction.
# Advanced Memory Management
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import (
ChatMessageHistory,
RedisChatMessageHistory
)
from langchain_core.chat_history import BaseChatMessageHistory
from typing import Dict
# In-memory session store (use Redis in production)
session_store: Dict[str, ChatMessageHistory] = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
"""Retrieve or create session history"""
if session_id not in session_store:
session_store[session_id] = ChatMessageHistory()
return session_store[session_id]
# For production, use Redis
def get_redis_history(session_id: str) -> BaseChatMessageHistory:
return RedisChatMessageHistory(
session_id=session_id,
url="redis://localhost:6379",
ttl=3600 # 1 hour expiry
)
# Create a conversational chain with memory
llm = ChatOpenAI(model="gpt-4-turbo")
prompt = ChatPromptTemplate.from_messages([
("system", """You are a helpful customer service assistant for TechCorp.
You help users with product information, orders, and technical support.
Be friendly, professional, and concise."""),
MessagesPlaceholder(variable_name="history"),
("human", "{input}")
])
chain = prompt | llm
# Wrap with message history
conversational_chain = RunnableWithMessageHistory(
chain,
get_session_history,
input_messages_key="input",
history_messages_key="history"
)
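# Optional: keep long conversations bounded before they reach the model.
# Trimming is one approach (summarization is another); this is a hedged sketch
# that assumes a recent langchain-core shipping the trim_messages helper.
from langchain_core.messages import trim_messages
from langchain_core.runnables import RunnablePassthrough

trimmer = RunnablePassthrough.assign(
    history=lambda x: trim_messages(
        x["history"],
        max_tokens=1000,      # token budget for retained history
        strategy="last",      # keep the most recent messages
        token_counter=llm,    # count tokens with the chat model's tokenizer
        include_system=True,
        start_on="human",
    )
)
# Compose it ahead of the prompt, e.g.: chain = trimmer | prompt | llm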
# Usage with session management
def chat(user_id: str, message: str) -> str:
"""Process a chat message with conversation history"""
response = conversational_chain.invoke(
{"input": message},
config={"configurable": {"session_id": user_id}}
)
return response.content
# Example conversation
user_id = "user_123"
print(chat(user_id, "Hi, I need help with my order"))
print(chat(user_id, "The order number is #12345"))
print(chat(user_id, "When will it arrive?")) # Remembers contextLangGraph for Complex Workflows
LangGraph extends LangChain with support for cyclic graphs, enabling more sophisticated agent architectures. It's particularly useful for multi-step workflows with conditional logic, human-in-the-loop checkpoints, and parallel execution.
# LangGraph Multi-Agent Workflow
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage
from typing import TypedDict, Annotated, List
import operator
# Define the state schema
class AgentState(TypedDict):
messages: Annotated[List, operator.add]
current_agent: str
task_complete: bool
# Initialize models for different agents
researcher = ChatOpenAI(model="gpt-4-turbo", temperature=0.7)
writer = ChatOpenAI(model="gpt-4-turbo", temperature=0.9)
reviewer = ChatOpenAI(model="gpt-4-turbo", temperature=0.3)
# Define agent nodes
def research_agent(state: AgentState) -> AgentState:
"""Research agent gathers information"""
messages = state["messages"]
last_message = messages[-1].content if messages else ""
response = researcher.invoke([
HumanMessage(content=f"""You are a research agent. Research the following topic
and provide key facts, statistics, and insights:
Topic: {last_message}
Provide structured research findings.""")
])
return {
"messages": [AIMessage(content=f"[Research]: {response.content}")],
"current_agent": "writer",
"task_complete": False
}
def writer_agent(state: AgentState) -> AgentState:
"""Writer agent creates content based on research"""
messages = state["messages"]
research = next(
(m.content for m in reversed(messages) if "[Research]" in m.content),
""
)
response = writer.invoke([
HumanMessage(content=f"""You are a content writer. Based on the following research,
write a compelling article section:
{research}
Write engaging, informative content.""")
])
return {
"messages": [AIMessage(content=f"[Draft]: {response.content}")],
"current_agent": "reviewer",
"task_complete": False
}
def reviewer_agent(state: AgentState) -> AgentState:
"""Reviewer agent checks quality"""
messages = state["messages"]
draft = next(
(m.content for m in reversed(messages) if "[Draft]" in m.content),
""
)
response = reviewer.invoke([
HumanMessage(content=f"""You are an editor. Review this content for:
1. Accuracy
2. Clarity
3. Engagement
Content:
{draft}
Provide feedback and a final verdict: APPROVED or NEEDS_REVISION""")
])
needs_revision = "NEEDS_REVISION" in response.content.upper()
return {
"messages": [AIMessage(content=f"[Review]: {response.content}")],
"current_agent": "writer" if needs_revision else "end",
"task_complete": not needs_revision
}
# Routing function
def route_agent(state: AgentState) -> str:
if state["task_complete"]:
return END
return state["current_agent"]
# Build the graph
workflow = StateGraph(AgentState)
# Add nodes
workflow.add_node("researcher", research_agent)
workflow.add_node("writer", writer_agent)
workflow.add_node("reviewer", reviewer_agent)
# Set entry point
workflow.set_entry_point("researcher")
# Add edges
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", "reviewer")
workflow.add_conditional_edges(
"reviewer",
route_agent,
{
"writer": "writer", # Revision needed
END: END # Approved
}
)
# Compile with checkpointing
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
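# Human-in-the-loop variant (hedged sketch): LangGraph can pause the graph at a
# checkpoint so a person can inspect or edit state before the reviewer runs.
app_with_approval = workflow.compile(
    checkpointer=memory,
    interrupt_before=["reviewer"],  # pause before this node
)
# After the pause, inspect state with app_with_approval.get_state(config)
# and resume by streaming with None as the input:
# app_with_approval.stream(None, config=config)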
# Run the workflow
inputs = {
"messages": [HumanMessage(content="AI in Healthcare: 2026 Trends")],
"current_agent": "researcher",
"task_complete": False
}
# Execute with thread ID for persistence
config = {"configurable": {"thread_id": "article_1"}}
for event in app.stream(inputs, config=config):
for node, output in event.items():
print(f"\n{'='*50}")
print(f"Node: {node}")
if output.get("messages"):
print(output["messages"][-1].content[:500])
Production Deployment Patterns
Deploying LangChain applications to production requires attention to performance, reliability, and observability. Here are key patterns for production success.
# Production-Ready LangChain Service
from fastapi import FastAPI, HTTPException, BackgroundTasks
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.callbacks import BaseCallbackHandler
from langsmith import Client
import asyncio
import logging
import time
from typing import Optional, AsyncGenerator
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# LangSmith for observability (set LANGCHAIN_API_KEY and LANGCHAIN_TRACING_V2=true)
langsmith_client = Client()
app = FastAPI(title="LangChain Production API")
# Custom callback for metrics
class MetricsCallback(BaseCallbackHandler):
def __init__(self):
self.start_time = None
self.token_count = 0
def on_llm_start(self, *args, **kwargs):
self.start_time = time.time()
def on_llm_end(self, response, *args, **kwargs):
duration = time.time() - self.start_time
logger.info(f"LLM call completed in {duration:.2f}s")
def on_llm_new_token(self, token: str, *args, **kwargs):
# Only fires for streaming calls; non-streaming requests will report zero tokens here
self.token_count += 1
# Request/Response models
class ChatRequest(BaseModel):
message: str
session_id: Optional[str] = None
stream: bool = False
class ChatResponse(BaseModel):
response: str
session_id: str
tokens_used: int
# Initialize chain
llm = ChatOpenAI(
model="gpt-4-turbo",
temperature=0.7,
request_timeout=30,
max_retries=3
)
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant."),
("human", "{input}")
])
chain = prompt | llm | StrOutputParser()
@app.post("/chat", response_model=ChatResponse)
async def chat_endpoint(request: ChatRequest):
"""Non-streaming chat endpoint"""
try:
metrics = MetricsCallback()
response = await chain.ainvoke(
{"input": request.message},
config={"callbacks": [metrics]}
)
return ChatResponse(
response=response,
session_id=request.session_id or "anonymous",
tokens_used=metrics.token_count
)
except Exception as e:
logger.error(f"Chat error: {e}")
raise HTTPException(status_code=500, detail=str(e))
@app.post("/chat/stream")
async def chat_stream_endpoint(request: ChatRequest):
"""Streaming chat endpoint"""
async def generate() -> AsyncGenerator[str, None]:
try:
async for chunk in chain.astream({"input": request.message}):
yield f"data: {chunk}\n\n"
yield "data: [DONE]\n\n"
except Exception as e:
logger.error(f"Stream error: {e}")
yield f"data: Error: {str(e)}\n\n"
return StreamingResponse(
generate(),
media_type="text/event-stream"
)
@app.get("/health")
async def health_check():
"""Health check endpoint"""
return {"status": "healthy"}Best Practices Summary
LangChain Production Checklist
- Use LangSmith for tracing and debugging
- Implement proper error handling and retries
- Cache embeddings and frequent queries (see the sketch after this list)
- Use streaming for better UX
- Monitor token usage and costs
- Implement rate limiting (see the sketch after this list)
- Use async operations for better performance
- Test with realistic data volumes
- Set up alerting for failures
- Version your prompts and chains
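Two of the checklist items above, caching and rate limiting, can be wired in with a few lines. The sketch below is illustrative rather than prescriptive: it assumes your installed versions expose set_llm_cache, CacheBackedEmbeddings, and InMemoryRateLimiter, and the cache directory is a placeholder.
# Caching and rate-limiting sketch
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Cache identical LLM calls in-process (swap in a Redis/SQLite cache for multi-worker setups)
set_llm_cache(InMemoryCache())

# Cache embeddings on disk so re-ingested documents are not re-embedded
store = LocalFileStore("./embedding_cache")
cached_embeddings = CacheBackedEmbeddings.from_bytes_store(
    OpenAIEmbeddings(model="text-embedding-3-small"),
    store,
    namespace="text-embedding-3-small",
)

# Client-side rate limiting to stay under provider quotas
rate_limiter = InMemoryRateLimiter(requests_per_second=2, max_bucket_size=10)
limited_llm = ChatOpenAI(model="gpt-4-turbo", rate_limiter=rate_limiter)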
Conclusion
LangChain provides a comprehensive framework for building production-ready LLM applications. From simple chains to complex multi-agent systems, the modular architecture allows you to start simple and scale as needed. Combined with LangGraph for complex workflows and LangSmith for observability, you have everything needed to build reliable AI systems.
The key to success is starting with well-defined use cases, implementing proper observability from day one, and iterating based on real-world feedback. As LLM technology continues to advance, LangChain's abstractions will help you adopt new models and capabilities without rewriting your applications.
Ready to build production AI applications? Contact Jishu Labs for expert LangChain development services. Our team has built dozens of production LLM systems and can help you accelerate your AI initiatives.
About Michael Chen
Michael Chen is the AI Engineering Lead at Jishu Labs, specializing in building production AI systems. He has implemented LLM solutions for Fortune 500 companies and contributes to open-source AI frameworks.