Claude has emerged as one of the most capable and reliable large language models for enterprise applications. With its exceptional reasoning abilities, large context window, and strong safety features, Claude is increasingly the model of choice for production AI systems. This comprehensive guide covers everything you need to know to integrate the Claude API into your applications, from basic setup to advanced patterns like streaming, tool use, and vision capabilities. For building more complex AI workflows, see our guides on LangChain and RAG systems.
Why Choose Claude for Your Application?
Before diving into implementation, let's understand what makes Claude stand out in the crowded LLM landscape. Claude excels in several areas that matter for production applications: nuanced understanding of complex instructions, consistent and predictable outputs, strong performance on coding tasks, and built-in safety features that reduce the risk of harmful outputs.
- 200K Token Context Window: Process entire codebases, long documents, or extensive conversation histories
- Superior Reasoning: Excels at multi-step reasoning, analysis, and following complex instructions
- Tool Use (Function Calling): Native support for structured tool interactions and API calls
- Vision Capabilities: Analyze images, charts, diagrams, and screenshots
- Consistent Output: Reliable formatting and adherence to specified output structures
- Safety First: Constitutional AI training reduces harmful outputs without sacrificing capability
Getting Started with Claude API
To begin using the Claude API, you'll need to create an account at console.anthropic.com and generate an API key. Anthropic offers several model tiers: Claude 3.5 Sonnet for the best balance of speed and capability, Claude 3.5 Haiku for fast, cost-effective tasks, and Claude 3 Opus for the most complex reasoning tasks.
Installation and Setup
Anthropic provides official SDKs for Python and TypeScript/JavaScript. Let's set up both environments for maximum flexibility in your projects.
# Python installation
pip install anthropic
# Node.js / TypeScript installation
npm install @anthropic-ai/sdk
# Set your API key as environment variable
export ANTHROPIC_API_KEY='your-api-key-here'

Here's a basic example in both Python and TypeScript to verify your setup is working correctly:
# Python - Basic Claude API Call
import anthropic

client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY env var

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Explain the difference between REST and GraphQL in 3 sentences."
        }
    ]
)

print(message.content[0].text)

// TypeScript - Basic Claude API Call
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // Uses ANTHROPIC_API_KEY env var

async function main() {
  const message = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [
      {
        role: 'user',
        content: 'Explain the difference between REST and GraphQL in 3 sentences.'
      }
    ]
  });

  console.log(message.content[0].text);
}

main();

Implementing Streaming Responses
For chat applications and real-time interfaces, streaming responses provide a much better user experience. Instead of waiting for the entire response to generate, users see tokens appear as they're produced. This is essential for production applications where perceived latency matters.
# Python - Streaming Responses
import anthropic

client = anthropic.Anthropic()

def stream_response(prompt: str):
    """Stream Claude's response token by token"""
    with client.messages.stream(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
        print()  # New line at end

        # Access final message for metadata
        final_message = stream.get_final_message()
        print(f"\nTokens used: {final_message.usage.input_tokens + final_message.usage.output_tokens}")

# Usage
stream_response("Write a haiku about software engineering")

// TypeScript - Streaming with Server-Sent Events (SSE)
import Anthropic from '@anthropic-ai/sdk';
import { Response } from 'express';

const client = new Anthropic();

async function streamToClient(prompt: string, res: Response) {
  // Set SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  // messages.stream() returns a MessageStream helper (no await needed)
  const stream = client.messages.stream({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }]
  });

  for await (const event of stream) {
    if (event.type === 'content_block_delta' &&
        event.delta.type === 'text_delta') {
      res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`);
    }
  }

  const finalMessage = await stream.finalMessage();
  res.write(`data: ${JSON.stringify({
    done: true,
    usage: finalMessage.usage
  })}\n\n`);
  res.end();
}

Tool Use (Function Calling)
One of Claude's most powerful features is tool use, which allows the model to call functions you define. This enables Claude to interact with external APIs, databases, and services in a structured way. Unlike parsing free-form text, tool use gives you structured JSON arguments that conform to the schema you define.
# Python - Tool Use Example
import anthropic
import json
from typing import Any

client = anthropic.Anthropic()

# Define tools with JSON Schema
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location. Use this when the user asks about weather conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state/country, e.g., 'San Francisco, CA' or 'London, UK'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit preference"
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "search_database",
        "description": "Search the product database for items matching criteria.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query for product names or descriptions"
                },
                "category": {
                    "type": "string",
                    "enum": ["electronics", "clothing", "home", "sports"],
                    "description": "Product category filter"
                },
                "max_price": {
                    "type": "number",
                    "description": "Maximum price filter"
                }
            },
            "required": ["query"]
        }
    }
]
def execute_tool(name: str, inputs: dict) -> Any:
    """Execute a tool and return results"""
    if name == "get_weather":
        # In production, call actual weather API
        return {
            "temperature": 72,
            "unit": inputs.get("unit", "fahrenheit"),
            "condition": "sunny",
            "humidity": 45
        }
    elif name == "search_database":
        # In production, query actual database
        return {
            "results": [
                {"name": "Wireless Headphones", "price": 79.99},
                {"name": "Bluetooth Speaker", "price": 49.99}
            ],
            "total": 2
        }
    return {"error": "Unknown tool"}
def chat_with_tools(user_message: str):
    """Handle a conversation with tool use"""
    messages = [{"role": "user", "content": user_message}]

    # Initial API call
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )

    # Process tool calls in a loop
    while response.stop_reason == "tool_use":
        # Find tool use blocks
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                # Execute the tool
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result)
                })

        # Add assistant response and tool results to messages
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

        # Continue conversation
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

    # Return final text response
    return response.content[0].text

# Usage
result = chat_with_tools("What's the weather like in San Francisco?")
print(result)

Vision Capabilities
Claude can analyze images, making it invaluable for applications that need to understand visual content. From analyzing charts and diagrams to extracting text from screenshots, Claude's vision capabilities open up many possibilities.
# Python - Image Analysis
import anthropic
import base64
import httpx

client = anthropic.Anthropic()

def analyze_image_from_url(image_url: str, prompt: str) -> str:
    """Analyze an image from a URL"""
    # Fetch image and convert to base64
    image_data = base64.standard_b64encode(httpx.get(image_url).content).decode("utf-8")

    # Determine media type from URL
    media_type = "image/jpeg"  # Default
    if image_url.endswith(".png"):
        media_type = "image/png"
    elif image_url.endswith(".gif"):
        media_type = "image/gif"
    elif image_url.endswith(".webp"):
        media_type = "image/webp"

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": image_data,
                        },
                    },
                    {
                        "type": "text",
                        "text": prompt
                    }
                ],
            }
        ],
    )
    return message.content[0].text
def analyze_local_image(file_path: str, prompt: str) -> str:
    """Analyze a local image file"""
    with open(file_path, "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")

    # Determine media type from extension
    ext = file_path.lower().split(".")[-1]
    media_types = {
        "jpg": "image/jpeg",
        "jpeg": "image/jpeg",
        "png": "image/png",
        "gif": "image/gif",
        "webp": "image/webp"
    }
    media_type = media_types.get(ext, "image/jpeg")

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": image_data,
                        },
                    },
                    {
                        "type": "text",
                        "text": prompt
                    }
                ],
            }
        ],
    )
    return message.content[0].text

# Usage examples
result = analyze_local_image(
    "architecture_diagram.png",
    "Describe this system architecture diagram. What are the main components and how do they interact?"
)
print(result)

Building a Production-Ready Chat Application
Let's put everything together to build a production-ready chat application with conversation history, system prompts, and proper error handling.
// TypeScript - Production Chat Application
import Anthropic from '@anthropic-ai/sdk';
import { Redis } from 'ioredis';

interface Message {
  role: 'user' | 'assistant';
  content: string;
}

interface ConversationContext {
  systemPrompt: string;
  messages: Message[];
  metadata: {
    userId: string;
    sessionId: string;
    createdAt: Date;
  };
}

class ClaudeChat {
  private client: Anthropic;
  private redis: Redis;
  private model: string;
  private maxTokens: number;

  constructor(options: {
    model?: string;
    maxTokens?: number;
    redisUrl?: string;
  } = {}) {
    this.client = new Anthropic();
    this.redis = new Redis(options.redisUrl || process.env.REDIS_URL!);
    this.model = options.model || 'claude-3-5-sonnet-20241022';
    this.maxTokens = options.maxTokens || 4096;
  }

  async createConversation(
    userId: string,
    systemPrompt: string
  ): Promise<string> {
    const sessionId = `chat_${Date.now()}_${Math.random().toString(36).slice(2)}`;
    const context: ConversationContext = {
      systemPrompt,
      messages: [],
      metadata: {
        userId,
        sessionId,
        createdAt: new Date()
      }
    };
    await this.redis.set(
      `conversation:${sessionId}`,
      JSON.stringify(context),
      'EX',
      86400 // 24 hour expiry
    );
    return sessionId;
  }
  async sendMessage(
    sessionId: string,
    userMessage: string
  ): Promise<AsyncGenerator<string, void, unknown>> {
    // Retrieve conversation context
    const contextJson = await this.redis.get(`conversation:${sessionId}`);
    if (!contextJson) {
      throw new Error('Conversation not found');
    }
    const context: ConversationContext = JSON.parse(contextJson);

    // Add user message
    context.messages.push({ role: 'user', content: userMessage });

    // Prepare messages for API
    const apiMessages = context.messages.map(m => ({
      role: m.role as 'user' | 'assistant',
      content: m.content
    }));

    // Create streaming response
    const self = this;
    async function* streamResponse(): AsyncGenerator<string, void, unknown> {
      let fullResponse = '';
      try {
        // messages.stream() returns a MessageStream helper (no await needed)
        const stream = self.client.messages.stream({
          model: self.model,
          max_tokens: self.maxTokens,
          system: context.systemPrompt,
          messages: apiMessages
        });

        for await (const event of stream) {
          if (event.type === 'content_block_delta' &&
              event.delta.type === 'text_delta') {
            fullResponse += event.delta.text;
            yield event.delta.text;
          }
        }

        // Save assistant response to context
        context.messages.push({ role: 'assistant', content: fullResponse });
        await self.redis.set(
          `conversation:${sessionId}`,
          JSON.stringify(context),
          'EX',
          86400
        );
      } catch (error) {
        if (error instanceof Anthropic.APIError) {
          console.error(`API Error: ${error.status} - ${error.message}`);
          throw new Error(`Claude API error: ${error.message}`);
        }
        throw error;
      }
    }
    return streamResponse();
  }
  async getConversationHistory(sessionId: string): Promise<Message[]> {
    const contextJson = await this.redis.get(`conversation:${sessionId}`);
    if (!contextJson) {
      throw new Error('Conversation not found');
    }
    return JSON.parse(contextJson).messages;
  }

  async deleteConversation(sessionId: string): Promise<void> {
    await this.redis.del(`conversation:${sessionId}`);
  }
}

// Usage with Express
import express from 'express';

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated
const chat = new ClaudeChat();

app.post('/api/chat/create', async (req, res) => {
  const { userId, systemPrompt } = req.body;
  const sessionId = await chat.createConversation(
    userId,
    systemPrompt || 'You are a helpful assistant.'
  );
  res.json({ sessionId });
});

app.post('/api/chat/message', async (req, res) => {
  const { sessionId, message } = req.body;
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  try {
    const stream = await chat.sendMessage(sessionId, message);
    for await (const chunk of stream) {
      res.write(`data: ${JSON.stringify({ text: chunk })}\n\n`);
    }
    res.write(`data: ${JSON.stringify({ done: true })}\n\n`);
  } catch (error) {
    res.write(`data: ${JSON.stringify({ error: (error as Error).message })}\n\n`);
  }
  res.end();
});

Error Handling and Retry Logic
Production applications must handle API errors gracefully. Claude API may return errors due to rate limits, overloaded servers, or invalid requests. Implementing proper retry logic with exponential backoff ensures reliability.
# Python - Robust Error Handling with Retries
import anthropic
import time
from functools import wraps
from typing import TypeVar, Callable

T = TypeVar('T')

def with_retries(
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 60.0
) -> Callable:
    """Decorator for automatic retries with exponential backoff"""
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> T:
            last_exception = None
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except anthropic.RateLimitError as e:
                    last_exception = e
                    if attempt < max_retries:
                        # Exponential backoff, capped at max_delay
                        # (production code could also honor the retry-after header)
                        delay = min(
                            base_delay * (2 ** attempt),
                            max_delay
                        )
                        print(f"Rate limited. Retrying in {delay}s...")
                        time.sleep(delay)
                except anthropic.APIStatusError as e:
                    if e.status_code >= 500:
                        # Server error - retry
                        last_exception = e
                        if attempt < max_retries:
                            delay = min(base_delay * (2 ** attempt), max_delay)
                            print(f"Server error {e.status_code}. Retrying in {delay}s...")
                            time.sleep(delay)
                    else:
                        # Client error - don't retry
                        raise
                except anthropic.APIConnectionError as e:
                    last_exception = e
                    if attempt < max_retries:
                        delay = min(base_delay * (2 ** attempt), max_delay)
                        print(f"Connection error. Retrying in {delay}s...")
                        time.sleep(delay)
            raise last_exception
        return wrapper
    return decorator
class RobustClaudeClient:
    """Claude client with built-in error handling"""

    def __init__(self):
        self.client = anthropic.Anthropic()

    @with_retries(max_retries=3)
    def complete(self, prompt: str, **kwargs) -> str:
        """Send a completion request with automatic retries"""
        response = self.client.messages.create(
            model=kwargs.get("model", "claude-3-5-sonnet-20241022"),
            max_tokens=kwargs.get("max_tokens", 1024),
            messages=[{"role": "user", "content": prompt}],
            **{k: v for k, v in kwargs.items()
               if k not in ["model", "max_tokens"]}
        )
        return response.content[0].text

    def safe_complete(self, prompt: str, fallback: str = "", **kwargs) -> str:
        """Complete with fallback on any error"""
        try:
            return self.complete(prompt, **kwargs)
        except Exception as e:
            print(f"Error: {e}. Returning fallback.")
            return fallback

# Usage
client = RobustClaudeClient()
result = client.complete("Explain quantum computing in simple terms")
print(result)

Cost Optimization Strategies
Claude API usage can become expensive at scale. Implementing cost optimization strategies helps manage expenses while maintaining quality.
- Model Selection: Use Haiku for simple tasks, Sonnet for complex ones, Opus only when necessary
- Prompt Caching: Cache system prompts to reduce input tokens on repeated calls
- Response Caching: Cache responses for identical queries using Redis or similar
- Token Monitoring: Track usage per user/feature to identify optimization opportunities
- Prompt Engineering: Concise, well-structured prompts reduce token usage
- Truncation: Limit conversation history to recent messages when context allows
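The prompt-caching bullet above deserves a concrete sketch. This is a hedged example: the `cache_control` block matches Anthropic's prompt-caching feature, but minimum cacheable prefix sizes and any required beta headers vary by model and SDK version, so verify against the current documentation. `LONG_SYSTEM_PROMPT` and the ExampleCo scenario are illustrative stand-ins.

```python
# Sketch of prompt caching: mark a long, stable system prompt as cacheable so
# repeated calls can reuse it instead of re-billing the full input each time.
# Assumptions are noted in the lead-in; verify cache_control details for your
# SDK version before relying on this.

LONG_SYSTEM_PROMPT = "You are a support assistant for ExampleCo. " * 200  # stand-in for a long prompt

def build_cached_request(user_message: str) -> dict:
    """Build kwargs for client.messages.create with a cacheable system prompt."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        # System prompt as a list of content blocks so cache_control can attach
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # cache this prefix
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

request = build_cached_request("How do I reset my password?")
# client.messages.create(**request)  # requires an API key; on cache hits,
# response.usage.cache_read_input_tokens shows tokens served from cache
```

On subsequent calls with the same cached prefix, only the new user turn is billed at the full input rate, which is why this pairs well with long, reused system prompts.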
# Python - Cost-Optimized Claude Client
import anthropic
import hashlib
import json
from typing import Optional
import redis

class CostOptimizedClient:
    """Claude client with caching and cost tracking"""

    # Pricing per 1M tokens (as of 2024 -- verify against Anthropic's current pricing page)
    PRICING = {
        "claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
        "claude-3-5-haiku-20241022": {"input": 0.25, "output": 1.25},
        "claude-3-opus-20240229": {"input": 15.00, "output": 75.00}
    }

    def __init__(self, redis_url: str):
        self.client = anthropic.Anthropic()
        self.cache = redis.from_url(redis_url)
        self.total_cost = 0.0

    def _cache_key(self, prompt: str, model: str) -> str:
        """Generate cache key from prompt and model"""
        content = f"{model}:{prompt}"
        return f"claude_cache:{hashlib.sha256(content.encode()).hexdigest()}"

    def _calculate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Calculate cost for a request"""
        pricing = self.PRICING.get(model, self.PRICING["claude-3-5-sonnet-20241022"])
        input_cost = (input_tokens / 1_000_000) * pricing["input"]
        output_cost = (output_tokens / 1_000_000) * pricing["output"]
        return input_cost + output_cost
    def complete(
        self,
        prompt: str,
        model: str = "claude-3-5-sonnet-20241022",
        use_cache: bool = True,
        cache_ttl: int = 3600,
        **kwargs
    ) -> tuple[str, float]:
        """Complete with caching and cost tracking"""
        # Check cache first
        if use_cache:
            cache_key = self._cache_key(prompt, model)
            cached = self.cache.get(cache_key)
            if cached:
                return json.loads(cached)["response"], 0.0

        # Make API call
        response = self.client.messages.create(
            model=model,
            max_tokens=kwargs.get("max_tokens", 1024),
            messages=[{"role": "user", "content": prompt}],
            **{k: v for k, v in kwargs.items() if k != "max_tokens"}
        )
        result = response.content[0].text
        cost = self._calculate_cost(
            model,
            response.usage.input_tokens,
            response.usage.output_tokens
        )
        self.total_cost += cost

        # Cache result
        if use_cache:
            self.cache.setex(
                cache_key,
                cache_ttl,
                json.dumps({"response": result})
            )
        return result, cost

    def select_model(self, task_complexity: str) -> str:
        """Select appropriate model based on task complexity"""
        if task_complexity == "simple":
            return "claude-3-5-haiku-20241022"
        elif task_complexity == "complex":
            return "claude-3-opus-20240229"
        return "claude-3-5-sonnet-20241022"
# Usage
client = CostOptimizedClient("redis://localhost:6379")

# Simple task - use Haiku
response, cost = client.complete(
    "What is 2 + 2?",
    model=client.select_model("simple")
)
print(f"Response: {response}, Cost: ${cost:.6f}")

# Moderate task - select_model falls back to Sonnet
response, cost = client.complete(
    "Analyze this code for security vulnerabilities...",
    model=client.select_model("moderate")
)
print(f"Total session cost: ${client.total_cost:.4f}")

Best Practices Summary
Claude API Integration Checklist
- Always use environment variables for API keys
- Implement streaming for user-facing applications
- Add retry logic with exponential backoff
- Cache responses when appropriate
- Track token usage and costs
- Use the right model for each task
- Structure prompts for consistent outputs
- Handle all error types gracefully
- Set appropriate max_tokens limits
- Use tool use for structured data extraction
Frequently Asked Questions
What is the difference between Claude 3.5 Sonnet and Claude 3 Opus?
Claude 3.5 Sonnet offers the best balance of speed and capability for most use cases, with faster response times and lower cost. Claude 3 Opus is the most capable model for complex reasoning tasks but costs more and is slower. For most production applications, Sonnet is recommended.
How much does Claude API cost?
Claude API pricing varies by model. As of late 2024, Claude 3.5 Sonnet costs $3 per million input tokens and $15 per million output tokens, with Haiku-tier models substantially cheaper. Pricing changes over time, so check Anthropic's pricing page for current rates. You can optimize costs using caching, model selection, and prompt engineering.
What is the maximum context window for Claude?
Claude 3.5 Sonnet and Claude 3 Opus support up to 200,000 tokens of context, equivalent to roughly 150,000 words or 500 pages. This allows processing entire codebases, long documents, or extensive conversation histories in a single request.
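To make the arithmetic behind that estimate concrete, here is a small back-of-the-envelope helper. The roughly 0.75-words-per-token ratio is a common rule of thumb for English text, not an exact property of Claude's tokenizer, so treat the results as estimates only.

```python
# Rough heuristic: one token is about 0.75 English words (about 4 characters).
# Real token counts depend on the tokenizer and the text itself.

def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate token count from a word count (~0.75 words per token)."""
    return round(word_count / 0.75)

def fits_in_context(word_count: int, context_tokens: int = 200_000) -> bool:
    """Check whether text of a given word count likely fits in the context window."""
    return estimate_tokens_from_words(word_count) <= context_tokens

print(estimate_tokens_from_words(150_000))  # 200000, matching the figure above
print(fits_in_context(150_000))             # True
```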
Can Claude process images and documents?
Yes, Claude has vision capabilities that allow it to analyze images, charts, diagrams, screenshots, and PDFs. You can send images as base64-encoded data or URLs, and Claude can describe, analyze, or extract information from visual content.
How do I handle rate limits with Claude API?
Implement exponential backoff retry logic for rate limit errors (429 status). Anthropic provides rate limit headers in responses. For production, consider request queuing, caching responses, and monitoring your usage against limits.
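A minimal sketch of reading those rate-limit headers, under the assumption that the header names follow Anthropic's documented `anthropic-ratelimit-*` naming scheme and that your SDK version exposes raw responses via `with_raw_response`; verify both against the current docs before depending on them.

```python
# Sketch of inspecting rate-limit response headers. Header names assume
# Anthropic's documented anthropic-ratelimit-* scheme; verify for your API
# and SDK version.

def summarize_rate_limits(headers: dict) -> dict:
    """Extract remaining request/token budgets from response headers."""
    def to_int(value):
        return int(value) if value is not None else None
    return {
        "requests_remaining": to_int(headers.get("anthropic-ratelimit-requests-remaining")),
        "tokens_remaining": to_int(headers.get("anthropic-ratelimit-tokens-remaining")),
        "retry_after_seconds": to_int(headers.get("retry-after")),
    }

# With the Python SDK you can reach the raw response, for example:
# raw = client.messages.with_raw_response.create(model=..., max_tokens=64, messages=[...])
# limits = summarize_rate_limits(dict(raw.headers))
# message = raw.parse()  # the usual Message object
print(summarize_rate_limits({
    "anthropic-ratelimit-requests-remaining": "42",
    "retry-after": "7",
}))
```

Logging these values alongside request metrics makes it easy to alert before you hit a limit rather than after.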
Conclusion
Claude API provides a powerful foundation for building AI-powered applications. By following the patterns and best practices outlined in this guide, you can create reliable, cost-effective, and scalable AI features. Whether you're building a chatbot, document analyzer, or AI agent, Claude's capabilities combined with proper engineering practices will help you succeed.
As Claude continues to evolve with new features and improved capabilities, the fundamentals covered here will remain relevant. Start with simple integrations, monitor your usage, and gradually adopt more advanced patterns as your needs grow.
Need help integrating Claude into your application? Contact Jishu Labs for expert AI development services. Our team has extensive experience building production AI systems and can help you leverage Claude's capabilities effectively.
About Sarah Johnson
Sarah Johnson is the CTO at Jishu Labs with 15+ years of experience in software architecture and AI systems. She has led the development of enterprise AI solutions and is passionate about making AI accessible and practical for businesses of all sizes.