Claude Agent SDK: Build Production AI Agents Without Starting from Scratch
Master the Claude Agent SDK to build autonomous AI agents for your SaaS. Complete guide with code examples, architecture patterns, and common pitfalls.
You've been building features by hand while your competitors ship AI agents that work around the clock. The Claude Agent SDK—the same engine powering Claude Code—is now available as a library.
In the next 20 minutes, you'll understand exactly how to build an autonomous agent that reads your codebase, fixes bugs, and ships features while you focus on landing customers. No PhD required. No six-month learning curve. Just working code you can deploy this weekend.
# What Is the Claude Agent SDK and Why Should You Care?
Here's the situation: you've seen what Claude Code can do. It reads files, runs commands, fixes bugs, and ships features autonomously. What you might not know is that the entire engine powering Claude Code is now available as a library you can drop into your own product.
That's the Claude Agent SDK.
Think of it this way: if Claude Code is the finished car, the Agent SDK is the engine you can install in your own chassis. Same power, your design. The SDK gives you Claude's entire agent loop—the part that decides what to do, uses tools, and verifies its work—without you having to reinvent any of it.
One important note: you might see references to "Claude Code SDK" in older articles or search results. Anthropic renamed it to "Claude Agent SDK" in late 2025 to reflect its broader use cases beyond just coding tasks.
Don't confuse this with the Anthropic Client SDK. The Client SDK requires you to implement the tool loop yourself—you send a prompt, get a response, execute any tools manually, send results back, repeat. It's a lot of plumbing.
The Agent SDK handles all of that autonomously. You send a prompt, and the agent reads files, runs commands, makes edits, and verifies its own work without you writing the orchestration logic.
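To make the difference concrete, here's roughly what that plumbing looks like with the standard Anthropic client (`@anthropic-ai/sdk`). This is a hedged sketch: `runTool` and the empty tool list are placeholders you'd supply yourself, and the model id is illustrative.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Hypothetical executor that dispatches to your own tool implementations
async function runTool(name: string, input: unknown): Promise<string> {
  return `ran ${name} with ${JSON.stringify(input)}`;
}

async function manualToolLoop(prompt: string) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: prompt }];

  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5", // illustrative model id
      max_tokens: 1024,
      tools: [], // your JSON Schema tool definitions go here
      messages,
    });

    // No tool requested means the model is done
    if (response.stop_reason !== "tool_use") return response;

    // Execute each requested tool and feed the results back in
    messages.push({ role: "assistant", content: response.content });
    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: await runTool(block.name, block.input),
        });
      }
    }
    messages.push({ role: "user", content: results });
  }
}
```

With the Agent SDK, that entire while loop collapses into a single `query()` call.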
What does this mean for your product? Your agent can handle customer support tickets, debug code, generate reports, or analyze documents while you sleep. Each of those is a potential paid feature. The SDK removes the build-from-scratch tax so you can focus on what makes your product unique.
# How Do I Install and Set Up the Claude Agent SDK?
A botched setup can waste hours. Let's get this right the first time so you're building in minutes, not debugging your environment.
The SDK comes in two flavors: TypeScript and Python. Pick whichever matches your stack. Both have identical capabilities—the agent loop, built-in tools, streaming, sessions, everything.
Requirements:
- Python 3.10+ for the Python SDK
- Node.js 18+ for the TypeScript SDK
- No separate CLI install: the Claude Code CLI is bundled automatically with both packages
Here's the copy-paste installation:
```bash
# TypeScript/Node.js
npm install @anthropic-ai/claude-agent-sdk

# Python
pip install claude-agent-sdk

# Set your API key (get it from console.anthropic.com)
export ANTHROPIC_API_KEY=your-api-key
```
That's it. One command and you're ready.
One gotcha that trips people up: a version mismatch between the Claude Code CLI and the SDK. If you're getting weird agent recognition errors, run `claude --version` and make sure it matches the SDK requirements in the docs. This is the most common support question in the GitHub issues, and the fix is almost always "update your CLI."
Never hardcode your API key in source files. Environment variables keep your credentials out of git history and make deployment cleaner. Your future self (and anyone who reviews your code) will thank you.
Faster setup means faster time-to-demo. When a potential customer asks "can you show me how this works?", you want to be deploying agents, not debugging npm installs.
# What's the Core Agent Loop and How Does It Work?
Understanding the agent loop is the difference between debugging agents quickly and spending days confused about why your agent isn't working.
The loop has three phases that repeat until the task is done:
- **Gather context**: The agent reads files, searches the codebase, or spawns subagents to collect information
- **Take action**: Execute tools, run bash commands, generate code, make edits
- **Verify work**: Check if the output is correct, run tests, validate assumptions
If verification fails, the loop repeats. The agent gathers more context, tries a different approach, and verifies again. This feedback mechanism is what makes agents actually useful—they self-correct instead of confidently shipping broken code.
Here's the loop in action:
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// The SDK handles the entire loop for you
for await (const message of query({
  prompt: "Find and fix the bug in auth.py",
  options: {
    allowedTools: ["Read", "Edit", "Bash"],
    permissionMode: "acceptEdits"
  }
})) {
  console.log(message);
  // Claude reads the file, finds the bug, edits it, verifies the fix
}
```
The biggest pitfall here: agents try to one-shot complex tasks. They'll attempt to implement an entire feature in a single pass, run out of context mid-implementation, and leave you with half-working code. The fix is explicit task breakdown—give your agent smaller, focused tasks rather than "build me an authentication system."
Context management is the production differentiator. Pushing entire conversation history on each API call exhausts your token budget fast. The SDK includes automatic context compaction that summarizes older exchanges, but you should still design your prompts to request focused, specific actions.
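Here's one way to apply both pieces of advice at once: break the big task into focused steps and feed them to the agent sequentially. A sketch with illustrative task steps; the session capture mirrors the resume pattern covered later in this guide.

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Instead of "build me an authentication system" in one shot,
// hand the agent one focused task at a time.
const steps = [
  "Create the user model and migration for email/password auth",
  "Add a POST /signup route with input validation",
  "Add a POST /login route that issues a session cookie",
  "Write integration tests for signup and login",
];

let sessionId: string | undefined;

for (const step of steps) {
  for await (const message of query({
    prompt: step,
    options: {
      allowedTools: ["Read", "Edit", "Write", "Bash"],
      resume: sessionId, // carry context forward between steps
    },
  })) {
    if (message.type === "system" && message.subtype === "init") {
      sessionId = message.session_id;
    }
  }
}
```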
A well-tuned agent loop handles 10x more customer requests without human intervention. That's the difference between a support burden and a profit center.
# What Built-In Tools Are Available and When Should I Use Each?
The SDK ships with eight core tools that handle 90% of what you'll need. Here's what each does and when to reach for it:
| Tool | What it does | When to use |
|---|---|---|
| Read | Read any file in the working directory | Viewing code, configs, documentation |
| Write | Create new files | Generating new components or configs |
| Edit | Make precise edits to existing files | Bug fixes, refactoring, updates |
| Bash | Run terminal commands and scripts | Git operations, npm, tests, builds |
| Glob | Find files by pattern | Discovering files: `**/*.ts`, `src/**/*.py` |
| Grep | Search file contents with regex | Finding function calls, variable usage |
| WebSearch | Search the internet | Looking up current documentation or APIs |
| WebFetch | Fetch and parse web pages | Reading docs, scraping structured data |
Here's the critical security lesson: don't enable all tools by default. Start with read-only access (Read, Glob, Grep) and add write capabilities only after you've validated the agent's behavior. One Reddit thread described an agent that ran `rm -rf` on a test directory because Bash was enabled without restrictions. Start paranoid, loosen permissions carefully.
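A simple way to stage that rollout is two option presets: a read-only baseline, then a validated write tier. A minimal sketch (the preset names are illustrative):

```typescript
// Phase 1: observe only. The agent can look but not touch.
const readOnlyOptions = { allowedTools: ["Read", "Glob", "Grep"] };

// Phase 2: after you've validated its behavior, grant targeted write
// access while keeping the riskiest tool explicitly blocked.
const writeEnabledOptions = {
  allowedTools: ["Read", "Glob", "Grep", "Edit"],
  disallowedTools: ["Bash"],
};
```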
Each tool you enable is a capability you can market. "AI that fixes your bugs" requires the Edit tool. "AI that deploys your code" requires Bash. Think about which capabilities map to features your customers will pay for, then enable only those.
Building an agent is the easy part. Knowing which features to build first is where most founders waste months. BrainGrid turns your vague ideas into structured specs with AI-ready tasks—so you ship features that convert, not features that collect dust.
# How Do I Add Custom Tools and Integrate External APIs?
Built-in tools cover file operations and web access. But your agent needs to talk to your product—your CRM, database, Stripe, Slack, whatever powers your business. That's where custom tools and MCP come in.
Model Context Protocol (MCP) is Anthropic's standardized way to connect agents to external services. Instead of writing OAuth flows and API wrappers yourself, you plug in pre-built MCP servers for Slack, GitHub, Asana, Playwright, databases, and hundreds more. They handle authentication and API calls. You just configure them.
Here's how to add browser automation with the Playwright MCP server:
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Open our pricing page and verify it loads correctly",
  options: {
    mcpServers: {
      playwright: {
        command: "npx",
        args: ["@playwright/mcp@latest"]
      }
    }
  }
})) {
  console.log(message);
}
```
For custom integrations that don't have pre-built servers, you define your own tools with input validation and safety guards:
```typescript
import { z } from "zod";

// crmClient is your own CRM API client, not part of the SDK
const createContactTool = {
  name: "create_crm_contact",
  description: "Create a new contact in the CRM",
  inputSchema: z.object({
    email: z.string().email(),
    name: z.string().min(1),
    plan: z.enum(["free", "pro", "enterprise"])
  }),
  handler: async ({ input, context }) => {
    // Always add timeout protection
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), 5000);

    try {
      const result = await crmClient.createContact(input, {
        signal: controller.signal
      });
      clearTimeout(timeoutId);
      return result;
    } catch (error) {
      clearTimeout(timeoutId);
      throw error;
    }
  }
};
```
The critical pitfall: failing to sandbox tools. Running shell commands or database queries without timeouts creates runaway processes and security holes. Every custom tool should have a timeout wrapper, input validation, and explicit error handling.
Every integration you add is a potential upsell. "Connect to Stripe" becomes a Pro feature. "Sync with Slack" becomes an Enterprise add-on. Plan your integrations around what customers will pay for before you build them.
# How Do I Handle Long-Running Agents and Subagents?
Real production tasks take minutes or hours, not seconds. A security audit across a large codebase. Analyzing months of customer support tickets. Generating comprehensive documentation. Agents that crash mid-task lose customer data and trust.
The SDK handles this through two mechanisms: sessions and subagents.
Sessions maintain context across multiple exchanges. Your agent can work on a task, you can close your laptop, come back tomorrow, and resume exactly where it left off. Context is preserved—files read, analysis done, conversation history.
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

let sessionId: string | undefined;

// First query: capture session ID
for await (const message of query({
  prompt: "Read the authentication module and understand how it works",
  options: { allowedTools: ["Read", "Glob", "Grep"] }
})) {
  if (message.type === "system" && message.subtype === "init") {
    sessionId = message.session_id;
  }
}

// Later (hours or days later): resume with full context preserved
for await (const message of query({
  prompt: "Now find all places that call the auth module",
  options: { resume: sessionId }
})) {
  console.log(message);
  // Agent remembers everything from the first query
}
```
Subagents handle task complexity by isolating context windows. When your main agent hits a complex subtask, it spawns a specialized subagent to handle it. The subagent works in its own context, returns relevant excerpts, and the parent continues without context explosion.
```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def analyze_codebase():
    # Enable Task tool to let Claude spawn subagents automatically
    async for message in query(
        prompt="Analyze this codebase for security vulnerabilities",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Glob", "Grep", "Task"]
        )
    ):
        print(message)
        # Claude may spawn subagents for:
        # - SQL injection analysis
        # - XSS vulnerability scanning
        # - Dependency auditing

asyncio.run(analyze_codebase())
```
One common problem: CPU usage spikes to 100% when spawning too many subagents simultaneously. Claude tries to parallelize aggressively, which is great for speed but can overwhelm modest hardware. Limit concurrency in your `.claude` configuration or implement explicit concurrency controls in your orchestration layer.
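If you go the orchestration-layer route, a small semaphore is enough to cap fan-out. This sketch limits how many top-level agent queries your own code runs at once; it doesn't control Claude's internal subagent parallelism.

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// A minimal semaphore to cap concurrent agent runs
class Semaphore {
  private queue: (() => void)[] = [];
  constructor(private slots: number) {}

  async acquire(): Promise<void> {
    if (this.slots > 0) { this.slots--; return; }
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }

  release(): void {
    const next = this.queue.shift();
    if (next) next();       // hand the slot directly to the next waiter
    else this.slots++;      // or return it to the pool
  }
}

const limiter = new Semaphore(2); // at most 2 agents at a time

async function runAgentTask(prompt: string) {
  await limiter.acquire();
  try {
    for await (const message of query({ prompt })) {
      console.log(message);
    }
  } finally {
    limiter.release(); // always release, even on errors
  }
}
```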
The SDK also includes automatic context compaction. For very long-running operations, it summarizes older parts of the conversation to prevent token exhaustion. You don't need to implement this—it happens automatically—but understanding it helps you design better prompts.
Agents that handle hour-long analysis tasks without crashing can charge premium pricing. Reliability is a feature customers pay for.
# What Are the Critical Mistakes That Cost You Customers?
Every crashed agent is a churned customer. Every silent failure erodes trust. Here are the mistakes that kill production agents—and how to avoid them.
| Mistake | What You'll See | The Fix |
|---|---|---|
| No tool time limits | Runaway processes, hung requests | Wrap every tool handler in a 5-second timeout |
| Full history on every call | Token exhaustion mid-task | Use conversation summaries + selective retrieval |
| No streaming backpressure | UI freezes, stalled responses | Flush SSE/websocket frames explicitly |
| Hardcoded agent prompts | Can't update without redeploy | Store agent templates in a config service |
| No verification layer | Silent failures, wrong outputs | Add rules-based + visual feedback loops |
| Single model dependency | Outages cascade to users | Route fast tasks to Haiku, complex to Sonnet |
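That last table row deserves a sketch: routing by task complexity is a one-line option change. The model ids below are illustrative, so check Anthropic's current model list before copying them.

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Route cheap, fast tasks to a small model and hard ones to a larger one.
function modelFor(task: { complexity: "low" | "high" }): string {
  return task.complexity === "low"
    ? "claude-3-5-haiku-latest"  // illustrative model id
    : "claude-sonnet-4-5";       // illustrative model id
}

for await (const message of query({
  prompt: "Summarize yesterday's error logs",
  options: { model: modelFor({ complexity: "low" }) },
})) {
  console.log(message);
}
```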
The most insidious mistake: marking features complete without end-to-end testing. Agents will confidently report "task complete" while the feature is actually broken. Without verification—whether that's automated tests, visual checks, or LLM-as-judge evaluation—you're shipping silent failures.
Permission sprawl is the fastest path to unsafe autonomy. Treat tool access like production IAM: start from deny-all, allow only what each agent needs, require explicit confirmations for sensitive actions, and block dangerous commands entirely.
One person on the Anthropic community described their experience:
"We deployed an agent that could run Bash commands for our internal tooling. Worked great in dev. In production, a weird edge case triggered
git reset --hardon a customer's repo. Three hours of their work, gone. Now every destructive command requires human approval, no exceptions."
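Here's what that "human approval, no exceptions" policy can look like in code, assuming the SDK's `canUseTool` permission callback (verify the exact signature against the current docs). `requireHumanApproval` is a hypothetical hook into your own approval flow.

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Hypothetical hook into your approval flow (Slack, email, admin UI)
async function requireHumanApproval(command: string): Promise<boolean> {
  return false; // fail closed until a human says otherwise
}

const DESTRUCTIVE = /rm\s+-rf|git\s+reset\s+--hard|drop\s+table/i;

for await (const message of query({
  prompt: "Clean up merged branches in this repo",
  options: {
    allowedTools: ["Read", "Bash"], // deny-all baseline: nothing else
    canUseTool: async (toolName, input) => {
      const command = String((input as Record<string, unknown>).command ?? "");
      if (toolName === "Bash" && DESTRUCTIVE.test(command)) {
        if (!(await requireHumanApproval(command))) {
          return { behavior: "deny", message: "Destructive command blocked" };
        }
      }
      return { behavior: "allow", updatedInput: input };
    },
  },
})) {
  console.log(message);
}
```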
Each security hole is a lawsuit waiting to happen. Each crashed agent is a support ticket. Each silent failure is revenue walking out the door. Build verification into every agent from day one.
# How Do I Deploy My Agent to Production This Weekend?
An agent sitting on your laptop earns $0. You need it live, in front of customers, collecting feedback and proving value. Here's the fastest path from "working locally" to "deployed and demo-ready."
The proven stack for a weekend deploy:
```
┌─────────────────────────────────────────────────────────┐
│                      Your SaaS                          │
├─────────────────────────────────────────────────────────┤
│  Next.js Frontend                                       │
│  ├── /app/api/agent/route.ts  ← Claude Agent SDK        │
│  └── /app/dashboard           ← User interface          │
├─────────────────────────────────────────────────────────┤
│  Database (Postgres)                                    │
│  ├── Sessions table (agent state)                       │
│  └── Results table (outputs)                            │
├─────────────────────────────────────────────────────────┤
│  Vercel                                                 │
│  ├── Edge Functions (API routes)                        │
│  └── Cron Jobs (scheduled agents)                       │
└─────────────────────────────────────────────────────────┘
```
Here's a minimal API route that streams agent responses to your frontend:
```typescript
// app/api/agent/route.ts
import { query } from "@anthropic-ai/claude-agent-sdk";
import { NextRequest } from "next/server";

export async function POST(req: NextRequest) {
  const { prompt, sessionId } = await req.json();

  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    async start(controller) {
      for await (const message of query({
        prompt,
        options: {
          allowedTools: ["Read", "Glob", "Grep"],
          resume: sessionId,
          permissionMode: "bypassPermissions"
        }
      })) {
        controller.enqueue(
          encoder.encode(`data: ${JSON.stringify(message)}\n\n`)
        );
      }
      controller.close();
    }
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" }
  });
}
```
The critical deployment gotcha: rate limiting. Without it, one eager user can exhaust your entire monthly API budget in an hour. Add request limits per user, per session, and per time window before you go live.
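A minimal sketch of a per-user limiter you could call at the top of the route handler above. Thresholds are illustrative, and the in-memory map only works on a single instance; swap in Redis or similar once you scale out.

```typescript
// Minimal in-memory fixed-window limiter: per user, per hour
const WINDOW_MS = 60 * 60 * 1000;
const MAX_REQUESTS = 20;
const windows = new Map<string, { start: number; count: number }>();

export function allowRequest(userId: string): boolean {
  const now = Date.now();
  const w = windows.get(userId);
  if (!w || now - w.start > WINDOW_MS) {
    // New window: reset the count
    windows.set(userId, { start: now, count: 1 });
    return true;
  }
  if (w.count >= MAX_REQUESTS) return false;
  w.count++;
  return true;
}

// In the route handler, before calling query():
// if (!allowRequest(userId)) return new Response("Rate limited", { status: 429 });
```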
Feature-flag new capabilities per tenant. When you add a new tool or integration, roll it out to beta users first, validate it doesn't break anything, then expand. This saves you from deploying a bug to 100% of customers simultaneously.
Add monitoring from day one. Plug SDK hooks into your observability stack—Datadog, OpenTelemetry, whatever you're already using. Capture tool latency, token usage, error rates. You can't improve what you don't measure, and you definitely can't debug production issues without logs.
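One low-effort starting point: pull run-level stats off the SDK's final result message and forward them to your backend. Field names here are from the SDK's result message as I understand it, so verify against the current docs; `recordMetric` is a hypothetical shim over Datadog, OpenTelemetry, or whatever you use.

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

// Hypothetical shim over your observability stack
declare function recordMetric(name: string, value: number): void;

export async function runWithMetrics(prompt: string) {
  for await (const message of query({ prompt })) {
    // The final "result" message carries run-level stats
    if (message.type === "result") {
      recordMetric("agent.duration_ms", message.duration_ms);
      recordMetric("agent.num_turns", message.num_turns);
      recordMetric("agent.cost_usd", message.total_cost_usd);
    }
  }
}
```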
A deployed agent is a demo-able product. Demo-able products close deals. Get it live, then iterate.
You now have everything you need to build production AI agents with the Claude Agent SDK. But building the right agent—one that customers actually pay for—requires more than code. It requires clarity on what to build and why.
BrainGrid transforms your product ideas into AI-ready specifications, breaking complex features into tasks that agents can execute. Stop guessing. Start shipping.
About the Author
Nico Acosta is the Co-founder & CEO of BrainGrid, where we're building the future of AI-assisted software development. With over 20 years of product management experience building developer platforms at companies like Twilio and AWS, Nico focuses on building platforms at scale that developers trust.
Want to discuss AI coding workflows or share your experiences? Find me on X or connect on LinkedIn.