Engineering

Cutting AI Costs with a Single Line: Anthropic Tool Caching in AI SDK v5

How we reduced our AI costs and sped up queries by adding three lines of code to enable Anthropic's tool caching in AI SDK v5.

Every AI API call the BrainGrid base agent makes carries the same tool definitions. Every. Single. Time.

When you're building an AI agent with a dozen tools, each with detailed schemas and descriptions, you're looking at a couple thousand tokens that get sent with every request. It's like paying for coffee whether you drink it or not.

We recently migrated to AI SDK v5 for other reasons (hello, proper TypeScript types and tool streaming), but buried in the release notes was a feature that caught our eye: support for Anthropic's ephemeral caching through providerOptions.

For context, BrainGrid is an AI-powered platform that helps developers turn messy ideas into crystal-clear specs that AI coding assistants can actually implement. We analyze your codebase, ask the right clarifying questions, and break requirements down into atomic, verifiable, AI-ready tasks. Each task becomes a precise prompt with full context, so your AI IDE (Cursor, Claude Code, etc.) gets it right the first time. Behind the scenes, we're orchestrating multiple specialized agents with dozens of tools—which is why this SDK migration touched everything.

The Problem: Why pay for slower performance?

Here's what happens in a typical AI agent workflow (sketched in code right after this list):

  1. User asks a question
  2. We send the question + all tool definitions to the AI
  3. AI responds with tool calls
  4. We execute the tools and send results back... with all the tool definitions again
  5. Repeat until done
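
In AI SDK v5 terms, that loop looks roughly like the sketch below. It's a minimal sketch: userQuestion and the tools map are stand-ins for our real prompt and tool definitions.

import { generateText, stepCountIs } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

const result = await generateText({
  model: anthropic('claude-sonnet-4-20250514'),
  prompt: userQuestion,      // stand-in for the user's actual question
  tools,                     // every tool schema gets serialized into the request
  stopWhen: stepCountIs(10), // call tools, send results back, repeat
});

// Each of those internal steps is a separate API request, and every one of them
// re-sends the full tool definitions, even though none of them have changed.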

Our read_web_page tool alone has a schema that's hundreds of tokens. Multiply that by all our tools, then by the number of turns in a conversation, and you're burning tokens like they're going out of style.

The kicker? Tool definitions rarely change. We're essentially paying to send the same static data over and over, and every request is slower for it.

The Approach: AI SDK v5 + Anthropic Caching

During our v5 migration, we noticed the new providerOptions parameter. Turns out, Anthropic had quietly released tool caching for exactly this use case.

The documentation was sparse, but the concept was simple: mark your tool definitions as cacheable, and Anthropic will store them for reuse. Writing to the cache costs 25% more than regular input tokens, but cache hits cost only 10% of the regular input price.

Here's the thing: tool definitions rarely change, which makes them a perfect candidate for caching. It's a no-brainer. Faster and cheaper.

The Implementation: Low-hanging fruit in three lines

The implementation is simple. We just need to add three lines of code to the tool definition.

import { tool } from 'ai';
import { z } from 'zod';

const readWebPageTool = tool({
  // Description and input schema shown inline for clarity
  description: 'Read and return the contents of a web page',
  inputSchema: z.object({ url: z.string().url() }),
  // These three lines. That's it.
  providerOptions: {
    anthropic: { cacheControl: { type: 'ephemeral' } },
  },
  execute: async ({ url }) => {
    // `exa` is our Exa client, initialized elsewhere
    const { results } = await exa.getContents(url);
    return results;
  },
});

That's literally it. Three lines of configuration.
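
Under the hood, those three lines just set Anthropic's cache_control marker on the tool definition in the raw Messages API request. Roughly, abridged, with the schema and other fields elided:

{
  "tools": [
    { "name": "read_web_page", "description": "...", "input_schema": { "...": "..." },
      "cache_control": { "type": "ephemeral" } }
  ],
  "messages": ["..."]
}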

The Gotcha: Order Matters

Here's where it gets interesting. You'd think you'd need to add caching to every tool, right? Nope.

Anthropic's caching works on a "cache point" system. When you mark a tool as cacheable, everything before it in the request gets cached too. This means you only need to cache the last tool in your array:

const tools = {
  generateRequirements: this.getGenerateRequirementsAITool(),
  generateSubtasks: this.getGenerateSubtasksAITool(),
  clarifyingQuestions: this.getClarifyingQuestionsAITool(),
  thinking: this.getThinkingAITool(),
  webSearch: this.getWebSearchAITool(),
  // IMPORTANT: read_web_page must be the last tool.
  // Its caching configuration causes all tools before it to be cached.
  readWebPage: this.getReadWebPageAITool(), // ← Only this one has caching
};

It's a quirk of how Anthropic implemented caching, but it works in our favor. One configuration point, all tools cached.
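
If you want to confirm the cache is actually being hit, the Anthropic provider passes cache activity back in the response's provider metadata. A quick check, assuming the same cacheCreationInputTokens / cacheReadInputTokens fields the provider documents for message-level caching:

const result = await generateText({
  model: anthropic('claude-sonnet-4-20250514'),
  prompt: userQuestion, // stand-in, as before
  tools,                // the tool map above, with read_web_page last
});

console.log(result.providerMetadata?.anthropic);
// First request: cacheCreationInputTokens > 0 (the one-time cache write).
// Later requests: cacheReadInputTokens > 0 (the cheap hits).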

The Results: Faster, Cheaper, Better

After deploying this change:

  • Token costs dropped despite the 25% premium on cache writes
  • Queries feel snappier because less data is transmitted
  • No behavior changes - it just works

The math is simple: if you use tools more than 4 times in a conversation (and trust me, you will), caching pays for itself. After that, it's pure savings.
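
A rough back-of-envelope, assuming a 2,000-token tool block and Anthropic's cache pricing (writes at 1.25x the base input rate, reads at 0.1x). The 2,000 figure is illustrative, and real savings depend on everything else in your prompt:

const TOOL_TOKENS = 2_000; // illustrative size of the serialized tool schemas

// Token-equivalents spent on tool definitions over `turns` requests
const uncachedCost = (turns: number) => turns * TOOL_TOKENS;
const cachedCost = (turns: number) =>
  TOOL_TOKENS * 1.25 + (turns - 1) * TOOL_TOKENS * 0.1; // one write, then cheap reads

console.log(uncachedCost(5)); // 10000
console.log(cachedCost(5));   // 3300 -> roughly a third of the cost by turn five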

The Takeaway

Sometimes the best optimizations are embarrassingly simple. We spent weeks optimizing our context, refactoring our prompts, and even considering switching providers.

The actual solution? Three lines of configuration.

If you're using AI SDK v5 with Anthropic and you have tools in your agent, add caching. Today. Your CFO will thank you, your users will notice the speed improvement, and you'll wonder why you didn't do it sooner.

Just remember: put the cached tool last in your array. Don't ask me how long it took us to figure that out.


About the Author

Nico Acosta is the Co-founder and CEO of BrainGrid, where we're building the future of AI-assisted product development. With over 20 years of experience in Product Management, AI, cloud platforms, and developer tools at companies like AWS and Twilio, Nico focuses on building products that are force multipliers for developers.

Want to discuss Agent architectures or share your experiences? Find me on X or connect on LinkedIn.

Interested in turning half-baked thoughts into crystal-clear, AI-ready specs and tasks that your IDE can nail, the first time? Check out BrainGrid - Follow us on X for updates.

Ready to stop babysitting your AI coding assistant?

Join the waitlist for BrainGrid and experience AI-powered planning that makes coding agents actually work.
