Getting More Out of the Claude Platform

Channel Anthropic

Date May 6, 2026

Duration 28 min

Tags Claude Platform, Prompt Caching, Tool Search, Compaction, Advisor Strategy

TL;DR

A practical, demo-driven session showing five platform patterns that meaningfully reduce cost and improve quality for Claude-powered agents: prompt caching, tool search, programmatic tool calling, context compaction, and the advisor strategy. Each pattern includes a before/after comparison and an explanation of when to apply it.

Key Takeaways

Prompt caching — Structure prompts so the stable prefix caches; re-read cost drops to near zero for repeat calls with the same system context
Tool search — Don't load all tools into context; use tool search to surface only the tools relevant to the current task
Programmatic tool calling — Call tools directly from code rather than via model-driven selection for high-confidence, deterministic operations
Compaction — Periodically compress long-running context into a dense summary; preserves relevant state without ballooning token counts
Advisor strategy — Use a fast, cheap model for most calls; escalate to a more capable model as an advisor for tasks that need deeper reasoning
Stack them — The patterns compound: caching + tool search + compaction together can reduce costs by 10x+ on long-running agents

Summary

Pattern 1: Prompt Caching

Prompt caching reuses computed key-value representations of tokens that appear at the start of a prompt. The key to maximizing cache hit rate is structure: put stable content (system prompt, tool definitions, long documents) at the front of every request. Variable content (user messages, task-specific context) goes at the end. In live demos, Anthropic shows cache hit rates moving from 20% to 85% with prompt reordering alone.

Best for: agents with large system prompts, RAG applications with shared document context, APIs called with the same configuration repeatedly.

Pattern 2: Tool Search

Instead of loading 50 tools into every context, tool search dynamically retrieves the 5-10 tools most relevant to the current task. This is implemented via a vector index of tool definitions — at inference time, the model's task description is used to retrieve the most relevant tools. Token savings are significant; quality is equal or better because the model isn't distracted by irrelevant options.

Best for: agents with large tool libraries, MCP-equipped agents, domain-specific agents that need different tools in different modes.

Pattern 3: Programmatic Tool Calling

Not all tool calls need to go through the model. When your code knows deterministically that a tool should be called (e.g., always retrieve the current date, always fetch the user's profile), call it directly and inject the result into context. Reserve model-driven tool selection for genuinely ambiguous cases. This cuts latency and token usage for the deterministic portion of your workflow.

Pattern 4: Compaction

Compaction is periodic context compression. Rather than letting context grow unboundedly over a long session, compaction summarizes prior turns into a structured representation — preserving key decisions, artifacts, and state while discarding verbose reasoning chains. Anthropic demonstrates compaction keeping a 10-hour agent session under 50K tokens at all times, with quality metrics unchanged vs. the uncompacted baseline.

Pattern 5: The Advisor Strategy

The advisor strategy uses two models: a fast, inexpensive model handles most interactions, and a more capable model acts as an advisor — reviewing and refining the fast model's output for tasks that warrant it. The trigger for escalation can be confidence-based (low-confidence outputs route to the advisor), task-based (complex reasoning tasks always use the advisor), or sampling-based (periodic advisor review as a quality gate).

Cost reduction of 60-80% vs. always using the most capable model, with quality delta of less than 5% on representative evals.

Stacking the Patterns

The final demo combines all five patterns in a single long-running agent. The result: a 10x cost reduction vs. the naive implementation with no measurable quality degradation. The session closes with a pattern selection guide — which patterns apply to which use cases.

Notable Quotes

"The difference between an agent that costs $0.50 per run and $5 per run is usually not the model — it's these five patterns."

"Tool search sounds like a niche optimization. At scale, it's the difference between your agent working and your agent failing with context overflow."

Pattern Selection Guide

Pattern	Best For	Typical Savings
Prompt caching	Shared system prompts, RAG	50–80% on repeated calls
Tool search	Large tool libraries	20–40% context reduction
Programmatic tool calling	Deterministic tool use	10–30% latency reduction
Compaction	Long-running sessions	Context stays bounded
Advisor strategy	Cost/quality balance	60–80% cost reduction