Back to Videos

Getting More Out of the Claude Platform

Channel Anthropic
Date May 6, 2026
Duration 28 min
Tags Claude Platform, Prompt Caching, Tool Search, Compaction, Advisor Strategy
TL;DR

A practical, demo-driven session showing five platform patterns that meaningfully reduce cost and improve quality for Claude-powered agents: prompt caching, tool search, programmatic tool calling, context compaction, and the advisor strategy. Each pattern includes a before/after comparison and an explanation of when to apply it.

Key Takeaways

Summary

Pattern 1: Prompt Caching

Prompt caching reuses computed key-value representations of tokens that appear at the start of a prompt. The key to maximizing cache hit rate is structure: put stable content (system prompt, tool definitions, long documents) at the front of every request. Variable content (user messages, task-specific context) goes at the end. In live demos, Anthropic shows cache hit rates moving from 20% to 85% with prompt reordering alone.

Best for: agents with large system prompts, RAG applications with shared document context, APIs called with the same configuration repeatedly.

Pattern 2: Tool Search

Instead of loading 50 tools into every context, tool search dynamically retrieves the 5-10 tools most relevant to the current task. This is implemented via a vector index of tool definitions — at inference time, the model's task description is used to retrieve the most relevant tools. Token savings are significant; quality is equal or better because the model isn't distracted by irrelevant options.

Best for: agents with large tool libraries, MCP-equipped agents, domain-specific agents that need different tools in different modes.

Pattern 3: Programmatic Tool Calling

Not all tool calls need to go through the model. When your code knows deterministically that a tool should be called (e.g., always retrieve the current date, always fetch the user's profile), call it directly and inject the result into context. Reserve model-driven tool selection for genuinely ambiguous cases. This cuts latency and token usage for the deterministic portion of your workflow.

Pattern 4: Compaction

Compaction is periodic context compression. Rather than letting context grow unboundedly over a long session, compaction summarizes prior turns into a structured representation — preserving key decisions, artifacts, and state while discarding verbose reasoning chains. Anthropic demonstrates compaction keeping a 10-hour agent session under 50K tokens at all times, with quality metrics unchanged vs. the uncompacted baseline.

Pattern 5: The Advisor Strategy

The advisor strategy uses two models: a fast, inexpensive model handles most interactions, and a more capable model acts as an advisor — reviewing and refining the fast model's output for tasks that warrant it. The trigger for escalation can be confidence-based (low-confidence outputs route to the advisor), task-based (complex reasoning tasks always use the advisor), or sampling-based (periodic advisor review as a quality gate).

Cost reduction of 60-80% vs. always using the most capable model, with quality delta of less than 5% on representative evals.

Stacking the Patterns

The final demo combines all five patterns in a single long-running agent. The result: a 10x cost reduction vs. the naive implementation with no measurable quality degradation. The session closes with a pattern selection guide — which patterns apply to which use cases.

Notable Quotes

"The difference between an agent that costs $0.50 per run and $5 per run is usually not the model — it's these five patterns."

"Tool search sounds like a niche optimization. At scale, it's the difference between your agent working and your agent failing with context overflow."

Pattern Selection Guide

PatternBest ForTypical Savings
Prompt cachingShared system prompts, RAG50–80% on repeated calls
Tool searchLarge tool libraries20–40% context reduction
Programmatic tool callingDeterministic tool use10–30% latency reduction
CompactionLong-running sessionsContext stays bounded
Advisor strategyCost/quality balance60–80% cost reduction

References