Caching, Harnesses, and Advisors: Building on Claude at GitHub Scale

Channel Anthropic

Date May 6, 2026

Duration 26 min

Tags GitHub, Prompt Caching, Scale, Platform Engineering, Advisor Strategy

TL;DR

GitHub ships Claude to millions of developers across four surfaces — chat, CLI, coding agent, and code review. Mario Rodriguez (CPO) and Brad Abrams (Anthropic) explain how prompt caching, evaluation harnesses, and the new advisor strategy are the levers that make quality and cost manageable at that scale.

Key Takeaways

Four surfaces, one model — GitHub Copilot uses Claude across chat, CLI (Claude Code), the coding agent, and code review — each with different latency and quality requirements
Prompt caching is the cost lever — At GitHub's volume, aggressive caching on system prompts and shared context cuts costs dramatically without sacrificing quality
Evaluation harnesses gate every release — GitHub built structured eval pipelines that test Claude on representative developer tasks before any rollout
The advisor strategy — Use a cheaper, faster model for most calls, escalate to a more capable model for tasks that need deeper reasoning; dramatic cost reduction with minimal quality loss
Tool search over static toolsets — Instead of giving Claude a fixed list of tools, dynamically surface only the relevant tools for the current task to keep context lean
Compaction for long sessions — GitHub Copilot's coding agent uses compaction to summarize prior context, enabling sessions that span hours without degradation

Summary

The GitHub Copilot Claude Stack

GitHub is one of Anthropic's largest and most demanding platform customers. Mario Rodriguez describes Copilot as a multi-surface product where each surface has distinct requirements: chat needs low latency and broad coverage, the coding agent needs deep reasoning and long context, code review needs precision and conciseness. The challenge is delivering quality across all of them without the cost structure becoming prohibitive.

Prompt Caching at Scale

Prompt caching is GitHub's most important cost control mechanism. By caching system prompts, common context, and shared tool definitions, GitHub avoids reprocessing the same tokens millions of times per day. Brad Abrams walks through the caching architecture and the specific patterns that achieve the highest cache hit rates — including structuring prompts so the stable prefix is always at the front.

The Advisor Strategy

The advisor strategy is a pattern where a fast, cheap model handles the initial response, and a more capable model acts as an advisor — reviewing and refining only when the task warrants it. For code review specifically, this cuts costs significantly while preserving quality on the cases that matter: complex refactors, security-sensitive changes, and cross-file reasoning.

Evaluation Harnesses

GitHub built eval pipelines that run against a curated set of developer tasks before any model update ships. These aren't synthetic benchmarks — they're drawn from real Copilot usage patterns. The harness measures task completion rate, diff quality, and latency. No update goes to production without clearing the eval gate.

Tool Search and Compaction

Rather than loading all tools into every context, GitHub uses tool search to surface only the tools relevant to the current task. Combined with compaction — periodic summarization of prior context into a dense representation — this keeps token counts manageable even in sessions that span hundreds of turns.

Notable Quotes

"At the scale we operate, the difference between a 60% cache hit rate and an 85% cache hit rate is the difference between a product that's economically viable and one that isn't."

"The advisor strategy sounds simple but it took us a while to trust it. Once we had the evals to back it up, we couldn't unsee how good the economics were."

Patterns from This Talk

Pattern	Use Case	Benefit
Prompt caching	Shared system prompts	Cost reduction
Advisor strategy	Code review, reasoning tasks	Cost + quality balance
Tool search	Long-running agents	Context efficiency
Compaction	Multi-hour sessions	Context persistence
Eval harnesses	Release gating	Quality assurance