Back to Videos

Caching, Harnesses, and Advisors: Building on Claude at GitHub Scale

Channel Anthropic
Date May 6, 2026
Duration 26 min
Tags GitHub, Prompt Caching, Scale, Platform Engineering, Advisor Strategy
TL;DR

GitHub ships Claude to millions of developers across four surfaces — chat, CLI, coding agent, and code review. Mario Rodriguez (CPO) and Brad Abrams (Anthropic) explain how prompt caching, evaluation harnesses, and the new advisor strategy are the levers that make quality and cost manageable at that scale.

Key Takeaways

Summary

The GitHub Copilot Claude Stack

GitHub is one of Anthropic's largest and most demanding platform customers. Mario Rodriguez describes Copilot as a multi-surface product where each surface has distinct requirements: chat needs low latency and broad coverage, the coding agent needs deep reasoning and long context, code review needs precision and conciseness. The challenge is delivering quality across all of them without the cost structure becoming prohibitive.

Prompt Caching at Scale

Prompt caching is GitHub's most important cost control mechanism. By caching system prompts, common context, and shared tool definitions, GitHub avoids reprocessing the same tokens millions of times per day. Brad Abrams walks through the caching architecture and the specific patterns that achieve the highest cache hit rates — including structuring prompts so the stable prefix is always at the front.

The Advisor Strategy

The advisor strategy is a pattern where a fast, cheap model handles the initial response, and a more capable model acts as an advisor — reviewing and refining only when the task warrants it. For code review specifically, this cuts costs significantly while preserving quality on the cases that matter: complex refactors, security-sensitive changes, and cross-file reasoning.

Evaluation Harnesses

GitHub built eval pipelines that run against a curated set of developer tasks before any model update ships. These aren't synthetic benchmarks — they're drawn from real Copilot usage patterns. The harness measures task completion rate, diff quality, and latency. No update goes to production without clearing the eval gate.

Tool Search and Compaction

Rather than loading all tools into every context, GitHub uses tool search to surface only the tools relevant to the current task. Combined with compaction — periodic summarization of prior context into a dense representation — this keeps token counts manageable even in sessions that span hundreds of turns.

Notable Quotes

"At the scale we operate, the difference between a 60% cache hit rate and an 85% cache hit rate is the difference between a product that's economically viable and one that isn't."

"The advisor strategy sounds simple but it took us a while to trust it. Once we had the evals to back it up, we couldn't unsee how good the economics were."

Patterns from This Talk

PatternUse CaseBenefit
Prompt cachingShared system promptsCost reduction
Advisor strategyCode review, reasoning tasksCost + quality balance
Tool searchLong-running agentsContext efficiency
CompactionMulti-hour sessionsContext persistence
Eval harnessesRelease gatingQuality assurance