What it is: Context rot is the observed degradation of large language model output quality as the input context grows longer. Chroma’s July 2025 study documented it across 18 frontier models and found that it affects every model tested, including Claude Opus 4, Sonnet 4, GPT-4.1, o3, GPT-4o, and Gemini 2.5 Pro.
Who it is for: Developers running long agent sessions, RAG pipelines, or multi-turn conversations where context length matters.
Best if: You want a short reference on what context rot is and how to work around it.
Skip if: You only use AI through short, single-turn chat interfaces.
What is Context Rot?
Context rot is the empirically observed degradation of large language model performance as the input context grows longer. Chroma’s July 2025 study tested 18 frontier models (including Claude Opus 4, Sonnet 4, Haiku 3.5, GPT-4.1, o3, GPT-4o, Gemini 2.5 Pro, and Qwen3-235B) on the same task at varying input lengths and found that every model degrades as context grows, even on simple tasks. The degradation is non-uniform and hard to predict: position effects, distractor interference, and even the structural coherence of the input all influence the size of the drop. Counter-intuitively, shuffled context sometimes outperforms logically coherent prose.
Why does Context Rot matter?
Context rot is one of the most under-appreciated forces in long-running agent reliability. A 1M-token context window does not mean you can usefully load 1M tokens of input: quality drops well before the ceiling. This is why a Map approach (load a high-level outline of the codebase, then read only the files a task needs) outperforms a Manual approach (load the whole codebase) on multi-hour agent tasks. It is also a major reason why session resets are necessary on long-running work: clearing the window and starting fresh with a structured handoff recovers quality that context rot would otherwise eat.
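To make the Map approach concrete, here is a minimal sketch, assuming a Python codebase held in memory as a dict of path to source text. The names `outline_source`, `build_map`, and `targeted_read` are hypothetical helpers invented for this illustration, not part of any agent framework. The idea is to hand the model a short outline first and fetch the full source of a single symbol only when a task needs it.

```python
import ast


def outline_source(path: str, source: str) -> str:
    """Return a compact outline: the file path plus its top-level symbols."""
    lines = [path]
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            lines.append(f"  def {node.name}()")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"  class {node.name}")
    return "\n".join(lines)


def build_map(files: dict[str, str]) -> str:
    """Build the high-level 'map' that goes into context instead of full files."""
    return "\n".join(outline_source(p, s) for p, s in sorted(files.items()))


def targeted_read(files: dict[str, str], path: str, symbol: str) -> str:
    """Return only the source of one top-level function or class."""
    source = files[path]
    for node in ast.parse(source).body:
        if getattr(node, "name", None) == symbol:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise KeyError(f"{symbol} not found in {path}")


# Illustrative in-memory "codebase" (made-up content).
files = {
    "billing.py": "def charge(amount):\n    return amount * 2\n\nclass Invoice:\n    pass\n",
}
code_map = build_map(files)
charge_src = targeted_read(files, "billing.py", "charge")
```

The map stays small no matter how large each file body is, so the context grows only when the agent deliberately asks for a targeted read.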
How does Context Rot work?
Three effects compound:
- Position effect: tokens near the start and end of the context tend to be recalled accurately; tokens in the middle 40–60% (HumanLayer’s “Dumb Zone”) are recalled less reliably.
- Distractor interference: a single semantically similar distractor sentence reduces accuracy versus a clean baseline, and multiple distractors compound the drop.
- Coherence paradox: shuffled context sometimes beats logically coherent prose, which suggests models may be relying on surface cues rather than on the structure of the text.
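One way to see the position and distractor effects for yourself is to build the same prompt with a known fact (the "needle") placed at different depths, then compare a model's answers across depths. The sketch below only builds the prompts; it does not call any model, and it is not Chroma's methodology. The function names and the default depths are assumptions made for illustration.

```python
def spread(filler: list[str], extras: list[str]) -> list[str]:
    """Insert extra sentences (e.g. distractors) at evenly spaced positions."""
    out = list(filler)
    n = len(extras)
    # Insert from last to first so earlier insertion points stay valid.
    for i in range(n - 1, -1, -1):
        pos = (i + 1) * len(filler) // (n + 1)
        out.insert(pos, extras[i])
    return out


def place_needle(sentences: list[str], needle: str, depth: float) -> str:
    """Place the needle at a relative depth: 0.0 = start, 1.0 = end."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = int(depth * len(sentences))
    return " ".join(sentences[:index] + [needle] + sentences[index:])


def build_probe_set(filler, needle, distractors=(), depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return {depth: prompt_text}, one prompt per needle position."""
    sentences = spread(list(filler), list(distractors))
    return {d: place_needle(sentences, needle, d) for d in depths}


# Tiny illustrative example with placeholder sentences.
prompts = build_probe_set(["a", "b", "c", "d"], "N", depths=(0.0, 0.5, 1.0))
```

Sending each prompt with the same question and scoring the answers per depth gives a rough recall-by-position curve for your own model and data; adding `distractors` lets you compare against the clean baseline.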
Practical implications for working developers: keep context lean, reset aggressively, use subagents for investigation work so exploration doesn’t pollute the main session, and bake context reset into long workflows.
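Baking a reset into a workflow can be as simple as tracking an approximate token count and swapping the history for a handoff note once a budget is crossed. The sketch below is one possible shape, under loud assumptions: the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, the `reset_fraction` threshold is an arbitrary illustrative value, and `summarize` is a caller-supplied function (in practice probably a model call) that this example stubs out.

```python
from dataclasses import dataclass, field
from typing import Callable

CHARS_PER_TOKEN = 4  # assumption: crude estimate, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


@dataclass
class Session:
    window_tokens: int                 # the model's advertised context size
    reset_fraction: float = 0.4        # hypothetical budget; tune for your workload
    messages: list[str] = field(default_factory=list)
    resets: int = 0

    def used_tokens(self) -> int:
        return sum(estimate_tokens(m) for m in self.messages)

    def add(self, message: str, summarize: Callable[[list[str]], str]) -> None:
        """Append a message; reset to a structured handoff if over budget."""
        self.messages.append(message)
        if self.used_tokens() > self.window_tokens * self.reset_fraction:
            handoff = summarize(self.messages)
            self.messages = [handoff]  # fresh window seeded with the handoff only
            self.resets += 1


# Stub summarizer for illustration; a real one would capture goals and open tasks.
def stub_summarize(msgs: list[str]) -> str:
    return f"HANDOFF: {len(msgs)} messages"


session = Session(window_tokens=100, reset_fraction=0.5)
session.add("x" * 120, stub_summarize)  # about 30 tokens, under the budget of 50
session.add("y" * 120, stub_summarize)  # about 60 tokens, triggers a reset
```

Resetting well below the window size, rather than at the limit, reflects the point above that quality drops long before the ceiling.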
Sources and further reading
- Chroma — Context Rot study (July 2025)
- Context Rot research repository
- Anthropic — Effective harnesses for long-running agents
Last reviewed: May 2026. AI terminology evolves quickly — verify specifics on the official source pages above.