What it is: A sprint contract is the locked agreement between a generator AI agent and an evaluator AI agent on what will be built and how success will be verified, made before any code is written. It is a named pattern from Anthropic’s three-agent harness, used to lock acceptance criteria upfront.
Who it is for: Developers building multi-agent harnesses or working in teams that run AI agents on production code.
Best if: You want a short reference on what a sprint contract is and why it beats ad-hoc verification.
Skip if: You only use AI for short one-shot tasks.
What is a sprint contract (AI agents)?
A sprint contract is the locked agreement between a generator AI agent (the one doing the work) and an evaluator agent (the one judging it) on what will be built and how success will be verified — before any code is written. The pattern is named in Anthropic’s three-agent harness documentation: “The generator proposed what it would build and how success would be verified, and the evaluator reviewed that proposal.”
Why does a sprint contract (AI agents) matter?
Sprint contracts lock the acceptance criteria before work begins. Without one, the generator can drift through the task and the evaluator is left with a single subjective question: “does it look right?” With one, the evaluator has a fixed yardstick: the criteria are written down, the verification steps are concrete, and the generator can’t quietly move the goalposts mid-task. It’s the agent-team version of test-driven development: the spec exists before the code, and the spec is binding.
How does a sprint contract (AI agents) work?
The contract is usually a short Markdown or JSON file. The generator drafts it, the evaluator reviews it (and may push back on vague criteria), and both agree before implementation starts. The contract names three things: what is being built (the feature or change), what passing looks like (the verification steps the evaluator will run), and hard thresholds (numerical pass/fail criteria where applicable, such as latency targets, accuracy floors, and test coverage minimums). The implementation phase then produces artifacts, the evaluator runs the verification against them, and the result is a pass or fail verdict for each criterion.
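To make the flow above concrete, here is a minimal Python sketch of a JSON contract being loaded, locked, and checked. It is an illustration under stated assumptions, not Anthropic's implementation: the field names (`feature`, `criteria`, `metric`, `threshold`, `higher_is_better`), the example thresholds, and the measured values are all hypothetical. Frozen dataclasses stand in for the "locked" part of the agreement, and treating a missing measurement as a fail is a design choice made here for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical contract text, as the generator might draft it.
CONTRACT_JSON = """
{
  "feature": "Add a /search endpoint",
  "criteria": [
    {"name": "p95 latency at most 200 ms", "metric": "p95_latency_ms",
     "threshold": 200.0, "higher_is_better": false},
    {"name": "coverage at least 80%", "metric": "coverage_pct",
     "threshold": 80.0, "higher_is_better": true},
    {"name": "unit tests all pass", "metric": "tests_passed_ratio",
     "threshold": 1.0, "higher_is_better": true}
  ]
}
"""


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after agreement
class Criterion:
    name: str
    metric: str              # key the evaluator looks up in its measurements
    threshold: float
    higher_is_better: bool

    def passes(self, measured: float) -> bool:
        """Compare a measured value against the agreed threshold."""
        if self.higher_is_better:
            return measured >= self.threshold
        return measured <= self.threshold


@dataclass(frozen=True)
class SprintContract:
    feature: str
    criteria: Tuple[Criterion, ...]  # tuple, so the list itself is immutable


def load_contract(text: str) -> SprintContract:
    """Parse the agreed JSON into an immutable contract object."""
    raw = json.loads(text)
    criteria = tuple(Criterion(**item) for item in raw["criteria"])
    return SprintContract(feature=raw["feature"], criteria=criteria)


def evaluate(contract: SprintContract,
             measurements: Dict[str, float]) -> Dict[str, bool]:
    """Return a pass/fail verdict per criterion.

    Assumption: a metric the evaluator could not measure counts as a fail.
    """
    verdict = {}
    for criterion in contract.criteria:
        measured = measurements.get(criterion.metric)
        verdict[criterion.name] = (
            measured is not None and criterion.passes(measured)
        )
    return verdict


contract = load_contract(CONTRACT_JSON)

# Hypothetical numbers the evaluator collected after implementation.
measurements = {
    "p95_latency_ms": 180.0,
    "coverage_pct": 76.5,
    "tests_passed_ratio": 1.0,
}

verdict = evaluate(contract, measurements)
for name, ok in verdict.items():
    print(f"{'PASS' if ok else 'FAIL'}  {name}")
```

In this sketch the coverage criterion fails (76.5 is below the agreed 80.0) while the other two pass, so the evaluator reports a per-criterion result instead of an overall impression. Because the dataclasses are frozen, an attempt to reassign `threshold` after loading raises an error, which mirrors the idea that the goalposts stay fixed once both agents have agreed.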
Sources and further reading
- Anthropic — Harness design for long-running apps
- InfoQ — Anthropic three-agent harness
- Anthropic — Effective harnesses for long-running agents
Last reviewed: May 2026. AI terminology evolves quickly — verify specifics on the official source pages above.