What it is: A practical guide to using a structured feature list (usually feature_list.json) to bound the scope of AI coding agents like Claude Code, Cursor, and Codex. Covers the schema Anthropic uses, why JSON beats Markdown for this job, how the agent reads and updates the file, and the relationship to AGENTS.md and progress files.
Who it is for: Developers who give the agent a clear task and watch it gradually expand into “while I’m here…” refactors, surprise analytics, and rewrites of working code.
Best if: You want a copy-paste sample feature list and a 30-minute setup that ends scope drift.
Skip if: You haven’t used a coding agent yet. Start with Why AI Coding Agents Fail first.
Heads up — this is a more intermediate AI topic. If you’re brand new to AI, start with How to Use AI and Best AI Tools for Beginners first. This post assumes you’ve used Claude Code, Cursor, or another coding agent on a real project and know what AGENTS.md is. It builds on Why AI Coding Agents Fail, What is AGENTS.md?, Long-Running Claude Code Tasks, AI Agent Verification, and AI Agent Observability.
What is “scope drift” and why does it happen?
Scope drift is what happens when you give the agent a bounded task and the result you get back is much bigger than the task you asked for. Real examples that working developers have all seen:
- You ask the agent to add a contact form. The diff renames three components, adds a new analytics provider, refactors the routing layer, and updates the design system.
- You ask for a single new API endpoint. The agent also rewrites two unrelated endpoints “for consistency” and adds a middleware layer you never approved.
- You ask for a quick bug fix. The agent fixes the bug and also opens a 300-line PR to “improve the testing infrastructure” along the way.
Three forces drive this:
- Helpful-by-default training. Modern coding models are trained to be helpful and proactive; if they see something they could improve, they often will, whether or not you asked.
- Open-ended prompts. Without an explicit list of what’s in scope, the agent’s working definition of “the task” expands to match whatever’s interesting in the surrounding code.
- No anchor to return to. By message twenty in a session, the agent’s working memory is dominated by recent file reads, not the original goal.
Addy Osmani put it well in his long-running-agents writeup: “Over many context windows, agents drift. The original goal gets summarized, then re-summarized, then loses fidelity.”
The fix isn’t a better model. Scope drift is currently unsolved at the model layer. It has to be fixed at the harness layer, with a feature list.
What is a feature list?
A feature list is a structured file (almost always JSON, almost always named feature_list.json) that enumerates every feature the project will ship. The agent reads it at the start of every session, picks exactly one feature to work on, and updates only the passes flag when verification succeeds. Anything not in the list is, by definition, out of scope.
Anthropic uses feature_list.json in its own harness for long-running agents. For their reference build of a claude.ai clone, the initializer agent generated 200+ features in one file before any coding agent ran, all marked failing. The coding agents then worked through the list one feature at a time, flipping the passes flag when verification steps confirmed completion.
Why JSON, not Markdown? Anthropic noticed something subtle and important: “the model is less likely to inappropriately change or overwrite JSON files compared to Markdown files.” Markdown invites prose-style rewrites — the agent treats it as something to improve. JSON looks like data, so the agent treats it as something to read and respect. That structural difference is enough to keep the feature list stable across sessions.
What goes in a good feature list?
Anthropic’s documented schema is the core. Each feature has four fields:
- category: the type of feature (e.g., functional, ui, infrastructure).
- description: one-line human-readable summary of what the feature does.
- steps: array of verification steps that prove the feature works end-to-end. Machine-verifiable, not vague.
- passes: boolean. Starts false. Flips to true only when every verification step passes.
The broader harness community has converged on five common extensions:
- id: a stable identifier so you can reference the feature in commits, progress logs, and PRs.
- status: an enum (todo / in_progress / done / blocked / wontdo) for when the boolean passes field is too coarse.
- dependencies: IDs of features that must pass first.
- files: expected paths the feature touches.
- effort: S/M/L or an hours estimate.
And one top-level array, separate from the features, that does most of the anti-drift work: out_of_scope. An explicit list of things the agent should refuse to do, even if they seem related. OAuth. Password reset. 2FA. Performance optimization. Refactors. Anything you can name as “not for this project” goes here.
What does a real feature_list.json look like?
A copy-paste sample, for a small login feature in a SaaS project:
{
  "features": [
    {
      "id": "auth-01",
      "category": "functional",
      "description": "User can sign in with email + password",
      "steps": [
        "POST /login returns 200 with valid creds",
        "Invalid creds return 401",
        "Session cookie set on success",
        "Playwright: user lands on /dashboard after login"
      ],
      "dependencies": [],
      "passes": false
    },
    {
      "id": "auth-02",
      "category": "functional",
      "description": "User can sign out and the session is invalidated",
      "steps": [
        "POST /logout returns 204",
        "Subsequent /me request returns 401",
        "Cookie cleared"
      ],
      "dependencies": ["auth-01"],
      "passes": false
    }
  ],
  "out_of_scope": [
    "OAuth or third-party sign-in",
    "Password reset emails",
    "2FA",
    "Magic-link login",
    "Account deletion"
  ]
}
That’s roughly 600 bytes. It bounds the agent’s work, names the explicit non-goals, and gives every feature verification steps that can be checked mechanically. Start with five to ten features for a small project; let an initializer agent generate the long list for a bigger one.
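Before handing a file like this to an agent, it can help to check its shape mechanically. The following is a minimal sketch, not an official validator: the function name validate_feature_list and the exact checks are assumptions made for illustration, based only on the four core fields plus the optional dependencies and out_of_scope entries described above.

```python
# Hypothetical helper: sanity-check the structure of a feature list.
# The field names come from the article; the checks themselves are an assumption.
REQUIRED_FIELDS = {"category": str, "description": str, "steps": list, "passes": bool}


def validate_feature_list(data: dict) -> list:
    """Return a list of problems; an empty list means the structure looks sane."""
    features = data.get("features")
    if not isinstance(features, list) or not features:
        return ["'features' must be a non-empty array"]

    problems = []
    known_ids = {f.get("id") for f in features if isinstance(f, dict)}
    for index, feature in enumerate(features):
        if not isinstance(feature, dict):
            problems.append(f"features[{index}]: must be an object")
            continue
        label = feature.get("id", f"features[{index}]")
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(feature.get(field), expected_type):
                problems.append(f"{label}: '{field}' missing or not {expected_type.__name__}")
        if isinstance(feature.get("steps"), list) and not feature["steps"]:
            problems.append(f"{label}: 'steps' needs at least one verification step")
        deps = feature.get("dependencies", [])
        if not isinstance(deps, list):
            problems.append(f"{label}: 'dependencies' must be an array")
            deps = []
        for dep in deps:
            if dep not in known_ids:
                problems.append(f"{label}: unknown dependency '{dep}'")

    if not isinstance(data.get("out_of_scope", []), list):
        problems.append("'out_of_scope' must be an array")
    return problems


# Small in-memory example: one entry has an empty steps array and a bad dependency.
sample = {
    "features": [
        {"id": "auth-01", "category": "functional", "description": "Sign in",
         "steps": ["POST /login returns 200 with valid creds"], "dependencies": [], "passes": False},
        {"id": "auth-02", "category": "functional", "description": "Sign out",
         "steps": [], "dependencies": ["auth-99"], "passes": False},
    ],
    "out_of_scope": ["2FA"],
}
for problem in validate_feature_list(sample):
    print(problem)
```

In practice you would load the real file with json.load and run a check like this in CI or a pre-commit hook, so a malformed list is caught before an agent session starts.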
How does the agent actually use the feature list?
The discipline Anthropic enforces (and that you can enforce too via AGENTS.md instructions) is short:
- Read feature_list.json first. Always. Before reading anything else, before touching any code.
- Pick exactly one feature where passes is false and all dependencies are satisfied.
- Implement only that feature. If the work would also fix or change something else, note it but don’t act on it.
- Run the verification steps. All of them. Each one passes or the feature isn’t done.
- Flip passes to true only after every verification step passes. Then stop and either start the next session or wait for human review.
- Never edit description or steps. Anthropic’s exact rule: “It is unacceptable to remove or edit tests because this could lead to missing or buggy functionality.” The implementation may only modify the passes field.
- Refuse or escalate if asked to do something not on the list. Don’t expand scope on your own initiative.
That last rule is the heart of the pattern. With an out_of_scope array and an explicit refusal protocol, the agent has cover to say “that’s not in the list” rather than scope-creep its way into a 400-line PR.
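The "pick exactly one feature" rule can be sketched as a small selection function. This is an illustration under stated assumptions, not Anthropic's implementation: the name pick_next_feature is hypothetical, and it assumes the optional id and dependencies extensions are present in the file.

```python
from typing import Optional


def pick_next_feature(data: dict) -> Optional[dict]:
    """Return the first feature that is not passing and whose dependencies all pass.

    Returns None when every feature passes or the remaining ones are blocked.
    """
    passing_ids = {f.get("id") for f in data["features"] if f["passes"]}
    for feature in data["features"]:
        if feature["passes"]:
            continue  # already done
        if all(dep in passing_ids for dep in feature.get("dependencies", [])):
            return feature
    return None


# Trimmed-down version of the sample file: auth-02 depends on auth-01.
data = {
    "features": [
        {"id": "auth-02", "dependencies": ["auth-01"], "passes": False},
        {"id": "auth-01", "dependencies": [], "passes": False},
    ]
}
next_feature = pick_next_feature(data)
print(next_feature["id"] if next_feature else "nothing left to do")
```

Even though auth-02 appears first in this example, it is skipped until auth-01 passes, which is the ordering guarantee the dependencies field exists to provide.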
How does the feature list relate to AGENTS.md and progress files?
Three files, three jobs. The pattern is now standard across the harness-engineering canon:
- AGENTS.md: how this codebase works. Stable. Project-wide. Build commands, conventions, architecture, “do not touch.” Edited by humans, read by every agent. Doesn’t change often.
- feature_list.json: what we’re building right now. Changes per project. The initializer agent or the lead developer writes it; coding agents only flip the passes field. Locked once defined.
- progress.md (or claude-progress.txt): append-only log of what’s been tried, what worked, what failed, why. Read on session resume so the next session doesn’t repeat dead ends. See Long-Running Claude Code Tasks for the full pattern.
Addy Osmani frames it as the “ralph-loop” pattern: AGENTS.md plus a project-state file plus a progress log. His blunt observation: “The agent itself is amnesiac, but the filesystem isn’t.” Three text files do most of the work that a real long-term memory system would do.
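To make "append-only" concrete, here is a minimal sketch of writing one entry to the progress log. The entry layout (timestamp, feature id, outcome, note) and both function names are assumptions for illustration; none of the sources cited here prescribes a format.

```python
from datetime import datetime, timezone
from pathlib import Path


def format_progress_entry(feature_id: str, outcome: str, note: str, when: datetime) -> str:
    """Build one log line. The pipe-separated layout is an assumed convention."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"- {stamp} | {feature_id} | {outcome} | {note}\n"


def append_progress(path: Path, entry: str) -> None:
    """Add an entry to the end of the log without touching earlier lines."""
    # Mode "a" only ever writes at the end of the file, which keeps the log append-only.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry)


entry = format_progress_entry(
    "auth-01", "failed", "session cookie not set on success",
    datetime(2026, 5, 1, 9, 30, tzinfo=timezone.utc),
)
print(entry, end="")
```

Calling append_progress(Path("progress.md"), entry) at the end of each session leaves a trail the next session can read before it picks a feature.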
What about Spec Kit and other tools?
GitHub’s open-source Spec Kit is the most-used tool for spec-driven development in 2026. Its /speckit.tasks command generates a tasks.md from your project plan with dependency ordering, parallel-execution markers, exact file paths, and checkpoint validation between phases. It supports Claude Code, GitHub Copilot, Cursor CLI, Gemini CLI, Codex CLI, Tabnine and roughly thirty other agents.
Spec Kit’s tasks.md is functionally equivalent to feature_list.json. The four-phase Spec Kit workflow (Spec → Plan → Tasks → Implement) treats the Tasks artifact as a gate. The implement phase can’t reach beyond the tasks file.
Two other tools in this category worth knowing about:
- Conductor ($22M Series A in 2026) is a macOS app for running multiple Claude Code instances on isolated git worktrees, each pinned to a single feature. Visual UI on top of the same one-feature-at-a-time discipline.
- Crystal is a git-worktree-based parallel Claude Code runner with a shared task list and file locking. Free, open-source, command-line.
For a solo developer, a plain feature_list.json in your repo plus the AGENTS.md instruction to read it first is enough. For a team running multiple agents on the same codebase, Spec Kit or Conductor is the natural step up.
What feature-list anti-patterns should you avoid?
- Prose-only feature lists in Markdown. The agent rewrites them. Use JSON.
- Vague acceptance criteria. “Works well” and “looks clean” aren’t checkable. Every verification step must be a yes/no the agent (or the test suite) can answer.
- Letting the agent edit the spec while implementing it. Lock description and steps; allow only passes to change. If the spec is wrong, escalate to the human, don’t quietly rewrite.
- No out_of_scope array. Without an explicit non-goals list, every “while I’m here…” gets self-justified.
- Features stuck “in progress” forever. Define what closes a feature. If passes hasn’t flipped in three sessions, either the feature is too big (split it) or the verification is broken (fix it).
- Skipping verification steps. The list is decoration without them. Each feature needs at minimum one machine-checkable proof of correctness.
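The "lock everything except passes" rule can also be enforced outside the agent, for example in a pre-commit hook or CI step. The sketch below compares two parsed versions of the feature list and reports anything other than a passes flip. The function name spec_violations is hypothetical, and it assumes features keep their order between versions.

```python
_MISSING = object()  # sentinel so a removed key differs from a key set to None


def spec_violations(before: dict, after: dict) -> list:
    """Report every change between two versions other than a 'passes' flip."""
    violations = []
    if before.get("out_of_scope") != after.get("out_of_scope"):
        violations.append("out_of_scope was modified")

    old_features = before.get("features", [])
    new_features = after.get("features", [])
    if len(old_features) != len(new_features):
        violations.append("features were added or removed")
        return violations

    for index, (old, new) in enumerate(zip(old_features, new_features)):
        label = old.get("id", f"features[{index}]")
        changed = sorted(
            key for key in set(old) | set(new)
            if key != "passes" and old.get(key, _MISSING) != new.get(key, _MISSING)
        )
        if changed:
            violations.append(f"{label}: locked fields changed: {', '.join(changed)}")
    return violations


before = {
    "features": [{"id": "auth-01", "description": "Sign in",
                  "steps": ["Invalid creds return 401"], "passes": False}],
    "out_of_scope": ["2FA"],
}
# The agent flipped passes (allowed) but also weakened a verification step (not allowed).
after = {
    "features": [{"id": "auth-01", "description": "Sign in",
                  "steps": ["Login mostly works"], "passes": True}],
    "out_of_scope": ["2FA"],
}
print(spec_violations(before, after))
```

A wrapper would load the committed and working-tree copies of the file and fail the check if the returned list is non-empty.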
How do you set up your first feature_list.json in 30 minutes?
- Step 1. Create feature_list.json at your repo root.
- Step 2. Add 5–10 features, each with id, description, steps (machine-checkable), and passes: false. Keep description short.
- Step 3. Add a top-level out_of_scope array. List every adjacent change you do NOT want the agent to make in this project.
- Step 4. Add one paragraph to your AGENTS.md: “Always read feature_list.json first. Pick one feature with passes: false and all dependencies passing. Implement only that feature. Run the verification steps. Flip passes to true only if every step passes. Never edit description or steps. If asked to do something not in the list, say so and stop.”
- Step 5. Test it. Hand the agent a real task and watch what happens. The first session usually reveals which features need clearer verification steps.
- Step 6 (optional). If you’re on a bigger project, install Spec Kit and run /speckit.tasks to generate the equivalent tasks.md from a plan.
A 30-minute file. The agent stops drifting. Worth it.
Frequently asked questions
Do I need a feature list for a small task?
For a single 5-minute edit, no. For anything that takes more than 30 minutes or touches more than three files, yes — the time to write the list is less than the time you’ll spend reverting scope-creep changes the agent makes without one. A solo developer working on a real project should have one in every active repo.
Why JSON instead of Markdown or YAML?
Anthropic observed empirically that agents are less likely to rewrite JSON files than Markdown files. The structural rigidity reads as “this is data, leave it alone” rather than “this is prose, improve it.” YAML is also fine in practice; Markdown is the worst choice for this specific job.
What if the user asks for something outside the list?
The agent should refuse or escalate, not just do it. The right response is something like: “This isn’t currently on the feature list. Would you like me to add it as a new feature with verification steps, or treat it as out of scope for this session?” That converts an ad-hoc request into a tracked decision and keeps the spec honest.
Should the agent generate the feature list, or should I?
Both work. For a small project, you write it. For a large project, a dedicated “initializer agent” (the Planner role in Anthropic’s three-agent harness) generates it from a brief, then humans review and lock it before any implementation starts. Either way, the implementation agents only flip the passes field — they never edit the spec while building it.
How does this relate to acceptance criteria in a normal product backlog?
Same idea, more machine-readable. The steps array in a feature list is essentially the acceptance-criteria list from a user story, formatted so the agent can verify it programmatically. If your team already writes good acceptance criteria, translating them into JSON takes about ten minutes per feature.
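As a rough illustration of that translation, the sketch below turns a user story's acceptance criteria into one feature entry. The helper name feature_from_story and its parameters are hypothetical; the output fields follow the schema described earlier in this guide.

```python
import json


def feature_from_story(feature_id: str, summary: str, acceptance_criteria: list,
                       category: str = "functional") -> dict:
    """Build one feature entry; each non-empty criterion becomes a verification step."""
    steps = [c.strip() for c in acceptance_criteria if c.strip()]
    if not steps:
        raise ValueError("a feature needs at least one verification step")
    return {
        "id": feature_id,
        "category": category,
        "description": summary.strip(),
        "steps": steps,
        "dependencies": [],
        "passes": False,  # every new feature starts as failing
    }


entry = feature_from_story(
    "auth-02",
    "User can sign out and the session is invalidated",
    ["POST /logout returns 204", "Subsequent /me request returns 401", "Cookie cleared"],
)
print(json.dumps(entry, indent=2))
```

The helper only reshapes text. Whether each criterion is actually machine-checkable still needs a human read before the list is locked.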
Sources
- Anthropic — Effective harnesses for long-running agents (feature_list.json schema)
- Anthropic — Harness design for long-running application development
- Walking Labs — Learn Harness Engineering (Lecture 8 + Project 04)
- GitHub Spec Kit (spec-driven development)
- Spec Kit documentation
- Addy Osmani — Long-running agents
- Martin Fowler — Harness engineering for coding agent users
- HumanLayer — Skill Issue: Harness Engineering
- InfoQ — Anthropic three-agent harness
- Claude Code — Grokipedia
Last reviewed: May 2026. Harness-engineering patterns are converging fast — verify on the primary sources above before adopting a specific schema for a production project.