How your AI coding agents ship real software, and prove it works.
One Markdown file in your repo. AGENTS.md says what to build. FLYWHEEL.md is the playbook for how it ships: build it, prove it in production, learn, improve, with a human in the loop where it counts.
Here is the whole thing. Drop it in your repo root, next to AGENTS.md and SOUL.md, and your agents read it before they touch anything. Copy this starter, then make it yours.
# FLYWHEEL.md

How an agent ships and improves this project, turn by turn.
AGENTS.md = what to do. SOUL.md = who to be. FLYWHEEL.md = how to ship.

## The loop

Ship, verify, learn, improve. Each turn compounds.

## The stages (rename, add, or remove to fit your project)

1. Plan. Propose the approach and the blast radius. Gate: a human signs off if it is risky.
2. Build. Small, reversible steps.
3. Review. Diff, tests, data flow. Gate (optional): a human or a second agent reviews.
4. Ship. Merge, release, deploy. Land the whole chain.
5. Verify. Prove it in production, with evidence. A synthetic pass is not proof.
6. Learn. Cost, regressions, feedback. Gate (often): wait for real-world signal.
7. Improve. Fix the cause, raise the bar, delete the toil.

## The bar (holds every stage)

- Done means deployed and verified, with evidence.
- Every iteration costs money.
- Know your data flow.
- Fix the cause, never the symptom.
- Leave a trail.
New to it? Browse example flywheels for CLIs, libraries, services, frontends, and ML projects. Steal it, fork it, no attribution needed.
Agents can write code all day. The hard part is everything after: does it actually work, in production, for a real person? And can you prove it?
That's the loop: ship → verify → learn → improve. Run it with discipline and software starts improving itself, safely. Run it without discipline and you get confident, untested, unobservable change, at machine speed. The question under every autonomous codebase: what happens when the loop closes without a human in it?
This is what's inside FLYWHEEL.md: the stages a change moves through. Each stage has a finish line, and some have a gate, a point where the agent stops and waits for a human before going on. The seven below are a starting point. Rename, add, or remove them to match how your team ships.
Three things to read here: the stages (the steps a change travels), the gates (where a human stays in control), and the bar (the rules that hold at every stage).
1. Plan. Restate the goal, propose the approach, name the blast radius.
2. Build. Make the change in small, reversible steps.
3. Review. Read your own diff, run tests and linters, trace the data flow.
4. Ship. Merge, release, deploy. Land the whole chain, not just the merge.
5. Verify. Prove it works in production yourself, with evidence: a screenshot, a real request, real output.
6. Learn. Capture what actually happened: cost, regressions, surprises, user feedback.
7. Improve. Fix the cause, raise the bar, delete the toil.
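One way to picture the Verify stage, and the rule that done means deployed and verified with evidence, is a small check an agent could run before it calls a change finished. This is only a sketch: the `Evidence` and `Change` types, the `is_done` helper, and the evidence kind names are hypothetical and are not part of FLYWHEEL.md itself. The kinds are modeled on the examples given for Verify (a screenshot, a real request, real output).

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical evidence kinds, modeled on the Verify examples above.
# Anything else (for example a synthetic test pass) does not count as proof.
PRODUCTION_EVIDENCE = {"screenshot", "real_request", "real_output"}


@dataclass
class Evidence:
    kind: str       # e.g. "screenshot", "real_request", "synthetic_test"
    reference: str  # where a reviewer can find it: a path, a link, a log line


@dataclass
class Change:
    title: str
    deployed: bool = False
    evidence: List[Evidence] = field(default_factory=list)


def is_done(change: Change) -> bool:
    """A change is done only if it is deployed AND has production evidence."""
    has_proof = any(
        e.kind in PRODUCTION_EVIDENCE and bool(e.reference)
        for e in change.evidence
    )
    return change.deployed and has_proof
```

Under these assumptions, a deployed change whose only evidence is a synthetic test pass is still not done. It becomes done once a real request or similar production evidence is attached.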
Read the full FLYWHEEL.md on GitHub →
Humans stay in the loop. A flywheel is not "run unattended forever." It says exactly where a human gates a stage and the agent pauses for feedback, then resumes when you reply. A CLI, a model, and a web service each get a different loop and different gates.
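The pause-and-resume behavior of a gate can be sketched as a loop that stops at the first gated stage a human has not yet approved, then picks up from that point after the reply. The `Stage` type, the `run` function, and the choice to gate Plan and Learn are assumptions made for illustration. Your own FLYWHEEL.md decides which stages carry gates.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    gated: bool = False  # True: the agent pauses here until a human approves


# Illustrative gate placement only (Plan and Learn); adjust to your project.
STAGES = [
    Stage("Plan", gated=True),
    Stage("Build"),
    Stage("Review"),
    Stage("Ship"),
    Stage("Verify"),
    Stage("Learn", gated=True),
    Stage("Improve"),
]


def run(
    stages: List[Stage], approved: Set[str], start: int = 0
) -> Tuple[List[str], Optional[int]]:
    """Advance from index `start`.

    Returns the names of the stages completed in this call, plus the index
    of the gate the agent is paused at (None if the loop ran to the end).
    """
    completed: List[str] = []
    for i in range(start, len(stages)):
        stage = stages[i]
        if stage.gated and stage.name not in approved:
            return completed, i  # pause: waiting for a human reply
        completed.append(stage.name)
    return completed, None
```

With no approvals, `run(STAGES, set())` pauses at index 0 (Plan) with nothing completed. Calling it again with `approved={"Plan"}` and `start=0` completes Plan through Verify and pauses at Learn, which matches the idea that the agent resumes when you reply.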
The loop is the shape of your process. The bar is the set of rules that hold at every stage and don't change, whatever stages you choose.
Self-improving software only works if you can watch it work. ClawMetry is the observability layer for agents in the loop: every iteration, every cost, every change, in real time.