The third file in the agent canon

FLYWHEEL.md

How your AI coding agents ship real software, and prove it works.

One Markdown file in your repo. AGENTS.md says what to build. FLYWHEEL.md is the playbook for how it ships: build it, prove it in production, learn, improve, with a human in the loop where it counts.

01 Ship → 02 Verify → 03 Learn → 04 Improve
The file

It's just a Markdown file.

Here is the whole thing. Drop it in your repo root, next to AGENTS.md and SOUL.md, and your agents read it before they touch anything. Copy this starter, then make it yours.

FLYWHEEL.md
# FLYWHEEL.md

How an agent ships and improves this project, turn by turn.
AGENTS.md = what to do. SOUL.md = who to be. FLYWHEEL.md = how to ship.

## The loop
Ship, verify, learn, improve. Each turn compounds.

## The stages (rename, add, or remove to fit your project)
1. Plan. Propose the approach and the blast radius. Gate: a human signs off if it is risky.
2. Build. Small, reversible steps.
3. Review. Diff, tests, data flow. Gate (optional): a human or a second agent reviews.
4. Ship. Merge, release, deploy. Land the whole chain.
5. Verify. Prove it in production, with evidence. A synthetic pass is not proof.
6. Learn. Cost, regressions, feedback. Gate (often): wait for real-world signal.
7. Improve. Fix the cause, raise the bar, delete the toil.

## The bar (holds every stage)
- Done means deployed and verified, with evidence.
- Every iteration costs money.
- Know your data flow.
- Fix the cause, never the symptom.
- Leave a trail.

New to it? Browse example flywheels for CLIs, libraries, services, frontends, and ML projects. Steal it, fork it, no attribution needed.
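Because the starter is plain Markdown, a harness can read it mechanically before a turn begins. Below is a minimal Python sketch, not part of any published tool. It assumes the stages stay a numbered list under a heading that starts with `## The stages`, that each line has the form `N. Name. ...`, and that a gate is marked `Gate:` or `Gate (qualifier):`. `Stage` and `parse_stages` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches a stage line such as "1. Plan. Propose the approach ..."
STAGE_LINE = re.compile(r"^(\d+)\.\s+(\w+)\.\s*(.*)$")
# Matches "Gate:" or "Gate (optional):" inside the rest of the stage line
GATE = re.compile(r"Gate(?:\s*\((\w+)\))?:")


@dataclass
class Stage:
    number: int
    name: str
    gate: Optional[str]  # "required", "optional", "often", or None when ungated


def parse_stages(markdown: str) -> List[Stage]:
    """Collect the numbered stages listed under the '## The stages' heading."""
    stages: List[Stage] = []
    in_stages = False
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Only the stages section is parsed; every other section is skipped.
            in_stages = line.startswith("## The stages")
            continue
        match = STAGE_LINE.match(line) if in_stages else None
        if match is None:
            continue
        gate = GATE.search(match.group(3))
        # A bare "Gate:" (no qualifier) is treated as a sign-off that always applies.
        kind = (gate.group(1) or "required") if gate else None
        stages.append(Stage(int(match.group(1)), match.group(2), kind))
    return stages
```

A harness could call `parse_stages` on the contents of FLYWHEEL.md and refuse to move past any stage whose `gate` is `"required"` without a human reply. If you rename or reshape the stages section, the two regular expressions are the part to adjust.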

Why this file exists

Writing code was never the hard part.

Agents can write code all day. The hard part is everything after: does it actually work, in production, for a real person? And can you prove it?

That's the loop: ship → verify → learn → improve. Run it with discipline and software starts improving itself, safely. Run it without discipline and you get confident, untested, unobservable change at machine speed. The question under every autonomous codebase: what happens when the loop closes without a human in it?

AGENTS.md: what to do (the project's instructions)
SOUL.md: who to be (the agent's identity)
FLYWHEEL.md: how to ship, and how to know you did

The loop

The stages a change travels.

This is what's inside FLYWHEEL.md: the stages a change moves through. Each stage has a finish line, and some also have a gate: a point where the agent stops and waits for a human before going on. The seven below are a starting point. Rename, add, or remove them to match how your team ships.

Three things to read here: the stages (the steps a change travels), the gates (where a human stays in control), and the bar (the rules that hold at every stage).

01 Plan

Restate the goal, propose the approach, name the blast radius.

Done when: the plan and the risks are written down.
Gate: a human signs off on anything risky, irreversible, or ambiguous.

02 Build

Make the change in small, reversible steps.

Done when: it runs and the diff is self-contained.

03 Review

Read your own diff, run tests and linters, trace the data flow.

Gate (optional): a human or a second agent reviews before merge.

04 Ship

Merge, release, deploy. Land the whole chain, not just the merge.

Done when: the change is live where users are.

05 Verify

Prove it works in production yourself, with evidence: a screenshot, a real request, real output.

Done when: you have seen it work for real. Passing tests are not the same as proof.

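For a web service, "a real request" might look like the hedged sketch below: fetch the live URL, check for a marker that only the new version serves, and keep a fingerprint of the response as evidence. The `expect` marker (for example a version string) is an assumption about your app, and `verify_live`, `leave_trail`, and `verify-log.jsonl` are hypothetical names. The `fetch` parameter exists only so the function can be unit-tested; a stubbed fetch is exactly the synthetic pass that does not count as proof.

```python
import hashlib
import json
import time
import urllib.request
from dataclasses import asdict, dataclass
from typing import Callable, Tuple


class VerificationFailed(Exception):
    """The live response did not show the change working."""


@dataclass
class Evidence:
    url: str
    status: int
    checked_at: float   # Unix timestamp of the check
    body_sha256: str    # fingerprint of exactly what production served
    body_excerpt: str   # first 200 characters, for a human to read


def http_get(url: str) -> Tuple[int, str]:
    """Make a real request; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def verify_live(url: str, expect: str,
                fetch: Callable[[str], Tuple[int, str]] = http_get) -> Evidence:
    """Check the deployed URL for the expected content and return evidence."""
    status, body = fetch(url)
    if not 200 <= status < 300 or expect not in body:
        raise VerificationFailed(f"{url}: status {status}, {expect!r} not found")
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return Evidence(url, status, time.time(), digest, body[:200])


def leave_trail(evidence: Evidence, path: str = "verify-log.jsonl") -> None:
    """Append the evidence as one JSON line so the next turn can audit it."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(evidence)) + "\n")
```

After a deploy, the agent would call `verify_live` against the production URL with the default `http_get` and pass the result to `leave_trail`. A CLI or library needs a different real-world check (for example, installing the released artifact and running it); this sketch covers only the HTTP case.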
06 Learn

Capture what actually happened: cost, regressions, anything surprising, user feedback.

Gate (often): wait for real-world signal before the next turn.

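One way to capture the Learn stage, sketched under an assumption: a JSON Lines file (the hypothetical `flywheel-log.jsonl`) that gets one appended record per turn. The field names are illustrative and mirror the stage's list: cost, regressions, surprises, and user feedback. Appending instead of rewriting also serves the bar's "leave a trail" rule.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TurnRecord:
    turn: int
    cost_usd: float  # what this iteration cost (compute, model calls, and so on)
    regressions: List[str] = field(default_factory=list)
    surprises: List[str] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)


def record_turn(rec: TurnRecord, path: str = "flywheel-log.jsonl") -> None:
    """Append one turn to the log; earlier turns are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(rec)) + "\n")


def load_turns(path: str = "flywheel-log.jsonl") -> List[TurnRecord]:
    """Read every recorded turn back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [TurnRecord(**json.loads(line)) for line in f if line.strip()]


def summarize(turns: List[TurnRecord]) -> dict:
    """The numbers the Improve stage starts from."""
    return {
        "turns": len(turns),
        "total_cost_usd": sum(t.cost_usd for t in turns),
        "regressions": [r for t in turns for r in t.regressions],
    }
```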
07 Improve

Fix the cause, raise the bar, delete the toil.

Done when: the next turn starts smarter than this one did.

Read the full FLYWHEEL.md on GitHub.

Humans stay in the loop. A flywheel is not "run unattended forever." It states exactly which stages a human gates: at each one the agent pauses for feedback, then resumes when you reply. A CLI, a model, and a web service each get a different loop with different gates.
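A minimal sketch of how a gate can pause a turn, assuming a hypothetical harness in which `do_stage` performs the agent's work and `ask_human` blocks until a person replies. Which gate kinds actually stop the turn is a per-project choice, expressed here by the illustrative `enforce` parameter: by default only bare `Gate:` stages wait, while `optional` and `often` gates wait only if you list them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Stage:
    name: str
    gate: Optional[str] = None  # None, "required", "optional", or "often"


def run_turn(
    stages: Iterable[Stage],
    do_stage: Callable[[Stage], str],
    ask_human: Callable[[Stage, str], str],
    enforce: Tuple[str, ...] = ("required",),
) -> List[str]:
    """Run one turn of the loop, pausing at each enforced gate for a human reply."""
    log: List[str] = []
    for stage in stages:
        result = do_stage(stage)  # the agent's work for this stage
        log.append(f"{stage.name}: {result}")
        if stage.gate in enforce:
            # The agent stops here; ask_human returns only once a person replies.
            reply = ask_human(stage, result)
            log.append(f"{stage.name} gate: {reply}")
            if reply != "approve":
                log.append("halted")  # no sign-off, no next stage
                break
    return log


def ask_on_terminal(stage: Stage, result: str) -> str:
    """The simplest blocking gate: type 'approve' to let the turn resume."""
    answer = input(f"[{stage.name}] {result}\nType 'approve' to continue: ")
    return answer.strip().lower()
```

`ask_on_terminal` is only the simplest reply channel; a real harness might post to chat or a pull request and resume the turn when the reply arrives. Passing `enforce=("required", "optional", "often")` makes every gated stage wait.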

Principles

A few rules that hold every turn.

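The loop is the shape of your process; the bar is what holds at every stage. These rules don't change, whatever stages you choose: done means deployed and verified, with evidence; every iteration costs money; know your data flow; fix the cause, never the symptom; leave a trail.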
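The bar's first rule, "done means deployed and verified, with evidence," can be made mechanical. This is a hedged sketch with hypothetical names (`Change`, `missing_for_done`, `is_done`). It assumes a change travels merge, release, deploy, as in the Ship stage, and counts it verified only when at least one piece of evidence is attached. Passing tests are recorded but deliberately never sufficient. A library that never deploys would drop that link from the chain.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    merged: bool = False
    released: bool = False
    deployed: bool = False
    tests_passed: bool = False  # recorded, but not enough on its own
    evidence: List[str] = field(default_factory=list)  # screenshots, request logs, output


def missing_for_done(change: Change) -> List[str]:
    """List what still stands between this change and 'done'."""
    missing = [step for step in ("merged", "released", "deployed")
               if not getattr(change, step)]
    if not change.evidence:
        # Green tests do not count as verification; only real-world evidence does.
        missing.append("verified with evidence")
    return missing


def is_done(change: Change) -> bool:
    """Done only when the whole chain landed and someone saw it work."""
    return not missing_for_done(change)
```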

The other half

A loop you can't see is a liability.

Self-improving software only works if you can watch it work. ClawMetry is the observability layer for agents in the loop: every iteration, every cost, every change, in real time.

The loop is the discipline. The proof is the product.