The third file in the agent canon

FLYWHEEL.md

How your AI coding agents ship real software, and prove it works.

One Markdown file in your repo. AGENTS.md says what to build. FLYWHEEL.md is the playbook for how it ships: build it, prove it in production, learn, improve, with a human in the loop where it counts.

01 Ship

02 Verify

03 Learn

04 Improve

The file

It's just a Markdown file.

Here is the whole thing. Drop it in your repo root, next to AGENTS.md and SOUL.md, and your agents read it before they touch anything. Copy this starter, then make it yours.

FLYWHEEL.md

# FLYWHEEL.md

How an agent ships and improves this project, turn by turn.
AGENTS.md = what to do. SOUL.md = who to be. FLYWHEEL.md = how to ship.

## The loop
Ship, verify, learn, improve. Each turn compounds.

## The stages (rename, add, or remove to fit your project)
1. Plan. Propose the approach and the blast radius. Gate: a human signs off if it is risky.
2. Build. Small, reversible steps.
3. Review. Diff, tests, data flow. Gate (optional): a human or a second agent reviews.
4. Ship. Merge, release, deploy. Land the whole chain.
5. Verify. Prove it in production, with evidence. A synthetic pass is not proof.
6. Learn. Cost, regressions, feedback. Gate (often): wait for real-world signal.
7. Improve. Fix the cause, raise the bar, delete the toil.

## The bar (holds every stage)
- Done means deployed and verified, with evidence.
- Every iteration costs money.
- Know your data flow.
- Fix the cause, never the symptom.
- Leave a trail.

New to it? Browse example flywheels for CLIs, libraries, services, frontends, and ML projects. Steal it, fork it, no attribution needed.

Why this file exists

Writing code was never the hard part.

Agents can write code all day. The hard part is everything after: does it actually work, in production, for a real person? And can you prove it?

That's the loop: ship → verify → learn → improve. Run it with discipline and software starts improving itself, safely. Run it without and you get confident, untested, unobservable change, at machine speed. The question under every autonomous codebase: what happens when the loop closes without a human in it?

AGENTS.md: what to do (the project's instructions)

SOUL.md: who to be (the agent's identity)

FLYWHEEL.md: how to ship, and how to know you did

The loop

The stages a change travels.

This is what's inside FLYWHEEL.md: the stages a change moves through. Each stage has a finish line, and some have a gate, a point where the agent stops and waits for a human before going on. The seven below are a starting point. Rename, add, or remove them to match how your team ships.

Three things to read here: the stages (the steps a change travels), the gates (where a human stays in control), and the bar (the rules that hold at every stage).

Plan

Restate the goal, propose the approach, name the blast radius.

Done when: the plan and the risks are written down.

Gate: a human signs off on anything risky, irreversible, or ambiguous.

Build

Make the change in small, reversible steps.

Done when: it runs and the diff is self-contained.

Review

Read your own diff, run tests and linters, trace the data flow.

Gate (optional): a human or a second agent reviews before merge.

Ship

Merge, release, deploy. Land the whole chain, not just the merge.

Done when: the change is live where users are.

Verify

Prove it works in production, yourself, with evidence: a screenshot, a real request, real output.

Done when: you have seen it work for real. Passing tests are not the same as proof.
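
One way to make "with evidence" concrete is to save what a real request returned. The sketch below is illustrative, not part of FLYWHEEL.md: the function names `http_fetch` and `verify`, the evidence fields, and the idea of checking for an expected string are all assumptions. It uses only the Python standard library.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Callable


def http_fetch(url: str) -> tuple[int, str]:
    """Make one real GET request and return (status code, body text)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; keep them as evidence too.
        return err.code, err.read().decode("utf-8", errors="replace")


def verify(
    url: str,
    expect: str,
    fetch: Callable[[str], tuple[int, str]] = http_fetch,
) -> dict:
    """Hit the live URL once and return an evidence record (hypothetical shape)."""
    status, body = fetch(url)
    return {
        "url": url,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "status": status,
        "body_excerpt": body[:200],  # real output, trimmed for the record
        "verified": status == 200 and expect in body,
    }
```

Call it against your deployed service, for example `verify("https://your-service.example/health", expect="ok")` (a placeholder URL), and attach the returned record to the change. The `fetch` parameter exists so the function can be exercised without a network, but a stubbed call is a synthetic pass, not proof.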

Learn

Capture what actually happened: cost, regressions, the surprise, user feedback.

Gate (often): wait for real-world signal before the next turn.
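
What a turn learned is only useful if the next turn can read it. Here is a minimal sketch of an append-only trail, assuming a JSON Lines file; the helper names `record_turn` and `read_trail` and the field names are hypothetical, chosen to mirror the list above (cost, regressions, the surprise, user feedback).

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path


def record_turn(log_path, change, cost_usd, regressions, surprise, feedback) -> dict:
    """Append one turn's learnings to the trail as a single JSON line."""
    entry = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "change": change,
        "cost_usd": cost_usd,
        "regressions": regressions,
        "surprise": surprise,
        "feedback": feedback,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def read_trail(log_path) -> list[dict]:
    """Return every recorded turn, oldest first; an absent file is an empty trail."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

An agent could call `read_trail` at the start of a turn and `record_turn` at the end. Appending rather than rewriting keeps earlier entries intact, so the history stays honest.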

Improve

Fix the cause, raise the bar, delete the toil.

Done when: the next turn starts smarter than this one did.

Humans stay in the loop. A flywheel is not "run unattended forever." It says exactly where a human gates a stage and the agent pauses for feedback, then resumes when you reply. A CLI, a model, and a web service each get a different loop and different gates.
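
To see how stages and gates fit together, here is a rough sketch in Python. It is an illustration only: FLYWHEEL.md is plain Markdown and defines no API, so `Stage`, `run_turn`, and the `approve` callback are assumed names. The stage list follows the seven-stage starter.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stage:
    name: str
    gate: Optional[str] = None  # None means the stage has no gate


# The seven starter stages; the gate descriptions paraphrase the starter file.
STAGES = [
    Stage("Plan", gate="a human signs off if it is risky"),
    Stage("Build"),
    Stage("Review", gate="a human or a second agent reviews"),
    Stage("Ship"),
    Stage("Verify"),
    Stage("Learn", gate="wait for real-world signal"),
    Stage("Improve"),
]


def run_turn(
    stages: list[Stage],
    approve: Callable[[Stage], bool],
) -> tuple[list[str], Optional[str]]:
    """Walk the stages in order and pause at the first gate that is not approved.

    Returns the names of the completed stages and the name of the stage the
    turn is paused at (None if the whole turn completed).
    """
    completed: list[str] = []
    for stage in stages:
        if stage.gate is not None and not approve(stage):
            return completed, stage.name  # pause here until a human replies
        completed.append(stage.name)
    return completed, None
```

For example, `run_turn(STAGES, lambda s: s.name != "Learn")` completes Plan through Verify and pauses at Learn. A CLI, a model, and a web service would each pass a different stage list.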

Principles

A few rules that hold every turn.

The loop is the shape of your process. These rules don't change between stages, whatever stages you choose.

Done means deployed and verified, with evidence.

A diagnosis is not a fix. A merge is not a deploy. A deploy is not a verification. Land the whole chain, then say done.

Every iteration costs money.

Each poll, each fetch, each run is spend. Treat request volume like a budget you can blow, because you can.

Know your data flow.

"Works on my machine" is the most expensive lie an agent tells. Trace the path from source to screen before you trust it.

Fix the cause, never the symptom.

No skipping a check to go green. No bypassing the guard to make the error disappear. Find the root, or you'll meet it again.

Leave a trail.

The next turn of the loop shouldn't relearn what this one did. Write down the non-obvious: the gotcha, the constraint, the why.
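
"Treat request volume like a budget" can be taken literally. The sketch below is one possible shape, not something FLYWHEEL.md prescribes: `RequestBudget` and `BudgetExceeded` are hypothetical names, and the limit is whatever your project decides a turn may spend.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a turn tries to spend past its request budget."""


class RequestBudget:
    """Counts requests against a fixed limit for one turn of the loop."""

    def __init__(self, limit: int) -> None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.spent = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.spent

    def spend(self, count: int = 1) -> None:
        """Record `count` requests, or refuse if that would exceed the limit."""
        if self.spent + count > self.limit:
            raise BudgetExceeded(
                f"{self.spent} of {self.limit} spent; cannot spend {count} more"
            )
        self.spent += count
```

Call `budget.spend()` before each poll, fetch, or run. When it raises, the agent stops and reports instead of retrying, which turns a silent cost overrun into a visible decision.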

A loop you can't see is a liability.