The Autonomous Dev Loop

Quality comes from cycles, not heroics.

Design → Code → Review → Post-Merge → Triage → Design

What This Is

An AI that ships finished work — without supervision.

Executive Summary

The goal: An AI that takes an issue from idea to merge-ready PR with no human involvement — and never lets quality slip through the cracks.

The method: Not a single brilliant agent. A system of interlocking loops, each with one job, each checking the others.

The Problem

Most AI coding tools are autocomplete with confidence. They generate code that compiles — and subtly fails in production. No system thinks about what the issue actually asked for. No system measures whether its own reviews change anything.

The Answer

A self-correcting loop. Design documents anchor intent. A dev loop builds and reviews. A post-merge review checks whether the issue was actually solved. Triage keeps the board clean. Each loop feeds the next.

The Philosophy

Think before acting. Run the checks before pushing. Rushing burns more time than patience ever will.

This system is built on one conviction: quality is not a gate at the end — it's a property of the process.

  • Slow is smooth, smooth is fast
    A design doc written up front saves five rebases later. Time spent orienting saves time spent correcting.
  • Two is one and one is none
    Verify everything. One review catches some things. Three independent reviews catch most things. Post-merge review catches the rest.
  • Don't guess. Don't assume.
    Every decision is anchored to a document. Every review is anchored to a commit SHA. Ambiguity compounds into debt.
  • The human's time is sacred
    Everything they see is finished, tested, reviewed, and clean. If it's not done, they don't see it. Assignment is the signal.

Part I

Why design up front — and how it drives everything else.

Design Is Not Optional

Most failures don't happen at code review. They happen earlier — when someone starts writing code without a clear picture of what done looks like.

Without a design document:

  • The PR solves the wrong problem
  • Acceptance criteria are implicit — so they're never checked
  • Review bots find style issues instead of structural ones
  • After merge, there's no way to know if the issue was actually resolved

With a design document:

Example — issue #120 "GitHub PR review support" was filed. Before a line of code was written, a design doc specified: what APIs to call, what the output format must be, what error cases to handle, and what done looks like — 4 explicit acceptance criteria.

The document becomes the contract. Every subsequent loop checks against it.

The Pre-Code Step

Before any issue gets a PR, the pre-code skill runs. It produces a structured plan anchored to the issue.

Issue → Read issue body + comments
      → Identify acceptance criteria (explicit + implied)
      → Research the codebase for context
      → Write a plan: approach, file changes, test strategy
      → Get the plan approved
      → Only then: write code

What the plan locks in:

  • The exact acceptance criteria that will be verified at post-merge review
  • The architectural approach — so review bots know why the code looks the way it does
  • The test strategy — so CI failures have a clear diagnosis path
The plan isn't documentation overhead. It's the input to every downstream loop.
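
As a rough illustration, the plan can be thought of as structured data that every downstream loop consumes. A minimal sketch in Go; the type and field names below are hypothetical, not the skill's actual format:

// Hypothetical sketch of the pre-code plan as structured data.
// Field names are illustrative; the real plan is a design document.
package plan

// AcceptanceCriterion is one explicit "done" condition taken from the issue.
type AcceptanceCriterion struct {
	ID          int    // referenced again by self-review and post-merge review
	Description string
}

// Plan is what the pre-code step produces before any code is written.
type Plan struct {
	IssueNumber  int
	Approach     string                // architectural approach, so reviewers know the why
	FileChanges  []string              // which files the worker is expected to touch
	TestStrategy string                // how CI failures will be diagnosed
	Criteria     []AcceptanceCriterion // verified at self-review and post-merge review
}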

Design as Loop Anchor

The design document doesn't get filed and forgotten. It's referenced at every stage:

During development
The worker reads the plan before writing code. The plan specifies which files to touch — the worker doesn't guess.

During review
Bot reviewers evaluate code against the design intent, not just style. "Does this implementation actually satisfy criterion 3?"

At self-review
The pre-push self-review checks each acceptance criterion explicitly. A clean self-review requires all criteria accounted for.

At post-merge review
After merge, the post-merge skill reads the issue, finds the acceptance criteria, and verifies the PR delivered each one. Anything missed → a new bug issue.

Remove the design document and the post-merge review has nothing to verify against. The loop collapses into vibes.

Part II

The dev loop — and how it self-corrects.

The Dev Loop: Overview

The loop runs on a schedule. Every 10 minutes, it checks state and takes exactly one action. No ambiguity. One rule set. One action per run.

Open PR? → no → Open issues? → no → NO_REPLY
Open PR? → no → Open issues? → yes → Spawn dev worker → PR created
PR open → Active worker? → CI green? → Reviews done? → Findings addressed? → Self-review? → Hand off

The rules are a priority stack, not a checklist:

  1. No open PR → check for open issues → spawn dev worker (pre-code → implement → open PR)
  2. If an active worker is running → stop, don't interfere
  3. If CI is failing → spawn a fix worker
  4. If reviews have unaddressed findings → spawn a fix worker
  5. If self-review is missing → spawn a self-review worker
  6. If everything is clean → apply ready, assign to human

Rule 1 is how new work enters. Without it, the loop only maintains in-flight PRs — it never picks up anything new.
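
A minimal sketch of that priority stack as code. The state fields are simplified stand-ins for what the dispatcher actually reads from the Gitea API, and the function names are hypothetical:

// Hypothetical sketch of the dispatcher's priority stack.
package main

import "fmt"

type LoopState struct {
	OpenPR             bool
	OpenIssues         bool
	ActiveWorker       bool // wip label, recently updated
	CIFailing          bool
	UnaddressedReviews bool
	SelfReviewMissing  bool
}

// decide applies the rules top to bottom and returns exactly one action per run.
func decide(s LoopState) string {
	switch {
	case !s.OpenPR && !s.OpenIssues:
		return "NO_REPLY"
	case !s.OpenPR && s.OpenIssues:
		return "spawn dev worker (pre-code → implement → open PR)"
	case s.ActiveWorker:
		return "NO_REPLY" // don't interfere with a running worker
	case s.CIFailing:
		return "spawn fix worker (failing job + logs)"
	case s.UnaddressedReviews:
		return "spawn fix worker (review findings)"
	case s.SelfReviewMissing:
		return "spawn self-review worker"
	default:
		return "apply ready label, assign to human"
	}
}

func main() {
	fmt.Println(decide(LoopState{OpenPR: true, CIFailing: true}))
}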

Self-Correction: Review Feedback

When a bot posts REQUEST_CHANGES, the dev loop doesn't wait for a human to notice. The next run detects it and responds.

The fix cycle:

gargoyle PR #775 — real run, 2026-05-14
gpt-review-bot posted 2 MAJOR findings against SHA 93d89ba6. Next dev loop run: no fix plan exists for this SHA → add wip label → spawn worker with the findings. Worker addresses both, pushes new commit. Next run: new HEAD, reviews re-triggered. Bots re-review against the new SHA. All APPROVED → self-review worker spawned.

Why SHA-anchoring matters:

Bots review against a specific commit. If the code changes, the old review is stale — even if it said APPROVED. The loop always checks: are these reviews against the current HEAD? A stale APPROVED is treated the same as no review.
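
A sketch of that staleness rule, assuming each review records the commit it was made against. The types and field names are illustrative:

// Hypothetical sketch: a review only counts if it targets the current HEAD.
package main

import "fmt"

type Review struct {
	Bot      string
	State    string // "APPROVED" or "REQUEST_CHANGES"
	CommitID string // the SHA the bot reviewed
}

// approvedAtHead reports whether every bot has an APPROVED review anchored
// to the current HEAD. A stale APPROVED counts the same as no review.
func approvedAtHead(reviews []Review, bots []string, head string) bool {
	latest := map[string]Review{}
	for _, r := range reviews {
		latest[r.Bot] = r // last review per bot wins
	}
	for _, bot := range bots {
		r, ok := latest[bot]
		if !ok || r.CommitID != head || r.State != "APPROVED" {
			return false
		}
	}
	return true
}

func main() {
	reviews := []Review{{Bot: "gpt-review-bot", State: "APPROVED", CommitID: "oldsha"}}
	fmt.Println(approvedAtHead(reviews, []string{"gpt-review-bot"}, "93d89ba6")) // false: stale approval
}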

Self-Correction: CI Failures

CI failures are treated as blocking — not as noise to retry.

The loop's CI protocol:

  • CI pending → wait (don't act on stale state)
  • CI failed → identify which job, spawn a targeted fix worker
  • CI passed, but reviews pending → wait for reviews (don't skip)
  • CI passed, all reviews green → proceed

What makes this work: The fix worker gets the specific failing job and its logs — not just "CI failed." It reads the actual error, diagnoses it, fixes it, and pushes. No shotgun approaches.
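
One way the targeted fix task could be assembled, shown as a sketch. The job and log fields are hypothetical stand-ins for whatever the CI API returns, and the PR number and job name are made up:

// Hypothetical sketch: build a targeted fix task from the failing job,
// not a generic "CI failed" prompt.
package main

import "fmt"

type CIJob struct {
	Name       string
	Failed     bool
	LogExcerpt string // tail of the job log
}

// fixTask describes exactly what the fix worker should look at.
func fixTask(pr int, jobs []CIJob) (string, bool) {
	for _, j := range jobs {
		if j.Failed {
			return fmt.Sprintf(
				"PR #%d: CI job %q failed.\nRelevant log:\n%s\nRead the actual error, diagnose it, fix it, and push.",
				pr, j.Name, j.LogExcerpt), true
		}
	}
	return "", false // nothing failing: no worker spawned
}

func main() {
	task, ok := fixTask(123, []CIJob{{Name: "test", Failed: true, LogExcerpt: "FAIL: TestPostReview ..."}})
	if ok {
		fmt.Println(task)
	}
}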

review-bot — real pattern observed across dozens of runs
When the gpt-review-bot CI job was still in-flight, the dev loop returned NO_REPLY five runs in a row. No premature action. When reviews landed, the loop immediately identified the findings and spawned a worker. Correct behavior both ways.

The Post-Merge Review Loop

After a PR merges, the dev loop's job is done — but the post-merge review loop starts.

The post-merge review runs hourly. It reads each merged PR, finds the linked issue, and checks:

  1. Were all acceptance criteria from the issue actually delivered?
  2. Did the implementation match the approach in the design?
  3. Are there any gaps that would cause silent failures later?

When it finds a gap → it files a new bug issue on Gitea. The issue includes:

  • Which acceptance criterion was missed
  • What was delivered vs. what was required
  • A link back to the original issue and PR

That bug issue then enters the normal issue backlog. The dev loop picks it up on the next cycle — pre-code, implement, review, post-merge review. The gap closes itself.
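
A sketch of how such a bug issue could be assembled from the three pieces the loop already has. The structure is illustrative, not the skill's exact template; the example values are taken from the gargoyle case in the example that follows:

// Hypothetical sketch: turn a post-merge gap into a bug issue body.
package main

import "fmt"

type Gap struct {
	SourceIssue int    // original issue number
	PR          int    // merged PR number
	Criterion   string // the acceptance criterion that was missed
	Required    string // what the issue asked for
	Delivered   string // what the merged diff actually contains
}

// bugIssueBody carries the three things every auto-filed gap issue includes:
// the missed criterion, delivered vs. required, and a link back to the source.
func bugIssueBody(g Gap) string {
	return fmt.Sprintf(
		"Acceptance criterion not met: %s\n\nRequired: %s\nDelivered: %s\n\nSee issue #%d and PR #%d.",
		g.Criterion, g.Required, g.Delivered, g.SourceIssue, g.PR)
}

func main() {
	fmt.Println(bugIssueBody(Gap{
		SourceIssue: 763, PR: 771,
		Criterion: "document chosen approach in design doc",
		Required:  "approach recorded under docs/",
		Delivered: "inline code comments only",
	}))
}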

gargoyle PR #771 — post-merge review
Criterion 2 ("document chosen approach in design doc") not met — only an inline comment, nothing in docs/. → Filed issue #773. Dev loop picked it up. PR #774 delivered the doc. Post-merge review confirmed all criteria satisfied.

The Triage Loop

The triage loop runs every 30 minutes. Its job is observation, not execution.

What it does:

  • Evaluates issues against domain docs: reads design docs and domain knowledge to check whether each issue is correctly and fully specified. If docs resolve ambiguity, requirements are filled in. If the issue conflicts with established behaviour, or docs don't resolve it, adds needs-detail and flags for human — the dev loop will not pick up an unresolved issue
  • Syncs dependency labels: if a blocking issue closes, remove blocked from downstream issues
  • Flags oversized issues: size:L or size:XL without needs-split → add the label
  • Checks PR state: are PRs that are fully reviewed and approved correctly labeled?

What it explicitly does NOT do:

  • Touch PR labels (that's the dev loop's job)
  • Trigger the dev loop (they run independently on their own schedules)
  • Fix anything — it observes, labels, and reports

Why this belongs in triage:
Triage is the only loop with enough context to evaluate intent against the system's documented behaviour. The dev loop is an executor — it shouldn't be resolving ambiguity, it should be implementing clarity. If something is ambiguous, the human decides before any code is written.

Triage is the immune system. It doesn't build anything — it keeps the board honest so every downstream loop always has accurate, complete state to work from.

Evaluating issues against domain docs and regulations requires genuine reasoning — triage runs on Opus with high thinking, not a fast model.

Part III

Why documentation makes the loops work.

Docs Are the Memory

An AI has no persistent state between sessions. Every loop run starts cold. Documentation is the only continuity.

Without docs, every loop run has to re-derive context from the codebase. With docs, each run reads the relevant document and immediately knows:

  • What the system is supposed to do
  • What was decided and why
  • What the current PR is trying to accomplish

Two kinds of docs, two different jobs:

Domain docs

What the system is and why it works the way it does.

Survives rewrites. Stable. Example: "The OrderManager owns placement, tracking, and deduplication of orders. An order exists from intent to fill — the manager is the authority on its state."

Implementation docs

How the current code does it — frameworks, data structures, patterns.

Tied to the implementation. Updated when the approach changes. Example: "Using Process.get/put in init/1 for per-test isolation."

Docs as Loop Fuel

Every loop reads the same documents. What shifts is the perspective — the question being asked of them.

Pre-code

Perspective: What do I need to build?

Issue + codebase + domain docs → produces the design doc and acceptance criteria that anchor everything downstream.

Dev loop

Perspective: Is this implementation correct?

Same docs + PR diff + review comments → evaluates whether the code matches the intent, not just whether it compiles.

Post-merge review

Perspective: Did this deliver what was promised?

Same docs + merged diff → checks reality against the original acceptance criteria, cold, after the fact.

The docs don't change. The question does. That's what makes each loop see something different in the same material.

Project Config: One File, All Loops

Each project has a single YAML config that all loops share:

# memory/projects/review-bot.yaml
repo: rodin/review-bot
gitea_url: https://gitea.weiker.me
api_base: https://gitea.weiker.me/api/v1
patterns_repo: rodin/go-patterns
validation_template: docs/TEMPLATE-FEATURE-VALIDATION.md
assignees: [aweiker]
review_bots: [sonnet-review-bot, gpt-review-bot, security-review-bot]
post_merge_state: memory/state/review-bot-post-merge.json

Why this matters: Every cron job reads the same config. If the repo moves, change one file. If a new reviewer is added, change one file. The loops themselves never need to be touched.

The config is the contract between the operator (Aaron) and the loops.
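
A minimal sketch of how a loop could load that config, assuming the gopkg.in/yaml.v3 package; the struct simply mirrors the fields above:

// Minimal sketch of loading the shared project config.
// Assumes gopkg.in/yaml.v3; field names mirror the YAML above.
package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

type ProjectConfig struct {
	Repo               string   `yaml:"repo"`
	GiteaURL           string   `yaml:"gitea_url"`
	APIBase            string   `yaml:"api_base"`
	PatternsRepo       string   `yaml:"patterns_repo"`
	ValidationTemplate string   `yaml:"validation_template"`
	Assignees          []string `yaml:"assignees"`
	ReviewBots         []string `yaml:"review_bots"`
	PostMergeState     string   `yaml:"post_merge_state"`
}

func loadConfig(path string) (*ProjectConfig, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg ProjectConfig
	if err := yaml.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	cfg, err := loadConfig("memory/projects/review-bot.yaml")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Repo, cfg.ReviewBots)
}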

Skill Files: Logic Lives Here

Cron job prompts are intentionally minimal:

Execute the post-merge-review skill for the review-bot project.
Read ~/.openclaw/workspace/skills/post-merge-review/SKILL.md
and follow it exactly.
Load project config from memory/projects/review-bot.yaml.
If no new PRs were reviewed, respond with exactly NO_REPLY.

The skill file contains the actual logic. Rules, step sequences, error handling, the NO_REPLY contract. This means:

  • Improving the logic = edit the skill file, not 6 cron jobs
  • Adding a new project = copy the config, point at the same skill
  • Debugging = read the skill, not the session transcript
The cron prompt is the trigger. The skill is the brain. Keep them separate.

Review Personas: Narrow Questions Beat Broad Mandates

"Review this code for quality" is not a useful prompt. It spreads attention thin and produces generic feedback. A persona is a reviewer with a specific job, a specific lens, and a specific patterns library.

Personas are configured per project. review-bot has 3; gargoyle has 4:

review-bot (3 reviewers)

sonnet — structural scan, API design, error handling

gpt — breadth scan, gap-finding, compound failure chains

security — input validation, auth boundaries, injection paths

gargoyle (4 reviewers)

elixir-otp — OTP anti-patterns, supervision strategy, process isolation

event-sourcing — event ordering, replay correctness, aggregate boundaries

security — same lens, different codebase context

trading-domain — fill price arithmetic, order state machine, deduplication

The persona set is tailored to the project's risk surface. A Go service needs different eyes than an Elixir trading system.
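
One way to picture a persona in code: a name, a narrow lens, and a patterns library. The struct below is illustrative, not the actual configuration format:

// Hypothetical sketch of a reviewer persona: one job, one lens, one patterns library.
package main

import "fmt"

type Persona struct {
	Name         string   // e.g. "security-review-bot"
	Lens         []string // the narrow questions this reviewer asks
	PatternsRepo string   // authoritative source it cites from
}

func main() {
	gargoylePersonas := []Persona{
		{"elixir-otp", []string{"OTP anti-patterns", "supervision strategy", "process isolation"}, "rodin/elixir-patterns"},
		{"trading-domain", []string{"fill price arithmetic", "order state machine", "deduplication"}, "rodin/trading-patterns"},
	}
	for _, p := range gargoylePersonas {
		fmt.Printf("%s reviews through: %v (grounded in %s)\n", p.Name, p.Lens, p.PatternsRepo)
	}
}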

Pattern Repos: Grounding the Model

LLMs are trained on the whole internet — including every bad Stack Overflow answer, every cargo-culted snippet, every "this works but nobody knows why" pattern. Without something to anchor against, a model confidently generates the most common pattern, which is often not the correct one.

Pattern repos solve this by grounding the reviewer in authoritative source. Not what the internet does — what the actual stdlib, framework, or domain does. Citations point to specific lines in production codebases that have been read, tested, and maintained by experts.

rodin/elixir-patterns
Sourced from elixir-lang/elixir and phoenixframework/phoenix. The reviewer knows what correct OTP looks like because it's reading the OTP source, not a tutorial.

rodin/go-patterns
Sourced from golang/go and kubernetes/kubernetes. Error wrapping, context propagation, interface design — from the engineers who wrote the language.

rodin/security-patterns
Not "here's what secure code looks like" generically — specific attack surfaces, specific mitigations, anchored to real examples.

rodin/trading-patterns
Derived from industry standards and trading regulations. The domain reviewer isn't guessing at trading semantics — it's anchored to how markets actually work, not how the code happens to work today.

The model isn't smarter with pattern repos. It's better constrained. That's more valuable than smarter.

Part IV

How each loop works — in detail.

Dev Loop: Step by Step

1. Read project config
2. Get all open PRs from rodin (the AI author)
3. If no open PRs:
   → Get open issues (unassigned or assigned to rodin, no active PR)
   → If none: NO_REPLY
   → If issues exist: spawn dev worker:
        - Run pre-code skill → write design doc, get criteria
        - Implement in a git worktree on a new branch
        - Push branch, open PR against main
        - Apply wip label, assign to rodin
4. If active worker (wip label, updated < 5 min ago) → NO_REPLY

For the active PR:
5. Check CI status against current HEAD SHA
   → CI pending: NO_REPLY
   → CI failed: spawn fix worker with failing job + logs
6. Check all bot reviews against current HEAD SHA
   → Missing reviews: NO_REPLY (wait for bots)
   → REQUEST_CHANGES with no fix plan: spawn fix worker
7. Check self-review comment for current HEAD SHA
   → Missing: spawn self-review worker
8. All clean:
   → Remove wip label
   → Apply ready label
   → Assign to human
   → Deliver notification

Step 3 is how new work enters the loop. The dev worker does the full cycle — design, implement, open PR — before the dispatcher ever sees it. Steps 4–8 then drive that PR to completion.

Dev Loop: The Dispatcher Pattern

The dev loop is a dispatcher, not a worker. It reads state and makes one decision. The actual work happens in a spawned subagent.

Why split them:

The dispatcher (Haiku)

  • Performs 5–10 API reads
  • Applies priority rules
  • Spawns one worker or returns NO_REPLY
  • Takes 10–30 seconds
  • No tool restrictions needed

The worker (Sonnet)

  • Gets a narrow, specific task
  • Has exec, sessions_spawn, sessions_yield
  • No direct API access — works through code
  • Takes 60–180 seconds
  • Isolated: failure doesn't affect dispatcher state

The dispatcher uses Haiku — cheap and fast for pure API reads. Workers use Sonnet for code reasoning. Right model for the right job.

Post-Merge Review Loop: Step by Step

1. Read project config + state file (lastReviewedMergedAt, reviewedPRs)
2. Fetch recently merged PRs
3. Filter: only PRs merged after lastReviewedMergedAt, not in reviewedPRs
4. If none → NO_REPLY

For each new PR:
5. Read the PR diff (file-by-file)
6. Find the linked issue (from PR body or branch name)
7. Read the issue — extract acceptance criteria
8. Read the design doc / validation template if present
9. For each acceptance criterion:
   → Find evidence in the diff or issue comments
   → Mark: satisfied / partial / missing
10. If any missing/partial → open a bug issue on Gitea
11. Update state file: add PR to reviewedPRs, update lastReviewedMergedAt

The state file is the post-merge review's memory. Without it, every run re-reviews every PR. With it, the loop is incremental — only new merges.
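
A sketch of that state file and the incremental filter, assuming the JSON keys named above; the helper names are hypothetical:

// Sketch of the post-merge review's state file and incremental filter.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

type PostMergeState struct {
	LastReviewedMergedAt time.Time `json:"lastReviewedMergedAt"`
	ReviewedPRs          []int     `json:"reviewedPRs"`
}

type MergedPR struct {
	Number   int
	MergedAt time.Time
}

// newPRs keeps only PRs merged after the watermark and not yet reviewed.
func newPRs(state PostMergeState, merged []MergedPR) []MergedPR {
	seen := map[int]bool{}
	for _, n := range state.ReviewedPRs {
		seen[n] = true
	}
	var out []MergedPR
	for _, pr := range merged {
		if pr.MergedAt.After(state.LastReviewedMergedAt) && !seen[pr.Number] {
			out = append(out, pr)
		}
	}
	return out
}

func main() {
	data, err := os.ReadFile("memory/state/review-bot-post-merge.json")
	if err != nil {
		panic(err)
	}
	var state PostMergeState
	if err := json.Unmarshal(data, &state); err != nil {
		panic(err)
	}
	fmt.Println(len(newPRs(state, nil)), "new PRs to review")
}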

Post-Merge Review Loop: The Gap Pattern

Most PRs have no gaps — the loop returns NO_REPLY in 20 seconds.

When gaps exist, the post-merge review files a precise bug issue:

review-bot issue #84 — auto-filed by post-merge review
PR #83: vcs/util.go not delivered
"Issue #82 acceptance criterion 3 required GetAllFilesInPath and BuildLineToPositionMap in the vcs package. These functions appear in review.go (the old location) but were not extracted to vcs/util.go as specified. The file vcs/util.go does not exist in the merged commit."

gargoyle issue #773 — auto-filed by post-merge review
PR #771: fail-safe approach not documented
"Issue #763 acceptance criterion 2 requires the chosen approach to be documented in the design doc. The fail-safe logic is explained in inline code comments in ingest_bars.ex but is not recorded in any file under docs/. The acceptance criterion is not satisfied."

These are bugs that human reviewers would never catch — because by the time the post-merge review runs, the code is already merged and "done."

Triage Loop: Step by Step

1. Read project config + domain docs (design docs, CLAUDE.md, validation template)
2. Fetch all open issues (excluding blocked/needs-split/needs-detail)
3. For each issue — evaluate against domain docs:
   → Body empty or missing problem statement: add needs-detail
   → Has content but conflicts with domain docs or regulations: add needs-detail,
     comment with the specific conflict
   → Ambiguous — docs don’t resolve how it should work: add needs-detail,
     flag for human decision
   → Requirements clear and consistent with docs: proceed
   (dev loop will not pick up a needs-detail issue)
4. Fetch all open issues with blocked label
5. For each blocked issue:
   → Check if the blocking issue is closed
   → If closed: remove blocked label
6. Fetch all open issues with size:L or size:XL label
7. For each large issue without needs-split:
   → Add needs-split label
8. Fetch all open PRs
9. Check: any PRs from rodin with fully-approved reviews
   that are still labeled wip?
   → Indicates a stale wip lock — report it
10. Nothing changed → NO_REPLY
    Something changed → deliver notification

Step 3 uses the docs as the authority. The domain docs define how the system must behave. Triage checks each issue against that knowledge — if the docs resolve it, requirements are filled in; if they don’t, a human decides. The dev loop never sees an ambiguous issue.
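
A sketch of the dependency-label sync from step 5. It operates purely on already-fetched state; the types are illustrative:

// Hypothetical sketch of triage's blocked-label sync (step 5).
// It only computes label changes; it never touches PRs or spawns workers.
package main

import "fmt"

type Issue struct {
	Number    int
	Labels    []string
	BlockedBy int // issue number this one waits on; 0 if none
}

func hasLabel(i Issue, label string) bool {
	for _, l := range i.Labels {
		if l == label {
			return true
		}
	}
	return false
}

// unblock returns the issues whose blocked label should be removed
// because the issue they were waiting on is now closed.
func unblock(open []Issue, closed map[int]bool) []int {
	var toUnblock []int
	for _, i := range open {
		if hasLabel(i, "blocked") && closed[i.BlockedBy] {
			toUnblock = append(toUnblock, i.Number)
		}
	}
	return toUnblock
}

func main() {
	open := []Issue{{Number: 42, Labels: []string{"blocked"}, BlockedBy: 40}}
	fmt.Println(unblock(open, map[int]bool{40: true})) // [42]
}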

The NO_REPLY Contract

Every loop ends in one of two states:

NO_REPLY

The entire message to the cron system. No output, no notification: the run is silent.

Means: Everything is as expected. The loop ran correctly and found nothing to do. Silent success.

A real message

Delivered to the configured channel. Contains a summary of what changed.

Means: Something happened that a human should know about — a PR was handed off, a bug was filed, a stale lock was detected.

Why this matters: If every loop run generated a notification, the channel would become noise and get ignored. The signal-to-noise ratio has to be 100% — or the human starts skimming, and the important things get missed.

Silence = healthy. Messages = signal.
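
In code, the contract is a single decision at the end of every run. A sketch, with hypothetical names:

// Hypothetical sketch of the NO_REPLY contract at the end of a loop run.
package main

import (
	"fmt"
	"strings"
)

// finish returns either the literal NO_REPLY sentinel or a human-facing summary.
// There is no third state: every run ends in silence or in signal.
func finish(changes []string) string {
	if len(changes) == 0 {
		return "NO_REPLY" // healthy: nothing to report, nothing is delivered
	}
	return "Changed this run:\n- " + strings.Join(changes, "\n- ")
}

func main() {
	fmt.Println(finish(nil))
	fmt.Println(finish([]string{"PR #117 marked ready and assigned", "filed issue #84"}))
}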

Part V

Building review-bot — without a human in the loop.

The Story: review-bot

review-bot is a Go service that reviews PRs on Gitea using AI. It was built almost entirely autonomously — 56 merged PRs across review-bot, 380 across gargoyle, with Rodin doing the full development loop on both.

The challenge: Go code requires operational awareness that AI often misses — org conventions, security instincts, system boundaries. A naive AI generates code that compiles and fails silently.

The solution: Use the loop itself to build the tool that improves the loop.

What ran autonomously:

  • Issue triage and dependency labeling
  • Pre-code design documents
  • PR creation, review, self-review
  • Post-merge reviews
  • Bot review experiments (Sonnet vs GPT-5 vs Opus)

What needed humans:

  • Initial architecture decisions
  • Merging approved PRs
  • Occasional clarification on intent
  • Security-sensitive design choices

Real Example: The CommitID Chain

A single issue spawned a chain of work that demonstrates every loop.

Issue #114: "Thread CommitID through the abstraction layer"

Step 1 — Pre-code
Plan produced: add CommitID to vcs.ReviewRequest, thread through gitea.Adapter, use as primary anchor in github.Client.PostReview, wire in main.go. 4 acceptance criteria documented.

Step 2 — Dev loop
PR #117 created. gpt-review-bot flagged 2 findings. Worker fixed them, pushed. Bots re-reviewed, all APPROVED. Self-review spawned, passed. Ready label applied, assigned to Aaron.

Step 3 — Post-merge review
After merge: post-merge review read issue #114, checked all 4 criteria against the merged diff. All satisfied — CommitID added, threaded through all layers, tests cover all new behaviors. No issues filed.

Real Example: The Pagination Gap

Issue #116: "Fix duplicate declaration build error in github package"

PR #119 — what was delivered
review.go and identity.go deleted, consolidated into reviews.go. Build error fixed. But during implementation, the developer also added pagination to ListReviews — not in the original issue scope.

Post-merge review — what was found
Post-merge review verified criterion 1 (build error fixed ✓) and criterion 2 (duplicate declaration removed ✓). The pagination addition was a bonus — correctly noted as an enhancement, not a gap. No issues filed. Clean.

Triage — what happened next
PR #119 landed with wip label and full bot approvals. Triage detected the approved-but-wip state and reported it. Dev loop cleaned up the label on the next run.

Three loops, three different jobs, all triggered by one merge.

Real Example: The Missing vcs/util.go

Issue #82: "Extract shared VCS utilities into vcs package"

PR #83 — what was merged
PR delivered vcs interfaces, types, and most of the specified content. The code compiled. CI passed. Bot reviewers approved.

Post-merge review — what was found
Post-merge review read acceptance criterion 3: "GetAllFilesInPath and BuildLineToPositionMap must be in vcs/util.go." Checked the diff. vcs/util.go does not exist. Filed issue #84: "vcs/util.go not delivered."

Also found: vcs.ContentEntry and vcs.GiteaClient should have been deleted per criterion 4. They weren't. Filed issue #85.

Also found: 5 required interface methods missing from the vcs package. Filed issue #86.

Three bugs, zero human reviewers involved. The PR was merged and "done" — the post-merge review found the gaps.

The Pattern That Emerges

Looking across 436 merged PRs (380 gargoyle + 56 review-bot) in 3 weeks of autonomous operation:

The post-merge review catches what pre-merge review misses.
Review happens when code is fresh and the reviewer is primed by the PR description. Post-merge review happens cold, against the issue — it's structurally harder to miss things the issue asked for.

The loop amplifies quality over time.
Each issue filed by the post-merge review enters the dev loop. The loop fixes it. The post-merge review checks the fix. Quality compounds.

Silence is the majority state.
Most loop runs return NO_REPLY. The system is healthy most of the time. When something shows up, it's real.

The human becomes the merge gate.
Not the reviewer, not the debugger, not the scheduler. Aaron merges approved PRs. That's it. Everything else is handled.

The goal was never to replace the human. It was to make the human's time count.

Summary

Design first — acceptance criteria become the contract every downstream loop checks against.

Dev loop — runs on a priority stack, self-corrects on failures, hands off only when clean.

Post-merge review — verifies intent after merge, when it's hardest to rationalize gaps away.

Triage — keeps the board honest so the loops always have accurate state.

Docs — the only continuity an AI has between sessions. Remove them and the loops go blind.

NO_REPLY — the sound of a healthy system. Signal means something real happened.

Quality comes from cycles, not heroics.

One More Thing

The full system is documented at
github.com/Rodin-AI/how-i-work