Notebook 08 — The Coding Workflow¶

Premise: The seven preceding notebooks taught the pieces — the loop, the prompts, the skills, the identity, the audit. This notebook teaches the workflow that uses them to build software the way you'd want a senior engineer at your lab to build it.

It is the same pattern, fractal:

Plan · Do · Validate — at the workflow level, and at the skill level inside every step.

We walk through the seven phases — steering → brainstorm → build → deepen → specify → implement → review — read the actual artifacts produced at each stage, and unpack why each phase exists. The principled-coder identity governs the entire chain.

By the end you will have:

Read the principled-coder identity that governs every phase
Walked through real artifacts from a shipped feature (Kimi provider, SPEC-001)
Mapped each command to a plan/do/validate role
Seen the same fractal pattern at workflow level and skill level
Understood why heavy plan + cheap do works better than the reverse

Setup¶

In [ ]:

from pathlib import Path
from rich import print
from rich.panel import Panel
from rich.console import Console
from rich.table import Table
from rich.markdown import Markdown

console = Console()
MIRROR = Path('../.claude').resolve()

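def show(path: str, *, title: str | None = None, head: int | None = None) -> None:
    """Render a file from the .claude mirror as Markdown inside a panel.

    `head` limits the output to the first N lines of the file.
    """
    p = MIRROR / path
    # Read as UTF-8 explicitly so rendering does not depend on the platform default.
    body = p.read_text(encoding='utf-8')
    if head is not None:
        body = '\n'.join(body.splitlines()[:head])
    console.print(Panel(Markdown(body), title=title or path, border_style='cyan'))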

1. The identity that governs everything¶

Every command in the chain — /brainstorm, /build, /deepen, /specify, /implement, /review — references the same identity: the principled-coder. Recall notebook 05: an identity is a version-controlled file that produces aligned decisions across many agents and many roles. This is that pattern, applied to coding.

Read the four pillars. Note the explicit ordering used when they conflict (simplicity, then modularity, then security, then scalability) and what each pillar overrides: simplicity wins over cleverness, modularity wins over monoliths, security wins over convenience, scalability wins over "fast enough."

In [ ]:

show('agents/principled-coder.md', title='principled-coder identity', head=70)

Every phase below applies these pillars in that order. A brainstorm that violates simplicity gets pushed back. A spec that leaks logic across modules gets rewritten. An implementation without audit hooks fails review. The identity is the rubric.

2. The seven phases¶

#	Phase	Role	Output
0	`/create-steering-docs`	One-time project context	`product.md`, `tech.md`, `structure.md`, `roadmap.md`
1	`/brainstorm`	Plan — WHY	`.claude/brainstorms/<feature>.md`
2	`/build`	Plan — WHAT (decisions)	entry in `.claude/decisions-log.md`
3	`/deepen`	Plan — research enrichment	enriched build doc
4	`/specify`	Plan — HOW (concrete)	`PRD.md`, `SDD.md`, `PLAN.md`
5	`/implement`	Do	code + tests
6	`/review`	Validate	ADRs + monitoring plan + memory promotion

Five phases of plan, one phase of do, one phase of validate. That ratio is on purpose. Most teams flip it (one rushed plan meeting, then five sprints of building, then chaos). The premise here: 80% of the work happens before any line of production code is typed.

Phase 0 — Steering: one-time persistent context¶

Before any feature, you write four documents that don't change much. They become the lens through which every future feature gets evaluated.

product.md — vision, personas, success metrics, business constraints
tech.md — stack, build commands, quality thresholds
structure.md — modules, directories, dependency rules
roadmap.md — phase, themes, what's in scope this quarter

Without steering, every new feature re-litigates the same questions ("who's this for?" "what stack?" "what counts as done?"). With steering, the agent can read the docs once and stop asking.
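A quick presence check can confirm the steering set is complete before a feature starts. The sketch below assumes the four files sit together in one directory; the `missing_steering_docs` helper and its `steering_dir` argument are hypothetical names for illustration, not part of the command set.

```python
from pathlib import Path

# The four steering documents named above.
STEERING_DOCS = ("product.md", "tech.md", "structure.md", "roadmap.md")


def missing_steering_docs(steering_dir: Path) -> list[str]:
    """Return the steering docs that are absent or empty in steering_dir."""
    missing = []
    for name in STEERING_DOCS:
        doc = steering_dir / name
        # An empty file gives the agent no context, so treat it as missing.
        if not doc.is_file() or doc.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list means every document exists and has content; anything else is the list of files to write before running `/brainstorm`.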

In [ ]:

show('examples/steering/product.md', title='Real example: arc/.claude/steering/product.md', head=40)

Phase 1 — `/brainstorm`: WHY (plan, layer 1)¶

Brainstorm explores the reason for building. It does NOT make design decisions — that's /build. The output is a structured exploration: inspiration, audience, use cases, desired outcomes, guiding principles.

Why first? If you skip this, you build a technically correct thing nobody wanted. The cheapest mistake to fix is the one caught before any architecture is drawn.

In [ ]:

show('commands/brainstorm.md', title='/brainstorm command', head=30)
print()
show('examples/brainstorms/2026-04-27-nlit-demo-local-build.md', title='Real brainstorm: NLIT 2026 local-first demo', head=40)

Plan/Do/Validate inside /brainstorm:

Plan: assess clarity — is brainstorming even needed?
Do: collaborative dialogue, one question at a time (inspiration → audience → use cases → outcomes → principles)
Validate: scope check — what's IN, what's OUT, what stays for later

Phase 2 — `/build`: WHAT (plan, layer 2 — decisions)¶

/build walks through every design decision a feature needs, one at a time, with tradeoffs. Each option is ranked by the principled-coder pillars (Simplicity → Modularity → Security → Scalability). When a federal mandate (NIST, FedRAMP, CMMC) dictates exactly one answer, the build skill auto-applies the decision and logs the citation rather than asking.

The output: an entry in decisions-log.md — a flat table of every decision, the chosen option, and the tier-specific notes. This is the artifact the spec consumes next.

In [ ]:

show('commands/build.md', title='/build command', head=40)

Plan/Do/Validate inside /build:

Plan: present the decision + ranked options with tradeoffs
Do: ask the user (or auto-apply if federally mandated)
Validate: log decision with reasoning + tier impact, repeat for next decision

Phase 3 — `/deepen`: research enrichment (plan, layer 3)¶

Now that decisions are made, /deepen spawns parallel research agents to enrich each decision with best practices, edge cases, past solutions, and external documentation. This is where the 80/20 split becomes real — heavy research before any code, in parallel, in minutes.

Critically, /deepen searches .claude/solutions/ — your team's archive of previously-solved problems. Solve once, apply forever.
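How the archive is searched is not shown in this notebook. As a rough mental model only, a naive keyword scan over the markdown files would look like the sketch below; the `search_solutions` function and its scoring rule are assumptions, and the real command may work quite differently.

```python
from pathlib import Path


def search_solutions(archive: Path, keywords: list[str]) -> list[tuple[Path, int]]:
    """Rank archived solution files by how many of the keywords they mention."""
    hits = []
    for doc in sorted(archive.rglob("*.md")):
        text = doc.read_text(encoding="utf-8").lower()
        score = sum(1 for k in keywords if k.lower() in text)
        if score:
            hits.append((doc, score))
    # Highest score first; the sort is stable, so ties keep path order.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Called as `search_solutions(Path('.claude/solutions'), ['retry', 'timeout'])`, it returns the files most worth reading before new research starts.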

In [ ]:

show('commands/deepen.md', title='/deepen command', head=30)

Plan/Do/Validate inside /deepen:

Plan: parse the build doc into sections, match each to a research agent
Do: spawn parallel agents — best-practices researcher, edge-case finder, solutions-archive searcher, doc-fetcher
Validate: synthesize into the build doc, flagging conflicts between sources
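The fan-out and synthesis steps above can be sketched with a thread pool. Everything here is a simplified stand-in: the `Agent` type is a plain callable rather than a real research agent, every agent runs on every section instead of being matched to one, and a "conflict" is just two agents returning different text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stand-in: an agent takes a section title and returns a recommendation.
Agent = Callable[[str], str]


def deepen(sections: list[str], agents: dict[str, Agent]) -> dict[str, dict]:
    """Run every agent on every section in parallel, then merge results per section."""
    jobs = [(section, name) for section in sections for name in agents]
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so results line up with jobs.
        results = list(pool.map(lambda job: agents[job[1]](job[0]), jobs))

    enriched: dict[str, dict] = {s: {"findings": {}, "conflict": False} for s in sections}
    for (section, name), finding in zip(jobs, results):
        enriched[section]["findings"][name] = finding
    for entry in enriched.values():
        # Flag the section when the agents disagree, so a human resolves it.
        entry["conflict"] = len(set(entry["findings"].values())) > 1
    return enriched
```

The important part is the last loop: disagreement between sources is surfaced in the build doc rather than silently resolved.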

Phase 4 — `/specify`: HOW (plan, layer 4 — concrete tasks)¶

/specify converts decisions + research into three artifacts:

PRD.md — Product Requirements: what the feature does, for whom, success criteria, what's out of scope.
SDD.md — Solution Design: architecture, components, contracts, module boundaries.
PLAN.md — Implementation Plan: ordered tasks with checkboxes, file locations, acceptance per task.

The three docs are linked by traceability: every PRD requirement maps to an SDD component, which maps to a PLAN task. If a task can't be traced back to a requirement, it doesn't exist.

In [ ]:

show('examples/spec-kimi/PRD.md', title='Real PRD: Kimi provider', head=30)
print()
show('examples/spec-kimi/SDD.md', title='Real SDD: Kimi provider', head=30)
print()
show('examples/spec-kimi/PLAN.md', title='Real PLAN: Kimi provider', head=30)

Note the discipline: PRD lists requirements with IDs (FR-1, FR-2, NFR-1...). SDD references those IDs. PLAN references the SDD components. The chain is auditable end-to-end.
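The audit itself is mechanical once the IDs are extracted. The sketch below assumes the three documents have already been parsed into plain sets and dicts; the `trace_gaps` function, its argument shapes, and the example IDs and component names are hypothetical and are not taken from the Kimi spec.

```python
def trace_gaps(
    requirements: set[str],
    components: dict[str, set[str]],  # SDD component -> requirement IDs it satisfies
    tasks: dict[str, str],            # PLAN task -> SDD component it implements
) -> dict[str, set[str]]:
    """Report every break in the PRD -> SDD -> PLAN chain."""
    covered = set().union(*components.values())
    planned = set(tasks.values())
    return {
        "requirements_without_component": requirements - covered,
        "components_without_task": set(components) - planned,
        "tasks_without_component": {t for t, c in tasks.items() if c not in components},
        "unknown_requirement_ids": covered - requirements,
    }


gaps = trace_gaps(
    requirements={"FR-1", "FR-2", "NFR-1"},
    components={"adapter": {"FR-1", "FR-2"}, "config": {"FR-9"}},
    tasks={"T1": "adapter", "T2": "ghost"},
)
```

A fully traceable spec returns four empty sets. In this example `NFR-1` has no component, `config` has no task, `T2` points at a component that does not exist, and `FR-9` is cited but never defined.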

Plan/Do/Validate inside /specify:

Plan: validate intake (clarity, completeness), detect feature type (API/UI/DB/Integration)
Do: generate PRD → SDD → PLAN, each consuming the previous
Validate: 3-Cs check — Completeness, Consistency, Correctness — and confidence-based routing (fast-track simple features, phase-gates for complex)

Phase 5 — `/implement`: Do¶

Now — and only now — code gets written. /implement reads the PLAN, categorizes tasks by specialist (test-impl, backend, db, etc.), spawns parallel agent swarms, and runs TDD per task.

TDD is the Plan/Do/Validate inside /implement:

Plan: write the failing test that captures the desired behavior
Do: write the minimum code to pass it
Validate: run the test (and the rest of the suite); if it fails, fix the implementation, not the test

The implement command also applies the four pillars per task: every spawned specialist inherits the principled-coder identity and refuses code that violates simplicity / modularity / security / scalability — even if the test passes.

In [ ]:

show('commands/implement.md', title='/implement command', head=40)

Phase 6 — `/review`: Validate¶

Implementation that passes its own tests has not been reviewed. /review spawns a separate swarm with no investment in the code that was just written:

security-engineer (OWASP, secrets, trust boundaries)
architect-reviewer (boundaries, SDD compliance)
code-reviewer (DRY, SOLID, readability)
code-simplifier (can this be smaller?)
coverage-analyzer (test gaps by business impact)
performance-engineer (N+1, memory, hot paths)

A PR fails review if it violates any pillar — even if the tests pass. Review then produces:

ADRs — architecture decision records, signed and dated, captured for posterity
Monitoring plan — what to watch in prod after deploy
Memory promotion — extracts globally useful learnings from the spec README into long-term memory

In [ ]:

show('examples/adrs/ADR-017A-opt-in-policy-pipeline.md', title='Real ADR produced by /review', head=40)

Plan/Do/Validate inside /review:

Plan: load spec + diff, search solutions archive for relevant prior reviews
Do: spawn parallel reviewers, each filtering through one pillar
Validate: synthesize findings, generate ADRs for accepted-as-is divergences from spec, write monitoring plan
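The pass/fail rule stated earlier (any pillar violation fails the review, even with green tests) reduces to a small aggregation step. The sketch below is an assumption about shape, not the review command's implementation: the `Finding` fields and the `review_verdict` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    reviewer: str   # e.g. "security-engineer"
    pillar: str     # the pillar the finding concerns
    message: str
    blocking: bool  # True when the finding is a pillar violation


def review_verdict(findings: list[Finding], tests_passed: bool) -> tuple[bool, list[str]]:
    """Pass only if the tests pass and no reviewer reports a pillar violation."""
    violations = [f"{f.reviewer} [{f.pillar}]: {f.message}" for f in findings if f.blocking]
    return tests_passed and not violations, violations


findings = [
    Finding("code-simplifier", "simplicity", "helper could be inlined", blocking=False),
    Finding("security-engineer", "security", "API key logged in plaintext", blocking=True),
]
approved, violations = review_verdict(findings, tests_passed=True)
```

Here `approved` is `False` despite passing tests, because one reviewer reported a blocking security finding.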

3. The fractal — same pattern, four scales¶

Scale	Plan	Do	Validate
Workflow (whole feature)	brainstorm + build + deepen + specify	implement	review
Skill (single phase)	gather inputs, list options	act on chosen option	check the action against the input contract
TDD (single task)	write failing test	write minimum code	run test, run suite, run lint
Decision (single tradeoff)	enumerate options ranked by pillars	user picks (or federal auto-apply)	log to decisions-log with reasoning

The same shape, all the way down. Plan-Do-Validate is fractal. Once you have seen it, you see it everywhere: in the loop (notebook 03), in prompt design (notebook 02), in identity-driven decisions (notebook 05), and in the audit chain (notebook 06), which validates the whole sequence after the fact.
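One way to make the self-similarity concrete is a single cycle function whose "do" step can itself be a cycle. This is a toy model with hypothetical names (`plan_do_validate`, `double_task`), not code from the workflow; the doubling task stands in for any unit of real work.

```python
from typing import Callable, TypeVar

TIn = TypeVar("TIn")
TPlan = TypeVar("TPlan")
TOut = TypeVar("TOut")


def plan_do_validate(
    inputs: TIn,
    plan: Callable[[TIn], TPlan],
    do: Callable[[TPlan], TOut],
    validate: Callable[[TIn, TOut], bool],
) -> TOut:
    """Run one plan/do/validate cycle and raise if validation fails."""
    result = do(plan(inputs))
    if not validate(inputs, result):
        raise ValueError("validation failed")
    return result


# Task scale: plan is trivial, do doubles the value, validate checks the result.
def double_task(n: int) -> int:
    return plan_do_validate(
        n,
        plan=lambda x: x,
        do=lambda x: x * 2,
        validate=lambda x, out: out == 2 * x,
    )


# Feature scale: the "do" step is a series of task-scale cycles.
doubled = plan_do_validate(
    [3, 1, 2],
    plan=lambda items: sorted(items),
    do=lambda items: [double_task(i) for i in items],
    validate=lambda items, out: len(out) == len(items),
)
```

The outer call and the inner calls use the same function with different arguments, which is the sense in which the pattern repeats at each scale.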

4. Why this works¶

Cheap iteration before expensive iteration. Throwing away a brainstorm costs 10 minutes. Throwing away a sprint of code costs 2 weeks. The chain front-loads the cheap mistakes.
Explicit identity, not implicit taste. The principled-coder pillars are written down. Anyone (or any agent) can apply them. No "vibes-based" decisions.
Validation by a different agent. /review is not the agent that did /implement. The validator has no investment in the work — it's harder for it to rationalize a problem.
Every phase produces a markdown artifact. Brainstorms, decisions, PRDs, SDDs, PLANs, ADRs — all version-controlled, diffable, reviewable, blamable. "Why did we choose X?" has an answer with a date and a SHA.
Federal auto-apply. Rules that are mandated (NIST 800-53 AU-2, etc.) don't get asked — they get applied with the citation. Saves time, prevents drift.
Solutions archive compounds. Every /review can promote a learning to .claude/solutions/{category}/. The next /deepen on a similar topic surfaces the prior solution. The team gets smarter with every feature.

5. Try it on a tiny feature¶

The methodology compresses well — for a small feature you can run all seven phases in under an hour. Pick something narrow:

"Add a --quiet flag to the agent loop that suppresses streaming output."
"Emit a warning when 80% of token budget is consumed."
"Add a coverage subcommand that prints test coverage by module."

Walk through the chain by hand:

Open the relevant <name>.md in .claude/commands/ and read what it asks for.
Write the brainstorm in your own words (3 paragraphs).
Write the decisions table (5–10 rows).
Skim the deepen command — what 3 questions would you research in parallel?
Sketch PRD/SDD/PLAN. Notice how short they are when the prior phases did their job.
Implement. Test first.
Review against all four pillars before merging.

If you can do that loop on a 30-minute feature, you can do it on a 30-day feature.

Takeaway¶

The workflow is identity-governed plan/do/validate, fractal.
5 phases of plan, 1 of do, 1 of validate. The ratio is the point.
Every phase outputs a markdown artifact in version control. The history of the thinking is preserved alongside the history of the code.
Skills, identity, audit, and the loop (notebooks 02–06) are the ingredients. This notebook is the recipe.

Next: 09 — Your Workflow. Apply the recipe to a real lab problem.
