Capstone: Build One Small Feature Spec-First, End to End
Time to run the whole loop yourself, start to finish, on a real feature. This capstone walks you through one complete cycle — write the spec, structure the prompt, hand it to your agent, verify the output, and iterate — using a concrete task. Do it in whatever agent you already use; the steps are identical across Claude Code, Cursor, Copilot, Devin, and the rest, because the spec lives outside the tool.
Pick the feature below, or substitute one from your own work that is similar in size. The goal is to internalize the rhythm so the loop becomes how you naturally work.
What You'll Learn
- How to run the full spec-driven loop on one feature
- How the four steps connect in practice
- How to handle a verification failure live
- What "done" looks like under this method
The feature
You will build a password-strength checker: a single function that scores a password and returns whether it is acceptable, with a reason when it is not. Small enough to finish in one session, rich enough to have real edge cases. Here is the full loop.
- 1. Spec: criteria, edges, constraints
- 2. Prompt: context + spec + plan-first
- 3. Verify: a test per criterion
- 4. Iterate: diagnose and fix
Step 1: Write the spec
Before you touch the agent, write the spec. Resist the urge to skip ahead — this is the step that does the work. Use the template from earlier in the course.
## Task
A function checkPassword(pw: string) that scores a password and reports
whether it is acceptable.
## Acceptance criteria
- Returns { ok: true } when the password meets ALL rules below.
- Returns { ok: false, reason } naming the FIRST unmet rule otherwise.
- Rules, checked in this order:
  1. At least 12 characters. reason: "too_short"
  2. Contains at least one lowercase and one uppercase letter. reason: "needs_mixed_case"
  3. Contains at least one digit. reason: "needs_digit"
  4. Is not in the bundled COMMON_PASSWORDS list. reason: "too_common"
## Edge cases
- Empty string -> { ok: false, reason: "too_short" }.
- Leading/trailing whitespace counts toward length and is NOT trimmed.
- The common-password check is case-insensitive.
## Constraints
- One pure function, no side effects. No new dependencies.
- Reason reflects the FIRST failing rule in the listed order, not all of them.
## Prior decisions
- COMMON_PASSWORDS already exists as an exported array; import and use it.
## Verification
- A unit test per criterion and edge case.
- Run the project's test command.
Notice how the ordering rule ("first unmet rule") and the no-trim rule remove ambiguity the agent would otherwise guess. That precision is the point.
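One way to make that contract hard to misread is to write it down as types. The following is a minimal sketch, not part of the spec itself: the names `Reason`, `PasswordResult`, `RULE_ORDER`, and `describe` are hypothetical and chosen only for illustration.

```typescript
// Hypothetical names for illustration; the spec only fixes the reason strings
// and the shape of the returned object.
type Reason = "too_short" | "needs_mixed_case" | "needs_digit" | "too_common";

// A discriminated union: `reason` exists only when `ok` is false.
type PasswordResult = { ok: true } | { ok: false; reason: Reason };

// The spec's rule order captured as data, so the "first unmet rule"
// requirement has a single source of truth.
const RULE_ORDER: readonly Reason[] = [
  "too_short",
  "needs_mixed_case",
  "needs_digit",
  "too_common",
];

// Narrowing on `ok` gives access to `reason` only on the failure branch.
function describe(result: PasswordResult): string {
  return result.ok ? "accepted" : `rejected: ${result.reason}`;
}
```

With a union like this, a result such as `{ ok: true, reason: "too_short" }` fails to type-check, which catches one class of spec violation before any test runs.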
Step 2: Structure the prompt and hand it off
Wrap the spec in the three layers. Add context about where it lives, drop the spec in verbatim, and end with constraints plus a plan-first instruction.
## Context
Add this to auth/password.ts. Match the functional style there — return
result objects, don't throw. COMMON_PASSWORDS is exported from
auth/common-passwords.ts.
## Spec
<paste the full spec from Step 1>
## Constraints & response
- Only modify auth/password.ts. No new dependencies.
- Show me your plan (approach per rule, assumptions) before writing code.
Wait for my go-ahead.
Send it. When the agent returns a plan, read it. Does its approach honor the ordering rule? Did it notice the no-trim edge case? If the plan misreads a criterion, that is a cheap moment to correct — either clarify the spec or nudge the plan — before any code exists. Give the go-ahead only when the plan matches your intent.
Step 3: Verify against the spec
When the code comes back, do not skim it and nod. Run the verification checklist. Each criterion becomes a check. Ideally you write these tests yourself from the spec, or have a fresh session write them from the spec alone, so they test your contract and not the implementation.
checkPassword("Abcdefghijk1") => { ok: true } // meets all
checkPassword("Abc1") => { ok: false, reason: "too_short" } // rule 1
checkPassword("abcdefghijkl1") => { ok: false, reason: "needs_mixed_case" } // rule 2
checkPassword("Abcdefghijkl") => { ok: false, reason: "needs_digit" } // rule 3
checkPassword("") => { ok: false, reason: "too_short" } // empty
checkPassword(" Abcdefghij1 ") => { ok: true } // whitespace counts
// a COMMON_PASSWORDS entry that passes rules 1-3, written in mixed case => { ok: false, reason: "too_common" }
Run them. Read the real output, not the agent's "all pass" summary. Count them against your criteria — every rule and edge case should have a test. A green suite with the right number of meaningful, spec-derived tests means the feature is verified.
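The checks above can be sketched as a small table-driven run. Everything here is an assumption made so the example stands alone: the `COMMON_PASSWORDS` entries are placeholders for the real list in auth/common-passwords.ts, the `checkPassword` body is one possible implementation rather than what your agent will produce, and letters and digits are matched as ASCII only, which the spec does not state.

```typescript
// Placeholder entries; the real list is imported from auth/common-passwords.ts.
const COMMON_PASSWORDS: string[] = ["password1234", "qwertyuiop12"];

type Reason = "too_short" | "needs_mixed_case" | "needs_digit" | "too_common";
type PasswordResult = { ok: true } | { ok: false; reason: Reason };

// One possible implementation, checked in the spec's rule order.
function checkPassword(pw: string): PasswordResult {
  // Rule 1: length of the raw input; whitespace is not trimmed.
  if (pw.length < 12) return { ok: false, reason: "too_short" };
  // Rule 2: at least one lowercase and one uppercase letter (ASCII, an assumption).
  if (!/[a-z]/.test(pw) || !/[A-Z]/.test(pw)) {
    return { ok: false, reason: "needs_mixed_case" };
  }
  // Rule 3: at least one digit.
  if (!/[0-9]/.test(pw)) return { ok: false, reason: "needs_digit" };
  // Rule 4: case-insensitive match against the common-password list.
  const lowered = pw.toLowerCase();
  if (COMMON_PASSWORDS.some((entry) => entry.toLowerCase() === lowered)) {
    return { ok: false, reason: "too_common" };
  }
  return { ok: true };
}

// One row per criterion and edge case, copied from the spec.
const cases: Array<[string, PasswordResult]> = [
  ["Abcdefghijk1", { ok: true }],
  ["Abc1", { ok: false, reason: "too_short" }],
  ["abcdefghijkl1", { ok: false, reason: "needs_mixed_case" }],
  ["Abcdefghijkl", { ok: false, reason: "needs_digit" }],
  ["", { ok: false, reason: "too_short" }],
  [" Abcdefghij1 ", { ok: true }],
  ["PassWord1234", { ok: false, reason: "too_common" }],
];

// Collect the inputs whose actual result differs from the expected one.
const failures = cases
  .filter(([pw, expected]) =>
    JSON.stringify(checkPassword(pw)) !== JSON.stringify(expected))
  .map(([pw]) => pw);

console.log(failures.length === 0
  ? "all spec cases pass"
  : `failed inputs: ${JSON.stringify(failures)}`);
```

Note that the common-password row uses an entry that already satisfies rules 1 to 3; a shorter entry would be rejected as `too_short` and never reach rule 4. In a real project the same table would live in your test runner rather than a console script.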
Step 4: Iterate on whatever failed
Odds are at least one check fails on the first pass. Good: practice the diagnosis. Suppose the whitespace test fails: the agent trimmed the input, so " Abcdefghij1 " was measured as 11 characters instead of 13 and rejected as "too_short".
Run the fork: was the spec wrong or the output wrong? Your spec explicitly said whitespace is not trimmed and counts toward length. So the spec was clear and the output disobeyed it. That is an output bug — re-prompt surgically:
Criterion fails: the no-trim edge case. " Abcdefghij1 " (with surrounding
spaces) should return { ok: true } because whitespace counts toward length
and must NOT be trimmed. Your implementation trims it. Fix only that; leave
the passing rules unchanged.
If instead the failure traced to something your spec never addressed (say, whether an accented letter such as "É" counts as uppercase), that would be a spec gap. You would add the edge case to the spec, then re-prompt and re-verify. Same loop, opposite fix. Repeat until every check is green.
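To see why the trimming failure in this step is an output bug and not a spec gap, it helps to isolate rule 1. This is a minimal sketch; `MIN_LENGTH`, `meetsLengthTrimmed`, and `meetsLengthRaw` are hypothetical helpers written for illustration, not code your agent will necessarily produce.

```typescript
// Hypothetical helpers that isolate rule 1 (minimum length).
const MIN_LENGTH = 12;

// Buggy variant: trims first, so surrounding spaces stop counting.
// This contradicts the spec's "NOT trimmed" edge case.
function meetsLengthTrimmed(pw: string): boolean {
  return pw.trim().length >= MIN_LENGTH;
}

// Spec-conformant variant: measures the raw input.
function meetsLengthRaw(pw: string): boolean {
  return pw.length >= MIN_LENGTH;
}

// 11 visible characters plus one space on each side: 13 in total.
const padded = " Abcdefghij1 ";

console.log(meetsLengthTrimmed(padded)); // false: 11 < 12 after trimming
console.log(meetsLengthRaw(padded));     // true: 13 >= 12 on the raw input
```

The fix is a single call removed, which is why the re-prompt asks the agent to change only that behavior and leave the passing rules alone.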
What "done" looks like
Under spec-driven development, "done" is not "the agent said it finished" and not "it ran once without crashing." Done is:
- Every acceptance criterion and edge case has a passing, spec-derived test.
- You have read the diff and confirmed nothing outside the spec sneaked in.
- The spec itself is complete enough that someone else could re-derive the feature from it.
That third point is the lasting payoff. You did not just get a password checker. You got a checker plus a verified spec you can reuse, hand off, or run through a different agent and hold to the same tests.
Keep the loop
You now have the full method. Run it on your next real task, then the one after. The first few times it will feel slower than vibe-prompting; within a handful of features it becomes faster, because you stop paying the patch-and-repeat tax. When you want to go deeper on the specific tool you run this loop in, the companion courses — Claude Code, Cursor and AI IDE workflows, GitHub Copilot — teach the buttons. This course gave you the recipe; take it to whichever kitchen you like.
Key Takeaways
- The capstone runs the entire loop on one feature: spec, prompt, verify, iterate.
- Writing the precise spec first — ordering rules, no-trim behavior — is what removes the agent's guesswork.
- Plan-first lets you catch a misread before any code exists.
- Verify with a spec-derived test per criterion and read real output, not the agent's summary.
- When a check fails, diagnose whether the spec or the output was wrong, then fix that one: amend the spec, or re-prompt against it.
- "Done" means every criterion is verified, the diff holds no surprises, and the spec is reusable.

