Building BrainGrid with BrainGrid: Spec-Driven Development with Claude Code
How we ship features in under an hour — from half-baked idea to tested, deployed code — using spec-driven development with BrainGrid and Claude Code.
#48 Minutes
Here's what a typical feature build looks like for us:
```
10:02  /specify "Add time range filter to credits health dashboard"
10:03  → REQ-287 created with 8 acceptance criteria

10:03  /build REQ-287
10:03  → Branch: feature/REQ-287-time-range-filter, 5 tasks linked
10:04  → Task 1: Add date range picker component (in progress)
10:11  → Task 1 complete, validation passed
10:11  → Task 2: Update API endpoint with date params (in progress)
10:18  → Task 2 complete, validation passed
       ...
10:42  → All 5 tasks complete, yarn validate:fix passes

10:42  → PR created: "feat: add time range filter to credits health dashboard (#1287)"
10:43  → braingrid requirement review
10:44  → 8/8 acceptance criteria validated against PR diff
10:45  → Code review approved, merged to dev

10:46  → Agent writes test spec: time-range-filter.md
10:47  → agent-browser opens dev deployment
10:48  → Database seeded with test data for known date ranges
10:48  → agent-browser navigates to dashboard, selects "Last 7 days"
10:49  → agent-browser snapshots table, verifies filtered results
10:50  → Database query confirms filter matches actual data
10:50  → Test spec updated: status: PASSED
```
48 minutes. Idea to tested, merged feature. The spec took 60 seconds. The human reviewed diffs, approved the PR, and made one adjustment to a component's padding. The test ran against the live dev deployment — not a mock, not a stub — with real data verified at the database level.
Every feature in BrainGrid is built this way. Not as a marketing exercise — because it's the fastest workflow we've found. This post explains how it works.
#The Problem
Vibe-coding works until you merge it, deploy it, and realize you forgot the error state. Or the loading state. Or what happens when the user has zero credits and clicks the button anyway.
You didn't think about those cases because you're moving fast — and that's fine for prototypes. But the AI didn't think about them either, because you never told it to. It built exactly what you described: the happy path. Everything else is missing.
The fix isn't to slow down or stop using AI. It's to give the AI a requirement that already thought through the edge cases, error handling, and loading states — so you don't have to. The AI that writes your spec thinks like a product engineer. It asks the questions you'd skip. The AI that implements the spec executes like a production software engineer, because that's what a professional requirement demands.
That's the entire philosophy: start with requirements, not code.
#The Workflow: Four Commands
#1. /specify — Turn an idea into a requirement
1/specify "Trial users should see upgrade prompt instead of buy credits when out of credits"
AI refines your one-liner into a structured requirement. Here's what REQ-375 looked like after /specify:
```
REQ-375: Trial users should see upgrade prompt instead of buy credits

Problem:
  Trial organizations see the same credit exhaustion messaging as paid orgs,
  offering "Top-up credits" — but trial users can't purchase credit packs.

Solution:
  Detect trial status, display trial-appropriate messaging with only the
  upgrade CTA. Paid orgs continue seeing both options.

Components to modify:
  - out-of-credits-banner-wrapper.tsx (fetch subscription status)
  - out-of-credits-banner.tsx (conditional render by trial status)
  - low-credits-banner.tsx (agent overlay variant)

Acceptance Criteria:
  ✓ Trial org + 0 credits → top banner shows "Your trial has run out
    of credits" with only "Upgrade plan" button
  ✓ Trial org + 0 credits → agent overlay shows "Upgrade now" only
  ✓ Paid org + 0 credits → shows both "Top-up credits" and "Upgrade plan"
  ✓ Trial org + 1-50 credits → standard low credits message (not trial-specific)
  ✓ Loading state → show paid behavior until trial status confirmed
  ✓ Error state → fall back to paid behavior, log error
  ✓ Status changes reflect consistently across both banners
```
That's condensed. The full requirement also included a data fetching strategy (React Query with 5-minute stale time), props interfaces, error/loading state specifications, and a message variation table mapping every condition to its banner variant, message copy, and CTA buttons.
A well-structured requirement has these components:
- Problem statement — what's broken or missing, in user-facing terms
- Solution summary — the approach, not the implementation
- Scope — which files/components are affected (so the AI doesn't wander)
- Acceptance criteria — testable given/when/then conditions that define "done"
- Edge cases and error handling — loading states, failures, boundary conditions
- Out of scope — what this requirement deliberately doesn't cover
The AI generates all of this from a single sentence. You type one line, the AI writes the full spec — problem statement, acceptance criteria, edge cases, scope — and you review it. Most of the time it's 80% right; you fix the 20% and move on. Editing a spec takes seconds. Debugging a wrong implementation takes an hour.
#2. /breakdown — Turn the spec into tasks
```
/breakdown REQ-375
```
This is more than "split the work into chunks." The AI assembles context from three sources: the full requirement (acceptance criteria, edge cases, technical decisions), your codebase structure (repository analysis, file tree, existing patterns), and related documentation. It then generates atomic implementation tasks — each scoped to a single concern — with explicit dependencies between them. The AI knows which files exist, which hooks and components are already in your codebase, and how they're structured.
Here are the actual tasks generated for REQ-375:
```
TASK-1: Create useSubscriptionStatus hook
  → New hook: src/hooks/use-subscription-status.ts
  → Fetch from /api/organizations/[orgId]/subscription
  → React Query: cache key ['subscription-status', orgId], staleTime 5min
  → Return { isTrialSubscription, isLoading, error }
  → On error: default isTrialSubscription to false (safe fallback)

TASK-2: Update out-of-credits-banner with trial support
  → File: src/components/out-of-credits-banner/out-of-credits-banner.tsx
  → Add isTrialSubscription: boolean prop
  → Trial + 0 credits: render "Your trial has run out of credits" + "Upgrade plan" only
  → Paid + 0 credits: keep both "Top-up credits" and "Upgrade plan" (existing behavior)

TASK-3: Update out-of-credits-banner-wrapper
  → File: src/components/out-of-credits-banner/out-of-credits-banner-wrapper.tsx
  → Call useSubscriptionStatus hook
  → Pass isTrialSubscription to banner component
  → While loading: default to paid behavior (no flicker)

TASK-4: Update low-credits-banner with trial support
  → File: src/components/agent/agent-pane/low-credits-banner.tsx
  → Trial + 0 credits: show "Upgrade now" only
  → Trial + 1-50 credits: standard "You have only X credits left" (not trial-specific)

TASK-5: Run validation
  → yarn validate:fix (type-check + lint + format + test)
```
These aren't vague tickets. They're prompts — each one tells the agent exactly which file to modify, which pattern to follow, which prop to add, and what the expected behavior should be. The spec already made the design decisions (cache key, stale time, fallback behavior), so the tasks are pure execution.
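For a sense of what the agent produces from TASK-1, here is a minimal sketch. It assumes @tanstack/react-query and a plain fetch against the endpoint named in the task; the response shape (isTrial) is our assumption, while the hook name, cache key, stale time, and error fallback come straight from the task description.

```ts
// use-subscription-status.ts — illustrative sketch of TASK-1, not BrainGrid's actual hook.
// Assumes @tanstack/react-query and an API route that responds with { isTrial: boolean }.
import { useQuery } from '@tanstack/react-query';

interface SubscriptionStatusResponse {
  isTrial: boolean; // assumed response shape
}

export function useSubscriptionStatus(orgId: string) {
  const { data, isLoading, error } = useQuery<SubscriptionStatusResponse>({
    queryKey: ['subscription-status', orgId], // cache key specified in the task
    queryFn: async () => {
      const res = await fetch(`/api/organizations/${orgId}/subscription`);
      if (!res.ok) throw new Error(`Subscription fetch failed: ${res.status}`);
      return res.json();
    },
    staleTime: 5 * 60 * 1000, // 5-minute stale time specified in the task
  });

  return {
    // Safe fallback from the task: treat the org as paid while loading or on error.
    isTrialSubscription: data?.isTrial ?? false,
    isLoading,
    error,
  };
}
```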
#3. /build — Start implementing
```
/build REQ-375
```
This runs a four-step flow:
- Fetches the build plan from BrainGrid with requirement details and the full task array
- Creates a feature branch — feature/REQ-375-trial-upgrade-prompt — and associates it in BrainGrid so everything is linked
- Creates and links tasks in Claude Code, connecting each to BrainGrid so status syncs automatically
- Starts implementing the first task immediately — no "shall I proceed?" prompts
The agent picks up tasks sequentially — implements the code, runs yarn validate:fix, and if validation passes, marks the task complete and moves to the next. If validation fails, it reads the error, fixes the issue, and re-runs before moving on. You watch in real time and course-correct when needed.
You can steer focus by appending instructions: /build REQ-375 start with the data fetching hook — Claude adjusts task priority accordingly.
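By the end of the run, the diff contains changes along these lines. This is a hypothetical sketch of TASK-2's conditional, not the actual BrainGrid component; the copy and plain HTML elements stand in for the real banner and button components.

```tsx
// out-of-credits-banner.tsx — hypothetical sketch of the TASK-2 change, not the real component.
interface OutOfCreditsBannerProps {
  isTrialSubscription: boolean; // new prop added by TASK-2
  onUpgrade: () => void;        // handler props are stand-ins for illustration
  onTopUp: () => void;
}

export function OutOfCreditsBanner({ isTrialSubscription, onUpgrade, onTopUp }: OutOfCreditsBannerProps) {
  if (isTrialSubscription) {
    // Trial org + 0 credits: trial-specific copy, upgrade CTA only
    return (
      <div role="alert">
        <p>Your trial has run out of credits</p>
        <button onClick={onUpgrade}>Upgrade plan</button>
      </div>
    );
  }
  // Paid org + 0 credits: existing behavior, both CTAs (placeholder copy)
  return (
    <div role="alert">
      <p>You're out of credits</p>
      <button onClick={onTopUp}>Top-up credits</button>
      <button onClick={onUpgrade}>Upgrade plan</button>
    </div>
  );
}
```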
#4. braingrid requirement review — Validate acceptance criteria against the PR
```
braingrid requirement review
```
This is where the spec earns its keep. The command auto-detects the requirement from the branch name and the PR number from git, then uses AI to perform the review. It fetches the PR diff from GitHub and the full requirement with acceptance criteria from BrainGrid, then the AI reasons about whether each criterion is satisfied by the actual code changes — tracing criteria to specific lines in the diff:
```
Reviewing PR #1288 against REQ-375...

Acceptance Criteria:
  ✅ Trial org + 0 credits → banner shows "Upgrade plan" only
     → out-of-credits-banner.tsx:42 — conditional render on isTrialSubscription
  ✅ Paid org + 0 credits → shows both buttons
     → out-of-credits-banner.tsx:38 — default branch renders both CTAs
  ✅ Loading state → paid behavior until status confirmed
     → out-of-credits-banner-wrapper.tsx:18 — isTrialSubscription defaults to false
  ✅ Error state → fallback to paid behavior
     → out-of-credits-banner-wrapper.tsx:22 — catch block sets isTrialSubscription = false
  ...

7/7 acceptance criteria validated.
```
Did the implementation miss an edge case? Is there a criterion with no corresponding code change? You find out before the PR merges, not after users report a bug.
Here's what it looks like when a criterion fails — say the agent implemented the top banner but forgot to update the agent overlay:
```
Reviewing PR #1288 against REQ-375...

Acceptance Criteria:
  ✅ Trial org + 0 credits → banner shows "Upgrade plan" only
     → out-of-credits-banner.tsx:42 — conditional render on isTrialSubscription
  ❌ Trial org + 0 credits → agent overlay shows "Upgrade now" only
     → low-credits-banner.tsx — no trial-specific conditional found,
       still renders standard "Top-up credits" CTA for all org types
  ✅ Paid org + 0 credits → shows both buttons
     → out-of-credits-banner.tsx:38 — default branch renders both CTAs
  ...

6/7 acceptance criteria validated. 1 failed.
```
The AI traces each criterion to specific lines in the diff. It's not grepping for keywords — it's reasoning about whether the code changes actually satisfy the criterion. It catches semantic gaps that compile and lint clean: a component that handles the trial state but renders the wrong CTA text, or an error fallback that works correctly but doesn't match the criterion's specified behavior. What it can't catch is a runtime logic error, where the code reads correctly but behaves wrong — "the conditional exists but the boolean is inverted." That's exactly what the next layer is for.
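To make that failure mode concrete, here is a hypothetical example (not from our codebase) of a line that traces cleanly to the right flag in requirement review yet ships the wrong behavior. Only a test against the running app catches it:

```ts
// Reads like it satisfies "Trial org + 0 credits → upgrade prompt": the conditional
// exists and references isTrialSubscription. But the boolean is inverted, so paid
// orgs get the trial messaging and trial orgs never do.
function shouldShowTrialMessaging(isTrialSubscription: boolean, credits: number): boolean {
  return !isTrialSubscription && credits === 0; // bug: should be isTrialSubscription && credits === 0
}
```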
#AI-Driven Testing
After the PR merges, testing follows the same pattern: AI writes the test, AI runs the test, humans review. This is the layer that catches what code review can't — behavioral bugs in the running application.
The agent writes a markdown test spec, then executes it by driving a real browser against the deployed app while verifying data at the database level:
```
Test: Credits Top-Up with Stripe
──────────────────────────────────────────────────
Setup:  Query initial balance → 1,998 credits
Step 1: agent-browser → navigate to /settings/billing
Step 2: agent-browser → click "Top-up credits", select $10 / 1,000 credits
Step 3: agent-browser → fill Stripe test card, click Pay
Step 4: agent-browser → wait for redirect, verify success message
Verify: Query final balance → 2,998 credits ✅
──────────────────────────────────────────────────
Result: PASSED
  - Credits added to ORGANIZATION account (not USER)
  - Transaction type: OVERAGE_PURCHASE
  - Credits expire after 1 year
  - Stripe event payload verified in events table
```
agent-browser drives the actual user flow — clicking buttons, filling forms, navigating pages. MCP servers give the agent direct database access for setup and verification. The agent reads the spec, executes each step against the running app, and appends actual results including event payloads and database state changes it observed.
The defense is layered: spec review catches missing implementations, requirement review catches criterion-to-code gaps, and browser tests catch behavioral bugs in the live app. The remaining gap — a bad spec paired with a bad test that both miss the same thing — is the same gap human engineering has. The difference is that every layer runs automatically.
#When It Breaks
The happy path is nice. Here's what happens when things go wrong.
The spec itself is wrong. This is the one failure mode no amount of automation catches — because every downstream layer executes the spec faithfully. If the spec says "show a modal" and you meant inline editing, the implementation will be correct according to the wrong spec. That's why /specify walks you through clarifying questions before generating the requirement, and why you review the spec before /breakdown. The human is the checkpoint. If you rubber-stamp a bad spec, everything downstream is on you.
The AI misunderstands the spec. Different from above — this is when /specify generates acceptance criteria that don't match your intent and you catch it. We catch bad specs about 20% of the time — the AI assumed a modal when we wanted inline editing, missed an auth requirement, or scoped too broadly. Editing a spec takes seconds. Debugging a wrong implementation takes an hour.
A task fails validation. The agent runs yarn validate:fix after every task. If types break or tests fail, it reads the error, fixes the code, and re-validates before marking the task complete. You see this happening in real time. If it gets stuck in a loop, you intervene — but that's rare because the task description already specified which patterns to follow.
requirement review flags a gap. A criterion shows no corresponding code change. The agent either missed it or decided it was out of scope. You see exactly which criterion failed and can either implement it or mark it as intentionally deferred.
The test fails. agent-browser snapshots the DOM and the agent reads the actual state. "Expected 'Upgrade plan' button, found 'Top-up credits' button." The error is usually obvious from the snapshot. The agent can fix the code and re-run, or you can investigate manually. Element refs (@e1, @e2) change after every page interaction, so the agent re-snapshots after each step — stale refs are the most common failure mode and the tooling handles it.
#Patterns Worth Stealing
These apply to any AI-assisted development workflow, with or without BrainGrid:
- Specify before building. A few minutes of upfront clarity saves an hour of rework. This is the single highest-leverage thing you can do.
- Tasks are prompts. Write task descriptions as if you're prompting an AI — because you are. Include file paths, patterns, and APIs.
- Automate the things humans forget. Task status sync, validation, branch naming conventions. If the developer has to remember to do it, they won't.
- Validate against the spec, not just the code. Code review catches bugs. Spec review catches missed requirements. Browser tests catch behavioral regressions. You need all three.
- The agent should just start. After an explicit build command, don't ask "shall I proceed?" The user already expressed intent.
- Test against real infrastructure. Database queries, browser interactions, deployed endpoints. Mocks hide bugs.
- Memory compounds. Store learnings persistently. Your future self (and your agents) will thank you.
- Handle errors reactively. Don't pre-check if every tool is installed. Run it and handle failure if it occurs.
#Try It in 5 Minutes
You don't need the full setup to start. The core loop — specify, break down, build — works with just three things:
```
# 1. Install the CLI
npm install -g @braingrid/cli

# 2. Create an account and authenticate
# → Sign up at app.braingrid.ai, then:
braingrid login

# 3. Initialize in your project
braingrid init

# 4. Open Claude Code, then type:
/specify "your feature idea here"

# 5. Break it into tasks
/breakdown REQ-XXX

# 6. Start building
/build REQ-XXX
```
That's it. No MCP servers, no hooks, no browser automation. Add those later when you want database verification, automated task sync, or AI-driven testing. Start with the spec.
#Under the Hood
The workflow runs on Claude Code with a few extensions. Everything above works out of the box. Everything below powers the full experience but is optional.
Skills are markdown files that teach Claude domain knowledge. They load automatically when relevant context is detected — you don't invoke them manually. We use braingrid-cli (the spec-driven workflow), agent-browser (E2E testing), frontend-design (UI that doesn't look AI-generated), and memory (persistent learnings via mem0).
Hooks run shell scripts in response to tool calls. Our most important hook: when Claude marks a task complete, a PostToolUse hook syncs the status to BrainGrid automatically. You never manually update a task tracker.
MCP servers give Claude authenticated access to external services. Supabase for database queries — you control the scope (read-write on dev, read-only on production, or production off entirely — your call). Axiom for production logs (read-only). Playwright for browser automation. mem0 for persistent memory. When Claude needs to verify a migration, it queries the dev database directly. When it needs to test a flow, it drives a browser against the staging deployment. No context switching.
Persistent memory via mem0 stores learnings across sessions: "AuthKit login: wait 1500ms after email submit for password field." "Credit expiration cron uses billing_anniversary_day, not created_at." Before starting work, the agent searches memory. After discovering something reusable, it stores it. This compounds — month-old debugging insights surface exactly when needed.
#Links & Resources
#BrainGrid
- Web App: app.braingrid.ai
- CLI (@braingrid/cli): npmjs.com/package/@braingrid/cli
- GitHub: github.com/BrainGridAI/braingrid
- Documentation: braingrid.ai
#Claude Code
- Claude Code: docs.anthropic.com/en/docs/claude-code
- Skills: docs.anthropic.com/en/docs/claude-code/skills
- MCP (Model Context Protocol): modelcontextprotocol.io
#Tools
- Vercel agent-browser: github.com/vercel-labs/agent-browser
- mem0 (Persistent Memory): github.com/mem0ai/mem0
- Supabase: supabase.com
- Axiom: axiom.co
- Playwright: playwright.dev
About the Author
Tyler Wells is the Co-founder & CTO of BrainGrid, where we're building the future of AI-assisted software development. With over 25 years of experience in distributed systems and developer tools, Tyler focuses on making complex technology accessible to engineering teams.
Want to discuss AI coding workflows or share your experiences? Find me on X or connect on LinkedIn.