A dark factory doesn't mean no humans. It means no humans on the floor, because the floor is specified, jigged, and gated tightly enough to build web software unattended. We didn't remove anyone. We moved them up a level: from writing code to defining intent and the checks that decide what ships.
Warm stations run automated. Cool stations are where human judgement lives. Nothing ships until it passes inspection.
"Can a software company raise its quality bar and get far more done — without it turning into AI hand-waving I can't trust?"
Most "AI delivery" pitches ask you to trust the model. This one doesn't. The model produces candidates; a fixed set of deterministic checks decides what's acceptable. The reliability comes from the checks, not from faith in the machine.
Quality is enforced before anyone sees the work. Standards live in tooling, not in one person's head. Every build leaves an audit trail. And a human still owns the release.
It's only as good as the spec and the checks. Garbage intent in, garbage out — so the real work moves up front, to getting the requirement right. This is a twilight factory, not a fully dark one: lights mostly off, a human at the exception desk and the release sign-off.
"Will this raise the quality of what I get and how much gets done — or just be AI hype I can't trust?"
You get the version of the work that usually gets cut for time — full tests, security, monitoring, documentation — built in as standard. More gets delivered against a spec you signed off, and nothing reaches you until it has passed inspection.
Your standards live in tooling, not in one contractor's head — so key-person risk drops. Every change is checked and logged. You get an audit trail, not a black box.
Garbage spec in, garbage out. The work shifts to getting the requirement right up front, with us — and a human still signs off every release. We don't ship to your production on autopilot.
"Is this hype, and does it make me redundant?"
You stop typing boilerplate and start designing the line: the specs, the jigs (skills, conventions, hooks), the CI/CD, and the inspection stage. The model emits candidates; the build, the full test suite, mutation testing, architecture-fitness checks, and contract checks decide. Trust lives in the checks, not the model.
Up the abstraction ladder: domain modelling, the quality systems, the deployment automation and the inspection that make unattended runs safe. Plus real leverage — parallel agents under your control, each held to the same standard.
It's exactly as good as your checks and your specs — no better. Novel architecture is still yours. And reviewing agent output well is a genuine skill you'll have to build.
"Where do I fit once the build runs itself?"
The human layer — intent, design, the client relationship, the judgement calls — becomes the scarce, valuable input that bounds everything downstream. You run the spec and design layer, and you staff the exception desk.
When the build is reliable and gated, the differentiator is how well intent is captured and how the relationship is held. That's your work. Less time chasing build status; more time on the things only people do.
The spec and design have to be sharper and earlier. Ambiguity that used to get quietly resolved mid-build now has to be resolved up front — which raises the bar on discovery and clarity.
"What does this do to cost, margin, and risk — on both sides of the invoice?"
Cost basis moves from billable hours to outcomes priced against a spec. Fewer overruns, because cost is dominated by spec and inspection design rather than headcount-hours that drift.
Throughput per person rises, so margin stops tracking headcount. The main variable cost becomes model/compute spend — controllable with hard caps, model routing, and concurrency limits. The fixed cost is maintaining the jigs and checks.
There's upfront investment in the jigs and inspection before the leverage shows. And model spend needs active governance — it's a real line item, not a rounding error.
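The three spend controls named above (hard caps, model routing, concurrency limits) can be sketched in a few lines. This is a minimal illustration, not a description of our tooling: the class name `SpendGovernor`, the routing table, the model names, and the dollar figures are all hypothetical placeholders.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical routing table: cheap model for routine work, larger one otherwise.
ROUTES = {"boilerplate": "small-model", "refactor": "large-model"}
DEFAULT_MODEL = "large-model"


def route_model(task_class: str) -> str:
    """Model routing: pick a model by task class, falling back to the default."""
    return ROUTES.get(task_class, DEFAULT_MODEL)


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the hard cap."""


class SpendGovernor:
    def __init__(self, cap_usd: float, max_concurrent: int) -> None:
        self._cap = cap_usd
        self._spent = 0.0
        self._lock = threading.Lock()
        # Concurrency limit: at most max_concurrent calls in flight at once.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @property
    def spent(self) -> float:
        with self._lock:
            return self._spent

    def run(self, estimated_cost_usd: float, call: Callable[[], T]) -> T:
        """Reserve budget before the call; refuse outright if the cap would be hit."""
        with self._slots:
            with self._lock:
                if self._spent + estimated_cost_usd > self._cap:
                    raise BudgetExceeded(
                        f"cap {self._cap:.2f} USD would be exceeded"
                    )
                self._spent += estimated_cost_usd
            return call()
```

The cap is checked before the work starts, so an over-budget task is refused rather than billed and regretted. A real version would reconcile estimated cost against actual usage after each call.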
This is what lets the lights go off. Each check is a deterministic inspector the agent must pass — no negotiation. The model's job is to produce something that survives inspection; inspection, not your trust in the model, is the guarantee.
The candidate builds clean, with nullability and static analysis enforced as errors, not reported as warnings.
The full suite passes. Coverage thresholds are a gate, not a dashboard.
Mutation testing catches tests that assert nothing, proving the suite actually bites.
Layering and boundary rules are tested like any other behaviour.
API output is checked against the signed contract from the spec.
Hooks block anything that leaks a secret or trips a security rule.
A model reviews for smell and intent. It advises. It never authorizes a release.
An illustrative feature of representative size, traditional delivery versus the line. The structure is the point; the figures are placeholders for you to set from your own data.
Illustrative only: replace every figure with your measured numbers before presenting.
The credible version of this is honest about its edges. Three places the human stays, deliberately.
The line can't disambiguate a requirement the client hasn't settled. Undecided intent surfaces as an escalation, not a guess.
It excels at producing well-understood shapes at speed. Novel architecture and genuinely new problems are still human work.
Unattended merge to a client's production is a commercial and contractual decision, kept human on purpose — not a technical limitation.
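The first of these edges, escalation instead of guessing, can be sketched as an admission check at the front of the line. This is a minimal sketch with hypothetical names (`Spec`, `admit`, `Escalation`); the fields mirror the story, acceptance criteria, contract, and sign-off the page describes, but the exact shape is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class Escalation(Exception):
    """Routes an unsettled spec to the human exception desk."""

    def __init__(self, open_questions: List[str]) -> None:
        super().__init__("; ".join(open_questions))
        self.open_questions = open_questions


@dataclass
class Spec:
    story: str
    acceptance_criteria: List[str] = field(default_factory=list)
    contract: Optional[str] = None  # e.g. a reference to the API contract
    signed_off: bool = False


def admit(spec: Spec) -> Spec:
    """Admit a spec to the line only when nothing is left undecided."""
    questions: List[str] = []
    if not spec.story.strip():
        questions.append("story is empty")
    if not spec.acceptance_criteria:
        questions.append("no acceptance criteria")
    if spec.contract is None:
        questions.append("no contract attached")
    if not spec.signed_off:
        questions.append("spec has not been signed off")
    if questions:
        # Never fill the gap with a guess; hand it to a person.
        raise Escalation(questions)
    return spec
```

The check collects every open question before escalating, so the person at the desk sees the whole list in one pass rather than one rejection at a time.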
Smallest trustworthy loop first — not the whole cathedral. Each step earns the next.
One accepted input: story, acceptance criteria, contract. Intent becomes machine-readable.
Consolidate house patterns into skills, conventions, and hooks the agent can't ignore.
Make the deterministic checks genuinely strict before automating any generation.
Spec → headless build → inspection → human review. One task class, one repo.
Only once that loop is boring: parallel agents under an orchestrator you control.
Every escaped defect updates a jig or check, so its whole class can't return.
Grant more autonomy per task class as trust data accumulates — never globally at once.
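The last two steps, learning from escaped defects and granting autonomy per task class, can be sketched as a small ledger. This is an illustration only: the class name `AutonomyLedger`, the numeric levels, and the promotion rule (a fixed count of clean runs, reset on any escaped defect) are assumptions, not a prescribed policy.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TrustRecord:
    clean_runs: int = 0  # runs since the last promotion or escaped defect
    level: int = 0       # 0 = every change reviewed; higher = more autonomy


class AutonomyLedger:
    def __init__(self, runs_per_level: int = 20, max_level: int = 2) -> None:
        self._runs_per_level = runs_per_level
        self._max_level = max_level
        self._records: Dict[str, TrustRecord] = {}

    def level(self, task_class: str) -> int:
        """Unknown task classes start at level 0; trust is never global."""
        record = self._records.get(task_class)
        return record.level if record else 0

    def record_run(self, task_class: str, escaped_defect: bool) -> int:
        rec = self._records.setdefault(task_class, TrustRecord())
        if escaped_defect:
            # An escape drops this task class back to full review.
            rec.clean_runs = 0
            rec.level = 0
        else:
            rec.clean_runs += 1
            if rec.clean_runs >= self._runs_per_level and rec.level < self._max_level:
                rec.level += 1
                rec.clean_runs = 0
        return rec.level
```

Each task class earns its own level, so a long clean record on one kind of work grants nothing on another. Whatever the level, release sign-off stays with a human.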
Start with one task class on one repository. Strict inspection, a signed spec, a human at the desk. If it doesn't earn the next step, it doesn't get it.
Book the first loop →