Whitepaper

The Provenance Problem

Everyone can write code; who is proving it works? Manufacturing has Six Sigma. What do automated software factories have?

The Verification Gap

A single AI coding agent produces correct code roughly 61% of the time. Run N agents in parallel, and the probability that at least one succeeds climbs fast. But without verification, you cannot identify which one. That is the gap.

P(at least 1 correct | N) = 1 - (1 - p)^N

~/hyper/probability --interactive

Interactive calculator: set the base success probability (p) and the number of agents (N). Default: P(at least 1 correct | p=0.61, N=4) = 97.7%.

Probability by agent count (p=0.61), against the 95% verification threshold and 99% practical certainty:

N=1: 61% · N=2: 85% · N=4: 98% · N=8: 99.9% · N=16: 100.0%
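The same calculation in code, a minimal sketch assuming only that agent attempts are independent and share a per-agent success rate p:

```python
# Best-of-N success probability: P(at least one correct) = 1 - (1 - p)^N.
# Assumes independent attempts with the same single-agent success rate p.

def p_at_least_one(p: float, n: int) -> float:
    return 1.0 - (1.0 - p) ** n

if __name__ == "__main__":
    p = 0.61  # observed single-agent success rate
    for n in (1, 2, 4, 8, 16):
        print(f"N={n:>2}: {p_at_least_one(p, n):.1%}")
    # N= 1: 61.0%  N= 2: 84.8%  N= 4: 97.7%  N= 8: 99.9%  N=16: 100.0%
```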

The problem is not generation. The problem is identification.

More agents only helps if you can verify which output is correct. Without verification, best-of-N is just expensive guessing.

Natural Selection for Code

Evolution produces robust solutions without central planning. Hyper applies the same principle: generate variation, apply selection pressure, retain only the fittest.

Organisms → Agents: competing for the same niche
Environment → Gates: selective pressure
Selection → Tournament: variation + selection
Fossil record → Evidence trail: every generation logged
Reproduction → Merge: winners reproduce into main

~/hyper/natural-selection --n=16 --generations=5

16 organisms. 5 generations. Each row applies selection pressure. Only survivors advance.

Generation | Selection pressure                          | Survivors
0          | Spawned                                     | 16/16
1          | Hard gates: build, test, lint, typecheck    | 8/16
2          | Advisory gates: visual, policy, diff-review | 4/16
3          | Council review: AI judge panel              | 2/16
4          | Evidence fusion: Dempster-Shafer ranking    | 1/16

16 enter. 1 survives.
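A minimal sketch of that selection loop. The Candidate type, gate names, and pass rates below are illustrative stand-ins, not Hyper's actual interfaces:

```python
# Illustrative tournament-selection loop: a candidate advances only if it passes
# every gate in the current generation; a single failure eliminates it (fail-closed).
import random
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Candidate:
    agent_id: int
    evidence: List[str] = field(default_factory=list)  # which gates were passed

Gate = Callable[[Candidate], bool]

def run_generation(survivors: List[Candidate], gates: List[Gate]) -> List[Candidate]:
    next_gen: List[Candidate] = []
    for cand in survivors:
        if all(gate(cand) for gate in gates):            # fail-closed
            cand.evidence += [gate.__name__ for gate in gates]
            next_gen.append(cand)
    return next_gen

# Stand-in gates; real ones would run builds, tests, visual diffs, AI review.
def build(c): return random.random() < 0.9
def test(c): return random.random() < 0.8
def review(c): return random.random() < 0.7

candidates = [Candidate(i) for i in range(16)]
for gates in ([build, test], [review]):                  # generations of pressure
    candidates = run_generation(candidates, gates)
print(f"{len(candidates)}/16 survive:", [c.agent_id for c in candidates])
```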

The Competitive Landscape

Multi-agent is table stakes. Verification is the moat.

~/hyper/landscape --compare
Capability comparison: Hyper vs. Gastown, Blackbox, Factory.ai, OpenClaw

Architecture paradigm: Hyper runs a competition (same task, best wins); Gastown delegates (different tasks, manual merge); Blackbox orchestrates multiple models (AI judge picks best output); Factory.ai uses agent-native droids (delegator + specialized droids); OpenClaw is a gateway with sub-agents (custom skills for each task).

Multi-agent: Hyper runs a tournament (N compete, 1 wins); Gastown distributes work (tasks farmed to subagents); Blackbox runs parallel execution (multi-model, AI judge); Factory.ai delegates to sub-droids in parallel; OpenClaw uses agent teams (custom skills required).

Independent verification: Hyper runs 11 gates; the nearest competitor features are an AI judge and test hooks plus code review.

Evidence fusion: Dempster-Shafer (Hyper only).

Fail-closed: Hyper is fail-closed by design; the nearest competitor feature is test and coverage gates.

Evidence trail: Hyper keeps a full trail; competitors keep execution logs, audit logs, or session transcripts.

Visual regression: Hyper only.

Merge confidence: mathematical (Hyper only).

Cost per verified merge: $0.47 (Hyper).
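Evidence fusion is what turns independent gate verdicts into a merge confidence. A minimal sketch of Dempster's rule of combination over the frame {correct, incorrect}; the gate names and mass values are illustrative assumptions, not Hyper's actual calibration:

```python
# Dempster's rule of combination over the frame {correct, incorrect}.
# Each gate reports a mass function: belief in "correct", belief in "incorrect",
# and leftover mass on the whole frame (uncertainty). Values are illustrative.
from itertools import product

CORRECT = frozenset({"correct"})
INCORRECT = frozenset({"incorrect"})
THETA = frozenset({"correct", "incorrect"})  # total uncertainty

def combine(m1: dict, m2: dict) -> dict:
    fused, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb            # mass on contradictory verdicts
    return {k: v / (1.0 - conflict) for k, v in fused.items()}  # renormalize

# Hypothetical verdicts: tests strongly suggest correctness, review is less sure.
test_gate   = {CORRECT: 0.85, INCORRECT: 0.05, THETA: 0.10}
review_gate = {CORRECT: 0.60, INCORRECT: 0.10, THETA: 0.30}

fused = combine(test_gate, review_gate)
print(f"merge confidence (belief in 'correct'): {fused[CORRECT]:.3f}")  # ~0.932
```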

Every competitor distributes work across agents — the same paradigm as hiring more developers. Hyper runs a tournament: N agents independently attempt the same specification, each sampling a different region of the model's latent space. Verification gates select the winner. This rigs the math — instead of hoping one agent gets it right, you raise the probability that at least one does. The difference between parallelization and selection.

Before each tournament, the decomposer re-evaluates the codebase and generates fresh specs. Afterward, it tracks what failed and applies anti-fixation: it refuses to retry semantically similar approaches and routes instead to entirely different parts of the vision. Self-healing by design: the factory never gets stuck; it adapts.
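One way such an anti-fixation check could work, sketched under the assumption that failed approaches and new proposals can be embedded as vectors; the threshold and toy vectors are hypothetical, not Hyper's implementation:

```python
# Hypothetical anti-fixation filter: reject a proposed retry if it is too similar
# to an approach that already failed, forcing the decomposer to route elsewhere.
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def is_fixated(proposal: Sequence[float],
               failed: List[Sequence[float]],
               threshold: float = 0.9) -> bool:
    """True if the proposal is semantically close to any failed approach."""
    return any(cosine(proposal, f) >= threshold for f in failed)

# Usage sketch: in practice the vectors would come from an embedding model.
failed_vecs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
proposal_vec = [0.88, 0.12, 0.02]
print("reject: too similar to a failed attempt" if is_fixated(proposal_vec, failed_vecs)
      else "accept: sufficiently different")
```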

When 90% AI adoption correlates with 9% more bugs and 154% larger PRs (Google DORA 2025), the only safe factory is one that verifies everything.

Dark Factory Mathematics

Compound reliability over hundreds of autonomous merges. Without independent gates, reliability collapses exponentially. With them, it holds.

The naive approach

If each merge has 94.2% reliability, after 100 merges:

0.942^100 ≈ 0.25% chance of zero defects

With fail-closed gates

R_system = 1 - (1 - R_gate)^G, where R_gate is the probability that a single gate catches a defect and G is the number of independent gates.

With 11 gates at 95% each: 1 - (0.05)^11 ≈ 1.0

~/hyper/reliability --simulate

System reliability over autonomous merge count

Merges | Hyper (11 fail-closed gates) | Naive (single-agent, no gates)
10     | >99%                         | 55%
25     | >99%                         | 22%
50     | >99%                         | 5%
75     | >99%                         | 1%
100    | >99%                         | <1%
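A quick sketch reproducing both curves under the stated assumptions: 94.2% per-merge reliability for the naive path, and 11 independent gates at 95% each for the gated path:

```python
# Compound reliability over M autonomous merges.
# Naive: every merge must be defect-free on its own, so reliability decays as r^M.
# Gated (the whitepaper's model): a defect only ships if all G independent gates miss it.
def naive_reliability(r_merge: float, merges: int) -> float:
    return r_merge ** merges

def gated_reliability(r_gate: float, gates: int, merges: int) -> float:
    p_defect_ships = (1.0 - r_gate) ** gates     # every gate misses
    return (1.0 - p_defect_ships) ** merges

for m in (10, 25, 50, 75, 100):
    print(f"{m:>3} merges: gated {gated_reliability(0.95, 11, m):.2%}, "
          f"naive {naive_reliability(0.942, m):.1%}")
# e.g. 10 merges: gated ~100%, naive 55%; 100 merges: gated ~100%, naive <1%
```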

The Level 5 Vision

A fully autonomous code factory that decomposes vision into specs, runs tournaments, merges winners, and repeats. No human in the loop unless the evidence demands it.

~/hyper/factory --autonomous
1. Signal: bug report, feature request, vision statement
2. Spec: LLM decomposes it into an actionable spec
3. Confidence: calibrated confidence score
4. Route: dark / cautious / hold
5. Tournament: N agents compete in isolation
6. Verify: 11 gates, evidence fusion
7. Merge: winner merged, losers discarded
8. Repeat: next spec, forever
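A minimal skeleton of that loop. Every helper below is a stub standing in for a full subsystem; the names, signatures, and thresholds are illustrative only, not Hyper's real interfaces:

```python
# Hypothetical skeleton of the autonomous factory loop described above.
from typing import Iterable, Optional

def decompose(signal: str) -> str:                 # 2. LLM -> actionable spec
    return f"spec for: {signal}"

def score_confidence(spec: str) -> float:          # 3. calibrated confidence
    return 0.8

def run_tournament(spec: str, n: int) -> list:     # 5. N isolated attempts
    return [f"candidate-{i}" for i in range(n)]

def verify(candidates: list, gates: int) -> Optional[str]:  # 6. gates + fusion
    return candidates[0] if candidates else None

def merge(winner: str, mode: str) -> None:         # 7. winner in, losers discarded
    print(f"merged {winner} ({mode})")

def factory_loop(signals: Iterable[str]) -> None:
    for signal in signals:                         # 1. signal arrives
        spec = decompose(signal)
        confidence = score_confidence(spec)
        if confidence < 0.5:                       # 4. route: hold for a human
            continue
        mode = "dark" if confidence > 0.9 else "cautious"
        winner = verify(run_tournament(spec, n=16), gates=11)
        if winner:
            merge(winner, mode)
        # 8. repeat with the next spec, forever

factory_loop(["fix flaky login test"])
```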

The only safe autonomous factory is one that trusts nothing and verifies everything.

See the evidence.

Every claim in this paper is backed by real gate logs, diff artifacts, and evidence trails. Hyper does not ask you to trust it. It asks you to verify.