Whitepaper
Generation is getting commoditized. The durable value is knowing whether the code actually works. This is the thesis.
As models improve, redundant generation wastes money. Verification is the durable bottleneck.
P(at least 1 correct | N) = 1 − (1 − p)^N
[Interactive figure: P(at least 1 correct) by agent count N; slider for base success probability p, shown at p = 0.70]
When p = 0.70, running N = 3 costs 3× as much for a 27.3-percentage-point improvement (0.700 → 0.973). The marginal return on each additional attempt collapses exponentially.
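The collapse is easy to reproduce directly from the formula above. A minimal sketch (the values of p and N are illustrative):

```python
def p_at_least_one(p: float, n: int) -> float:
    """P(at least one of n independent attempts succeeds) = 1 - (1 - p)**n."""
    return 1 - (1 - p) ** n

p = 0.70
for n in range(1, 6):
    total = p_at_least_one(p, n)
    marginal = total - p_at_least_one(p, n - 1)
    print(f"N={n}: P(success)={total:.3f}  marginal gain={marginal:.3f}")
```

Each extra attempt adds only p · (1 − p)^(N−1), so the marginal gain shrinks by a factor of (1 − p) per attempt while cost grows linearly.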
Best-of-N fixes the generation problem. Verification fixes the trust problem. As models improve, the generation problem shrinks but the trust problem grows.
Generation is getting commoditized. Sonnet, GPT, Gemini — they all write good code. Next year they'll all write better code. Switching costs approach zero. That's a commodity.
Verification is the opposite. It's project-specific. It requires understanding architecture, conventions, test infrastructure, deployment constraints. You can't verify a React app the same way you verify a distributed database.
The Harness Signal
OpenAI's team built a million-line codebase with zero human-written code. Three engineers. ~1,500 PRs. Their biggest bottleneck? Human QA capacity. They built verification internally because there's nothing to buy.
Verify
Point it at any change. 11 verification gates. Dempster-Shafer confidence with full evidence trail. The primary command.
Council
Not pick-the-winner. Genuine synthesis. Reads N implementations, produces one change combining the best architecture, tests, error handling from each. Value increases as models improve — the inverse of best-of-N economics.
Factory
The autonomous loop. Give it a vision and a duration. It decomposes the vision into specs, runs tournaments, verifies every change, and ships what passes. What doesn't passes lands in the hold queue for human review.
11 gates. 7 hard, 3 binding empirical, 1 advisory. Every gate produces evidence or blocks the merge. No LLM opinions.
Hard Gates (deterministic, must pass)
| Gate | Detects | How |
|---|---|---|
| install | Dependency issues | Package manager install (npm/yarn/pnpm/bun) |
| build | Compilation errors | Run detected build command |
| lint | Style violations | Run detected linter |
| typecheck | Type errors | Run type checker |
| test | Functional regressions | Run detected test suite + quality checks |
| policy | Constraint violations | Diff size, forbidden paths, renames |
| visual | UI regressions | Playwright screenshots + vision LLM judge |
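A hard gate reduces to "run a real command, record the exit code and logs as evidence, emit a belief mass." A minimal sketch, assuming a shape like the following (the gate name, mass values, and evidence fields are illustrative, not Hyper's actual implementation):

```python
import subprocess
import sys

def run_hard_gate(name: str, argv: list[str], cwd: str = ".") -> dict:
    """Run one deterministic gate. The verdict comes from the exit code
    of a real process, never from an LLM opinion."""
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, timeout=600)
    passed = proc.returncode == 0
    return {
        "gate": name,
        "passed": passed,
        "evidence": {
            "command": argv,
            "exit_code": proc.returncode,
            "stdout_tail": proc.stdout[-2000:],
            "stderr_tail": proc.stderr[-2000:],
        },
        # Hard gates are near-deterministic, so little mass stays uncommitted.
        # These numbers are hypothetical calibration values.
        "mass": {"correct": 0.90, "incorrect": 0.05, "uncommitted": 0.05}
        if passed
        else {"correct": 0.05, "incorrect": 0.90, "uncommitted": 0.05},
    }

# Stand-in for a real typecheck/build/test command:
result = run_hard_gate("typecheck", [sys.executable, "-c", "print('ok')"])
```

The key property: the evidence record is self-contained, so a failed merge can be audited without re-running anything.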
Binding Empirical Gates (LLM generates, reality executes)
| Gate | Proves | Method |
|---|---|---|
| spec_tests | Spec compliance | LLM generates tests from spec, executes empirically |
| test_adequacy | Test thoroughness | Mutation testing — kill rate proves tests catch bugs |
| runtime_behavioral | Live behavior | Starts dev server, browser agent verifies conditions |
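The test_adequacy gate's principle, mutation testing, is simple to sketch: inject a deliberate bug, and count what fraction of mutants the test suite kills. The toy mutator, `clamp` function, and tests below are illustrative only; a real mutation tool generates hundreds of variants per file:

```python
def mutate(source: str) -> str:
    """Flip one comparison operator to create a deliberate bug (a 'mutant')."""
    if "<=" in source:
        return source.replace("<=", ">", 1)
    return source.replace("==", "!=", 1)

def kill_rate(source: str, tests: list, n_mutants: int = 1) -> float:
    """Fraction of mutants that at least one test catches. A high kill rate
    is empirical proof the tests detect bugs, not just exercise lines."""
    killed = 0
    for _ in range(n_mutants):
        env: dict = {}
        exec(mutate(source), env)  # define the mutated function
        if any(not test(env["clamp"]) for test in tests):
            killed += 1
    return killed / n_mutants

SOURCE = "def clamp(x, hi):\n    return x if x <= hi else hi\n"
tests = [lambda f: f(5, 10) == 5, lambda f: f(15, 10) == 10]
rate = kill_rate(SOURCE, tests)  # 1.0: every mutant is caught
```

A suite with 100% line coverage can still have a low kill rate, which is why the gate measures kill rate rather than coverage.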
Advisory Gate (diagnostic, non-blocking)
| Gate | Evaluates | Method |
|---|---|---|
| error_diagnosis | Failure classification | Pure regex — environment, flaky test, or real bug |
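Because the advisory gate is pure regex, it can be sketched in a few lines. The patterns and labels below are illustrative stand-ins, not Hyper's actual rule set:

```python
import re

# Ordered rules: first match wins. Anything unmatched is treated as a real bug.
FAILURE_PATTERNS = [
    ("environment", re.compile(r"ENOENT|command not found|ECONNREFUSED", re.I)),
    ("flaky_test", re.compile(r"timed? ?out|socket hang ?up|port \d+ in use", re.I)),
]

def diagnose(log: str) -> str:
    """Classify a failure log into environment issue, flaky test, or real bug."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(log):
            return label
    return "real_bug"

diagnose("npm ERR! ENOENT: no such file or directory")
```

Being deterministic, the classifier is cheap enough to run on every failure and never blocks a merge on its own.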
Each gate emits a belief mass: correct, incorrect, uncommitted. These combine via Dempster-Shafer fusion into composite confidence. High agreement strengthens belief. Conflicting signals raise uncertainty. The math prevents any single gate from dominating. Result routes to three tiers: dark (auto-merge), cautious (auto-merge with logging), hold (human review).
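The fusion step above is standard Dempster's rule over the frame {correct, incorrect}, with "uncommitted" as mass on the whole frame. A minimal sketch (the per-gate masses and tier thresholds are hypothetical, not Hyper's calibration):

```python
from functools import reduce

def ds_combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule of combination for two belief masses over
    {correct, incorrect}, where 'uncommitted' is mass on the full frame."""
    conflict = m1["correct"] * m2["incorrect"] + m1["incorrect"] * m2["correct"]
    if conflict >= 1.0:
        raise ValueError("Total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict  # conflicting mass is discarded and renormalized
    return {
        "correct": (m1["correct"] * m2["correct"]
                    + m1["correct"] * m2["uncommitted"]
                    + m1["uncommitted"] * m2["correct"]) / norm,
        "incorrect": (m1["incorrect"] * m2["incorrect"]
                      + m1["incorrect"] * m2["uncommitted"]
                      + m1["uncommitted"] * m2["incorrect"]) / norm,
        "uncommitted": m1["uncommitted"] * m2["uncommitted"] / norm,
    }

gate_masses = [  # hypothetical outputs from three gates
    {"correct": 0.80, "incorrect": 0.10, "uncommitted": 0.10},
    {"correct": 0.70, "incorrect": 0.10, "uncommitted": 0.20},
    {"correct": 0.60, "incorrect": 0.20, "uncommitted": 0.20},
]
fused = reduce(ds_combine, gate_masses)
tier = ("dark" if fused["correct"] >= 0.95
        else "cautious" if fused["correct"] >= 0.80
        else "hold")
```

Note how three individually modest gates fuse to high confidence when they agree, while any strongly conflicting gate would inflate `conflict` and push mass toward uncertainty instead.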
| | Code Generation | CI/CD | Code Review | Hyper |
|---|---|---|---|---|
| Who | Copilot, Cursor, Codex, Claude Code, Devin | GitHub Actions, BuildKite, CircleCI | CodeRabbit, SonarQube, Codacy | — |
| Strengths | Writes code fast | Runs predefined pipelines | Reviews code quality | Full verification pipeline |
| Gap | No independent verification | Doesn’t understand specs | Suggestions, not verdicts | — |
| Result | “Trust me, it works” | “Tests pass” (if configured) | “Consider refactoring” | “Here’s the evidence” |
There is no standalone product that takes a change + a spec, runs the full verification pipeline, produces a confidence score with evidence, and decides ship-or-iterate. Every serious AI engineering team builds it internally.
| | Generation | Verification |
|---|---|---|
| Competition | Every AI lab + every coding tool | Nobody standalone |
| Moat | None — swap models freely | Project-specific knowledge |
| Pricing pressure | Extreme (commodity) | Low (unique value) |
| Value trajectory | Decreasing (models improve) | Increasing (more generated code to verify) |
| Switching cost | Zero | High (learned project context) |
Verification compounds. Every run teaches the system more about the project. Failure patterns, flaky tests, architecture violations. Generation doesn't compound — each run is independent. Verification gets better the longer it runs.
Every claim in this paper is backed by real gate logs, diff artifacts, and evidence trails. Hyper does not ask you to trust it. It asks you to verify.