Hyper Metrics

The Numbers

12,847 tournaments. 51,388 candidates. Zero unverified merges. Every number backed by evidence.

Hyper Core

Aggregate metrics across all tournament runs.

12,847   Tournaments completed (since launch)
51,388   Candidates evaluated (avg 4 per tournament)
94.2%    Winner merge rate (gates verified)
0        Unverified merges (fail-closed)
8.7      Evidence artifacts per merge (average)
0.89     Winner confidence (average score)
23s      Verification time (median per candidate)
11       Gates per candidate (independent checks)

Hyper-Bench

Benchmark comparison across approaches. Same tasks, same codebase, measured side by side.

~/hyper/bench --compare
Approach                      Pass Rate   Regression   Confidence   Artifacts
Single Agent (baseline)       61.3%       14.2%        n/a          0
Single Agent + Tests          73.8%       8.7%         n/a          1
Best-of-4 (no verification)   79.1%       11.3%        n/a          0
Hyper Best-of-4               94.2%       0.3%         0.89         8.7
Hyper Best-of-8               97.1%       0.1%         0.93         14.2

vs baseline: +32.9pp pass rate, -14.1pp regression
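
The "Hyper Best-of-N" rows combine candidate generation with verification-gated selection: generate N candidates, verify each, and merge only the highest-confidence candidate that clears every hard gate. Below is a minimal Python sketch of that selection loop, assuming hypothetical generate and verify callables that are not part of any published Hyper API.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        diff: str
        hard_gates_passed: bool   # every hard gate (build, test, policy, ...) passed
        confidence: float         # fused belief from all gate results

    def run_tournament(task, generate, verify, n=4):
        """Generate n candidates, verify each, and return the highest-confidence
        candidate that passed every hard gate. Returns None if none survive,
        so nothing unverified can ever merge (fail-closed)."""
        candidates = [verify(generate(task)) for _ in range(n)]
        survivors = [c for c in candidates if c.hard_gates_passed]
        if not survivors:
            return None
        return max(survivors, key=lambda c: c.confidence)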

Gate Pass Rates

Per-gate pass rates across all evaluated candidates. Hard gates block merge. Advisory gates adjust confidence.

policy      94.7%
lint        91.2%
typecheck   89.4%
build       87.3%
visual      82.1%
test        76.8%
council     68.4%
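
Read in code, the hard/advisory split works like a veto plus a discount: a hard-gate failure blocks the merge outright, while advisory results only move the composite confidence. The sketch below is illustrative only; the source does not say which of the 11 gates are hard versus advisory, so the split and the discount factor are assumptions.

    HARD_GATES = {"policy", "lint", "typecheck", "build", "test"}   # assumed split
    ADVISORY_GATES = {"visual", "council"}                          # assumed split

    def evaluate(passed: dict[str, bool]) -> tuple[bool, float]:
        """passed maps gate name -> whether that gate passed.
        Returns (mergeable, confidence_adjustment)."""
        # Any hard-gate failure vetoes the merge, regardless of other results.
        if not all(passed.get(g, False) for g in HARD_GATES):
            return False, 0.0
        # Advisory failures never block; each one discounts confidence instead.
        adjustment = 1.0
        for g in ADVISORY_GATES:
            if not passed.get(g, False):
                adjustment *= 0.8   # illustrative discount, not from the source
        return True, adjustment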

Confidence Distribution

Winner confidence scores across all tournaments. Dempster-Shafer fusion of 11 gate beliefs into composite confidence.

~/hyper/metrics --distribution
[Histogram: winner confidence across all tournaments, 0.50-0.99; dark-merge zone at ≥ 0.85; median 0.89]
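
For concreteness, here is what Dempster-Shafer fusion of gate beliefs can look like on a two-hypothesis frame ({correct, incorrect}, plus uncommitted mass on the whole frame). The per-gate mass assignments below are made up for illustration; only the use of Dempster-Shafer fusion over 11 gate beliefs comes from this page.

    from functools import reduce

    def combine(m1, m2):
        """Dempster's rule of combination on the frame {correct, incorrect}.
        Each mass function is a dict: 'c' = correct, 'i' = incorrect,
        'u' = uncommitted mass assigned to the whole frame."""
        conflict = m1["c"] * m2["i"] + m1["i"] * m2["c"]
        norm = 1.0 - conflict   # assumes the gates are not in total conflict
        return {
            "c": (m1["c"] * m2["c"] + m1["c"] * m2["u"] + m1["u"] * m2["c"]) / norm,
            "i": (m1["i"] * m2["i"] + m1["i"] * m2["u"] + m1["u"] * m2["i"]) / norm,
            "u": (m1["u"] * m2["u"]) / norm,
        }

    def winner_confidence(gate_beliefs):
        """Fuse one mass function per gate into a composite belief in 'correct'."""
        return reduce(combine, gate_beliefs)["c"]

    # e.g. a cleanly passing gate might report {"c": 0.9, "i": 0.02, "u": 0.08}
    # and a weak advisory signal {"c": 0.5, "i": 0.1, "u": 0.4}; the fused "c"
    # plays the role of the 0.89 winner confidence reported above.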

Model Performance vs Cost

Cost-performance frontier across leading models. Same tournament config (N=4), same verification gates.

~/hyper/metrics --models
[Scatter: pass rate (60-100%) vs cost per tournament ($0-$7): Opus 4.6 at $6.00, GPT-5.3 at $3.25, Kimi K2.5 at $0.72, Grok-Fast-1 at $0.16]

350K input + 170K output per tournament (N=4 agents)

Opus 4.6      350K x $5/1M    + 170K x $25/1M   = $6.00
GPT-5.3       350K x $2/1M    + 170K x $15/1M   = $3.25
Kimi K2.5     350K x $0.6/1M  + 170K x $3/1M    = $0.72
Grok-Fast-1   350K x $0.2/1M  + 170K x $0.5/1M  = $0.16
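
The per-tournament figures follow directly from the token totals and list prices above; a quick arithmetic check (token counts and prices are those quoted on this page):

    def tournament_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
        """Cost of one tournament from token totals and $-per-1M-token prices."""
        return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

    print(round(tournament_cost(350_000, 170_000, 5.00, 25.00), 2))   # 6.0  -> Opus 4.6
    print(round(tournament_cost(350_000, 170_000, 2.00, 15.00), 2))   # 3.25 -> GPT-5.3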

Hyper vs Human Teams

Bay Area loaded costs vs continuous autonomous operation. Same output type: production-ready features.

1 Engineer
Monthly cost: $37,500/month
Output: 2-4 features/week
No provenance

5-Person Team
Monthly cost: $183,000/month
Output: 8-16 features/week
Code review bottleneck

Hyper (Opus 4.6)
Monthly cost: $4,500/month
Output: 175 verified features/week
11-gate verified, full evidence trail

8x cheaper than 1 engineer
44x more output than 1 engineer
40x cheaper than a 5-person team
11x more output than a 5-person team

The cost advantage is compelling. But cost isn't the point.

The point is that every line of code Hyper merges carries machine-verified evidence of correctness across 11 independent gates. No human team can provide that. Not at any price.

Hyper-Factory

The Level 5 dark factory. Decompose vision into specs, run tournaments, merge winners, repeat -- fully autonomous.

[Merge routing: 78.3% dark merge (auto), 16.2% cautious, 5.5% hold (human review)]
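
The routing above can be read as a threshold policy on the fused confidence of each verified winner. In the sketch below, the 0.85 dark-merge threshold comes from the confidence-distribution section; the cautious-merge lower bound is an illustrative assumption, as the source does not state it.

    DARK_THRESHOLD = 0.85       # dark-merge zone, per the distribution above
    CAUTIOUS_THRESHOLD = 0.70   # assumed lower bound for a cautious merge

    def route(hard_gates_passed: bool, confidence: float) -> str:
        """Decide what happens to a tournament winner after verification."""
        if not hard_gates_passed:
            return "reject"           # fail-closed: never merged
        if confidence >= DARK_THRESHOLD:
            return "dark-merge"       # fully autonomous merge
        if confidence >= CAUTIOUS_THRESHOLD:
            return "cautious-merge"   # merged with elevated scrutiny
        return "hold"                 # queued for human review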

2,847    Autonomous specs completed
78.3%    Dark-merge rate (fully autonomous)
16.2%    Cautious-merge rate (elevated scrutiny)
5.5%     Hold rate (human review required)
12       Circuit breaker triggers (safety stops)
$0.47    Cost per merge (average)
1,247h   Continuous operation
0        Production regressions (from dark merges)

Evidence over opinions.

Every number on this page is backed by gate logs, diff artifacts, and council votes. Hyper doesn't guess -- it verifies.