Benchmarks before claims.

We publish benchmark results with their full methodology, reporting parameter count, training compute, energy use, and accuracy, before any product positioning.

Evaluation suite

Primary benchmarks: ARC-AGI for abstract reasoning, Lean theorem proving for formal verification, and planning/maze tasks for sequential decision making.
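ARC-AGI tasks are scored by exact match: a predicted output grid counts only if its shape and every cell value equal the target's. The sketch below shows that scoring rule. The limit of two attempts per task is an assumption modeled on public ARC Prize rules and may differ from the rules that apply to a given leaderboard. The helper names (`task_solved`, `accuracy`) are hypothetical.

```python
from typing import List, Sequence

# ARC grids are 2-D arrays of color indices 0-9.
Grid = List[List[int]]


def task_solved(predictions: Sequence[Grid], target: Grid, max_attempts: int = 2) -> bool:
    """True if any of the first `max_attempts` predictions matches the target exactly.

    Nested-list equality compares row count, row lengths, and every cell,
    so a grid with the wrong shape never matches. There is no partial credit.
    max_attempts=2 is an assumption, not a confirmed rule for every leaderboard.
    """
    return any(pred == target for pred in predictions[:max_attempts])


def accuracy(results: Sequence[bool]) -> float:
    """Fraction of tasks solved; 0.0 for an empty run."""
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    target = [[0, 1], [1, 0]]
    attempts = [[[0, 1], [1, 1]], [[0, 1], [1, 0]]]  # the second attempt is correct
    solved = task_solved(attempts, target)
    print(solved)                     # True
    print(accuracy([solved, False]))  # 0.5
```

Exact match keeps the metric unambiguous: a near-miss grid scores the same as a blank answer.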

Secondary metrics track sample efficiency, inference latency on consumer and datacenter GPUs, and energy per correct answer.
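One way to read "energy per correct answer" is total energy spent across an evaluation run divided by the number of correct answers, so energy burned on wrong answers still counts against the model. The sketch below assumes that definition and assumes per-task energy is already available in joules (for example from accelerator power telemetry). `EvalRecord` and its fields are hypothetical, not part of any published harness.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EvalRecord:
    """One evaluated task (hypothetical record format)."""
    correct: bool
    energy_joules: float  # assumed: measured energy spent producing this answer


def energy_per_correct_answer(records: Sequence[EvalRecord]) -> float:
    """Total energy over the whole run divided by the number of correct answers.

    Energy spent on incorrect answers is included in the numerator, so a model
    that answers cheaply but often wrong is penalized. Returns infinity when
    no answer is correct.
    """
    total_energy = sum(r.energy_joules for r in records)
    n_correct = sum(1 for r in records if r.correct)
    return total_energy / n_correct if n_correct else float("inf")


if __name__ == "__main__":
    run = [
        EvalRecord(correct=True, energy_joules=120.0),
        EvalRecord(correct=False, energy_joules=80.0),
        EvalRecord(correct=True, energy_joules=100.0),
    ]
    print(energy_per_correct_answer(run))  # 150.0
```

Dividing by correct answers rather than by tasks attempted ties the metric to useful output instead of raw throughput.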

Shipped capabilities

ARC-AGI public leaderboard submissions
Lean proof success rate tracking
Planning task accuracy and step efficiency
RTX 4090 and datacenter GPU baselines
Reproducible eval harness in open source
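For maze-style planning tasks, accuracy and step efficiency can be scored separately: an episode succeeds if every move is legal and the path reaches the goal, and step efficiency compares the optimal move count (found here by breadth-first search) with the moves the agent actually took. The grid encoding ('#' for walls), the 4-connected movement, and the efficiency formula `optimal / taken` are all assumptions for illustration; the function names are hypothetical.

```python
from collections import deque
from typing import Iterator, List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def neighbors(maze: Sequence[str], cell: Cell) -> Iterator[Cell]:
    """Open cells reachable in one 4-connected move ('#' is assumed to be a wall)."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(maze) and 0 <= nc < len(maze[nr]) and maze[nr][nc] != "#":
            yield (nr, nc)


def shortest_path_len(maze: Sequence[str], start: Cell, goal: Cell) -> Optional[int]:
    """Breadth-first search: moves on an optimal path, or None if the goal is unreachable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            return dist[cell]
        for nxt in neighbors(maze, cell):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    return None


def score_episode(maze: Sequence[str], start: Cell, goal: Cell, path: List[Cell]) -> Tuple[bool, float]:
    """Return (success, step_efficiency) for an agent's path, given as a list of cells.

    Success: the path starts at `start`, ends at `goal`, and every move is legal.
    Step efficiency: optimal moves / agent moves (1.0 is optimal); 0.0 on failure.
    """
    if not path or path[0] != start or path[-1] != goal:
        return False, 0.0
    for a, b in zip(path, path[1:]):
        if b not in set(neighbors(maze, a)):
            return False, 0.0  # illegal jump or move into a wall
    optimal = shortest_path_len(maze, start, goal)
    moves = len(path) - 1
    if optimal is None:
        return False, 0.0
    if moves == 0:
        return True, 1.0  # start == goal
    return True, optimal / moves


if __name__ == "__main__":
    maze = ["...", ".#.", "..."]
    best = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
    detour = [(0, 0), (1, 0), (0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
    print(score_episode(maze, (0, 0), (2, 2), best))  # (True, 1.0)
    ok, eff = score_episode(maze, (0, 0), (2, 2), detour)
    print(ok, round(eff, 3))  # True 0.667
```

Keeping success and efficiency as separate numbers avoids hiding a solver that always reaches the goal but wanders to get there.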

The future of AI requires sovereign infrastructure, trustworthy reasoning and enterprise governance.