Benchmarks before claims.
We publish benchmark results with full methodology, including parameter count, training compute, energy use, and accuracy, before any product positioning.
Evaluation suite
Primary benchmarks: ARC-AGI for abstract reasoning, Lean theorem proving for formal verification, and planning/maze tasks for sequential decision-making.
Secondary metrics track sample efficiency, inference latency on consumer and datacenter GPUs, and energy per correct answer.
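As a rough illustration of the energy-per-correct-answer metric, the Python sketch below divides the total measured inference energy by the number of correct answers, so energy spent on wrong answers still counts against the model. The `EvalRecord` schema and function names are hypothetical, and the sketch assumes per-item energy (in joules) and latency have already been measured by some external tool.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class EvalRecord:
    """One evaluated item (hypothetical schema)."""
    correct: bool
    energy_joules: float  # energy measured for this item's inference
    latency_s: float      # wall-clock inference latency in seconds


def energy_per_correct(records: list[EvalRecord]) -> float:
    """Total energy across ALL items divided by the number answered correctly.

    Wrong answers add to the numerator but not the denominator, so a model
    that burns energy on failures is penalized.
    """
    n_correct = sum(r.correct for r in records)
    if n_correct == 0:
        return float("inf")  # no correct answers: cost per correct answer is unbounded
    return sum(r.energy_joules for r in records) / n_correct


def median_latency(records: list[EvalRecord]) -> float:
    """Median per-item latency, less sensitive to outliers than the mean."""
    return median(r.latency_s for r in records)


if __name__ == "__main__":
    records = [
        EvalRecord(correct=True, energy_joules=120.0, latency_s=0.8),
        EvalRecord(correct=False, energy_joules=100.0, latency_s=0.7),
        EvalRecord(correct=True, energy_joules=80.0, latency_s=0.5),
    ]
    print(energy_per_correct(records), median_latency(records))
```

With three items using 120 J, 100 J, and 80 J, two of them correct, this gives 300 J / 2 = 150 J per correct answer.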
Shipped capabilities
- ARC-AGI public leaderboard submissions
- Lean proof success rate tracking
- Planning task accuracy and step efficiency
- RTX 4090 and datacenter GPU baselines
- Reproducible eval harness in open source
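One way to make evaluation runs repeatable is to fix the task order, draw all randomness from an isolated seeded RNG, and serialize the summary deterministically. The Python sketch below applies that idea to planning tasks. The `PlanningResult` fields, the `toy_solver` stand-in, and the step-efficiency definition (optimal solution length divided by steps taken, 0 for unsolved tasks) are assumptions for illustration, not the definitions behind our published results.

```python
import json
import random
from dataclasses import dataclass


@dataclass
class PlanningResult:
    task_id: str
    solved: bool
    steps_taken: int
    optimal_steps: int  # shortest known solution length (assumed available per task)


def step_efficiency(r: PlanningResult) -> float:
    """Assumed definition: optimal / taken for solved tasks, 0.0 for unsolved ones."""
    if not r.solved or r.steps_taken <= 0:
        return 0.0
    return r.optimal_steps / r.steps_taken


def run_eval(task_ids, solve, seed=0):
    """Run every task with one seeded RNG and return a summary dict."""
    rng = random.Random(seed)   # isolated RNG: never touches global random state
    order = sorted(task_ids)    # canonical order regardless of input order
    rng.shuffle(order)          # seeded shuffle is repeatable for a given seed
    results = [solve(t, rng) for t in order]
    n = len(results)
    if n == 0:
        raise ValueError("no tasks to evaluate")
    return {
        "seed": seed,
        "n_tasks": n,
        "accuracy": sum(r.solved for r in results) / n,
        "mean_step_efficiency": sum(step_efficiency(r) for r in results) / n,
    }


def report(summary: dict) -> str:
    """Serialize with sorted keys so identical runs give byte-identical reports."""
    return json.dumps(summary, sort_keys=True)


def toy_solver(task_id: str, rng: random.Random) -> PlanningResult:
    """Hypothetical stand-in for a model; its behavior depends only on rng."""
    steps = 10 + rng.randint(0, 5)
    return PlanningResult(task_id, solved=rng.random() < 0.8,
                          steps_taken=steps, optimal_steps=10)


if __name__ == "__main__":
    print(report(run_eval(["maze-1", "maze-2", "maze-3"], toy_solver, seed=42)))
```

Running it twice with the same seed, even with the task list supplied in a different order, yields the same report string; changing the seed can change both the task order and the results.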