Benchmarks
Oenerga evaluates systems against deployment reality: wall power, workload relevance, throughput per rack, energy per token, active-state efficiency, and communication cost.
Benchmark philosophy
Enterprise and sovereign buyers do not make infrastructure decisions on isolated component marketing figures. They evaluate systems on what matters in production: how much work is delivered per rack, how much energy is consumed per useful unit of output, how communication costs scale, and how active state affects real cost.
The AI hardware industry has a benchmarking problem. Processor-level TOPS figures routinely describe systems that spend more energy moving data than computing with it. Peak throughput is measured on microbenchmarks that bear little resemblance to production inference patterns. Memory bandwidth is quoted at theoretical maximums, not the usable bandwidth delivered under real workloads.
Oenerga's benchmark approach is built for the buyers who have been through that cycle. We measure what procurement teams pay for, under conditions they can audit, with methodology they can review before making infrastructure decisions.
Wall-power accounting
Energy per token measured at the system level — compute, memory, interconnect, and control included. Not chip-level thermal design power.
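To make that boundary concrete: the only power figure that enters the calculation is the one sampled at the wall, integrated over the serving run. A minimal sketch of the principle, assuming a hypothetical read_wall_power_watts() telemetry hook and a run_workload() callable that serves the benchmark and returns tokens delivered; neither is part of Oenerga's published tooling.

```python
import threading
import time

def energy_per_token_joules(read_wall_power_watts, run_workload, interval_s=0.1):
    """Integrate wall power over a serving run, normalized by tokens delivered.

    Both callables are illustrative assumptions: read_wall_power_watts samples
    instantaneous draw at the system boundary (compute, memory, interconnect,
    and control all upstream of the meter); run_workload serves the benchmark
    and returns the token count. Chip-level TDP never enters the calculation.
    """
    joules = 0.0
    done = threading.Event()

    def sample():
        nonlocal joules
        last = time.monotonic()
        while not done.is_set():
            time.sleep(interval_s)
            now = time.monotonic()
            joules += read_wall_power_watts() * (now - last)  # rectangle rule
            last = now

    sampler = threading.Thread(target=sample)
    sampler.start()
    try:
        tokens = run_workload()
    finally:
        done.set()
        sampler.join()
    return joules / tokens
```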
Application-level throughput
Tokens per second and requests per second on workloads representative of real serving patterns — not synthetic kernel benchmarks.
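What application-level measurement means in practice: throughput is timed around the serving endpoint under real session concurrency, so batching, queueing, and scheduling effects all land inside the measurement instead of being excluded by a kernel-level harness. The complete coroutine below is a hypothetical client call, not a published Oenerga API.

```python
import asyncio
import time

async def measure_throughput(complete, prompts, concurrency=64):
    """Tokens/s and requests/s at the application layer.

    complete: hypothetical async callable (prompt -> tokens generated) that
    hits the serving endpoint, so queueing and batching effects are measured
    rather than excluded, as they would be in a kernel microbenchmark.
    """
    sem = asyncio.Semaphore(concurrency)
    token_counts = []

    async def one_request(prompt):
        async with sem:
            token_counts.append(await complete(prompt))

    start = time.monotonic()
    await asyncio.gather(*(one_request(p) for p in prompts))
    elapsed = time.monotonic() - start
    return sum(token_counts) / elapsed, len(prompts) / elapsed
```

Run with asyncio.run() against a prompt set drawn from the target serving pattern.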
Rack-level outcomes
Throughput per rack and cost per rack are the relevant units for procurement decisions — not chip-level peak figures.
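The arithmetic behind that framing is worth making explicit. With an illustrative 17 kW rack power budget (an assumption for this sketch, not an Oenerga specification), a system with a higher chip-level peak can still lose at the rack:

```python
def rack_throughput(tokens_per_s_per_system, system_power_w,
                    rack_power_budget_w=17_000):
    """Systems that fit the rack's power budget, times per-system throughput.
    The 17 kW default is illustrative, not an Oenerga specification."""
    systems_per_rack = rack_power_budget_w // system_power_w
    return systems_per_rack * tokens_per_s_per_system

# Hypothetical numbers: the higher-peak, higher-draw system fits fewer
# units in the rack and delivers less total throughput.
print(rack_throughput(9_000, 2_800))    # 6 systems -> 54,000 tok/s per rack
print(rack_throughput(12_000, 5_600))   # 3 systems -> 36,000 tok/s per rack
```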
Explicit baseline comparison
Every Oenerga benchmark names the baseline system and describes the test conditions. Numbers without baselines are not numbers.
Performance targets
Architecture-level advantages expressed in the terms that matter to procurement and infrastructure teams. Full methodology available on request.
[X] values represent target performance ranges under defined workload conditions. Specific figures available under NDA to qualified technical reviewers. Baselines, methodology, and test conditions disclosed in full.
Methodology
Oenerga benchmark methodology is designed to support serious technical review. Baselines, workload classes, context lengths, model families, concurrency levels, and power accounting boundaries should be visible and reviewable. Optical interconnect power and full system power are included where relevant.
Infrastructure buyers evaluating Oenerga systems should not need to trust marketing copy. They should be able to review the test conditions, replicate the environment under their own infrastructure assumptions, and verify the numbers with their own technical teams. That is the standard Oenerga is building toward.
Benchmark reviews are conducted under appropriate confidentiality arrangements for qualified partners. Full methodology documentation — including model families, context lengths, concurrency levels, baseline system specifications, and power measurement methodology — is available for review before any procurement decision.
Explicit baseline systems
Every comparison names the reference system, its configuration, and the software stack used. Comparisons without named baselines are not published.
Defined workload classes
Workloads are defined by model family, context length, concurrency level, and serving pattern — not by a single representative token count.
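Illustratively, a workload class is a tuple over exactly those dimensions; pinning all of them is what makes a result comparable across systems. The field names and values below are assumptions for the sketch, not Oenerga's published schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkloadClass:
    """One fully pinned benchmark workload; all names are illustrative."""
    model_family: str      # e.g. a 70B-class decoder-only transformer
    context_length: int    # prompt plus generation budget, in tokens
    concurrency: int       # simultaneous sessions held open
    serving_pattern: str   # "chat", "batch", "long-context-rag", ...

LONG_CONTEXT_CHAT = WorkloadClass(
    model_family="70B-decoder",
    context_length=32_768,
    concurrency=256,
    serving_pattern="chat",
)
```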
Wall-power accounting
Energy measurements are taken at the system boundary. Optical, memory, interconnect, and control subsystem power are included in the total.
Application-level metrics
Primary output metrics — tokens per second, energy per token, latency at load — are measured at the application layer, not at the compute plane in isolation.
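"Latency at load" in particular is taken from request timings recorded while the system holds its target concurrency, not from an idle single-request probe. A minimal percentile sketch over such timings:

```python
from statistics import quantiles

def latency_at_load(latencies_ms, percentile=99):
    """Tail latency from application-layer request timings captured while
    the system is at target concurrency. latencies_ms is assumed to be the
    per-request round-trip times recorded during the measured run."""
    cuts = quantiles(latencies_ms, n=100)   # 99 cut points
    return cuts[percentile - 1]             # e.g. p99 -> cuts[98]
```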
Rack-level outcome focus
Final reported results are normalized to per-rack and per-watt terms. Chip-level performance claims are only published alongside system-level context.
Reproducible evaluation workflow
Benchmark environments and test scripts are documented to allow partner teams to replicate evaluation conditions independently, with Oenerga support where needed.
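A sketch of the shape such documentation can take: a single manifest that pins everything a partner team needs to rerun the evaluation. Field names and values are illustrative assumptions, not Oenerga's actual replication format.

```python
REPLICATION_MANIFEST = {
    "baseline_system": {                       # named explicitly, per policy above
        "name": "gpu-cluster-baseline",
        "configuration": "8-accelerator node, vendor reference build",
        "software_stack": "vendor serving stack, versions pinned",
    },
    "workload_class": "LONG_CONTEXT_CHAT",     # as defined under Methodology
    "power_boundary": "wall",                  # where the meter sits
    "metrics": ["tokens_per_s", "joules_per_token", "p99_latency_ms"],
    "runs": 3,                                 # repeated for variance reporting
}
```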
Pilot evaluations
These evaluation scenarios reflect the workload classes and decision criteria relevant to Oenerga's primary buyer categories. Partner and deployment details are disclosed under appropriate confidentiality arrangements.
[Pilot Partner Name] evaluated LUCID against a conventional serving baseline for long-context inference under high session concurrency. Oenerga demonstrated [X]% lower energy per token and [X]× improved throughput per rack in the target workload regime.
A memory-heavy transformer serving workflow was used to compare state-handling efficiency and communication burden between Oenerga and a GPU-cluster baseline. Oenerga showed [X]% lower movement-related overhead and improved active-state utilization across tested context lengths.
A sovereign AI deployment scenario assessed rack-level economics, scalability, and architecture-level differentiation under realistic serving profiles. Oenerga delivered [X] in effective state density and [X]% lower projected infrastructure cost in the modeled deployment.
Resources
Available to qualified technical and procurement teams. Contact Oenerga to request access or discuss any of the materials below.
Architecture overview, market positioning, and strategic differentiation. Designed for CTO and infrastructure leadership audiences.
Complete architecture documentation, benchmark methodology, and system specifications. For technical and procurement review teams.
End-to-end pilot evaluation framework: environment requirements, workload definition, measurement protocol, and success criteria.
Five-pillar architecture summary for engineering and infrastructure planning teams evaluating integration requirements.
Oenerga conducts technical benchmark reviews with qualified infrastructure teams. Baselines, methodology, and full test conditions are disclosed in advance.