cargo bench.
Parallel ECS throughput
The headline: a 1,000,000-entity world stepped through a 5-system parallel pipeline runs in ~2.69 ms/tick (median). The same workload run sequentially takes ~30.6 ms, so the rayon-parallel scheduler delivers an ~11.4× speedup on the machine recorded in STATS.md. For frame-budget intuition: at 60 FPS a frame is ~16.7 ms, so a million entities of ECS work fit in roughly a sixth of a frame.

| Workload | Time/tick | Notes |
|---|---|---|
| 1M entities · 5-system pipeline · parallel | 2.69 ms | 11.4× over the 30.6 ms sequential run |
| 250K entities · 5-system pipeline · parallel | 1.36 ms | |
| 10K entities · headless physics + transform | 10.1 ms | full gameplay tick, not pure ECS |
| 10K entities · physics step | 9.78 ms | |
| Dataset extract + digest · 1K entities | ~0.8–1.05 ms | the reproducibility/answer-key path |

Scope, stated honestly: the 1M-entity figure is pure ECS (component iteration + the scheduler), not a full render+physics frame. The 10K-entity numbers are full headless gameplay ticks. The render benches in the repo measure CPU-side preparation, not GPU frame time.
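The speedup and frame-budget figures in the headline are plain ratios. A minimal Rust sketch that recomputes them from the median timings (the helper names `speedup` and `frame_fraction` are illustrative, not part of the engine):

```rust
/// Speedup of a parallel run over a sequential baseline (both in ms/tick).
fn speedup(sequential_ms: f64, parallel_ms: f64) -> f64 {
    sequential_ms / parallel_ms
}

/// Fraction of one frame at `fps` consumed by a tick of `tick_ms` milliseconds.
fn frame_fraction(tick_ms: f64, fps: f64) -> f64 {
    let frame_ms = 1000.0 / fps; // 60 FPS -> ~16.67 ms per frame
    tick_ms / frame_ms
}

fn main() {
    // Median timings for the 1M-entity, 5-system pipeline.
    println!("speedup: {:.1}x", speedup(30.6, 2.69)); // prints "speedup: 11.4x"
    println!("frame share at 60 FPS: {:.2}", frame_fraction(2.69, 60.0)); // prints "frame share at 60 FPS: 0.16"
}
```

A 2.69 ms tick uses about 16% of a ~16.7 ms frame, which is the "roughly a sixth" quoted above.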
Where the speed comes from
The ECS internals are built for this. The individual optimizations below were measured on Apple Silicon (release builds) at the commits that introduced them; re-run `cargo bench` to reproduce them on your hardware.

- `spawn_batch`: mass-spawning 1M entities, 397 ms → 24.1 ms (~16×).
- Chunked parallel `par_for_each` (multi-component, 1M): 6.59 ms → 1.05 ms (~6.3×).
- Chunked single-component read (1M): 2.79 ms → 290 µs (~9.6×); mutate: 6.58 ms → 517 µs (~12.7×).

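The batch-spawn and chunked-iteration ideas in the list above can be illustrated on a toy column store. This sketch is hypothetical: `World`, `spawn_batch`, and `integrate_chunked` are names chosen for illustration, not the engine's API, and it uses `std::thread::scope` in place of rayon so it has no dependencies. `spawn_batch` reserves every component column once before filling it; in a real ECS much of the saving usually comes from skipping per-entity bookkeeping (entity-ID allocation, archetype lookup), which this toy does not model. `integrate_chunked` splits each column into contiguous chunks and updates them on separate threads, so each thread streams through its own slice of memory.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical column store: one Vec per component type (struct-of-arrays).
/// Illustrative only; not the engine's actual World.
#[derive(Default)]
struct World {
    positions: Vec<[f32; 3]>,
    velocities: Vec<[f32; 3]>,
}

impl World {
    /// One-at-a-time spawn: every call pushes to each column individually.
    fn spawn(&mut self, pos: [f32; 3], vel: [f32; 3]) -> usize {
        self.positions.push(pos);
        self.velocities.push(vel);
        self.positions.len() - 1
    }

    /// Batch spawn: reserve capacity in every column once, then fill it.
    /// Returns the index range of the new entities.
    fn spawn_batch<I>(&mut self, items: I) -> Range<usize>
    where
        I: ExactSizeIterator<Item = ([f32; 3], [f32; 3])>,
    {
        let start = self.positions.len();
        self.positions.reserve(items.len());
        self.velocities.reserve(items.len());
        for (pos, vel) in items {
            self.positions.push(pos);
            self.velocities.push(vel);
        }
        start..self.positions.len()
    }

    /// Chunked parallel update: one contiguous position chunk per thread,
    /// paired with the matching velocity chunk.
    fn integrate_chunked(&mut self, dt: f32, threads: usize) {
        let World { positions, velocities } = self;
        let threads = threads.max(1);
        // Ceiling division, clamped to 1 so chunks_mut never gets 0 (it would panic).
        let chunk_len = ((positions.len() + threads - 1) / threads).max(1);
        thread::scope(|s| {
            for (pos, vel) in positions.chunks_mut(chunk_len).zip(velocities.chunks(chunk_len)) {
                s.spawn(move || {
                    for (p, v) in pos.iter_mut().zip(vel) {
                        for (pk, vk) in p.iter_mut().zip(v) {
                            *pk += vk * dt;
                        }
                    }
                });
            }
        });
    }
}

fn main() {
    let mut world = World::default();
    // Spawn 1,000 entities in one batch, then one more individually.
    let batch = world.spawn_batch((0..1_000).map(|i| ([i as f32, 0.0, 0.0], [1.0, 2.0, 0.0])));
    let single = world.spawn([0.0; 3], [0.0; 3]);
    world.integrate_chunked(0.5, 4);
    println!("{:?} {} {:?}", batch, single, world.positions[10]); // prints: 0..1000 1000 [10.5, 1.0, 0.0]
}
```

Unlike rayon's work-stealing pool, this spawns fresh OS threads on every call, so it only illustrates the chunking pattern; a scheduler running every tick would reuse a pool.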
Reproduce it
The benchmarks live in `benches/` (the parallel-pipeline stress test) and `crates/*/benches/` (physics, dataset, math SIMD, animation, nav, scene, particle). To run them, use `cargo bench`; Criterion writes its reports under `target/criterion/`. The canonical provenance for the headline stats (what was measured, when, and on what hardware) is `STATS.md` at the repo root.
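Criterion produces the real statistics, but the shape of a "median ms/tick" figure is simple: warm up, time many ticks, and report the middle sample, which is less sensitive to one-off stalls than the mean. A minimal std-only sketch (`median_tick` and the stand-in workload are hypothetical, not the repo's bench code):

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Median of the samples (sorts them in place); None if there are none.
fn median(samples: &mut [Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    let mid = samples.len() / 2;
    Some(if samples.len() % 2 == 0 {
        // Even count: average the two middle samples.
        (samples[mid - 1] + samples[mid]) / 2
    } else {
        samples[mid]
    })
}

/// Runs `tick` `warmup` times untimed, then `n` times timed,
/// and returns the median tick duration.
fn median_tick<F: FnMut()>(mut tick: F, warmup: usize, n: usize) -> Option<Duration> {
    for _ in 0..warmup {
        tick();
    }
    let mut samples: Vec<Duration> = (0..n)
        .map(|_| {
            let start = Instant::now();
            tick();
            start.elapsed()
        })
        .collect();
    median(&mut samples)
}

fn main() {
    // Stand-in workload: scale one f32 component column of 1M entities.
    let mut column = vec![1.0_f32; 1_000_000];
    let m = median_tick(
        || {
            for x in column.iter_mut() {
                *x *= 1.0001;
            }
            // Keep the optimizer from discarding the work.
            black_box(&column);
        },
        3,
        21,
    );
    println!("median tick: {:?}", m);
}
```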