euca-dataset is the data face of the engine: it extracts the world’s exact state as structured
data, hashes it for reproducibility, and (via the render path) emits aligned ground-truth modalities.
Because ground truth is read from the authoritative CPU state, generated appearance never changes it.
That separation is what lets Euca double as an answer key for world models.
Structured state
- WorldStateGraph: the observable world as { tick, entity_count, entities }, serialized to canonical, deterministic JSON. This is the structured form behind observe; the HTTP /observe route returns the flattened RichEntityData projection.
- state_digest (FNV-1a): a 64-bit hash of the observable state over a canonical ordering (entities by id, components by name, fields in declaration order). Identical observable state always yields an identical digest, and different digests prove the states differ; because the hash is 64 bits wide, equal digests indicate equal state up to a (very unlikely) collision. This makes it the reproducibility oracle behind Determinism. Two AI-generated visual skins of the same world produce a byte-identical digest, because appearance is not part of the observable state.
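The digest recipe can be sketched in a few lines of Python. This is a minimal illustration, not Euca's implementation: the nested-dict state layout, the compact-JSON byte encoding, including tick and entity_count in the hashed bytes, and the helper names canonical_bytes and state_digest_sketch are all assumptions. Only the 64-bit FNV-1a constants (from the FNV standard) and the ordering rules (from the text above) are taken as given.

```python
import json

# Standard 64-bit FNV-1a parameters.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: XOR each byte into the hash, then multiply by the prime (mod 2**64)."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def canonical_bytes(tick: int, entities: dict) -> bytes:
    """Serialize state in a canonical order (hypothetical encoding).

    `entities` maps entity_id -> {component_name: {field: value}}; each field
    dict is assumed to be built in declaration order, so fields are NOT sorted.
    """
    ordered = [
        # Entities sorted by id, components sorted by name, fields kept as declared.
        [eid, [[name, list(components[name].items())] for name in sorted(components)]]
        for eid, components in sorted(entities.items())
    ]
    doc = {"tick": tick, "entity_count": len(entities), "entities": ordered}
    # Compact separators so incidental whitespace never affects the hashed bytes.
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")


def state_digest_sketch(tick: int, entities: dict) -> int:
    """Hypothetical stand-in for state_digest: FNV-1a over the canonical bytes."""
    return fnv1a_64(canonical_bytes(tick, entities))
```

Building the same state with entities and components inserted in a different order yields the same digest, while changing any field value changes it. A visual skin never enters the hashed bytes at all, which is why reskinned runs compare equal.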
Ground-truth modalities
When a frame is rendered, the render path also emits aligned ground-truth channels into euca-dataset. These are exact labels, not the color image:
- ✅ Entity-id segmentation: a per-pixel entity index stored as index + 1, with 0 reserved for background, so each pixel maps deterministically to an entity_id.
- ✅ Metric depth: a per-pixel depth in world units.
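Decoding these channels is a table lookup. The sketch below assumes row-major 2D arrays indexed [y][x] and a hypothetical index_to_id table that maps the 0-based entity index to its entity_id; the actual buffer layout and lookup in euca-dataset may differ.

```python
from typing import List, Optional, Sequence

Grid = Sequence[Sequence[int]]


def pixel_entity(seg: Grid, index_to_id: Sequence[int], x: int, y: int) -> Optional[int]:
    """Map one segmentation pixel to an entity_id (None for background)."""
    value = seg[y][x]
    if value == 0:  # 0 is reserved for background
        return None
    return index_to_id[value - 1]  # stored value is index + 1


def entity_depths(seg: Grid, depth: Sequence[Sequence[float]],
                  index_to_id: Sequence[int], entity_id: int) -> List[float]:
    """Collect the metric depth (world units) of every pixel labeled entity_id."""
    out = []
    for y, row in enumerate(seg):
        for x in range(len(row)):
            if pixel_entity(seg, index_to_id, x, y) == entity_id:
                out.append(depth[y][x])
    return out
```

Because both channels share one pixel grid, min(entity_depths(...)) gives the distance to an entity's nearest visible surface without consulting any appearance model.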
Aligned bundles
The offline data face exports a deterministic rollout as aligned layers: video plus object, field, graph, and causal projections, actions, and counterfactuals. Each example is addressable by (episode_id | stream_id, tick, entity_id), and the tick↔frame and pixel→entity mappings are deterministic.
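One way to picture the addressing scheme is a frozen key type plus an invertible tick↔frame mapping. This is a hypothetical sketch: ExampleKey, first_tick, frame_stride, and the row layout are illustrative names, and the assumption that a frame is captured every frame_stride ticks starting at first_tick is ours, not a documented euca-dataset contract.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ExampleKey:
    """(episode_id | stream_id, tick, entity_id); frozen so it is hashable."""
    source: str  # an episode_id or a stream_id
    tick: int
    entity_id: int


def tick_to_frame(tick: int, first_tick: int, frame_stride: int) -> Optional[int]:
    """Return the video frame captured at `tick`, or None if no frame exists."""
    offset = tick - first_tick
    if offset < 0 or offset % frame_stride != 0:
        return None
    return offset // frame_stride


def frame_to_tick(frame: int, first_tick: int, frame_stride: int) -> int:
    """Inverse mapping: frame index back to its simulation tick."""
    return first_tick + frame * frame_stride


def index_examples(rows: Iterable[dict]) -> Dict[ExampleKey, dict]:
    """Index exported rows by their key so any example can be fetched directly."""
    return {ExampleKey(r["source"], r["tick"], r["entity_id"]): r for r in rows}
```

A frozen key works as a dict key or join column, and because both mapping directions are pure functions of the rollout parameters, re-exporting the same rollout reproduces the same alignment.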
The structured projection, causal projection, counterfactual harness, action logging, and Parquet/manifest
export are in place; the full GT modality suite and a few adapters are still being built.
Dataset extraction is primarily an offline, in-process capability (see the
world_model_capabilities and experiment examples in the repo), not a plain HTTP route on :3917. The
online path, a live world that an agent learns against step by step, is covered in Evaluation.
Status
- ✅ Shipped: WorldStateGraph, canonical JSON, state_digest (FNV-1a), entity-id segmentation and metric depth channels, and aligned-bundle export (object/causal projection, counterfactuals, actions, Parquet/manifest).
- 🟡 In progress: the full spatial GT suite (normals, semantic segmentation, camera pose, optical flow, multi-camera) and some scoring adapters.