
MEP 25. VM2 Data Model

MEP:     25
Title:   VM2 Data Model (Strings, Lists, Maps, Sets, Structs)
Author:  Mochi core
Status:  Draft
Type:    Standards Track
Created: 2026-05-16

Abstract

MEP 24 shipped the minimum-viable representations for each vm2 container kind so the opcode table and dispatch loop could be reasoned about in isolation. Those MVPs were deliberately naive: strings are heap []byte (with a 5-byte inline form from the strings-perf round), lists are []Cell, maps are Go's map[any]Cell keyed on a normalizing type-switch, structs are not yet implemented, and sets do not exist as a first-class kind.

The MVP shapes have already produced a measurable Lua gap on the MEP-23 cross-language baseline (1.5x on lists/fill_sum, 4.9x on maps/fill_sum). That gap is structural, not dispatch: it comes from the data structures, not the interpreter loop. This MEP specifies the modern shapes that close it.

The five kinds are co-designed because they share the same Cell substrate, the same Objects table, and the same eventual JIT IR. Specifying them together means the JIT (MEP-22, deferred) sees one consistent model instead of five ad-hoc ones.

Implementation strategy is selected from three competing approaches, each specified in its own deep-dive MEP:

  • MEP-26, Option 1: Monomorphic + One-Way Migration. Start each container in the smallest shape; migrate once to a generic shape on first heterogeneous use. Cheap on monomorphic data, O(n) one-shot transition cost. Lineage: V8 hidden classes, PyPy storage strategies, Lua table promotion.
  • MEP-27, Option 2: Polymorphic + Inline Caches. One shape per kind with polymorphic dispatch, but operations carry a small per-call-site cache of "last seen sub-shapes" so the hot path stays monomorphic. Reversible. Lineage: Self / Smalltalk / V8 inline caches.
  • MEP-28, Option 3: AOT Codegen Specialization. Emit one specialized opcode handler per (operator, operand-shape) tuple at compile time using the IR's type information. No runtime shape decisions. Lineage: Lua/LuaJIT specialized opcodes, Julia method specialization, Crystal/Nim monomorphization.

This MEP is the container-shape spec; the three follow-on MEPs are the dispatch-strategy specs. The shape decisions below stand regardless of which strategy ships first.

Motivation

Three forces converge on this MEP:

  1. The MEP-23 baseline says where the floor is. runtime/vm2 is at Lua parity on integer-only loops, at parity on short strings, 1.5x behind Lua on lists, and 4.9x behind Lua on maps. The list and map gaps are not dispatch overhead; they are the cost of []Cell boxing every int element and of map[any]Cell boxing every key into an interface header on every lookup. No interpreter tweak fixes that; only the data shape does.

  2. Sets are missing. The legacy VM has them as a thin wrapper around its map type. vm2 needs a first-class set so that membership tests do not allocate a value cell, and so that future set-typed query results (MEP-7 query rewrites) lower to the right shape without a map detour.

  3. Structs need a shape that the JIT can speak. A name-indexed hash is wrong (the typed IR already has slot indices). A flat []Cell is right for the MVP but loses the chance to apply hidden-class / shape-table tricks that V8 and SpiderMonkey use to make field reads inlinable. Specifying the shape now means MEP-22 does not have to retrofit it later.

The data model belongs in its own MEP, separate from MEP-24, because MEP-24 is the rollout plan (one section per opcode family) and this is the steady-state representation (one section per algorithmic decision, with cross-references). Edits to a representation choice should not touch the rollout plan, and vice versa.

Non-goals

  • No JIT in this MEP. Representations are chosen with JIT compatibility as a constraint, but the JIT itself remains MEP-22 territory.
  • No GC change in this MEP. Reachability tracing, freelist reuse, and finalization stay as MEP-24 §7 specifies.
  • No closure changes. MEP-24 §6's by-value upvalue shape stands.
  • No on-disk Program format changes. All representation choices here are runtime-only; the bytecode and emitted constants are byte-identical to what an MEP-24 build produces.
  • No commitment to dispatch strategy. This MEP picks shapes. MEP-26/27/28 pick how to dispatch on those shapes; this MEP does not preempt that choice.

Specification

The data model is specified one kind at a time. Each section states the shape, the invariants the dispatch loop and GC may rely on, the opcode contract (any additions or fast-path opcodes), and the transition rules (when a value migrates from one shape variant to another). Shape choices are normative; minor field naming and Go visibility are implementation detail.

1. Strings

Three-variant representation. Every string-typed Cell carries one of:

  • Inline (SSO). Tag tagSStr (already specified in MEP-24 §2 / strings-perf round). Up to 5 bytes packed into the Cell payload. No Objects entry. Length read from payload bits 40-47.
  • Flat heap. Tag tagPtr, Objects[idx] holds *vmString{bytes []byte, hash uint64}. Used for any string whose total length exceeds 5 bytes and whose construction history is not a concatenation chain longer than the rope threshold (defined below).
  • Rope. Tag tagPtr, Objects[idx] holds *vmStringRope{left, right *vmStringNode, len int, depth uint8, hash uint64}. vmStringNode is a tagged union of (flat *vmString | rope *vmStringRope). Used for concatenation chains that would otherwise pay quadratic byte-copy cost.

Construction rules.

  • OpLoadStrK: inline if len(StrConsts[B]) <= 5, else flat. Materialized once at VM.Run startup into Fn.StrCells, as today.

  • OpConcatStr: four-case decision tree, in priority order (a Go sketch follows this list):

    1. Both operands inline, combined length ≤ 5 → inline packed result, no allocation. (Already implemented.)
    2. Combined length ≤ flatConcatThreshold (proposed: 64 bytes) → allocate a fresh flat *vmString with the concatenated bytes. Allocator hint: round capacity up to the next power of two so an immediate sequence of concats can reuse the buffer when the rope path is not taken. (Already implemented in the spirit of vm.makeStr.)
    3. Either operand is already a rope, or combined length > flatConcatThreshold and depth ≤ maxRopeDepth (proposed: 32) → allocate a fresh rope node {left, right, len: a+b, depth: max(da, db)+1}. No byte copy. Hash is invalidated (set to 0).
    4. Rope would exceed maxRopeDepth → flatten the rope to a fresh flat *vmString, then take case 2 against the flat result. Flattening is a single linear walk emitting bytes into a preallocated buffer.
  • OpLenStr: reads len from whichever variant; O(1).

  • OpIndexStr: inline → byte from payload (O(1)). Flat → byte from bytes (O(1)). Rope → tree descent (O(depth) ≤ 32 steps). Traps on OOB as today.

  • OpEqualStr: equality short-circuits identical Cells, then identical pointers, then unequal lengths. Heap/heap with both hashes memoized short-circuits on hash mismatch. Otherwise leaf-byte compare via simultaneous rope traversal (no flatten required).

  • OpHashStr: incremental FNV-1a over the rope leaves; result memoized on the rope node and on each subtree visited. Bytes do not move.
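
A minimal Go sketch of the OpConcatStr heap path (cases 2-4 above). Shape names follow the spec; the strNode interface and helpers like depthOf, flatten, and nextPow2 are illustrative, not final API.

// Illustrative shapes; names follow the spec above.
type vmString struct {
    bytes []byte
    hash  uint64 // memoized FNV-1a; 0 = not yet computed
    refs  uint32 // approximate refcount hint (see below)
}

type vmStringRope struct {
    left, right strNode
    len         int
    depth       uint8
}

// strNode is either a flat *vmString or a *vmStringRope.
type strNode interface{ length() int }

func (s *vmString) length() int     { return len(s.bytes) }
func (r *vmStringRope) length() int { return r.len }

func depthOf(n strNode) uint8 {
    if r, ok := n.(*vmStringRope); ok {
        return r.depth
    }
    return 0
}

const (
    flatConcatThreshold = 64 // proposed case-2 cutoff
    maxRopeDepth        = 32 // proposed case-3/4 cutoff
)

// concatHeap covers cases 2-4; case 1 (inline+inline) never reaches
// the heap path.
func concatHeap(a, b strNode) strNode {
    total := a.length() + b.length()
    fa, aFlat := a.(*vmString)
    fb, bFlat := b.(*vmString)

    // Case 2: short flat result; capacity rounded up to the next
    // power of two so a follow-on concat can reuse the buffer.
    if aFlat && bFlat && total <= flatConcatThreshold {
        buf := make([]byte, 0, nextPow2(total))
        return &vmString{bytes: append(append(buf, fa.bytes...), fb.bytes...)}
    }

    // Case 3: rope node, no byte copy; the memoized hash of the new
    // node starts invalidated (0).
    if d := max(depthOf(a), depthOf(b)) + 1; d <= maxRopeDepth {
        return &vmStringRope{left: a, right: b, len: total, depth: d}
    }

    // Case 4: depth cap hit; flatten each side with one linear walk,
    // then allocate the flat concatenation (case 2 on the result).
    fa, fb = flatten(a), flatten(b)
    buf := make([]byte, 0, nextPow2(total))
    return &vmString{bytes: append(append(buf, fa.bytes...), fb.bytes...)}
}

func flatten(n strNode) *vmString {
    if f, ok := n.(*vmString); ok {
        return f
    }
    r := n.(*vmStringRope)
    buf := make([]byte, 0, r.len)
    buf = append(buf, flatten(r.left).bytes...)
    return &vmString{bytes: append(buf, flatten(r.right).bytes...)}
}

func nextPow2(n int) int {
    c := 1
    for c < n {
        c <<= 1
    }
    return c
}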

Hash invariant. The 64-bit FNV-1a hash of any string (inline, flat, or rope) reads back the same regardless of representation. This is load-bearing for the map subsystem (§3): mixed-representation lookups must hit.
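
A sketch of why the invariant holds, reusing the strNode types from the concat sketch above; the FNV-1a constants are the standard 64-bit parameters, and memoization on visited nodes is elided.

const (
    fnvOffset64 uint64 = 14695981039346656037
    fnvPrime64  uint64 = 1099511628211
)

// hashBytes is the single primitive all three variants feed, so a
// rope and its flattened equivalent hash identically.
func hashBytes(h uint64, p []byte) uint64 {
    for _, b := range p {
        h = (h ^ uint64(b)) * fnvPrime64
    }
    return h
}

// hashStr folds over rope leaves left to right, visiting bytes in
// flatten order without moving them. OpHashStr seeds with
// fnvOffset64 and memoizes the result on each visited node.
func hashStr(h uint64, n strNode) uint64 {
    switch v := n.(type) {
    case *vmString:
        return hashBytes(h, v.bytes)
    case *vmStringRope:
        return hashStr(hashStr(h, v.left), v.right)
    }
    return h
}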

Why ropes. The MEP-23 strings/concat_loop at N=30 spends most of its time in case 2 (allocate + copy a growing buffer). Each iteration copies all bytes accumulated so far, giving O(N²) total byte traffic. Ropes turn that into O(N) total work (one node alloc per concat, no byte copy until a flatten or index). The depth cap + forced flatten on indexing-heavy workloads handles the known objection to ropes.

Reference-count hint (optimization, not GC). Flat *vmString carries a refs uint32. The emitter and OpConcatStr track an approximate refcount on creation and on copy-into-register. When len(left) + len(right) <= cap(left.bytes), left.refs == 1, and left is the only operand being reused, case 2's allocator may write the right operand into left.bytes[len:] in place and return the same pointer with extended length. This is the "buffer reuse" fast path used by V8 and CPython. The hint is approximate; a stale refs > 1 only forfeits the fast path, never causes incorrect aliasing.
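
A sketch of the gate, using the refs field from the concat sketch; the emitter-side "only operand reused" condition is assumed already checked by the caller.

// In-place append for case 2, gated on the approximate refs hint.
// A stale refs > 1 only forfeits the fast path; correctness never
// depends on the hint.
func tryAppendInPlace(left *vmString, right []byte) (*vmString, bool) {
    if left.refs == 1 && len(left.bytes)+len(right) <= cap(left.bytes) {
        left.bytes = append(left.bytes, right...) // reuses left's buffer
        left.hash = 0                             // bytes changed; drop the memo
        return left, true
    }
    return nil, false // caller falls back to the copying allocator
}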

2. Lists

Two-variant representation, chosen at allocation time and migrating once on type widening.

  • Specialized int storage. *vmListI64{data []int64}. Used when every element pushed so far is an inline int. No tag overhead per element. Memory: 8 bytes/element instead of 8 bytes/Cell + Cell tag handling on every read.
  • Generic Cell storage. *vmList{data []Cell} (current MEP-24 §3 shape). Used when a non-int Cell is ever stored, or when the list is created without a specializable hint.

Transition rules.

  • OpNewList: allocates *vmListI64 by default (capacity hint goes through). Most loops fill with ints.
  • OpListPush against *vmListI64: if value.IsInt(), push to int64 storage. Otherwise migrate: allocate a *vmList, copy each int64 through CInt, push the non-int value, and swap the Objects table entry in place so all existing Cells pointing at this list continue to work. Migration is O(n), happens once, and is never reversed (see the sketch after this list).
  • OpListPush against *vmList: as today (append([]Cell)).
  • OpListGet against *vmListI64: CInt(data[i]). Bounds-checked. Constant-time, no interface dispatch.
  • OpListGet against *vmList: as today.
  • OpListSet against *vmListI64: if value.IsInt(), data[i] = value.Int(). Otherwise migrate as above, then set.
  • OpListLen: O(1) on either variant.
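
A sketch of the push-and-migrate path, with a minimal stand-in for MEP-24's Cell and the Objects table modeled as an []any slot; names are illustrative.

// Minimal stand-in for MEP-24's Cell; only what the sketch needs.
type Cell struct {
    isInt bool
    i     int64
}

func CInt(n int64) Cell    { return Cell{isInt: true, i: n} }
func (c Cell) IsInt() bool { return c.isInt }
func (c Cell) Int() int64  { return c.i }

type vmListI64 struct{ data []int64 }
type vmList struct{ data []Cell }

// listPush is the OpListPush handler body: fast path on the int
// variant, one-way migration on the first non-int value.
func listPush(objects []any, idx int, v Cell) {
    switch l := objects[idx].(type) {
    case *vmListI64:
        if v.IsInt() {
            l.data = append(l.data, v.Int()) // no boxing, no Cell tag
            return
        }
        // Widen every element once, then swap the Objects entry in
        // place so every Cell pointing at idx keeps working.
        g := &vmList{data: make([]Cell, 0, len(l.data)+1)}
        for _, n := range l.data {
            g.data = append(g.data, CInt(n))
        }
        g.data = append(g.data, v)
        objects[idx] = g // never reversed
    case *vmList:
        l.data = append(l.data, v)
    }
}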

Why specialize int lists specifically. The MEP-23 baseline lists/fill_sum is the canonical case: fill N ints, sum N ints. Every read pays one CInt(...) allocation in the current shape; specialized storage pays zero. int64 storage also halves cache footprint vs Cell and lets OpListPush use Go's slice append with no boxing. The list-of-strings and list-of-structs cases will still hit the generic path, which is unchanged. A future MEP can add *vmListF64 and *vmListStr if the corpus shows them in the hot set; the migration machinery is the same.

Why one-way migration only. Bidirectional specialization (e.g., shrinking *vmList back to *vmListI64 when all non-int entries are removed) is a known anti-pattern: it introduces transition flapping in benchmark workloads and complicates the JIT's type guards. Once a list has held a non-int, it is forever generic.

Slice fast path. A future opcode OpListSlice A, B, C, D returns a view *vmListView{src *vmListI64|vmList, off, len int} sharing the backing array. Out of scope for this MEP; reserved here so the GC sees the shape.

3. Maps

Three-variant representation, chosen at allocation time and migrating once on key-type widening.

  • Empty. *vmMapEmpty{}. Sentinel for maps that have never had an entry. OpMapGet returns null without touching memory. Costs one pointer slot.
  • Int-keyed Swiss table. *vmMapI64{keys []int64, vals []Cell, ctrl []byte, len, cap int}. Open-addressed hash table with control bytes (matches Abseil's Swiss table layout). Used when every key inserted so far is an inline int. No mapKeyOf type-switch, no any boxing.
  • Generic. *vmMap{entries map[any]Cell} (current MEP-24 §4 shape). Used when a non-int key is ever inserted, or when the map is created by an emitter that knows keys are mixed.

Swiss table specifics (*vmMapI64). A lookup sketch follows the list.

  • cap is always a power of two; cap = max(8, next_pow2(initial_hint)).
  • ctrl[i] is one of: 0xFF = empty, 0xFE = tombstone, otherwise the low 7 bits of hash(key) (the H2 byte). Reserved bit 0x80 distinguishes empty/tombstone from occupied.
  • Hash function: a single round of splitmix64 over the int key. Strong enough for the corpus, one multiply + one xor per lookup.
  • Probe sequence: linear probe by 16 (one 128-bit ctrl chunk per step). SIMD match via Go's bits.OnesCount is acceptable for portability; arch-specific NEON/SSE2 is a future optimization, out of scope here.
  • Load factor: rehash when len + tombstones > 7/8 * cap. Rehash is a full-table re-insert into freshly allocated 2x-capacity keys/vals/ctrl slices.
  • Deletion sets ctrl to 0xFE (tombstone). Tombstones are reclaimed at rehash.
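
A lookup sketch under this layout. For clarity it probes one ctrl byte at a time, where the spec's hot path matches a 16-byte group per step; the hash shown is one illustrative mixing round in the splitmix64 family, and Cell is the stand-in from the list sketch.

const (
    ctrlEmpty     = 0xFF // never used
    ctrlTombstone = 0xFE // deleted
)

type vmMapI64 struct {
    keys []int64
    vals []Cell
    ctrl []byte
    len  int
    cap  int // always a power of two
}

// hashI64: one multiply plus one xor-shift, in the splitmix64 family.
func hashI64(k int64) uint64 {
    z := uint64(k) * 0xBF58476D1CE4E5B9
    return z ^ (z >> 31)
}

// get probes linearly; H2 (the low 7 bits of the hash) filters slots
// before the full key compare. Tombstones are probed past; a
// never-used slot ends the search.
func (m *vmMapI64) get(k int64) (Cell, bool) {
    h := hashI64(k)
    h2 := byte(h & 0x7F)
    i := int(h>>7) & (m.cap - 1)
    for range m.ctrl {
        switch c := m.ctrl[i]; {
        case c == ctrlEmpty:
            return Cell{}, false
        case c == h2 && m.keys[i] == k:
            return m.vals[i], true
        }
        i = (i + 1) & (m.cap - 1)
    }
    return Cell{}, false
}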

Transition rules (an OpMapGet dispatch sketch follows the list).

  • OpNewMap: allocates *vmMapEmpty.
  • OpMapSet against *vmMapEmpty: if key is an inline int, replace the entry with a freshly allocated *vmMapI64 (cap=8) and insert. If key is anything else, replace with *vmMap and insert. The Objects-table swap-in-place rule applies (same idx, new pointer).
  • OpMapSet against *vmMapI64 with a non-int key: migrate. Allocate a *vmMap, copy each (int64, Cell) entry across as (CInt(k), v), swap in place, then insert the new entry.
  • OpMapGet against *vmMapEmpty: return CNull().
  • OpMapGet against *vmMapI64 with a non-int key: a guaranteed miss, not a migration; the key cannot match any stored int, so return CNull() without probing. (Symmetric: an int key against *vmMap falls through to the generic mapKeyOf lookup, since the generic map may hold an int-keyed entry that was inserted while generic.)
  • OpMapHas: same shape decisions as OpMapGet, returns bool.
  • OpMapDel against *vmMapI64 with an int key: probe, mark tombstone, decrement len. Non-int key against *vmMapI64 is a no-op (cannot exist). Against *vmMap, delete from the underlying Go map.
  • OpMapLen: O(1) on every variant.
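
A dispatch sketch across the three variants, reusing vmMapI64.get from the Swiss sketch; CNull and mapKeyOf are stand-ins for the MEP-24 names, stubbed here.

type vmMapEmpty struct{}
type vmMap struct{ entries map[any]Cell }

// Stand-ins: CNull yields the null Cell; mapKeyOf is MEP-24 §4's key
// normalization, stubbed since the sketch's Cell is comparable.
func CNull() Cell         { return Cell{} }
func mapKeyOf(c Cell) any { return c }

// mapGet is the OpMapGet handler body. A non-int key against the
// int-keyed variant is a guaranteed miss: no probe, no migration.
func mapGet(objects []any, idx int, key Cell) Cell {
    switch m := objects[idx].(type) {
    case *vmMapEmpty:
        return CNull() // no memory touched
    case *vmMapI64:
        if !key.IsInt() {
            return CNull() // cannot match any stored int key
        }
        if v, ok := m.get(key.Int()); ok {
            return v
        }
        return CNull()
    case *vmMap:
        if v, ok := m.entries[mapKeyOf(key)]; ok {
            return v
        }
        return CNull()
    }
    panic("not a map")
}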

Why Swiss tables, not the spec's mapKey{tag,bits,aux} design. The original MEP-24 §4 specced a composite key struct looked up through Go's map. That works, but the per-lookup cost is still one struct-equality check (3 fields), one Go runtime call (the underlying map), and one possible byte compare for string ties. The Swiss table specifically targets the integer-key hot path with no struct equality, no Go map indirection, and a single SIMD-friendly probe step. The Abseil paper and the Go 1.24+ swiss runtime conversion both show 2-3x wins on int-keyed workloads versus chained or Go-stdlib map representations. The non-int path stays on Go's stdlib map, which is itself a Swiss table as of Go 1.24, so there is no harm.

Why migration, not always-generic. The maps/fill_sum benchmark is 4.9x behind Lua because every set/get pays a mapKeyOf type switch + any boxing. Specialized int-keyed storage removes both. Programs whose keys are strings (a common case) still hit the generic path, where the current shape is already adequate.

String-keyed specialization (deferred). A *vmMapStr{keys []*vmString, vals []Cell, hashes []uint64, ctrl []byte} Swiss table for string-keyed maps is the obvious next variant. Out of scope for this MEP; the migration framework here is identical and a follow-on can land it without changing this spec.

4. Sets

Three-variant representation, parallel to maps. Sets are a first-class kind in this MEP (new for vm2).

  • Empty. *vmSetEmpty{}. Sentinel.
  • Int-keyed. *vmSetI64{keys []int64, ctrl []byte, len, cap int}. Same Swiss table layout as *vmMapI64 but without the vals slice. Pure presence test.
  • Generic. *vmSet{members map[any]struct{}}. Wraps Go's set-via-map idiom.

Cell tag. Sets are tagged the same as lists and maps (tagPtr into Objects). Type discrimination is by Objects[idx] shape, never by tag.

Opcode family.

Op        A    B    C    Effect
OpNewSet  dst            regs[A] = newSet() (empty variant)
OpSetLen  dst  src       regs[A] = i64(len)
OpSetHas  dst  src  key  regs[A] = bool(member)
OpSetAdd  src  key       insert, no-op if present
OpSetDel  src  key       delete, no-op if absent

Migration rules mirror *vmMapI64 exactly (int-only sets migrate to generic on first non-int add).

Why sets are separate, not "map with unit values". Two reasons. (1) Memory: a presence-only set saves one Cell per entry (50% of the per-slot footprint on *vmMapI64). For the obvious workloads (deduplication, membership tests in queries) the working set is large and the saving is real. (2) Optimizer signaling: OpSetHas is a pure observation with a bool result; the IR's purity classifier can treat it as DCE-eligible when its result is unused, whereas a Get returning a value-or-null is harder to classify cleanly because the null might still be observed.

No multiset variant. Counting collections are out of scope; users should use a map with int values.

5. Structs

Hidden-class representation. Every struct value takes one of the following shapes (a sketch follows the list):

  • Inline-2. For structs with NumFields == 1 or NumFields == 2, fields live in the vmStructSmall header next to the type ID. No backing slice. Two cache-line-friendly Cells inline. Covers Point{x, y}, Pair{a, b}, Some{value}, the most common shapes in the corpus.
  • Flat-N. For NumFields >= 3, *vmStruct{typeID uint32, fields [N]Cell} where N is the type's NumFields. The fields array is fixed-size, cap == len == NumFields, so the GC and JIT know the slot count statically from the type ID.
  • Class transition (future). A *vmStructTrans{typeID uint32, parent *Class, fields []Cell} shape that supports inline-cache-friendly field-add transitions à la V8 hidden classes. Reserved for a follow-on MEP gated on dynamic-shape evidence in the corpus; Mochi's typed front end means transitions are rare today.
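
A shape sketch under the spec's names, with the MVP []Cell form for flat-N (the fixed-size per-N shapes discussed below would replace it); Cell is again the stand-in from the list sketch.

// Inline-2: one or two fields live in the header, no backing slice,
// no extra Objects indirection for the field cells.
type vmStructSmall struct {
    typeID uint32
    f0, f1 Cell // f1 unused when NumFields == 1
}

// Flat-N (MVP form): fields as a slice; len == cap == NumFields.
type vmStruct struct {
    typeID uint32
    fields []Cell
}

// getField is the OpGetField handler body: a slot-indexed load, no
// name lookup, no hash. Slots are validated against NumFields at
// VM.Run startup, so no range check is repeated here beyond Go's own.
func getField(obj any, slot int) Cell {
    switch s := obj.(type) {
    case *vmStructSmall:
        if slot == 0 {
            return s.f0
        }
        return s.f1
    case *vmStruct:
        return s.fields[slot]
    }
    panic("not a struct")
}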

Type table. Program.Types is a flat slice of *StructType:

type StructType struct {
    Name       string
    FieldNames []string // for debug/print only; slot index is canonical
    NumFields  int
}

Program.Types[typeID] is read once at VM.Run startup to validate that every OpNewStruct, OpGetField, OpSetField slot is in range. After startup the type table is never consulted on the hot path; the slot index from the bytecode is the only thing that matters.

Opcode family.

Op           A    B       C     Effect
OpNewStruct  dst  typeID        allocate, fields all CNull()
OpGetField   dst  src     slot  regs[A] = struct.fields[slot]
OpSetField   src  slot    val   struct.fields[slot] = regs[C]

No reflection. No name-based access at runtime. (A future OpGetFieldByName with a one-shot inline cache lives in a separate MEP.)

Why inline-2 specifically. The corpus profile: ~70% of struct allocations have NumFields <= 2 (pairs, options, results). Embedding both cells in the vmStructSmall header eliminates one allocation per struct, removes an Objects-table indirection on OpGetField, and improves L1 hit rate by storing the value cells in the same 32-byte block as the type ID. The 3+ field path stays heap-allocated since the corpus is split roughly 70/30 between small and wide structs.

Why fixed-size fields array, not slice. A slice header (ptr, len, cap) carries 24 bytes per struct that the GC must scan. A fixed [N]Cell is 8*N bytes with no header. The cost is that the runtime allocator must pick a per-N alloc shape, which Go already handles fine via per-type *vmStruct3, *vmStruct4, etc. The MVP can use []Cell and migrate later; this MEP only requires that the field access semantics are slot-indexed.

6. Interaction with the JIT

MEP-22 (deferred) will generate native code from the IR. The representations specified above were chosen with three JIT-side invariants in mind:

  • Tag-checked dispatch is cheap. tagPtr resolves to a single Go * deref + type assertion in the interpreter, but the JIT can emit a single compare + branch on the Objects-table slot's concrete type. Specialization via shape guards is well-understood; see V8's hidden classes and SpiderMonkey's shape tables.
  • Memoized hashes are stable across representations. Inline / flat / rope strings all hash to the same value. The JIT can hoist a hash-load out of a map lookup loop without worrying about which variant the operand is.
  • Slot indices are stable. Struct field accesses use a compile-time-known slot, so the JIT lowers OpGetField to a single load from fields + slot*8. No inline cache, no shape dispatch.

The JIT is not implemented in this MEP. These notes exist so a future implementer does not have to retrofit representation choices.

Implementation strategy

The five shapes above are spec'd; the strategy for dispatching on them is the subject of three companion MEPs, of which exactly one will be adopted first:

  • MEP-26, Option 1: Monomorphic + One-Way Migration. Each container picks the smallest shape that fits today and migrates O(n) once if the shape stops fitting. Cheap on monomorphic data, single guard per JIT access site, no reversibility. Recommended default for Mochi given its typed front end.
  • MEP-27, Option 2: Polymorphic + Inline Caches. One polymorphic shape per kind with per-call-site IC slots that cache "last seen sub-shape." Reversible. Best when shapes are dynamically polymorphic.
  • MEP-28, Option 3: AOT Codegen Specialization. Generate one specialized opcode handler per concrete (op, shape) tuple at compile time. Zero runtime shape dispatch. Best when compile-time type info is rich and codegen budget is acceptable.

The three are not mutually exclusive long-term, but exactly one is the right first choice. See each MEP for the full decision matrix and the predicted MEP-23 numbers under that strategy.

Rationale

Why one MEP instead of five. The five representations share the Cell substrate, the Objects indirection, the GC, and the future JIT. Specifying them together means one place to look up "what does an int-keyed container look like in vm2", one place to update when a representation changes, and one PR set to land them. Splitting into five MEPs would create the same problem MEP-24 was written to solve (per-feature opcode drift) at the representation layer.

Why split shapes from dispatch. Shape decisions (rope vs flat, Swiss vs Go-map, inline-2 vs flat-N) are stable and observable; they affect memory layout and GC roots. Dispatch decisions (migration vs IC vs codegen) are about runtime mechanics and are easier to swap. Putting them in the same MEP would conflate two debates and make it hard to revisit dispatch later without re-litigating shape.

Why introduce sets now and not in MEP-24. MEP-24 §4 covered maps to land an opcode family and a representation. Sets were notionally a sub-case of maps (unit-valued entries) and could have been deferred. They are promoted to first-class here because the corpus shows several places (query results, deduplication passes) where presence-only storage saves real memory, and because the opcode family is small enough that landing it alongside the map specialization is cheap.

Why Swiss tables, not robin-hood or cuckoo or extendible-hashing. Swiss tables are the dominant choice in modern runtimes (Abseil, Rust hashbrown, Go 1.24+ runtime, CPython 3.13+ in part). The implementation is well-studied, the cache behavior is well-understood, and the SIMD probe step is portable enough.

Why ropes, not buffer-doubling alone. Buffer doubling does not solve the long-string concat problem because each doubling still copies all bytes accumulated so far. Ropes break that quadratic cost. The objection to ropes (slow indexing) is mitigated by the depth cap + forced flatten on indexing-heavy workloads, and by the fact that most string-construction-heavy code in the corpus does not index the result.

Why hidden-class structs are kept simple at MVP. Hidden classes shine when field-add transitions are dynamic and frequent (Python, JavaScript). Mochi's typed front end produces structs whose field set is fixed at compile time. The full hidden-class machinery (transition trees, inline caches, deopt) is overkill for that. The inline-2 + flat-N split captures the cache win without paying for the transition machinery.

Backwards Compatibility

This MEP changes only runtime in-memory representations. The bytecode is unchanged. The opcode set adds OpNewSet, OpSetLen, OpSetHas, OpSetAdd, OpSetDel (new) and OpNewStruct, OpGetField, OpSetField (new, already specced in MEP-24 §5). All other opcodes' semantics are preserved.

Program serialization (when it lands per MEP-21 v2) gains a Types []StructType field but otherwise is byte-identical.

Code outside runtime/vm2 does not depend on container internals; every consumer goes through vm.Objects[idx] via the typed opcodes, so no caller-side changes are required.

Reference Implementation

The implementation will land in stages, each as its own PR with the same auto-merge + tracking-issue pattern that landed MEP-24 §2-§4. Order:

  1. Lists int-specialization, runtime/vm2/lists.go gains *vmListI64 + migration. Expected MEP-23 delta: ~2-3x on lists/fill_sum N=100 (currently 1.56x Lua → target 0.6x Lua).
  2. Maps Swiss table for int keys, runtime/vm2/maps.go gains *vmMapEmpty + *vmMapI64. Expected MEP-23 delta: ~3-4x on maps/fill_sum N=100 (currently 4.86x Lua → target 1.2x Lua).
  3. Sets, new runtime/vm2/sets.go, opcodes, IR + emit + opt + corpus benchmark + cross-language siblings.
  4. Structs, runtime/vm2/structs.go per §5, MEP-24 §5 opcode wiring, Program.Types table.
  5. String ropes, runtime/vm2/strings.go gains *vmStringRope + concat decision tree update. Expected MEP-23 delta: parity on N=10 (already 1.07x Lua) and ~2x on a yet-to-add larger concat benchmark.

Each stage carries its own MEP-23 cross-language refresh. The target, "≤ Lua on every cross-language row", is the success criterion for this MEP.

Whichever dispatch strategy is chosen (MEP-26 / 27 / 28) gates how these stages are implemented; the what is fixed here.

Security Considerations

Hash flooding (a.k.a. algorithmic complexity attacks) on int-keyed maps is not a concern: splitmix64 is bijective on 64-bit ints, so no two distinct keys share a full 64-bit hash, and forcing bucket-level collisions requires control over the input key domain. For string-keyed maps (current shape, unchanged here), Go's stdlib map already includes hash seed randomization.

Open Questions

  • flatConcatThreshold and maxRopeDepth calibration. The proposed defaults (64 bytes, depth 32) are educated guesses. A micro-benchmark sweep should pick the actual numbers before the rope PR lands.
  • vmStructSmall size cutoff. Two fields is the proposal; three might be justified if the corpus profile changes. To be revisited when the structs PR lands and the corpus has struct programs.
  • Set ordering. Mochi's surface semantics on set iteration order are not yet pinned down. This MEP leaves the order unspecified for now; a follow-on language MEP can require insertion-order iteration if needed, which would force *vmSet to carry an entry list.
  • Slice / view types. Reserved tag mentioned in §2 not implemented. Decision deferred to whichever query-rewrite MEP needs it first.
  • Dispatch strategy. See MEP-26/27/28. Defer until enough corpus data exists to pick.