MEP 24. VM2 Subsystems

MEP: 24
Title: VM2 Subsystem Design
Author: Mochi core
Status: Draft
Type: Standards Track
Created: 2026-05-16

Abstract

MEP 21 defined the typed register VM and its 8-byte NaN-boxed Cell. The follow-on rounds (MEPs 18-20 and the MEP 23 perf rounds 1-3) drove the integer-only core of vm2 to within ~1.07x Lua on iterative loops and ~1.55-1.70x on recursive calls. That work covered exactly one type family (signed 48-bit integers) and one control-flow shape (direct calls + tail calls). Every other Mochi value family - strings, lists, maps, structs, closures, results - is still either a legacy-VM-only feature or a placeholder slot in the Objects table.

This MEP is the consolidated subsystem plan that takes vm2 from "integer core" to "feature parity with the legacy VM" without falling back into ad-hoc opcode growth. It specifies:

  • the value-representation rules every subsystem must obey,
  • per-subsystem representations for strings, lists, maps, structs, closures,
  • a garbage collection strategy compatible with the contiguous register stack,
  • explicit handler-stack opcodes for Result/error propagation,
  • the dispatch and compilation pipeline gaps that block the rest of the corpus,
  • a porting plan with hard correctness + benchmark gates per subsystem.

The goal is that subsequent PRs each implement one subsystem against an agreed contract, instead of each inventing its own design.

Motivation

The legacy VM grew organically. Strings, lists, and maps each carry their own opcode family, their own allocation path, and their own implicit GC behavior (none - everything is a Go-managed interface{}). When vm2 was scaffolded under MEP 21 we deliberately kept it integer-only so the dispatch core could be tuned in isolation. That phase is done. The next move is not "add ten more opcodes per type"; it is to write down the shared invariants once so every subsystem PR is a pure implementation exercise, not a design exercise.

Three concrete forcing functions:

  • Cell tag space is shared. Cell reserves a finite set of NaN payload tags (int, bool, nil, ptr, ...). Strings, lists, maps, structs, and closures all want a tag. Allocating them ad-hoc creates a tag collision risk and forces every consumer (verifier, GC, debugger, JIT in MEP 22) to be rewritten each time.
  • GC interacts with the register stack. Round 3 of MEP 23 replaced the sync.Pool frame model with a contiguous Stack []Cell. The pop path explicitly does not zero Cell slots because vm2 has no ptr-tagged cells yet. As soon as the first heap subsystem lands, that assumption breaks. The pop path, the tail-call rewrite path, and the Objects table all need a single owner.
  • Error handling has no design yet. The legacy VM raises Go panics for runtime errors and unwinds via deferred recover. vm2 currently returns ret, errors.New(...) from the dispatch loop, which only works because nothing above main wants to catch. As soon as we port try/Result<T,E>/? we need handler frames - and the cheapest design is one we lock in before we have a hundred opcodes referencing the trap path.

Non-goals

  • No new optimizer passes in this MEP. Const-fold, DCE, tail-call rewrite, regalloc are already in compiler2/opt and compiler2/regalloc. Their evolution belongs to MEP 18 / MEP 19 follow-ups.
  • No JIT. That is MEP 22, deferred. This MEP is the interpreter substrate the JIT will eventually consume.
  • No new language features. This MEP ports existing Mochi semantics onto vm2; the source language is unchanged.

Specification

1. Value representation

Cell is unchanged from MEP 21: 8 bytes, NaN-boxed, with tags in the high 16 bits when the IEEE-754 NaN bit pattern is set. The following tag assignments are normative for vm2:

Tag        Payload      Owner
tagInt     i48          int / bool (boxed when wider)
tagBool    u1           bool
tagNil     -            nil
tagF64     full f64     non-NaN floats inline; NaN payloads go through tagObj
tagStr     u32 obj idx  string subsystem (§2)
tagList    u32 obj idx  list subsystem (§3)
tagMap     u32 obj idx  map subsystem (§4)
tagStruct  u32 obj idx  struct subsystem (§5)
tagFn      u32 fn idx   direct function ref
tagClos    u32 obj idx  closure subsystem (§6)
tagErr     u32 obj idx  error value (§8)

All tag* indices into the Objects table are uint32. That caps live boxed-object count per Run() at ~4 billion - effectively unbounded for the workloads in scope.

Rules every subsystem must obey:

  1. Every heap-allocated value goes through vm.AddObject(o) and is referenced from a Cell by its Objects index. No subsystem may store a Go pointer directly inside a Cell.
  2. The Objects table is owned by the running *VM and is reset on each Run(). Subsystems must not stash indices across runs.
  3. A Cell whose tag is one of tagStr | tagList | tagMap | tagStruct | tagClos | tagErr is a ref cell. Ref cells are roots from the register stack's point of view (§7).
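A minimal NaN-boxing sketch of the rules above, for orientation only: the bit positions, tag numbers, and helper names here are assumptions, and the normative layout is whatever runtime/vm2's Cell actually does.

```go
package main

import "fmt"

// Illustrative Cell layout: the top 16 bits carry a quiet-NaN marker, bits
// 32..47 carry the tag, and the low 32 bits carry the uint32 Objects index.
// Tag values here are made up; the real vm2 assignments are normative.
type Cell uint64

const (
	nanMark        = uint64(0x7FF8) << 48
	tagStr  uint16 = 5
	tagList uint16 = 6
)

// makeRef boxes an Objects-table index under the given tag. Per rule 1,
// the Cell holds an index, never a Go pointer.
func makeRef(tag uint16, idx uint32) Cell {
	return Cell(nanMark | uint64(tag)<<32 | uint64(idx))
}

func (c Cell) tag() uint16      { return uint16(uint64(c) >> 32) }
func (c Cell) objIndex() uint32 { return uint32(c) }

func main() {
	c := makeRef(tagStr, 42)
	fmt.Println(c.tag() == tagStr, c.objIndex())
}
```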

2. Strings

Representation. A string value is {tag: tagStr, payload: idx} where Objects[idx] is a *vmString:

type vmString struct {
	bytes []byte // immutable, UTF-8
	hash  uint64 // memoized; 0 means "not computed yet"
}

Why a header struct. A bare string would work but blocks two optimizations: (a) cached hash for map keys, and (b) future small-string-optimization (storing up to 6 bytes inline in Cell via a tagSStr tag) without re-piping every consumer.
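A minimal sketch of the memoization contract. Only the "0 means not computed yet" rule comes from the struct above; the FNV-1a choice and the 0-to-1 nudge for the rare real hash of 0 are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// vmString mirrors the §2 struct; Hash is an illustrative accessor, not
// the real vm2 API.
type vmString struct {
	bytes []byte
	hash  uint64
}

// Hash computes the hash on first use and memoizes it. 0 is reserved as
// the "unset" sentinel, so a genuine hash of 0 is nudged to 1 (assumed
// convention, not specified by the MEP).
func (s *vmString) Hash() uint64 {
	if s.hash == 0 {
		h := fnv.New64a()
		h.Write(s.bytes)
		s.hash = h.Sum64()
		if s.hash == 0 {
			s.hash = 1
		}
	}
	return s.hash
}

func main() {
	s := &vmString{bytes: []byte("hello")}
	fmt.Println(s.Hash() == s.Hash(), s.Hash() != 0)
}
```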

Opcode family.

Op           A    B      C    Effect
OpLoadStrK   dst  k idx  -    regs[A] = consts[B] (string)
OpConcatStr  dst  lhs    rhs  regs[A] = lhs ++ rhs
OpLenStr     dst  src    -    regs[A] = byte length
OpIndexStr   dst  src    idx  regs[A] = single-byte string
OpEqualStr   dst  lhs    rhs  byte compare
OpHashStr    dst  src    -    fills + returns memoized hash

All strings are immutable. OpConcatStr always allocates a fresh *vmString. The string constant pool is per-Function, sharing the same indexing as Function.Consts but reading the bytes from a parallel StrConsts [][]byte slice in Function.

Correctness gate. Port tests/vm/legacy/strings_*.mochi (concat, length, index, equality, hash determinism) to a vm2 fixture under runtime/vm2/testdata/strings/ and require byte-identical output.

Benchmark gate. Add bench/template/strings/concat_loop.{mochi,py,lua} and eq_table.{mochi,py,lua} to the MEP 23 sweep. Target: within 2x Lua on concat, within 1.5x Lua on equality.

3. Lists

Representation. Objects[idx] holds *vmList:

type vmList struct {
	data []Cell
}

A list is a growable register-cell buffer. No type specialization at MVP (every element is a Cell; ints pay the tag overhead). Specialization ([]int64 storage for homogeneous lists) is a follow-on MEP, gated on profile evidence.

Opcode family.

Op          A    B    C    Effect
OpNewList   dst  cap  -    allocate empty vmList with capacity cap
OpListLen   dst  src  -    regs[A] = len(list)
OpListGet   dst  src  idx  bounds-checked load
OpListSet   src  idx  val  bounds-checked store (no dst; mutation)
OpListPush  src  val  -    append; amortized O(1)

GC interaction. A vmList's data holds ref cells. The GC (§7) must scan them. The pop-frame slot-zero rule (§7) applies because a register holding a tagList cell pins the list.
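The list MVP can be sketched in a few lines. Push rides on Go's append for the amortized O(1) growth; Get mirrors OpListGet's bounds check, returning ok=false where the real opcode would raise. Method names are illustrative, not the vm2 API.

```go
package main

import "fmt"

type Cell uint64

// vmList mirrors the §3 struct: a growable buffer of register cells, with
// no element-type specialization at MVP.
type vmList struct {
	data []Cell
}

func (l *vmList) Push(v Cell) { l.data = append(l.data, v) }
func (l *vmList) Len() int    { return len(l.data) }

// Get is a bounds-checked load; a real OpListGet would trap instead of
// returning false.
func (l *vmList) Get(i int64) (Cell, bool) {
	if i < 0 || i >= int64(len(l.data)) {
		return 0, false
	}
	return l.data[i], true
}

func main() {
	l := &vmList{data: make([]Cell, 0, 2)} // OpNewList with cap 2
	for i := 0; i < 5; i++ {
		l.Push(Cell(i))
	}
	v, ok := l.Get(4)
	_, oob := l.Get(5)
	fmt.Println(l.Len(), v, ok, oob)
}
```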

Correctness gate. Reuse legacy tests/vm/list_*.mochi.

4. Maps

Representation. Objects[idx] holds *vmMap:

type vmMap struct {
	entries map[mapKey]Cell
}

type mapKey struct {
	tag  uint16
	bits uint64 // string → vmString.hash, int → i48, bool → 0/1
	aux  uint32 // for collisions on tagStr: vmString idx (compared by bytes)
}

Go's built-in map is the MVP. Replacement with an open-addressed table is a follow-on MEP. Reasons to ship Go-map first: the legacy VM does the same, behavior parity is trivial, and there is no profile evidence that maps are hot in the corpus.

Opcode family: OpNewMap, OpMapGet, OpMapSet, OpMapDel, OpMapLen, OpMapHas. Same shape as the list family.

Key equality. Keys hash by tag+payload. tagStr keys collide if hashes match; tiebreak by byte compare (the aux field carries the vmString idx of the canonical key for the entry).
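The scalar-key half of this scheme can be sketched as below: keys fold to tag+payload bits, so an int and a bool whose bits coincide never collide, because the tag participates in Go-map equality. Tag numbers are illustrative, and the tagStr tiebreak via aux is elided since it needs the Objects table.

```go
package main

import "fmt"

// mapKey mirrors the §4 struct. Only scalar keys are shown; string keys
// additionally carry a hash in bits and a canonical vmString idx in aux.
type mapKey struct {
	tag  uint16
	bits uint64
	aux  uint32
}

const (
	tagInt  uint16 = 1 // illustrative tag numbers only
	tagBool uint16 = 2
)

func intKey(v int64) mapKey { return mapKey{tag: tagInt, bits: uint64(v)} }

func boolKey(b bool) mapKey {
	k := mapKey{tag: tagBool}
	if b {
		k.bits = 1
	}
	return k
}

func main() {
	m := map[mapKey]string{}
	m[intKey(1)] = "int one"
	// Same payload bits (1), different tag: a distinct entry, not a collision.
	m[boolKey(true)] = "bool true"
	fmt.Println(len(m), m[intKey(1)])
}
```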

5. Structs

Representation. Objects[idx] holds *vmStruct:

type vmStruct struct {
	fields []Cell // length == Type.NumFields, indexed by slot
	typeID uint32 // index into Program.Types
}

Field slots are assigned at compile time by compiler2/ir (already partially in place via the typed-IR work). The runtime never does name → slot lookup; that is a compile-time invariant.
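A sketch of what that compile-time invariant means in practice: the compiler resolves names to slots once, so the emitted OpGetField carries only an integer. StructType follows the §5 type-table shape; resolveSlot is a hypothetical compiler helper, not a vm2 API.

```go
package main

import "fmt"

// StructType mirrors the §5 Program.Types entry.
type StructType struct {
	Name       string
	FieldNames []string
	NumFields  int
}

// resolveSlot maps a field name to its slot at compile time. The runtime
// never runs this; it only sees the resulting integer slot.
func resolveSlot(t *StructType, field string) (int, bool) {
	for i, n := range t.FieldNames {
		if n == field {
			return i, true
		}
	}
	return 0, false
}

func main() {
	point := &StructType{Name: "Point", FieldNames: []string{"x", "y"}, NumFields: 2}
	slot, ok := resolveSlot(point, "y") // would emit: OpGetField dst src 1
	fmt.Println(slot, ok)
}
```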

Opcode family.

Op           A    B        C     Effect
OpNewStruct  dst  type id  -     allocate, fields all nil
OpGetField   dst  src      slot  regs[A] = struct.fields[slot]
OpSetField   src  slot     val   struct.fields[slot] = val

No reflection. No name-based field access at runtime. (If/when added, it lives behind a separate OpGetFieldByName that pays a hash lookup; the typed path stays cheap.)

Type table. Program.Types is parallel to Program.Funcs: a flat slice of *StructType{Name, FieldNames []string, NumFields int}. The verifier rejects OpGetField with slot >= NumFields.

6. Closures

Representation. Objects[idx] holds *vmClosure:

type vmClosure struct {
	Fn       *Function
	Upvalues []Cell
}

Upvalues are by-value at MVP. Reasoning: every Mochi closure in the test corpus captures by value (immutable bindings dominate). By-reference upvalues require an Upval indirection cell and a more complex emitter; defer until a corpus program needs it.
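The by-value rule can be sketched as follows: OpMakeClos copies a contiguous register window into the closure, so later writes to those registers do not affect the captured values. The handler shape here is illustrative.

```go
package main

import "fmt"

type Cell uint64

type Function struct{ Name string }

// vmClosure mirrors the §6 struct.
type vmClosure struct {
	Fn       *Function
	Upvalues []Cell
}

// makeClos sketches the OpMakeClos handler: snapshot regs[upv..upv+n]
// by value into the closure.
func makeClos(fn *Function, regs []Cell, upv, n int) *vmClosure {
	ups := make([]Cell, n)
	copy(ups, regs[upv:upv+n])
	return &vmClosure{Fn: fn, Upvalues: ups}
}

func main() {
	regs := []Cell{10, 20, 30}
	c := makeClos(&Function{Name: "f"}, regs, 1, 2)
	regs[1] = 99 // mutating the register after capture...
	fmt.Println(c.Upvalues[0], c.Upvalues[1]) // ...does not change the snapshot
}
```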

Opcode family.

Op          A    B    C         D  Effect
OpMakeClos  dst  fn   upv src   n  regs[A] = closure(fn, copy(regs[upv..upv+n]))
OpCallClos  dst  cls  args src  n  call closure value; reuses OpCall frame path
OpGetUpval  dst  idx  -         -  regs[A] = currentClos.Upvalues[idx]

OpCall (direct call) stays for known-target calls; OpCallClos is for the value-call case. Both go through the same pushFrame path defined in MEP 23 Round 3.

7. Garbage collection

Strategy. Tracing mark-sweep over the Objects table, with a freelist for index reuse. This is not pure Go-GC delegation: we keep Objects []any so the Go runtime traces and frees the boxed objects for us automatically, but we run our own mark-sweep and maintain our own freelist to reclaim table slots (and thus tag indices) inside a Run().

Why not reference counting. Cycles are reachable (list of structs that points back), and Mochi programs are short-lived enough that the throughput cost of RC's per-write barrier outweighs the latency win.

When GC runs. Triggered when len(Objects) - freelist > threshold after an allocating opcode. Threshold starts at 64K, doubles after each cycle that frees less than 25% of objects. No incremental phase at MVP.

Roots. Three sets:

  1. Register stack. Every Cell in vm.Stack[0:len(vm.Stack)] whose tag is a ref tag is a root. This is why the round-3 pop-frame path must change the moment §3 lands: popFrame must zero ref-tagged slots in the popped window before shrinking, otherwise a freed slot index could be wrongly retained as live. The change is: walk Stack[base:oldEnd] and overwrite every ref-tagged cell with the nil cell; the Objects slot itself is reclaimed on the next GC cycle, once no remaining root reaches it.
  2. Frame chain. Each frame holds an *Function, which is statically allocated (Program.Funcs[i]) and not tracked by GC.
  3. In-flight call args. OpCall's argBuf and OpTailCall's args snapshots live on the Go stack across one dispatch step; they are visible to Go GC, not vm2 GC. vm2 GC cannot run mid-instruction (it only runs at instruction boundaries), so these are safe.

Tail-call interaction. OpTailCall shrinks vm.Stack to base then regrows. The shrink window may contain ref cells from the caller; same zero-out rule as popFrame applies. This is the second site that breaks the round-3 "no zero-out" optimization. The cost is ~NumRegs * 8 bytes of writes per tail call; the compiler should mark frames as leaf-typed (no ref cells) and skip the zero pass when statically provable. That marking is a compile-time analysis in compiler2/ir, gated by §11 milestones.
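The zero-out rule both sites share can be sketched as below. The Cell layout and ref test are illustrative; the leafTyped flag stands in for the compile-time "frame holds no ref cells" marking described above.

```go
package main

import "fmt"

type Cell uint64

const nanMark = uint64(0x7FF8) << 48 // illustrative ref-tag marker

func isRef(c Cell) bool { return uint64(c)&nanMark == nanMark }

// popFrame sketches the §7 rule: before shrinking the stack, overwrite
// ref-tagged cells in the popped window so a later GC cannot mistake a
// stale index for a live root. Leaf-typed frames skip the walk entirely.
func popFrame(stack []Cell, base int, leafTyped bool) []Cell {
	if !leafTyped {
		for i := base; i < len(stack); i++ {
			if isRef(stack[i]) {
				stack[i] = 0 // clear the stale ref; scalar cells may stay
			}
		}
	}
	return stack[:base]
}

func main() {
	ref := Cell(nanMark | 7) // ref cell pointing at Objects[7]
	stack := []Cell{1, 2, ref, 3}
	stack = popFrame(stack, 1, false)
	// The popped window was cleared of refs; the backing array still holds it.
	fmt.Println(len(stack), stack[:cap(stack)][2])
}
```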

Verifier obligations. The verifier must reject any program that reads a ref cell from a register the function did not write since entry (uninitialized-ref-cell read). Combined with the zero-out rule, this guarantees no dangling indices.
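Putting the pieces of this section together, a minimal collector sketch: scan the register stack for ref-tagged cells, mark their Objects slots, sweep the rest onto the freelist. The tag layout is illustrative, and a real cycle would also trace into vmList/vmMap/vmStruct/vmClosure payloads; this sketch scans only stack roots.

```go
package main

import "fmt"

type Cell uint64

const nanMark = uint64(0x7FF8) << 48 // illustrative ref-tag marker

func isRef(c Cell) bool       { return uint64(c)&nanMark == nanMark }
func objIdx(c Cell) uint32    { return uint32(c) }
func makeRef(idx uint32) Cell { return Cell(nanMark | uint64(idx)) }

type VM struct {
	Stack    []Cell
	Objects  []any
	freelist []uint32
}

// gc is a non-incremental mark-sweep over the Objects table. Nilling a
// slot lets the Go runtime free the boxed object; pushing the index onto
// the freelist lets AddObject reuse the slot.
func (vm *VM) gc() {
	marked := make([]bool, len(vm.Objects))
	for _, c := range vm.Stack { // root set 1: the register stack
		if isRef(c) {
			marked[objIdx(c)] = true
		}
	}
	for i, m := range marked { // sweep unmarked slots onto the freelist
		if !m && vm.Objects[i] != nil {
			vm.Objects[i] = nil
			vm.freelist = append(vm.freelist, uint32(i))
		}
	}
}

func main() {
	vm := &VM{
		Objects: []any{"live", "dead", "live2"},
		Stack:   []Cell{makeRef(0), 7, makeRef(2)}, // 7 is a plain scalar cell
	}
	vm.gc()
	fmt.Println(vm.Objects[1] == nil, vm.freelist)
}
```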

8. Errors and Result

Model. Mochi has Result<T, E> (from MEP 8 / P8.x) and bare runtime errors (div by zero, OOB, OOM). Both fold onto one mechanism: an error cell (tagErr) plus an explicit handler stack.

Handler stack. A parallel slice on the VM:

type handler struct {
	FrameDepth int   // len(vm.Frames) at OpPushHandler time
	PC         int   // resume IP in that frame
	DstReg     int32 // register to receive the err cell
}

vm.Handlers []handler

Opcodes.

Op             A    B    C   Effect
OpPushHandler  dst  pc   -   push handler{FrameDepth, pc, dst}
OpPopHandler   -    -    -   drop top of Handlers
OpRaise        src  -    -   unwind frames until Handlers top, then jump to its pc
OpOk           dst  val  -   wrap val as Result.Ok(val)
OpErr          dst  val  -   wrap val as Result.Err(val) (tagged tagErr)
OpTry          dst  src  pc  if src is err, OpRaise(src); else regs[dst] = src.Ok

This matches MEP 21's intent (Result is a value, not a control-flow primitive at the source level) but gives the runtime cheap unwinding for ? and for trap-style errors. The dispatch loop's current return ret, errors.New(...) paths (div-by-zero, mod-by-zero, OpHalt, unknown op) are rewritten to OpRaise-equivalent in-loop transfer; only the case where Handlers is empty propagates a Go error.

Why a separate handler stack (rather than reusing the frame chain). It is finer-grained than a frame: a single function can have multiple try-blocks. Putting handlers on a parallel stack means the frame stack stays a uniform array of pure activation records.
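The unwinding half of OpRaise can be sketched as below. The handler shape follows the spec; the raise helper and frame struct are illustrative, and the real opcode would also write the err cell into DstReg before resuming.

```go
package main

import "fmt"

// handler mirrors the §8 struct.
type handler struct {
	FrameDepth int
	PC         int
	DstReg     int32
}

type frame struct{ retPC int }

type VM struct {
	Frames   []frame
	Handlers []handler
}

// raise unwinds to the innermost handler and reports where execution
// resumes. ok=false is the empty-Handlers case, where Run() falls back to
// propagating a Go error.
func (vm *VM) raise() (resumePC int, dst int32, ok bool) {
	if len(vm.Handlers) == 0 {
		return 0, 0, false
	}
	h := vm.Handlers[len(vm.Handlers)-1]
	vm.Handlers = vm.Handlers[:len(vm.Handlers)-1]
	vm.Frames = vm.Frames[:h.FrameDepth] // pop frames above the handler
	return h.PC, h.DstReg, true
}

func main() {
	vm := &VM{
		Frames:   []frame{{}, {}, {}}, // three activation records deep
		Handlers: []handler{{FrameDepth: 1, PC: 42, DstReg: 3}},
	}
	pc, dst, ok := vm.raise()
	fmt.Println(pc, dst, ok, len(vm.Frames))
}
```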

9. Dispatch evolution

Round 3 of MEP 23 closed most of the dispatch gap. What this MEP locks in:

  • Stay on Go's switch. Computed goto / tail-call dispatch was investigated under MEP 18 and is not portable; the modern Go compiler emits a jump table for dense switch Op (it has since Go 1.21). The interpreter loop in runtime/vm2/eval.go already takes that path.
  • Superinstruction policy. Fuse a bigram only when (a) it appears in the static instruction frequency for at least three corpus programs and (b) the unfused pair costs more than ~3% of the program's runtime in profiles. Fusing speculatively is forbidden; every fused op carries a perf-justification commit message ref in its ops.go doc comment. This is the same standard the existing OpAddI64K, OpJumpIfLessI64, OpTailCallSelf met.
  • No threaded code, no copy-and-patch. Those live in MEP 22, deferred.

10. Compilation pipeline gaps

The work above creates new lowering obligations on compiler2. The pipeline today is parse → typed IR → opt (constfold/DCE/tailcall) → regalloc → emit. Subsystem MVPs need:

Pass                Subsystem dep  Owner      MVP scope
string lowering     §2             emit       string lit → OpLoadStrK; ++ → OpConcatStr
list/map literals   §3, §4         ir + emit  [1,2,3] → OpNewList + OpListPush*N
field slot assign   §5             ir         struct decl → typeID; field name → slot
upvalue capture     §6             ir         free-var analysis → OpMakeClos upv list
try/? lowering      §8             ir + emit  try block → push/pop; ? → OpTry
ref-cell zero hint  §7             ir         mark functions that hold no ref cells

Each row maps to one PR. Each PR ships its opcode family, its lowering pass, the legacy-port correctness fixtures, and a new entry in the MEP 23 sweep.
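The list-literal row, for example, is a straight-line expansion; a sketch (instruction shape and emitter helper are illustrative mnemonics, not the compiler2 API):

```go
package main

import "fmt"

// Instr is a toy instruction record; real compiler2 emission targets the
// packed vm2 encoding.
type Instr struct {
	Op      string
	A, B, C int
}

// lowerListLit expands [e1, .., eN] into OpNewList (preallocating cap N)
// followed by one OpListPush per element register.
func lowerListLit(dst int, elemRegs []int) []Instr {
	out := []Instr{{Op: "OpNewList", A: dst, B: len(elemRegs)}}
	for _, r := range elemRegs {
		out = append(out, Instr{Op: "OpListPush", A: dst, B: r})
	}
	return out
}

func main() {
	for _, in := range lowerListLit(0, []int{1, 2, 3}) {
		fmt.Println(in.Op, in.A, in.B)
	}
}
```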

11. Porting plan and gates

Subsystems land in this order, gated:

  1. Strings (§2). Legacy fixtures must pass; MEP 23 strings programs added.
  2. Lists (§3). Legacy list fixtures pass; list_sum, list_map benchmarks added.
  3. Structs (§5). Legacy struct fixtures pass. No dedicated bench (structs are accessed inside list/map work).
  4. Maps (§4). Legacy map fixtures pass; map_count benchmark added.
  5. Closures (§6). Legacy closure fixtures pass; closure_counter benchmark added.
  6. GC (§7). Implemented behind a GOMOCHI_VM2_GC=1 flag; turned on by default once the corpus runs clean for 24h on the nightly fuzzer.
  7. Result / try (§8). Legacy result_*.mochi fixtures pass; try_chain benchmark added.

Each PR must:

  • include a runtime/vm2/testdata/<subsystem>/ fixture set ported from the legacy VM, plus a vm2_<subsystem>_test.go that diffs output against the legacy VM,
  • add at least one entry to bench/template/<subsystem>/ covering the new family in Mochi + Python + Lua,
  • update the MEP 23 result table in the same PR,
  • not regress any prior subsystem benchmark by more than 5% on the MEP 17 gate.

The last bullet is the critical one. Subsystem PRs that move integer-core numbers backward (e.g., by widening Cell checks in the hot path) get reverted.

12. Backward compatibility

vm2 has no users outside the project. The legacy VM stays the default until §1-§7 land; once they do, the default flips to vm2 and the legacy VM is removed in a subsequent PR. No source-language change.

Open questions

  • Inline strings. Should we add tagSStr (up to 6 inline bytes) on day one or wait for profile evidence? Leaning wait, but the consumer-side changes are small enough that adding it later is cheap if §2's interface is respected.
  • List specialization. Same question for []int64-backed homogeneous lists. Same answer: wait for profiles.
  • Generational GC. The MVP collector is non-generational. A nursery would help long-running programs but is irrelevant to the current corpus. Defer.
  • Cross-Run Objects pinning. If embedding hosts ever want to hold a Mochi value across Run() calls, we need a pin API. Out of scope until there is a host.

References

  • MEP 21: Compiler2 / VM2 co-design (Cell, typed register VM).
  • MEP 23: Cross-language baseline (the benchmark gate this MEP feeds into).
  • MEP 22: Copy-and-patch JIT (deferred; this MEP is the substrate it will consume).
  • MEP 17: Per-PR perf gate.
  • MEP 8: Result<T,E> source-level semantics.