MEP 24. VM2 Subsystems
| Field | Value |
|---|---|
| MEP | 24 |
| Title | VM2 Subsystem Design |
| Author | Mochi core |
| Status | Draft |
| Type | Standards Track |
| Created | 2026-05-16 |
Abstract
MEP 21 defined the typed register VM and its 8-byte NaN-boxed Cell. The follow-on rounds (MEPs 18-20 and the MEP 23 perf rounds 1-3) drove the integer-only core of vm2 to within ~1.07x Lua on iterative loops and ~1.55-1.70x on recursive calls. That work covered exactly one type family (signed 48-bit integers) and one control-flow shape (direct calls + tail calls). Every other Mochi value family - strings, lists, maps, structs, closures, results - is still either a legacy-VM-only feature or a placeholder slot in the Objects table.
This MEP is the consolidated subsystem plan that takes vm2 from "integer core" to "feature parity with the legacy VM" without falling back into ad-hoc opcode growth. It specifies:
- the value-representation rules every subsystem must obey,
- per-subsystem representations for strings, lists, maps, structs, closures,
- a garbage collection strategy compatible with the contiguous register stack,
- explicit handler-stack opcodes for Result/error propagation,
- the dispatch and compilation pipeline gaps that block the rest of the corpus,
- a porting plan with hard correctness + benchmark gates per subsystem.
The goal is that subsequent PRs each implement one subsystem against an agreed contract, instead of each inventing its own design.
Motivation
The legacy VM grew organically. Strings, lists, and maps each carry their own opcode family, their own allocation path, and their own implicit GC behavior (none - everything is a Go-managed interface{}). When vm2 was scaffolded under MEP 21 we deliberately kept it integer-only so the dispatch core could be tuned in isolation. That phase is done. The next move is not "add ten more opcodes per type"; it is to write down the shared invariants once so every subsystem PR is a pure implementation exercise, not a design exercise.
Three concrete forcing functions:
- Cell tag space is shared. `Cell` reserves a finite set of NaN payload tags (int, bool, nil, ptr, ...). Strings, lists, maps, structs, and closures all want a tag. Allocating them ad hoc creates a tag-collision risk and forces every consumer (verifier, GC, debugger, JIT in MEP 22) to be rewritten each time.
- GC interacts with the register stack. Round 3 of MEP 23 replaced the `sync.Pool` frame model with a contiguous `Stack []Cell`. The pop path explicitly does not zero `Cell` slots because vm2 has no ptr-tagged cells yet. As soon as the first heap subsystem lands, that assumption breaks. The pop path, the tail-call rewrite path, and the `Objects` table all need a single owner.
- Error handling has no design yet. The legacy VM raises Go panics for runtime errors and unwinds via deferred `recover`. vm2 currently does `return ret, errors.New(...)` from the dispatch loop, which only works because nothing above `main` wants to catch. As soon as we port `try` / `Result<T,E>` / `?` we need handler frames, and the cheapest design is one we lock in before we have a hundred opcodes referencing the trap path.
Non-goals
- No new optimizer passes in this MEP. Const-fold, DCE, tail-call rewrite, and regalloc already live in `compiler2/opt` and `compiler2/regalloc`. Their evolution belongs to MEP 18 / MEP 19 follow-ups.
- No JIT. That is MEP 22, deferred. This MEP is the interpreter substrate the JIT will eventually consume.
- No new language features. This MEP ports existing Mochi semantics onto vm2; the source language is unchanged.
Specification
1. Value representation
Cell is unchanged from MEP 21: 8 bytes, NaN-boxed, with tags in the high 16 bits when the IEEE-754 NaN bit pattern is set. The following tag assignments are normative for vm2:
| Tag | Payload | Owner |
|---|---|---|
| tagInt | i48 | int / bool (boxed when wider) |
| tagBool | u1 | bool |
| tagNil | - | nil |
| tagF64 | full f64 | non-NaN floats inline; NaN payloads go through tagObj |
| tagStr | u32 obj idx | string subsystem (§2) |
| tagList | u32 obj idx | list subsystem (§3) |
| tagMap | u32 obj idx | map subsystem (§4) |
| tagStruct | u32 obj idx | struct subsystem (§5) |
| tagFn | u32 fn idx | direct function ref |
| tagClos | u32 obj idx | closure subsystem (§6) |
| tagErr | u32 obj idx | error value (§8) |
All tag* indices into the Objects table are uint32. That caps live boxed-object count per Run() at ~4 billion - effectively unbounded for the workloads in scope.
Rules every subsystem must obey:
- Every heap-allocated value goes through `vm.AddObject(o)` and is referenced from a `Cell` by its `Objects` index. No subsystem may store a Go pointer directly inside a `Cell`.
- The `Objects` table is owned by the running `*VM` and is reset on each `Run()`. Subsystems must not stash indices across runs.
- A `Cell` whose tag is one of `tagStr | tagList | tagMap | tagStruct | tagClos | tagErr` is a ref cell. Ref cells are roots from the register stack's point of view (§7).
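To make the tag rules above concrete, here is a minimal sketch of boxing and inspecting a ref cell. The specific tag constants, the contiguous ref-tag range, and the helper names (`makeRef`, `tagOf`, `idxOf`, `isRef`) are all illustrative assumptions, not the actual vm2 API; only the shape (tag in the high 16 bits, u32 Objects index in the low bits) comes from the spec.

```go
package main

import "fmt"

// Cell is the 8-byte NaN-boxed value; this layout is an illustrative
// sketch, not the actual vm2 definition.
type Cell uint64

// Hypothetical tag values: NaN exponent bits plus a payload tag in the
// high 16 bits. Real assignments live in the vm2 source.
const (
	tagStr  uint16 = 0x7FF9
	tagList uint16 = 0x7FFA
	tagErr  uint16 = 0x7FFE
)

// makeRef boxes a uint32 Objects-table index under the given tag.
func makeRef(tag uint16, idx uint32) Cell {
	return Cell(uint64(tag)<<48 | uint64(idx))
}

func tagOf(c Cell) uint16 { return uint16(uint64(c) >> 48) }
func idxOf(c Cell) uint32 { return uint32(c) }

// isRef reports whether the cell references the Objects table (a "ref
// cell" in the §1 sense). Sketch: assumes ref tags are contiguous.
func isRef(c Cell) bool {
	t := tagOf(c)
	return t >= tagStr && t <= tagErr
}

func main() {
	c := makeRef(tagStr, 42)
	fmt.Println(tagOf(c) == tagStr, idxOf(c), isRef(c))
}
```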
2. Strings
Representation. A string value is {tag: tagStr, payload: idx} where Objects[idx] is a *vmString:
```go
type vmString struct {
	bytes []byte // immutable, UTF-8
	hash  uint64 // memoized; 0 means "not computed yet"
}
```
Why a header struct. A bare string would work but blocks two optimizations: (a) cached hash for map keys, and (b) future small-string-optimization (storing up to 6 bytes inline in Cell via a tagSStr tag) without re-piping every consumer.
Opcode family.
| Op | A | B | C | Effect |
|---|---|---|---|---|
| OpLoadStrK | dst | k idx | | regs[A] = consts[B] (string) |
| OpConcatStr | dst | lhs | rhs | regs[A] = lhs ++ rhs |
| OpLenStr | dst | src | | regs[A] = byte length |
| OpIndexStr | dst | src | idx | regs[A] = single-byte string |
| OpEqualStr | dst | lhs | rhs | byte compare |
| OpHashStr | dst | src | | fills + returns memoized hash |
All strings are immutable. OpConcatStr always allocates a fresh *vmString. The string constant pool is per-Function, sharing the same indexing as Function.Consts but reading the bytes from a parallel StrConsts [][]byte slice in Function.
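A sketch of what the OpHashStr memoization could look like, honoring the "0 means not computed yet" sentinel from the struct comment. The choice of FNV-1a is an assumption; the spec only requires a deterministic hash.

```go
package main

import "fmt"

type vmString struct {
	bytes []byte // immutable, UTF-8
	hash  uint64 // memoized; 0 means "not computed yet"
}

// hashStr fills and returns the memoized hash (what OpHashStr would do).
// FNV-1a is an illustrative choice, not mandated by the spec.
func (s *vmString) hashStr() uint64 {
	if s.hash != 0 {
		return s.hash // fast path: already computed
	}
	h := uint64(14695981039346656037) // FNV-1a 64-bit offset basis
	for _, b := range s.bytes {
		h ^= uint64(b)
		h *= 1099511628211 // FNV-1a 64-bit prime
	}
	if h == 0 {
		h = 1 // keep 0 reserved as the "not computed" sentinel
	}
	s.hash = h
	return h
}

func main() {
	s := &vmString{bytes: []byte("hello")}
	first := s.hashStr()
	fmt.Println(first == s.hashStr(), s.hash != 0)
}
```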
Correctness gate. Port tests/vm/legacy/strings_*.mochi (concat, length, index, equality, hash determinism) to a vm2 fixture under runtime/vm2/testdata/strings/ and require byte-identical output.
Benchmark gate. Add bench/template/strings/concat_loop.{mochi,py,lua} and eq_table.{mochi,py,lua} to the MEP 23 sweep. Target: within 2x Lua on concat, within 1.5x Lua on equality.
3. Lists
Representation. Objects[idx] holds *vmList:
```go
type vmList struct {
	data []Cell
}
```
A list is a growable register-cell buffer. No type specialization at MVP (every element is a Cell; ints pay the tag overhead). Specialization ([]int64 storage for homogeneous lists) is a follow-on MEP, gated on profile evidence.
Opcode family.
| Op | A | B | C | Effect |
|---|---|---|---|---|
| OpNewList | dst | cap | | allocate empty vmList with capacity cap |
| OpListLen | dst | src | | regs[A] = len(list) |
| OpListGet | dst | src | idx | bounds-checked load |
| OpListSet | src | idx | val | bounds-checked store (no dst; mutation) |
| OpListPush | src | val | | append; amortized O(1) |
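A minimal sketch of the bounds-checked handlers behind this table. The bool return stands in for the §8 trap path (a real handler would raise); method names are illustrative, not the vm2 API.

```go
package main

import "fmt"

type Cell uint64

type vmList struct {
	data []Cell
}

// get sketches OpListGet: bounds-checked load. false == would-trap.
func (l *vmList) get(i int64) (Cell, bool) {
	if i < 0 || i >= int64(len(l.data)) {
		return 0, false
	}
	return l.data[i], true
}

// set sketches OpListSet: bounds-checked store, mutation in place.
func (l *vmList) set(i int64, v Cell) bool {
	if i < 0 || i >= int64(len(l.data)) {
		return false
	}
	l.data[i] = v
	return true
}

// push sketches OpListPush; Go's append supplies the amortized O(1) growth.
func (l *vmList) push(v Cell) {
	l.data = append(l.data, v)
}

func main() {
	l := &vmList{data: make([]Cell, 0, 4)} // OpNewList with cap 4
	l.push(7)
	v, ok := l.get(0)
	_, oob := l.get(5)
	fmt.Println(v, ok, oob)
}
```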
GC interaction. A vmList's data holds ref cells. The GC (§7) must scan them. The pop-frame slot-zero rule (§7) applies because a register holding a tagList cell pins the list.
Correctness gate. Reuse legacy tests/vm/list_*.mochi.
4. Maps
Representation. Objects[idx] holds *vmMap:
```go
type vmMap struct {
	entries map[mapKey]Cell
}

type mapKey struct {
	tag  uint16
	bits uint64 // string → vmString.hash, int → i48, bool → 0/1
	aux  uint32 // for collisions on tagStr: vmString idx (compared by bytes)
}
```
Go's built-in map is the MVP. Replacement with an open-addressed table is a follow-on MEP. Reasons to ship Go-map first: the legacy VM does the same, behavior parity is trivial, and there is no profile evidence that maps are hot in the corpus.
Opcode family: OpNewMap, OpMapGet, OpMapSet, OpMapDel, OpMapLen, OpMapHas. Same shape as the list family.
Key equality. Keys hash by tag+payload. tagStr keys collide if hashes match; tiebreak by byte compare (the aux field carries the vmString idx of the canonical key for the entry).
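A sketch of the string-key lookup implied by this rule: match on (tag, hash), then tiebreak by byte compare against the canonical key that `aux` points at. The linear scan keeps the sketch short; a real OpMapGet would probe. `objects`, `getStr`, and the tag constant are illustrative stand-ins.

```go
package main

import (
	"bytes"
	"fmt"
)

type Cell uint64

const tagStr uint16 = 0x7FF9 // illustrative tag value

type vmString struct {
	bytes []byte
	hash  uint64
}

type mapKey struct {
	tag  uint16
	bits uint64
	aux  uint32
}

type vmMap struct {
	entries map[mapKey]Cell
}

// getStr sketches the tagStr path of OpMapGet: hash match first, byte
// compare as the collision tiebreak. objects stands in for Objects.
func (m *vmMap) getStr(objects []*vmString, key *vmString) (Cell, bool) {
	for k, v := range m.entries { // sketch only: real code would probe
		if k.tag == tagStr && k.bits == key.hash &&
			bytes.Equal(objects[k.aux].bytes, key.bytes) {
			return v, true
		}
	}
	return 0, false
}

func main() {
	objects := []*vmString{{bytes: []byte("a"), hash: 17}}
	m := &vmMap{entries: map[mapKey]Cell{
		{tag: tagStr, bits: 17, aux: 0}: 99,
	}}
	v, ok := m.getStr(objects, &vmString{bytes: []byte("a"), hash: 17})
	fmt.Println(v, ok)
}
```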
5. Structs
Representation. Objects[idx] holds *vmStruct:
```go
type vmStruct struct {
	fields []Cell // length == Type.NumFields, indexed by slot
	typeID uint32 // index into Program.Types
}
```
Field slots are assigned at compile time by compiler2/ir (already partially in place via the typed-IR work). The runtime never does name → slot lookup; that is a compile-time invariant.
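A sketch of the compile-time side of this invariant: the name → slot lookup happens once in the compiler, so the runtime only ever sees a constant slot operand. `slotOf` is an illustrative helper, not the actual compiler2/ir pass (which would precompute a map per type rather than scan).

```go
package main

import "fmt"

// StructType mirrors the §5 type-table entry.
type StructType struct {
	Name       string
	FieldNames []string
	NumFields  int
}

// slotOf resolves a field name to its slot at compile time; the emitted
// OpGetField/OpSetField carry the resulting constant slot.
func slotOf(t *StructType, field string) (int, bool) {
	for i, n := range t.FieldNames {
		if n == field {
			return i, true
		}
	}
	return -1, false // unknown field: a compile error, never a runtime one
}

func main() {
	pt := &StructType{Name: "Point", FieldNames: []string{"x", "y"}, NumFields: 2}
	slot, ok := slotOf(pt, "y")
	fmt.Println(slot, ok)
}
```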
Opcode family.
| Op | A | B | C | Effect |
|---|---|---|---|---|
| OpNewStruct | dst | type id | | allocate, fields all nil |
| OpGetField | dst | src | slot | regs[A] = struct.fields[slot] |
| OpSetField | src | slot | val | struct.fields[slot] = val |
No reflection. No name-based field access at runtime. (If/when added, it lives behind a separate OpGetFieldByName that pays a hash lookup; the typed path stays cheap.)
Type table. Program.Types is parallel to Program.Funcs: a flat slice of *StructType{Name, FieldNames []string, NumFields int}. The verifier rejects OpGetField with slot >= NumFields.
6. Closures
Representation. Objects[idx] holds *vmClosure:
```go
type vmClosure struct {
	Fn       *Function
	Upvalues []Cell
}
```
Upvalues are by-value at MVP. Reasoning: every Mochi closure in the test corpus captures by value (immutable bindings dominate). By-reference upvalues require an Upval indirection cell and a more complex emitter; defer until a corpus program needs it.
Opcode family.
| Op | A | B | C | D | Effect |
|---|---|---|---|---|---|
| OpMakeClos | dst | fn | upv src | n | regs[A] = closure(fn, copy(regs[upv..upv+n])) |
| OpCallClos | dst | cls | args src | n | call closure value; reuses OpCall frame path |
| OpGetUpval | dst | idx | | | regs[A] = currentClos.Upvalues[idx] |
OpCall (direct call) stays for known-target calls; OpCallClos is for the value-call case. Both go through the same pushFrame path defined in MEP 23 Round 3.
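A sketch of the OpMakeClos copy semantics: the register window is copied into a fresh upvalue slice, which is exactly what makes capture by-value. Names (`makeClos`, `Function.Name`) are illustrative, not the vm2 API.

```go
package main

import "fmt"

type Cell uint64

type Function struct{ Name string }

type vmClosure struct {
	Fn       *Function
	Upvalues []Cell
}

// makeClos sketches OpMakeClos: copy n registers starting at upv into a
// fresh upvalue slice. Because it is a copy, later writes to the source
// registers cannot be observed through the closure.
func makeClos(fn *Function, regs []Cell, upv, n int) *vmClosure {
	ups := make([]Cell, n)
	copy(ups, regs[upv:upv+n])
	return &vmClosure{Fn: fn, Upvalues: ups}
}

func main() {
	regs := []Cell{10, 20, 30}
	c := makeClos(&Function{Name: "f"}, regs, 1, 2)
	regs[1] = 99 // mutate after capture; the closure keeps the old value
	fmt.Println(c.Upvalues[0], c.Upvalues[1])
}
```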
7. Garbage collection
Strategy. Tracing mark-sweep over the Objects table, with a freelist for index reuse. This is not full delegation to the Go GC: we keep `Objects []any` so the Go runtime traces the boxed objects for us automatically, but we maintain our own freelist to reclaim slots (and thus tag indices) within a `Run()`.
Why not reference counting. Cycles are reachable (list of structs that points back), and Mochi programs are short-lived enough that the throughput cost of RC's per-write barrier outweighs the latency win.
When GC runs. Triggered when len(Objects) - freelist > threshold after an allocating opcode. Threshold starts at 64K, doubles after each cycle that frees less than 25% of objects. No incremental phase at MVP.
Roots. Three sets:
- Register stack. Every `Cell` in `vm.Stack[0:len(vm.Stack)]` whose tag is a ref tag is a root. This is why the round-3 pop-frame path must change the moment §3 lands: `popFrame` must zero ref-tagged slots in the popped window before shrinking, otherwise a freed slot index could be wrongly retained as live. The change is: walk `Stack[base:oldEnd]`; for each ref-tagged cell, mark `Objects[idx] = nil` only if we are also pushing the slot onto the freelist on the next GC.
- Frame chain. Each `frame` holds a `*Function`, which is statically allocated (`Program.Funcs[i]`) and not tracked by GC.
- In-flight call args. `OpCall`'s `argBuf` and `OpTailCall`'s `args` snapshots live on the Go stack across one dispatch step; they are visible to Go GC, not vm2 GC. vm2 GC cannot run mid-instruction (it only runs at instruction boundaries), so these are safe.
Tail-call interaction. OpTailCall shrinks vm.Stack to base then regrows. The shrink window may contain ref cells from the caller; same zero-out rule as popFrame applies. This is the second site that breaks the round-3 "no zero-out" optimization. The cost is ~NumRegs * 8 bytes of writes per tail call; the compiler should mark frames as leaf-typed (no ref cells) and skip the zero pass when statically provable. That marking is a compile-time analysis in compiler2/ir, gated by §11 milestones.
Verifier obligations. The verifier must reject any program that reads a ref cell from a register the function did not write since entry (uninitialized-ref-cell read). Combined with the zero-out rule, this guarantees no dangling indices.
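A compressed sketch of one collection cycle under the rules above: ref-tagged stack cells are the roots, unmarked slots are nil'd (letting the Go GC reclaim the box), and their indices go on the freelist. Tracing into children (list/map/struct payloads) is elided, and the tag constants and field names are illustrative assumptions.

```go
package main

import "fmt"

type Cell uint64

// Illustrative ref-tag range; real assignments are in §1's table.
const refTagLo, refTagHi = 0x7FF9, 0x7FFE

func isRef(c Cell) bool {
	t := uint16(uint64(c) >> 48)
	return t >= refTagLo && t <= refTagHi
}

func idxOf(c Cell) uint32 { return uint32(c) }

type VM struct {
	Stack    []Cell
	Objects  []any
	freelist []uint32
}

// gc sketches one mark-sweep cycle over the Objects table. Child tracing
// is omitted for brevity; a full collector would also scan vmList/vmMap/
// vmStruct payloads reachable from marked objects.
func (vm *VM) gc() {
	marked := make([]bool, len(vm.Objects))
	for _, c := range vm.Stack { // roots: ref cells on the register stack
		if isRef(c) {
			marked[idxOf(c)] = true
		}
	}
	for i, m := range marked { // sweep: nil dead slots, recycle indices
		if !m && vm.Objects[i] != nil {
			vm.Objects[i] = nil
			vm.freelist = append(vm.freelist, uint32(i))
		}
	}
}

func main() {
	vm := &VM{
		Stack:   []Cell{Cell(uint64(refTagLo)<<48 | 0)}, // ref to Objects[0]
		Objects: []any{"live", "dead"},
	}
	vm.gc()
	fmt.Println(vm.Objects[0] != nil, vm.Objects[1] == nil, vm.freelist)
}
```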
8. Errors and Result
Model. Mochi has Result<T, E> (from MEP 8 / P8.x) and bare runtime errors (div by zero, OOB, OOM). Both fold onto one mechanism: an error cell (tagErr) plus an explicit handler stack.
Handler stack. A parallel slice on the VM:
```go
type handler struct {
	FrameDepth int   // len(vm.Frames) at OpPushHandler time
	PC         int   // resume IP in that frame
	DstReg     int32 // register to receive the err cell
}
```
with `Handlers []handler` as a field on the running `*VM`.
Opcodes.
| Op | A | B | C | Effect |
|---|---|---|---|---|
| OpPushHandler | dst | pc | | push handler{FrameDepth, pc, dst} |
| OpPopHandler | | | | drop top of Handlers |
| OpRaise | src | | | unwind frames until Handlers top, then jump to its pc |
| OpOk | dst | val | | wrap val as Result.Ok(val) |
| OpErr | dst | val | | wrap val as Result.Err(val) (tagged tagErr) |
| OpTry | dst | src | pc | if src is err, OpRaise(src); else regs[dst] = src.Ok |
This matches MEP 21's intent (Result is a value, not a control-flow primitive at the source level) but gives the runtime cheap unwinding for ? and for trap-style errors. The dispatch loop's current return ret, errors.New(...) paths (div-by-zero, mod-by-zero, OpHalt, unknown op) are rewritten to OpRaise-equivalent in-loop transfer; only the case where Handlers is empty propagates a Go error.
Why a separate handler stack (rather than reusing the frame chain). It is finer-grained than a frame: a single function can have multiple try-blocks. Putting handlers on a parallel stack means the frame stack stays a uniform array of pure activation records.
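The unwinding step can be sketched as follows: pop the innermost handler, truncate the frame stack to its recorded depth, deliver the err cell, and resume at its PC. Only the empty-handler case surfaces a Go error. Names (`raise`, `frame`, `Regs`) are illustrative, not the vm2 API.

```go
package main

import "fmt"

type Cell uint64

type frame struct{ pc int }

type handler struct {
	FrameDepth int   // len(vm.Frames) at OpPushHandler time
	PC         int   // resume IP in that frame
	DstReg     int32 // register to receive the err cell
}

type VM struct {
	Frames   []frame
	Handlers []handler
	Regs     []Cell
}

// raise sketches OpRaise: unwind to the innermost handler, deliver the
// err cell, and report the resume PC. ok == false is the "Handlers
// empty" case that propagates a Go error from Run().
func (vm *VM) raise(err Cell) (resumePC int, ok bool) {
	if len(vm.Handlers) == 0 {
		return 0, false
	}
	h := vm.Handlers[len(vm.Handlers)-1]
	vm.Handlers = vm.Handlers[:len(vm.Handlers)-1]
	vm.Frames = vm.Frames[:h.FrameDepth] // drop unwound activation records
	vm.Regs[h.DstReg] = err              // handler sees the err cell
	return h.PC, true
}

func main() {
	vm := &VM{
		Frames:   []frame{{}, {}, {}},
		Handlers: []handler{{FrameDepth: 1, PC: 7, DstReg: 2}},
		Regs:     make([]Cell, 4),
	}
	pc, ok := vm.raise(0xBAD)
	fmt.Println(pc, ok, len(vm.Frames), vm.Regs[2] == 0xBAD)
}
```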
9. Dispatch evolution
Round 3 of MEP 23 closed most of the dispatch gap. What this MEP locks in:
- Stay on Go's `switch`. Computed goto / tail-call dispatch was investigated under MEP 18 and is not portable; the modern Go compiler emits a jump table for a dense `switch Op` (it has since Go 1.19). The interpreter loop in `runtime/vm2/eval.go` already takes that path.
- Superinstruction policy. Fuse a bigram only when (a) it appears in the static instruction frequency for at least three corpus programs and (b) the unfused pair costs more than ~3% of the program's runtime in profiles. Fusing speculatively is forbidden; every fused op carries a perf-justification commit reference in its `ops.go` doc comment. This is the same standard the existing `OpAddI64K`, `OpJumpIfLessI64`, and `OpTailCallSelf` met.
- No threaded code, no copy-and-patch. Those live in MEP 22, deferred.
10. Compilation pipeline gaps
The work above creates new lowering obligations on compiler2. The pipeline today is parse → typed IR → opt (constfold/DCE/tailcall) → regalloc → emit. Subsystem MVPs need:
| Pass | Subsystem dep | Owner | MVP scope |
|---|---|---|---|
| string lowering | §2 | emit | string lit → OpLoadStrK; ++ → OpConcatStr |
| list/map literals | §3, §4 | ir + emit | [1,2,3] → OpNewList + OpListPush*N |
| field slot assign | §5 | ir | struct decl → typeID; field name → slot |
| upvalue capture | §6 | ir | free-var analysis → OpMakeClos upv list |
| try/? lowering | §8 | ir + emit | try block → push/pop; ? → OpTry |
| ref-cell zero hint | §7 | ir | mark functions that hold no ref cells |
Each row maps to one PR. Each PR ships its opcode family, its lowering pass, the legacy-port correctness fixtures, and a new entry in the MEP 23 sweep.
11. Porting plan and gates
Subsystems land in this order, gated:
- Strings (§2). Legacy fixtures must pass; MEP 23 strings programs added.
- Lists (§3). Legacy list fixtures pass; `list_sum` and `list_map` benchmarks added.
- Structs (§5). Legacy struct fixtures pass. No dedicated bench (structs are accessed inside list/map work).
- Maps (§4). Legacy map fixtures pass; `map_count` benchmark added.
- Closures (§6). Legacy closure fixtures pass; `closure_counter` benchmark added.
- GC (§7). Implemented behind a `GOMOCHI_VM2_GC=1` flag; turned on by default once the corpus runs clean for 24h on the nightly fuzzer.
- Result / try (§8). Legacy `result_*.mochi` fixtures pass; `try_chain` benchmark added.
Each PR must:
- include a `runtime/vm2/testdata/<subsystem>/` fixture set ported from the legacy VM, plus a `vm2_<subsystem>_test.go` that diffs output against the legacy VM,
- add at least one entry to `bench/template/<subsystem>/` covering the new family in Mochi + Python + Lua,
- update the MEP 23 result table in the same PR,
- not regress any prior subsystem benchmark by more than 5% on the MEP 17 gate.
The last bullet is the critical one. Subsystem PRs that move integer-core numbers backward (e.g., by widening Cell checks in the hot path) get reverted.
12. Backward compatibility
vm2 has no users outside the project. The legacy VM stays the default until §1-§7 land; once they do, the default flips to vm2 and the legacy VM is removed in a subsequent PR. No source-language change.
Open questions
- Inline strings. Should we add `tagSStr` (up to 6 inline bytes) on day one or wait for profile evidence? Leaning wait, but the consumer-side changes are small enough that adding it later is cheap if §2's interface is respected.
- List specialization. Same question for `[]int64`-backed homogeneous lists. Same answer: wait for profiles.
- Generational GC. The MVP collector is non-generational. A nursery would help long-running programs but is irrelevant to the current corpus. Defer.
- Cross-`Run` `Objects` pinning. If embedding hosts ever want to hold a Mochi value across `Run()` calls, we need a pin API. Out of scope until there is a host.