MEP 20. Value Representation and Allocation Discipline
| Field | Value |
|---|---|
| MEP | 20 |
| Title | Value Representation and Allocation Discipline |
| Author | Mochi core |
| Status | Draft |
| Type | Standards Track |
| Created | 2026-05-16 |
Abstract
The current runtime/vm/value.go defines Value as a struct that holds every possible payload field at once: Tag, Int, Float, Str, Bool, List, Map, Func, BigInt, BigRat. The struct is wide, every register is a copy of this width, and the Func field is an interface{}, so values flowing into it escape to the heap. The discipline question is independent of dispatch (MEP-18) and specialization (MEP-19): even a perfect interpreter loop runs slowly if every register move copies nearly a hundred bytes and every closure call boxes its receiver.
This MEP narrows Value to 32 bytes (a one-byte tag plus padding, an 8-byte numeric payload, and a two-word interface reference slot), pools frames and register slabs, and audits the hot path to ensure the numeric-only paths from MEP-19 produce zero allocations. The C-host bag of tricks (NaN boxing, tagged pointers, separate stacks per type) is surveyed and rejected for reasons specific to the Go runtime: Go's GC requires pointer-typed words and will not scan bit-stolen integers as roots. The trade-off space looks different here than in clox or LuaJIT.
The combined target: the geometric mean of MEP-17 benchmarks improves by 1.2-1.5x relative to MEP-18+19 alone, with allocs/op dropping to zero on fib, iter_sum, string_cat (where strings are pre-interned), map_get, and struct_field.
Motivation
Profile any non-trivial Mochi program today (go test -bench=. -benchmem against the MEP-17 suite) and the bottom of the profile is dominated by two costs: the size of Value copies (every register move is a runtime.memmove-class operation) and the allocation rate from interface{} boxing inside Func and inside generic call paths.
A side measurement makes the point. The current Value struct is roughly:
```go
type Value struct {
	Tag    ValueTag         // 1 byte
	Bool   bool             // 1 byte
	Int    int64            // 8 bytes
	Float  float64          // 8 bytes
	Str    string           // 16 bytes (header)
	List   []Value          // 24 bytes (header)
	Map    map[string]Value // 8 bytes
	Func   any              // 16 bytes (interface)
	BigInt *big.Int         // 8 bytes
	BigRat *big.Rat         // 8 bytes
}
```
With padding the total is 104 bytes on a 64-bit platform. A function with 16 registers carries over 1.6 KB of register file per frame; a 1000-deep recursion is over 1.6 MB of register state. The numeric path in OpAdd_II (MEP-19) does regs[a] = Value{Tag: ValueInt, Int: regs[b].Int + regs[c].Int}, which writes the full struct width per opcode, most of it unused.
The fix is mechanical: collapse the payload into a single 8-byte slot and use a separate slot for the reference type. This is what nearly every Go-hosted VM in production does (Tengo, Starlark-Go, Risor, Yaegi). It is also what Crafting Interpreters does in its second-half optimizations once the tagged-union version is bottlenecked ([CI-Optimization]).
Specification
New Value layout
```go
type Value struct {
	Tag uint8   // 0..15 used today; uint8 leaves room
	_   [7]byte // padding so Num is 8-byte aligned
	Num uint64  // int / float (via math.Float64bits) / bool / smaller refs as index
	Ref any     // string, list, map, func, BigInt, BigRat — only set for reference tags
}
```
Total: 32 bytes on a 64-bit platform (the any slot is two words). Numeric paths read only Tag and Num. Reference paths read all three fields.
Encoding by tag:
- `ValueNull`: all zero.
- `ValueBool`: `Num = 0 | 1`.
- `ValueInt`: `Num = uint64(intValue)`. Use signed/unsigned reinterpret on read.
- `ValueFloat`: `Num = math.Float64bits(f)`.
- `ValueStr`: `Ref = string`. (String headers are 16 bytes; storing them in `Ref` via `any` boxes the header but not the string body.)
- `ValueList`: `Ref = []Value`.
- `ValueMap`: `Ref = map[string]Value` or the future map-of-Values type.
- `ValueFunc`: `Ref = *Closure` (concrete type, not interface; see below).
- `ValueBigInt`: `Ref = *big.Int`.
- `ValueBigRat`: `Ref = *big.Rat`.
The Ref any slot is unavoidable because Go's GC needs to scan the reference when it is live; unsafe.Pointer would defeat scanning, and uintptr is a non-pointer to the GC and unsafe to keep across a collection. Boxing the header (16 bytes for string, 24 bytes for []Value) into any is a small constant cost, accepted as the price of staying GC-friendly.
Why not NaN boxing in Go
NaN boxing packs the entire value into a single 64-bit double. It is the right answer in C/C++/Rust where you control GC and pointer scanning. In Go, the issue is that the GC scans only words typed as pointers. A uint64 that happens to hold a pointer in its low 48 bits is invisible to the collector: the pointer is a root that the GC will not follow, so the pointed-to object may be collected while still live, and if the runtime ever moves objects, a disguised pointer would never be fixed up.
There is a workaround (parallel root tables) but the bookkeeping cost is roughly equal to the savings from the tighter layout. Tengo, Starlark-Go, and the Go port of Lox all reach the same conclusion: discriminated struct, no bit-stealing.
The remaining numeric-density win NaN boxing offers (single 64-bit register for the whole value) is not realizable in Go anyway because the calling convention passes structs piecewise. So even if NaN-boxed Value were possible, the Go compiler would not pass it in a single register.
Frame and register pooling
frame and the register slab regs []Value are allocated per call today. Recursion and deep query plans churn the allocator. The fix is a per-VM sync.Pool for *frame and a per-size pool for []Value (powers of two: 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048).
```go
type VM struct {
	framePool sync.Pool     // *frame
	regsPools [10]sync.Pool // []Value by capacity bucket
	// ...
}
```
On call entry: pop a *frame from the pool, pop a []Value of the appropriate bucket, attach it to the frame. On return: zero the registers, then push both back to their pools. Zeroing is required so the GC does not see stale pointers; Go 1.21 added the clear builtin for exactly this use case, and a plain loop works on older toolchains.
The pool design is well-trodden in Go networking (bytes.Buffer pools, HTTP request pools). The pitfall in VM use is that pooled slices can be retained longer than expected if the user keeps a closure reference; the rule is that the VM never hands out pool-backed slices to user code, only to internal frame state.
Numeric path allocation audit
The combined MEP-18 Phase A + MEP-19 quickening + MEP-20 layout target is: zero allocations per iteration of iter_sum and fib. This is a testable property; add -benchmem to the bench harness from MEP-17 and assert allocs/op == 0 for those two benchmarks in CI.
The audit walks every handler in the quickened numeric path:
- `OpAdd_II`: reads two `Value{Tag, Num}`, writes one. No allocation.
- `OpLess_II`: reads two, writes one bool `Value`. No allocation.
- `OpJumpIfFalse`: reads one, branches. No allocation.
- Loop back: no allocation.
If any of these allocate, the test fails. The any-typed Ref field on a numeric result Value is nil, so nothing escapes to the heap; the benchmark assertion pins that contract.
Reference path: closure boxing
Today OpMakeClosure allocates an interface{} holding *Closure. Concretizing Ref to *Closure (a defined type, not any) for the ValueFunc case removes the interface box. The other reference tags (string, list, map, BigInt, BigRat) all keep any because they hold heterogeneous concrete types; the boxing cost there is per-reference, not per-arithmetic-op, and the gain is small.
A future MEP may introduce a typed reference union (type RefValue { Str string; List []Value; ... }) to remove the last any cost. That requires touching every reference handler and is left for a follow-up.
Interaction with MEP-15 effects, MEP-16 null safety
Option<T> (MEP-16) reuses the existing ValueNull tag for the None case. No new tag is needed. The size of Value is unchanged by MEP-16.
Effect labels (MEP-15) live on FuncType, not on Value. The new layout does not affect effects.
Benchmarks gating the change
This MEP lands in three PRs, each with MEP-17 numbers:
- Value layout flip. Just the struct rewrite plus the call-site updates. Expected: 1.1-1.3x on the geometric mean, mostly from smaller `memmove` per register move.
- Frame and register pooling. Expected: 1.05-1.15x and a measurable drop in `allocs/op` on recursive benchmarks.
- `*Closure` concretization. Expected: 1.02-1.05x but most visible on `hof_map`.
A regression on any benchmark beyond MEP-17's 3% budget blocks the PR.
Status at a glance
| Item | Status |
|---|---|
| Value struct narrowed to Tag + Num + Ref (32 bytes) | proposed |
| Numeric path proven allocation-free on iter_sum and fib | proposed |
| Frame pool (sync.Pool of *frame) | proposed |
| Register slab pool (10 size buckets, powers of two) | proposed |
| Ref concretized to *Closure for ValueFunc | proposed |
| Other reference tags remain any (small per-ref cost accepted) | proposed |
| NaN boxing | rejected |
| Tagged pointers / unsafe payload | rejected |
Risks
- Slice retention from pools. If a user-visible closure captures a pooled slice, recycling it back to the pool produces aliasing. Mitigation: never hand pool-backed memory to user code. Audit closure capture paths.
- Race in the pools. `sync.Pool` is safe for concurrent use; the per-bucket `regsPools` is an array of `sync.Pool`s, also safe. No new concurrency story.
- GC pressure shift. Pool reuse trades allocator load for retention; if the working set grows, the pools keep slabs alive across GC cycles. Mitigation: cap each pool's per-P slab count via a custom wrapper if `pprof` shows pool growth.
Non-goals
- No NaN boxing. Permanently rejected for Mochi for the reasons above.
- No custom GC. Go's GC is sufficient; the audit is about minimizing allocation, not replacing the collector.
- No new opcodes. Layout change only.
Implementation notes
Sequencing matters: the layout flip (PR 1) is a big mechanical change and conflicts with most other VM PRs. Land it first; other work rebases. The pooling PR (PR 2) is small and lands second. The *Closure PR (PR 3) is contingent on PR 1.
Total engineering cost: three engineer-weeks, dominated by the mechanical updates to handler call sites and tests.
References
- [CI-Optimization] Bob Nystrom, Crafting Interpreters: Optimization. https://craftinginterpreters.com/optimization.html
- [Wren-perf] Bob Nystrom, Wren Performance. https://wren.io/performance.html (NaN boxing rationale)
- [Go-Pool] Go standard library, sync.Pool. https://pkg.go.dev/sync#Pool
- [Go-Clear] Go 1.21 release notes, clear builtin. https://go.dev/doc/go1.21
- [Tengo] Tengo: A fast script language for Go. https://github.com/d5/tengo (a Go-hosted VM that uses a discriminated Value struct)
- [Starlark-Go] Bazel team, Starlark in Go. https://github.com/google/starlark-go
- [Risor] Risor: A fast and flexible scripting language for Go. https://github.com/risor-io/risor