MEP 20. Value Representation and Allocation Discipline
| Field | Value |
|---|---|
| MEP | 20 |
| Title | Value Representation and Allocation Discipline |
| Author | Mochi core |
| Status | Draft |
| Type | Standards Track |
| Created | 2026-05-16 |
Abstract
The current runtime/vm/value.go defines Value as a struct that holds every possible payload field at once: Tag, Int, Float, Str, Bool, List, Map, Func, BigInt, BigRat. The struct is wide, every register is a copy of this width, and the Func field is an interface{}, so values flowing into it escape to the heap. The discipline question is independent of dispatch (MEP-18) and specialization (MEP-19): even a perfect interpreter loop runs slowly if every register move copies nearly a hundred bytes and every closure call boxes its receiver.
This MEP narrows Value to 32 bytes (a one-byte tag plus padding, an 8-byte numeric payload, and a two-word interface reference slot), pools frames and register slabs, and audits the hot path to ensure the numeric-only paths from MEP-19 produce zero allocations. The C-host bag of tricks (NaN boxing, tagged pointers, separate stacks per type) is surveyed and rejected for reasons specific to the Go runtime: Go's GC requires pointer-typed words and will not scan bit-stolen integers as roots. The trade-off space looks different here than in clox or LuaJIT.
The combined target: the geometric mean of MEP-17 benchmarks improves by 1.2-1.5x relative to MEP-18+19 alone, with allocs/op dropping to zero on fib, iter_sum, string_cat (where strings are pre-interned), map_get, and struct_field.
Motivation
Profile any non-trivial Mochi program today (go test -bench=. -benchmem against the MEP-17 suite) and the bottom of the profile is dominated by two costs: the size of Value copies (every register move is a runtime.memmove-class operation) and the allocation rate from interface{} boxing inside Func and inside generic call paths.
A side measurement makes the point. The current Value struct is roughly:
```go
type Value struct {
	Tag    ValueTag         // 1 byte
	Bool   bool             // 1 byte
	Int    int64            // 8 bytes
	Float  float64          // 8 bytes
	Str    string           // 16 bytes (header)
	List   []Value          // 24 bytes (header)
	Map    map[string]Value // 8 bytes
	Func   any              // 16 bytes (interface)
	BigInt *big.Int         // 8 bytes
	BigRat *big.Rat         // 8 bytes
}
```
With padding the total is 104 bytes on a 64-bit platform. A function with 16 registers carries over 1.6 KB of register file per frame; a 1000-deep recursion is over 1.6 MB of register state. The numeric path in OpAdd_II (MEP-19) does regs[a] = Value{Tag: ValueInt, Int: regs[b].Int + regs[c].Int}, which writes the full struct width per opcode, most of it unused.
The fix is mechanical: collapse the payload into a single 8-byte slot and use a separate slot for the reference type. This is what nearly every Go-hosted VM in production does (Tengo, Starlark-Go, Risor, Yaegi). It is also what Crafting Interpreters does in its second-half optimizations once the tagged-union version is bottlenecked ([CI-Optimization]).
Specification
New Value layout
```go
type Value struct {
	Tag uint8   // 0..15 used today; uint8 leaves room
	_   [7]byte // padding so Num is 8-byte aligned
	Num uint64  // int / float (via math.Float64bits) / bool / smaller refs as index
	Ref any     // string, list, map, func, BigInt, BigRat — only set for reference tags
}
```
Total: 32 bytes on a 64-bit platform (the any slot is two words). Numeric paths read only Tag and Num. Reference paths read all three fields.
Encoding by tag:
- `ValueNull`: all zero.
- `ValueBool`: `Num = 0 | 1`.
- `ValueInt`: `Num = uint64(intValue)`. Use signed/unsigned reinterpret on read.
- `ValueFloat`: `Num = math.Float64bits(f)`.
- `ValueStr`: `Ref = string`. (String headers are 16 bytes; storing them in `Ref` via `any` boxes the header but not the string body.)
- `ValueList`: `Ref = []Value`.
- `ValueMap`: `Ref = map[string]Value` or the future map-of-Values type.
- `ValueFunc`: `Ref = *Closure` (concrete type, not interface; see below).
- `ValueBigInt`: `Ref = *big.Int`.
- `ValueBigRat`: `Ref = *big.Rat`.
The Ref any slot is unavoidable because Go's GC needs to scan the reference when it is live; unsafe.Pointer would defeat scanning, and uintptr is a non-pointer to the GC and unsafe to keep across a collection. Boxing the header (16 bytes for string, 24 bytes for []Value) into any is a small constant cost, accepted as the price of staying GC-friendly.
Why not NaN boxing in Go
NaN boxing packs the entire value into a single 64-bit double. It is the right answer in C/C++/Rust where you control GC and pointer scanning. In Go, the issue is that the GC scans only words typed as pointers. A uint64 that happens to hold a pointer in its low 48 bits is invisible to the collector: the pointer is a root that the GC will not follow, so the pointed-to object may be collected while still live, and if the runtime ever moves objects, a disguised pointer would never be fixed up.
There is a workaround (parallel root tables) but the bookkeeping cost is roughly equal to the savings from the tighter layout. Tengo, Starlark-Go, and the Go port of Lox all reach the same conclusion: discriminated struct, no bit-stealing.
The remaining numeric-density win NaN boxing offers (single 64-bit register for the whole value) is not realizable in Go anyway because the calling convention passes structs piecewise. So even if NaN-boxed Value were possible, the Go compiler would not pass it in a single register.
Frame and register pooling
frame and the register slab regs []Value are allocated per call today. Recursion and deep query plans churn the allocator. The fix is a per-VM sync.Pool for *frame and a per-size pool for []Value (powers of two: 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048).
```go
type VM struct {
	framePool sync.Pool     // *frame
	regsPools [10]sync.Pool // []Value by capacity bucket
	// ...
}
```
On call entry: pop a *frame from the pool, pop a []Value of the appropriate bucket, attach it to the frame. On return: zero the registers, then push both back to their pools. Zeroing is required so the GC does not see stale pointers; Go 1.21 added the clear builtin for exactly this use case, and a plain loop works on older toolchains.
The pool design is well-trodden in Go networking (bytes.Buffer pools, HTTP request pools). The pitfall in VM use is that pooled slices can be retained longer than expected if the user keeps a closure reference; the rule is that the VM never hands out pool-backed slices to user code, only to internal frame state.
Numeric path allocation audit
The combined MEP-18 Phase A + MEP-19 quickening + MEP-20 layout target is: zero allocations per iteration of iter_sum and fib. This is a testable property; add -benchmem to the bench harness from MEP-17 and assert allocs/op == 0 for those two benchmarks in CI.
The audit walks every handler in the quickened numeric path:
- `OpAdd_II`: reads two `Value{Tag, Num}`, writes one. No allocation.
- `OpLess_II`: reads two, writes one bool `Value`. No allocation.
- `OpJumpIfFalse`: reads one, branches. No allocation.
- Loop back: no allocation.
If any of these allocate, the test fails. The any-typed Ref field on a numeric result Value is nil, so nothing escapes to the heap; the benchmark assertion pins that contract.
Reference path: closure boxing
Today OpMakeClosure allocates an interface{} holding *Closure. Concretizing Ref to *Closure (a defined type, not any) for the ValueFunc case removes the interface box. The other reference tags (string, list, map, BigInt, BigRat) all keep any because they hold heterogeneous concrete types; the boxing cost there is per-reference, not per-arithmetic-op, and the gain is small.
A future MEP may introduce a typed reference union (type RefValue { Str string; List []Value; ... }) to remove the last any cost. That requires touching every reference handler and is left for a follow-up.
Interaction with MEP-15 effects, MEP-16 null safety
Option<T> (MEP-16) reuses the existing ValueNull tag for the None case. No new tag is needed. The size of Value is unchanged by MEP-16.
Effect labels (MEP-15) live on FuncType, not on Value. The new layout does not affect effects.
Benchmarks gating the change
This MEP lands in three PRs, each with MEP-17 numbers:
- Value layout flip. Just the struct rewrite plus the call-site updates. Expected: 1.1-1.3x on the geometric mean, mostly from smaller `memmove` per register move.
- Frame and register pooling. Expected: 1.05-1.15x and a measurable drop in `allocs/op` on recursive benchmarks.
- `*Closure` concretization. Expected: 1.02-1.05x but most visible on `hof_map`.
A regression on any benchmark beyond MEP-17's 3% budget blocks the PR.
Status at a glance
| Item | Status |
|---|---|
| Value struct narrowed to Tag + Num + Ref (32 bytes) | proposed |
| Numeric path proven allocation-free on iter_sum and fib | proposed |
| Frame pool (sync.Pool of *frame) | proposed |
| Register slab pool (10 size buckets, powers of two) | proposed |
| Ref concretized to *Closure for ValueFunc | proposed |
| Other reference tags remain any (small per-ref cost accepted) | proposed |
| NaN boxing | rejected |
| Tagged pointers / unsafe payload | rejected |
Risks
- Slice retention from pools. If a user-visible closure captures a pooled slice, recycling it back to the pool produces aliasing. Mitigation: never hand pool-backed memory to user code. Audit closure capture paths.
- Race in the pools. `sync.Pool` is safe for concurrent use; the per-bucket `regsPools` is an array of `sync.Pool`s, also safe. No new concurrency story.
- GC pressure shift. Pool reuse trades allocator load for retention; if the working set grows, the pools keep slabs alive across GC cycles. Mitigation: cap each pool's per-P slab count via a custom wrapper if `pprof` shows pool growth.
Non-goals
- No NaN boxing. Permanently rejected for Mochi for the reasons above.
- No custom GC. Go's GC is sufficient; the audit is about minimizing allocation, not replacing the collector.
- No new opcodes. Layout change only.
Implementation notes
Sequencing matters: the layout flip (PR 1) is a big mechanical change and conflicts with most other VM PRs. Land it first; other work rebases. The pooling PR (PR 2) is small and lands second. The *Closure PR (PR 3) is contingent on PR 1.
Total engineering cost: three engineer-weeks, dominated by the mechanical updates to handler call sites and tests.
References
- [CI-Optimization] Bob Nystrom, Crafting Interpreters: Optimization. https://craftinginterpreters.com/optimization.html
- [Wren-perf] Bob Nystrom, Wren Performance. https://wren.io/performance.html (NaN boxing rationale)
- [Go-Pool] Go standard library, sync.Pool. https://pkg.go.dev/sync#Pool
- [Go-Clear] Go 1.21 release notes, clear builtin. https://go.dev/doc/go1.21
- [Tengo] Tengo: A fast script language for Go. https://github.com/d5/tengo (a Go-hosted VM that uses a discriminated Value struct)
- [Starlark-Go] Bazel team, Starlark in Go. https://github.com/google/starlark-go
- [Risor] Risor: A fast and flexible scripting language for Go. https://github.com/risor-io/risor