
MEP 42. Native Code Emission: copy-and-patch JIT, C-as-target AOT, and a Wasm-first cross-platform story

Field | Value
MEP | 42
Title | Native Code Emission
Author | Mochi core
Status | Draft
Type | Standards Track
Created | 2026-05-18
Depends | MEP-23 (Compile-time budget), MEP-40 (vm3 + compiler3), MEP-41 (Memory Safety)

Abstract

Mochi today ships exactly one execution model: the vm3 bytecode interpreter with vm2jit-derived golang-asm JIT for hot methods on x86_64 and aarch64. There is no AOT path; mochi build produces a Go binary that embeds the interpreter, not native code emitted from compiler3 IR. There is no Wasm output. There is no Windows native target. There is no story for mochi run script.mochi to compete with python script.py on startup time, and no story for mochi build --portable to compete with go build on distributable artifact size.

This MEP specifies the from-scratch native code emission layer for Mochi. The architecture is dual-backend by design: copy-and-patch JIT for the interpreter tier (sub-millisecond compile, 3-5x faster than the vm3 interpreter, inherits Clang -O2 stencil quality), and C-as-target AOT for shipped binaries (covers every target the user's cc supports, including embedded Cortex-M and microcontrollers). The two backends share compiler3's typed IR; neither requires LLVM or cgo at Mochi build time. The naive-emission research substrate (~/notes/Spec/5500/) validates this pair: copy-and-patch is the technique CPython 3.13 shipped in October 2024 (PEP 744); C-as-target is the technique that lets Nim, V, Vala, and Cython cover every embedded toolchain on Earth.

Phase 1 targets five host combinations: x86_64 Linux (ELF/SysV), aarch64 Linux (ELF/AAPCS64), aarch64 macOS (Mach-O/Apple ABI), x86_64 macOS (Mach-O), and wasm32 with WasmGC. These cover every CI runner, every modern cloud ARM instance, every Apple Silicon developer, every browser, and every standalone Wasm runtime (Wasmtime, WAMR, Wasmer, WasmEdge, Spin). Phase 2 adds Windows x86_64/aarch64 (PE/COFF with .pdata/.xdata), riscv64 Linux (RVA22/RVA23), an APE bundler for single-artifact polyglot distribution, plus a native Wasm AOT emitter and a QBE backend for users who want sub-MB stripped binaries without a libc dependency.

The performance bet, deduced from the research substrate: copy-and-patch JIT lands Mochi in the same 3-5x-of-Go band that the §6.16 close-out of MEP-39 left as out of reach for vm3 alone. C-as-target AOT lands mochi build artifacts in the 1-10 MB band (Crystal-like) with binary size and code quality bounded by the user's C compiler, not by Mochi. The Wasm path gives Mochi a distribution channel no other Go-hosted language has: browser, edge, and Wasmtime AOT all from a single emit pipeline.

The closest existing architectural analog is Crystal (closed-world, typed IR, managed runtime, same target tier). The case study to learn most from is .NET NativeAOT (mature managed-language AOT pipeline, trim model, source-generator alternative to runtime reflection, single-file deployment UX). Both are documented in ~/notes/Spec/5500/aot/.

This MEP is a Standards Track design document. The phased plan (Phase 0 spec freeze through Phase 9 Wasm AOT) ships incrementally; no phase ships until its gate is green. The MEP and the code ship in the same PR (MEP-spec-in-sync rule). No phase introduces cgo on the Mochi build host.

Motivation

What MEP-40 left on the table

MEP-40 (vm3 + compiler3) produced a typed IR that propagates Mochi's static type system end-to-end. Every SSA value carries a proven type at IR-emit time; every opcode encodes the type in the opcode itself; the three-bank register file (regsI64 / regsF64 / regsCell) reads and writes native machine words without Cell envelope traffic. This is exactly the precondition a code generator needs: no runtime type guards, no fallback paths, no escape valve. The §6.16 close-out of MEP-39 listed four structural ceilings that vm2 could not lift; vm3 lifted all four. What remains is to spend that headroom by emitting native code instead of dispatching bytecode.

The vm3jit method JIT (MEP-40 Phase 5) covers the inner-loop case but inherits vm2jit's golang-asm encoding: no register allocator, no cross-op optimization, no AOT path. The shipping Mochi binary still embeds the vm3 interpreter, and mochi build my_program.mochi still produces a Go binary that runs the interpreter on my_program.mochi. That is a distribution model, not a code-generation model.

What changed in 2024-2026

Four things between OOPSLA 2021 (where the copy-and-patch paper appeared) and May 2026 make this MEP unavoidable.

Copy-and-patch shipped in CPython 3.13 (October 2024). PEP 744 enabled the copy-and-patch JIT (Xu+Kjolstad, OOPSLA 2021) behind --enable-experimental-jit in Python 3.13.0. The technique is now production-validated at the scale of CPython: ~1000 lines of Python build-time tooling plus ~100 lines of C runtime per ISA, hand-written C stencils compiled by Clang -O2 at build time, memcpy + patch at runtime. The risk profile is well understood, the macOS arm64 JIT entitlement story is documented, and Brandt Bucher's writeups are now reference material. CPython measured a 9-15% throughput improvement on pyperformance for a first-cut JIT with no register allocation. Mochi, with typed IR and reserved arena-base registers, expects 3-5x on hot loops.

Wasm GC + WASIp2 shipped in 2024. WasmGC reached browser baseline in 2024 (Chrome 119, Firefox 120, Safari 18.2); WASI Preview 2 (component model + WIT bindings) reached stable in Wasmtime 17 (January 2024). Wasm is now a credible AOT target for a managed-runtime language, not a JavaScript fallback. The Mochi handle Cell maps directly to a Wasm externref or to a typed GC reference under WasmGC; the typed arenas map directly to typed GC structs. The September 2025 Wasm 3.0 release added 64-bit memories and atomic operations, closing the last two gaps for a Mochi Wasm port.

Apple Silicon adoption crossed 50% of developer machines (DeveloperEcosystem 2025 survey). Mochi without an aarch64 macOS native binary is no longer credible for individual developer adoption. The Mach-O writer, the Apple variadic ABI delta, the ad-hoc signing requirement, and the JIT entitlement plist are mandatory work for any "professional language" claim in 2026.

Zig 0.13 + zig cc graduated to "default cross-compiler" status in many teams. Zig's bundled libcs (musl, glibc, mingw) and zero-config cross compilation set the new floor for what users expect from a language's cross-compilation UX. Nim already pairs with zig cc; Crystal users wrap --cross-compile around zig cc; Rust users layer cargo-zigbuild on top of cargo build. Running mochi build --target=aarch64-linux should "just work" from a macOS dev machine, and zig cc is the cheapest way to deliver that.

Why two phase-1 backends, not one

The naive-emission survey (~/notes/Spec/5500/naive/00_naive_summary.md) and the backends survey (~/notes/Spec/5500/backends/00_backends_summary.md) agree on the same conclusion through different lenses: no single backend covers the four MEP-42 priority surfaces (fast JIT, distributable AOT, Wasm, embedded) with acceptable engineering cost and Mochi's pure-Go-no-cgo identity preserved. The two-backend strategy is the pragmatic compromise GHC adopted (NCG + LLVM) and Zig adopted (in-house + LLVM + C). Mochi adopts the same shape with smaller pieces: copy-and-patch + C-as-target in phase 1, Wasm emitter + QBE in phase 2.

The pair is complementary, not redundant. Copy-and-patch is millisecond-compile and runtime-tier; C-as-target is cc-bound compile time and ship-tier. Copy-and-patch produces machine code in an mmap'd executable region; C-as-target produces an ELF/Mach-O/PE on disk. Copy-and-patch covers two ISAs out of the gate; C-as-target covers every ISA the user's cc understands. Neither covers the other's surface, and shipping both costs less than shipping either alone with the gaps patched by ad-hoc tools.

Scope

In scope:

  • Complete design and implementation of compiler3/emit/copypatch/ (copy-and-patch JIT: stencil generator, runtime patcher, mmap+W^X manager).
  • Complete design and implementation of compiler3/emit/c/ (C-as-target AOT: typed-IR-to-C lowering, runtime header, cc driver).
  • Initial implementation of compiler3/emit/wasm/ (Wasm 3.0 + WasmGC emitter, browser + standalone targets).
  • Stencil generation tooling (tools/stencilgen/) that invokes Clang at build time and emits a generated Go file per ISA.
  • Linker driver (compiler3/link/) that invokes LLD by default with system linker fallback.
  • Object file readers/writers (compiler3/objfile/elf/, compiler3/objfile/macho/, compiler3/objfile/pe/) using Go's debug/elf, debug/macho, and a hand-rolled PE writer.
  • Cross-compilation support for the five phase-1 targets from any host, optionally via zig cc.
  • DWARF 5 line-table emission for native targets (phase 1); full DWARF + optional PDB in phase 2.
  • A mochi build UX that produces a single distributable binary, with --target, --portable (musl static-PIE), and --mode={dev,release,embedded} flags.
  • A mochi run path that selects copy-and-patch JIT for hot loops when available, falling back to vm3 interpreter when not.
  • Bench harness integration: every BG kernel runs under all three execution modes (interpreter, JIT, AOT) on every supported host, with cross-mode parity gates.

Out of scope (deferred to successor MEPs):

  • LLVM as a primary backend. Available as a phase-3 opt-in (compiler3/emit/llvmir/ emits .ll text, shells to llc); not required for any phase-1 or phase-2 deliverable.
  • MLIR. Reserved for an SIMT / GPU successor MEP.
  • libgccjit. Rejected outright: GPL contagion risk.
  • iOS / iPadOS / visionOS targets. Provisioning, App Store review, and MH_BUNDLE machinery deserve a dedicated mobile MEP.
  • GPU codegen (Metal AIR, CUDA PTX, ROCm, SPIR-V, WGSL). Separate MEP.
  • Tracing JIT. vm3jit is a method JIT; tracing is MEP-50+ territory.
  • IL2CPU / Bartok-style two-stage AOT through an intermediate C++ pass.
  • Profile-guided optimization. Phase 3+ once base AOT and JIT are stable.

Specification

§1 Architecture

The native code emission layer sits between compiler3 (typed IR producer) and the host toolchain (system cc, system linker, host kernel loader). Three emit packages share the typed IR:

                      compiler3/ir (MEP-40)
                               |
         +---------------------+---------------------+
         |                     |                     |
         v                     v                     v
   emit/copypatch            emit/c           emit/wasm (phase 2 AOT)
   (JIT, phase 1)        (AOT, phase 1)       emit/qbe  (AOT, phase 2)
         |                     |                     |
         v                     v                     v
     mmap exec           ELF/Mach-O/PE          .wasm module
    memcpy+patch         via system cc        via builtin emit
The boundary is the typed IR. Every emit package consumes the same IR shape, the same SSA value types, the same three-bank register convention, the same Cell ABI. No emit package may add IR ops or modify the type lattice; either the IR already expresses what the backend needs, or the IR change is a separate PR that lands first.

The four-bit arena tag and 12-bit generation encoding of MEP-40 plus the verifier rules of MEP-41 are load-bearing for every backend. Stencils may not mask, shift, or otherwise destructure the generation field; the backend treats it as opaque per MEP-41's Tag Confidentiality Enforcement analog. The C-as-target lowering wraps every handle deref in mochi_deref_T(handle) calls so the C compiler cannot inline gen-extraction; the copy-and-patch stencils never name a register holding a raw gen value.

§2 Copy-and-patch JIT (phase 1)

Hand-write one C stencil function for each vm3 opcode defined in runtime/vm3/op.go. Each stencil takes the vm3 frame and the operand registers, and returns the dispatch target for the next op. Compile each stencil with Clang -O2 -fno-asynchronous-unwind-tables -fno-stack-protector -mno-red-zone at Mochi build time. Extract the resulting machine code and relocations from the .text section; emit them as a generated Go file (compiler3/emit/copypatch/stencils_amd64.go, ..._arm64.go) containing a per-opcode struct: {bytes []byte, holes []Reloc}.

At Mochi runtime, the JIT walks the typed IR for a hot method, picks the stencil for each op, memcpy's the bytes into an mmap'd executable region, and patches the relocations (immediates, jump targets, runtime symbols) in place. The patched code is then jumped to via a Go-friendly entry trampoline that preserves Go's stack invariants. Code-cache management uses a simple bump allocator with a high-water mark; when the cache fills, the JIT falls back to vm3 interpretation for cold ops and recycles the cache on the next GC cycle.
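The copy and patch steps can be sketched in a few lines of Go. This is a minimal illustration, not the MEP's patcher: the names Stencil, Reloc, HoleKind, and Emit are hypothetical, the code writes into an ordinary byte slice rather than an mmap'd executable region, and only two hole widths are modeled.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// HoleKind is the width of a patched value (hypothetical: two kinds only).
type HoleKind int

const (
	HoleImm32 HoleKind = iota // 32-bit little-endian immediate
	HoleAbs64                 // 64-bit little-endian absolute address
)

// Reloc is one hole in a stencil.
type Reloc struct {
	Offset int      // byte offset of the hole inside Bytes
	Kind   HoleKind // width of the hole
	Slot   int      // index into the values supplied at patch time
}

// Stencil mirrors the per-opcode {bytes, holes} pair described above.
type Stencil struct {
	Bytes []byte
	Holes []Reloc
}

// Emit appends the stencil bytes to dst (the copy) and fills every hole
// from vals (the patch). It returns the extended buffer.
func Emit(dst []byte, s Stencil, vals []uint64) ([]byte, error) {
	base := len(dst)
	dst = append(dst, s.Bytes...)
	for _, h := range s.Holes {
		width := 4
		if h.Kind == HoleAbs64 {
			width = 8
		}
		if h.Slot < 0 || h.Slot >= len(vals) || h.Offset < 0 || h.Offset+width > len(s.Bytes) {
			return nil, fmt.Errorf("bad hole %+v", h)
		}
		at := base + h.Offset
		if h.Kind == HoleAbs64 {
			binary.LittleEndian.PutUint64(dst[at:], vals[h.Slot])
		} else {
			binary.LittleEndian.PutUint32(dst[at:], uint32(vals[h.Slot]))
		}
	}
	return dst, nil
}

func main() {
	// x86_64: "mov eax, imm32" (0xB8 + 4-byte immediate) followed by "ret" (0xC3).
	movRet := Stencil{
		Bytes: []byte{0xB8, 0, 0, 0, 0, 0xC3},
		Holes: []Reloc{{Offset: 1, Kind: HoleImm32, Slot: 0}},
	}
	code, err := Emit(nil, movRet, []uint64{42})
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", code) // b8 2a 00 00 00 c3
}
```

A real patcher would also handle PC-relative jump targets and runtime symbol addresses, which need the final load address of the code; the sketch leaves those out.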

Register convention (x86_64 SysV; mirrored on arm64 AAPCS64 with x19-x28):

  • R12: pointer to current Frame.
  • R13: pointer to typed-arena base table.
  • R14: pointer to per-VM context (PC stash, deopt sentinel slot).
  • R15: scratch.
  • RAX/RDI/RSI/RDX/RCX/R8/R9: Cell operand registers (caller-save, follow stencil ABI).

Reserved callee-saves match the MEP-40 three-bank register-file design. The JIT never spills R12-R14 because stencils assume them on entry; the only spill path is when a stencil's internal codegen needs more scratch than R15 provides, in which case the stencil spills to a Mochi-private scratch slab on the frame. Because stencils are compiled with -mno-red-zone, the red zone is not available as a spill area on any target.

W^X is enforced via the dual-mapping pattern: the code cache is mmap'd twice, once RW (for the patcher) and once RX (for the runtime jump), with the kernel guaranteeing the same physical pages. On Apple Silicon, pthread_jit_write_protect_np(0) toggles the per-thread write-permission bit during patching; the JIT thread holds the toggle for the patch window only. On Linux with PaX or grsec, the dual-mapping is required; on stock Linux, mprotect toggling is the fallback path.

PAC and BTI hardening on aarch64: every stencil entry carries a bti j instruction; every cross-stencil call uses blraa with the appropriate PAC modifier. The PAC key is per-Mochi-process, derived at startup from /dev/urandom and stored in a register only the patcher knows. This is the MEP-41 §8 JIT hardening checklist; copy-and-patch satisfies it without any per-stencil logic because Clang -O2 already emits PAC+BTI when targeted at arm64-apple-darwin.

Stencil set scope (phase 1):

  • All non-allocating vm3 opcodes: arithmetic (i64/f64), comparison, conditional jumps, register move, frame load/store, typed-array element load/store.
  • Inline allocation for short-lived Cells (small int, short string, bool).
  • Slow-path call into the vm3 runtime for: handle dereference miss, arena exhaustion, deopt sentinel, MEP-41 verifier rule check failure.
  • No branch fusion yet: phase 1 keeps every op as a separate stencil. Liftoff-style folding of chained conditional jumps within a single basic block into one stencil is phase 2.

Not in phase-1 scope (phase 2): cross-op register allocation, inline caching for first-class function dispatch, SIMD intrinsics, generational write-barrier elision via static analysis.

§3 C-as-target AOT (phase 1)

Lower compiler3 IR to C in compiler3/emit/c/. Strategy follows Nim's: one C function per Mochi function, one C struct per Mochi type, every basic block becomes a labeled statement, control flow via goto. Computed goto (GCC extension) is used for the interpreter tier within AOT'd code (for indirect dispatch on dynamic-typed values that escape the static-type discipline); standard switch is the portable fallback for MSVC.
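The "labeled statement per basic block, goto for control flow" shape can be illustrated with a small Go emitter. This is a sketch under stated assumptions: the Block type is a hypothetical stand-in for compiler3 IR, every value is an int64_t, and the statements are assumed to be already lowered to C expressions.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a hypothetical, heavily simplified basic block.
type Block struct {
	Label string
	Stmts []string // already-lowered C statements, without the trailing ';'
	Cond  string   // branch condition; empty means unconditional
	Then  string   // branch target (the only target when Cond is empty)
	Else  string   // target taken when Cond is false
	Ret   string   // if non-empty, the block returns this expression
}

// EmitFunc lowers one function to C: one labeled statement per block,
// with all control flow expressed as goto.
func EmitFunc(name, params string, locals []string, blocks []Block) string {
	var b strings.Builder
	fmt.Fprintf(&b, "int64_t %s(%s) {\n", name, params)
	for _, l := range locals {
		fmt.Fprintf(&b, "  int64_t %s;\n", l)
	}
	for _, blk := range blocks {
		// "label:;" keeps the label legal even if no statement follows it.
		fmt.Fprintf(&b, "%s:;\n", blk.Label)
		for _, s := range blk.Stmts {
			fmt.Fprintf(&b, "  %s;\n", s)
		}
		switch {
		case blk.Ret != "":
			fmt.Fprintf(&b, "  return %s;\n", blk.Ret)
		case blk.Cond != "":
			fmt.Fprintf(&b, "  if (%s) goto %s; else goto %s;\n", blk.Cond, blk.Then, blk.Else)
		default:
			fmt.Fprintf(&b, "  goto %s;\n", blk.Then)
		}
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	// sum(n) = 0 + 1 + ... + (n-1), written as four basic blocks.
	fmt.Print("#include <stdint.h>\n\n")
	fmt.Print(EmitFunc("sum", "int64_t n", []string{"acc", "i"}, []Block{
		{Label: "entry", Stmts: []string{"acc = 0", "i = 0"}, Then: "loop"},
		{Label: "loop", Cond: "i < n", Then: "body", Else: "done"},
		{Label: "body", Stmts: []string{"acc = acc + i", "i = i + 1"}, Then: "loop"},
		{Label: "done", Ret: "acc"},
	}))
}
```

Declaring all locals at the top of the function avoids jumping past an initialized declaration, which keeps the output simple for any C99 compiler.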

The Mochi C runtime header (runtime/c/mochi.h) declares:

  • mochi_Cell (uint64_t) and the inline NaN-boxing accessors.
  • mochi_arena_t and the typed-arena APIs from MEP-40.
  • mochi_handle_T(arena, gen, idx) constructors and mochi_deref_T(handle) accessors.
  • The verifier-checked operations from MEP-41 (mochi_try_deref_T, mochi_kill, etc.).
  • The slow-path callbacks the JIT and AOT'd code share.

The runtime header is C99-portable and depends only on <stdint.h>, <stdlib.h>, <string.h>. On glibc and musl it adds <sys/mman.h> for mmap (for the JIT path; AOT code does not mmap). On Windows it uses <windows.h> for VirtualAlloc. Every implementation file (runtime/c/mochi.c) is built into a static library libmochi.a (or mochi.lib on Windows) that the linker driver bundles into the final executable.

The mochi build driver:

  1. Parses + type-checks + lowers the program to compiler3 IR.
  2. Calls compiler3/emit/c/ to produce a temporary .c file (or files, if multi-module).
  3. Shells out to the user's C compiler: prefers zig cc if available (zero-config cross compilation), falls back to cc, falls back to clang, falls back to gcc.
  4. Compiles the .c files and links them with libmochi.a into a single executable using the chosen linker (LLD by default, system ld fallback).
  5. Strips debug info in release mode; preserves DWARF in dev mode; emits the embedded-mode subset in embedded mode.

The C compiler choice is documented but not enforced. mochi build --cc=zig selects zig cc explicitly; mochi build --cc=tcc selects TCC (useful for sub-second build times on small programs); mochi build --cc=clang -- -fsanitize=address passes through C compiler flags. The default is mochi build with no --cc flag, which picks zig cc if installed, else cc.
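The default probe order (zig cc, then cc, then clang, then gcc) could be implemented along these lines. The function and variable names here (PickCC, ErrNoCC, candidates) are hypothetical, and the lookup function is injected so the logic can be tested without depending on the real PATH.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// ErrNoCC is returned when none of the candidate compilers is installed.
var ErrNoCC = errors.New("no C compiler found; tried zig cc, cc, clang, gcc")

// candidates is the probe order from the driver description above. Each
// entry is an argv prefix; the first element is the executable to look up.
var candidates = [][]string{{"zig", "cc"}, {"cc"}, {"clang"}, {"gcc"}}

// PickCC returns the argv prefix of the first candidate whose executable
// lookPath can resolve.
func PickCC(lookPath func(string) (string, error)) ([]string, error) {
	for _, c := range candidates {
		if _, err := lookPath(c[0]); err == nil {
			return c, nil
		}
	}
	return nil, ErrNoCC
}

func main() {
	cc, err := PickCC(exec.LookPath)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("using:", cc)
}
```

An explicit --cc flag would bypass this probe entirely; the sketch covers only the no-flag default.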

Cross compilation via zig cc:

mochi build --target=aarch64-linux-musl hello.mochi
mochi build --target=x86_64-windows-gnu hello.mochi
mochi build --target=wasm32-wasi hello.mochi

Each --target lookup maps to a zig cc -target triple. The triple list ships in compiler3/emit/c/triples.go; users can extend it via a mochi.toml config.
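A triple table of this kind might look like the following sketch. The table contents beyond the three targets shown above, the identity mapping from --target to triple, and the ZigCCArgs helper are assumptions for illustration; only the zig cc -target flag itself is real.

```go
package main

import "fmt"

// triples maps a mochi --target value to the triple passed to zig cc.
// Assumption: the mapping is the identity for the three targets shown above.
var triples = map[string]string{
	"aarch64-linux-musl": "aarch64-linux-musl",
	"x86_64-windows-gnu": "x86_64-windows-gnu",
	"wasm32-wasi":        "wasm32-wasi",
}

// ZigCCArgs builds the compiler command line for one C source file.
func ZigCCArgs(target, src, out string) ([]string, error) {
	t, ok := triples[target]
	if !ok {
		return nil, fmt.Errorf("unknown target %q", target)
	}
	return []string{"zig", "cc", "-target", t, "-o", out, src}, nil
}

func main() {
	args, err := ZigCCArgs("aarch64-linux-musl", "hello.c", "hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(args)
}
```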

§4 Wasm emit (phase 1 minimal, phase 2 AOT)

Phase 1 ships a minimal Wasm 3.0 + WasmGC emitter in compiler3/emit/wasm/ that handles the BG kernel subset (arithmetic, control flow, typed arrays, simple structs). The output module imports a small Mochi-Wasm host shim (runtime/wasm/host.js for browser, runtime/wasm/host.wat for Wasmtime/standalone) that provides the slow-path callbacks the JIT and AOT both need.

Handle Cell mapping: 64-bit Mochi Cell becomes a Wasm i64 for the inline-encoded variants (small int, float, bool, null) and a (ref $mochi_handle) GC reference for handle variants. Typed arenas become WasmGC (struct ...) types per arena, instantiated lazily. The four-bit arena tag is the WasmGC type index; the 12-bit generation is a struct field; the 32-bit slab index is the struct array index.
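The three field widths (4-bit arena tag, 12-bit generation, 32-bit slab index) add up to a 48-bit payload. A runtime-side pack/unpack pair could be sketched as below. The bit positions are an assumption of this sketch (index in the low bits, then generation, then tag); MEP-40 defines the real layout, and per §1 only the runtime, never a stencil, would be allowed to take a handle apart like this.

```go
package main

import "fmt"

const (
	idxBits = 32 // slab index width
	genBits = 12 // generation width
	tagBits = 4  // arena tag width

	genShift = idxBits           // assumed position of the generation field
	tagShift = idxBits + genBits // assumed position of the arena tag
)

// Pack builds a 48-bit handle payload from its three fields.
func Pack(tag uint8, gen uint16, idx uint32) (uint64, error) {
	if tag >= 1<<tagBits {
		return 0, fmt.Errorf("arena tag %d does not fit in %d bits", tag, tagBits)
	}
	if gen >= 1<<genBits {
		return 0, fmt.Errorf("generation %d does not fit in %d bits", gen, genBits)
	}
	return uint64(tag)<<tagShift | uint64(gen)<<genShift | uint64(idx), nil
}

// Unpack splits a payload produced by Pack back into its fields.
func Unpack(h uint64) (tag uint8, gen uint16, idx uint32) {
	tag = uint8(h >> tagShift & (1<<tagBits - 1))
	gen = uint16(h >> genShift & (1<<genBits - 1))
	idx = uint32(h)
	return tag, gen, idx
}

func main() {
	h, err := Pack(3, 0x7FF, 10)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#x\n", h) // 0x37ff0000000a
	fmt.Println(Unpack(h))
}
```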

Phase 2 promotes the Wasm emitter to full AOT through Wasmtime's wasmtime compile (~/notes/Spec/5500/backends/12_wasmtime_aot.md): Mochi emits .wasm, wasmtime compile lowers to native .cwasm, the Mochi loader maps the .cwasm directly. This gives Mochi a universal IR (Wasm) and reaches every Cranelift-supported target transitively.

Browser DWARF: Wasm modules carry DWARF in custom sections (./custom("name").data) per the Chrome C/C++ DevTools Support extension. Phase 1 emits line tables only; phase 2 adds full type and variable info.
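For reference, a Wasm custom section is encoded as section id 0, the byte length of the contents in unsigned LEB128, a length-prefixed UTF-8 name, and then the raw payload. The sketch below encodes one such section; the section name .debug_line follows the usual DWARF-in-Wasm convention, and the two payload bytes are placeholders, not real line-table data.

```go
package main

import "fmt"

// uleb128 appends v in unsigned LEB128, the variable-length integer
// encoding used by the Wasm binary format.
func uleb128(dst []byte, v uint32) []byte {
	for {
		b := byte(v & 0x7f)
		v >>= 7
		if v == 0 {
			return append(dst, b)
		}
		dst = append(dst, b|0x80) // more bytes follow
	}
}

// CustomSection encodes one custom section: id 0, content size, then the
// length-prefixed name followed by the payload.
func CustomSection(name string, payload []byte) []byte {
	body := uleb128(nil, uint32(len(name)))
	body = append(body, name...)
	body = append(body, payload...)

	out := []byte{0x00} // section id 0 = custom
	out = uleb128(out, uint32(len(body)))
	return append(out, body...)
}

func main() {
	sec := CustomSection(".debug_line", []byte{0xde, 0xad}) // placeholder payload
	fmt.Printf("% x\n", sec)
}
```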

§5 Linker strategy

Phase 1: LLD by default, system linker fallback.

  • Linux: ld.lld (default), ld.bfd or ld.gold (fallback).
  • macOS: system ld (which is ld_prime since Xcode 15, default), ld.lld (fallback for cross builds).
  • Windows: lld-link (default), system link.exe (fallback if MSVC is installed).
  • Wasm: wasm-ld (LLVM).
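The default-then-fallback table above translates directly into a lookup. In this sketch the type and function names (LinkerChoice, PickLinker) and the target keys are hypothetical; the linker names are the ones listed above.

```go
package main

import (
	"fmt"
	"os/exec"
)

// LinkerChoice is the default linker for a target plus ordered fallbacks.
type LinkerChoice struct {
	Default   string
	Fallbacks []string
}

// linkers encodes the phase-1 table from this section.
var linkers = map[string]LinkerChoice{
	"linux":   {Default: "ld.lld", Fallbacks: []string{"ld.bfd", "ld.gold"}},
	"macos":   {Default: "ld", Fallbacks: []string{"ld.lld"}},
	"windows": {Default: "lld-link", Fallbacks: []string{"link.exe"}},
	"wasm":    {Default: "wasm-ld"},
}

// PickLinker returns the first linker for target that available reports
// as usable, trying the default before the fallbacks.
func PickLinker(target string, available func(string) bool) (string, error) {
	choice, ok := linkers[target]
	if !ok {
		return "", fmt.Errorf("no linker table entry for %q", target)
	}
	for _, l := range append([]string{choice.Default}, choice.Fallbacks...) {
		if available(l) {
			return l, nil
		}
	}
	return "", fmt.Errorf("no usable linker for %q", target)
}

func main() {
	onPath := func(name string) bool {
		_, err := exec.LookPath(name)
		return err == nil
	}
	l, err := PickLinker("linux", onPath)
	fmt.Println(l, err)
}
```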

Bundle LLD inside the Mochi distribution under its Apache License 2.0 with LLVM Exceptions. The bundled LLD is a single ~25 MB binary covering all four formats (ELF, Mach-O, PE, Wasm). The total Mochi binary size impact is acceptable for desktop installs; mochi build --no-bundle-lld is the opt-out for users on restricted disks.

Phase 2: self-hosted writers for ELF, Mach-O, and PE in compiler3/objfile/. Pattern follows Go's cmd/link: the compiler emits the final image directly without an external linker subprocess on the common path. LLD remains the fallback for the "I need to link against a C library that ships as .a" case. This halves cold-start build time (no fork+exec of the linker) and lets Mochi tune the output for compiler3-specific metadata sections (typed-arena debug info, MEP-40 vm3 metadata, MEP-41 verifier-proof manifests).

§6 Runtime / libc strategy

Phase 1:

  • Linux: glibc dynamic-linked default. Document the glibc-2.31 minimum (February 2020, covers Ubuntu 20.04+, RHEL 9+, Debian 11+). mochi build --portable switches to musl 1.2.6 static-PIE for "drop the binary on any Linux from 2015+" mode.
  • macOS: libSystem dynamic-linked (only supported option per Apple; no static libc on macOS).
  • Windows: ntdll + ucrtbase dynamic-linked. Document the Windows 10 1809+ minimum.
  • Wasm: WASI Preview 2 imports (component-model interfaces) for standalone; browser DOM imports for browser builds.

Phase 2:

  • Push musl static-PIE to be the new-project default on Linux; glibc remains supported.
  • Add Cosmopolitan APE target (mochi build --target=ape). One binary, runs on Linux, macOS, Windows, BSDs (~/notes/Spec/5500/runtime/03_cosmopolitan_libc.md).
  • Optional --no-libc freestanding mode for embedded / unikernel users (~/notes/Spec/5500/runtime/05_no_libc_freestanding.md). Direct syscalls on Linux; vendor-specific entrypoints elsewhere.

§7 Debug info strategy

Phase 1:

  • Linux, macOS, Wasm: DWARF 5 line tables only (~/notes/Spec/5500/debug/01_dwarf_5.md). Sufficient for gdb / lldb stack traces and source-line attribution in profilers (perf, Instruments, Chrome DevTools).
  • Windows: skip PDB for phase 1; tell users gdb and lldb work, WinDbg does not.

Phase 2:

  • Full DWARF 5 with type info and variable info on all native targets.
  • Optional CodeView / PDB on Windows (~/notes/Spec/5500/debug/02_codeview_pdb.md) for WinDbg + Visual Studio.
  • Source maps in Wasm custom sections (~/notes/Spec/5500/debug/03_source_maps_wasm.md) for Chrome DevTools.
  • compiler3-aware DWARF extensions: arena tag, generation, and bank index appear as DW_TAG_variable attributes on Mochi-typed values.

§8 Object format strategy

Phase 1:

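  • ELF: emit via a thin Mochi writer that reuses the header, section, and relocation definitions from Go's debug/elf (the package itself only reads ELF) and handles the few cmd/link-style cases those definitions do not cover.
  • Mach-O: emit via a Mochi writer that reuses the definitions in Go's debug/macho (also a read-only package), plus a wrapper for the load-command extras (LC_DYLD_INFO, LC_CODE_SIGNATURE).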
  • PE/COFF: hand-rolled writer in compiler3/objfile/pe/ (Go's debug/pe is read-only). Mandatory .pdata + .xdata for x86_64 Windows, ARM64 unwind bytecode for aarch64 Windows.
  • Wasm: hand-rolled writer in compiler3/emit/wasm/ (no debug/wasm in stdlib).
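As one concrete piece of the PE work, the x86_64 .pdata section is an array of 12-byte RUNTIME_FUNCTION entries (begin RVA, end RVA, unwind-info RVA, each a little-endian 32-bit value) sorted by begin address. A sketch of the encoder follows; the Go type and function names are hypothetical and the RVAs in main are made-up example values.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// RuntimeFunction is one x86_64 .pdata entry. All fields are RVAs
// (offsets from the image base).
type RuntimeFunction struct {
	BeginRVA      uint32 // first byte of the function
	EndRVA        uint32 // one past the last byte of the function
	UnwindInfoRVA uint32 // location of the UNWIND_INFO record in .xdata
}

// EncodePdata serializes the entries sorted by BeginRVA, as the Windows
// unwinder expects for its binary search. The input slice is not modified.
func EncodePdata(fns []RuntimeFunction) []byte {
	sorted := append([]RuntimeFunction(nil), fns...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].BeginRVA < sorted[j].BeginRVA })

	out := make([]byte, 12*len(sorted))
	for i, f := range sorted {
		binary.LittleEndian.PutUint32(out[12*i:], f.BeginRVA)
		binary.LittleEndian.PutUint32(out[12*i+4:], f.EndRVA)
		binary.LittleEndian.PutUint32(out[12*i+8:], f.UnwindInfoRVA)
	}
	return out
}

func main() {
	pdata := EncodePdata([]RuntimeFunction{
		{BeginRVA: 0x2000, EndRVA: 0x2040, UnwindInfoRVA: 0x5010},
		{BeginRVA: 0x1000, EndRVA: 0x1020, UnwindInfoRVA: 0x5000},
	})
	fmt.Printf("% x\n", pdata)
}
```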

The format reference files (~/notes/Spec/5500/formats/01_elf.md through 05_ape_cosmopolitan.md) catalog every section, header, and load command Mochi emits.

§9 Phase 1 target matrix

Five host combinations are "must-have" for phase 1 (~/notes/Spec/5500/targets/00_targets_summary.md):

Target ISA | OS | Format | ABI | Status | Complexity
x86_64 | Linux | ELF | SysV | must-have | 2
aarch64 | Linux | ELF | AAPCS64 | must-have | 2
aarch64 | macOS (Apple Silicon) | Mach-O | Apple ABI | must-have | 3
x86_64 | macOS | Mach-O | SysV | must-have | 3 (freebie with Universal 2)
wasm32 | browser + WASI | .wasm | Wasm 3.0 + GC | must-have | 2

Estimated phase-1 effort: ~3-4 engineer-months for first working backends across all five targets, assuming Mochi reuses Go's debug/elf + debug/macho and writes a Wasm emitter from scratch.

§10 Phase 2 target matrix

Four additional combinations promoted from "should-have" or "could-have":

Target ISA | OS | Format | ABI | Status | Complexity
x86_64 | Windows | PE/COFF | MS ABI | should-have | 4
aarch64 | Windows | PE/COFF | MS ARM64 ABI | could-have | 4
riscv64 | Linux | ELF | LP64D (RVA22/RVA23) | could-have | 3
polyglot | all six | APE | n/a (post-link) | could-have | 3 (cosmocc dep)

Estimated phase-2 effort: ~3 engineer-months for Windows targets (.pdata / .xdata / IAT machinery), 1 engineer-month for riscv64, 2 weeks for the APE bundler.

§11 Out-of-scope targets

Per ~/notes/Spec/5500/targets/00_targets_summary.md §5:

  • iOS / iPadOS / visionOS / tvOS / watchOS: dedicated mobile MEP.
  • ppc64le, s390x, loongarch64: real but small user bases. Add on demand.
  • MIPS: sunset.
  • GPU compute (Metal AIR, CUDA PTX, ROCm, SPIR-V, WGSL): separate SIMT MEP.
  • ARM64EC: deferred until Mochi has a story for x64 plugin interop on Windows.
  • macOS x86_64 once Apple removes Rosetta: handle when it happens (likely 2027+).

Phased plan

Each phase ships as one PR or a small named set of PRs, gated by the criterion in the right column. No phase ships until its gate is green. The MEP file is updated with measured results at each phase boundary (MEP-37 / MEP-38 / MEP-39 / MEP-40 discipline).

Phase | Deliverable | Gate
0 | Spec freeze, taxonomy lock, sidebar entry | This MEP merged to main, sidebar updated, meps.json entry present
1 | compiler3/emit/copypatch/ skeleton + stencilgen tool, x86_64 Linux only | mochi run --jit=copypatch runs hello-world, matches interpreter output bit-for-bit
2 | Copy-and-patch JIT covers all non-allocating vm3 ops on x86_64 Linux + aarch64 Linux | BG kernels (binary_trees, n_body) run under copy-and-patch JIT, within 5x of Go
3 | Apple Silicon support: Mach-O, JIT entitlement, ad-hoc signing | mochi run --jit=copypatch on aarch64 macOS within 5x of Go on BG kernels
4 | compiler3/emit/c/ skeleton + linker driver, x86_64 Linux only | mochi build hello.mochi produces a native ELF, runs and prints expected output
5 | C-as-target AOT covers all four phase-1 native targets via system cc + LLD | mochi build --target=<each> from x86_64 Linux host produces working binary on each
6 | Wasm 3.0 + WasmGC emitter, BG kernel subset | mochi build --target=wasm32-wasi hello.mochi runs under Wasmtime, matches interpreter output
7 | DWARF 5 line tables on all four native targets, gdb/lldb backtrace test | gdb shows correct source line on segfault in test program
8 | Phase 2: Windows x86_64 + aarch64, riscv64 Linux, APE bundler | mochi build --target=<each> works from cross-compile host; APE binary runs on Linux+macOS+Windows
9 | Phase 2: Wasm AOT via wasmtime compile, QBE backend for small static binaries | mochi build --emit=wasm-aot produces .cwasm, mochi build --emit=qbe produces sub-1MB stripped ELF

The phase numbers do not match a calendar; they match a dependency order. Phase 1-3 are the JIT track; Phase 4-7 are the AOT track; Phase 8-9 are the cross-platform expansion. The two tracks can run in parallel after Phase 1.

§11 Risks

11.1 Clang ABI drift breaks stencils

Stencil output depends on Clang's code generation for each version. A Clang upgrade can change calling-convention details, register allocator decisions, or relocation kinds in ways that the runtime patcher does not expect. Mitigation: pin a Clang version in CI (tools/stencilgen/CLANG_VERSION); differential-test every stencil set against the vm3 interpreter on every PR that bumps the Clang version.
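The differential test amounts to running the same inputs through a reference implementation and a candidate and comparing results bit-for-bit. A minimal sketch, with a hypothetical Kernel type standing in for "run this program under the interpreter" and "run it under the JIT":

```go
package main

import (
	"fmt"
	"math"
)

// Kernel is a hypothetical stand-in for one execution mode of a program.
type Kernel func(x float64) float64

// Diff returns the inputs on which ref and cand disagree. Results are
// compared by bit pattern, so 0.0 vs -0.0 and differing NaN payloads
// count as mismatches.
func Diff(ref, cand Kernel, inputs []float64) []float64 {
	var bad []float64
	for _, in := range inputs {
		if math.Float64bits(ref(in)) != math.Float64bits(cand(in)) {
			bad = append(bad, in)
		}
	}
	return bad
}

func main() {
	interp := func(x float64) float64 { return x*x + 1 }
	// A deliberately broken "JIT" that drops the +1 for negative inputs.
	jit := func(x float64) float64 {
		if x < 0 {
			return x * x
		}
		return x*x + 1
	}
	fmt.Println(Diff(interp, jit, []float64{0, 1.5, -2})) // [-2]
}
```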

11.2 macOS arm64 JIT entitlement

Apple requires every JIT process to ship with a signed binary carrying com.apple.security.cs.allow-jit or com.apple.security.cs.allow-unsigned-executable-memory (the former is preferred). Without it, mmap(PROT_EXEC) fails with EPERM. Mitigation: ship signed Mochi releases with the entitlement plist; document the codesign --entitlements jit.plist flow for users who build Mochi from source on Apple Silicon.

11.3 Code-cache memory pressure

The copy-and-patch code cache is mmap'd at process start and grows as more methods JIT. A long-running REPL or server hits the cap eventually. Mitigation: configurable cap (MOCHI_JIT_CACHE_MB, default 64 MB); LRU eviction when the cap is reached; fallback to vm3 interpretation for evicted methods. The eviction policy is bench-tuned in Phase 3.
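The cap-plus-LRU policy can be modeled with a small accounting structure. This sketch tracks only method names and code sizes; it does not manage real executable memory, and the names (CodeCache, Insert, Touch) are hypothetical. A method that Touch reports as absent would run under the vm3 interpreter.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	method string
	size   int
}

// CodeCache models a byte-capped JIT code cache with LRU eviction.
type CodeCache struct {
	capBytes int
	used     int
	order    *list.List               // front = most recently used
	index    map[string]*list.Element // method name -> list element
}

func NewCodeCache(capBytes int) *CodeCache {
	return &CodeCache{capBytes: capBytes, order: list.New(), index: map[string]*list.Element{}}
}

// Insert records size bytes of code for method, evicting least recently
// used methods until it fits. It returns the evicted method names, and
// ok=false if the code can never fit under the cap.
func (c *CodeCache) Insert(method string, size int) (evicted []string, ok bool) {
	if size > c.capBytes {
		return nil, false
	}
	if el, found := c.index[method]; found { // re-JIT: drop the old copy first
		c.used -= el.Value.(entry).size
		c.order.Remove(el)
		delete(c.index, method)
	}
	for c.used+size > c.capBytes {
		back := c.order.Back()
		e := back.Value.(entry)
		c.order.Remove(back)
		delete(c.index, e.method)
		c.used -= e.size
		evicted = append(evicted, e.method)
	}
	c.index[method] = c.order.PushFront(entry{method, size})
	c.used += size
	return evicted, true
}

// Touch marks method as just executed. It returns false if the method is
// not cached, in which case the caller falls back to interpretation.
func (c *CodeCache) Touch(method string) bool {
	el, ok := c.index[method]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

func main() {
	c := NewCodeCache(100)
	c.Insert("a", 40)
	c.Insert("b", 40)
	c.Touch("a")
	evicted, _ := c.Insert("c", 40)
	fmt.Println(evicted) // [b]
}
```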

11.4 C compiler not installed

mochi build assumes the user has a C compiler. On Ubuntu/Debian this is apt install build-essential; on macOS this is xcode-select --install; on Windows this is the Visual Studio Build Tools download. Mitigation: ship mochi doctor subcommand that detects missing toolchain pieces and prints the install command for each OS; document the zig cc path as the universal fallback ("install Zig, get a C compiler for free").
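The core of such a doctor check is small: probe for any usable compiler and, if none is found, print the install command for the host OS. In this sketch the Doctor function, the message wording, and the GOOS keys are hypothetical; the install commands are the ones named above.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
)

// installHints maps a Go GOOS value to the install step named above.
var installHints = map[string]string{
	"linux":   "apt install build-essential",
	"darwin":  "xcode-select --install",
	"windows": "download the Visual Studio Build Tools",
}

// Doctor returns diagnostic lines, or nil when a C compiler is present.
// have reports whether a tool is installed; it is injectable for testing.
func Doctor(goos string, have func(tool string) bool) []string {
	for _, tool := range []string{"zig", "cc", "clang", "gcc"} {
		if have(tool) {
			return nil
		}
	}
	hint, ok := installHints[goos]
	if !ok {
		hint = "install a C compiler for your platform"
	}
	return []string{
		"no C compiler found",
		"fix: " + hint,
		"or: install Zig and use zig cc",
	}
}

func main() {
	onPath := func(tool string) bool {
		_, err := exec.LookPath(tool)
		return err == nil
	}
	for _, line := range Doctor(runtime.GOOS, onPath) {
		fmt.Println(line)
	}
}
```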

11.5 Wasm size

A Mochi Wasm module carries the Mochi runtime (handle ops, arena allocator, slow-path callbacks) in addition to the user program. Initial size estimate: 200-400 KB compressed for hello-world, dominated by the runtime. Mitigation: tree-shake the runtime via the same closed-world discipline AOT'd C uses; phase-2 work in ~/notes/Spec/5500/backends/12_wasmtime_aot.md measures real sizes and sets a target.

11.6 Windows ABI complexity

The x86_64 Windows ABI differs from SysV in calling convention (four integer argument registers, RCX/RDX/R8/R9, vs SysV's six, RDI/RSI/RDX/RCX/R8/R9), shadow space (32 bytes mandatory), and unwind info (.pdata + .xdata are not optional; they are required for any function over a trivial size). The aarch64 Windows ABI adds its own unwind bytecode encoding (xdata blocks are a per-function bytecode program, not just metadata). Mitigation: phase 2 budgets 4 engineer-weeks for Windows alone; gate on real binaries running under Windows Defender's exception handler before declaring done.
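The calling-convention difference is easiest to see as data. The sketch below records the integer argument registers and shadow space for both ABIs and computes where the i-th integer argument lives at the call instruction (before the return address is pushed). The Go names are hypothetical; floating-point and aggregate arguments are ignored.

```go
package main

import "fmt"

// ABI captures the integer-argument part of a calling convention.
type ABI struct {
	Name        string
	IntArgRegs  []string // registers for the first integer arguments, in order
	ShadowBytes int      // caller-reserved space below the stack arguments
}

var (
	SysV  = ABI{Name: "SysV", IntArgRegs: []string{"RDI", "RSI", "RDX", "RCX", "R8", "R9"}}
	Win64 = ABI{Name: "Win64", IntArgRegs: []string{"RCX", "RDX", "R8", "R9"}, ShadowBytes: 32}
)

// IntArgLocation reports where the i-th (0-based) integer argument is
// passed: a register name, or a stack slot relative to RSP at the call.
func IntArgLocation(a ABI, i int) string {
	if i < len(a.IntArgRegs) {
		return a.IntArgRegs[i]
	}
	off := a.ShadowBytes + 8*(i-len(a.IntArgRegs))
	return fmt.Sprintf("[RSP+%d]", off)
}

func main() {
	for i := 0; i < 7; i++ {
		fmt.Printf("arg %d: SysV %-8s Win64 %s\n", i, IntArgLocation(SysV, i), IntArgLocation(Win64, i))
	}
}
```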

11.7 Single-file deployment expectations

Users coming from Go expect mochi build to produce a single binary with no external dependencies. Phase 1 with glibc dynamic linking violates this expectation on Linux. Mitigation: make mochi build --portable (musl static-PIE) prominently documented, and make mochi build --bundle (single-file with embedded interpreter for any dyn-typed escape) a phase-2 deliverable. The default mochi build produces a normal dynamically-linked binary; the user opts in to portability.

11.8 Cross-compilation testing

Cross compiling from a single host (e.g., a macOS CI runner) to all five phase-1 targets requires CI to actually execute the cross-compiled binaries on each target. Mitigation: GitHub Actions matrix (linux/amd64, linux/arm64, macos/arm64, macos/amd64, browser via Playwright headless) runs the same BG kernel suite under each binary; cross-compile output is byte-for-byte deterministic across hosts (Reproducible Builds Project compatibility) so the cross-host build is verifiable.

11.9 Backend bus factor

Copy-and-patch is a niche technique. CPython 3.13 made it production-validated, but the institutional knowledge is in two write-ups (Xu+Kjolstad OOPSLA 2021, Bucher CPython PEP 744) and three reference implementations (CPython, the original Tiramisu-stencil work, and JSC's WTF). Mitigation: budget time for two Mochi contributors to read the substrate (~/notes/Spec/5500/naive/00_naive_summary.md reading order), pair-program the first stencil set, and document the stencilgen tool thoroughly. The reading-list discipline matches MEP-40's substrate work.

11.10 C-as-target produces "wrong-feeling" stack traces

Crash dumps from AOT'd C code show C-level stack frames (mochi_op_add_i64_at_0x14), not Mochi-level frames. This is the same UX hit Crystal and Nim took. Mitigation: phase-2 DWARF work emits DW_AT_artificial on synthetic C frames and DW_AT_name carrying the Mochi-source name; gdb and lldb both honor this. Stack traces from mochi build --mode=dev show Mochi names; release mode keeps the C names for smaller debug-info size.

§12 References

The full research substrate lives in ~/notes/Spec/5500/ (73 deep-dive files plus six summaries). Each file carries a §1 Provenance section with canonical URLs. The most load-bearing citations for this MEP are:

Code generation backends

Copy-and-patch and naive emission

AOT case studies

Target ABIs and formats

Linkers, runtime, debug

Recent papers

  • ~/notes/Spec/5500/papers/01_pldi_2024_2025_codegen.md
  • ~/notes/Spec/5500/papers/02_popl_2024_2025_compiler.md
  • ~/notes/Spec/5500/papers/03_mlir_dialects_2026.md
  • ~/notes/Spec/5500/papers/04_cranelift_design.md
  • ~/notes/Spec/5500/papers/06_compile_time_vs_runtime_tradeoff.md

Mochi cross-references

  • MEP-23 (Compile-time budget): provides the compile-time targets each backend must meet.
  • MEP-40 (vm3 + compiler3): provides the typed IR, handle Cell, typed arenas, three-bank register file.
  • MEP-41 (Memory Safety): provides the verifier rules, generation-as-secret hygiene, W^X + PAC/BTI hardening checklist for the JIT code page.

§13 Workflow note (for implementers)

The MEP-39 standing rule applies: every win must be a generic backend improvement, not a single-purpose super-op. Stencils are generic by construction (one per opcode, not per program pattern); C-as-target is generic (the same lowering for every Mochi program); the Wasm emitter is generic (same module shape regardless of program).

Every phase deliverable is one PR (or a small named set of PRs) gated by the named criterion. No phase ships until its gate is green. The MEP file is updated with measured results at each phase boundary.

The MEP and the code ship in the same PR. A backend change without a corresponding spec update is rejected by review; a spec change without test coverage in the same PR is rejected by review. This is the MEP-spec-in-sync rule.

The two-track structure (JIT phases 1-3, AOT phases 4-7) means contributors can take either track independently after Phase 1. Phase 8-9 are the cross-platform expansion and require both tracks at parity on the existing four targets before adding the fifth (Windows) or sixth (riscv64).

No phase introduces cgo on the Mochi build host. The shipping Mochi binary stays pure-Go-no-cgo. Clang is a build-time dependency of stencilgen, not a runtime dependency. The user's cc is a build-time dependency of mochi build, not a runtime dependency. This is the same identity rule that MEP-40 vm3 preserves.

The five-research-substrate discipline is intentional: every architecture decision in §1-§10 points to a specific file in ~/notes/Spec/5500/. A reviewer who disagrees with a choice should be able to find the substrate file, read the alternatives, and either propose a different file or argue the substrate is wrong. This is the same provenance discipline MEP-41 uses with ~/notes/Spec/5400/.

The public statement from MEP-41 §10.8 ("Mochi is designed to enable signatories of the CISA Secure-by-Design Pledge to use it as part of their memory-safety roadmap") extends to MEP-42 by virtue of the W^X + PAC/BTI + Spectre-index-masking hardening checklist in §2 above. Phase 2-3 of MEP-42 satisfies the JIT-hardening clauses of MEP-41's public statement; the statement should be updated in the same PR that closes MEP-42 Phase 3.