MEP 45. Mochi-to-C transpiler: ahead-of-time native binaries via C as target language
| Field | Value |
|---|---|
| MEP | 45 |
| Title | Mochi-to-C transpiler |
| Author | Mochi core |
| Status | Draft |
| Type | Standards Track |
| Created | 2026-05-22 17:55 (GMT+7) |
| Depends | MEP-4 (Type System), MEP-5 (Type Inference), MEP-13 (ADTs and Match), MEP-41 (Memory Safety) |
| Research | ~/notes/Spec/0045/01..12 |
| Tracking | /docs/implementation/0045/ |
Abstract
Mochi today ships vm3 (mochi run) and a Go-embedded build path (mochi build) that bundles the existing runtime into a Go binary. Neither produces a portable single-file native artefact, and neither reaches architectures outside Go's host matrix. MEP-45 specifies a standalone ahead-of-time pipeline that closes both gaps from first principles: read Mochi source, type-check via the existing frontend (parser + type checker), monomorphise, lower to a small purpose-built IR (aotir), emit ISO C23 source plus a call into libmochi.a, drive a vendored cross-cc, and ship a statically linked native binary. The implementation lives in a fresh tree transpiler3/c/ and reuses only the parser and type checker; no code, IR, or runtime piece is shared with prior backends.
The pipeline owns every stage past the type checker. The build driver vendors zig cc (zig 0.16 series) as the cross-compiler so every tier-1 triple is buildable from any host: x86_64-linux-{gnu,musl}, aarch64-linux-{gnu,musl}, aarch64-darwin, x86_64-darwin, x86_64-windows-msvc (via clang-cl), x86_64-windows-gnu (via zig cc + mingw), and wasm32-wasi. The master correctness gate is byte-equal stdout from the produced binary versus vm3 on the entire fixture corpus, across every tier-1 target (via qemu-user-static for cross-arch on Linux CI). vm3 serves as the recording oracle for expect.txt goldens; the transpiler does not link against, embed, or otherwise depend on vm3's implementation.
Three load-bearing decisions:
- Full monomorphisation, not selective boxing. Every concrete instantiation gets a fresh C struct and a fresh set of helpers. Polymorphic recursion is forbidden at type-check time (Mochi already rejects it), so the instantiation set is finite. The boxed-value mochi_value type exists only at FFI boundaries and for the dynamic-dispatch corner of the query DSL, never on hot user-code paths.
- BDWGC default for v1, with the runtime layer kept GC-agnostic. MMTk (Rust-based, ~3 MB), Perceus reference counting (Koka), Bacon-Rajan cycle collection, and Immix variants are all viable; BDWGC wins v1 on integration cost, platform coverage (every tier-1 triple, including WASI with patches), and proven scale (Crystal, Vala, multiple Schemes). The runtime headers segregate GC operations behind mochi_gc_* thunks so a v2 swap to a precise GC does not touch generated code. WASM target disables BDWGC and uses a precise allocator with a shadow stack.
- setjmp/longjmp for try/catch, not zero-cost table-driven unwind. Itanium-style table unwind is faster on the happy path (zero runtime cost when no exception fires) but requires per-architecture frame metadata that we would have to emit or rely on libgcc_s/libunwind for. setjmp/longjmp works on every libc with no extra emit work, costs a register-save on try entry, and matches the Erlang-VM and the early-OCaml strategies. v2 may swap to libunwind once the cross-arch story is proven on the happy path.
The gate for each delivery phase is empirical: every Mochi source file in tests/vm/valid/, examples/v0.3/, examples/v0.4/, and examples/v0.5/ must compile via the AOT pipeline and produce stdout that diffs clean against the expect.txt recorded by vm3. Sanitiser-clean (ASan, UBSan, TSan, MSan, LSan) is the secondary gate, run nightly. Reproducible-bytes-across-two-hosts is the tertiary gate, run per release.
Motivation
Mochi today ships vm3 (mochi run) and a Go-embedded build path (mochi build) that wraps vm3 into a Go binary. Neither produces a distributable single-file native artefact. The Go-embedded variant requires the Go runtime, weighs in the tens of megabytes, and can only target architectures the Go toolchain reaches.
A from-first-principles AOT pipeline that emits C resolves three concrete user-facing limitations:
- Distribution shape. Shipping a Mochi program today means shipping vm3 (or the Go-embedded wrapper). The AOT binary is one statically linked native ELF/Mach-O/PE per triple, with no runtime dependency beyond libc, ranging from ~3 MB hello-world (release/strip) up to ~12 MB for a program that pulls in the full runtime (HTTP, LLM, FFI).
- Architecture surface. Every Linux distro on every architecture, every modern macOS, every Windows host, every WASI runtime, every BSD, eventually Android and iOS. zig cc cross-compiles to all of these from any host. The Go-embedded build path covers fewer targets and binds Mochi's release cycle to Go's.
- Performance ceiling. vm3 lands in the 3-5x-of-Go band (MEP-39 §6.16 close-out, recorded as the prior baseline). The AOT path inherits Clang/GCC -O2/-O3 codegen quality, link-time optimisation, dead-code elimination, and architecture-specific tuning, which on tight numeric and dataset workloads consistently lands within 1.5x of equivalent hand-written C.
A secondary motivation: AOT opens embedding. libmochi.a plus mochi/core.h is a valid C library; a host C/C++ program (game engine, simulator, embedded controller) can call mochi_runtime_init and link a transpiled .o directly. Cosmopolitan via cosmocc produces a single APE binary that runs unmodified on macOS, Linux, Windows, FreeBSD, NetBSD, and OpenBSD. Neither is reachable via the Go-embedded path.
Specification
This section is normative. Sub-notes under ~/notes/Spec/0045/01..12 are informative.
1. Pipeline and IR
MEP-45 owns every stage past the type checker. The pass order:
Mochi source
│ parser (MEP-1/2/3, reused)
▼
AST
│ type checker (MEP-4/5/6, reused)
▼
Typed AST
│ monomorphise (MEP-45 pass 1)
▼
Monomorphic typed AST
│ lower (MEP-45 pass 2)
▼
aotir (MEP-45's own IR; see below)
│ match-to-decision-tree (MEP-45 pass 3; Maranget)
▼
aotir (matches lowered)
│ closure-convert (MEP-45 pass 4)
▼
aotir (closures lowered)
│ emit (MEP-45 pass 5)
▼
ISO C23 source + #line side-map
│ cc (vendored zig cc by default)
▼
Native single-file binary
aotir is MEP-45's own lowering IR, sized to the C codegen's needs. It is not reused from any other backend. Properties:
- Fully monomorphised. No type variables; every collection and ADT carries concrete type arguments.
- Fully type-elaborated. Every expression has a resolved type; every implicit conversion is materialised as an explicit cast node.
- Closure-explicit. Function values carry an explicit environment-struct type after closure-conversion (pass 4); free-variable capture is no longer implicit.
- Match-explicit. Pattern matches lower to a Maranget decision tree (pass 3); no match node survives past that pass.
- Effect-annotated. Every call site that may panic carries a marker so the emit pass knows where to plant setjmp/longjmp machinery.
The codegen visits each aotir type exactly once via a memoised lower_type(aotirType) → CType. Memoisation guarantees that two structurally identical types share a single C type and a single set of helpers. See note 06 §12.
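The memoisation described above can be sketched as a small table keyed by the canonical printing of a type. This is only an illustration under stated assumptions: the table size, the `memo_entry` layout, the integer `ctype_id` standing in for a `CType`, and the `helpers_emitted` counter are all hypothetical, and the real `lower_type` presumably lives in the Go transpiler rather than in C.

```c
#include <assert.h>
#include <string.h>

#define MEMO_MAX 64   /* assumed table size for the sketch */

typedef struct {
    const char *canon;  /* canonical printing of the aotir type */
    int ctype_id;       /* stands in for the lowered CType */
} memo_entry;

static memo_entry memo[MEMO_MAX];
static int memo_len = 0;
int helpers_emitted = 0;  /* counts how often helper emission ran */

/* Returns the id of the C type for canon, emitting helpers only on the
 * first visit. Returns -1 when the table is full. The caller must keep
 * canon alive for as long as the table is in use. */
int lower_type(const char *canon)
{
    assert(canon != NULL);
    for (int i = 0; i < memo_len; i++) {
        if (strcmp(memo[i].canon, canon) == 0) {
            return memo[i].ctype_id;   /* memo hit: reuse type and helpers */
        }
    }
    if (memo_len == MEMO_MAX) {
        return -1;
    }
    memo[memo_len].canon = canon;
    memo[memo_len].ctype_id = memo_len;
    helpers_emitted++;                 /* a real emitter would write C here */
    return memo_len++;
}
```

Calling `lower_type` twice with the same canonical string returns the same id and bumps `helpers_emitted` once, which is the sharing property the paragraph relies on.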
2. Name mangling
Mangled identifier form:
{pkg}__{module}__{name}[__{instArgsHash6}]
where pkg is the source package path with / replaced by _, module is the file stem, name is the source identifier, and instArgsHash6 is the first 6 hex digits of a BLAKE3 hash over the canonical printing of the instantiation arguments (omitted for non-generic symbols). Two emitted identifiers never collide across packages, generic instantiations, or arity-overloaded constructors. See note 05 §3.
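A minimal sketch of the mangled form, assuming the BLAKE3 digest has already been computed elsewhere and is passed in as a hex string. The function name `mochi_mangle` and its signature are hypothetical; only the `{pkg}__{module}__{name}[__{instArgsHash6}]` shape comes from the text above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes {pkg}__{module}__{name}[__{hash6}] into out.
 * inst_hash_hex is the hex digest of the instantiation arguments (computed
 * elsewhere), or NULL for a non-generic symbol.
 * Returns the identifier length, or -1 if out is too small. */
int mochi_mangle(char *out, size_t cap, const char *pkg, const char *module,
                 const char *name, const char *inst_hash_hex)
{
    int n;
    assert(out != NULL && pkg != NULL && module != NULL && name != NULL);
    if (inst_hash_hex != NULL) {
        /* %.6s keeps only the first six hex digits of the digest. */
        n = snprintf(out, cap, "%s__%s__%s__%.6s", pkg, module, name,
                     inst_hash_hex);
    } else {
        n = snprintf(out, cap, "%s__%s__%s", pkg, module, name);
    }
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    /* Only the package path may contain '/'; it is the prefix of out. */
    size_t pkg_len = strlen(pkg);
    for (size_t i = 0; i < pkg_len; i++) {
        if (out[i] == '/') {
            out[i] = '_';
        }
    }
    return n;
}
```

For example, package `std/io`, module `file`, name `open` with no instantiation arguments yields `std_io__file__open`.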
3. Type lowering table
| Mochi type | C type | Notes |
|---|---|---|
| int | int64_t (alias mochi_int) | two's complement per C23 mandate |
| float | double (alias mochi_float) | IEEE 754 double |
| bool | bool (<stdbool.h>) | |
| string | mochi_str | immutable UTF-8 slice + precomputed hash |
| time | int64_t (mochi_time) | ns since Unix epoch UTC |
| duration | int64_t (mochi_dur) | ns |
| ?T | T for pointer-shaped T (niche), else tagged struct | note 06 §3.1 |
| list<T> | mochi_list__T | growable dense vector |
| map<K,V> | mochi_map__K_V | Swiss table |
| omap<K,V> | mochi_omap__K_V | Swiss table + insertion-order list (query DSL) |
| set<T> | mochi_set__T | Swiss table with elided value slot |
| stream<T> | mochi_stream__T * | bounded ring + subscriber list |
| chan<T> | mochi_chan__T * | bounded ring, point-to-point |
| record R | struct pkg_R | packed in source field order |
| sum S | struct pkg_S with tag + union u | recursive variants box payload |
| fun(A,B):C | mochi_closure__C_A_B | fat pointer (code + env) |
| agent A | struct pkg_A with embedded mailbox | note 09 §4 |
4. Expression and statement lowering
Expressions lower to single-assignment C temporaries with explicit casts on every implicit conversion. Short-circuit && / || lower to C short-circuit operators. Integer arithmetic uses C operators (two's complement; division by zero raises MOCHI_ERR_DIVZERO in the checked profile, UB in --fast-int). Float arithmetic preserves IEEE 754 NaN propagation. String + lowers to mochi_str_cat; s[i] returns the i-th code point as a one-character mochi_str.
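The code-point semantics of s[i] can be illustrated with a small UTF-8 walk. This is a sketch, not the runtime's implementation: `str_slice` is a simplified stand-in for mochi_str (no hash field), the input is assumed to be valid UTF-8, and returning an empty slice on an out-of-range index is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified slice; the spec's mochi_str also carries a precomputed hash. */
typedef struct {
    const uint8_t *data;
    size_t len;
} str_slice;

/* Byte length of the UTF-8 sequence starting with lead byte b.
 * Assumes the input was validated beforehand. */
static size_t utf8_seq_len(uint8_t b)
{
    if (b < 0x80) {
        return 1;
    }
    if ((b & 0xE0) == 0xC0) {
        return 2;
    }
    if ((b & 0xF0) == 0xE0) {
        return 3;
    }
    return 4;
}

/* Returns the i-th code point of s as a sub-slice of s. An out-of-range
 * index yields an empty slice here (an assumption of this sketch). */
str_slice str_index(str_slice s, size_t i)
{
    assert(s.data != NULL || s.len == 0);
    size_t off = 0;
    while (off < s.len) {
        size_t n = utf8_seq_len(s.data[off]);
        if (i == 0) {
            str_slice r = { s.data + off, n };
            return r;
        }
        i--;
        off += n;
    }
    str_slice none = { NULL, 0 };
    return none;
}
```

Note that indexing by code point is linear in the byte offset with this approach; whether the runtime caches offsets is not stated in this section.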
Collection literals follow the MEP-0951 §1 rule: a literal inside a function body always allocates a fresh collection per call; a literal in a top-level let is emitted as a static const array when every element is a compile-time constant.
if, while, for in, match, return, break, continue lower to the obvious C forms. for x in xs iterates by index on lists, by paired iterator (insertion order) on maps, and by subscription callback on streams (the surrounding function must therefore be on a fiber).
match lowers via Maranget's decision-tree algorithm (Compiling Pattern Matching to Good Decision Trees, ML Workshop 2008). See note 05 §10.
try { ... } catch e { ... } lowers via setjmp/longjmp with a per-thread exception jump-buffer stack. See note 05 §9.
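A minimal sketch of the per-thread jump-buffer stack, assuming a fixed maximum nesting depth and C11 `_Thread_local`. The names `try_stack`, `throw_code`, `checked_div`, and `div_or_code` are hypothetical; only MOCHI_ERR_DIVZERO and its value come from the error table in this document. The thrown code travels in a thread-local variable so that `setjmp` appears only in a controlling expression, which is one of the contexts ISO C permits.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

#define MOCHI_ERR_DIVZERO (-5)  /* value from the spec's error table */
#define TRY_DEPTH_MAX 16         /* assumed fixed depth for the sketch */

/* Per-thread stack of active try frames, plus the code being thrown. */
static _Thread_local jmp_buf try_stack[TRY_DEPTH_MAX];
static _Thread_local int try_top = 0;
static _Thread_local int pending_code = 0;

/* Hypothetical throw: record the code, unwind to the innermost try frame. */
static void throw_code(int code)
{
    assert(try_top > 0);  /* a real runtime would report an uncaught error */
    pending_code = code;
    longjmp(try_stack[try_top - 1], 1);
}

/* Checked-profile integer division. */
static int64_t checked_div(int64_t a, int64_t b)
{
    if (b == 0) {
        throw_code(MOCHI_ERR_DIVZERO);
    }
    return a / b;
}

/* Shape of the C for a try block that divides and a catch block that
 * yields the error code. */
int64_t div_or_code(int64_t a, int64_t b)
{
    assert(try_top < TRY_DEPTH_MAX);
    int frame = try_top++;                 /* push a try frame */
    if (setjmp(try_stack[frame]) == 0) {
        int64_t q = checked_div(a, b);     /* try body */
        try_top = frame;                   /* normal exit: pop the frame */
        return q;
    }
    try_top = frame;                       /* reached via longjmp: pop */
    return pending_code;                   /* catch body */
}
```

The register save happens once per `try` entry, inside `setjmp`; the non-throwing path pays nothing further.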
5. Closures
Free functions and methods lower to plain C functions. Closures are fat pointers (code, env):
typedef struct {
bool (*code)(void *env, mochi_int a, mochi_str b);
void *env;
} mochi_closure__bool_i64_str;
Free functions adapted into closures get an env == NULL shim. Method values get an env == self shim. Closure envs are heap-allocated when they escape; v1 always heap-allocates (conservative). The @no_escape attribute is Phase 2. See note 06 §4.
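The two closure shapes described above can be sketched as follows, using the same (code, env) layout for a simpler fun(int): int type. The names `closure_int_int`, `make_twice`, `make_adder`, and `call_closure` are hypothetical (the spec's mangling would produce different identifiers), and `malloc` stands in for the GC allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t mochi_int;

/* Fat pointer for fun(int): int, following the (code, env) layout above. */
typedef struct {
    mochi_int (*code)(void *env, mochi_int x);
    void *env;
} closure_int_int;

/* A free function and its env == NULL shim. */
static mochi_int twice(mochi_int x) { return 2 * x; }
static mochi_int twice_shim(void *env, mochi_int x)
{
    (void)env;  /* free functions ignore the environment */
    return twice(x);
}

/* Environment struct for a closure that captures n by value. */
typedef struct {
    mochi_int n;
} add_env;

static mochi_int add_code(void *env, mochi_int x)
{
    return x + ((add_env *)env)->n;
}

closure_int_int make_twice(void)
{
    closure_int_int c = { twice_shim, NULL };
    return c;
}

closure_int_int make_adder(mochi_int n)
{
    /* v1 always heap-allocates the env; malloc stands in for the GC. */
    add_env *env = malloc(sizeof *env);
    if (env == NULL) {
        abort();
    }
    env->n = n;
    closure_int_int c = { add_code, env };
    return c;
}

/* Every call site looks the same however the closure was built. */
mochi_int call_closure(closure_int_int c, mochi_int x)
{
    assert(c.code != NULL);
    return c.code(c.env, x);
}
```

The uniform call shape is the point of the shim: a caller never needs to know whether it holds a free function, a method value, or a capturing closure.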
6. Runtime library
Single static archive libmochi.a plus headers mochi/include/mochi/*.h. Subsystems (note 04, note 10 §7):
- mochi/core: memory, types, errors, panic
- mochi/sched: M:N work-stealing scheduler over minicoro, channels, streams, agents
- mochi/query: cwisstable Swiss tables, omap, sort, arena
- mochi/io: file, network, time, env
- mochi/text: utf8proc + simdutf
- mochi/data: yyjson (JSON), libfyaml (YAML), home-grown CSV
- mochi/net: libcurl-backed HTTP (H1/H2/H3 via nghttp2/ngtcp2)
- mochi/llm: provider abstraction (OpenAI, Anthropic, Google, llama.cpp local)
- mochi/ffi: C direct, Go via Unix-domain RPC or c-archive, Python via embedded libpython3 or RPC, TS via QuickJS-NG
7. Memory management
Default GC: BDWGC, conservative, incremental, generational mode on. Allocations route through mimalloc with BDWGC tracing on top. Rationale: mature, every tier-1 platform, ~150 KB static, pause times within target on the BG corpus. The runtime is GC-agnostic by design so v2 can swap in MMTk (precise) or Perceus + Bacon-Rajan (refcount + cycle collection) without touching generated code. WASM target disables BDWGC and uses a precise allocator with a shadow stack.
The query pipeline uses arena allocation: intermediates live in a mochi_arena released at the query boundary; the surviving result list is copied to GC. See note 08 §9.
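The arena pattern above can be sketched with a bump allocator. The layout of the real mochi_arena is not given in this section, so the `arena` struct, its functions, and the `filter_even` query are all hypothetical; `malloc` stands in for the GC heap that receives the surviving result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump arena. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} arena;

int arena_init(arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = cap;
    a->used = 0;
    return a->base != NULL;
}

/* Bump-allocates n bytes at a 16-byte-aligned offset; NULL when full. */
void *arena_alloc(arena *a, size_t n)
{
    assert(a->base != NULL);
    size_t start = (a->used + 15u) & ~(size_t)15u;
    if (start > a->cap || n > a->cap - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}

/* Releases every intermediate at once, at the query boundary. */
void arena_release(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}

/* Keeps the even values of xs. The intermediate buffer lives in the arena;
 * only the surviving result is copied out before the arena is released. */
int *filter_even(const int *xs, size_t n, size_t *out_n)
{
    arena a;
    *out_n = 0;
    if (!arena_init(&a, n * sizeof *xs + 16)) {
        return NULL;
    }
    int *tmp = arena_alloc(&a, n * sizeof *xs);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (xs[i] % 2 == 0) {
            tmp[kept++] = xs[i];
        }
    }
    int *result = malloc((kept ? kept : 1) * sizeof *result);
    if (result != NULL) {
        memcpy(result, tmp, kept * sizeof *result);
        *out_n = kept;
    }
    arena_release(&a);
    return result;
}
```

Releasing the whole arena in one call is what keeps per-row intermediates out of the collector's working set.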
8. Concurrency
M:N work-stealing scheduler, one OS thread per hardware core, lightweight fibers (~32 KB initial stack) on minicoro. Blocking syscalls execute on an overflow pool. Streams are MPMC broadcast channels with a bounded ring; emit blocks the caller when full (v1 default). Agents are records with a mailbox; intent calls enqueue typed messages; the agent's run-loop processes them in order on a dedicated fiber. CPU preemption (Go-style signal preemption) is not in v1.
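The bounded ring behind streams and channels can be pictured with a single-threaded sketch. It deliberately omits all synchronisation and scheduler interaction: where `ring_try_push` returns false, the runtime described above would block the emitting fiber instead. The `ring` type, its capacity, and both function names are assumptions of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 4  /* assumed capacity for the sketch */

typedef struct {
    int64_t slots[RING_CAP];
    size_t head;   /* index of the next slot to read */
    size_t count;  /* number of queued items */
} ring;

/* Returns false when the ring is full; a scheduler-aware version would
 * park the calling fiber here until a slot frees up. */
bool ring_try_push(ring *r, int64_t v)
{
    assert(r != NULL);
    if (r->count == RING_CAP) {
        return false;
    }
    r->slots[(r->head + r->count) % RING_CAP] = v;
    r->count++;
    return true;
}

/* Returns false when the ring is empty. Items come out in FIFO order. */
bool ring_try_pop(ring *r, int64_t *out)
{
    assert(r != NULL && out != NULL);
    if (r->count == 0) {
        return false;
    }
    *out = r->slots[r->head];
    r->head = (r->head + 1) % RING_CAP;
    r->count--;
    return true;
}
```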
9. Error model
setjmp/longjmp with a per-thread exception jump-buffer stack. Throwing functions need no annotation; any function may panic. Built-in codes:
| Code | Name | Source |
|---|---|---|
| -1 | MOCHI_ERR_FETCH | network or HTTP non-2xx |
| -2 | MOCHI_ERR_PARSE | JSON / YAML / CSV decode |
| -3 | MOCHI_ERR_TYPE | runtime type mismatch |
| -4 | MOCHI_ERR_INDEX | OOB index / missing key |
| -5 | MOCHI_ERR_DIVZERO | integer divide by zero |
| -6 | MOCHI_ERR_OVERFLOW | integer overflow (debug only) |
| -7 | MOCHI_ERR_FFI | FFI subprocess failure |
| -8 | MOCHI_ERR_LLM | provider error from generate |
| -9 | MOCHI_ERR_ASSERT | expect false |
User error codes are positive integers.
10. ABI and target portability
Target: ISO C23 (-std=c23) with a C17 fallback when a target compiler lacks C23. Hard floor: C11.
Tier-1 (gated on every PR):
x86_64-linux-gnu, x86_64-linux-musl,
aarch64-linux-gnu, aarch64-linux-musl,
aarch64-darwin, x86_64-darwin,
x86_64-windows-msvc (via clang-cl),
x86_64-windows-gnu (via zig cc + mingw),
wasm32-wasi
Tier-2 (nightly): aarch64-windows-msvc, riscv64-linux-gnu, armv7-linux-gnueabihf, BSDs, wasm32-emscripten.
Tier-3 (best effort): Android, iOS, loongarch64, s390x, powerpc64le.
ABI: standard System V on Linux/BSD, AAPCS64 on aarch64, Windows x64 on MSVC, RISC-V LP64D on riscv64. We do not introduce a custom calling convention. We avoid C variadics (...) because of the AAPCS64 Apple-silicon surprise (Apple's arm64 ABI passes variadic arguments on the stack, unlike standard AAPCS64); we use explicit overloads or struct args instead. See note 07 §4.
11. Build driver
mochi build [PATH]
mochi build --target=TRIPLE PATH
mochi build --release PATH
mochi build --debug PATH
mochi build --secure PATH
mochi build --apex PATH # Cosmopolitan one-binary
mochi build --emit={ast|mir|c} PATH
mochi build --out PATH PATH
mochi build -j N PATH
mochi build --reproducible PATH
Cache layout under .mochi/cache/{ast,aotir,c,obj,bin}, content-addressed by a BLAKE3 hash over (source, transitive imports, profile, transpiler version). A cache hit is "is the file present?", no metadata DB. Cross-compilation defaults to a vendored zig cc (0.16 series); the vendor step lives in transpiler3/c/toolchain/zig/ and is invoked by the build driver on first use per cache directory. See note 10.
12. Reproducibility
SOURCE_DATE_EPOCH honoured. __DATE__/__TIME__ never embedded. -ffile-prefix-map=$PWD=. and -fdebug-prefix-map=$PWD=. strip absolute paths. All non-libc deps static-linked. Functions and globals ordered by sorted IR identifier, not by hash-map iteration order. Sample artefact SHA-256 published per release. See note 07 §7.
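The ordering rule can be illustrated with a bytewise sort over symbol names. This is a sketch under the assumption that emitted symbols are available as an array of C strings; `sort_symbols` is a hypothetical name, and the sample identifiers in the usage note are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings (bytewise order). */
static int cmp_ident(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcmp(*x, *y);
}

/* Sorts symbol names so that emission order depends only on the
 * identifiers, never on hash-map iteration order. */
void sort_symbols(const char **names, size_t n)
{
    assert(names != NULL || n == 0);
    qsort(names, n, sizeof *names, cmp_ident);
}
```

Given `app__main__run`, `app__list__push__a1b2c3`, and `app__main__init` in any input order, the emitted order is always `app__list__push__a1b2c3`, `app__main__init`, `app__main__run`.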
13. Hardening
Release profile defaults: -fstack-protector-strong, -D_FORTIFY_SOURCE=3, -fPIE, -Wl,-z,relro,-z,now, -Wl,--icf=safe, -fcf-protection=full on x86 CET, -mbranch-protection=standard on aarch64, /guard:cf on MSVC. --secure profile adds CFI (-fsanitize=cfi-icall), SafeStack, hardened allocator (mimalloc-secure or scudo), ShadowCallStack on aarch64.
14. Diagnostics
The compiler3 type-checker surfaces nearly all errors with Mochi spans. For errors that escape into the C compiler (linker errors, header conflicts), the build driver post-processes the compiler's output and rewrites file:line refs back to Mochi spans via the #line side-map. Include-cascade noise is collapsed. See note 07 §10.
15. Debug info
DWARF 5 on Linux/BSD/macOS, CodeView on Windows. Split-debug-info on by default. #line directives on every emitted statement so gdb/lldb/DAP show Mochi source. Release builds put debug info in a sidecar (.dSYM/.debug/.pdb).
16. Output style
Generated C is reviewable by a human (note 07 §9): two-space indent, one decl per line, braces on every block, no macros in generated code (only in runtime headers), no reserved-identifier reuse, leading source-comment per function, #line directives throughout.
Rationale
Why C, not LLVM IR, WASM, Rust, or a JIT-only path
- LLVM IR. Heavier toolchain (~300 MB), moving target across versions, harder to cross-compile without our own driver, no source-readable output for review.
- WASM as primary. Native perf gap (20-40% behind on WAMR/wasmtime), no portable threads in 2026, GC story still incomplete on common runtimes. WASM is a target of the C path, not the IR.
- Rust as target. Compile times destroy iteration; borrow checker rejects most generated code without an Rc/RefCell sea that defeats the safety.
- C++/Zig. C++ slower compile times, worse template errors; Zig pre-1.0. Neither buys us anything over C with a bit more code.
- JIT-only. Per-arch engines months each; W^X gets harder every year on macOS/iOS/Android; harder to debug; harder to sandbox. A JIT path is complementary to AOT, not a replacement; MEP-45 is the AOT story end-to-end.
C gives the project a stable, slow-moving language (C23 is the most ambitious C standard in 30 years), a vast toolchain ecosystem, and human-readable output a security reviewer can audit.
Why monomorphisation, not boxed values
Mochi's generics are limited to type parameters on collections and user record/sum types (no higher-kinded types). Polymorphic recursion is rejected at type-check time, so the instantiation set is finite. Full monomorphisation produces typed C structs and tight numeric loops with predictable layout. Boxed mochi_value exists only at FFI boundaries.
Why differential testing as the master gate
vm3 is the existing source of truth. Byte-equal stdout from the AOT binary versus vm3, on every fixture, is the strictest behaviour check available. vm3 is used here only as the recording oracle for expect.txt; the transpiler does not consume any of vm3's IR, runtime, or codegen. Property tests, fuzzing, and reproducibility are secondary gates layered on top.
Why BDWGC for v1
Mature, every tier-1 platform, ~150 KB static, pause times within target on BG corpus, proven at scale by Crystal, Vala, multiple Schemes. Swap path to a precise GC (MMTk) or refcount + cycle collection (Perceus + Bacon-Rajan) is open because the runtime layer is GC-agnostic.
Backwards Compatibility
Additive. mochi run and mochi test keep vm3 by default. mochi build gains --target=c (and indirectly every triple flag like --target=x86_64-linux-musl). No language surface change; no stdlib surface change beyond mirroring vm3's existing user-visible exposure inside the C runtime.
Observable behaviour must match vm3 byte-for-byte on the fixture corpus. Programs relying on implementation-defined vm3 behaviour (allocation order observable through mochi_runtime_stats, GC pause timing) are explicitly non-portable; the spec already disallows reliance on these.
Reference Implementation
Code lives under a single fresh tree transpiler3/c/, with no sharing of IR, runtime, or driver with any prior backend. The shape:
| Tree | Purpose |
|---|---|
| transpiler3/c/aotir/ | the aotir types and verifier |
| transpiler3/c/lower/ | typed AST → aotir (passes 1-4: monomorphise, lower, match-to-tree, closure-convert) |
| transpiler3/c/emit/ | aotir → ISO C23 source + #line side-map (pass 5) |
| transpiler3/c/build/ | driver: write source, invoke cc, link, cache (.mochi/cache/) |
| transpiler3/c/toolchain/zig/ | vendored zig cc 0.16 series + install/upgrade flow |
| transpiler3/c/runtime/include/ | public runtime headers (mochi/*.h) |
| transpiler3/c/runtime/src/ | runtime source compiled into libmochi.a |
| tests/transpiler3/c/ | fixture corpus, expect files, integration tests |
The phased delivery plan is the §Phases section below. Each phase ships as a sub-PR auto-merged per the project's auto-ship convention; tracking pages live under /docs/implementation/0045/ and get filled in along the way.
Phases
The plan walks the language surface bottom-up (Phases 0-10), then expands the target matrix (Phases 11-13), then completes the feature surface (Phases 14-15), then layers the quality gates (Phases 16-18), and culminates in v1.0 (Phase 19). The numbering is dependency order, not calendar. Phases 0-10 are strictly sequential because each adds machinery the next assumes. Phases 11-15 can run in parallel after Phase 10 lands. Phases 16-18 run after Phase 15 lands and continue in perpetuity.
Phase conventions:
- Gate. A single measurable criterion. A phase is LANDED only when its gate is green on every target listed.
- Targets. Tier-1 triples in scope at this phase. Missing tier-1 rows at phase boundary become N.1, N.2, ... sub-phases per the umbrella-phase coverage rule.
- Status / Commit columns. Filled in along the way as work lands. Status values: NOT STARTED, IN PROGRESS, BLOCKED, LANDED, DEFERRED. Commit is the merge commit short SHA on main.
- Goal-alignment audit. Before a phase starts, a one-paragraph audit on its tracking page confirms the gate moves the user-facing goal ("ship a Mochi program as a single native binary on this target"), not spec-internal scaffolding.
- Spec-in-sync. The PR that lands a phase's code must also update this MEP file (close out the phase block, update Status / Commit) and the tracking page under /docs/implementation/0045/.
- Reference oracle. Fixture goldens (expect.txt) are recorded by running the same source through vm3. The gate diffs the AOT binary's stdout against that golden. vm3 is the oracle only; it is not linked, imported, or otherwise depended upon by transpiler3/c/.
Phase 0. Spec freeze and skeleton trees
| Field | Value |
|---|---|
| Status | IN PROGRESS |
| Commit | — (PR #22067) |
| Gate | This MEP merged on main; transpiler3/c/{aotir,lower,emit,build,toolchain/zig,runtime/{include,src}}/doc.go compile clean and report zero tests; tests/transpiler3/c/ exists with a README.md; implementation tracking pages for every phase exist under /docs/implementation/0045/; sidebar entries visible on the website |
| Targets | n/a (paperwork phase) |
| Tracking | /docs/implementation/0045/phase-00-skeleton |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 0.0 | This MEP merged with refactored framing, §Phases section, implementation tracking docs, sidebar wiring | IN PROGRESS | — (PR #22067) |
| 0.1 | transpiler3/c/{aotir,lower,emit,build,toolchain/zig,runtime/{include,src}}/doc.go compile clean on every host (go vet, go test returns no test files) | IN PROGRESS | — (PR #22067) |
| 0.2 | tests/transpiler3/c/README.md documents fixture layout and naming convention | IN PROGRESS | — (PR #22067) |
Test set. Documentation/website build only: npm run gen:meps && npm run build clean; go vet ./transpiler3/c/... clean.
Risks. None substantial; paperwork phase.
Phase 1. Hello world
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | mochi build --target=c-aot --out=/tmp/hello tests/transpiler3/c/fixtures/hello/hello.mochi && /tmp/hello \| diff - tests/transpiler3/c/fixtures/hello/expect.txt exits 0 on host triple (Phase 1.1 onward; Phase 1.0 ships the in-process Driver.Build only). --target=c-aot is the staging target value used until MEP-42's --target=c is retired; the eventual one-flag shape is mochi build --target=c <file> --out=<path> |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-01-hello-world |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 1.0 | Source-to-binary minimum: parser reused; lower (one fn returning unit, one print(string) call); emit; build via host cc discovery ($CC, then cc, then clang, then gcc); single integration test passes | NOT STARTED | — |
| 1.1 | --target=c-aot, --out PATH, --emit=c\|executable CLI flags wired through cmd/mochi/main.go; --emit=c dumps the generated C as <out>.c next to the binary for review | IN PROGRESS | — (this PR) |
| 1.2 | .mochi/cache/ BLAKE3 content-addressed cache layer (default root: $HOME/.mochi/cache, override via MOCHI_CACHE_DIR); rebuild on unchanged source is a copyFile no-op; cache key is BLAKE3 over (transpiler-version, profile, target-triple, runtime-fingerprint, source-path, source-bytes); --emit=c builds bypass the cache | IN PROGRESS | — (this PR) |
| 1.3 | Fallback to vendored zig cc when host cc discovery fails; transpiler3/c/toolchain/zig/install.go downloads zig 0.16.0 on first use and pins via SHA-256 manifest covering the six tier-1 hosts (x86_64-linux, aarch64-linux, x86_64-macos, aarch64-macos, x86_64-windows, aarch64-windows); opt-out via Driver.NoZigFallback | IN PROGRESS | — (this PR) |
Deliverables.
- transpiler3/c/aotir/: type set for unit, string; Function, Block, Stmt = Call(funcRef, args), Expr = StringLit, verifier.
- transpiler3/c/lower/: typed AST → aotir for the one-function-one-print shape; named pass entry Lower(tast.Program) (*aotir.Program, error).
- transpiler3/c/emit/: aotir.Program → C source string; emits #include, int main(void), mochi_print_str("hello, mochi!\n");, return 0;.
- transpiler3/c/build/: Driver with Build(srcPath, outPath, target, profile); host-cc discovery; cache.
- transpiler3/c/runtime/include/mochi/print.h + transpiler3/c/runtime/src/print.c: void mochi_print_str(const char *s); (fwrite to stdout).
- tests/transpiler3/c/fixtures/hello/hello.mochi + expect.txt; integration test under transpiler3/c/build/build_test.go::TestHello.
Test set. go test ./transpiler3/c/build (Phase 1.0 TestHello for in-process Driver.Build; Phase 1.1 TestCLIHello/{executable,emit-c} for mochi build --target=c-aot; Phase 1.2 cache-hit no-op; Phase 1.3 vendored-zig fallback).
Risks. Host cc not present on minimal Docker images (Phase 1.3 closes via vendored zig). Windows-host cc discovery (deferred to Phase 11 alongside the cross matrix).
Phase 2. Primitives and control flow
| Field | Value |
|---|---|
| Status | IN PROGRESS |
| Commit | — |
| Gate | Arithmetic + control-flow suite (~50 fixtures: int/float ops, comparisons, if/else, while, for-in over int range, recursion) compiles and runs byte-equal vs vm3 on host triple |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-02-primitives-control-flow |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 2.0 | int (int64_t), float (double), bool; arithmetic ops; comparison ops; short-circuit && / \|\| | | |
| 2.1 | let/var, if/else, while, return, break, continue | IN PROGRESS | — |
| 2.2 | for x in start..end (int range); user-defined functions (multi-arg, multi-return-via-tuple deferred to Phase 3) | IN PROGRESS | — |
| 2.3 | Integer divide-by-zero raises MOCHI_ERR_DIVZERO (checked profile); UB under --fast-int | IN PROGRESS | — |
| 2.4 | Float NaN propagation matches vm3 byte-for-byte (IEEE 754 round-trip on %.17g print) | IN PROGRESS | — |
Test set. tests/transpiler3/c/fixtures/primitives/*.mochi (arithmetic suite, 30 cases); tests/transpiler3/c/fixtures/control-flow/*.mochi (20 cases); transpiler3/c/build/phase02_test.go runs each via the driver and diffs stdout.
Risks. Float-print precision divergence vs vm3 (the %.17g round-trip is exact in practice on tier-1 hosts but the gate must catch any libc variation). Resolved by 2.4.
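The round-trip property that sub-phase 2.4 relies on can be checked with a few lines of C. This sketch assumes only that floats are printed with `%.17g`, as stated above; `format_float` and `float_round_trips` are hypothetical helper names, not runtime API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats x with 17 significant digits, which is enough to round-trip any
 * finite IEEE 754 double. Returns the number of characters written. */
int format_float(char *buf, size_t cap, double x)
{
    return snprintf(buf, cap, "%.17g", x);
}

/* Returns 1 when parsing the %.17g text yields exactly the same double.
 * NaN never compares equal to itself, so this reports 0 for NaN. */
int float_round_trips(double x)
{
    char buf[32];
    int n = format_float(buf, sizeof buf, x);
    assert(n > 0 && (size_t)n < sizeof buf);
    return strtod(buf, NULL) == x;
}
```

A check of this shape, run over the fixture values on each libc, is one way to surface the variation the risk note worries about before it reaches the stdout diff.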
Phase 3. Records, lists, maps, sets
| Field | Value |
|---|---|
| Status | IN PROGRESS |
| Commit | — |
| Gate | Records / collections fixture suite (~80 cases) compiles + runs byte-equal vs vm3 on host triple |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-03-records-collections |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 3.0 | Record types: struct mochi_R in source field order; field access; record literals; equality | LANDED | — |
| 3.1 | list<T>: per-T mochi_list_<T> (i64 / f64 / bool / str); literals [e, ...]; xs[i]; len(xs); append; for x in xs | LANDED | — |
| 3.2 | map<K,V>: per-(K,V) hand-rolled open-addressing hash table (8 instantiations, K in int/string, V in int/float/bool/string); literals {k: v, ...}; m[k]; len; keys; values; k in m; for k in m (key-sorted) | LANDED | — |
| 3.3 | set<T>: per-T open-addressing hash set; +, -, contains, len, for -- DEFERRED, Mochi surface (parser + type-checker) does not expose set<T>; reactivates when the language MEP adds it | DEFERRED | — |
| 3.4 | Monomorphisation pass + nested collection types: per-record list / map runtimes, list-of-map, map-of-list, empty-literal-with-annotation, list / map equality | IN PROGRESS | — |
| 3.4a | list<R> where R is a user record: per-record mochi_list_<R> struct + 4 helpers emitted into the TU prologue (mirroring the scalar list helpers); pass / return / append / for-each / index across the function-call boundary | LANDED | — |
| 3.4b | list<list<T>> where T is a scalar primitive (int / float / bool / string): per-inner mochi_list_list_<inner> struct + 4 helpers in the TU prologue (mirroring the scalar list helpers); outer literal / indexing / outer + inner len / outer append / nested for-in / pass + return across function-call boundary. map<K,list<V>> and list<map<K,V>> split out into Phase 3.4e + 3.4f. | LANDED | — |
| 3.4c | Empty literal with type annotation (let xs: list<int> = [], let m: map<int,int> = {}) | NOT STARTED | — |
| 3.4d | list / map equality (==, !=) | NOT STARTED | — |
| 3.4e | map<K, list<V>> (lists as map values) | NOT STARTED | — |
| 3.4f | list<map<K,V>> (maps as list elements) | NOT STARTED | — |
| 3.5 | omap<K,V> (insertion-order map): hash table + parallel insertion-order list -- DEFERRED, same Mochi-surface blocker as 3.3; reactivates if Phase 8 query DSL needs insertion-order group-by rows | DEFERRED | — |
Deliverables. New runtime modules per (K,V) instantiation under transpiler3/c/runtime/src/{list,map}/; monomorphisation pass under transpiler3/c/lower/mono.go. (set/ + omap/ reactivate when Phase 3.3 + 3.5 unblock.)
Test set. tests/transpiler3/c/fixtures/{records,lists,maps}/*.mochi (3.0 + 3.1 + 3.2 land 68 cases; the original "~80" target was set against 4 sub-phases including sets).
Risks. Determinism of monomorphisation pass output order (3.4 sorts by canonical type printing, so two builds of the same program produce identical generated runtime files). Phase 3.2 originally named cwisstable as the map's underlying table; it shipped as a hand-rolled open-addressing + linear-probe + load-factor-0.5 implementation (~120 LOC under MOCHI_MAP_DEFINE). The macro-expanded helpers compile into per-(K,V) externally-visible symbols, so a future cwisstable swap is a symbol-set substitution behind the same mochi_map_<K>_<V>_* ABI; the perf delta doesn't matter until Phase 18.
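For orientation, the open-addressing scheme named above (linear probing, load factor capped at 0.5) can be sketched for a single int-to-int instantiation. This is not the MOCHI_MAP_DEFINE code: the `imap` type, its function names, the multiplicative hash, and the initial capacity of 8 are all assumptions of the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int64_t *keys;
    int64_t *vals;
    bool *used;
    size_t cap;  /* always a power of two */
    size_t len;
} imap;

static void imap_alloc(imap *m, size_t cap)
{
    assert(cap != 0 && (cap & (cap - 1)) == 0);
    m->keys = calloc(cap, sizeof *m->keys);
    m->vals = calloc(cap, sizeof *m->vals);
    m->used = calloc(cap, sizeof *m->used);
    if (!m->keys || !m->vals || !m->used) {
        abort();
    }
    m->cap = cap;
    m->len = 0;
}

void imap_init(imap *m) { imap_alloc(m, 8); }

void imap_free(imap *m)
{
    free(m->keys);
    free(m->vals);
    free(m->used);
}

/* Finds the slot holding k, or the empty slot where k would be inserted.
 * Terminates because the load factor never exceeds 0.5. */
static size_t imap_slot(const imap *m, int64_t k)
{
    uint64_t h = (uint64_t)k * 0x9E3779B97F4A7C15ULL;
    size_t i = (size_t)(h >> 32) & (m->cap - 1);
    while (m->used[i] && m->keys[i] != k) {
        i = (i + 1) & (m->cap - 1);  /* linear probe */
    }
    return i;
}

void imap_put(imap *m, int64_t k, int64_t v)
{
    if ((m->len + 1) * 2 > m->cap) {  /* keep load factor <= 0.5 */
        imap old = *m;
        imap_alloc(m, old.cap * 2);
        for (size_t i = 0; i < old.cap; i++) {
            if (old.used[i]) {
                imap_put(m, old.keys[i], old.vals[i]);
            }
        }
        imap_free(&old);
    }
    size_t i = imap_slot(m, k);
    if (!m->used[i]) {
        m->used[i] = true;
        m->keys[i] = k;
        m->len++;
    }
    m->vals[i] = v;
}

bool imap_get(const imap *m, int64_t k, int64_t *out)
{
    size_t i = imap_slot(m, k);
    if (!m->used[i]) {
        return false;
    }
    *out = m->vals[i];
    return true;
}
```

Because callers only see the put/get surface, swapping the table internals later does not change call sites, which mirrors the ABI argument made in the risk note.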
Phase 4. Sum types and Maranget pattern matching
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | ADT + match fixture suite (~40 cases including option<T>, result<T,E>, nested ADTs, exhaustive + non-exhaustive matches) compiles + runs byte-equal vs vm3 on host triple |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-04-sum-types-match |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 4.0 | Sum-type lowering: struct pkg_S { uint8_t tag; union { ... } u; }; recursive variants box payload; niche optimisation for ?T where T is pointer-shaped (null = None) | NOT STARTED | — |
| 4.1 | Maranget decision-tree pass: transpiler3/c/lower/match.go lowers match e { ... } to a chained switch (tag) / if (eq) tree; pass replaces MatchStmt in aotir | NOT STARTED | — |
| 4.2 | Exhaustiveness check at type-check time (already in MEP-13); panic on non-exhaustive match in --debug, UB in --fast; same behaviour as vm3 | NOT STARTED | — |
| 4.3 | Property test: theft-generated random MIR pattern set decides identically to a reference naive matcher; 10000 cases per CI run | NOT STARTED | — |
Risks. Maranget pass output size on deeply-nested patterns (note 05 §10 caps with a heuristic; revisit if any fixture explodes >100x source line count).
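The tag-plus-union layout from sub-phase 4.0 and the switch-on-tag output of sub-phase 4.1 can be sketched on a made-up three-variant type. The `shape` type, its variant names, and `shape_area` are hypothetical; the example shows only the flat, single-level case, not the nested patterns that make the decision tree non-trivial.

```c
#include <assert.h>
#include <stdint.h>

/* Tags for a hypothetical sum type with variants Circle, Rect, Empty. */
enum { SHAPE_CIRCLE, SHAPE_RECT, SHAPE_EMPTY };

/* Tag plus union, following the layout described for sum types. */
typedef struct {
    uint8_t tag;
    union {
        struct { double r; } circle;
        struct { double w; double h; } rect;
    } u;
} shape;

/* One-level match lowered to a switch on the tag; each arm reads only the
 * payload that belongs to its variant. */
double shape_area(shape s)
{
    switch (s.tag) {
    case SHAPE_CIRCLE:
        return 3.0 * s.u.circle.r * s.u.circle.r;  /* 3.0 approximates pi */
    case SHAPE_RECT:
        return s.u.rect.w * s.u.rect.h;
    case SHAPE_EMPTY:
        return 0.0;
    }
    assert(!"unreachable tag");  /* well-typed input never gets here */
    return 0.0;
}
```

An exhaustive match leaves no default arm, so a compiler warning about an unhandled enumerator would double as a sanity check on the lowering.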
Phase 5. Closures and higher-order functions
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Higher-order fixture suite (~30 cases: map, filter, fold, flatMap, currying, captures by value + by reference, closures returned from functions) compiles + runs byte-equal vs vm3 on host triple |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-05-closures |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 5.0 | Closure-convert pass: transpiler3/c/lower/closure.go rewrites every closure into an explicit (code, env_struct *) fat-pointer pair; free variables move into env_struct | NOT STARTED | — |
| 5.1 | Free function as closure: env == NULL shim auto-generated per arity | NOT STARTED | — |
| 5.2 | Method as closure: env == self shim auto-generated per method | NOT STARTED | — |
| 5.3 | Closures escaping return: env heap-allocated and GC-rooted. Keeping env on the stack for closures proven not to escape (via simple escape analysis) is deferred to v2; v1 always heap-allocates | NOT STARTED | — |
Test set. tests/transpiler3/c/fixtures/closures/*.mochi (30 cases).
Risks. Stack overflow on deeply curried chains (covered by 5.3 heap-allocation default).
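The closure-convert shape in 5.0 and 5.1 can be sketched as follows, assuming a single closure type fun(int): int. All names (demo_fn_i64_i64, make_adder, twice_shim) are hypothetical, and malloc stands in for whatever GC allocator the runtime actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fat pointer: every value of type fun(int): int is a (code, env) pair. */
typedef struct demo_fn_i64_i64 {
    int64_t (*code)(void *env, int64_t arg);
    void *env;                         /* NULL for free functions (5.1) */
} demo_fn_i64_i64;

/* Hypothetical Mochi: fun make_adder(n: int) { return fun(x) => x + n } */
typedef struct adder_env { int64_t n; } adder_env;

static int64_t adder_code(void *env, int64_t x) {
    return x + ((adder_env *)env)->n;  /* free variable read from env */
}

static demo_fn_i64_i64 make_adder(int64_t n) {
    /* v1 always heap-allocates env; malloc is a stand-in for the GC. */
    adder_env *env = malloc(sizeof *env);
    assert(env != NULL);
    env->n = n;
    return (demo_fn_i64_i64){ adder_code, env };
}

/* A free function and its auto-generated env == NULL shim. */
static int64_t twice(int64_t x) { return 2 * x; }

static int64_t twice_shim(void *env, int64_t x) {
    (void)env;                         /* unused: free functions capture nothing */
    return twice(x);
}

static demo_fn_i64_i64 twice_as_closure(void) {
    return (demo_fn_i64_i64){ twice_shim, NULL };
}

/* Higher-order call site: always calls through the pair. */
static int64_t apply(demo_fn_i64_i64 f, int64_t x) {
    return f.code(f.env, x);
}
```

Because call sites only ever see the pair, a captured closure and a shimmed free function are interchangeable arguments to apply.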
Phase 6. Strings and I/O
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Strings + stdlib I/O fixture suite (~40 cases: utf-8 iteration, slicing, concat, format, file read, file write, stdin read) compiles + runs byte-equal vs vm3 on host triple |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-06-strings-io |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 6.0 | mochi_str: immutable utf-8 slice ({const uint8_t *data; size_t len; uint64_t hash;}); short-string optimisation (≤15 bytes inline); precomputed BLAKE3-trimmed hash for map keys | NOT STARTED | — |
| 6.1 | +, len, [i] (returns one-codepoint mochi_str), contains, startsWith, endsWith, split, join, toUpper/toLower via utf8proc | NOT STARTED | — |
| 6.2 | print, println, format strings ("{name} is {age}" interpolation lowers to a printf-equivalent sequence with explicit width/precision) | NOT STARTED | — |
| 6.3 | File I/O: readFile, writeFile, lines, appendFile; stdin, stdout, stderr handles | NOT STARTED | — |
| 6.4 | simdutf for utf-8 validation on read; rejected input raises MOCHI_ERR_PARSE | NOT STARTED | — |
Risks. utf8proc + simdutf static-link size on macOS (~400 KB combined); acceptable per note 04 §5.
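As an illustration of 6.0 and the [i] operator in 6.1, the sketch below uses the three-field layout given above and returns the i-th code point as a one-codepoint slice. It assumes the input is already valid UTF-8 (the job of 6.4), omits the short-string optimisation, and uses FNV-1a purely as a stand-in for the BLAKE3-trimmed hash. The demo_ helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct mochi_str {
    const uint8_t *data;
    size_t len;                        /* length in bytes, not code points */
    uint64_t hash;
} mochi_str;

/* FNV-1a 64: a stand-in here, NOT the BLAKE3-trimmed hash of 6.0. */
static uint64_t demo_hash(const uint8_t *p, size_t n) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

static mochi_str demo_str_from(const char *s, size_t n) {
    mochi_str r = { (const uint8_t *)s, n, demo_hash((const uint8_t *)s, n) };
    return r;
}

/* Byte length of the sequence that starts with lead byte b (valid input). */
static size_t demo_seq_len(uint8_t b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* s[i]: the i-th code point as a slice; empty slice when out of range. */
static mochi_str demo_str_at(mochi_str s, size_t i) {
    size_t off = 0;
    while (off < s.len) {
        size_t n = demo_seq_len(s.data[off]);
        if (i == 0) return demo_str_from((const char *)s.data + off, n);
        off += n;
        i--;
    }
    return demo_str_from("", 0);
}

/* Equality checks the precomputed hash first, then the bytes. */
static int demo_str_eq(mochi_str a, mochi_str b) {
    return a.hash == b.hash && a.len == b.len &&
           memcmp(a.data, b.data, a.len) == 0;
}
```

Indexing by code point is linear in the byte offset in this sketch; whether the real runtime caches offsets is not stated in this MEP.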
Phase 7. Error model
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Error-model fixture suite (~30 cases: panic, try { ... } catch e { ... }, deferred cleanup, finally, nested try, panic across closure boundary) compiles + runs byte-equal vs vm3 on host triple |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-07-error-model |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 7.0 | Per-thread exception jump-buffer stack (TLS); mochi_try_push / mochi_try_pop / mochi_raise(int code, mochi_str msg) | NOT STARTED | — |
| 7.1 | try { ... } catch e { ... } lowers to if (setjmp(buf) == 0) { ... } else { ... } with cleanup on the longjmp path | NOT STARTED | — |
| 7.2 | Built-in error codes (MOCHI_ERR_*) wired through runtime calls (divide-by-zero, OOB index, type mismatch, parse failure) | NOT STARTED | — |
| 7.3 | User error codes (positive integers); user panic(code, msg) lowers to mochi_raise | NOT STARTED | — |
Risks. Stack-cleanup ordering across nested try (test fixture 7.3.4 covers the longest-chain case).
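The 7.0 and 7.1 lowering can be sketched with a thread-local stack of jump buffers. This is an assumption-laden illustration: the demo_ names, the fixed stack depth, the const char * message (the real mochi_raise takes a mochi_str), and the error code value are all hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define DEMO_TRY_DEPTH 64              /* hypothetical fixed nesting limit */

typedef struct demo_try_frame {
    jmp_buf buf;
    int code;                          /* filled in by demo_raise */
    const char *msg;
} demo_try_frame;

static _Thread_local demo_try_frame demo_try_stack[DEMO_TRY_DEPTH];
static _Thread_local int demo_try_top = 0;

static demo_try_frame *demo_try_push(void) {
    assert(demo_try_top < DEMO_TRY_DEPTH);
    return &demo_try_stack[demo_try_top++];
}

static void demo_try_pop(void) {
    assert(demo_try_top > 0);
    demo_try_top--;
}

/* Records the error in the innermost frame and jumps to its handler. */
static void demo_raise(int code, const char *msg) {
    assert(demo_try_top > 0 && "uncaught error");
    demo_try_frame *f = &demo_try_stack[demo_try_top - 1];
    f->code = code;
    f->msg = msg;
    longjmp(f->buf, 1);
}

enum { DEMO_ERR_DIV_ZERO = 1 };        /* illustrative value only */

static long demo_div(long a, long b) {
    if (b == 0) demo_raise(DEMO_ERR_DIV_ZERO, "divide by zero");
    return a / b;
}

/* try { result = a / b } catch e { result = fallback } */
static long demo_safe_div(long a, long b, long fallback) {
    demo_try_frame *f = demo_try_push();
    long result;
    if (setjmp(f->buf) == 0) {
        result = demo_div(a, b);
        demo_try_pop();                /* normal exit */
    } else {
        demo_try_pop();                /* longjmp path: frame still pushed */
        result = fallback;
    }
    return result;
}
```

Locals that are modified between setjmp and longjmp and then read in the handler would need to be volatile; the sketch avoids that by assigning result on both paths.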
Phase 8. Query DSL
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Query fixture suite (~60 cases: filter, map, group-by, order-by, distinct, union, intersect, except, inner/left/cross join) compiles + runs byte-equal vs vm3 on host triple |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-08-query-dsl |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 8.0 | Query algebra lowering: transpiler3/c/lower/query.go rewrites query expressions to a fused iterator chain (filter+map+take collapse to a single loop) | NOT STARTED | — |
| 8.1 | group by / order by / distinct / union / intersect / except operators | NOT STARTED | — |
| 8.2 | Join operators: inner (hash-join via Swiss table), left (hash-join with right-side outer fill), cross (nested loop) | NOT STARTED | — |
| 8.3 | Arena allocation: intermediates live in a mochi_arena released at query boundary; surviving result list copied to GC | NOT STARTED | — |
| 8.4 | load/save adapters: JSON (yyjson), YAML (libfyaml), CSV (home-grown) | NOT STARTED | — |
Risks. Operator-fusion correctness on left-join + group-by composition (covered by 8.2 + 8.3 test matrix).
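To make the 8.0 fusion concrete, here is a hand-written sketch of what a filter + map + take chain might collapse to: one loop, no intermediate lists. The query text, the function name, and the fixed take count are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query:
 *   from x in xs where x % 2 == 0 select x * x take 3
 * Unfused, this would build a filtered list, then a mapped list, then
 * truncate. Fused, each element flows through all three stages at once.
 */
static size_t demo_query_fused(const int64_t *xs, size_t n,
                               int64_t *out, size_t cap) {
    const size_t take = 3;
    size_t produced = 0;
    for (size_t i = 0; i < n && produced < take && produced < cap; i++) {
        int64_t x = xs[i];
        if (x % 2 != 0) continue;      /* filter stage */
        out[produced++] = x * x;       /* map stage, written to the result */
    }
    return produced;                   /* take stage ends the loop early */
}
```

The early exit matters for correctness as well as speed: elements after the third match are never evaluated, so any side effects in the select expression must match what vm3 does.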
Phase 9. Streams, agents, M:N scheduler
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Streams + agents fixture suite (~40 cases: stream emit/subscribe, agent intent dispatch, bounded channel back-pressure, shutdown protocol, fan-out fan-in) compiles + runs byte-equal vs vm3 on host triple under TSan-clean execution |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-09-streams-agents |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 9.0 | M:N work-stealing scheduler over minicoro fibers; one OS thread per hardware core; blocking syscalls execute on overflow pool | NOT STARTED | — |
| 9.1 | chan<T>: bounded ring, point-to-point, send blocks when full | NOT STARTED | — |
| 9.2 | stream<T>: bounded ring + subscriber list (MPMC broadcast); emit blocks caller when any subscriber is full | NOT STARTED | — |
| 9.3 | Agent: record with embedded mailbox; intent calls enqueue typed messages; agent's run loop processes them in order on a dedicated fiber | NOT STARTED | — |
| 9.4 | Shutdown protocol: graceful drain on SIGINT/SIGTERM; bounded-time hard kill after timeout | NOT STARTED | — |
Risks. Race-free shutdown across nested agents (TSan-clean is the only acceptable gate; 9.4 fixtures exercise the longest dependency chain).
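The bounded ring behind chan<T> in 9.1 might look like the sketch below for T = int. It is deliberately single-threaded: there is no synchronisation, and the points where a real sender or receiver fiber would park are only marked in comments. The demo_ names and the capacity are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CHAN_CAP 4                /* power of two, so % survives wraparound */

typedef struct demo_chan_i64 {
    int64_t slots[DEMO_CHAN_CAP];
    size_t head;                       /* count of items received so far */
    size_t tail;                       /* count of items sent so far */
} demo_chan_i64;

static size_t demo_chan_len(const demo_chan_i64 *c) {
    return c->tail - c->head;
}

/* Returns false when full; a scheduler would park the sending fiber here. */
static bool demo_chan_try_send(demo_chan_i64 *c, int64_t v) {
    if (demo_chan_len(c) == DEMO_CHAN_CAP) return false;
    c->slots[c->tail % DEMO_CHAN_CAP] = v;
    c->tail++;
    return true;
}

/* Returns false when empty; a scheduler would park the receiving fiber. */
static bool demo_chan_try_recv(demo_chan_i64 *c, int64_t *out) {
    if (demo_chan_len(c) == 0) return false;
    *out = c->slots[c->head % DEMO_CHAN_CAP];
    c->head++;
    return true;
}
```

A refused send is the back-pressure signal: the blocking send of 9.1 would be a loop around this check that yields to the scheduler until a slot frees up.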
Phase 10. FFI shells
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | C-direct FFI fixture suite (~15 cases: call a vendored C function, pass scalars + strings + records, return scalars + records, error propagation) compiles + runs byte-equal vs vm3 on host triple |
| Targets | host triple only |
| Tracking | /docs/implementation/0045/phase-10-ffi |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 10.0 | C direct: extern "C"-style binding declarations in Mochi; emit a header in the build dir; user provides a .c neighbour that gets compiled in | NOT STARTED | — |
| 10.1 | Boxed mochi_value type for FFI-crossing values (sum of scalar + string + handle); marshalling helpers | NOT STARTED | — |
| 10.2 | Go FFI via Unix-domain RPC: deferred sub-phase; ships only after the C-direct shell is green | NOT STARTED | — |
| 10.3 | Python FFI via embedded libpython3: deferred sub-phase | NOT STARTED | — |
| 10.4 | TypeScript FFI via QuickJS-NG: deferred sub-phase | NOT STARTED | — |
Risks. ABI surprise on aarch64-Apple-silicon variadics (resolved by avoiding variadics entirely, per §10).
Phase 11. Cross-compile tier-1 matrix
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Every Phase 1-10 fixture compiles via mochi build --target=<triple> from every supported host and runs byte-equal vs vm3 on the target (native on macOS, Linux, and Windows; qemu-user-static for cross-arch Linux; wasm32-wasi is gated separately under wasmtime in Phase 12) |
| Targets | x86_64-linux-gnu, x86_64-linux-musl, aarch64-linux-gnu, aarch64-linux-musl, aarch64-darwin, x86_64-darwin, x86_64-windows-msvc, x86_64-windows-gnu |
| Tracking | /docs/implementation/0045/phase-11-cross-tier1 |
Sub-phases (one per missing target)
| # | Target | Status | Commit |
|---|---|---|---|
| 11.0 | x86_64-linux-gnu (native on Linux CI) | NOT STARTED | — |
| 11.1 | x86_64-linux-musl (zig cc) | NOT STARTED | — |
| 11.2 | aarch64-linux-gnu (zig cc + qemu-user-static run-gate) | NOT STARTED | — |
| 11.3 | aarch64-linux-musl (zig cc + qemu-user-static run-gate) | NOT STARTED | — |
| 11.4 | aarch64-darwin (native on macOS CI) | NOT STARTED | — |
| 11.5 | x86_64-darwin (zig cc on macOS arm64; native on x64 macOS) | NOT STARTED | — |
| 11.6 | x86_64-windows-msvc (clang-cl) | NOT STARTED | — |
| 11.7 | x86_64-windows-gnu (zig cc + mingw) | NOT STARTED | — |
Deliverables. .github/workflows/transpiler3-c-cross.yml extends cross-aot.yml's pattern: Linux CI runs Linux + cross-arch run-gates via qemu-user-static; macOS CI runs macOS run-gates; Windows CI runs Windows run-gates.
Risks. Apple-silicon code-sign on x86_64-darwin cross-build (ad-hoc sign as part of the build driver). Windows SDK availability in CI (clang-cl needs MSVC headers; resolved by using the GitHub Actions windows-2025 runner image).
Phase 12. WASM / WASI
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Every Phase 1-10 fixture compiles via mochi build --target=wasm32-wasi and runs byte-equal vs vm3 under wasmtime |
| Targets | wasm32-wasi |
| Tracking | /docs/implementation/0045/phase-12-wasm-wasi |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 12.0 | wasi-sdk vendored under transpiler3/c/toolchain/wasi-sdk/; build driver routes --target=wasm32-wasi through it | NOT STARTED | — |
| 12.1 | Precise allocator + shadow-stack root scanning to replace BDWGC (BDWGC has no upstream WASI port) | NOT STARTED | — |
| 12.2 | Stream/agent surface narrowed: no threading; M:N scheduler collapses to a single-fibre cooperative loop | NOT STARTED | — |
| 12.3 | wasmtime-driven run-gate in CI; fixture corpus subset matching the narrowed surface | NOT STARTED | — |
Risks. Precise GC swap (12.1) is the largest single technical risk; it is a separate sub-phase so it can land or defer without blocking the rest of Phase 12.
Phase 13. APE / Cosmopolitan
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | mochi build --apex produces one APE binary; the same binary runs and produces byte-equal output on Linux, macOS, Windows, FreeBSD, NetBSD, OpenBSD CI runners |
| Targets | one-binary covering linux+macOS+windows+BSDs |
| Tracking | /docs/implementation/0045/phase-13-ape |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 13.0 | cosmocc vendored under transpiler3/c/toolchain/cosmocc/ | NOT STARTED | — |
| 13.1 | --apex build path: cosmocc replaces zig cc, output is .com.dbg + .com (stripped APE) | NOT STARTED | — |
| 13.2 | Runtime adjustments: BDWGC works under Cosmopolitan's libc (Cosmo) per upstream tests; stream/agent surface preserved | NOT STARTED | — |
| 13.3 | Cross-OS CI runners exercise the same .com artefact: Linux + macOS + Windows + FreeBSD (cirrus-ci) | NOT STARTED | — |
Risks. Cosmopolitan upstream churn (pin a version; track via transpiler3/c/toolchain/cosmocc/VERSION).
Phase 14. LLM bindings
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | LLM fixture suite (~20 cases: generate, embed, chat against OpenAI/Anthropic/Google/llama.cpp) compiles + runs byte-equal vs vm3 in replay mode (recorded cassettes); live-mode runs available behind a flag |
| Targets | host triple only (cross matrix lands incrementally per Phase 11 closeout) |
| Tracking | /docs/implementation/0045/phase-14-llm |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 14.0 | Provider abstraction in transpiler3/c/runtime/src/llm/: trait-like vtable per provider | NOT STARTED | — |
| 14.1 | OpenAI provider (libcurl + yyjson) | NOT STARTED | — |
| 14.2 | Anthropic provider | NOT STARTED | — |
| 14.3 | Google provider | NOT STARTED | — |
| 14.4 | llama.cpp local provider (linked in only when --with-llama is set; otherwise stub) | NOT STARTED | — |
| 14.5 | Replay-mode cassettes: HTTP request/response recorded under tests/transpiler3/c/cassettes/llm/; cassette layer intercepts libcurl | NOT STARTED | — |
Risks. Cassette layer integration with libcurl's multi-interface (14.5). The planned approach follows upstream curl-impersonate's pattern; if that proves too costly, swap to a per-call shim.
Phase 15. Datalog / logic
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Logic fixture suite (~20 cases: ancestors, reachability, magic-set, stratified negation) compiles + runs byte-equal vs vm3 on host triple |
| Targets | host triple only (cross matrix follows Phase 11 closeout) |
| Tracking | /docs/implementation/0045/phase-15-datalog |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 15.0 | Lower datalog rules to semi-naive evaluation: transpiler3/c/lower/logic.go | NOT STARTED | — |
| 15.1 | Magic-set transform for goal-directed evaluation (Bancilhon et al., PODS 1986) | NOT STARTED | — |
| 15.2 | Stratified negation (deferred sub-phase if test corpus demands) | NOT STARTED | — |
Risks. Worst-case blow-up on naive evaluation (covered by 15.1 magic-set transform).
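For orientation, semi-naive evaluation (15.0) can be sketched on the classic reachability program path(x, y) :- edge(x, y). path(x, z) :- path(x, y), edge(y, z). Each round joins only the facts that were new in the previous round (delta) against edge, instead of rejoining all of path. The dense boolean matrix, the node count, and the demo_ names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_N 5                       /* number of nodes in the toy graph */

typedef struct demo_rel { bool m[DEMO_N][DEMO_N]; } demo_rel;

/* Fills *path with the transitive closure of *edge.
 * Returns the number of rounds that derived at least one new fact. */
static int demo_reach(const demo_rel *edge, demo_rel *path) {
    demo_rel delta = *edge;            /* facts new in the previous round */
    *path = *edge;                     /* base rule: path(x,y) :- edge(x,y) */
    int rounds = 0;
    for (;;) {
        demo_rel next;
        memset(&next, 0, sizeof next);
        bool any = false;
        for (int x = 0; x < DEMO_N; x++)
            for (int y = 0; y < DEMO_N; y++) {
                if (!delta.m[x][y]) continue;   /* join delta, not all of path */
                for (int z = 0; z < DEMO_N; z++)
                    if (edge->m[y][z] && !path->m[x][z]) {
                        next.m[x][z] = true;
                        any = true;
                    }
            }
        if (!any) break;               /* fixpoint: nothing new was derived */
        for (int x = 0; x < DEMO_N; x++)
            for (int z = 0; z < DEMO_N; z++)
                if (next.m[x][z]) path->m[x][z] = true;
        delta = next;
        rounds++;
    }
    return rounds;
}
```

Naive evaluation would rescan every path fact each round; restricting the join to delta is what keeps the per-round work proportional to the newly derived facts.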
Phase 16. Sanitiser matrix
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Full Phase 1-15 fixture corpus compiles + runs clean under ASan, UBSan, TSan, MSan, LeakSan on x86_64-linux-gnu and aarch64-darwin |
| Targets | x86_64-linux-gnu, aarch64-darwin (tier-1 cross-arch sanitisers run nightly under Phase 17) |
| Tracking | /docs/implementation/0045/phase-16-sanitisers |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 16.0 | ASan + LeakSan clean on full corpus, both hosts | NOT STARTED | — |
| 16.1 | UBSan clean (signed overflow, alignment, OOB shifts, null deref) | NOT STARTED | — |
| 16.2 | TSan clean on streams/agents corpus | NOT STARTED | — |
| 16.3 | MSan clean on host that supports it (Linux only; Apple-silicon MSan still unsupported per upstream) | NOT STARTED | — |
| 16.4 | Build profile --debug wires sanitisers; CI nightly job runs the matrix | NOT STARTED | — |
Risks. BDWGC false-positive races under TSan (upstream documents the workaround: GC_no_dls = 1 + race-suppression file).
Phase 17. Reproducibility gate
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Each release-profile fixture, rebuilt twice on two different CI hosts (Linux CI runner + macOS CI runner cross-building to a third triple), produces byte-identical binaries (SHA-256 equality) |
| Targets | all tier-1 triples |
| Tracking | /docs/implementation/0045/phase-17-reproducibility |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 17.0 | SOURCE_DATE_EPOCH honoured; __DATE__ / __TIME__ never embedded | NOT STARTED | — |
| 17.1 | -ffile-prefix-map=$PWD=. and -fdebug-prefix-map=$PWD=. strip absolute paths | NOT STARTED | — |
| 17.2 | Function/global ordering by sorted IR identifier, never by hash-map iteration order; verified by 17.5 | NOT STARTED | — |
| 17.3 | All non-libc deps static-linked; bundled toolchain pinned by SHA-256 | NOT STARTED | — |
| 17.4 | Sample artefact SHA-256 published per release tag | NOT STARTED | — |
| 17.5 | .github/workflows/transpiler3-c-repro.yml rebuilds the corpus twice and diffs SHA-256 | NOT STARTED | — |
Risks. Cosmopolitan --apex build path reproducibility (17 closeout includes APE; if upstream changes the embed signature, the gate catches it before release).
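The ordering rule in 17.2 amounts to sorting identifiers with a locale-independent byte comparison before emission. A minimal sketch, with hypothetical function names and made-up identifiers in the usage:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* strcmp compares bytes, so the order does not depend on the host locale
 * (unlike strcoll) or on the order in which symbols were discovered. */
static int demo_cmp_ident(const void *a, const void *b) {
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sorts IR identifiers in place so emission order is a pure function of
 * the identifier set, never of hash-map iteration order. */
static void demo_sort_idents(const char **idents, size_t n) {
    assert(idents != NULL || n == 0);
    qsort(idents, n, sizeof idents[0], demo_cmp_ident);
}
```

Two builds that discover the same symbols in different orders then emit them identically, which is the property the 17.5 workflow checks by diffing SHA-256 digests.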
Phase 18. Performance gate
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | Median fixture wall-clock time on the BG corpus is within 2× of the equivalent Go-backend build, on x86_64-linux-gnu and aarch64-darwin |
| Targets | x86_64-linux-gnu, aarch64-darwin |
| Tracking | /docs/implementation/0045/phase-18-perf |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 18.0 | Benchmark harness: tests/transpiler3/c/bench/ with the BG kernels (sum_loop, fib_iter, hello_world, etc.) | NOT STARTED | — |
| 18.1 | Wall-clock, peak RSS, binary size (release/strip), and compile time recorded per fixture | NOT STARTED | — |
| 18.2 | Per-release report published to a static page | NOT STARTED | — |
| 18.3 | Regression alert: > 10% wall-clock regression vs previous main posts a comment on the PR | NOT STARTED | — |
Risks. 2× of Go is a soft gate; tighten or loosen on measurement (Open Question §4).
Phase 19. v1.0 release
| Field | Value |
|---|---|
| Status | NOT STARTED |
| Commit | — |
| Gate | mochi build ships on tier-1 triples with all of Phases 1-18 green; the user-facing docs/manual/build.md page documents the build flow with no caveats; release notes filed; binaries available via the standard release channel |
| Targets | all tier-1 triples |
| Tracking | /docs/implementation/0045/phase-19-release |
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 19.0 | docs/manual/build.md written; user-facing CLI help text matches | NOT STARTED | — |
| 19.1 | Release notes + changelog entry | NOT STARTED | — |
| 19.2 | Tier-1 binaries built, signed, published | NOT STARTED | — |
| 19.3 | MEP-45 status flipped to Final; this MEP file gets a closeout block dated and committed | NOT STARTED | — |
Risks. Documentation drift between this MEP and the manual page (covered by spec-in-sync rule: any change to the build surface updates both).
Open Questions
- GC choice lock-in. BDWGC vs. precise GC for v1. Default recommendation: BDWGC, swappable.
- Python FFI shape. Embedded libpython3 (heavy, fast) vs. out-of-process RPC (light, slow). Recommend both, default chosen by import shape.
- APE tier. --apex as tier-1 or experimental.
- Perf gate target. "2x of Go" too lax or too strict.
- Sanitiser matrix cadence. Per-PR or only per-merge.
- Reproducibility gate cadence. Per-PR or only release branches.
- LLM CI mode. Replay cassettes vs. live provider calls.
- MSVC tier. Direct cl.exe tier-1 or only via clang-cl.
- Effect handlers. Confirm deferred to follow-up MEP (no algebraic effects in current language surface).
References
Research notes (this MEP)
Twelve notes under ~/notes/Spec/0045/:
| # | Title |
|---|---|
| 01 | Language surface |
| 02 | Design philosophy |
| 03 | Prior-art transpilers |
| 04 | Runtime building blocks |
| 05 | Codegen design |
| 06 | Type-system lowering |
| 07 | C target and portability |
| 08 | Dataset pipeline lowering |
| 09 | Streams and agents |
| 10 | Build system |
| 11 | Testing and CI gates |
| 12 | Risks and alternatives |
Standards
- ISO/IEC 9899:2024 (C23)
- ISO/IEC 9899:2018 (C17, fallback floor)
- DWARF 5
- IEEE 754-2019
- Unicode 15.1
- WASI 0.2
Papers
- Maranget, "Compiling Pattern Matching to Good Decision Trees" (ML Workshop 2008).
- Reinking, Xie, de Moura, Leijen, "Perceus: Garbage Free Reference Counting with Reuse" (PLDI 2021).
- Lorenzen, Leijen, Swierstra, "FP²: Fully in-Place Functional Programming" (ICFP 2023).
- Lorenzen, Leijen, "Reference Counting with Frame Limited Reuse" (ICFP 2022).
- Xie, Leijen, "Generalized Evidence Passing for Effect Handlers" (ICFP 2021).
- Schuster, Brachthäuser, Ostermann, "Compiling Effect Handlers in Capability-Passing Style" (ICFP 2020).
- Sivaramakrishnan et al., "Retrofitting Effect Handlers onto OCaml" (PLDI 2021), arXiv:2104.00250.
- Bacon, Rajan, "Concurrent Cycle Collection in Reference Counted Systems" (ECOOP 2001).
- Blackburn, McKinley, "Immix: A Mark-Region Garbage Collector" (PLDI 2008).
- Tofte, Talpin, "Region-Based Memory Management" (Information and Computation, 1997).
- Grossman et al., "Region-Based Memory Management in Cyclone" (PLDI 2002).
- Bancilhon, Maier, Sagiv, Ullman, "Magic Sets and Other Strange Ways to Implement Logic Programs" (PODS 1986).
Libraries
BDWGC (ivmai/bdwgc), MMTk (mmtk/mmtk-core), mimalloc, scudo, minicoro (edubart/minicoro), libuv, libxev, yyjson (ibireme/yyjson), libfyaml, cwisstable (google/cwisstable), utf8proc, simdutf, libcurl, ngtcp2, nghttp3, llama.cpp, QuickJS-NG, zig (ziglang/zig), Cosmopolitan (jart/cosmopolitan), wasi-sdk, emscripten.
Comparable transpilers studied
Nim, Crystal, Vala, TinyGo, Cython, ATS, Faust, MLton, Soufflé, Roc, Cone, Koka, Lean 4, Effekt, Vélus, Hare, Zig, Odin, Cosmopolitan, Bun, TigerBeetle. See note 03.
Project context
- MEP 41. Memory Safety: runtime hardening contract that the generated C inherits (PIE, RELRO, FORTIFY, CET, BTI, MTE).
- MEP 13. Algebraic Data Types and Match: the surface MEP-45's Maranget pass lowers from.
- MEP 5. Type Inference and MEP 4. Type System: the typed-AST contract MEP-45 reads.
- MEP 23. Compile-time budget: cross-language baseline benchmarks the perf gate references.
- Implementation tracking: per-phase status pages, filled in along the way.
Copyright
This document is placed in the public domain.