MEP-45 research note 07, The C target and portability
Author: research pass for MEP-45. Date: 2026-05-22 (GMT+7).
This note covers the output side of the transpiler. What flavour of
C we emit, what compilers must accept it, which architectures and OSes
get binary artefacts, what hardening is on by default, and what style
rules the generated code follows so that a reviewer (or a tool like
clang-tidy) sees consistent output.
1. C standard
Target: C23 (ISO/IEC 9899:2024), with a clean fallback to C17 (ISO/IEC 9899:2018) when a target compiler is too old. The hard floor is C11.
Rationale:
- C23 is the first ISO C with
nullptr,constexpr,typeof,[[attributes]],_BitInt(N), the<stdckdint.h>checked-arithmetic header, the<stdbit.h>bit-manipulation header,#embed, and an unambiguous two's-complement mandate. Every one of these maps to something the transpiler wants to emit:nullptr-> the obvious literal for?Tlowering.constexpr-> emit compile-time constants forlet x: int = 42at file scope withoutstatic constshenanigans.typeof-> macros formochi_swap(a, b)and the generic helpers without_Genericselection.[[fallthrough]],[[maybe_unused]],[[nodiscard]]-> emit cleanly without compiler-specific pragmas._BitInt(N)-> a future arbitrary-precision integer story.<stdckdint.h>ckd_add/sub/mul-> integer overflow checks in debug mode without home-grown intrinsics.<stdbit.h>stdc_count_onesetc -> bit ops without compiler intrinsics.#embed-> a no-fuss way to bake the standard library's.mochisources into the binary.- Two's complement guarantee -> we can rely on bit-level integer semantics without per-platform conditionals.
The C17 fallback uses, in order: compiler-specific equivalents
(__typeof__, __attribute__, __builtin_*), then home-grown
intrinsics in mochi/core/compat.h. The transpiler picks the floor
based on the detected compiler, not the user (so the user gets the
best output their compiler can take).
2. Compiler matrix
| Compiler | Version floor | C23 status (2026) | Notes |
|---|---|---|---|
| clang | 18.0 | Most of C23 in -std=c23. Missing: full #embed in some 18.x; complete in 19+ | Default driver for mochi build; clang is what zig cc ships. |
| gcc | 14.0 | Most of C23 in -std=c23. #embed since 15. | Default on most Linux distros by 2026. |
| msvc | 19.40 (VS 2022 17.10) | Partial. nullptr, constexpr, attributes mostly there. _BitInt no. | We compile through clang-cl on Windows in the supported config; cl.exe direct is best-effort. |
| zig cc | 0.16 | Whatever the bundled clang has; 0.16 carries clang 19+. | The cross-compilation driver we use. |
| cosmocc | 4.x | Bundled clang or gcc; whichever Justine ships. | Used for the APE build. |
| tcc | mob | C11 + extensions; no C23. | --ffast-build fallback for mochi run-style quick iteration. Not for releases. |
| chibicc, cproc | various | C11 baseline | Reference targets to keep the generated C un-clever. |
Concretely: if the generated C compiles under gcc -std=c17 -Wall -Wextra -Wpedantic it must also compile under clang -std=c23 and zig cc -std=c23. The CI matrix bakes this in.
3. Architectures and OSes
Tier 1 (gated on every PR)
x86_64-linux-gnu(glibc),x86_64-linux-muslaarch64-linux-gnu,aarch64-linux-muslaarch64-darwin(macOS 14+ on Apple silicon)x86_64-darwin(macOS 13+; still alive in 2026 for a while)x86_64-windows-msvc(clang-cl),x86_64-windows-gnu(mingw via zig cc)wasm32-wasi(WASI 0.2)
Tier 2 (nightly)
aarch64-windows-msvcriscv64-linux-gnuarmv7-linux-gnueabihfx86_64-freebsd,x86_64-openbsd,x86_64-netbsdwasm32-emscripten(browser playground)
Tier 3 (best effort, no gate)
aarch64-android,x86_64-androidaarch64-ios,aarch64-ios-simloongarch64-linux-gnus390x-linux-gnu,powerpc64le-linux-gnu
4. ABI and calling conventions
The transpiler emits standard system V, AAPCS64, Windows x64, or RISC-V LP64D depending on target, by virtue of using the system C compiler. We do not pick a custom calling convention. Closure pointers follow the host pointer width.
For variadic-style internal helpers we never use C variadics (...)
because of the AAPCS64 surprise on Apple silicon (variadic args go on
the stack, not in registers). Instead, we emit explicit overloads or
wrap the args in a struct.
Pointer width
mochi_int is always int64_t regardless of host pointer width.
mochi_uint is uint64_t. We do not use intptr_t semantics for
language integers. Pointers stay opaque.
Endianness
We assume little-endian for the BG corpus byte-equality fixtures; on big-endian targets the runtime byteswaps on JSON and on the binary fixture compare. The codegen does not generate endian-specific code.
Alignment
Records align to the largest field's natural alignment. mochi_list__T
data buffers align to 16 bytes (SIMD-friendly, fits cwisstable's
contract).
5. libc matrix
| libc | Properties | Notes |
|---|---|---|
| glibc 2.38+ | full POSIX, GNU extensions | Linux desktop and server default. |
| musl 1.2.5+ | small, static-friendly | Alpine, embedded; lacks some GNU bits. |
| Bionic | Android | Different sigaction, dlfcn quirks. |
| Apple libSystem | macOS | Variadic ABI quirk; mostly POSIX. |
| ucrt / Universal C Runtime | Windows | clang-cl + ucrt is our path. |
| WASI libc | wasi-libc | Roughly POSIX-shaped; no signals, no fork. |
| Cosmopolitan libc | Justine's portable libc | Mostly POSIX-compatible. |
The runtime's portability shim mochi/core/sys.h smooths over:
clock_gettimevs.mach_absolute_timevs.QueryPerformanceCounter.pthread_*vs. Windows threads (via a thin wrapper; we do not use C11 threads because MSVC ucrt is patchy).mmapvs.VirtualAlloc.dlopen/dlsymvs.LoadLibraryW/GetProcAddress.- File path handling: UTF-8 in, the runtime widens to UTF-16 on Windows.
6. Sanitisers and dynamic checks
| Sanitiser | Compiler | Use |
|---|---|---|
| AddressSanitizer | clang, gcc, msvc | mandatory in CI matrix |
| UndefinedBehaviorSanitizer | clang, gcc | mandatory; catches signed overflow, OOB, alignment |
| ThreadSanitizer | clang, gcc | nightly, with scheduler-stress fixtures |
| MemorySanitizer | clang | nightly, finds uninit reads |
| LeakSanitizer | clang, gcc | mandatory; rules out runtime leaks under BDWGC false retention thresholds |
| ControlFlowIntegrity | clang | --secure profile only |
| SafeStack | clang | --secure profile only |
| ShadowCallStack | clang | aarch64 only |
The transpiler's output is sanitiser-friendly: no pointer
arithmetic across array boundaries, no union punning across struct
boundaries, no implicit integer conversions that lose precision (the
codegen always emits explicit casts), no memcpy of overlapping
ranges (the codegen uses memmove when overlap is possible).
7. Reproducibility
SOURCE_DATE_EPOCHhonoured throughout: the codegen never embeds__DATE__/__TIME__; the build driver passes-D__DATE__=\"redacted\"-D__TIME__=\"redacted\".-ffile-prefix-map=$PWD=.,-fdebug-prefix-map=$PWD=.strip absolute paths.- Static-link everything except libc; the libc itself is the only per-system varying piece.
- The codegen orders functions and globals by sorted MIR identifier, not by hash-map iteration order.
- The build driver sorts source files before passing to the C compiler
to keep
-fltocache hits stable. - Sample artefact digest: we publish SHA-256 of the produced binary for each tier-1 target on every release.
8. Hardening defaults
-fstack-protector-strong-D_FORTIFY_SOURCE=3(glibc 2.34+; falls back to2on older)-fPIEfor executables,-fPICfor shared libs-Wl,-z,relro,-z,nowon Linux-Wl,--icf=safefor dead-code merging (clang/gold/lld)/guard:cfon MSVC-Wl,-pie,-Wl,--dynamicbase,-Wl,--nxcompaton Windows-fcf-protection=fullon x86 CET-capable;-mbranch-protection=standardon aarch64 (PAC + BTI)- Stripped debug info into a
.dSYM/.debugsidecar; the main binary is small and stable.
Profiles:
--debug: sanitisers on, no LTO,-O0 -g3, asserts on.--dev(default): UBSan on,-O1 -g, asserts on.--release: LTO on,-O2, asserts off, debug info to sidecar.--secure: release + CFI + SafeStack + heap quarantine (mimalloc-secure or scudo).--small:-Os, no LTO, strip aggressively (kiosk / embedded).
9. Style guide for emitted C
The generated C is meant to be human-readable. Rules:
- Two-space indent. No tabs. (LLVM style, easier to diff.)
- One declaration per line.
int x = 1; int y = 2;is two lines. - Always braces on
if/else/for/while, including single-stmt. gotoonly for the cleanup-cascade pattern (goto err_close;).- No macros except in
mochi/core/*.hruntime headers. Generated code uses inline functions. - No
static inlinein.cfiles; only in headers. - Every function gets a leading source-comment with the Mochi source
span:
// mochi: foo.mochi:42:1-46:1 fn add. #linedirectives on every emitted statement so that gdb'slistshows Mochi source.- Identifier mangling per note 05 §3; no two emitted identifiers collide.
- No reserved-identifier reuse: never
_Foo, never__bar, never identifiers in the implementation's namespace. - The first lines of every generated
.cfile are:/* Generated by mochi {version} for target {triple}.Do not edit. */#include "mochi/core.h"#include "{module}.h" - Header guards use
MOCHI__{pkg}__{module}__H_form. - No trigraphs, no digraphs, no
<iso646.h>-style identifiers. - No K&R declarations. ANSI prototypes only.
- Integer literals carry the right suffix:
42L,42LL,42U,42ULLas needed; never rely on default-int rules.
10. Diagnostics
The C compiler's diagnostics are almost never what the user wants to see. Two halves of the design:
- The transpiler runs its own type-checker first; nearly all errors surface there with proper Mochi spans and good messages.
- For errors that escape (rare: macro expansions, header-only library conflicts, linker errors), the build driver post-processes the C compiler's output and rewrites file:line refs back to Mochi spans using a side-map generated at codegen time.
The driver also collapses include-cascade noise (In file included from ... In file included from ...) into a single line. Inspired by
Rust's diagnostic UX.
11. Test gate per target
For tier 1, every PR runs:
- Build the runtime + amalgamated
libmochi.aunder each compiler. - Build every fixture under
tests/cross-aot/bg/<target>/. - Run every binary under each tier-1 target where the host can run
it (
qemu-user-staticfor cross-arch on Linux CI). - Byte-equal stdout vs. the VM-recorded
expectfiles. - Sanitiser pass on a curated subset (the full corpus is heavy).
- Diff-stat the generated C against the previous release commit; large regressions on identical source flagged for review.
12. Open questions for the MEP body
- Whether to require C23 or accept C17 fallback as supported forever. Cost of supporting C17 is per-feature compat shims.
- Whether MSVC direct (not via clang-cl) is tier 1 or tier 2.
- Whether
wasm32-wasigets the same test gate as native or a slimmer one (no fork/exec). - How aggressively to apply hardening on the
--releaseprofile; stack canaries cost throughput. - Whether the
--secureprofile is a CI gate (it currently is not).
These need MEP-body resolutions.