Phase 13. LLM (generate)
| Field | Value |
|---|---|
| MEP | MEP-51 §Phases · Phase 13 |
| Status | LANDED (13.0 only; live providers DEFERRED) |
| Started | 2026-05-29 20:04 (GMT+7) |
| Landed | 2026-05-29 20:11 (GMT+7) |
| Tracking issue | (filled at ship) |
| Tracking PR | (filled at ship) |
Gate
TestPhase13LLM: 11 fixtures green on CPython 3.12.7 in transpiler3/python/build/phase13_test.go. The corpus is the C target's tests/transpiler3/c/fixtures/llm/ set copied verbatim (each fixture is a subdirectory with <name>.mochi, <name>.out renamed from expect.txt, and a cassette/ subdir of <djb2hash>.txt files). Each fixture compiles, runs python -m mochi_user_<name> with MOCHI_LLM_CASSETTE_DIR pointed at the fixture's cassette dir, and byte-compares stdout to the .out file. Coverage: all 4 providers (openai, anthropic, google, llama); default + explicit model: slot; generate as a variable initializer, inside a function body, concatenated into a string, and called multiple times. The full Phase 1-13 regression (go test ./transpiler3/python/... -count=1) finishes in 42.8s with zero regressions.
Goal-alignment audit
Mochi's generate <provider> { ... } is the LLM call surface. For the Python target, the load-bearing v1 use case is reproducible CI: fixtures that pin a prompt to a recorded response, replayed on every go test run, never hitting the network. The C target already established this contract (cassette directory, DJB2-hashed <provider>\0<model>\0<prompt> filenames, trailing-newline strip on read); the Python target's job in Phase 13.0 is to match that contract byte-for-byte so existing recorded cassettes are portable across targets. Landing 13.0 unblocks: (1) the documented "summarise this notebook cell with Claude / GPT and feed it to a Mochi pipeline" Jupyter story; (2) cross-target test parity for any program that uses generate.
Live providers (OpenAI 1.50+ SDK, Anthropic 0.40+ SDK, google-generativeai 0.5+, llama-cpp-python local) are deferred to Phase 13.1+. The deferral is intentional: live calls in CI are flaky, expensive, and gate on API keys in the secret store, none of which materially improve the user-facing surface. The dispatch shape (mochi_llm_generate(provider, model, prompt) -> str) is provider-pluggable: a Phase 13.1 patch adds branches on provider inside the same helper without touching the lower or the call site emit.
Streaming (AsyncIterator[MochiToken]) is deferred to Phase 11.1 (async colour pass) plus Phase 13.2; the v1 corpus has no streaming fixture, and synchronous full-response calls cover every program in the corpus.
Sub-phases
| # | Scope | Status | Commit |
|---|---|---|---|
| 13.0 | generate <provider> { model: ..., prompt: ... } lowers to mochi_llm_generate(provider, model, prompt) against mochi_runtime.llm; cassette-mode replay via MOCHI_LLM_CASSETTE_DIR; DJB2 hash matches C runtime byte-for-byte | LANDED 2026-05-29 | (filled at ship) |
| 13.1 | OpenAI live provider via the openai SDK 1.50+; key from OPENAI_API_KEY; cassette write-through under MOCHI_LLM_CASSETTE_RECORD | DEFERRED | -- |
| 13.2 | Anthropic, Google, llama-cpp-python live providers | DEFERRED | -- |
| 13.3 | Streaming AsyncIterator[MochiToken] surface; rides on the Phase 11.1 async colour pass | DEFERRED | -- |
| 13.4 | Pluggable cassette codecs (gzip, lz4) for large recorded responses | DEFERRED | -- |
Sub-phase 13.0 -- cassette-mode LLM via DJB2-keyed replay
Goal-alignment audit (13.0)
A Mochi program that calls generate openai { prompt: "..." } should run on the Python target without changes, deterministically, in CI, against a checked-in cassette. The C target made this possible by hashing the inputs and reading a flat file; the Python target needs the same hash and the same lookup so the cassette files port across targets. The 11 fixtures in tests/transpiler3/c/fixtures/llm/ are the spec: they were checked in on the C side with hash-keyed filenames, and the Python target reads them as-is. If the Python hash drifted by a byte, every cassette would miss and stdout would be empty.
Decisions made (13.0)
DJB2 with NUL field separators, byte-equivalent to the C runtime. The C runtime's llm_hash_key(provider, model, prompt) is h=5381; for each byte: h = (h*33) ^ byte; h = (h*33) ^ 0; between fields. The Python port uses Python ints masked to (1<<64)-1 after every multiply and XOR so the wrap-around matches uint64_t. The separator step (h = (h*33) ^ 0) is left as h = (h*33) since ^ 0 is identity; the multiply is the load-bearing operation that prevents ("a","bc") from colliding with ("ab","c"). Validated against 11 known cassette filenames produced by the C target: 11/11 match.
Filename format <key>.txt, decimal uint64. The C snprintf("%llu", key) writes decimal; Python f"{key}" writes decimal. Hex would have been a more compact filename but would have created a cross-target portability gap for no benefit. Single trailing newline stripped on read so cassette files can be edited in any editor that auto-appends \n.
Cassette dir resolution via MOCHI_LLM_CASSETTE_DIR env var. The runtime helper reads the env once per call. The test runner (runPythonFixture in build_test.go) sets the env to the fixture's cassette/ directory just before python -m <pkg> starts, so each fixture sees only its own cassette set; no global state, no test interleaving risk. Live mode (no env set) prints the same cassette not found shape to stderr that the C runtime does and returns "", so a missing-cassette program still completes rather than aborting.
Module emit imports mochi_llm_generate lazily. The lowerer tracks needsLLM bool; only programs that contain a generate expression pull in from mochi_runtime.llm import mochi_llm_generate. The hello-world fixture stays import-free.
provider is a string literal at the call site, not a runtime variable. The Mochi surface syntax generate openai { ... } makes the provider a compile-time token; the c aotir captures it as LLMGenerateExpr.Provider string. The Python lowerer emits it as a pysrc.StrLit. A future "dynamic provider dispatch" surface would need a new IR node and is out of scope for Phase 13.0.
model defaults to empty string in the IR. The c lower already substitutes StringLit("") when the source omits model:, so the Python lower does not have a separate default path. Cassette filenames recorded with empty model differ from those with explicit model (the hash diverges after the first NUL), which matches the recorded fixtures.
Cassette directory not copied into the build output. The build cache key is over the .mochi bytes + Python toolchain version + mep51-phase13 marker. Adding the cassette dir to the hash would force a rebuild on every cassette edit, which is exactly the wrong default: the user is iterating on the recording, not on the program. The runtime reads the dir at execution time from the env var the test harness supplies.
Fixture corpus (11 fixtures, ported from the C target)
tests/transpiler3/python/fixtures/phase13-llm/:
| Fixture | Provider | Model | Surface |
|---|---|---|---|
generate_text | openai | (default) | let r = generate openai { prompt: "Say hello." }; print bare |
generate_anthropic | anthropic | (default) | Count to 3 |
generate_anthropic_model | anthropic | claude-3-haiku-20240307 | Repeat: hello world |
generate_concat | openai | (default) | "Answer: " + r |
generate_google | (default) | Add 1 and 1 | |
generate_in_fun | openai | (default) | generate inside a user function body |
generate_in_var | openai | (default) | let s = "Sky color: " + r |
generate_llama | llama | (default) | Hello via local llama |
generate_multiple | openai | (default) | Two generate calls in one program (two cassette files) |
generate_openai_model | openai | gpt-4o | Square root of 16 |
generate_with_model | openai | gpt-4o-mini | Say hi |
Each subdirectory ships <name>.mochi, <name>.out, and cassette/<djb2>.txt. TestPhase13LLM walks the directory, runs runPythonFixture which detects the cassette dir alongside the .mochi and sets MOCHI_LLM_CASSETTE_DIR before invoking python -m <pkg>. All 11 fixtures pass on CPython 3.12.7.
Files changed
| File | Purpose |
|---|---|
runtime/python/mochi_runtime/llm.py (new) | mochi_llm_generate(provider, model, prompt) -> str; _llm_hash_key DJB2 port; cassette read with trailing-newline strip |
transpiler3/python/lower/llm.go (new) | lowerLLMGenerateExpr emits mochi_llm_generate(provider_str, model_expr, prompt_expr) and sets needsLLM |
transpiler3/python/lower/lower.go | needsLLM bool slot; dispatch *aotir.LLMGenerateExpr; conditional from mochi_runtime.llm import mochi_llm_generate import |
transpiler3/python/build/build.go | Cache marker bumped mep51-phase12 -> mep51-phase13 |
transpiler3/python/build/build_test.go | runPythonFixture sets MOCHI_LLM_CASSETTE_DIR=<fixture>/cassette when that dir exists |
transpiler3/python/build/phase13_test.go (new) | TestPhase13LLM walks phase13-llm/ subdirectories |
tests/transpiler3/python/fixtures/phase13-llm/ (new) | 11 fixture subdirs ported from the C target |
Deferred work
- 13.1 OpenAI live provider. Add a
provider == "openai"branch inmochi_llm_generatethat calls the officialopenaiSDK 1.50+ withOPENAI_API_KEY. HonourMOCHI_LLM_CASSETTE_RECORDfor write-through recording. Deferred because cassette-mode covers the load-bearing CI case; live calls add cost and flakiness without surfacing new programs. - 13.2 Anthropic, Google, llama-cpp-python live providers. Same pattern, one branch each. Deferred for the same reason.
- 13.3 Streaming
AsyncIterator[MochiToken]. Rides on the Phase 11.1 async colour pass and a streaming-cassette format. No v1 fixture exercises streaming. - 13.4 Pluggable cassette codecs. Some real-world recorded responses can be megabytes; gzip / lz4 transparent decode would keep the cassette dir compact. Deferred until a corpus fixture justifies it.
- Hash collision detection. Two different (provider, model, prompt) triples that DJB2-collide would silently map to the same cassette. The collision space is astronomical (DJB2 is 64-bit) but a Phase 13.x audit could record the full key string in the cassette body alongside the response and verify on read.