Skip to main content

Phase 13. LLM (generate)

FieldValue
MEPMEP-51 §Phases · Phase 13
StatusLANDED (13.0 only; live providers DEFERRED)
Started2026-05-29 20:04 (GMT+7)
Landed2026-05-29 20:11 (GMT+7)
Tracking issue(filled at ship)
Tracking PR(filled at ship)

Gate

TestPhase13LLM: 11 fixtures green on CPython 3.12.7 in transpiler3/python/build/phase13_test.go. The corpus is the C target's tests/transpiler3/c/fixtures/llm/ set copied verbatim (each fixture is a subdirectory with <name>.mochi, <name>.out renamed from expect.txt, and a cassette/ subdir of <djb2hash>.txt files). Each fixture compiles, runs python -m mochi_user_<name> with MOCHI_LLM_CASSETTE_DIR pointed at the fixture's cassette dir, and byte-compares stdout to the .out file. Coverage: all 4 providers (openai, anthropic, google, llama); default + explicit model: slot; generate as a variable initializer, inside a function body, concatenated into a string, and called multiple times. The full Phase 1-13 regression (go test ./transpiler3/python/... -count=1) finishes in 42.8s with zero regressions.

Goal-alignment audit

Mochi's generate <provider> { ... } is the LLM call surface. For the Python target, the load-bearing v1 use case is reproducible CI: fixtures that pin a prompt to a recorded response, replayed on every go test run, never hitting the network. The C target already established this contract (cassette directory, DJB2-hashed <provider>\0<model>\0<prompt> filenames, trailing-newline strip on read); the Python target's job in Phase 13.0 is to match that contract byte-for-byte so existing recorded cassettes are portable across targets. Landing 13.0 unblocks: (1) the documented "summarise this notebook cell with Claude / GPT and feed it to a Mochi pipeline" Jupyter story; (2) cross-target test parity for any program that uses generate.

Live providers (OpenAI 1.50+ SDK, Anthropic 0.40+ SDK, google-generativeai 0.5+, llama-cpp-python local) are deferred to Phase 13.1+. The deferral is intentional: live calls in CI are flaky, expensive, and gate on API keys in the secret store, none of which materially improve the user-facing surface. The dispatch shape (mochi_llm_generate(provider, model, prompt) -> str) is provider-pluggable: a Phase 13.1 patch adds branches on provider inside the same helper without touching the lower or the call site emit.

Streaming (AsyncIterator[MochiToken]) is deferred to Phase 11.1 (async colour pass) plus Phase 13.2; the v1 corpus has no streaming fixture, and synchronous full-response calls cover every program in the corpus.

Sub-phases

#ScopeStatusCommit
13.0generate <provider> { model: ..., prompt: ... } lowers to mochi_llm_generate(provider, model, prompt) against mochi_runtime.llm; cassette-mode replay via MOCHI_LLM_CASSETTE_DIR; DJB2 hash matches C runtime byte-for-byteLANDED 2026-05-29(filled at ship)
13.1OpenAI live provider via the openai SDK 1.50+; key from OPENAI_API_KEY; cassette write-through under MOCHI_LLM_CASSETTE_RECORDDEFERRED--
13.2Anthropic, Google, llama-cpp-python live providersDEFERRED--
13.3Streaming AsyncIterator[MochiToken] surface; rides on the Phase 11.1 async colour passDEFERRED--
13.4Pluggable cassette codecs (gzip, lz4) for large recorded responsesDEFERRED--

Sub-phase 13.0 -- cassette-mode LLM via DJB2-keyed replay

Goal-alignment audit (13.0)

A Mochi program that calls generate openai { prompt: "..." } should run on the Python target without changes, deterministically, in CI, against a checked-in cassette. The C target made this possible by hashing the inputs and reading a flat file; the Python target needs the same hash and the same lookup so the cassette files port across targets. The 11 fixtures in tests/transpiler3/c/fixtures/llm/ are the spec: they were checked in on the C side with hash-keyed filenames, and the Python target reads them as-is. If the Python hash drifted by a byte, every cassette would miss and stdout would be empty.

Decisions made (13.0)

DJB2 with NUL field separators, byte-equivalent to the C runtime. The C runtime's llm_hash_key(provider, model, prompt) is h=5381; for each byte: h = (h*33) ^ byte; h = (h*33) ^ 0; between fields. The Python port uses Python ints masked to (1<<64)-1 after every multiply and XOR so the wrap-around matches uint64_t. The separator step (h = (h*33) ^ 0) is left as h = (h*33) since ^ 0 is identity; the multiply is the load-bearing operation that prevents ("a","bc") from colliding with ("ab","c"). Validated against 11 known cassette filenames produced by the C target: 11/11 match.

Filename format <key>.txt, decimal uint64. The C snprintf("%llu", key) writes decimal; Python f"{key}" writes decimal. Hex would have been a more compact filename but would have created a cross-target portability gap for no benefit. Single trailing newline stripped on read so cassette files can be edited in any editor that auto-appends \n.

Cassette dir resolution via MOCHI_LLM_CASSETTE_DIR env var. The runtime helper reads the env once per call. The test runner (runPythonFixture in build_test.go) sets the env to the fixture's cassette/ directory just before python -m <pkg> starts, so each fixture sees only its own cassette set; no global state, no test interleaving risk. Live mode (no env set) prints the same cassette not found shape to stderr that the C runtime does and returns "", so a missing-cassette program still completes rather than aborting.

Module emit imports mochi_llm_generate lazily. The lowerer tracks needsLLM bool; only programs that contain a generate expression pull in from mochi_runtime.llm import mochi_llm_generate. The hello-world fixture stays import-free.

provider is a string literal at the call site, not a runtime variable. The Mochi surface syntax generate openai { ... } makes the provider a compile-time token; the c aotir captures it as LLMGenerateExpr.Provider string. The Python lowerer emits it as a pysrc.StrLit. A future "dynamic provider dispatch" surface would need a new IR node and is out of scope for Phase 13.0.

model defaults to empty string in the IR. The c lower already substitutes StringLit("") when the source omits model:, so the Python lower does not have a separate default path. Cassette filenames recorded with empty model differ from those with explicit model (the hash diverges after the first NUL), which matches the recorded fixtures.

Cassette directory not copied into the build output. The build cache key is over the .mochi bytes + Python toolchain version + mep51-phase13 marker. Adding the cassette dir to the hash would force a rebuild on every cassette edit, which is exactly the wrong default: the user is iterating on the recording, not on the program. The runtime reads the dir at execution time from the env var the test harness supplies.

Fixture corpus (11 fixtures, ported from the C target)

tests/transpiler3/python/fixtures/phase13-llm/:

FixtureProviderModelSurface
generate_textopenai(default)let r = generate openai { prompt: "Say hello." }; print bare
generate_anthropicanthropic(default)Count to 3
generate_anthropic_modelanthropicclaude-3-haiku-20240307Repeat: hello world
generate_concatopenai(default)"Answer: " + r
generate_googlegoogle(default)Add 1 and 1
generate_in_funopenai(default)generate inside a user function body
generate_in_varopenai(default)let s = "Sky color: " + r
generate_llamallama(default)Hello via local llama
generate_multipleopenai(default)Two generate calls in one program (two cassette files)
generate_openai_modelopenaigpt-4oSquare root of 16
generate_with_modelopenaigpt-4o-miniSay hi

Each subdirectory ships <name>.mochi, <name>.out, and cassette/<djb2>.txt. TestPhase13LLM walks the directory, runs runPythonFixture which detects the cassette dir alongside the .mochi and sets MOCHI_LLM_CASSETTE_DIR before invoking python -m <pkg>. All 11 fixtures pass on CPython 3.12.7.

Files changed

FilePurpose
runtime/python/mochi_runtime/llm.py (new)mochi_llm_generate(provider, model, prompt) -> str; _llm_hash_key DJB2 port; cassette read with trailing-newline strip
transpiler3/python/lower/llm.go (new)lowerLLMGenerateExpr emits mochi_llm_generate(provider_str, model_expr, prompt_expr) and sets needsLLM
transpiler3/python/lower/lower.goneedsLLM bool slot; dispatch *aotir.LLMGenerateExpr; conditional from mochi_runtime.llm import mochi_llm_generate import
transpiler3/python/build/build.goCache marker bumped mep51-phase12 -> mep51-phase13
transpiler3/python/build/build_test.gorunPythonFixture sets MOCHI_LLM_CASSETTE_DIR=<fixture>/cassette when that dir exists
transpiler3/python/build/phase13_test.go (new)TestPhase13LLM walks phase13-llm/ subdirectories
tests/transpiler3/python/fixtures/phase13-llm/ (new)11 fixture subdirs ported from the C target

Deferred work

  • 13.1 OpenAI live provider. Add a provider == "openai" branch in mochi_llm_generate that calls the official openai SDK 1.50+ with OPENAI_API_KEY. Honour MOCHI_LLM_CASSETTE_RECORD for write-through recording. Deferred because cassette-mode covers the load-bearing CI case; live calls add cost and flakiness without surfacing new programs.
  • 13.2 Anthropic, Google, llama-cpp-python live providers. Same pattern, one branch each. Deferred for the same reason.
  • 13.3 Streaming AsyncIterator[MochiToken]. Rides on the Phase 11.1 async colour pass and a streaming-cassette format. No v1 fixture exercises streaming.
  • 13.4 Pluggable cassette codecs. Some real-world recorded responses can be megabytes; gzip / lz4 transparent decode would keep the cassette dir compact. Deferred until a corpus fixture justifies it.
  • Hash collision detection. Two different (provider, model, prompt) triples that DJB2-collide would silently map to the same cassette. The collision space is astronomical (DJB2 is 64-bit) but a Phase 13.x audit could record the full key string in the cassette body alongside the response and verify on read.