MEP 56. Mochi-to-Ruby transpiler

| Field | Value |
|---|---|
| MEP | 56 |
| Title | Mochi-to-Ruby transpiler |
| Author | Mochi core |
| Status | Active |
| Type | Standards Track |
| Created | 2026-05-29 12:01 (GMT+7) |
| Depends | MEP-4 (Type System), MEP-5 (Type Inference), MEP-13 (ADTs and Match), MEP-45 (C transpiler, IR reuse), MEP-46 (BEAM transpiler, IR reuse), MEP-47 (JVM transpiler, IR reuse), MEP-48 (.NET transpiler, IR reuse) |
| Research | |
| Tracking | /docs/implementation/0056/ |

Abstract

Mochi today ships vm3 (mochi run), a C transpiler producing native single-file binaries (MEP-45), a BEAM transpiler producing supervised concurrent runtimes (MEP-46), a JVM transpiler producing Maven-Central-interoperable jars (MEP-47), and a .NET transpiler producing NuGet-interoperable assemblies (MEP-48). None of these targets the Ruby ecosystem: 180,000+ gems on RubyGems.org with Rails, Sinatra, Sequel, Sidekiq, Capybara, Pry, RSpec, Bundler, and the entire Ruby on Rails web stack; Jupyter via IRuby; mruby for embedded; Tebako for single-file packaging; TruffleRuby for native-image AOT. MEP-56 specifies a fifth transpiler pipeline that lowers a type-checked Mochi program to Ruby source emitted via a structural rtree AST, runnable on CRuby 3.2+, JRuby 10, TruffleRuby 33, and (with a Mochi subset) mruby 4.

The pipeline reuses MEP-45's typed-AST and aotir IR, plus the monomorphisation, match-to-decision-tree, and closure-conversion passes shared with MEP-46, MEP-47, and MEP-48. It forks at the emit stage: instead of emitting ISO C23 (MEP-45), Core Erlang via cerl (MEP-46), Java source via JavaPoet (MEP-47), or C# source via Roslyn (MEP-48), it emits Ruby source as rtree.SourceFile trees, then renders to disk with 2-space indent and a # frozen_string_literal: true magic comment.

Seven packaging targets ship together: --target=ruby-source (default for mochi build --lang=ruby) writes a single .rb next to the runtime library on $LOAD_PATH; --target=ruby-gem produces a RubyGems .gemspec plus lib/ layout, buildable with gem build; --target=ruby-bundle produces a Bundler-managed directory (Gemfile + script) runnable with bundle exec; --target=iruby-kernel produces a Jupyter notebook (.ipynb) targeting the IRuby kernel; --target=tebako produces a Tebako packing layout plus a press.sh driving tebako press for a single-file native binary; --target=truffle-native produces a TruffleRuby native-image build layout plus a build script; --target=mruby produces an mruby compile layout (script + build_config.rb + build script) for embedded use.

The intended master correctness gate is byte-equal stdout from the produced .rb (or built binary) versus vm3 on the entire fixture corpus, across CRuby 3.2 LTS, CRuby 3.4, CRuby 4.0, JRuby 10, and TruffleRuby 33. CRuby 3.2 LTS and 3.4 are CI-gated (blocking ubuntu + non-blocking macos matrix); CRuby 4.0 (Homebrew) is verified locally. JRuby 10, TruffleRuby 33, and mruby 4 land as N.1, N.2, N.3 sub-phases per the umbrella-phase coverage rule. See /docs/implementation/0056/ for the live status. vm3 is the recording oracle for expect.txt; the transpiler does not link against or depend on vm3.

Five load-bearing decisions:

  1. Ruby source via a structural rtree AST, not raw string emission. The default emit path constructs rtree.SourceFile trees, then calls SourceFile.RubySource() to render. The structural representation handles indentation automatically, avoids the whitespace bugs that plague string-template-based emitters, supports peephole optimisation passes, and produces debuggable output that a Rubyist can read line by line. The # frozen_string_literal: true magic comment (supported since Ruby 2.3 and a long-standing community convention) is emitted on every file so that string literals are frozen and interned at parse time.

  2. CRuby 3.2 minimum, CRuby 4.0 preferred. Ruby 3.2 (December 2022) is the floor because it is the first release where every required feature is available together. Data.define (immutable value classes, the canonical lowering target for Mochi records and sum variants) and Set autoload (no require "set" needed) both arrive in 3.2, and YJIT stops being experimental there. The other lowering targets predate the floor: case/in pattern matching (the canonical lowering target for Mochi match) has been stable since 3.0, with the find pattern losing its experimental status in 3.2, and Hash insertion-order preservation (the canonical lowering target for omap) has been guaranteed since 1.9. Ruby 4.0 (December 2025) brings further YJIT performance work and promotes Set from an autoloaded stdlib class to a core class; the M:N thread scheduler has been available since 3.3. The build emits the magic comment # frozen_string_literal: true for compatibility; CI gates against CRuby 3.2 LTS (blocking) and 3.4 (blocking ubuntu, non-blocking macos); CRuby 4.0 is verified locally via Homebrew. Ruby 3.1 and earlier are rejected (no Data.define).

  3. Thread::SizedQueue + Mutex for streams; ordinary methods for agents; no Ractor. Mochi streams lower to a Mochi::Runtime::Stream (bounded MPMC broadcast). Each subscriber gets its own Thread::SizedQueue; emit blocks only when the slowest live subscriber's queue is full. subscribe_limit(s, N) returns a LimitedQueue that silently drops on push once size ≥ N (for back-pressure-tolerant consumers). Mochi agents lower to ordinary Ruby classes; an agent's state is instance variables; intent calls are method calls. Ractor (Ruby's actor primitive) is explicitly rejected as a runtime default because of its strict object-sharing rules (no closures over mutable state, no shared instance vars) that would force a lossy semantic translation. Users can adopt Ractor manually via FFI.

  4. Reuse MEP-45's aotir IR. The IR is target-agnostic; monomorphisation, match-to-decision-tree, and closure-conversion passes run once and feed five backends. The fork is at the emit pass: transpiler3/ruby/lower/ lowers aotir to Ruby-source structural nodes. Sharing the IR keeps the five targets semantically aligned and amortises pass-implementation work.

  5. Stdlib as fat runtime; mochi-runtime as thin runtime. The mochi-runtime gem provides only what stdlib does not: Mochi::Runtime::Stream (broadcast MPMC over Thread::SizedQueue), Mochi::Runtime::LimitedQueue (drop-on-full subscriber), Mochi::Runtime::Panic (typed exception carrying an int code), and Mochi::Runtime::IO.putln (deterministic float formatting). Everything else (HTTP via net/http, JSON via json, CSV via csv, file I/O, regex, time, locale) goes through stdlib directly. ActiveSupport, Sequel, Sinatra, and other heavyweight gems are explicitly rejected as runtime deps.
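The broadcast behaviour described in decision 3 can be illustrated with a minimal sketch. This is not the mochi-runtime implementation: the class name SketchStream is hypothetical, subscribe_limit and LimitedQueue are omitted, and the only assumption is that each subscriber owns one Thread::SizedQueue that emit pushes into.

```ruby
# Hypothetical stand-in for a broadcast stream; NOT the real Mochi::Runtime::Stream.
class SketchStream
  def initialize(capacity)
    @capacity = capacity
    @subscribers = []
    @lock = Mutex.new
  end

  # Every subscriber gets its own bounded queue.
  def subscribe
    queue = Thread::SizedQueue.new(@capacity)
    @lock.synchronize { @subscribers << queue }
    queue
  end

  # Push the value to every subscriber. SizedQueue#push blocks while a queue
  # is full, so emit stalls only when some subscriber has fallen behind.
  def emit(value)
    targets = @lock.synchronize { @subscribers.dup }
    targets.each { |queue| queue.push(value) }
  end
end

stream = SketchStream.new(4)
first = stream.subscribe
second = stream.subscribe
stream.emit(1)
stream.emit(2)
received = [first.pop, first.pop, second.pop, second.pop]
```

Both subscribers see every emitted value in order, which is the property that distinguishes a broadcast stream from a plain work queue.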

The gate for each delivery phase is empirical: every Mochi source file in tests/transpiler3/ruby/fixtures/ must compile via the Ruby pipeline and produce stdout that diffs clean against the expect.txt recorded by vm3. ruby -c clean on emitted code is the secondary gate. gem build cleanly producing a .gem (for the gem target), bundle install --local resolving (for the bundle target), jupyter nbformat validation (for the kernel target), and bash -n clean (for the Tebako / TruffleNative / MRuby build scripts) are the tertiary gates.

Motivation

Mochi today targets vm3 (mochi run), the C target (MEP-45), the BEAM target (MEP-46), the JVM target (MEP-47), and the .NET target (MEP-48). None deliver what Ruby uniquely provides:

  1. RubyGems and Bundler. As of 2026-05, RubyGems.org hosts 180,000+ gems spanning web frameworks (Rails, Sinatra, Roda, Hanami), background work (Sidekiq, GoodJob), persistence (Sequel, ActiveRecord, ROM), testing (RSpec, Minitest, Capybara), APIs (Grape, dry-rb), HTTP clients (Faraday, HTTParty), and CLI tooling (Thor, GLI). A Mochi program needing Rails view rendering, Devise auth, or Sidekiq job processing can import an existing gem through a Gemfile, with versions pinned by a SHA-checksummed Gemfile.lock.

  2. Convention over configuration culture. Ruby community conventions (snake_case methods, PascalCase classes, frozen string literals, 2-space indent, do...end for multi-line blocks, {...} for one-liners) are well-established and largely undisputed. The transpiler emits idiomatic Ruby a Rubyist can step through under pry or byebug without reverse-engineering generated naming.

  3. Data.define (Ruby 3.2+). Immutable value classes with structural equality, pattern-matching deconstruction (#deconstruct and #deconstruct_keys auto-generated), and a lightweight syntax (Person = Data.define(:name, :age)). The canonical lowering target for Mochi records and sum-type variants. No equivalent existed pre-3.2; Struct was the closest but is mutable.

  4. case/in pattern matching (stable since Ruby 3.0; the find pattern is stable as of 3.2). Mochi match expr { Variant(field) => ... } lowers one-to-one to Ruby case expr in Variant(field) then ... end. Deconstruction, find-pattern ([*, 42, *]), hash patterns with rest ({name:, **rest}), and pinned references (^x) are all supported. The canonical lowering target for Mochi match.

  5. Jupyter via IRuby. The IRuby Jupyter kernel (gem install iruby, iruby register --force) executes Ruby in Jupyter notebooks. Mochi-emitted Ruby drops into a notebook code cell with no special handling, opening Mochi to data-science workflows.

  6. Tebako, TruffleRuby, mruby. Three independent paths to native binaries from Ruby source: Tebako packages a CRuby interpreter plus the script via libfsm + dwarfs; TruffleRuby's GraalVM native-image AOT compiles via the Truffle interpreter; mruby is a minimal embedded interpreter compiled to a single static binary. All three reach distribution shapes (single-file binaries, embedded use) that the other Mochi targets either don't reach or reach via different toolchains.

  7. Ruby on Rails. The dominant Ruby web framework, used by Shopify, GitHub, GitLab, Basecamp, Airbnb, Stripe, and Square. Mochi-on-Ruby cannot drop into a Rails app as a model or controller subclass (that requires deep Rails reflection) but it can expose its logic as a callable Ruby module that a Rails controller invokes, opening Mochi to the Rails ecosystem via FFI.
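As a concrete illustration of items 3 and 4, the sketch below shows a hand-written sum type built from Data.define and matched with case/in. The Shape, Circle, and Rect names are hypothetical, and the code is written by hand rather than taken from transpiler output; it requires Ruby 3.2 or later.

```ruby
# Hypothetical sum type: variants are Data classes grouped under a module.
module Shape
  Circle = Data.define(:r)
  Rect = Data.define(:w, :h)
end

# Keyword patterns rely on the auto-generated #deconstruct_keys.
def area(shape)
  case shape
  in Shape::Circle(r:)
    3 * r * r # integer approximation of pi keeps the example exact
  in Shape::Rect(w:, h:)
    w * h
  end
end

rect = Shape::Rect.new(w: 2, h: 3)
same_rect = Shape::Rect.new(w: 2, h: 3)
rect_area = area(rect)
structurally_equal = (rect == same_rect)
has_setter = rect.respond_to?(:w=)
```

Data instances compare by value and expose no setters, which is the distinction from Struct that item 3 draws.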

The C target remains the right choice for embedded targets and minimal runtime footprint. The BEAM target remains the right choice for hot-reload services and OTP supervision. The JVM target remains the right choice for Maven Central deep-cut library access and Android. The .NET target remains the right choice for NuGet interop and Windows enterprise. The Ruby target is the right choice for RubyGems interop, Rails web ecosystem, Jupyter / IRuby data-science workflows, and Tebako / TruffleRuby / mruby AOT distribution.

Specification

This section is normative.

1. Pipeline and IR reuse

The Ruby pipeline reuses MEP-45's aotir IR. The emit stage forks: transpiler3/ruby/lower/Lower(prog, fileBase, moduleName) consumes *aotir.Program and returns *rtree.SourceFile. transpiler3/ruby/emit/Emit(sf, workDir) writes the rendered source to disk. The driver at transpiler3/ruby/build/Driver.Build(src, out, target) glues parse → typecheck → clower.Lower → ruby/lower.Lower → emit.
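To give a feel for what the emit stage might write, here is a hand-written approximation of an emitted file containing one lifted closure. It follows the naming described in the §3 lowering table (__fn_N, __env, __a0), but the exact output of the transpiler is not reproduced here; make_adder and the captured variable n are hypothetical.

```ruby
# frozen_string_literal: true

module Main
  # Lifted body of a closure such as `fun(x: int): int => x + n`.
  # The captured variable n arrives through the __env Hash.
  def self.__fn_0(__env, __a0)
    __a0 + __env[:n]
  end

  # The closure value is a lambda that forwards to the lifted method.
  def self.make_adder(n)
    __env = { n: n }
    ->(__a0) { __fn_0(__env, __a0) }
  end
end

add_five = Main.make_adder(5)
result = add_five.call(3)
```

Because the closure is an ordinary lambda, callers invoke it with .call, which is why the higher-order builtins in §3 trampoline through .call.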

2. Toolchain detection

build.resolveToolchain() resolves Ruby in order:

  1. $MOCHI_RUBY (env override)
  2. Homebrew slots: /opt/homebrew/opt/ruby{,@3.4,@3.3,@3.2}/bin/ruby, /usr/local/opt/ruby/bin/ruby
  3. exec.LookPath("ruby")

Rejects Ruby < 3.2 (no Data.define). Returns Toolchain{Ruby, Bundle, Major, Minor}.
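The resolution order and version floor can be sketched as follows. The real resolver lives in the Go build package; this Ruby version is only an illustration, and the helper names candidate_rubies and supported? are hypothetical.

```ruby
require "rubygems" # provides Gem::Version; usually already loaded

MIN_RUBY = Gem::Version.new("3.2")

# Order candidates as listed above: env override, Homebrew slots, then PATH.
# The PATH lookup result is passed in so the sketch stays side-effect free.
def candidate_rubies(env, homebrew_slots, path_lookup)
  [env["MOCHI_RUBY"], *homebrew_slots, path_lookup].compact.reject(&:empty?)
end

# A toolchain is acceptable only at 3.2 or newer (Data.define is required).
def supported?(version_string)
  Gem::Version.new(version_string) >= MIN_RUBY
end

candidates = candidate_rubies(
  { "MOCHI_RUBY" => "/custom/bin/ruby" },
  ["/opt/homebrew/opt/ruby/bin/ruby"],
  "/usr/bin/ruby"
)
```

Comparing with Gem::Version rather than raw strings avoids mis-ordering versions such as 3.10 and 3.2.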

3. Surface-syntax lowering

| Mochi construct | Ruby lowering |
|---|---|
| let x = e | x = e |
| var x = e | x = e (no Ruby-side const distinction; var-vs-let is type-system-only) |
| if/elsif/else | if/elsif/else/end |
| while c { ... } | while c; ...; end |
| for i in lo..hi { ... } | (lo...hi).each do \|i\| ...; end |
| for x in xs { ... } | xs.each do \|x\| ...; end |
| fun f(a: T): U { ... } | def self.f(a); ...; end (inside module Main) |
| let f = fun(a: T): U => body | body is lifted to a synthesised module method __fn_N(__env, __a0, ...); f itself is ->(__a0, ...) { __fn_N(__env, __a0, ...) } with __env a {:k => v} Hash captured at the binding site. Parameter names are renamed __a0, __a1, ... |
| match e { Variant(x) => arm } | case e; in UnionName::Variant(x:); arm; end (the UnionName:: prefix and field: keyword-pattern shape are emitted; arms are newline-separated, not then clauses) |
| type T = A \| B | module T; A = Data.define(:f1, ...); B = Data.define(:f1, ...); end (variants are wrapped in a module T namespace) |
| record User { id: int } | User = Data.define(:id) |
| type Pair = { a: int, b: int } (anonymous record via type … = { … }) | Pair = Data.define(:a, :b) (lowered identically to the record form, exercised by TestPhase31Integration/try_catch_around_record_field_access) |
| AgentType { field: value, ... } (struct-literal agent constructor) | AgentType.new(field: value, ...) (alongside spawn AgentType(); both forms lower through lowerAgent / lowerAgentSpawn) |
| Stream s = make_stream(N) | s = Mochi::Runtime::Stream.new(N) |
| subscribe(s) | s.subscribe (returns Thread::SizedQueue) |
| subscribe_limit(s, N) | s.subscribe_limit(N) (returns Mochi::Runtime::LimitedQueue) |
| emit(s, v) | s.emit(v) |
| recv_sub(sub) | sub.pop |
| Channel make_chan(N) | Thread::SizedQueue.new(N) |
| chan <- v | chan.push(v) |
| <- chan | chan.pop |
| agent A { state ... on Msg ... } | class A; def initialize(field:); @field = field; end; def intent(...); ...; end; end, emitted inside module Main |
| spawn AgentType() | A.new(field1: zero1, field2: zero2, ...) (synthesised zero-value field map) |
| a.intent(arg) | a.intent(arg) |
| async expr | Thread.new { expr } |
| await fut | fut.value |
| try { ... } catch e { ... } | begin; ...; rescue Mochi::Runtime::Panic => __exc; e = __exc.code; ...; end |
| panic(code, msg) | raise Mochi::Runtime::Panic.new(code, msg) |
| break / continue | break / next (inside each blocks break exits the outer enumeration) |
| from x in xs where p select e | desugared by clower into each + accumulator; final shape is (begin; __out = []; xs.each { \|x\| __out << e if p }; __out; end) |
| ... order by k skip s take t | .sort_by { \|x\| k } then .drop(s).first(t) |
| Datalog query parent(_, Y) | evaluated at compile-time via semi-naive fixpoint in transpiler3/ruby/lower/datalog.go; emitted as a frozen Ruby Array literal of pre-computed result tuples |
| min(xs) / max(xs) / sum(xs) | xs.min / xs.max / xs.sum |
| in(x, xs) | xs.include?(x) |
| map(xs, f) / filter(xs, p) / reduce(xs, acc, f) | xs.map { \|__x\| (f).call(__x) }, .select { ... }, .inject(acc) { \|__a, __x\| (f).call(__a, __x) } (every higher-order builtin trampolines through .call because closures are lifted-method lambdas) |
| len(xs) / len(s) / len(m) / len(set) | .length (Ruby Array#length, String#length, Hash#size, Set#size) |
| keys(m) / values(m) | m.keys / m.values |
| append(xs, v) | (xs + [v]) (functional, returns a new list) |
| slice(xs, lo, hi) / xs[lo:hi] | (xs[lo...hi] \|\| []) (returns [] on out-of-range start) |
| sort(xs) | xs.sort |
| abs(n) / floor(f) / ceil(f) | n.abs / f.floor / f.ceil |
| str(v) | v.to_s |
| upper(s) / lower(s) | s.upcase / s.downcase |
| index(s, sub) / contains(s, sub) | s.index(sub) / s.include?(sub) |
| substring(s, lo, hi) | (s[lo...hi] \|\| "") |
| reverse(s) / split(s, sep) / join(xs, sep) | s.reverse / s.split(sep) / xs.join(sep) |
| Set literal set{1, 2} or annotated let s: set<int> = {1, 2} / add(s, x) / has(s, x) | Set.new([1, 2]) / (s \| Set[x]) / s.include?(x) (the bare {1, 2} form must be disambiguated by type context; the set{...} prefix and let s: set<int> annotation are both accepted, see TestPhase16SetsOMaps) |
| OMap literal omap{"a": 1} with let m: omap<string, int> annotation / m["a"] / has(m, k) | insertion-ordered Hash, m.fetch("a"), m.key?(k) (the explicit omap<K, V> type annotation is required for round-trip; see TestPhase29EdgeCases/omap_round_trip_multi_key) |
| readFile(p) | File.read(p) |
| lines(p) | File.readlines(p, chomp: true) |
| writeFile(p, s) / appendFile(p, s) | File.write(p, s) / File.open(p, "a") { \|__f\| __f.write(s) } |
| loadCSV(p) | (require 'csv'; CSV.read(p)) |
| saveCSV(rows, p) | (require 'csv'; CSV.open(p, "w") { \|__c\| rows.each { \|__r\| __c << __r } }) |
| jsonDecode(s) (also accepted: json_decode(s)) | (require 'json'; JSON.parse(s)) (both camelCase and snake_case identifiers route to the same aotir.JsonDecodeExpr node; tests use the snake_case form) |
| httpGet(url) (also accepted: fetch <url> keyword form) | (require 'open-uri'; URI.parse(url).open.read) (the keyword form fetch "https://..." is used by TestPhase18IOJsonCsvHttp/http_get; both surface forms lower to aotir.HttpGetExpr) |
| (int)f numeric cast (also accepted: int(x)) | f.to_i (truncates toward zero in Ruby; int(x) and (int)x lower identically through aotir.NumCastExpr; covered by TestPhase29EdgeCases/int_cast_from_float) |
| a / b integer divide (BinDivI64) | a / b (known divergence: Ruby Integer#/ floor-divides; Mochi spec is truncate-toward-zero matching C/JVM/Swift/.NET; for matched-sign operands the two agree, for mixed-sign with a non-zero remainder Ruby is one less. See TestPhase29EdgeCases/negative_int_floor_div_known_divergence for the locked-in pattern and lower.go:1187 for the fix-later flag) |
| !b boolean negation | !b |
| Bareword identifier collisions (end, class, module, ...) | suffix with _ (e.g., end_) to avoid Ruby keyword clash |
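The integer-division divergence noted in the table is easy to reproduce. The sketch below contrasts Ruby's floor division with a truncate-toward-zero helper; trunc_div is a hypothetical name and is not claimed to be the fix the transpiler will adopt.

```ruby
# Hypothetical helper: divide magnitudes, then restore the sign, so the
# quotient is rounded toward zero instead of toward negative infinity.
def trunc_div(a, b)
  quotient = a.abs / b.abs # raises ZeroDivisionError when b is 0, like a / b
  (a < 0) ^ (b < 0) ? -quotient : quotient
end

pairs = [[7, 2], [-7, 2], [7, -2], [-7, -2], [-6, 2]]
pairs.each do |a, b|
  puts "#{a} / #{b}: Ruby gives #{a / b}, truncation gives #{trunc_div(a, b)}"
end
```

For -7 / 2 Ruby yields -4 while truncation yields -3; the two agree whenever the operand signs match or the division is exact.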

4. Runtime library

mochi-runtime (Apache-2.0, ~123 LOC of Ruby across runtime.rb, runtime/io.rb, runtime/panic.rb, runtime/stream.rb, runtime/version.rb) exports:

  • Mochi::Runtime::Stream.new(cap) / #subscribe / #subscribe_limit(limit) / #emit(val)
  • Mochi::Runtime::LimitedQueue#push (drops when size ≥ limit) / #pop (blocks)
  • Mochi::Runtime::Panic < StandardError with attr_reader :code (delegates message via super(msg))
  • Mochi::Runtime::IO.putln(value) and IO.format_value(value) (deterministic float formatting via %.16g, integer-valued floats get a trailing .0, nil renders as "")
  • Mochi::Runtime::VERSION constant
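The formatting rules listed for IO.format_value can be approximated in a few lines. This is a sketch under stated assumptions, not the runtime's source: the free function format_value is hypothetical, and the treatment of exponent-form and non-finite floats is a guess that the list above does not specify.

```ruby
# Hypothetical approximation of the documented formatting rules.
def format_value(value)
  case value
  when nil
    "" # nil renders as the empty string
  when Float
    text = format("%.16g", value)
    # Integer-valued floats print without a fraction under %g; add ".0".
    # Assumption: exponent forms such as "1e+20" are left untouched.
    text.match?(/\A-?\d+\z/) ? "#{text}.0" : text
  else
    value.to_s
  end
end

samples = [3.0, 0.1, 0.1 + 0.2, nil, 42, "hi"].map { |v| format_value(v) }
```

Using 16 significant digits hides the usual binary rounding noise, so 0.1 + 0.2 prints as 0.3 rather than 0.30000000000000004.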

Stdlib is the fat runtime, required inline at use sites (the emit pass does not blanket-require): csv (loadCSV / saveCSV), json (jsonDecode), open-uri (httpGet), and the always-available Set. set is auto-loadable in Ruby 3.2+ and is referenced without an explicit require.
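The inline-require shape is valid Ruby because a parenthesised sequence of statements evaluates to its last expression, and require is a no-op after the first load. A minimal sketch, with the wrapper method json_decode introduced only for illustration:

```ruby
# Hypothetical wrapper around the inline-require expression shape.
def json_decode(source)
  (require "json"; JSON.parse(source))
end

decoded = json_decode('{"name": "mochi", "tags": [1, 2]}')
decoded_again = json_decode("[true, null]") # second call: require returns false
```

Requiring at the use site keeps programs that never touch JSON, CSV, or HTTP from paying the load cost of those libraries.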

5. Build targets

| Target | Switch case | Output layout |
|---|---|---|
| TargetRubySource | ruby-source | <out>/<name>.rb |
| TargetRubyGem | ruby-gem | <out>/<name>.gemspec + <out>/lib/<name>.rb |
| TargetRubyBundle | ruby-bundle | <out>/Gemfile + <out>/<name>.rb |
| TargetIRubyKernel | iruby-kernel | <out>/<name>.ipynb (nbformat 4) |
| TargetTebako | tebako | <out>/root/<name>.rb + <out>/root/Gemfile + <out>/press.sh |
| TargetTruffleNative | truffle-native | <out>/<name>.rb + <out>/native_build.sh |
| TargetMRuby | mruby | <out>/<name>.rb + <out>/build_config.rb + <out>/mruby_build.sh |

Build scripts use MOCHI_TEBAKO_IMAGE, MOCHI_TEBAKO_RUBY, MOCHI_GRAALVM_HOME, MOCHI_MRBC, MRUBY_HOME env vars for toolchain customisation.

Phases

See /docs/implementation/0056/ for the per-phase tracking matrix. Thirty-three phases (0-32) cover language constructs (0-21), packaging targets (22-27), and audit passes (28-32).

A phase is LANDED only when its gate is green on every Ruby implementation listed for it in §6 below.

6. Target matrix

The matrix below reflects the current CI coverage plus the intended future gate. CRuby 3.2 LTS and 3.4 are CI-gated (blocking) as of audit-4 (phase 29). CRuby 4.0 (Homebrew) is verified locally. JRuby 10, TruffleRuby 33, and mruby 4 are reserved as sub-phases 29.1, 29.2, 29.3 per the umbrella-phase coverage rule and depend on container toolchain detection landing in build.go. See /docs/implementation/0056/ for the live runtime-matrix status.

| Phase scope | CRuby 3.2 | CRuby 3.4 | CRuby 4.0 | JRuby 10 | TruffleRuby 33 | mruby 4 |
|---|---|---|---|---|---|---|
| Scalars / arithmetic | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | sub-phase 29.3 |
| Lists / maps / sets | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | best-effort (29.3) |
| Records / sums (Data.define) | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | n/a (mruby has no Data.define) |
| Closures | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | sub-phase 29.3 |
| Queries / Datalog | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | best-effort (29.3) |
| Channels / Threads | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | n/a (mruby has no Thread) |
| Streams (Thread::SizedQueue) | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | n/a |
| Async (Thread) | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | n/a |
| Agents | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | n/a |
| File / JSON / CSV / HTTP | LANDED (CI) | LANDED (CI) | LANDED (local) | sub-phase 29.1 | sub-phase 29.2 | best-effort (29.3) |

mruby is intentionally a Mochi-subset target; phases 10-14 are not required to run there.

Toolchain handling currently auto-detects CRuby only (build.findRuby walks $MOCHI_RUBY, Homebrew slots, then PATH). JRuby 10, TruffleRuby 33, and mruby 4 are surfaced via user-supplied env vars (MOCHI_GRAALVM_HOME, MOCHI_MRBC, MRUBY_HOME) inside the emitted build scripts; automatic detection of those toolchains is a future sub-phase.

Alternatives considered

  1. Emit Crystal source instead. Rejected: Crystal is a separate language with its own toolchain; the audience is much smaller; mixing Crystal-emit with Ruby-emit splits the runtime library into two implementations.
  2. Lower agents to Ractor. Rejected: Ractor's strict object-sharing model (frozen objects only, no shared mutable state, no closures over outer vars) would force semantic translations that a Mochi user did not write.
  3. Emit one big monolithic class. Rejected: top-level Mochi code maps naturally to top-level Ruby module-level code; nesting everything under a class adds an extra Foo.run call site for no semantic gain.
  4. Bundle the runtime as a single inlined file per emission. Rejected: makes generated code untraceable in a debugger and breaks gem install semantics. Shipping mochi-runtime as a real gem lets users require 'mochi/runtime' from any Ruby program.
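The object-sharing restriction behind alternative 2 can be observed directly: a Ractor block that reads an outer local variable is rejected when the Ractor is created. The snippet below is a small demonstration on CRuby 3.x or later, not part of the transpiler; the variable names are hypothetical.

```ruby
Warning[:experimental] = false # silence the "Ractor is experimental" notice

counter = 0
isolation_error =
  begin
    # The block closes over `counter`, so it cannot be isolated for a Ractor.
    Ractor.new { counter + 1 }
    nil
  rescue ArgumentError => e
    e.message
  end

puts "Ractor rejected the closure: #{isolation_error}"
```

A lifted-closure lowering that captures its environment in a Hash runs into this rule immediately, which is one reason the Thread-based design is the default.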

Risks

  1. Multi-runtime divergence. JRuby and TruffleRuby ship Data.define and case/in but have subtle differences from CRuby (e.g., JRuby's thread scheduling). CI must run the full fixture corpus against every supported runtime, not just CRuby.
  2. mruby surface gap. mruby has no Thread, no Data.define (as of mruby 3.3), no csv, and a smaller net/http. Streams, agents, async, and parts of the file/HTTP surface are unimplementable. The MEP scopes mruby as a Mochi-subset target only.
  3. Ractor pressure. As Ractor matures (3.4 and 4.0 add more capabilities), users may ask for a Ractor-backed agent option. The current MEP keeps Ractor out of scope; a follow-up sub-MEP can add --runtime=ractor if demand emerges.
  4. Bundler version drift. Bundler ships with Ruby but has its own version cadence. The Gemfile generated by TargetRubyBundle does not pin a Bundler version; if downstream tools require one, the target may need to ship a Gemfile.lock with a BUNDLED WITH section.

Acknowledgements

This MEP builds on MEP-45 (C transpiler) for the aotir IR and clower pipeline, on MEP-46 / MEP-47 / MEP-48 for the multi-backend lowering pattern, and on the Ruby community's Data.define / case-in / frozen-string-literals conventions that make Ruby 3.2+ an unusually clean transpilation target.