Skip to main content

Phase 7. Query DSL

FieldValue
MEPMEP-52 §Phases · Phase 7
StatusNOT STARTED
Startedn/a
Landedn/a
Tracking issuen/a
Tracking PRn/a

Gate

TestPhase7Query: 40 fixtures green on Node 22, Deno 2, Bun 1.1, Chromium 130. Secondary gates: tsc strict zero diagnostics; runtime-budget gate (the runtime additions for query and join stay below 1 KB gzipped); execution-budget gate (a representative joins fixture must complete within 2x the vm3 wall-clock baseline on the linux-x64 CI runner).

Goal-alignment audit

Mochi's query DSL is the front-end for almost every data-shaped program: report generation, ETL, leaderboards, analytics. The TS surface gives us two routes: (a) Array.prototype.map/filter/... (eager, allocates intermediate arrays), or (b) TC39 Iterator helpers (Iterator.from(...).map(...).filter(...), lazy, single-pass). The MEP-52 decision is to prefer iterator helpers, falling back to Array.prototype only when the IR proves the source is finite and the user wants the intermediate T[] reified. Async sources lower to AsyncIterable<T> and the for await form. Joins, group-by, and top-K are implemented in @mochi/runtime/query because the platform does not ship them.

Sub-phases

#ScopeStatusCommit
7.0Simple comprehensions: [f(x) for x in xs if p(x)] to Iterator.from(xs).filter(p).map(f).toArray()NOT STARTEDn/a
7.1Multi-source comprehensions: [f(x, y) for x in xs for y in ys] to flatMap chainNOT STARTEDn/a
7.2Joins: hash join (default), merge join (when both sides are sorted), nested-loop (when one side is small); IR-driven choiceNOT STARTEDn/a
7.3Group-by: Map.groupBy(xs, keyFn) (ES2024) plus aggregationsNOT STARTEDn/a
7.4Top-K: min-heap implementation in @mochi/runtime/query/heapNOT STARTEDn/a
7.5Async sources: AsyncIterable<T> plumbing through query pipelines via Iterator.fromAsync and async generatorsNOT STARTEDn/a

Sub-phase 7.0, Simple comprehensions

Decisions made (7.0)

Mochi: [u.name for u in users if u.age > 18]

TypeScript:

Iterator.from(users).filter((u) => u.age > 18n).map((u) => u.name).toArray()

Why iterator helpers (ES2024): lazy single-pass; no intermediate array for filter's output. Iterator.from(array) is constant-time (array is already iterable). The terminal .toArray() materialises only when the consumer expects a T[]. Node 22 / Deno 2 / Bun 1.1 / Chromium 122+ all ship iterator helpers natively.

Fallback to Array.prototype: when the IR signals xs is an array, the producer is finite, and the consumer is a single for/for-of loop, the emitter prefers xs.filter(...).map(...) (slightly less indirection, similar perf in V8). The IR-pass query/lowerStrategy.go picks the form.

Sub-phase 7.1, Multi-source comprehensions

Decisions made (7.1)

Mochi: [f(x, y) for x in xs for y in ys if p(x, y)]

TypeScript:

Iterator.from(xs)
.flatMap((x) => Iterator.from(ys).map((y) => [x, y] as const))
.filter(([x, y]) => p(x, y))
.map(([x, y]) => f(x, y))
.toArray()

The intermediate [x, y] tuple is unavoidable for non-fused chains; TypeScript and V8 do not eliminate it via escape analysis. The emitter falls back to a hand-coded nested loop when the IR proves the tuple escapes (e.g., the comprehension returns objects whose lifetime exceeds the loop).

Sub-phase 7.2, Joins

Decisions made (7.2)

Hash join (default for equi-join):

// SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id
const usersById = new Map<bigint, User>(users.map((u) => [u.id, u]));
const result: Array<{ name: string; amount: bigint }> = [];
for (const o of orders) {
const u = usersById.get(o.userId);
if (u !== undefined) result.push({ name: u.name, amount: o.amount });
}

Merge join (when both sides are sorted by the join key): two-pointer walk over both arrays; emitter only chooses this when the IR proves both sides are sorted (via a Sorted provenance tag).

Nested-loop join (when one side is small): cardinality threshold is 16 (heuristic from Iterator.from(small).flatMap(...) benchmarks on V8); below that the hash table allocation cost dominates.

Choice algorithm: the IR query/joinStrategy.go pass selects per join. The TS emitter consumes the choice and emits the right shape.

Sub-phase 7.3, Group-by

Decisions made (7.3)

Mochi: [(k, sum(o.amount for o in g)) for (k, g) in group(orders, by=o.userId)]

TypeScript:

const groups: Map<bigint, Order[]> = Map.groupBy(orders, (o) => o.userId);
const result = Iterator.from(groups)
.map(([k, g]) => [k, g.reduce((acc, o) => acc + o.amount, 0n)] as const)
.toArray();

Map.groupBy is ES2024 (TC39 Stage 4, native in Node 22, Deno 2, Bun 1.1, Chromium 117+). Object.groupBy is also available but returns a plain object; the emitter prefers Map.groupBy for the Map<K, V[]> shape (avoids the prototype-pollution surface of object-as-map).

Aggregations: sum, count, min, max, avg lower to inlined reduces. mochiAvg(xs) is a runtime helper because avg over bigints wants exact rational return type (the Mochi spec defines avg as int -> float for now; the emitter emits Number(sum) / xs.length).

Sub-phase 7.4, Top-K

Decisions made (7.4)

@mochi/runtime/query/heap: hand-rolled min-heap, roughly 80 LOC of TS:

export class MinHeap<T> {
private buf: T[] = [];
constructor(private readonly cmp: (a: T, b: T) => number) {}
size(): number { return this.buf.length; }
peek(): T | undefined { return this.buf[0]; }
push(v: T): void { /* sift up */ }
pop(): T | undefined { /* swap, sift down */ }
}

top-K lowering: [x for x in xs order by f(x) limit K] lowers to a single pass that maintains a K-size min-heap (over the (-key, value) for top-K-largest) and emits the heap contents at the end.

Choice vs full sort: emitter uses heap when K < len(xs) / 4 (heuristic), otherwise falls back to .toSorted((a, b) => cmp(a, b)).slice(0, K).

Sub-phase 7.5, Async sources

Decisions made (7.5)

AsyncIterable<T>: Mochi stream<T> lowers to AsyncIterable<T> (Phase 10 lands the full stream surface). The query DSL pipes async sources through Iterator.fromAsync(asyncSource).filter(...).map(...).

For-await fallback: when the IR signals an async source plus a synchronous consumer that wants the full result, the emitter emits a manual for await accumulator. Iterator.fromAsync is TC39 Stage 4 in ES2024.

Backpressure: for await is naturally pull-based; the producer awaits the consumer at each iteration. No queue is interposed (the agent's queue from Phase 9 handles cross-task buffering separately).

Files (planned)

FilePurpose
transpiler3/typescript/lower/query.goComprehension and query DSL lowering; iterator-helper vs Array.prototype choice
transpiler3/typescript/lower/joins.goHash / merge / nested-loop join lowering; strategy selection from IR
transpiler3/typescript/lower/groupby.goMap.groupBy emission and aggregation inlining
transpiler3/typescript/lower/topk.goTop-K min-heap emission and full-sort fallback
runtime3/typescript/src/query/heap.tsMinHeap<T> for top-K
runtime3/typescript/src/query/aggregations.tsmochiSum, mochiCount, mochiAvg, etc.
transpiler3/typescript/build/phase07_test.goTestPhase7Query
tests/transpiler3/typescript/fixtures/phase07-query/40 fixtures

Test set

  • TestPhase7Query, 40 fixtures four-runtime.
  • TestPhase7QueryBudget, the runtime's query/ sub-path stays under 1 KB gzipped.
  • TestPhase7QueryPerf, a 1M-row representative joins fixture completes within 2x vm3 wall-clock baseline.

Deferred work

  • arquero, danfojs, duckdb-wasm, polars-js. All rejected per [[08-dataset-pipeline]] §12.
  • Window functions (row_number() over (partition by k order by t)). Phase 7 ships group-by only; window functions are a v1.5 candidate.
  • Distributed query (sharded over multiple workers). Not in MEP-52 scope.