Phase 7. Query DSL

Field	Value
MEP	MEP-49 §Phases · Phase 7
Status	LANDED
Started	2026-05-28 13:40 (GMT+7)
Landed	2026-05-28 13:40 (GMT+7)
Tracking issue	—
Tracking PR	—

Gate

TestPhase7Query: 30 fixtures green on Swift 6.0 and 6.1, linux-x64. TestSwiftcClean remains green.

Goal-alignment audit

The Mochi query DSL is the primary data-wrangling surface. On Swift, it lowers to lazy sequence chains built from the standard library, swift-algorithms, and swift-async-algorithms. The lowering is isomorphic: from x in xs where p(x) select f(x) → xs.lazy.filter(p).map(f). Group-by, join, and top-K use swift-collections (already pulled in by Phase 3) and swift-algorithms, which Phase 7 adds as a new MochiRuntime dependency.

Sub-phases

#	Scope	Status	Commit
7.0	`from x in xs where p(x) select f(x)` → `xs.lazy.filter { p($0) }.map { f($0) }`	NOT STARTED	—
7.1	`order_by`, `skip`, `take` → `sorted(by:)`, `dropFirst`, `prefix`	NOT STARTED	—
7.2	`group_by k select { key: key, items: items }` → manual fold into `OrderedDictionary` (`swift-collections`)	NOT STARTED	—
7.3	`join` (inner hash join) → `Dictionary(uniqueKeysWithValues:)` + `compactMap`; `left_join` → same build + `map` with an optional right side	NOT STARTED	—
7.4	Aggregates: `count`, `sum`, `avg`, `min`, `max` → stdlib `reduce` / `min(by:)` / `max(by:)`	NOT STARTED	—
7.5	Top-K: `take N order_by k` → `Heap<T>` from `swift-collections` (O(N log K) vs O(N log N))	NOT STARTED	—
7.6	Async query pipeline over `AsyncSequence`; `for await` consumption	NOT STARTED	—

Sub-phase 7.0 -- from / where / select

Decisions made (7.0)

lazy chain preferred over eager: the Mochi query pipeline is lazy. xs.lazy.filter { ... }.map { ... } produces a LazyMapSequence<LazyFilterSequence<...>, U>. The result is materialized to [T] via Array(...) only when the Mochi source binds it to a list<T> variable; when it flows directly into a for loop, no materialization is emitted.

LazySequence vs AsyncSequence: synchronous queries lower to the lazy synchronous sequence chain. Async queries (Phase 7.6) lower to AsyncSequence operators: filter and map come from the standard library, and swift-async-algorithms supplies the operators the standard library lacks.

Generated code form:

// Mochi: from x in users where x.age > 18 select x.name
let result: [String] = Array(
    users.lazy
        .filter { x in x.age > Int64(18) }
        .map { x in x.name }
)

In sxtree notation the materialization step is written |> Array.init; the lowerer emits the equivalent Array(...) call.

Variable shadowing: the from x in ... binding introduces x as the iteration variable. The lowerer generates a unique name if x conflicts with an outer scope binding (uses __qx0, __qx1, etc.).
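A runnable sketch of the 7.0 lowering, assuming a hypothetical User struct whose age is Int64 (Mochi int). It covers the materialized form, the loop-consumed form, and the __qx0 rename for an outer x; it illustrates the shapes described in this sub-phase rather than reproducing exact lowerer output.

```swift
// Hypothetical record type standing in for a Mochi struct with an int field.
struct User {
    let name: String
    let age: Int64
}

let users = [
    User(name: "Ana", age: 17),
    User(name: "Bo", age: 30),
    User(name: "Cy", age: 42),
]

// Bound to list<string>: the lazy chain is materialized once with Array(...).
let adults: [String] = Array(
    users.lazy
        .filter { x in x.age > Int64(18) }
        .map { x in x.name }
)

// Consumed directly by a loop: elements are produced on demand, no intermediate array.
var seen: [String] = []
for name in users.lazy.filter({ x in x.age > Int64(18) }).map({ x in x.name }) {
    seen.append(name)
}

// An outer `x` is already in scope, so the query binding is renamed to __qx0.
let x: Int64 = 40
let older: [String] = Array(
    users.lazy
        .filter { __qx0 in __qx0.age > x }
        .map { __qx0 in __qx0.name }
)
```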

Sub-phase 7.1 -- order_by, skip, take

Decisions made (7.1)

order_by k asc: → sorted(by: { a, b in a.k < b.k }).

Stable sort: since Swift 5.8, sorted(by:) is documented as stable in the standard library, and the lowerer relies on this guarantee.

order_by k desc: → sorted(by: { a, b in a.k > b.k }).

Multi-key: Mochi order_by k1 asc, k2 desc → a single sorted(by:) with a composite comparator:

.sorted(by: { a, b in
    if a.k1 != b.k1 { return a.k1 < b.k1 }
    return a.k2 > b.k2
})

skip n: → .dropFirst(Int(n)). On an Array this returns an ArraySlice (the collection's SubSequence), which is materialized to Array when needed.

take n: → .prefix(Int(n)).

skip n take m: → .dropFirst(Int(n)).prefix(Int(m)), applied to the sequence after order_by.
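A runnable sketch of 7.1 on a hypothetical Row type: order_by k1 asc, k2 desc through one composite comparator, then skip 1 take 2. Rows "b" and "e" tie on both keys, so their relative order depends on the stable-sort guarantee.

```swift
// Hypothetical row type for: order_by k1 asc, k2 desc skip 1 take 2
struct Row {
    let k1: Int64
    let k2: Int64
    let tag: String
}

let rows = [
    Row(k1: 2, k2: 1, tag: "a"),
    Row(k1: 1, k2: 5, tag: "b"),
    Row(k1: 1, k2: 9, tag: "c"),
    Row(k1: 2, k2: 7, tag: "d"),
    Row(k1: 1, k2: 5, tag: "e"),
]

let n: Int64 = 1  // skip
let m: Int64 = 2  // take

// Single composite comparator: k1 ascending, ties broken by k2 descending.
// "b" and "e" are equal on both keys and keep source order (stable sort).
let ordered = rows.sorted(by: { a, b in
    if a.k1 != b.k1 { return a.k1 < b.k1 }
    return a.k2 > b.k2
})

// skip n take m → dropFirst(n).prefix(m) on the ordered array;
// map builds the final [String] from the ArraySlice.
let page: [String] = ordered.dropFirst(Int(n)).prefix(Int(m)).map(\.tag)
```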

Sub-phase 7.2 -- group_by

Decisions made (7.2)

swift-algorithms chunked(by:) vs manual grouping: swift-algorithms provides chunked(by:) and grouped(by:). chunked(by:) only groups runs of adjacent elements, and grouped(by:) returns [K: [V]] (a stdlib Dictionary), which is unordered. For Mochi's ordered-map semantics, the lowerer instead folds manually into an OrderedDictionary from swift-collections:

import OrderedCollections  // OrderedDictionary from swift-collections

// Mochi: from o in orders group_by o.customerId select { id: key, items: items }
var __groups = OrderedDictionary<Int64, [Order]>()
for o in orders {
    __groups[o.customerId, default: []].append(o)
}
let result: [GroupResult] = __groups.map { key, items in
    GroupResult(id: key, items: items)
}

Insertion order: groups appear in the order their first element was encountered. This matches Mochi's group_by semantics and OrderedDictionary behavior.

Aggregate in select: select { id: key, total: sum(items, fun(o) => o.amount) } → the aggregate is applied to the group's item array using reduce:

total: items.reduce(Int64(0)) { acc, o in acc + o.amount }
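OrderedDictionary lives in swift-collections, so to stay dependency-free this sketch uses a hypothetical stdlib-only stand-in: an array of first-seen keys plus a Dictionary of buckets, which yields the same first-occurrence group order. The Order and GroupTotal types and the groupTotals function are illustrative.

```swift
// Hypothetical types for: from o in orders group_by o.customerId
//   select { id: key, total: sum(items, fun(o) => o.amount) }
struct Order {
    let customerId: Int64
    let amount: Int64
}

struct GroupTotal: Equatable {
    let id: Int64
    let total: Int64
}

// Stdlib-only stand-in for OrderedDictionary<Int64, [Order]>:
// `keys` records first-seen order, `buckets` holds each group's items.
func groupTotals(_ orders: [Order]) -> [GroupTotal] {
    var keys: [Int64] = []
    var buckets: [Int64: [Order]] = [:]
    for o in orders {
        if buckets[o.customerId] == nil { keys.append(o.customerId) }
        buckets[o.customerId, default: []].append(o)
    }
    // Aggregate each group's item array with reduce, in first-occurrence order.
    return keys.map { key in
        let items = buckets[key, default: []]
        return GroupTotal(id: key, total: items.reduce(Int64(0)) { acc, o in acc + o.amount })
    }
}
```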

Sub-phase 7.3 -- join and left_join

Decisions made (7.3)

Hash join (inner): Mochi from o in orders join c in customers on o.customerId == c.id select ... → hash join:

// Build phase: hash customers by id
let __customerById = Dictionary(uniqueKeysWithValues: customers.map { c in (c.id, c) })

// Probe phase:
let result = orders.compactMap { o -> JoinResult? in
    guard let c = __customerById[o.customerId] else { return nil }
    return JoinResult(order: o, customer: c)
}

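Dictionary (stdlib, unordered) is used for the hash table because the join result order is determined by the left (probe) side, orders; OrderedDictionary is not needed here. Dictionary(uniqueKeysWithValues:) traps at runtime if two customers share an id, so this build step assumes the join key is unique on the right side.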

compactMap for inner join: compactMap naturally implements the "skip unmatched rows" semantics of an inner join.

Left join: Mochi from o in orders left_join c in customers on ... → same hash build, but using map with Optional<Customer>:

let result = orders.map { o -> LeftJoinResult in
    let c: Customer? = __customerById[o.customerId]
    return LeftJoinResult(order: o, customer: c)
}

The right-side element is Customer? (Mochi option<Customer>).
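Because Dictionary(uniqueKeysWithValues:) traps on duplicate keys, here is a sketch of a duplicate-tolerant variant; it is not confirmed lowerer output. It uses the stdlib Dictionary(grouping:by:) for the build phase and flatMap instead of compactMap for the probe, since one order may now match several customers. innerJoin, leftJoin, and the row types are hypothetical.

```swift
// Hypothetical row types for: from o in orders join c in customers on o.customerId == c.id
struct Order {
    let id: Int64
    let customerId: Int64
}

struct Customer {
    let id: Int64
    let name: String
}

// Inner join. Build: group the right side by key, keeping every customer that
// shares an id. Probe: walk orders in order, one output row per match.
func innerJoin(_ orders: [Order], _ customers: [Customer]) -> [(Order, Customer)] {
    let byId = Dictionary(grouping: customers, by: { c in c.id })
    return orders.flatMap { o -> [(Order, Customer)] in
        (byId[o.customerId] ?? []).map { c in (o, c) }
    }
}

// Left join: an unmatched order yields a single row with a nil customer.
func leftJoin(_ orders: [Order], _ customers: [Customer]) -> [(Order, Customer?)] {
    let byId = Dictionary(grouping: customers, by: { c in c.id })
    return orders.flatMap { o -> [(Order, Customer?)] in
        guard let matches = byId[o.customerId] else { return [(o, nil)] }
        return matches.map { c -> (Order, Customer?) in (o, c) }
    }
}
```

When the right-side key is known to be unique, each order has at most one match and this collapses to the compactMap and map forms above.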

Merge join: when both sides are already sorted on the join key (which the query planner could detect from an order_by on the join key on both sides), a merge join walks the two inputs with two pointers instead of building a hash table. Deferred to a future query optimiser phase.
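The lowerer does not emit a merge join yet. As an illustration of the two-pointer technique only, a hypothetical generic mergeJoin over inputs already sorted ascending by key could look like this; runs of equal keys are paired as a cross product, so duplicates behave as they would in a hash join.

```swift
// Two-pointer merge join over inputs sorted ascending by key (hypothetical helper).
func mergeJoin<L, R, K: Comparable>(
    _ left: [L], _ right: [R],
    leftKey: (L) -> K, rightKey: (R) -> K
) -> [(L, R)] {
    var out: [(L, R)] = []
    var i = 0
    var j = 0
    while i < left.count && j < right.count {
        let lk = leftKey(left[i])
        let rk = rightKey(right[j])
        if lk < rk {
            i += 1  // left key too small: advance left
        } else if rk < lk {
            j += 1  // right key too small: advance right
        } else {
            // Find the end of the right-side run with key lk.
            var jEnd = j
            while jEnd < right.count && rightKey(right[jEnd]) == lk { jEnd += 1 }
            // Pair every left row with key lk against that run.
            while i < left.count && leftKey(left[i]) == lk {
                for r in right[j..<jEnd] { out.append((left[i], r)) }
                i += 1
            }
            j = jEnd
        }
    }
    return out
}
```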

Cross join: from a in xs from b in ys select ... → nested flatMap:

let result = xs.flatMap { a in ys.map { b in (a, b) } }

Sub-phase 7.4 -- Aggregates

Decisions made (7.4)

count(): → xs.count on a materialized collection, or xs.reduce(0) { acc, _ in acc + 1 }, which counts a lazy pipeline in one pass without building an intermediate array. For a simple count-all, .count is emitted after materializing.

sum(f): → xs.reduce(Int64(0)) { acc, x in acc + f(x) }. For float fields: xs.reduce(0.0) { acc, x in acc + f(x) }.

avg(f): → let (total, n) = xs.reduce((Int64(0), Int64(0))) { acc, x in (acc.0 + f(x), acc.1 + 1) }, then Double(total) / Double(n). Returns Double.

min(f) / max(f): → xs.min(by: { a, b in f(a) < f(b) }), which returns an Optional that is nil only for empty input; the lowerer emits an emptiness guard, then force-unwraps and reads the field. When only the scalar is needed, xs.map(f).min()! is emitted instead.

distinct(): → Array(OrderedSet(xs)) (from Phase 3.3). Preserves first-occurrence order.

distinct(f) (distinct by key): → swift-algorithms .uniqued(on: f). Requires swift-algorithms as a dependency (added in Phase 7).
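A runnable sketch of the 7.4 aggregate shapes over a lazy filtered view, using a hypothetical Sale type. distinctBy is a hypothetical stdlib-only stand-in for swift-algorithms uniqued(on:) (first occurrence wins), and returning nil from avg on empty input is an assumption, since the text does not specify empty-group behavior.

```swift
// Hypothetical record type; every aggregate runs over a lazy filtered view.
struct Sale {
    let item: String
    let amount: Int64
}

let sales = [
    Sale(item: "tea", amount: 4),
    Sale(item: "cake", amount: 9),
    Sale(item: "tea", amount: 2),
    Sale(item: "pie", amount: 0),
]
let nonZero = sales.lazy.filter { s in s.amount > 0 }

// count(): one pass over the lazy view, no intermediate array.
let saleCount = nonZero.reduce(0) { acc, _ in acc + 1 }

// sum(f)
let total = nonZero.reduce(Int64(0)) { acc, s in acc + s.amount }

// avg(f): carry (sum, count) together and divide once at the end.
// Assumption: nil for empty input instead of dividing 0 by 0.
let (avgSum, avgCount) = nonZero.reduce((Int64(0), Int64(0))) { acc, s in (acc.0 + s.amount, acc.1 + 1) }
let avg: Double? = avgCount == 0 ? nil : Double(avgSum) / Double(avgCount)

// min(f) / max(f): Optional results, nil only when the filtered input is empty.
let cheapest = nonZero.min(by: { a, b in a.amount < b.amount })?.item
let maxAmount = nonZero.map { s in s.amount }.max()

// distinct(f): stand-in for uniqued(on:); Set.insert reports first sightings.
func distinctBy<T, K: Hashable>(_ xs: [T], _ key: (T) -> K) -> [T] {
    var seen = Set<K>()
    return xs.filter { x in seen.insert(key(x)).inserted }
}
let firstPerItem = distinctBy(sales) { s in s.item }.map { s in s.amount }
```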

Sub-phase 7.5 -- Top-K

Decisions made (7.5)

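take N order_by k desc → Heap: for large datasets, sorting all M elements to take the top N is O(M log M). A bounded min-heap of size N gives O(M log N). The lowerer detects take N order_by k and emits the heap pattern below, shown for descending order (keep the N largest keys, evicting the heap minimum); ascending order mirrors it with max and popMax on the same min-max Heap:

import Collections  // Heap<T> from swift-collections

// Heap requires Comparable elements and tuples are not Comparable, so the
// (key, value) pair is wrapped in a struct ordered by key only (illustrative name).
struct __KeyedRecord: Comparable {
    let key: Int64
    let value: Record
    static func < (a: __KeyedRecord, b: __KeyedRecord) -> Bool { a.key < b.key }
    static func == (a: __KeyedRecord, b: __KeyedRecord) -> Bool { a.key == b.key }
}

// Keep the N largest keys; the heap minimum is the first candidate for eviction.
var __heap = Heap<__KeyedRecord>()
for x in xs {
    let k = x.someField
    if __heap.count < Int(n) {
        __heap.insert(__KeyedRecord(key: k, value: x))
    } else if let top = __heap.min, k > top.key {
        _ = __heap.popMin()
        __heap.insert(__KeyedRecord(key: k, value: x))
    }
}
let result = __heap.unordered.sorted(by: { a, b in a.key > b.key }).map(\.value)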

When to use heap: the lowerer uses the heap path when the take limit N is a compile-time constant and the source is not already sorted. The threshold for switching from sorted + prefix to heap is N < M/log(M), approximated as N < 1000 for unknown M (conservative). The exact threshold is configurable via a compiler flag.
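Heap comes from swift-collections; to keep the sketch dependency-free, a hypothetical MinHeapByKey (a plain binary min-heap, not the min-max heap swift-collections ships) implements the same keep-the-N-largest strategy used for take N with descending order. topK is likewise illustrative.

```swift
// Minimal binary min-heap over (key, value) pairs, ordered by key only.
struct MinHeapByKey<Value> {
    private(set) var items: [(key: Int64, value: Value)] = []

    var count: Int { items.count }
    var min: (key: Int64, value: Value)? { items.first }  // root holds the smallest key

    mutating func insert(_ e: (key: Int64, value: Value)) {
        items.append(e)
        var i = items.count - 1
        // Sift up while the new entry's key is smaller than its parent's.
        while i > 0 {
            let parent = (i - 1) / 2
            if items[i].key >= items[parent].key { break }
            items.swapAt(i, parent)
            i = parent
        }
    }

    mutating func popMin() -> (key: Int64, value: Value)? {
        guard !items.isEmpty else { return nil }
        items.swapAt(0, items.count - 1)
        let top = items.removeLast()
        var i = 0
        // Sift down: swap with the smaller child until the heap property holds.
        while true {
            let l = 2 * i + 1
            let r = l + 1
            var smallest = i
            if l < items.count && items[l].key < items[smallest].key { smallest = l }
            if r < items.count && items[r].key < items[smallest].key { smallest = r }
            if smallest == i { break }
            items.swapAt(i, smallest)
            i = smallest
        }
        return top
    }
}

// take n with order_by key desc: keep the n largest keys seen so far.
// The heap minimum is the weakest survivor and is evicted when a larger key arrives.
func topK<T>(_ xs: [T], _ n: Int, key: (T) -> Int64) -> [T] {
    var heap = MinHeapByKey<T>()
    for x in xs {
        let k = key(x)
        if heap.count < n {
            heap.insert((key: k, value: x))
        } else if let top = heap.min, k > top.key {
            _ = heap.popMin()
            heap.insert((key: k, value: x))
        }
    }
    return heap.items.sorted(by: { a, b in a.key > b.key }).map { $0.value }
}
```

Each element costs at most one O(log n) eviction and one O(log n) insertion, which is where the O(M log N) bound comes from; the final sort touches only the n survivors.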

Sub-phase 7.6 -- Async query pipeline

Decisions made (7.6)

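AsyncSequence source: when the source xs is an AsyncSequence (e.g., from Phase 10 streams), where → .filter and select → .map. These two operators come from the standard library's AsyncSequence API; swift-async-algorithms supplies the operators the standard library lacks.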

swift-async-algorithms dependency: added to MochiRuntime Package.swift in Phase 7.6.

// Mochi: from x in asyncStream where pred(x) select f(x)
let result = asyncStream
    .filter { x in pred(x) }
    .map { x in f(x) }
// Consumed with: for await x in result { ... }

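Aggregates on async sequence: aggregates go through the standard library's async reduce, e.g. await result.reduce(0) { acc, _ in acc + 1 } for count and await result.reduce(0, +) for sum. These calls are async, and also throwing (requiring try) when the underlying sequence can throw, so the enclosing function must be async (Phase 11).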
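A runnable sketch of 7.6, assuming a finite AsyncStream built by a hypothetical makeStream helper stands in for a Phase 10 stream. filter, map, and reduce here are the standard library's AsyncSequence methods; counting via reduce is one way to express count().

```swift
// Hypothetical helper: a finite AsyncStream standing in for a Phase 10 stream.
func makeStream(_ values: [Int64]) -> AsyncStream<Int64> {
    AsyncStream { continuation in
        for v in values { continuation.yield(v) }  // buffered until iterated
        continuation.finish()
    }
}

// Mochi: from x in s where x % 2 == 0 select x * x, consumed with for await.
func squaresOfEvens(_ s: AsyncStream<Int64>) async -> [Int64] {
    var out: [Int64] = []
    for await y in s.filter({ x in x % 2 == 0 }).map({ x in x * x }) {
        out.append(y)
    }
    return out
}

// Counting through the async reduce; no `try` because AsyncStream never throws.
func countEvens(_ s: AsyncStream<Int64>) async -> Int {
    await s.filter { x in x % 2 == 0 }.reduce(0) { acc, _ in acc + 1 }
}
```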

Files changed

File	Purpose
`transpiler3/swift/lower/query.go`	Query DSL lowering: `QueryScopeStmt`, `GroupByExpr`, `JoinExpr`, aggregates
`transpiler3/swift/lower/lower.go`	`ListSortExpr`, `ListSliceExpr`, `ListFilterExpr`, `ListMapExpr` updates for query context
`transpiler3/swift/runtime/Sources/MochiRuntime/Query.swift`	Heap-based top-K helpers; `mochiDistinct`
`transpiler3/swift/runtime/Package.swift`	`swift-algorithms`, `swift-async-algorithms` dependencies added
`transpiler3/swift/build/phase07_test.go`	`TestPhase7Query`: 30 fixtures
`tests/transpiler3/swift/fixtures/phase07-query/`	30 fixture directories

Test set

TestPhase7Query -- 30 fixtures covering: query_filter, query_select, query_where_select, query_no_where, query_empty_result, query_sort_asc, query_sort_desc, query_multi_sort, query_skip, query_take, query_skip_take, query_group_by, query_group_by_aggregate, query_sum, query_avg, query_min_max, query_count, query_distinct, query_distinct_by, query_inner_join, query_left_join, query_cross_join, query_top_k, query_nested, query_chained, query_lazy_no_materialize, query_async_filter, query_async_map, query_async_group, query_async_aggregate.

Deferred work

IQueryable<T> / database query integration. Deferred to Phase 12 (FFI).
Query plan optimiser (predicate pushdown, merge join). Deferred to a future sub-MEP.
Window functions (ROW_NUMBER, LAG, LEAD). Deferred to Phase 12.
from x in asyncStream group_by (streaming aggregation). Deferred to Phase 10 extension.
