MEP 14. Query Algebra
| Field | Value |
|---|---|
| MEP | 14 |
| Title | Query Algebra |
| Author | Mochi core |
| Status | Informational |
| Type | Informational |
| Created | 2026-05-08 |
Abstract
Mochi has LINQ-shaped query expressions: from ... select with optional where, join, group, sort, skip, take, and distinct. This MEP documents the type rules per clause, the aggregate operators, the join cardinalities, and the soundness gaps that today's implementation accepts.
Motivation
Queries are how a lot of real Mochi programs are written. The clause type rules touch most of the type system: list types, group types, predicates, ordered values. A single document with the rules per clause keeps the implementation honest.
Status at a glance
| Item | Status |
|---|---|
from source must be list/group (T032) | closed |
Additional from clauses (cross-join nested loops) | closed |
join source must be list/group (T034) | closed |
join ... on cond requires bool (T035) | closed |
where pred requires bool (T033) | closed |
Pure predicate in where and having (T044) | closed |
group by ... into g: group binding shape | closed |
having pred requires bool (T042) | closed |
Aggregate misuse: sum/avg on non-numeric (T038, T041) | closed |
Aggregate misuse: count on non list/group (T037) | closed |
Aggregate avg(...): float, sum(...): T, count(...): int | closed |
Result shape: [U] where U = ExprType(select) | closed |
Grouped aggregate result is [U], one row per group | closed |
skip n and take n require n : int (T055) | closed |
sort by e requires e of an ordered type (T056) | closed |
select distinct e requires e of a hashable type (T057) | closed |
Left-join right-side binding becomes Option<T> | deferred (needs option-unwrap discipline; see Rationale) |
Specification
Surface
Mochi queries are LINQ-shaped. They start with a from and end with a select. The clause set:
from x in source
{ from y in source }*
{ [side] join z in source on cond }*
[ where pred ]
[ group by k1, ..., kn into g [ having pred ] ]
[ ( sort | order ) by expr ]
[ skip n ]
[ take n ]
select [ distinct ] proj
Required: from and select. Everything else is optional. The order is fixed by the grammar.
Type rules per clause
from x in xs requires xs : [T]. Inside the body of the query, x : T. T032 fires when the source is not a list (or group, which also iterates as T).
Additional from y in ys clauses behave like nested loops. The projection sees both x and y in scope. There is no implicit cross-join elision; the result enumerates the cartesian product unless joined.
join z in zs on cond requires zs : [U] (T034) and cond : bool (T035). The join side modifier (left, right, outer) controls the cardinality of unmatched rows. A left join may produce z = null for unmatched outer rows. The unmatched binding is typed as any today; the eventual target is Option<U>, but that depends on option-unwrap discipline at access sites (see Rationale).
where pred requires pred : bool (T033). Calls inside the predicate must be pure; impure calls raise T044.
group by k1, ..., kn into g partitions by the key tuple and binds g : group<K, T> where K is the key type and T is the source row type. Today the key shape is special-cased: one expression keeps its source type, two string expressions form a pair_string struct, and any other multi-key shape falls back to any. The having clause requires bool (T042) and its calls must be pure (T044).
sort by expr requires expr to be of an ordered type. Ordered types are int, int64, bigint, bigrat, float, bool, string, and lists thereof (lists are compared element-by-element). Anything else raises T056. Negation prefix (-expr) sorts descending and preserves the element type. The checker downgrades unknown-identifier errors inside the sort expression to a silent skip for compatibility with existing dataset queries (TPC-H q7 et al.) that reference group-key fields as bare names; this is documented as a soft spot until the group key is lifted into scope as named bindings.
skip n and take n require n : int (T055). The checker rejects non-int operands at compile time. any is allowed through (consistent with MEP-10 A2 escape hatch).
select expr returns [U] where U = ExprType(expr). Inside select the iteration variables are in scope.
select distinct expr is the same with deduplication. The element type must be hashable (T057). Hashable types are the numerics, bool, string, unit, lists/maps/options/structs/unions of hashable types, and any. Function and group values are rejected. The runtime canonicalises through a string form.
Aggregate operators
These are functions that take a list or a group and return a scalar:
count(g) : int. Accepts list or group. Anything else raises T037.sum(g) : TwhereTis numeric. Non-numeric raises T041. The element type is preserved (MEP-5 P2), so a[bigint]sum staysbigint.avg(g) : floatfor numerics. Non-numeric raises T038.min(g) : Tandmax(g) : Tfor orderedT. Today rejected only on non-numeric (min/max() expects numeric expression); a future MEP may broaden to ordered.
Joins and cardinality
Inner join: rows with matching cond. Output cardinality is the intersection of left and right.
Left join: every left row appears, with null filling unmatched right-side bindings. Right join is symmetric. Outer join produces both sides.
Cardinality fixtures:
- Inner join on
idwith two rows on each side, two matches: result has two rows. - Left join with one unmatched left row: result has the matched row and one row with the right side
null. - Outer join with disjoint sides: result has the union of rows.
Today the unmatched right-side binding is typed as any. The eventual target is Option<T>, but that change is gated on the option-unwrap work (see Rationale). MEP 10 A2 (null literal is any) is closed and is not the blocker here.
Group plan and types
group by k1, ..., kn into g. Inside the rest of the query, g has type group<K, T> where K is the key (single key for one expression, pair_string for two strings, any otherwise today) and T is the source row type. Aggregates over g are typed:
count(g) : intsum(g.field) : TifT : numericavg(g.field) : floatmin(g.field) : T,max(g.field) : T
The group plan builder lives at types/query_plan_builder.go and emits a structured plan that the runtime consumes. The plan is also golden-tested under tests/queryplan/.
Distinct and sort
sort by e(closed, T056). The expression must be of an ordered type (int,int64,bigint,bigrat,float,bool,string, or list thereof). Anything else raises T056. The checker tolerates unbound identifiers inside the sort expression for compatibility with existing dataset queries that reference group-key fields as bare names (TPC-H q7 et al.); this is the only soft spot in the rule.select distinct e(closed, T057). Deduplicates by structural equality one. The runtime canonicalises through a string form. The checker requireseto be of a hashable type: numerics,bool,string,unit, lists/maps/options/structs/unions of hashable elements, orany. Functions and group values are rejected.
skip and take are closed under T055.
Soundness obligations (current)
- A query whose source is not a list raises T032.
- A
wherewhose predicate is notboolraises T033. - A
havingwhose predicate is notboolraises T042. - An aggregate misuse raises T037, T038, or T041.
- A
sort byexpression that is not of an ordered type raises T056. - A
select distinctexpression that is not of a hashable type raises T057. skipandtakeoperands that are notintraise T055.- The result type of a query is
[U]whereUis the projected type.
Soundness targets
(All current targets shipped.)
Fixtures
Existing (committed in tests/types/):
- Valid:
query_basic_select,query_sort_take,query_sort_by_ordered,query_select_struct_proj,query_distinct_int,query_distinct_struct_canonical,query_join_inner_cardinality,query_join_left_unmatched_null,query_join_outer_both_sides,query_group_aggregate_int_sum,query_group_aggregate_avg_float,join_basic,join_multi. - Error:
query_source_not_list(T032),query_where_bool_required(T033),join_source_not_list(T034),join_on_not_bool(T035),count_invalid_type(T037),avg_non_numeric(T038),query_group_having_bool_required(T042),query_take_int_required(T055),query_skip_int_required(T055),query_sort_by_unordered(T056),query_distinct_non_hashable(T057).
Rationale
LINQ-shaped queries are familiar, simple to teach, and easy to translate to a relational plan. We keep the clause order fixed because that simplifies the parser and makes the plan obvious.
The null-from-left-join hack is a known wart. The right-side bindings should be Option<T> so that pattern matching is required to access them. The blocker is not MEP 10 A2 (closed): it is the surrounding option-unwrap discipline. Today a let x: T? = ... does not auto-wrap a T value, and field access on an Option<T> does not auto-unwrap, so retyping the binding would break a large set of fixtures that read c.name directly off a left-joined c. We accept the wart until the unwrap rules are written; the cardinality fixtures here pin the result shape so the eventual change is a visible diff.
Backwards Compatibility
Tightening sort and distinct to type-check at compile time is a breaking change for programs that rely on the loose runtime semantics. We pin the current behaviour in fixtures before tightening, so the closure for each gap is visible as a diff against a known baseline. skip/take (T055) already shipped under that pattern.
Reference Implementation
types/check_expr.go:1252,checkQueryExpr: source, from, join, where, skip, take, group, having, sort, and aggregate select; emits T032, T033, T034, T035, T037, T038, T041, T042, T044, T055, T056, T057.types/check_expr.goisOrderedhelper, orderable kinds forsort by.types/check_expr.goisHashablehelper, hashable kinds forselect distinct.types/query_plan_builder.go, group plan construction.types/errors.go, query-related diagnostic helpers (errQuerySourceList,errWhereBoolean,errJoinSourceList,errJoinOnBoolean,errHavingBoolean,errSumOperand,errImpurePredicate,errSkipTakeIntOperand,errSortByOrdered,errDistinctHashable).tests/queryplan/, golden tests for the plan.
Open Questions
- Ordered trait. Decide between an explicit interface for
min,max,sortor a hard-coded list of orderable kinds. - Right-side
nullin left join. Retyping toOption<T>waits on option-unwrap discipline (auto-wrap on assignment, auto-unwrap or pattern-match on access). MEP 10 A2 (null literal isany) is closed and is not the blocker.
References
- LINQ specification: https://learn.microsoft.com/dotnet/csharp/linq/.
Copyright
This document is placed in the public domain.