Skip to main content

MEP 14. Query Algebra

FieldValue
MEP14
TitleQuery Algebra
AuthorMochi core
StatusInformational
TypeInformational
Created2026-05-08

Abstract

Mochi has LINQ-shaped query expressions: from ... select with optional where, join, group, sort, skip, take, and distinct. This MEP documents the type rules per clause, the aggregate operators, the join cardinalities, and the soundness gaps that today's implementation accepts.

Motivation

Queries are how a lot of real Mochi programs are written. The clause type rules touch most of the type system: list types, group types, predicates, ordered values. A single document with the rules per clause keeps the implementation honest.

Status at a glance

ItemStatus
from source must be list/group (T032)closed
Additional from clauses (cross-join nested loops)closed
join source must be list/group (T034)closed
join ... on cond requires bool (T035)closed
where pred requires bool (T033)closed
Pure predicate in where and having (T044)closed
group by ... into g: group binding shapeclosed
having pred requires bool (T042)closed
Aggregate misuse: sum/avg on non-numeric (T038, T041)closed
Aggregate misuse: count on non list/group (T037)closed
Aggregate avg(...): float, sum(...): T, count(...): intclosed
Result shape: [U] where U = ExprType(select)closed
Grouped aggregate result is [U], one row per groupclosed
skip n and take n require n : int (T055)closed
sort by e requires e of an ordered type (T056)closed
select distinct e requires e of a hashable type (T057)closed
Left-join right-side binding becomes Option<T>deferred (needs option-unwrap discipline; see Rationale)

Specification

Surface

Mochi queries are LINQ-shaped. They start with a from and end with a select. The clause set:

from x in source
{ from y in source }*
{ [side] join z in source on cond }*
[ where pred ]
[ group by k1, ..., kn into g [ having pred ] ]
[ ( sort | order ) by expr ]
[ skip n ]
[ take n ]
select [ distinct ] proj

Required: from and select. Everything else is optional. The order is fixed by the grammar.

Type rules per clause

from x in xs requires xs : [T]. Inside the body of the query, x : T. T032 fires when the source is not a list (or group, which also iterates as T).

Additional from y in ys clauses behave like nested loops. The projection sees both x and y in scope. There is no implicit cross-join elision; the result enumerates the cartesian product unless joined.

join z in zs on cond requires zs : [U] (T034) and cond : bool (T035). The join side modifier (left, right, outer) controls the cardinality of unmatched rows. A left join may produce z = null for unmatched outer rows. The unmatched binding is typed as any today; the eventual target is Option<U>, but that depends on option-unwrap discipline at access sites (see Rationale).

where pred requires pred : bool (T033). Calls inside the predicate must be pure; impure calls raise T044.

group by k1, ..., kn into g partitions by the key tuple and binds g : group<K, T> where K is the key type and T is the source row type. Today the key shape is special-cased: one expression keeps its source type, two string expressions form a pair_string struct, and any other multi-key shape falls back to any. The having clause requires bool (T042) and its calls must be pure (T044).

sort by expr requires expr to be of an ordered type. Ordered types are int, int64, bigint, bigrat, float, bool, string, and lists thereof (lists are compared element-by-element). Anything else raises T056. Negation prefix (-expr) sorts descending and preserves the element type. The checker downgrades unknown-identifier errors inside the sort expression to a silent skip for compatibility with existing dataset queries (TPC-H q7 et al.) that reference group-key fields as bare names; this is documented as a soft spot until the group key is lifted into scope as named bindings.

skip n and take n require n : int (T055). The checker rejects non-int operands at compile time. any is allowed through (consistent with MEP-10 A2 escape hatch).

select expr returns [U] where U = ExprType(expr). Inside select the iteration variables are in scope.

select distinct expr is the same with deduplication. The element type must be hashable (T057). Hashable types are the numerics, bool, string, unit, lists/maps/options/structs/unions of hashable types, and any. Function and group values are rejected. The runtime canonicalises through a string form.

Aggregate operators

These are functions that take a list or a group and return a scalar:

  • count(g) : int. Accepts list or group. Anything else raises T037.
  • sum(g) : T where T is numeric. Non-numeric raises T041. The element type is preserved (MEP-5 P2), so a [bigint] sum stays bigint.
  • avg(g) : float for numerics. Non-numeric raises T038.
  • min(g) : T and max(g) : T for ordered T. Today rejected only on non-numeric (min/max() expects numeric expression); a future MEP may broaden to ordered.

Joins and cardinality

Inner join: rows with matching cond. Output cardinality is the intersection of left and right.

Left join: every left row appears, with null filling unmatched right-side bindings. Right join is symmetric. Outer join produces both sides.

Cardinality fixtures:

  • Inner join on id with two rows on each side, two matches: result has two rows.
  • Left join with one unmatched left row: result has the matched row and one row with the right side null.
  • Outer join with disjoint sides: result has the union of rows.

Today the unmatched right-side binding is typed as any. The eventual target is Option<T>, but that change is gated on the option-unwrap work (see Rationale). MEP 10 A2 (null literal is any) is closed and is not the blocker here.

Group plan and types

group by k1, ..., kn into g. Inside the rest of the query, g has type group<K, T> where K is the key (single key for one expression, pair_string for two strings, any otherwise today) and T is the source row type. Aggregates over g are typed:

  • count(g) : int
  • sum(g.field) : T if T : numeric
  • avg(g.field) : float
  • min(g.field) : T, max(g.field) : T

The group plan builder lives at types/query_plan_builder.go and emits a structured plan that the runtime consumes. The plan is also golden-tested under tests/queryplan/.

Distinct and sort

  • sort by e (closed, T056). The expression must be of an ordered type (int, int64, bigint, bigrat, float, bool, string, or list thereof). Anything else raises T056. The checker tolerates unbound identifiers inside the sort expression for compatibility with existing dataset queries that reference group-key fields as bare names (TPC-H q7 et al.); this is the only soft spot in the rule.
  • select distinct e (closed, T057). Deduplicates by structural equality on e. The runtime canonicalises through a string form. The checker requires e to be of a hashable type: numerics, bool, string, unit, lists/maps/options/structs/unions of hashable elements, or any. Functions and group values are rejected.

skip and take are closed under T055.

Soundness obligations (current)

  • A query whose source is not a list raises T032.
  • A where whose predicate is not bool raises T033.
  • A having whose predicate is not bool raises T042.
  • An aggregate misuse raises T037, T038, or T041.
  • A sort by expression that is not of an ordered type raises T056.
  • A select distinct expression that is not of a hashable type raises T057.
  • skip and take operands that are not int raise T055.
  • The result type of a query is [U] where U is the projected type.

Soundness targets

(All current targets shipped.)

Fixtures

Existing (committed in tests/types/):

  • Valid: query_basic_select, query_sort_take, query_sort_by_ordered, query_select_struct_proj, query_distinct_int, query_distinct_struct_canonical, query_join_inner_cardinality, query_join_left_unmatched_null, query_join_outer_both_sides, query_group_aggregate_int_sum, query_group_aggregate_avg_float, join_basic, join_multi.
  • Error: query_source_not_list (T032), query_where_bool_required (T033), join_source_not_list (T034), join_on_not_bool (T035), count_invalid_type (T037), avg_non_numeric (T038), query_group_having_bool_required (T042), query_take_int_required (T055), query_skip_int_required (T055), query_sort_by_unordered (T056), query_distinct_non_hashable (T057).

Rationale

LINQ-shaped queries are familiar, simple to teach, and easy to translate to a relational plan. We keep the clause order fixed because that simplifies the parser and makes the plan obvious.

The null-from-left-join hack is a known wart. The right-side bindings should be Option<T> so that pattern matching is required to access them. The blocker is not MEP 10 A2 (closed): it is the surrounding option-unwrap discipline. Today a let x: T? = ... does not auto-wrap a T value, and field access on an Option<T> does not auto-unwrap, so retyping the binding would break a large set of fixtures that read c.name directly off a left-joined c. We accept the wart until the unwrap rules are written; the cardinality fixtures here pin the result shape so the eventual change is a visible diff.

Backwards Compatibility

Tightening sort and distinct to type-check at compile time is a breaking change for programs that rely on the loose runtime semantics. We pin the current behaviour in fixtures before tightening, so the closure for each gap is visible as a diff against a known baseline. skip/take (T055) already shipped under that pattern.

Reference Implementation

  • types/check_expr.go:1252, checkQueryExpr: source, from, join, where, skip, take, group, having, sort, and aggregate select; emits T032, T033, T034, T035, T037, T038, T041, T042, T044, T055, T056, T057.
  • types/check_expr.go isOrdered helper, orderable kinds for sort by.
  • types/check_expr.go isHashable helper, hashable kinds for select distinct.
  • types/query_plan_builder.go, group plan construction.
  • types/errors.go, query-related diagnostic helpers (errQuerySourceList, errWhereBoolean, errJoinSourceList, errJoinOnBoolean, errHavingBoolean, errSumOperand, errImpurePredicate, errSkipTakeIntOperand, errSortByOrdered, errDistinctHashable).
  • tests/queryplan/, golden tests for the plan.

Open Questions

  • Ordered trait. Decide between an explicit interface for min, max, sort or a hard-coded list of orderable kinds.
  • Right-side null in left join. Retyping to Option<T> waits on option-unwrap discipline (auto-wrap on assignment, auto-unwrap or pattern-match on access). MEP 10 A2 (null literal is any) is closed and is not the blocker.

References

This document is placed in the public domain.