Skip to main content

MEP 12. Parametric Polymorphism

FieldValue
MEP12
TitleParametric Polymorphism
AuthorMochi core
StatusActive
TypeStandards Track
Created2026-05-08

Delivery status (2026-05-12)

MEP-12 is Active at v0.12.0. Six of seven sub-tasks have landed; the seventh (explicit type arguments) is opt-in syntactic sugar and remains tracked separately.

  • MEP-12.1, Subst and substitution-aware Unify in types/subst.go.
  • MEP-12.2, call-site Instantiate freshens quantified vars per call; the running substitution carries across argument unification.
  • MEP-12.3, new diagnostics: T047 (parameter conflict), T048 (escape), T049 (arity).
  • MEP-12.4, builtins first, last, push, append, concat, collect, reverse, reduce, keys, values carry parametric signatures; len and contains accept generic argument shapes. Loose any returns persist only on the interop set (json, str, int, lower, to_json).
  • MEP-12.5, explicit type argument syntax (f<int>(x)), pending. Tracked as a separate task; the inference path covers the common cases without it.
  • MEP-12.6, fixtures on disk (user_generic_id, generic_builtins, reduce_generic, generic_param_conflict, generic_freshness_per_call, generic_escaping_variable).

The any-builtin elimination programme from §Builtin signatures is complete for the propagation-shaped set. The unchanged interop set is documented as design intent under §Backwards Compatibility.

Abstract

Mochi parses generic function declarations but does not infer them. The syntax fun id<T>(x: T): T is accepted, the body type checks with T as a free type variable, and every call site re-reads the declared signature without substitution. The pragmatic consequence is that the standard library is typed with any parameters wherever a single type parameter would suffice, because authors observed that propagation does not happen. This MEP wires TypeVar through call inference using a small substitution-based algorithm, introduces the diagnostics that parametric inference needs, and replaces every loose any builtin with a parametric signature.

Motivation

Two thirds of the soundness gaps catalogued in MEP 10 reduce to the same root cause: the value flowing through a generic call site is typed any because the call site does not propagate. The fix is universally local. Each call introduces a substitution; the substitution is built by unifying actual argument types against the parametric parameter types; the result type is the declared result under that substitution. This is System F restricted to prenex universal quantification, which is what most production type systems implement and what the language already has the syntax for.

The payoff is concrete. After MEP-12:

  • len([1, 2, 3]) is int, not int. (No change, but for the right reason.)
  • first([1, 2, 3]) is int, not any.
  • push(xs: [int], 7) is [int], not [any].
  • keys({"a": 1, "b": 2}) is [string], not [any].
  • A function fun id<T>(x: T): T called as id(7) has result type int, not T.

Each any removed sharpens an error message somewhere downstream. The work is bounded: one helper function, one new walk over the call site, and a sweep of the builtin registration table.

Specification

Surface syntax (unchanged)

The grammar already admits type parameter lists. Repeated here for reference:

TypeParamList ::= '<' Ident (',' Ident)* '>'
FunDecl ::= 'fun' Ident TypeParamList? '(' ParamList ')'
(':' TypeRef)? Block
CallExpr ::= Ident TypeArgList? '(' ArgList ')'
TypeArgList ::= '<' TypeRef (',' TypeRef)* '>'

TypeArgList is opt-in (MEP-12.5). When omitted, the algorithm infers the type arguments from the arguments at the call site.

Algorithm

The call-site procedure, given a call f(a₁, ..., aₙ) where f has declared signature <G₁, ..., Gₖ>(P₁, ..., Pₙ) : R:

1. Allocate fresh TypeVar instances T₁, ..., Tₖ for G₁, ..., Gₖ.
2. Form the substitution s₀ = [G₁ ↦ T₁, ..., Gₖ ↦ Tₖ].
3. If the call site supplies explicit type arguments U₁, ..., Uₖ,
replace s₀ with [G₁ ↦ U₁, ..., Gₖ ↦ Uₖ]. Reject with T049 if the
arity is wrong.
4. Let s = s₀.
5. For i = 1 to n:
a. Infer Aᵢ from aᵢ.
b. s = unify(apply(s, Pᵢ), Aᵢ, s).
c. If unify fails, raise T047 with the conflict between Pᵢ[s] and Aᵢ.
6. R' = apply(s, R).
7. If any Tⱼ remains unbound in R', raise T048 with the unbound variable.
8. Return R'.

unify(s, t, sub) is the substitution-aware unifier:

unify(s, t, sub):
if s ≡ t under sub: return sub
if s is a free TypeVar α: bind α ↦ t in sub (occurs check)
if t is a free TypeVar α: bind α ↦ s in sub (occurs check)
if s and t are the same constructor:
recurse pairwise on their children, threading sub
otherwise: fail

apply(sub, t) walks t substituting bound variables. It is the same function used at step 6.

The occurs check rejects α ↦ List<α> and the family of recursive type variable cycles. It is a structural traversal over t.

Builtin signatures

The pre-MEP-12 builtin set is rewritten:

print<>(...) : unit -- variadic, by design loose
len<T>([T]) : int
len(string) : int -- overload
first<T>([T]) : option<T> -- MEP-10 A2 lands the option type
last<T>([T]) : option<T>
reverse<T>([T]) : [T]
reverse(string) : string -- overload
distinct<T>([T]) : [T]
push<T>([T], T) : [T]
append<T>([T], T) : [T]
concat<T>([T] ...) : [T]
copy<T>([T]) : [T] -- new, for the MEP-11.4 alias break
keys<K, V>({K: V}) : [K]
values<K, V>({K: V}) : [V]
collect<T>([T]) : [T]
contains<T>([T], T) : bool
contains<K, V>({K: V}, K) : bool -- overload
contains(string, string) : bool -- overload
range(int) : [int]
range(int, int) : [int]
range(int, int, int) : [int]
now() : int64
json<>(any) : unit -- by design loose
to_json<>(any) : string -- by design loose
str<>(any) : string -- by design loose
int<>(any) : int -- by design loose, runtime guard
parseIntStr(string, int) : int
upper(string) : string
lower<>(any) : string -- by design loose, runtime guard
trim(string) : string

The "by design loose" entries keep an any parameter or return because they exist precisely to bridge the dynamic world (foreign calls, JSON decode, untyped print). Every other builtin becomes parametric.

The duplicate keys registrations at types/check.go:529, 687, 712 collapse to one (MEP 6 Problems §1).

Diagnostics

This MEP introduces three error codes:

  • T047: conflicting unification. Produced when two argument positions bind the same type parameter to incompatible types. The diagnostic shows both arguments, both inferred types, and the parameter name.

    error[T047]: cannot unify type parameter `T`
    --> ex.mochi:3:5
    3 | pair(1, "a")
    | ^ ^^^
    note: first argument bound T to `int`
    note: third argument requires T to be `string`
  • T048: escaping type variable in result. Produced when the declared result type references a type parameter that no argument constrains.

    error[T048]: type parameter `T` escapes function result
    --> ex.mochi:1:1
    1 | fun bad<T>() : T { return null }
    | ^
    note: `T` is not constrained by any argument; the result type is
    unknowable at call sites
  • T049: arity mismatch on explicit type arguments.

    error[T049]: type argument arity mismatch for `id`
    --> ex.mochi:1:1
    1 | id<int, string>(1)
    | ^^^^^^^^^^^^^
    note: `id` declares 1 type parameter; 2 were supplied

Parametricity

Once the algorithm above runs at every call site, the standard parametricity properties hold for the parametric functions in the language. We pin them as runtime equality assertions:

  • id<int>(7) == 7 and id<string>("x") == "x".
  • reverse(reverse(xs)) == xs for any concrete element type.
  • map<A, B>(xs, f) == [f(x) for x in xs] once map is in the standard library.

The parametricity tests live alongside the unit fixtures and are not the main soundness obligation. They guard against regressions in the substitution path.

Interaction with subtyping

Subtyping happens at the call site after substitution. The order is:

infer Aᵢ
unify Pᵢ[s] with Aᵢ -- binds type variables
subtype Aᵢ <: Pᵢ[s'] -- with the now-applied substitution

This is the standard "instantiate first, then check variance" rule from Pierce §22.6. It interacts correctly with the function variance rule from MEP 11 §T-Fun: a generic function passed as an argument must be instantiated before its variance is judged.

Migration

Tightening builtins from any to TypeVar is the largest single break since the language acquired generics. We sequence it:

  1. Land MEP-11 first. The subtype split is the foundation.
  2. Land MEP-12.1 (substitution-aware unify) without changing any call-site behaviour.
  3. Land MEP-12.2 (call-site algorithm). At this point, user-defined generic functions infer correctly but builtins are still loose.
  4. Land MEP-12.4 (builtins) one builtin at a time. Each step runs the full fixture suite. Breakages in user code are tightenings: a program that called first(xs) and bound the result to a variable that expected any now gets T, which the variable must accept (typically by changing the variable's declared type).
  5. Land MEP-12.5 (explicit type arguments) as opt-in syntactic sugar. Deferred. The grammar change introduces </> ambiguity with comparison expressions (f < g vs f<T>(x)) and requires a dedicated parser design pass with new lookahead rules. Inference works without explicit type arguments today; the syntax is sugar for the few cases where unification cannot pin a T. Tracked as a follow-up parser MEP once a concrete program needs it.

Each step is a separate commit with a separate fixture diff.

Rationale

Why not Hindley-Milner

Hindley-Milner adds let-generalisation. Mochi has explicit signatures everywhere a generalisation would happen, so there is no benefit. The cost is an additional pass over the AST to discover generalisable let bindings, and a more complex error story when generalisation fails. We skip it.

Why prenex only

Higher-rank polymorphism (fun f(g: <T>(T): T)) is implementable but the call-site algorithm becomes substantially more complex. Real programs in idiomatic Mochi do not need it: callbacks have monomorphic signatures, and the few cases where a callback is generic over the collection element are reachable by polymorphic recursion in the caller. We defer higher-rank to a future MEP if demand surfaces.

Why fail on escaping variables

A function fun mystery<T>(): T is uncallable: nothing pins T. We reject it at declaration time with T048 rather than letting the declaration through and failing every call. The declaration is the clearer error site. This matches Haskell's "ambiguous type variable" behaviour and OCaml's value-restriction failure mode at the same position.

Why the algorithm is local

Each call site is independent. There is no flow-sensitive constraint propagation across calls, no global solver, no constraint accumulation between statements. The locality keeps the algorithm O(call_sites × average_arity × type_depth) and keeps the error messages anchored to specific source positions.

Backwards Compatibility

This MEP breaks compatibility for any program that exploited the loose-any builtin signatures. The breakages take three shapes:

  1. Tightened result types. let x: any = first(xs) where xs: [int] compiled before and yielded x : any. After MEP-12 the right-hand side has type option<int>. The assignment now requires either declaring x : option<int> (preferred) or an explicit as any cast (escape hatch).
  2. Tightened argument types. push(xs, "a") where xs: [int] compiled before because both [T] and T were any. After MEP-12.2 it fails with T047 (the first argument bound T to int, the second requires T to be string).
  3. Escaping type variables in user code. fun mystery<T>(): T compiled before. After MEP-12.2 it fails with T048.

We accept these breaks. Each fix tightens the program; no fix loosens it. The fixtures pinned in this MEP make the breaking surface visible at review time.

The escape hatch for genuinely dynamic values is the same as before: declare them any explicitly. The interop builtins (json, str, int, lower, to_json) keep their loose signatures by design.

Reference Implementation

The substitution machinery and call-site walk live in:

  • types/subst.go (new). Subst, apply, compose, unify.
  • types/check.go callType (replace). The call-site algorithm.
  • types/check.go builtin registrations (rewrite). The parametric signatures from §Builtin signatures.
  • types/errors.go (extend). T047, T048, T049.
  • parser/parser.go CallExpr (extend). Optional TypeArgList.

Algorithm pseudocode

def call_type(fn: FuncType, args: list[Expr], type_args: list[Type] | None,
env: Env) -> Type:
sub = Subst()
if type_args is not None:
if len(type_args) != len(fn.type_params):
raise T049(fn, type_args)
for g, t in zip(fn.type_params, type_args):
sub.bind(g, t)
else:
for g in fn.type_params:
sub.bind(g, TypeVar(fresh()))

for i, (param, arg) in enumerate(zip(fn.params, args)):
a_type = infer(arg, env)
try:
sub = unify(apply(sub, param.type), a_type, sub)
except UnifyError as e:
raise T047(fn, i, e)

result = apply(sub, fn.result)
escaped = free_type_vars(result) - sub.bound_set()
if escaped:
raise T048(fn, escaped)
return result

Performance

Per call site, the work is O(k + n × d) where k is the number of type parameters, n is the number of arguments, and d is the maximum type depth. The substitution is a small map[string]Type allocated fresh per call and discarded when the call type is returned. For a 100 kLoC Mochi codebase with average 4 arguments and type depth 6, the additional inference work is well under 1% of the total check time.

Test obligations

The MEP-12 fixture set, pinned in MEP-9:

  • user_generic_id (valid, on disk). Exercises fun id<T>(x: T): T and fun pair<A, B>(...) with inferred return types at each call site. Replaces the speculative generic_id_int_string / generic_pair_arg_unify pair.
  • generic_builtins (valid, on disk). Asserts inferred element types through first, last, push, keys, values, etc.
  • reduce_generic (valid, on disk). Pins the parametric reduce<A, B> signature; without MEP-12.2 the call site fell back to any and required an explicit as T cast.
  • generic_param_conflict (error T047, on disk). Two arguments share T but bind to incompatible types (pair(1, "two")).
  • generic_freshness_per_call (valid, on disk). Two calls of the same fun id<T>(x: T): T with disjoint argument types do not produce a unification error: each call site allocates fresh TypeVar instances via Instantiate, so the substitution from one call cannot poison another.
  • generic_escaping_variable (error T048, on disk). A function whose T is unconstrained by any argument (fun ghost<T>(x: any): T) raises T048 at the call site. The body uses x as T to placate the return-type check; the any as T cast is admitted by T046 because any-related casts are allowed.
  • generic_constraint_arity_under, generic_constraint_arity_over (error T049, deferred, blocked on MEP-12.5; see Implementation plan step 5).
  • generic_explicit_type_arg (valid, deferred, blocked on MEP-12.5).

Open Questions

  • Generic struct declarations. type Pair<A, B> { fst: A, snd: B } is the natural next step. The grammar does not have it today. We defer to a follow-up MEP because the parser change is substantial and the inference change is a small extension of the algorithm here.
  • Type-class style traits. T: Comparable or T: Hashable style bounds are out of scope. Mochi resolves these by builtin special cases today.
  • Polymorphic recursion. A function calling itself with a type argument different from its declared parameter requires a per-call type-argument annotation. We accept it once MEP-12.5 lands; until then, polymorphic recursion compiles to monomorphic recursion at the declared type.

References

  • Robin Milner, "A Theory of Type Polymorphism in Programming", Journal of Computer and System Sciences, 1978.
  • Luís Damas and Robin Milner, "Principal Type-Schemes for Functional Programs", POPL, 1982.
  • Benjamin C. Pierce, Types and Programming Languages, MIT Press, 2002, chapters 22 and 23.
  • Mark P. Jones, "A Theory of Qualified Types", ESOP, 1992.
  • Simon Peyton Jones et al., "Practical Type Inference for Arbitrary-Rank Types", JFP, 2007 (background for the prenex decision).

This document is placed in the public domain.