MEP 12. Parametric Polymorphism
| Field | Value |
|---|---|
| MEP | 12 |
| Title | Parametric Polymorphism |
| Author | Mochi core |
| Status | Active |
| Type | Standards Track |
| Created | 2026-05-08 |
Delivery status (2026-05-12)
MEP-12 is Active at v0.12.0. Six of seven sub-tasks have landed; the seventh (explicit type arguments) is opt-in syntactic sugar and remains tracked separately.
- MEP-12.1,
Substand substitution-awareUnifyintypes/subst.go. - MEP-12.2, call-site
Instantiatefreshens quantified vars per call; the running substitution carries across argument unification. - MEP-12.3, new diagnostics:
T047(parameter conflict),T048(escape),T049(arity). - MEP-12.4, builtins
first,last,push,append,concat,collect,reverse,reduce,keys,valuescarry parametric signatures;lenandcontainsaccept generic argument shapes. Looseanyreturns persist only on the interop set (json,str,int,lower,to_json). - MEP-12.5, explicit type argument syntax (
f<int>(x)), pending. Tracked as a separate task; the inference path covers the common cases without it. - MEP-12.6, fixtures on disk (
user_generic_id,generic_builtins,reduce_generic,generic_param_conflict,generic_freshness_per_call,generic_escaping_variable).
The any-builtin elimination programme from §Builtin signatures is complete for the propagation-shaped set. The unchanged interop set is documented as design intent under §Backwards Compatibility.
Abstract
Mochi parses generic function declarations but does not infer them. The
syntax fun id<T>(x: T): T is accepted, the body type checks with T
as a free type variable, and every call site re-reads the declared
signature without substitution. The pragmatic consequence is that the
standard library is typed with any parameters wherever a single type
parameter would suffice, because authors observed that propagation does
not happen. This MEP wires TypeVar through call inference using a
small substitution-based algorithm, introduces the diagnostics that
parametric inference needs, and replaces every loose any builtin with
a parametric signature.
Motivation
Two thirds of the soundness gaps catalogued in
MEP 10 reduce to the same root cause: the value
flowing through a generic call site is typed any because the call site
does not propagate. The fix is universally local. Each call introduces a
substitution; the substitution is built by unifying actual argument
types against the parametric parameter types; the result type is the
declared result under that substitution. This is System F restricted to
prenex universal quantification, which is what most production type
systems implement and what the language already has the syntax for.
The payoff is concrete. After MEP-12:
len([1, 2, 3])isint, notint. (No change, but for the right reason.)first([1, 2, 3])isint, notany.push(xs: [int], 7)is[int], not[any].keys({"a": 1, "b": 2})is[string], not[any].- A function
fun id<T>(x: T): Tcalled asid(7)has result typeint, notT.
Each any removed sharpens an error message somewhere downstream. The
work is bounded: one helper function, one new walk over the call site,
and a sweep of the builtin registration table.
Specification
Surface syntax (unchanged)
The grammar already admits type parameter lists. Repeated here for reference:
TypeParamList ::= '<' Ident (',' Ident)* '>'
FunDecl ::= 'fun' Ident TypeParamList? '(' ParamList ')'
(':' TypeRef)? Block
CallExpr ::= Ident TypeArgList? '(' ArgList ')'
TypeArgList ::= '<' TypeRef (',' TypeRef)* '>'
TypeArgList is opt-in (MEP-12.5). When omitted, the algorithm infers
the type arguments from the arguments at the call site.
Algorithm
The call-site procedure, given a call f(a₁, ..., aₙ) where f has
declared signature <G₁, ..., Gₖ>(P₁, ..., Pₙ) : R:
1. Allocate fresh TypeVar instances T₁, ..., Tₖ for G₁, ..., Gₖ.
2. Form the substitution s₀ = [G₁ ↦ T₁, ..., Gₖ ↦ Tₖ].
3. If the call site supplies explicit type arguments U₁, ..., Uₖ,
replace s₀ with [G₁ ↦ U₁, ..., Gₖ ↦ Uₖ]. Reject with T049 if the
arity is wrong.
4. Let s = s₀.
5. For i = 1 to n:
a. Infer Aᵢ from aᵢ.
b. s = unify(apply(s, Pᵢ), Aᵢ, s).
c. If unify fails, raise T047 with the conflict between Pᵢ[s] and Aᵢ.
6. R' = apply(s, R).
7. If any Tⱼ remains unbound in R', raise T048 with the unbound variable.
8. Return R'.
unify(s, t, sub) is the substitution-aware unifier:
unify(s, t, sub):
if s ≡ t under sub: return sub
if s is a free TypeVar α: bind α ↦ t in sub (occurs check)
if t is a free TypeVar α: bind α ↦ s in sub (occurs check)
if s and t are the same constructor:
recurse pairwise on their children, threading sub
otherwise: fail
apply(sub, t) walks t substituting bound variables. It is the same
function used at step 6.
The occurs check rejects α ↦ List<α> and the family of recursive type
variable cycles. It is a structural traversal over t.
Builtin signatures
The pre-MEP-12 builtin set is rewritten:
print<>(...) : unit -- variadic, by design loose
len<T>([T]) : int
len(string) : int -- overload
first<T>([T]) : option<T> -- MEP-10 A2 lands the option type
last<T>([T]) : option<T>
reverse<T>([T]) : [T]
reverse(string) : string -- overload
distinct<T>([T]) : [T]
push<T>([T], T) : [T]
append<T>([T], T) : [T]
concat<T>([T] ...) : [T]
copy<T>([T]) : [T] -- new, for the MEP-11.4 alias break
keys<K, V>({K: V}) : [K]
values<K, V>({K: V}) : [V]
collect<T>([T]) : [T]
contains<T>([T], T) : bool
contains<K, V>({K: V}, K) : bool -- overload
contains(string, string) : bool -- overload
range(int) : [int]
range(int, int) : [int]
range(int, int, int) : [int]
now() : int64
json<>(any) : unit -- by design loose
to_json<>(any) : string -- by design loose
str<>(any) : string -- by design loose
int<>(any) : int -- by design loose, runtime guard
parseIntStr(string, int) : int
upper(string) : string
lower<>(any) : string -- by design loose, runtime guard
trim(string) : string
The "by design loose" entries keep an any parameter or return because
they exist precisely to bridge the dynamic world (foreign calls, JSON
decode, untyped print). Every other builtin becomes parametric.
The duplicate keys registrations at types/check.go:529, 687, 712
collapse to one (MEP 6 Problems §1).
Diagnostics
This MEP introduces three error codes:
-
T047: conflicting unification. Produced when two argument positions bind the same type parameter to incompatible types. The diagnostic shows both arguments, both inferred types, and the parameter name.error[T047]: cannot unify type parameter `T`--> ex.mochi:3:53 | pair(1, "a")| ^ ^^^note: first argument bound T to `int`note: third argument requires T to be `string` -
T048: escaping type variable in result. Produced when the declared result type references a type parameter that no argument constrains.error[T048]: type parameter `T` escapes function result--> ex.mochi:1:11 | fun bad<T>() : T { return null }| ^note: `T` is not constrained by any argument; the result type isunknowable at call sites -
T049: arity mismatch on explicit type arguments.error[T049]: type argument arity mismatch for `id`--> ex.mochi:1:11 | id<int, string>(1)| ^^^^^^^^^^^^^note: `id` declares 1 type parameter; 2 were supplied
Parametricity
Once the algorithm above runs at every call site, the standard parametricity properties hold for the parametric functions in the language. We pin them as runtime equality assertions:
id<int>(7) == 7andid<string>("x") == "x".reverse(reverse(xs)) == xsfor any concrete element type.map<A, B>(xs, f) == [f(x) for x in xs]oncemapis in the standard library.
The parametricity tests live alongside the unit fixtures and are not the main soundness obligation. They guard against regressions in the substitution path.
Interaction with subtyping
Subtyping happens at the call site after substitution. The order is:
infer Aᵢ
unify Pᵢ[s] with Aᵢ -- binds type variables
subtype Aᵢ <: Pᵢ[s'] -- with the now-applied substitution
This is the standard "instantiate first, then check variance" rule from Pierce §22.6. It interacts correctly with the function variance rule from MEP 11 §T-Fun: a generic function passed as an argument must be instantiated before its variance is judged.
Migration
Tightening builtins from any to TypeVar is the largest single break
since the language acquired generics. We sequence it:
- Land MEP-11 first. The subtype split is the foundation.
- Land MEP-12.1 (substitution-aware unify) without changing any call-site behaviour.
- Land MEP-12.2 (call-site algorithm). At this point, user-defined generic functions infer correctly but builtins are still loose.
- Land MEP-12.4 (builtins) one builtin at a time. Each step runs the
full fixture suite. Breakages in user code are tightenings: a
program that called
first(xs)and bound the result to a variable that expectedanynow getsT, which the variable must accept (typically by changing the variable's declared type). - Land MEP-12.5 (explicit type arguments) as opt-in syntactic sugar. Deferred. The grammar change introduces
</>ambiguity with comparison expressions (f < gvsf<T>(x)) and requires a dedicated parser design pass with new lookahead rules. Inference works without explicit type arguments today; the syntax is sugar for the few cases where unification cannot pin aT. Tracked as a follow-up parser MEP once a concrete program needs it.
Each step is a separate commit with a separate fixture diff.
Rationale
Why not Hindley-Milner
Hindley-Milner adds let-generalisation. Mochi has explicit signatures everywhere a generalisation would happen, so there is no benefit. The cost is an additional pass over the AST to discover generalisable let bindings, and a more complex error story when generalisation fails. We skip it.
Why prenex only
Higher-rank polymorphism (fun f(g: <T>(T): T)) is implementable but
the call-site algorithm becomes substantially more complex. Real
programs in idiomatic Mochi do not need it: callbacks have monomorphic
signatures, and the few cases where a callback is generic over the
collection element are reachable by polymorphic recursion in the
caller. We defer higher-rank to a future MEP if demand surfaces.
Why fail on escaping variables
A function fun mystery<T>(): T is uncallable: nothing pins T. We
reject it at declaration time with T048 rather than letting the
declaration through and failing every call. The declaration is the
clearer error site. This matches Haskell's "ambiguous type variable"
behaviour and OCaml's value-restriction failure mode at the same
position.
Why the algorithm is local
Each call site is independent. There is no flow-sensitive constraint propagation across calls, no global solver, no constraint accumulation between statements. The locality keeps the algorithm O(call_sites × average_arity × type_depth) and keeps the error messages anchored to specific source positions.
Backwards Compatibility
This MEP breaks compatibility for any program that exploited the
loose-any builtin signatures. The breakages take three shapes:
- Tightened result types.
let x: any = first(xs)wherexs: [int]compiled before and yieldedx : any. After MEP-12 the right-hand side has typeoption<int>. The assignment now requires either declaringx : option<int>(preferred) or an explicitas anycast (escape hatch). - Tightened argument types.
push(xs, "a")wherexs: [int]compiled before because both[T]andTwereany. After MEP-12.2 it fails with T047 (the first argument bound T toint, the second requires T to bestring). - Escaping type variables in user code.
fun mystery<T>(): Tcompiled before. After MEP-12.2 it fails with T048.
We accept these breaks. Each fix tightens the program; no fix loosens it. The fixtures pinned in this MEP make the breaking surface visible at review time.
The escape hatch for genuinely dynamic values is the same as before:
declare them any explicitly. The interop builtins (json, str,
int, lower, to_json) keep their loose signatures by design.
Reference Implementation
The substitution machinery and call-site walk live in:
types/subst.go(new).Subst,apply,compose,unify.types/check.go callType(replace). The call-site algorithm.types/check.go builtin registrations(rewrite). The parametric signatures from §Builtin signatures.types/errors.go(extend). T047, T048, T049.parser/parser.go CallExpr(extend). OptionalTypeArgList.
Algorithm pseudocode
def call_type(fn: FuncType, args: list[Expr], type_args: list[Type] | None,
env: Env) -> Type:
sub = Subst()
if type_args is not None:
if len(type_args) != len(fn.type_params):
raise T049(fn, type_args)
for g, t in zip(fn.type_params, type_args):
sub.bind(g, t)
else:
for g in fn.type_params:
sub.bind(g, TypeVar(fresh()))
for i, (param, arg) in enumerate(zip(fn.params, args)):
a_type = infer(arg, env)
try:
sub = unify(apply(sub, param.type), a_type, sub)
except UnifyError as e:
raise T047(fn, i, e)
result = apply(sub, fn.result)
escaped = free_type_vars(result) - sub.bound_set()
if escaped:
raise T048(fn, escaped)
return result
Performance
Per call site, the work is O(k + n × d) where k is the number of
type parameters, n is the number of arguments, and d is the
maximum type depth. The substitution is a small map[string]Type
allocated fresh per call and discarded when the call type is returned.
For a 100 kLoC Mochi codebase with average 4 arguments and type depth
6, the additional inference work is well under 1% of the total check
time.
Test obligations
The MEP-12 fixture set, pinned in MEP-9:
user_generic_id(valid, on disk). Exercisesfun id<T>(x: T): Tandfun pair<A, B>(...)with inferred return types at each call site. Replaces the speculativegeneric_id_int_string/generic_pair_arg_unifypair.generic_builtins(valid, on disk). Asserts inferred element types throughfirst,last,push,keys,values, etc.reduce_generic(valid, on disk). Pins the parametricreduce<A, B>signature; without MEP-12.2 the call site fell back toanyand required an explicitas Tcast.generic_param_conflict(error T047, on disk). Two arguments shareTbut bind to incompatible types (pair(1, "two")).generic_freshness_per_call(valid, on disk). Two calls of the samefun id<T>(x: T): Twith disjoint argument types do not produce a unification error: each call site allocates freshTypeVarinstances viaInstantiate, so the substitution from one call cannot poison another.generic_escaping_variable(error T048, on disk). A function whoseTis unconstrained by any argument (fun ghost<T>(x: any): T) raises T048 at the call site. The body usesx as Tto placate the return-type check; theany as Tcast is admitted by T046 because any-related casts are allowed.generic_constraint_arity_under,generic_constraint_arity_over(error T049, deferred, blocked on MEP-12.5; see Implementation plan step 5).generic_explicit_type_arg(valid, deferred, blocked on MEP-12.5).
Open Questions
- Generic struct declarations.
type Pair<A, B> { fst: A, snd: B }is the natural next step. The grammar does not have it today. We defer to a follow-up MEP because the parser change is substantial and the inference change is a small extension of the algorithm here. - Type-class style traits.
T: ComparableorT: Hashablestyle bounds are out of scope. Mochi resolves these by builtin special cases today. - Polymorphic recursion. A function calling itself with a type argument different from its declared parameter requires a per-call type-argument annotation. We accept it once MEP-12.5 lands; until then, polymorphic recursion compiles to monomorphic recursion at the declared type.
References
- Robin Milner, "A Theory of Type Polymorphism in Programming", Journal of Computer and System Sciences, 1978.
- Luís Damas and Robin Milner, "Principal Type-Schemes for Functional Programs", POPL, 1982.
- Benjamin C. Pierce, Types and Programming Languages, MIT Press, 2002, chapters 22 and 23.
- Mark P. Jones, "A Theory of Qualified Types", ESOP, 1992.
- Simon Peyton Jones et al., "Practical Type Inference for Arbitrary-Rank Types", JFP, 2007 (background for the prenex decision).
Copyright
This document is placed in the public domain.