MEP 2. Grammar
| Field | Value |
|---|---|
| MEP | 2 |
| Title | Grammar |
| Author | Mochi core |
| Status | Informational |
| Type | Informational |
| Created | 2026-05-08 |
| Revised | 2026-05-11 |
Abstract
Mochi has a Parsing Expression Grammar (PEG) with ordered choice. Every well-formed source has exactly one parse tree. The grammar is encoded on Go struct tags consumed by participle/v2, but the canonical specification is the PEG productions in this document. Production order is meaningful: alternatives are tried top-to-bottom, the first that matches wins, and no later alternative is considered. Every shape the grammar accepts is well-formed by construction. There is no post-parse "this should not have parsed" pass.
The companion document is MEP 1 (Lexer). MEP 1 fixes the token stream; this MEP fixes the productions consumed from it.
Motivation
A language is easy to learn when its grammar has no special cases. Three principles drive this revision:
- One source, one parse tree. PEG ordered choice removes ambiguity. The reader does not need to know associativity tables, lookahead sets, or shift-reduce conventions to predict how a fragment parses.
- No corner cases. Anything the grammar accepts is meaningful. Shapes that have no meaning (empty subscript, bodyless match arm, untyped union literal) do not match any production. There is no separate post-parse layer that quietly rejects what the parser quietly accepted.
- Layered precedence. Binary operators are laid out in a precedence ladder where every level has the same shape:
Operand (op Operand)*. The operand is always aUnary. Both sides of every binary operator are structurally identical, soa + b,a + -b,-a + -b, anda + --!ball parse from the same rule.
These principles make the grammar comprehensible end-to-end and remove the implementation cliffs that the previous revision papered over with post-parse validators.
Grammar notation
The grammar is written in the standard PEG meta-syntax, restricted to a small operator set (no action blocks, no return-type annotations). Each operator below has its usual PEG semantics.
| Operator | Meaning |
|---|---|
e1 e2 | Sequence: match e1, then e2. |
e1 | e2 | Ordered choice: try e1; if it fails, try e2. |
e? | Optional: 0 or 1 occurrence. |
e* | Repetition: 0 or more occurrences. |
e+ | Repetition: 1 or more occurrences. |
s.e+ | One or more e, separated by s. The separator is not kept. |
&e | Positive lookahead: match without consuming input. |
!e | Negative lookahead: fail without consuming input. |
~ | Cut: commit to the current alternative, no further backtrack. |
'lit' | Match a literal token (keyword or punctuation). |
Ident | Match a token kind from MEP 1 (Ident, Int, String, ...). |
Rule names are PascalCase. The start rule is Program. Comments and whitespace are removed by the lexer (MEP 1) before parsing; they appear nowhere in the productions.
The grammar is left-factored. No rule begins with a recursive call to itself. Where the layered precedence ladder reads naturally as left recursion (Expr: Expr op Expr | Operand), the equivalent right-recursive shape Expr: Operand (op Operand)* is used instead. The two are observationally identical; the latter does not require a left-recursion-capable parser.
Specification
Top level
Program = 'package' Ident? Statement*
Statement = TestBlock | BenchBlock | ExpectStmt
| AgentDecl | StreamDecl | ModelDecl
| ImportStmt | TypeDecl
| ExternTypeDecl | ExternVarDecl
| ExternFunDecl | ExternObjectDecl
| FactStmt | RuleStmt | OnHandler | EmitStmt
| LetStmt | VarStmt | FunStmt
| ReturnStmt | IfStmt | WhileStmt | ForStmt
| BreakStmt | ContinueStmt
| FetchStmt | UpdateStmt
| AssignStmt | ExprStmt
Statement alternatives that begin with a unique keyword come first. The last two alternatives (AssignStmt, ExprStmt) do not, so they sit at the bottom. PEG ordered choice guarantees that a keyword-led form is never confused for a bare expression.
Declarations
LetStmt = 'let' Ident (':' TypeRef)? ('=' Expr)?
VarStmt = 'var' Ident (':' TypeRef)? ('=' Expr)?
AssignStmt = AssignTarget '=' Expr
AssignTarget = Ident (IndexOp | FieldOp)*
FunStmt = 'export'? 'fun' Ident TypeParams?
'(' ','.Param* ')' (':' TypeRef)?
'{' Statement* '}'
TypeParams = '<' ','.Ident+ '>'
Param = Ident (':' TypeRef)?
ReturnStmt = 'return' Expr?
BreakStmt = 'break'
ContinueStmt = 'continue'
AssignTarget is the postfix chain rooted at an identifier. The grammar therefore accepts a = 1, a.b = 1, a[0] = 1, and a.b[0].c = 1, and rejects assignments to literal-rooted expressions (1 = x, f() = x) at parse time. Call expressions cannot appear on the left of =.
Type declarations
A type declaration takes exactly one of three forms, distinguished by the first token after =:
TypeDecl = 'type' Ident TypeDeclBody
TypeDeclBody = StructBody | VariantUnion | TypeAlias
StructBody = '='? '{' ','.TypeField+ ','? '}'
| '='? '{' '}'
VariantUnion = '=' TypeVariant ('|' TypeVariant)*
& VariantOrAliasHint
TypeAlias = '=' TypeRef
TypeVariant = Ident VariantPayload?
VariantPayload = '(' ','.TypeField+ ','? ')'
| '{' TypeField* '}'
TypeField = Ident ':' TypeRef
The grammar distinguishes VariantUnion from TypeAlias by lookahead: VariantOrAliasHint succeeds only when the head identifier is followed by (, {, or |. A bare identifier or any compound type expression falls through to TypeAlias.
| Shape | Example |
|---|---|
| Struct | type Point { x: int, y: int } |
Struct, explicit = | type Point = { x: int, y: int } |
| Variant union, 1 variant | type Pair = P(a: int, b: int) |
| Variant union, N variants | type Shape = Circle(r: float) | Square(s) |
| Alias to a type ref | type Id = int |
| Alias to a generic | type IdList = list<int> |
| Alias to a function | type Fn = fun(int): string |
| Alias to inline struct | type Pt = { x: int, y: int } |
A single-variant union (type Pair = P(a, b)) is structurally a VariantUnion, not a degenerate TypeAlias. The empty-struct form type Empty {} is accepted; the empty union form type X = | is not, because VariantUnion requires at least one TypeVariant.
Type references
TypeRef = FunType
| GenericType
| InlineStructType
| Ident
FunType = 'fun' '(' ','.TypeRef* ')' (':' TypeRef)?
GenericType = Ident '<' ','.TypeRef+ '>'
InlineStructType = '{' ','.TypeField* ','? '}'
TypeRef is the type-position expression. There is no union type literal in TypeRef. Sum types must be named through TypeDecl. There is no tuple type literal. Heterogeneous fixed-shape data uses a struct.
Control flow
IfStmt = 'if' Expr '{' Statement* '}'
('else' (IfStmt | '{' Statement* '}'))?
WhileStmt = 'while' Expr '{' Statement* '}'
ForStmt = 'for' Ident 'in' ForRange '{' Statement* '}'
ForRange = Expr '..' Expr
| Expr
ForRange puts the half-open range form first so 0..n binds as a range rather than as an expression followed by stray tokens. The general form for x in xs falls through to the second alternative.
Pattern matching
MatchExpr = 'match' Expr '{' MatchArm* '}'
MatchArm = Pattern '=>' MatchBody
MatchBody = Expr | '{' Statement* '}'
Pattern = Expr
MatchBody is mandatory. A => with nothing on its right does not parse. The pattern is just an expression; the type checker decides whether a given expression shape is a legal pattern (see MEP 13).
Expressions
The expression grammar is a precedence ladder. Every level has the same shape: Operand (op Operand)*. The operand is always Unary. Operator precedence is encoded structurally; the parser does not consult a precedence table.
Expr = OrExpr
OrExpr = AndExpr ('||' AndExpr)*
AndExpr = NotExpr ('&&' NotExpr)*
NotExpr = '!'* RelExpr
RelExpr = AddExpr (RelOp AddExpr)*
RelOp = '==' | '!=' | '<=' | '>=' | '<' | '>'
| 'in' | 'union' 'all'? | 'except' | 'intersect'
AddExpr = MulExpr (('+' | '-') MulExpr)*
MulExpr = Unary (('*' | '/' | '%') Unary)*
Unary = ('-')* PostfixExpr
PostfixExpr = Primary PostfixOp*
PostfixOp = CallOp | IndexOp | FieldOp | CastOp
CallOp = '(' ','.Expr* ')'
FieldOp = '.' Ident
CastOp = 'as' TypeRef
Three invariants the grammar guarantees:
- Operand symmetry. Both operands of every binary operator are a
Unary. The right side carries the same unary-prefix slot as the left, soa + -b,x == -1,a && !b, anda + -!b * call parse. This is the canonical layered precedence shape; see Pratt 1973 and Crafting Interpreters §6. - Prefix stacking.
Unaryaccepts('-')*, so--xis two unary minuses and!!xis two boolean nots. There is no decrement or increment operator; the longest-match rule of MEP 1 is irrelevant because there are no two-character--or++tokens. - Postfix chains.
PostfixOp*is unbounded, sof()(2).field as intis one chain in one production. Method-call sugar lives inPostfixOp, not inPrimary.
Operator precedence, from loosest to tightest:
| Level | Operators | Production |
|---|---|---|
| 1 | || | OrExpr |
| 2 | && | AndExpr |
| 3 | ! prefix | NotExpr |
| 4 | == != <= >= < > in union[all]? except intersect | RelExpr |
| 5 | + - (binary) | AddExpr |
| 6 | * / % | MulExpr |
| 7 | - prefix | Unary |
| 8 | () [] . as | PostfixOp |
All binary operators are left-associative. Right-associativity is achieved by parenthesisation. Prefix - and prefix ! are not interchangeable: ! lives in NotExpr (between && and the relational level) so !a == b parses as !(a == b) only if grouped explicitly; otherwise as (!a) == b. Numeric negation lives in Unary, tightly binding to the operand.
Subscript and slice
The grammar enumerates every well-formed shape. Empty subscripts and slices have no production:
IndexOp = '[' SliceForm ']'
SliceForm = Expr ':' Expr ':' Expr (* [a:b:c] *)
| Expr ':' Expr ':' (* [a:b:] *)
| Expr ':' ':' Expr (* [a::c] *)
| Expr ':' ':' (* [a::] *)
| Expr ':' Expr (* [a:b] *)
| Expr ':' (* [a:] *)
| ':' Expr ':' Expr (* [:b:c] *)
| ':' Expr ':' (* [:b:] *)
| ':' ':' Expr (* [::c] *)
| ':' Expr (* [:b] *)
| Expr (* [a] *)
The 11 alternatives cover the full lattice of well-formed slices and the single subscript shape. xs[], xs[:], and xs[::] match no alternative; they are syntax errors at parse time, not at a later pass. A post-parse validator for empty-index shapes is no longer needed.
Primary expressions
Primary = StructLiteral | QueryExpr | LogicQueryExpr
| IfExpr | SelectorExpr
| ListLiteral | MapLiteral
| FunExpr | MatchExpr | GenerateExpr | FetchExpr
| LoadExpr | SaveExpr | Literal
| '(' Expr ')'
StructLiteral = Ident '{' ','.StructLitField* ','? '}'
StructLitField= Ident ':' Expr
SelectorExpr = Ident ('.' Ident)*
ListLiteral = '[' ','.Expr* ','? ']'
MapLiteral = '{' ','.MapEntry* ','? '}'
MapEntry = (Ident | String | Int) ':' Expr
Literal = Int | Float | String | 'true' | 'false' | 'null'
MapLiteral and InlineStructType share the { ... } shape. PEG ordered choice resolves the ambiguity by context: in expression position the parser tries StructLiteral first (only matches when an identifier precedes the brace) and MapLiteral second; in type position only InlineStructType is available.
Lambda expressions
FunExpr = 'fun' TypeParams?
'(' ','.Param* ')' (':' TypeRef)?
('{' Statement* '}' | '=>' Expr)
Block-bodied and arrow-bodied forms are both accepted. The arrow form yields its expression as the function result.
Queries
The query expression follows LINQ clause order. The grammar enforces the order; SQL-style select ... from ... does not parse.
QueryExpr = 'from' Ident 'in' Expr
FromClause*
JoinClause*
('where' Expr)?
GroupByClause?
(('sort' | 'order') 'by' Expr)?
('skip' Expr)?
('take' Expr)?
'select' 'distinct'? Expr
FromClause = 'from' Ident 'in' Expr
JoinClause = ('left' | 'right' | 'outer')? 'join' 'from'? Ident
'in' Expr 'on' Expr
GroupByClause = 'group' 'by' ','.Expr+ 'into' Ident ('having' Expr)?
Logic programming
FactStmt = 'fact' LogicPredicate
RuleStmt = 'rule' LogicPredicate ':-' ','.LogicCond+
LogicCond = LogicPredicate | LogicNeq
LogicNeq = Ident '!=' Ident
LogicPredicate = Ident '(' ','.LogicTerm* ')'
LogicTerm = Ident | String | Int
LogicQueryExpr = 'query' LogicPredicate
These productions parse and the tree-walking interpreter executes them. The bytecode VM does not. See MEP 10.
Streams and agents
StreamDecl = 'stream' Ident '{' StreamField* '}'
StreamField = Ident ':' TypeRef
EmitStmt = 'emit' Ident '{' ','.StructLitField* ','? '}'
OnHandler = 'on' Ident 'as' Ident '{' Statement* '}'
AgentDecl = 'agent' Ident '{' AgentBlock* '}'
AgentBlock = LetStmt | VarStmt | AssignStmt | OnHandler | IntentDecl
IntentDecl = 'intent' Ident '(' ','.Param* ')'
(':' TypeRef)? '{' Statement* '}'
I/O
FetchStmt = 'fetch' Expr 'into' Ident ('with' Expr)?
LoadExpr = 'load' String? 'as' TypeRef ('with' Expr)?
SaveExpr = 'save' Expr ('to' String)? ('with' Expr)?
load and save require both as Type and a format option in the with map for any non-default behaviour.
Design principles
Single parse tree per source
PEG ordered choice eliminates ambiguity at the grammar level. A reader does not need to know shift-reduce conventions, lookahead sets, or associativity tables to predict how a fragment parses. The first alternative that matches wins; no later alternative is considered.
No corner cases
A grammar that accepts shapes a post-parse pass then rejects is two grammars layered. Every shape this MEP defines is meaningful by construction:
IndexOpenumerates the 11 well-formed slice shapes.xs[],xs[:],xs[::]match no production.MatchBodyis mandatory.=>followed by nothing matches no production.AssignTargetis restricted to postfix chains rooted at an identifier.1 = x,f() = x,(a + b) = xmatch no production.VariantUnionrequires at least oneTypeVariant.type X = |matches no production.
The post-parse normalisation pass that earlier revisions documented under codes P060 and P061 is removed. Those codes are retired.
Layered precedence with operand symmetry
The expression grammar follows the layered precedence ladder of Aho-Sethi-Ullman §4.3 and Crafting Interpreters §6. Every binary level has the shape Operand (op Operand)* with Operand = Unary. Both sides of every binary operator are the same operand type. The reader and the implementor are guaranteed that any unary prefix accepted at any operand position is accepted at every operand position. There is no asymmetry between Left and Right to remember or work around.
Keyword-led statements
All compound statements lead with a unique keyword (let, var, fun, if, while, for, return, break, continue, type, fact, rule, import, stream, agent, on, emit, fetch, test, bench, expect). The grammar disambiguates by keyword, not by lookahead distance. The two keyword-less statements (AssignStmt, ExprStmt) sit last in the alternation and are recognised by their structural shape.
No left recursion in production rules
Every binary level is written Operand (op Operand)* rather than Expr op Operand | Operand. The two forms accept the same language; the right-recursive shape is parsed without a left-recursion-capable parser. The implementation runs on participle/v2, which does not support direct left recursion. The spec uses the same shape so the implementation is a literal translation.
Precedence in the type checker is an implementation detail
The flat right-recursive shape produces an AST that the type checker balances into a precedence tree. The balancing is an implementation choice. The language definition remains the layered ladder, and any implementation that parses directly into a precedence tree accepts and rejects exactly the same set of programs.
Conformance
A conforming implementation must accept every shape this MEP marks as accepted and reject every shape it marks as rejected. The obligations below are stated in terms of the language, not in terms of any one implementation. Any test corpus that exercises them satisfies this MEP.
Acceptance obligations
A conforming parser must build the documented AST for at least one example of each of the following shape classes:
- Type declarations. Each of the three
TypeDeclforms (struct, variant union, alias) over each payload variation: identifier alias, generic alias, function-type alias, inline-struct alias, struct with and without the leading=, single-variant union, multi-variant union. - Subscript and slice. Each of the 11 well-formed
SliceFormalternatives, plus chained subscripts and string-keyed maps. Each accepted shape must surface the capturedStart,End, andSteppositions in the AST so downstream passes can distinguish[a:]from[a:b]. - Match arms. Arms with an expression body and arms with a block body, in the same
matchexpression. - Top-level statements. Every form enumerated under
Statement:letwith and without value,var, field and index assignment,if/else,while,for-in,for-range,returnwith and without value, genericfun,break,continue,test,bench. - Literals. List, map, string, integer (decimal, hex, binary, octal), float (with exponent), boolean, and
null, including trailing-comma forms for containers. - Operand symmetry. Every binary operator must accept a unary prefix on its right operand, and the AST must carry that prefix in the same shape it would on the left. The cross-product
{ b, -b, !b, --b, -!b }on both sides of+must parse for all combinations.a + -b * cmust parse so-binds toband-b * cbinds tighter than the surrounding+.
Rejection obligations
A conforming parser must reject the following shapes with a positioned diagnostic, not a silent accept:
xs[](no key, no slice components)xs[:](slice with all components empty)xs[::](slice with all three components empty)match v { 0 => }(match arm with neither expression body nor block body)f() = 1(assignment target rooted at a non-identifier expression)type X = |(variant union with no variants)
The position of the diagnostic must point at the offending token, not at the next-best anchor. An implementation that emits a positioned error of any wording satisfies this obligation; the wording is not normative.
Robustness obligations
A conforming parser must terminate on every input. For any byte sequence it must produce either a valid AST or a positioned diagnostic; it must not panic, hang, or return a null result alongside a null error. Random-byte fuzzing for at least the time the language community standardises on (currently ten seconds per release) must surface no counterexamples to this invariant.
Open questions
- Union types in
TypeRef. First-class union type expressions would letlet x: int | nil = ...parse. The decision is between full union types inTypeRefand anint?shorthand for the common nil case. See MEP 10. - Tuple types. Heterogeneous fixed-length sequences have no surface today. If pattern destructuring of arbitrary product values is added, tuples will follow.
- Direct left-recursion support.
participle/v2does not implement Warth-style left-recursion memoisation. A future port to a PEG parser that does (e.g. a hand-written packrat) would let the spec match the implementation literally, removing the right-recursiveOperand (op Operand)*rewrite from the grammar. The language accepted is unchanged.
References
- Aho, Sethi, Ullman. Compilers: Principles, Techniques, and Tools. §4.3 on layered precedence grammars.
- Crafting Interpreters, chapter 6. Walks the precedence ladder that motivates operand symmetry.
- Pratt, Vaughan. Top Down Operator Precedence. POPL 1973. The original presentation of operator-precedence parsing; the operand-is-a-unary invariant is forced by the algorithm.
- Ford, Bryan. Parsing Expression Grammars: A Recognition-Based Syntactic Foundation. POPL 2004. PEG semantics, ordered choice, and the no-ambiguity guarantee.
- participle/v2 grammar reference. The struct-tag dialect this grammar is encoded in.
- MEP 1 (Lexer).
- MEP 3 (Abstract Syntax Tree).
- MEP 10 (Type system roadmap).
- MEP 13 (Pattern matching).
Copyright
This document is placed in the public domain.