Skip to main content

MEP 2. Grammar

FieldValue
MEP2
TitleGrammar
AuthorMochi core
StatusInformational
TypeInformational
Created2026-05-08
Revised2026-05-11

Abstract

Mochi has a Parsing Expression Grammar (PEG) with ordered choice. Every well-formed source has exactly one parse tree. The grammar is encoded on Go struct tags consumed by participle/v2, but the canonical specification is the PEG productions in this document. Production order is meaningful: alternatives are tried top-to-bottom, the first that matches wins, and no later alternative is considered. Every shape the grammar accepts is well-formed by construction. There is no post-parse "this should not have parsed" pass.

The companion document is MEP 1 (Lexer). MEP 1 fixes the token stream; this MEP fixes the productions consumed from it.

Motivation

A language is easy to learn when its grammar has no special cases. Three principles drive this revision:

  1. One source, one parse tree. PEG ordered choice removes ambiguity. The reader does not need to know associativity tables, lookahead sets, or shift-reduce conventions to predict how a fragment parses.
  2. No corner cases. Anything the grammar accepts is meaningful. Shapes that have no meaning (empty subscript, bodyless match arm, untyped union literal) do not match any production. There is no separate post-parse layer that quietly rejects what the parser quietly accepted.
  3. Layered precedence. Binary operators are laid out in a precedence ladder where every level has the same shape: Operand (op Operand)*. The operand is always a Unary. Both sides of every binary operator are structurally identical, so a + b, a + -b, -a + -b, and a + --!b all parse from the same rule.

These principles make the grammar comprehensible end-to-end and remove the implementation cliffs that the previous revision papered over with post-parse validators.

Grammar notation

The grammar is written in the standard PEG meta-syntax, restricted to a small operator set (no action blocks, no return-type annotations). Each operator below has its usual PEG semantics.

OperatorMeaning
e1 e2Sequence: match e1, then e2.
e1 | e2Ordered choice: try e1; if it fails, try e2.
e?Optional: 0 or 1 occurrence.
e*Repetition: 0 or more occurrences.
e+Repetition: 1 or more occurrences.
s.e+One or more e, separated by s. The separator is not kept.
&ePositive lookahead: match without consuming input.
!eNegative lookahead: fail without consuming input.
~Cut: commit to the current alternative, no further backtrack.
'lit'Match a literal token (keyword or punctuation).
IdentMatch a token kind from MEP 1 (Ident, Int, String, ...).

Rule names are PascalCase. The start rule is Program. Comments and whitespace are removed by the lexer (MEP 1) before parsing; they appear nowhere in the productions.

The grammar is left-factored. No rule begins with a recursive call to itself. Where the layered precedence ladder reads naturally as left recursion (Expr: Expr op Expr | Operand), the equivalent right-recursive shape Expr: Operand (op Operand)* is used instead. The two are observationally identical; the latter does not require a left-recursion-capable parser.

Specification

Top level

Program = 'package' Ident? Statement*

Statement = TestBlock | BenchBlock | ExpectStmt
| AgentDecl | StreamDecl | ModelDecl
| ImportStmt | TypeDecl
| ExternTypeDecl | ExternVarDecl
| ExternFunDecl | ExternObjectDecl
| FactStmt | RuleStmt | OnHandler | EmitStmt
| LetStmt | VarStmt | FunStmt
| ReturnStmt | IfStmt | WhileStmt | ForStmt
| BreakStmt | ContinueStmt
| FetchStmt | UpdateStmt
| AssignStmt | ExprStmt

Statement alternatives that begin with a unique keyword come first. The last two alternatives (AssignStmt, ExprStmt) do not, so they sit at the bottom. PEG ordered choice guarantees that a keyword-led form is never confused for a bare expression.

Declarations

LetStmt = 'let' Ident (':' TypeRef)? ('=' Expr)?
VarStmt = 'var' Ident (':' TypeRef)? ('=' Expr)?
AssignStmt = AssignTarget '=' Expr
AssignTarget = Ident (IndexOp | FieldOp)*
FunStmt = 'export'? 'fun' Ident TypeParams?
'(' ','.Param* ')' (':' TypeRef)?
'{' Statement* '}'
TypeParams = '<' ','.Ident+ '>'
Param = Ident (':' TypeRef)?
ReturnStmt = 'return' Expr?
BreakStmt = 'break'
ContinueStmt = 'continue'

AssignTarget is the postfix chain rooted at an identifier. The grammar therefore accepts a = 1, a.b = 1, a[0] = 1, and a.b[0].c = 1, and rejects assignments to literal-rooted expressions (1 = x, f() = x) at parse time. Call expressions cannot appear on the left of =.

Type declarations

A type declaration takes exactly one of three forms, distinguished by the first token after =:

TypeDecl = 'type' Ident TypeDeclBody

TypeDeclBody = StructBody | VariantUnion | TypeAlias

StructBody = '='? '{' ','.TypeField+ ','? '}'
| '='? '{' '}'
VariantUnion = '=' TypeVariant ('|' TypeVariant)*
& VariantOrAliasHint
TypeAlias = '=' TypeRef

TypeVariant = Ident VariantPayload?
VariantPayload = '(' ','.TypeField+ ','? ')'
| '{' TypeField* '}'
TypeField = Ident ':' TypeRef

The grammar distinguishes VariantUnion from TypeAlias by lookahead: VariantOrAliasHint succeeds only when the head identifier is followed by (, {, or |. A bare identifier or any compound type expression falls through to TypeAlias.

ShapeExample
Structtype Point { x: int, y: int }
Struct, explicit =type Point = { x: int, y: int }
Variant union, 1 varianttype Pair = P(a: int, b: int)
Variant union, N variantstype Shape = Circle(r: float) | Square(s)
Alias to a type reftype Id = int
Alias to a generictype IdList = list<int>
Alias to a functiontype Fn = fun(int): string
Alias to inline structtype Pt = { x: int, y: int }

A single-variant union (type Pair = P(a, b)) is structurally a VariantUnion, not a degenerate TypeAlias. The empty-struct form type Empty {} is accepted; the empty union form type X = | is not, because VariantUnion requires at least one TypeVariant.

Type references

TypeRef = FunType
| GenericType
| InlineStructType
| Ident

FunType = 'fun' '(' ','.TypeRef* ')' (':' TypeRef)?
GenericType = Ident '<' ','.TypeRef+ '>'
InlineStructType = '{' ','.TypeField* ','? '}'

TypeRef is the type-position expression. There is no union type literal in TypeRef. Sum types must be named through TypeDecl. There is no tuple type literal. Heterogeneous fixed-shape data uses a struct.

Control flow

IfStmt = 'if' Expr '{' Statement* '}'
('else' (IfStmt | '{' Statement* '}'))?
WhileStmt = 'while' Expr '{' Statement* '}'
ForStmt = 'for' Ident 'in' ForRange '{' Statement* '}'
ForRange = Expr '..' Expr
| Expr

ForRange puts the half-open range form first so 0..n binds as a range rather than as an expression followed by stray tokens. The general form for x in xs falls through to the second alternative.

Pattern matching

MatchExpr = 'match' Expr '{' MatchArm* '}'
MatchArm = Pattern '=>' MatchBody
MatchBody = Expr | '{' Statement* '}'
Pattern = Expr

MatchBody is mandatory. A => with nothing on its right does not parse. The pattern is just an expression; the type checker decides whether a given expression shape is a legal pattern (see MEP 13).

Expressions

The expression grammar is a precedence ladder. Every level has the same shape: Operand (op Operand)*. The operand is always Unary. Operator precedence is encoded structurally; the parser does not consult a precedence table.

Expr = OrExpr

OrExpr = AndExpr ('||' AndExpr)*
AndExpr = NotExpr ('&&' NotExpr)*
NotExpr = '!'* RelExpr
RelExpr = AddExpr (RelOp AddExpr)*
RelOp = '==' | '!=' | '<=' | '>=' | '<' | '>'
| 'in' | 'union' 'all'? | 'except' | 'intersect'
AddExpr = MulExpr (('+' | '-') MulExpr)*
MulExpr = Unary (('*' | '/' | '%') Unary)*

Unary = ('-')* PostfixExpr
PostfixExpr = Primary PostfixOp*
PostfixOp = CallOp | IndexOp | FieldOp | CastOp

CallOp = '(' ','.Expr* ')'
FieldOp = '.' Ident
CastOp = 'as' TypeRef

Three invariants the grammar guarantees:

  • Operand symmetry. Both operands of every binary operator are a Unary. The right side carries the same unary-prefix slot as the left, so a + -b, x == -1, a && !b, and a + -!b * c all parse. This is the canonical layered precedence shape; see Pratt 1973 and Crafting Interpreters §6.
  • Prefix stacking. Unary accepts ('-')*, so --x is two unary minuses and !!x is two boolean nots. There is no decrement or increment operator; the longest-match rule of MEP 1 is irrelevant because there are no two-character -- or ++ tokens.
  • Postfix chains. PostfixOp* is unbounded, so f()(2).field as int is one chain in one production. Method-call sugar lives in PostfixOp, not in Primary.

Operator precedence, from loosest to tightest:

LevelOperatorsProduction
1||OrExpr
2&&AndExpr
3! prefixNotExpr
4== != <= >= < > in union[all]? except intersectRelExpr
5+ - (binary)AddExpr
6* / %MulExpr
7- prefixUnary
8() [] . asPostfixOp

All binary operators are left-associative. Right-associativity is achieved by parenthesisation. Prefix - and prefix ! are not interchangeable: ! lives in NotExpr (between && and the relational level) so !a == b parses as !(a == b) only if grouped explicitly; otherwise as (!a) == b. Numeric negation lives in Unary, tightly binding to the operand.

Subscript and slice

The grammar enumerates every well-formed shape. Empty subscripts and slices have no production:

IndexOp = '[' SliceForm ']'

SliceForm = Expr ':' Expr ':' Expr (* [a:b:c] *)
| Expr ':' Expr ':' (* [a:b:] *)
| Expr ':' ':' Expr (* [a::c] *)
| Expr ':' ':' (* [a::] *)
| Expr ':' Expr (* [a:b] *)
| Expr ':' (* [a:] *)
| ':' Expr ':' Expr (* [:b:c] *)
| ':' Expr ':' (* [:b:] *)
| ':' ':' Expr (* [::c] *)
| ':' Expr (* [:b] *)
| Expr (* [a] *)

The 11 alternatives cover the full lattice of well-formed slices and the single subscript shape. xs[], xs[:], and xs[::] match no alternative; they are syntax errors at parse time, not at a later pass. A post-parse validator for empty-index shapes is no longer needed.

Primary expressions

Primary = StructLiteral | QueryExpr | LogicQueryExpr
| IfExpr | SelectorExpr
| ListLiteral | MapLiteral
| FunExpr | MatchExpr | GenerateExpr | FetchExpr
| LoadExpr | SaveExpr | Literal
| '(' Expr ')'

StructLiteral = Ident '{' ','.StructLitField* ','? '}'
StructLitField= Ident ':' Expr
SelectorExpr = Ident ('.' Ident)*
ListLiteral = '[' ','.Expr* ','? ']'
MapLiteral = '{' ','.MapEntry* ','? '}'
MapEntry = (Ident | String | Int) ':' Expr
Literal = Int | Float | String | 'true' | 'false' | 'null'

MapLiteral and InlineStructType share the { ... } shape. PEG ordered choice resolves the ambiguity by context: in expression position the parser tries StructLiteral first (only matches when an identifier precedes the brace) and MapLiteral second; in type position only InlineStructType is available.

Lambda expressions

FunExpr = 'fun' TypeParams?
'(' ','.Param* ')' (':' TypeRef)?
('{' Statement* '}' | '=>' Expr)

Block-bodied and arrow-bodied forms are both accepted. The arrow form yields its expression as the function result.

Queries

The query expression follows LINQ clause order. The grammar enforces the order; SQL-style select ... from ... does not parse.

QueryExpr = 'from' Ident 'in' Expr
FromClause*
JoinClause*
('where' Expr)?
GroupByClause?
(('sort' | 'order') 'by' Expr)?
('skip' Expr)?
('take' Expr)?
'select' 'distinct'? Expr
FromClause = 'from' Ident 'in' Expr
JoinClause = ('left' | 'right' | 'outer')? 'join' 'from'? Ident
'in' Expr 'on' Expr
GroupByClause = 'group' 'by' ','.Expr+ 'into' Ident ('having' Expr)?

Logic programming

FactStmt = 'fact' LogicPredicate
RuleStmt = 'rule' LogicPredicate ':-' ','.LogicCond+
LogicCond = LogicPredicate | LogicNeq
LogicNeq = Ident '!=' Ident
LogicPredicate = Ident '(' ','.LogicTerm* ')'
LogicTerm = Ident | String | Int
LogicQueryExpr = 'query' LogicPredicate

These productions parse and the tree-walking interpreter executes them. The bytecode VM does not. See MEP 10.

Streams and agents

StreamDecl = 'stream' Ident '{' StreamField* '}'
StreamField = Ident ':' TypeRef
EmitStmt = 'emit' Ident '{' ','.StructLitField* ','? '}'
OnHandler = 'on' Ident 'as' Ident '{' Statement* '}'
AgentDecl = 'agent' Ident '{' AgentBlock* '}'
AgentBlock = LetStmt | VarStmt | AssignStmt | OnHandler | IntentDecl
IntentDecl = 'intent' Ident '(' ','.Param* ')'
(':' TypeRef)? '{' Statement* '}'

I/O

FetchStmt = 'fetch' Expr 'into' Ident ('with' Expr)?
LoadExpr = 'load' String? 'as' TypeRef ('with' Expr)?
SaveExpr = 'save' Expr ('to' String)? ('with' Expr)?

load and save require both as Type and a format option in the with map for any non-default behaviour.

Design principles

Single parse tree per source

PEG ordered choice eliminates ambiguity at the grammar level. A reader does not need to know shift-reduce conventions, lookahead sets, or associativity tables to predict how a fragment parses. The first alternative that matches wins; no later alternative is considered.

No corner cases

A grammar that accepts shapes a post-parse pass then rejects is two grammars layered. Every shape this MEP defines is meaningful by construction:

  • IndexOp enumerates the 11 well-formed slice shapes. xs[], xs[:], xs[::] match no production.
  • MatchBody is mandatory. => followed by nothing matches no production.
  • AssignTarget is restricted to postfix chains rooted at an identifier. 1 = x, f() = x, (a + b) = x match no production.
  • VariantUnion requires at least one TypeVariant. type X = | matches no production.

The post-parse normalisation pass that earlier revisions documented under codes P060 and P061 is removed. Those codes are retired.

Layered precedence with operand symmetry

The expression grammar follows the layered precedence ladder of Aho-Sethi-Ullman §4.3 and Crafting Interpreters §6. Every binary level has the shape Operand (op Operand)* with Operand = Unary. Both sides of every binary operator are the same operand type. The reader and the implementor are guaranteed that any unary prefix accepted at any operand position is accepted at every operand position. There is no asymmetry between Left and Right to remember or work around.

Keyword-led statements

All compound statements lead with a unique keyword (let, var, fun, if, while, for, return, break, continue, type, fact, rule, import, stream, agent, on, emit, fetch, test, bench, expect). The grammar disambiguates by keyword, not by lookahead distance. The two keyword-less statements (AssignStmt, ExprStmt) sit last in the alternation and are recognised by their structural shape.

No left recursion in production rules

Every binary level is written Operand (op Operand)* rather than Expr op Operand | Operand. The two forms accept the same language; the right-recursive shape is parsed without a left-recursion-capable parser. The implementation runs on participle/v2, which does not support direct left recursion. The spec uses the same shape so the implementation is a literal translation.

Precedence in the type checker is an implementation detail

The flat right-recursive shape produces an AST that the type checker balances into a precedence tree. The balancing is an implementation choice. The language definition remains the layered ladder, and any implementation that parses directly into a precedence tree accepts and rejects exactly the same set of programs.

Conformance

A conforming implementation must accept every shape this MEP marks as accepted and reject every shape it marks as rejected. The obligations below are stated in terms of the language, not in terms of any one implementation. Any test corpus that exercises them satisfies this MEP.

Acceptance obligations

A conforming parser must build the documented AST for at least one example of each of the following shape classes:

  1. Type declarations. Each of the three TypeDecl forms (struct, variant union, alias) over each payload variation: identifier alias, generic alias, function-type alias, inline-struct alias, struct with and without the leading =, single-variant union, multi-variant union.
  2. Subscript and slice. Each of the 11 well-formed SliceForm alternatives, plus chained subscripts and string-keyed maps. Each accepted shape must surface the captured Start, End, and Step positions in the AST so downstream passes can distinguish [a:] from [a:b].
  3. Match arms. Arms with an expression body and arms with a block body, in the same match expression.
  4. Top-level statements. Every form enumerated under Statement: let with and without value, var, field and index assignment, if/else, while, for-in, for-range, return with and without value, generic fun, break, continue, test, bench.
  5. Literals. List, map, string, integer (decimal, hex, binary, octal), float (with exponent), boolean, and null, including trailing-comma forms for containers.
  6. Operand symmetry. Every binary operator must accept a unary prefix on its right operand, and the AST must carry that prefix in the same shape it would on the left. The cross-product { b, -b, !b, --b, -!b } on both sides of + must parse for all combinations. a + -b * c must parse so - binds to b and -b * c binds tighter than the surrounding +.

Rejection obligations

A conforming parser must reject the following shapes with a positioned diagnostic, not a silent accept:

  • xs[] (no key, no slice components)
  • xs[:] (slice with all components empty)
  • xs[::] (slice with all three components empty)
  • match v { 0 => } (match arm with neither expression body nor block body)
  • f() = 1 (assignment target rooted at a non-identifier expression)
  • type X = | (variant union with no variants)

The position of the diagnostic must point at the offending token, not at the next-best anchor. An implementation that emits a positioned error of any wording satisfies this obligation; the wording is not normative.

Robustness obligations

A conforming parser must terminate on every input. For any byte sequence it must produce either a valid AST or a positioned diagnostic; it must not panic, hang, or return a null result alongside a null error. Random-byte fuzzing for at least the time the language community standardises on (currently ten seconds per release) must surface no counterexamples to this invariant.

Open questions

  • Union types in TypeRef. First-class union type expressions would let let x: int | nil = ... parse. The decision is between full union types in TypeRef and an int? shorthand for the common nil case. See MEP 10.
  • Tuple types. Heterogeneous fixed-length sequences have no surface today. If pattern destructuring of arbitrary product values is added, tuples will follow.
  • Direct left-recursion support. participle/v2 does not implement Warth-style left-recursion memoisation. A future port to a PEG parser that does (e.g. a hand-written packrat) would let the spec match the implementation literally, removing the right-recursive Operand (op Operand)* rewrite from the grammar. The language accepted is unchanged.

References

This document is placed in the public domain.