Compiler Pipeline Reference

Overview

Source (.mx)
-> Lexer
-> Parser
-> Frontend
-> Semantic Analysis
-> NIR Builder
-> Morph-backed Optimizer
-> MIR Builder
-> Backend Host
-> Target Artifact

Target artifacts currently include native binaries, produced through the host.llvm backend host. Morph-owned backend routes are now part of the pipeline for the construct families they cover.

Stage 1 - Lexer (src/lexer/)

Input: source text
Output: std::vector<Token>

Role:

  • tokenizes source text
  • attaches source locations
  • leaves context-sensitive decisions to later stages
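
The lexer role above can be sketched as follows. This is a minimal illustration, not the code in src/lexer/: the Token, TokenKind, and SourceLoc shapes and the tokenize function are assumptions made for the example; only the std::vector<Token> output type comes from this page.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token shape; the real Token in src/lexer/ may differ.
enum class TokenKind { Identifier, Number, Symbol };

struct SourceLoc {
    std::size_t line;    // 1-based
    std::size_t column;  // 1-based
};

struct Token {
    TokenKind kind;
    std::string text;
    SourceLoc loc;
};

// Splits text into identifiers, integer literals, and one-character symbols.
// Context-sensitive questions (for example, what a '+' means) are not
// answered here; the symbol is recorded as spelled.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t line = 1, col = 1, i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '\n') { ++line; col = 1; ++i; continue; }
        if (std::isspace(c)) { ++col; ++i; continue; }

        std::size_t start = i;
        SourceLoc loc{line, col};
        TokenKind kind;
        if (std::isalpha(c) || c == '_') {
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            kind = TokenKind::Identifier;
        } else if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            kind = TokenKind::Number;
        } else {
            ++i;
            kind = TokenKind::Symbol;
        }
        out.push_back(Token{kind, src.substr(start, i - start), loc});
        col += i - start;
    }
    return out;
}
```

Every token carries the line and column where it starts, so later stages can report diagnostics without re-scanning the source.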

Stage 2 - Parser (src/parser/)

Input: token stream
Output: raw AST

Role:

  • builds the structural AST
  • does not own Morph surface semantics
  • keeps operators, calls, member access, indexing, and context blocks as syntax-level constructs
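
To illustrate what "syntax-level constructs" means in practice, here is a small sketch of a raw AST builder. Node, NodeKind, makeNode, and Parser are hypothetical names for this example, the token stream is simplified to plain strings, and the grammar handles only names, '+', member access, single-argument calls, and indexing, with no error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical raw AST node; the real node types in src/parser/ may differ.
enum class NodeKind { Name, Binary, Call, Member, Index };

struct Node {
    NodeKind kind = NodeKind::Name;
    std::string text;                         // name, operator spelling, or member name
    std::vector<std::unique_ptr<Node>> kids;  // operands, or base plus argument/index
};
using NodePtr = std::unique_ptr<Node>;

NodePtr makeNode(NodeKind kind, std::string text, NodePtr a = nullptr, NodePtr b = nullptr) {
    NodePtr n(new Node());
    n->kind = kind;
    n->text = std::move(text);
    if (a) n->kids.push_back(std::move(a));
    if (b) n->kids.push_back(std::move(b));
    return n;
}

class Parser {
public:
    explicit Parser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // expr := postfix ('+' postfix)*
    NodePtr parseExpr() {
        NodePtr lhs = parsePostfix();
        while (peek() == "+") {
            std::string op = next();
            NodePtr rhs = parsePostfix();
            // The operator is stored as spelled; its meaning is decided later.
            lhs = makeNode(NodeKind::Binary, op, std::move(lhs), std::move(rhs));
        }
        return lhs;
    }

private:
    // postfix := name ('.' name | '(' expr ')' | '[' expr ']')*
    NodePtr parsePostfix() {
        NodePtr node = makeNode(NodeKind::Name, next());
        for (;;) {
            std::string tok = peek();
            if (tok == ".") {
                next();
                std::string member = next();
                node = makeNode(NodeKind::Member, member, std::move(node));
            } else if (tok == "(" || tok == "[") {
                next();
                NodePtr inner = parseExpr();
                next();  // closing ')' or ']' (not validated in this sketch)
                NodeKind kind = (tok == "(") ? NodeKind::Call : NodeKind::Index;
                node = makeNode(kind, tok, std::move(node), std::move(inner));
            } else {
                return node;
            }
        }
    }

    std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : std::string(); }
    std::string next() { return pos_ < toks_.size() ? toks_[pos_++] : std::string(); }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};
```

Nothing in the tree says which Morph family, if any, owns the '+' or the call; that decision is deferred to semantic analysis.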

Stage 3 - Frontend (src/frontend/)

Input: file paths and CLI settings
Output: loaded source graph and diagnostic setup

Role:

  • source loading
  • import resolution
  • diagnostic context wiring
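
A rough sketch of import resolution into a source graph is shown below. It assumes imports have already been extracted per file and uses an in-memory table instead of the file system; ImportTable, SourceGraph, and loadSourceGraph are hypothetical names, and the real frontend may order or report things differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory project: file path -> paths it imports.
using ImportTable = std::map<std::string, std::vector<std::string>>;

struct SourceGraph {
    std::vector<std::string> loadOrder;    // dependencies before dependents
    std::vector<std::string> diagnostics;  // one entry per unresolved import
};

// Depth-first walk over imports. The 'seen' set prevents loading a file twice
// and also stops the walk from looping on import cycles.
void visitSource(const std::string& path, const ImportTable& table,
                 std::set<std::string>& seen, SourceGraph& graph) {
    if (!seen.insert(path).second) return;
    ImportTable::const_iterator it = table.find(path);
    if (it == table.end()) {
        graph.diagnostics.push_back("cannot resolve import: " + path);
        return;
    }
    for (std::size_t i = 0; i < it->second.size(); ++i) {
        visitSource(it->second[i], table, seen, graph);
    }
    graph.loadOrder.push_back(path);
}

SourceGraph loadSourceGraph(const std::vector<std::string>& roots, const ImportTable& table) {
    SourceGraph graph;
    std::set<std::string> seen;
    for (std::size_t i = 0; i < roots.size(); ++i) {
        visitSource(roots[i], table, seen, graph);
    }
    return graph;
}
```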

Stage 4 - Semantic Analysis (src/sema/)

Input: raw AST
Output: type-checked AST plus Morph surface facts for owned constructs

Role:

  • name resolution and type checking
  • generic semantic validation
  • Morph surface semantic dispatch through the structural surface host
  • construction of typed semantic facts for Morph-owned families

Implemented Morph-owned semantic families currently include Input and Add.
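
One way to picture dispatch through a surface host is a registry keyed by structural shape, where each registered family either accepts a construct and records a typed fact or declines it. The sketch below is an assumption-heavy illustration: SurfaceHost, SurfaceFact, Construct, the shape keys, and the typing rules for Add and Input are all invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical typed fact recorded for a Morph-owned construct.
struct SurfaceFact {
    std::string family;      // for example "Add" or "Input"
    std::string resultType;  // type assigned by the family's handler
};

// Hypothetical structural description of a construct from the raw AST.
struct Construct {
    std::string shape;                      // for example "binary:+" or "call:input"
    std::vector<std::string> operandTypes;  // operand types, already checked
};

class SurfaceHost {
public:
    using Handler = std::function<bool(const Construct&, SurfaceFact&)>;

    void registerFamily(const std::string& shape, Handler handler) {
        handlers_[shape] = std::move(handler);
    }

    // Returns true and fills 'out' when a registered family accepts the
    // construct; returns false when no family owns the shape or it is rejected.
    bool dispatch(const Construct& c, SurfaceFact& out) const {
        std::map<std::string, Handler>::const_iterator it = handlers_.find(c.shape);
        return it != handlers_.end() && it->second(c, out);
    }

private:
    std::map<std::string, Handler> handlers_;
};

// Registers illustrative handlers for the two families named on this page.
SurfaceHost makeSketchHost() {
    SurfaceHost host;
    host.registerFamily("binary:+", [](const Construct& c, SurfaceFact& out) {
        if (c.operandTypes.size() != 2 || c.operandTypes[0] != c.operandTypes[1]) return false;
        out = SurfaceFact{"Add", c.operandTypes[0]};
        return true;
    });
    host.registerFamily("call:input", [](const Construct&, SurfaceFact& out) {
        out = SurfaceFact{"Input", "string"};  // illustrative result type only
        return true;
    });
    return host;
}
```

Keeping the host keyed on structure means generic semantic analysis does not need to know any family by name; it only asks whether some family claims the construct.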

Stage 5 - NIR Build + Optimization (src/nir/)

Input: semantic AST and Morph surface facts
Output: optimized NIR module

Role:

  • lowers semantic AST into typed SSA NIR
  • preserves Morph ownership by tagging Morph-owned instructions with semantic family metadata
  • runs the default Morph-backed optimizer pipeline loaded from morphs/pipelines.toml

Morph-owned lowering roots live under morphs/<Domain>/Lowering/<Root>/ and Morph-owned surface/operator families live under morphs/<Domain>/Surface/<Family>/.
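
The tagging and directory conventions above might look like this in miniature. NirInst, NirBuilder, the opcode spellings, and the use of plain strings for family ids are assumptions for the sketch; only the two path patterns are taken from this page.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical SSA instruction; the real NIR types in src/nir/ may differ.
struct NirInst {
    int result;                    // SSA value id defined by this instruction
    std::string op;                // illustrative opcode spelling
    std::vector<int> operands;     // SSA value ids used
    std::string semanticFamilyId;  // empty when the instruction is not Morph-owned
};

class NirBuilder {
public:
    // Appends an instruction and returns the fresh SSA id it defines.
    int emit(std::string op, std::vector<int> operands, std::string family = "") {
        int id = static_cast<int>(insts_.size());
        insts_.push_back(NirInst{id, std::move(op), std::move(operands), std::move(family)});
        return id;
    }
    const std::vector<NirInst>& insts() const { return insts_; }

private:
    std::vector<NirInst> insts_;
};

// Lowers an expression that adds two inputs. Each Morph-owned instruction
// keeps its family tag so later stages can route it.
std::vector<NirInst> lowerAddOfTwoInputs() {
    NirBuilder b;
    int lhs = b.emit("input", {}, "Input");
    int rhs = b.emit("input", {}, "Input");
    int sum = b.emit("add", {lhs, rhs}, "Add");
    b.emit("ret", {sum});  // untagged: not Morph-owned in this sketch
    return b.insts();
}

// Path helpers that follow the directory patterns stated above.
std::string loweringRootDir(const std::string& domain, const std::string& root) {
    return "morphs/" + domain + "/Lowering/" + root + "/";
}

std::string surfaceFamilyDir(const std::string& domain, const std::string& family) {
    return "morphs/" + domain + "/Surface/" + family + "/";
}
```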

Stage 6 - MIR Construction (src/mir/)

Input: optimized NIR module
Output: MIR module

Role:

  • consumes NIR, not AST
  • mirrors the optimized NIR function/block structure into MIR
  • preserves operationId, structural InstKind, executionHint, and semanticFamilyId
  • retains an owned NIR backing module for backend hosts during the transition to fully Morph-owned backend emission

The MIR module is now the canonical codegen handoff boundary used by compile, REPL, JIT, and MIR debug views.
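
A sketch of the NIR-to-MIR mirroring step follows. The four preserved field names come from this page; their types, the container layout, the InstKind enumerators, and the buildMir function are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class InstKind { Call, Binary, Return };  // illustrative subset

// Hypothetical NIR shapes.
struct NirInst {
    int operationId;
    InstKind kind;
    std::string executionHint;
    std::string semanticFamilyId;
};
struct NirBlock { std::string label; std::vector<NirInst> insts; };
struct NirFunction { std::string name; std::vector<NirBlock> blocks; };
struct NirModule { std::vector<NirFunction> functions; };

// Hypothetical MIR shapes mirroring the NIR structure one to one.
struct MirInst {
    int operationId;
    InstKind kind;
    std::string executionHint;
    std::string semanticFamilyId;
};
struct MirBlock { std::string label; std::vector<MirInst> insts; };
struct MirFunction { std::string name; std::vector<MirBlock> blocks; };
struct MirModule {
    std::vector<MirFunction> functions;
    std::unique_ptr<NirModule> nirBacking;  // owned NIR kept for backend hosts
};

// Consumes the optimized NIR module, copies its function/block structure and
// the four preserved fields into MIR, then keeps the NIR as a backing module.
MirModule buildMir(std::unique_ptr<NirModule> nir) {
    MirModule mir;
    for (std::size_t f = 0; f < nir->functions.size(); ++f) {
        const NirFunction& fn = nir->functions[f];
        MirFunction mfn;
        mfn.name = fn.name;
        for (std::size_t b = 0; b < fn.blocks.size(); ++b) {
            const NirBlock& bb = fn.blocks[b];
            MirBlock mbb;
            mbb.label = bb.label;
            for (std::size_t i = 0; i < bb.insts.size(); ++i) {
                const NirInst& in = bb.insts[i];
                mbb.insts.push_back(
                    MirInst{in.operationId, in.kind, in.executionHint, in.semanticFamilyId});
            }
            mfn.blocks.push_back(std::move(mbb));
        }
        mir.functions.push_back(std::move(mfn));
    }
    mir.nirBacking = std::move(nir);
    return mir;
}
```

Taking the NIR module by std::unique_ptr makes the ownership transfer explicit: after the call, the MIR module is the only thing a backend host needs to hold.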

Stage 7 - Backend Host (src/codegen/)

Input: MIR module
Output: target artifact

Role:

  • LLVMCodeGen is the host.llvm backend host/orchestrator
  • Morph-owned backend route components can emit covered semantic families through the opaque backend route API
  • uncovered legacy instructions still fall through to the generic backend host implementation during migration
  • route selection uses MorphContext and explicit project-configured route ids

Currently implemented host.llvm Morph backend routes include:

  • core.input
  • core.add

A Morph-owned semantic family with no matching backend route is a hard failure; only uncovered legacy instructions fall through to the generic host.
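
The route selection rules can be sketched as a lookup with two distinct outcomes for instructions that no route handles. This is not the opaque backend route API itself: MorphContextSketch, RouteFn, emitInst, and the string results stand in for the real types, and modelling the configuration as a family-to-route-id map is an assumption.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical MIR instruction reduced to what routing needs.
struct MirInst {
    std::string op;
    std::string semanticFamilyId;  // empty for legacy (non-Morph-owned) instructions
};

// Stand-in for MorphContext: project configuration maps family -> route id.
struct MorphContextSketch {
    std::map<std::string, std::string> routeIdForFamily;
};

// A route returns a label here; a real route would emit target code.
using RouteFn = std::string (*)(const MirInst&);

std::string emitCoreInput(const MirInst&) { return "core.input"; }
std::string emitCoreAdd(const MirInst&) { return "core.add"; }

// The two host.llvm routes listed on this page.
std::map<std::string, RouteFn> hostLlvmRoutes() {
    return {{"core.input", &emitCoreInput}, {"core.add", &emitCoreAdd}};
}

std::string emitInst(const MirInst& inst, const MorphContextSketch& ctx,
                     const std::map<std::string, RouteFn>& routes) {
    // Legacy instructions fall through to the generic host implementation.
    if (inst.semanticFamilyId.empty()) return "generic:" + inst.op;

    // Morph-owned instructions must resolve to a configured, implemented route.
    std::map<std::string, std::string>::const_iterator cfg =
        ctx.routeIdForFamily.find(inst.semanticFamilyId);
    if (cfg == ctx.routeIdForFamily.end()) {
        throw std::runtime_error("no backend route configured for family " +
                                 inst.semanticFamilyId);
    }
    std::map<std::string, RouteFn>::const_iterator route = routes.find(cfg->second);
    if (route == routes.end()) {
        throw std::runtime_error("backend route not implemented: " + cfg->second);
    }
    return route->second(inst);
}
```

Failing loudly for Morph-owned families keeps a missing route from being silently masked by the generic path during migration.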

Runtime Paths

The following entry points now use the same public stage boundary:

  • compile pipeline
  • REPL pipeline
  • JIT path
  • MIR debug views

They all build NIR first, then MIR, then invoke the backend host.
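
As a schematic of the shared boundary, each entry point can be thought of as a thin wrapper over one function that runs the common stages in order. The names below (PipelineTrace, runSharedStages, and the four entry functions) are hypothetical, and the stages are recorded as labels rather than executed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which entry point ran and which stages it went through.
struct PipelineTrace {
    std::string entryPoint;
    std::vector<std::string> stages;
};

// The single shared path: NIR first, then MIR, then the backend host.
PipelineTrace runSharedStages(const std::string& entryPoint) {
    PipelineTrace trace;
    trace.entryPoint = entryPoint;
    trace.stages.push_back("nir");
    trace.stages.push_back("mir");
    trace.stages.push_back("backend-host");
    return trace;
}

// Entry points differ in setup and output handling, not in stage order.
PipelineTrace compileEntry() { return runSharedStages("compile"); }
PipelineTrace replEntry() { return runSharedStages("repl"); }
PipelineTrace jitEntry() { return runSharedStages("jit"); }
PipelineTrace mirDebugEntry() { return runSharedStages("mir-debug"); }
```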

Validation

After changing the pipeline, run the MIR and LLVM tests and then a build with morphc:

.\morphc.bat test "mir"
.\morphc.bat test "llvm"
.\morphc.bat build