Plugin-Governed Pipeline (Why the Morph Framework Matters)
Morph is not “a C++ compiler with a scripting layer.” The Morph Framework is how loadable packages (morphs/<Package>/) register end-to-end ownership of language surface, semantics, IR, backends, runtime, and tooling—without each feature becoming a hardcoded branch in src/.
The data plane is still the familiar pipeline (AST → NIR → MIR → backends). The control plane is manifest-driven plugin registration (morph.toml, feature.toml, generated glue) plus the stable Morph ABI in include/morphc/morph/MorphABI.h.
Engine vs plugins
| Layer | Role |
|---|---|
| Compiler host (src/) | Lexer/parser skeleton, semantic orchestration, NIR/MIR infrastructure, diagnostic engine, LSP/CLI shell, workspace loading |
| Morph packages (morphs/) | Declare what is owned: syntax surfaces, sema rules, lowers, optimizers, host.llvm / GPU / shader / NN / REPL routes, runtime families, nested tooling commands, execution/build hooks |
| Manifest graph | Says which package exports which capability to which backend host—so the host can dispatch without compile-time knowledge of every domain |
Packages export a MorphMorphLibrary from a stable C entry (MORPHLANG_MORPH_LIBRARY_ENTRY → morphlang_morph_get_library in generated/plugin code). The build wires each package library into the compiler; discovery is driven by manifests and codegen (src/nir/morph/codegen/*, package plugin/*).
End-to-end registration (conceptual)
morph.toml + feature.toml + block.toml
│
▼
Package C++ / NIR-Glue Registration
│
├── Syntax / declaration hooks (syntax extension, blocks, chain features)
├── Semantic phase (sema extension, feature roles)
├── NIR: optimize / lower (MORPHLANG_MORPH_PHASE_*)
├── Backend routes (host.llvm, GPU Vulkan, SPIR-V, NN graph, REPL, …)
├── Runtime families (symbols + signatures in manifest)
└── Tooling commands (nested `morph …` surfaces from packages)
│
▼
Single composed compiler + CLI behavior
Anything you see as “Morph language behavior” in the repo—gpu { }, tensor ops, flow, tests, graphics, Wasm, platform packs—is implemented as package-owned factories and routes, not as ad hoc switch trees scattered only in src/.
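To make the contrast with hardcoded switch trees concrete, here is a minimal sketch of a route table that packages fill at load time and the host only queries. `RouteRegistry`, `RouteFn`, and both method names are hypothetical illustrations; they are not taken from `MorphABI.h`, and the real registration path goes through generated glue.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a package-owned backend route: it receives the
// name of a lowered construct and returns an emitted artifact as text.
using RouteFn = std::function<std::string(const std::string&)>;

// Hypothetical registry keyed by (feature id, route id). The host looks up
// keys on demand and never enumerates features at compile time.
class RouteRegistry {
public:
    // Called by package glue at load time. Returns false if another package
    // already owns this (feature, route) pair.
    bool register_route(const std::string& feature, const std::string& route, RouteFn fn) {
        assert(fn && "a route must have an implementation");
        return routes_.emplace(std::make_pair(feature, route), std::move(fn)).second;
    }

    // Called by the host during codegen. nullopt means no package owns the pair.
    std::optional<std::string> dispatch(const std::string& feature,
                                        const std::string& route,
                                        const std::string& input) const {
        auto it = routes_.find({feature, route});
        if (it == routes_.end()) return std::nullopt;
        return it->second(input);
    }

private:
    std::map<std::pair<std::string, std::string>, RouteFn> routes_;
};
```

Under this shape, adding a new domain means one more `register_route` call from its package, and the host's dispatch code stays unchanged.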
Inside packages: morph.toml, feature.toml, block.toml
The host does not guess which files belong to a package. The package root morph.toml names the shared library, dependencies, token requests, runtime families, provider imports/exports—and, critically, [include] globs that pull in every nested manifest.
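As an illustration of how `[include]` globs can decide manifest membership, the sketch below matches package-relative paths against patterns such as `features/*/feature.toml`. The glob semantics are an assumption (here `*` matches exactly one non-empty path segment); the host's real matcher may differ, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a '/'-separated relative path into its segments.
std::vector<std::string> split_path(const std::string& path) {
    std::vector<std::string> parts;
    std::stringstream ss(path);
    std::string part;
    while (std::getline(ss, part, '/')) parts.push_back(part);
    return parts;
}

// Assumed semantics: "*" matches exactly one non-empty segment; every other
// pattern segment must match literally.
bool matches_include(const std::string& pattern, const std::string& path) {
    assert(!pattern.empty() && "sketch-level check: patterns are non-empty");
    const auto pat = split_path(pattern);
    const auto seg = split_path(path);
    if (pat.size() != seg.size()) return false;
    for (std::size_t i = 0; i < pat.size(); ++i) {
        if (pat[i] == "*") {
            if (seg[i].empty()) return false;
        } else if (pat[i] != seg[i]) {
            return false;
        }
    }
    return true;
}

// Keep only the candidate files that at least one include pattern claims.
std::vector<std::string> select_manifests(const std::vector<std::string>& patterns,
                                          const std::vector<std::string>& candidates) {
    std::vector<std::string> owned;
    for (const auto& path : candidates) {
        for (const auto& pattern : patterns) {
            if (matches_include(pattern, path)) {
                owned.push_back(path);
                break;
            }
        }
    }
    return owned;
}
```

A file that no pattern claims is simply not part of the package, which is the point of the paragraph above: ownership is declared, not inferred.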
morph.toml (examples to open in the tree)
| Section | What it declares (typical) | See |
|---|---|---|
| [package] | Domain, ABI version, library path under the package build, dependencies / runtime_dependencies | morphs/Core/morph.toml, morphs/GPU/morph.toml |
| [tokens] | token_requests (lexer surface → stable token id; used by forms and ops) | Core (many keywords), GPU (KW_GPU) |
| [import.*] / [export.*] | Provider edges between packages (kind = "provider", id, from) | Core imports GPU/Shader providers; GPU exports provider.gpu.vulkan |
| [runtime.family.*] | Named runtime import families and *.symbol.* entries (llvm_signature, native_symbol) | Core io, print, module; GPU gpu |
| [runtime.bundle.*] | Shared C sources and platform slices the build links for that package | Core runtime.bundle.shared.platform.*; GPU shared / web |
| [include] | Which feature.toml / block.toml files are part of this package | Core: features/*/feature.toml, blocks/*/block.toml; GPU: features/*/feature.toml only |
| [build] | nir_sources, sema_sources, codegen_sources, cli_sources, etc.: the concrete C++ files the package compiles into its morph library | Core and GPU morph.toml |
| [diagnostics] | Diagnostic prefix for package-owned messages | CORE, GPU, … |
So: syntax trees and lowering live in block.toml + feature.toml; the root manifest lists the C++ that implements them and includes those nested files.
Syntax rule integrity
Plugin-owned syntax rules are contracts, not escape hatches for parser bugs.
- Do not weaken a `block.toml` rule just because host-side parser selection or probing is failing.
- Do not replace a precise grammar with a looser "match only the leading keyword" rule to force a parse path.
- Keep the full plugin-owned syntax contract in the manifest and fix the parser/probe so it can honor that contract.
- Only generalize host infrastructure when the improvement is grammar-agnostic; syntax-specific body handling should stay with the owning plugin.
If this rule is violated, the manifest stops describing the real language surface and ownership boundaries start to erode.
block.toml — grammar surface → AST kind
blocks/<name>/block.toml holds [forms.<id>] sections: a rule (token/capture/choice/body DSL), optional kind (statement, expression_primary, …), produces (stable syntax node id), component_class, and source (C++ implementing the syntax hook).
Example (Flow): if / match statement and expression forms all live in one block manifest and point at the same Syntax.cpp:
[forms.if_statement]
kind = "statement"
rule = { seq = [
{ token = "KW_IF" },
{ capture = { name = "condition", node = { choice = [
{ seq = [{ token = "PUNC_LPAREN" }, { ref = "expr" }, { token = "PUNC_RPAREN" }] },
{ ref = "expr" }
] } } },
{ capture = { name = "then_body", node = { body = { mode = "required", owner = "if body" } } } },
{ optional = { seq = [
{ token = "KW_ELSE" },
{ choice = [
{ ref = "statement" },
{ body = { mode = "required", owner = "else body" } }
] }
] } }
] }
produces = "flow.syntax.statement.if"
component_class = "flow_if_statement_syntax"
source = "blocks/branch_block/Syntax.cpp"
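To show how a rule built from `seq`, `token`, `choice`, and `optional` nodes can be checked against a token stream, here is a small combinator sketch. It is not the host's parser: `Rule`, the helper functions, and the placeholder tokens `EXPR`, `BODY`, and `STMT` (standing in for `{ ref = "expr" }`, `{ body = ... }`, and `{ ref = "statement" }`) are all assumptions made for illustration, and captures are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Tokens = std::vector<std::string>;
// A rule consumes tokens starting at `pos` and returns the position after the
// match, or nullopt if it does not match.
using Rule = std::function<std::optional<std::size_t>(const Tokens&, std::size_t)>;

Rule token(std::string id) {
    assert(!id.empty() && "sketch-level check: token ids are non-empty");
    return [id = std::move(id)](const Tokens& t, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < t.size() && t[pos] == id) return pos + 1;
        return std::nullopt;
    };
}

Rule seq(std::vector<Rule> parts) {
    return [parts = std::move(parts)](const Tokens& t, std::size_t pos) -> std::optional<std::size_t> {
        for (const auto& p : parts) {
            auto next = p(t, pos);
            if (!next) return std::nullopt;
            pos = *next;
        }
        return pos;
    };
}

// Ordered choice: the first alternative that matches wins (an assumption).
Rule choice(std::vector<Rule> alts) {
    return [alts = std::move(alts)](const Tokens& t, std::size_t pos) -> std::optional<std::size_t> {
        for (const auto& a : alts)
            if (auto next = a(t, pos)) return next;
        return std::nullopt;
    };
}

// Matches `inner` if possible; otherwise succeeds without consuming anything.
Rule optional_rule(Rule inner) {
    return [inner = std::move(inner)](const Tokens& t, std::size_t pos) -> std::optional<std::size_t> {
        auto next = inner(t, pos);
        return next ? *next : pos;
    };
}

// Structural mirror of [forms.if_statement], with placeholder tokens.
Rule if_statement() {
    Rule expr = token("EXPR");
    Rule body = token("BODY");
    Rule statement = token("STMT");
    return seq({
        token("KW_IF"),
        choice({seq({token("PUNC_LPAREN"), expr, token("PUNC_RPAREN")}), expr}),
        body,
        optional_rule(seq({token("KW_ELSE"), choice({statement, body})})),
    });
}

// True only if the rule consumes the whole token stream.
bool matches_fully(const Rule& r, const Tokens& t) {
    auto end = r(t, 0);
    return end && *end == t.size();
}
```

With this sketch, `KW_IF BODY` is rejected because the condition is missing, whereas a loosened rule such as `token("KW_IF")` alone would match its prefix. That is the kind of weakening the "Syntax rule integrity" section warns against.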
feature.toml — sema, NIR, backends, REPL
Feature manifests are not all the same shape: one file may define built-ins, routes, operations, or tooling. Typical patterns:
Built-in + LLVM route (Core API bundle): a symbol, a lowering component, and a [*.route.hostllvm] (or other host) subsection carrying provider_id, route_id, artifact_kind, required_extensions, and the route source. For example:
[core.input]
symbol = "Input"
source = "features/Input/Input.cpp"
component = "nir.core.input_lowering"
component_class = "core.input.root"
generic_t = "type"
param_prompt = ["string"]
returns = ["t"]
builtin_family = "core.input"
[core.input.route.hostllvm]
provider_id = "provider.core.host_llvm"
route_id = "host.llvm"
artifact_kind = "host.llvm_value"
required_extensions = ["llvm.base.v1", "llvm.runtime_import.v1"]
required_runtime_families = ["io"]
description = "Emits host LLVM IR for core.input."
source = "features/Input/backend/routes/host.llvm/Emit.cpp"
component_class = "core.input.backend.hostllvm"
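One way to read `required_extensions` and `required_runtime_families` is as preconditions a host must satisfy before it selects the route. The sketch below checks them by plain set membership. How the host actually validates these fields is not shown in this chapter, so `RouteDescriptor`, `missing`, and `route_is_usable` are hypothetical names and the logic is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical struct mirroring a few fields of a route subsection such as
// [core.input.route.hostllvm].
struct RouteDescriptor {
    std::string provider_id;
    std::string route_id;
    std::vector<std::string> required_extensions;
    std::vector<std::string> required_runtime_families;
};

// Returns the requirements that `available` does not satisfy, in declaration
// order. An empty result means every requirement is met.
std::vector<std::string> missing(const std::vector<std::string>& required,
                                 const std::set<std::string>& available) {
    std::vector<std::string> out;
    for (const auto& r : required)
        if (available.count(r) == 0) out.push_back(r);
    return out;
}

// A route is usable only if the host offers every required extension and
// every required runtime family.
bool route_is_usable(const RouteDescriptor& route,
                     const std::set<std::string>& host_extensions,
                     const std::set<std::string>& runtime_families) {
    assert(!route.route_id.empty() && "sketch-level check: a route needs an id");
    return missing(route.required_extensions, host_extensions).empty() &&
           missing(route.required_runtime_families, runtime_families).empty();
}
```

Returning the missing names rather than a bare boolean would let a diagnostic say which extension or family is absent.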
Binary operations (Ops): [operation.*] ties the lexer token request, NIR kind, VM dispatch id, precedence, and associativity to a stable op id:
[operation.add]
id = "op.core.add"
mnemonic = "add"
semantic_family = "core.add"
nir_kind = "Add"
token_request_id = "OP_SUM"
vm_dispatch_id = "vm.scalar.add"
requires_non_null_operands = true
infix_precedence = 50
associativity = "left"
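The `infix_precedence` and `associativity` fields are the inputs a precedence-climbing parser needs. The sketch below evaluates a pre-tokenized integer expression from such a table. Only the `+` entry mirrors `[operation.add]` (precedence 50, left-associative); the `-` and `*` entries and their precedences are hypothetical, and nothing here is claimed to match the host's actual expression parser.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class Assoc { Left, Right };

struct OpInfo {
    int precedence;
    Assoc assoc;
    long (*apply)(long, long);
};

// "+" follows the [operation.add] example; "-" and "*" are hypothetical rows.
const std::map<std::string, OpInfo>& op_table() {
    static const std::map<std::string, OpInfo> table = {
        {"+", {50, Assoc::Left, [](long a, long b) { return a + b; }}},
        {"-", {50, Assoc::Left, [](long a, long b) { return a - b; }}},
        {"*", {60, Assoc::Left, [](long a, long b) { return a * b; }}},
    };
    return table;
}

// Precedence climbing: parse an operand, then absorb operators whose
// precedence is at least `min_prec`.
long parse_expr(const std::vector<std::string>& toks, std::size_t& pos, int min_prec) {
    assert(pos < toks.size() && "expected an operand");
    long lhs = std::stol(toks[pos++]);
    while (pos < toks.size()) {
        auto it = op_table().find(toks[pos]);
        if (it == op_table().end() || it->second.precedence < min_prec) break;
        const OpInfo& op = it->second;
        ++pos;
        // A left-associative operator requires a strictly tighter right side,
        // so "a - b - c" groups as "(a - b) - c".
        int next_min = (op.assoc == Assoc::Left) ? op.precedence + 1 : op.precedence;
        long rhs = parse_expr(toks, pos, next_min);
        lhs = op.apply(lhs, rhs);
    }
    return lhs;
}

// Evaluate a whole token list such as {"1", "+", "2", "*", "3"}.
long evaluate(const std::vector<std::string>& toks) {
    std::size_t pos = 0;
    return parse_expr(toks, pos, 0);
}
```

Because grouping is driven entirely by the table, a package that adds an operator would only add a row; the parsing loop itself does not change.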
Registration glue is generated and wired from manifests plus src/nir/morph/codegen/* and each package’s plugin/ entry; the ABI types in MorphABI.h describe how those pieces attach to the host.
ABI: phases, roles, backends
From MorphABI.h (summaries, not every enum value):
Phases (MorphlangMorphPhase): OPTIMIZER, LOWERING, CODEGEN, SEMANTIC.
Feature roles (MorphlangMorphFeatureRole): include semantic, lowering, sema extension, backend route, syntax extension, REPL behavior, provider, etc.—this is how a feature declares where it plugs into the host.
Backend hosts (MorphlangMorphBackendHostKind): LLVM, GPU_VULKAN, SHADER_SPIRV, NN_GRAPH, REPL, …
Backend artifacts (MorphlangMorphBackendArtifactKind): LLVM values, GPU kernels, shader expressions, NN graph nodes, REPL behavior, …
Together these fields describe a typed edge in the pipeline graph: which IR artifact goes in, what role the feature plays, which host consumes the result.
Why this should appear “everywhere” in docs
- Language tutorials teach surface syntax; the framework chapter teaches who owns that syntax and where it lowers.
- Toolchain pages (`morph`, `morphc`, `morph.toml`) describe invocation; the framework describes which packages satisfy those commands and routes.
- Architecture overviews should always mention that `morphs/` drives sema/NIR/backend composition; it is not optional metadata.
If a Learn Morph page talks about a construct but never names its owning package, it is incomplete for advanced readers.
Where to go next
- Morph packages README: Standard Package Layout ties folder names to `morph.toml` / `feature.toml` / `block.toml`
- Morph framework overview: package-first model and teaching map
- Plugin-first rule: non-negotiable ownership
- Runtime and backend ownership
- Tooling command APIs
- Writing a plugin from scratch
- Feature manifests