Plugin-Governed Pipeline (Why the Morph Framework Matters)

Morph is not “a C++ compiler with a scripting layer.” The Morph Framework is how loadable packages (morphs/<Package>/) register end-to-end ownership of language surface, semantics, IR, backends, runtime, and tooling—without each feature becoming a hardcoded branch in src/.

The data plane is still the familiar pipeline (AST → NIR → MIR → backends). The control plane is manifest-driven plugin registration (morph.toml, feature.toml, generated glue) plus the stable Morph ABI in include/morphc/morph/MorphABI.h.


Engine vs plugins

Layer | Role
Compiler host (src/) | Lexer/parser skeleton, semantic orchestration, NIR/MIR infrastructure, diagnostic engine, LSP/CLI shell, workspace loading
Morph packages (morphs/) | Declare what is owned: syntax surfaces, sema rules, lowers, optimizers, host.llvm / GPU / shader / NN / REPL routes, runtime families, nested tooling commands, execution/build hooks
Manifest graph | Says which package exports which capability to which backend host, so the host can dispatch without compile-time knowledge of every domain

Packages export a MorphMorphLibrary from a stable C entry (MORPHLANG_MORPH_LIBRARY_ENTRY / morphlang_morph_get_library in generated/plugin code). The build wires each package library into the compiler; discovery is driven by manifests and codegen (src/nir/morph/codegen/*, package plugin/*).


End-to-end registration (conceptual)

morph.toml + feature.toml + block.toml
│
▼
Package C++ / NIR-Glue Registration
│
├── Syntax / declaration hooks (syntax extension, blocks, chain features)
├── Semantic phase (sema extension, feature roles)
├── NIR: optimize / lower (MORPHLANG_MORPH_PHASE_*)
├── Backend routes (host.llvm, GPU Vulkan, SPIR-V, NN graph, REPL, …)
├── Runtime families (symbols + signatures in manifest)
└── Tooling commands (nested `morph …` surfaces from packages)
│
▼
Single composed compiler + CLI behavior

Anything you see as “Morph language behavior” in the repo (gpu { }, tensor ops, flow, tests, graphics, Wasm, platform packs) is implemented as package-owned factories and routes, not as ad hoc switch trees scattered across src/.


Inside packages: morph.toml, feature.toml, block.toml

The host does not guess which files belong to a package. The package root morph.toml names the shared library, dependencies, token requests, runtime families, provider imports/exports—and, critically, [include] globs that pull in every nested manifest.

morph.toml (examples to open in the tree)

Section | What it declares (typical) | See
[package] | Domain, ABI version, library path under the package build, dependencies / runtime_dependencies | morphs/Core/morph.toml, morphs/GPU/morph.toml
[tokens] | token_requests (lexer surface → stable token id; used by forms and ops) | Core (many keywords), GPU (KW_GPU)
[import.*] / [export.*] | Provider edges between packages (kind = "provider", id, from) | Core imports GPU/Shader providers; GPU exports provider.gpu.vulkan
[runtime.family.*] | Named runtime import families and *.symbol.* entries (llvm_signature, native_symbol) | Core io, print, module; GPU gpu
[runtime.bundle.*] | Shared C sources and platform slices the build links for that package | Core runtime.bundle.shared.platform.*; GPU shared / web
[include] | Which feature.toml / block.toml files are part of this package | Core: features/*/feature.toml, blocks/*/block.toml; GPU: features/*/feature.toml only
[build] | nir_sources, sema_sources, codegen_sources, cli_sources, etc.: concrete C++ files the package compiles into its morph library | Core and GPU morph.toml
[diagnostics] | Diagnostic prefix for package-owned messages | CORE, GPU, …

So: syntax trees and lowering live in block.toml + feature.toml; the root manifest lists the C++ that implements them and includes those nested files.
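To make the [include] step concrete, here is a minimal sketch of how include globs could select nested manifests from a package's file listing. It is not the host's actual loader: the file listing, the function names, and the segment-by-segment matching rule (a `*` never crosses a `/`) are all assumptions for illustration. Only the glob strings come from the table above.

```python
from fnmatch import fnmatchcase


def matches_include(pattern: str, path: str) -> bool:
    """Match a package-relative path against one include glob, one segment at a time.

    Assumption: '*' matches within a single path segment only.
    """
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


def select_manifests(include_globs, package_files):
    """Return the nested manifests that the include globs pull in, sorted by path."""
    return sorted(
        path
        for path in package_files
        if any(matches_include(glob, path) for glob in include_globs)
    )


# Hypothetical file listing for one package (paths relative to the package root).
PACKAGE_FILES = [
    "morph.toml",
    "features/Input/feature.toml",
    "features/Input/Input.cpp",
    "blocks/branch_block/block.toml",
    "blocks/branch_block/Syntax.cpp",
    "notes/feature.toml",  # not under features/<name>/, so no glob selects it
]

CORE_STYLE_GLOBS = ["features/*/feature.toml", "blocks/*/block.toml"]
GPU_STYLE_GLOBS = ["features/*/feature.toml"]

print(select_manifests(CORE_STYLE_GLOBS, PACKAGE_FILES))
print(select_manifests(GPU_STYLE_GLOBS, PACKAGE_FILES))
```

A manifest that no glob selects is simply not part of the package in this sketch, which mirrors the point that the host does not guess.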

Syntax rule integrity

Plugin-owned syntax rules are contracts, not escape hatches for parser bugs.

  • Do not weaken a block.toml rule just because host-side parser selection or probing is failing.
  • Do not replace a precise grammar with a looser "match only the leading keyword" rule to force a parse path.
  • Keep the full plugin-owned syntax contract in the manifest and fix the parser/probe so it can honor that contract.
  • Only generalize host infrastructure when the improvement is grammar-agnostic; syntax-specific body handling should stay with the owning plugin.

If this rule is violated, the manifest stops describing the real language surface and ownership boundaries start to erode.

block.toml — grammar surface → AST kind

blocks/<name>/block.toml holds [forms.<id>] sections: a rule (token/capture/choice/body DSL), optional kind (statement, expression_primary, …), produces (stable syntax node id), component_class, and source (C++ implementing the syntax hook).

Example (Flow): if / match statement and expression forms all live in one block manifest and point at the same Syntax.cpp:

[forms.if_statement]
kind = "statement"
rule = { seq = [
  { token = "KW_IF" },
  { capture = { name = "condition", node = { choice = [
    { seq = [{ token = "PUNC_LPAREN" }, { ref = "expr" }, { token = "PUNC_RPAREN" }] },
    { ref = "expr" }
  ] } } },
  { capture = { name = "then_body", node = { body = { mode = "required", owner = "if body" } } } },
  { optional = { seq = [
    { token = "KW_ELSE" },
    { choice = [
      { ref = "statement" },
      { body = { mode = "required", owner = "else body" } }
    ] }
  ] } }
] }
produces = "flow.syntax.statement.if"
component_class = "flow_if_statement_syntax"
source = "blocks/branch_block/Syntax.cpp"
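The rule DSL in this form can be read as a small grammar tree. The sketch below walks such a tree over a flat list of token ids to show how seq, choice, optional, capture, ref, and body compose. It is a toy interpreter, not the host parser. Several things are assumptions for illustration: choice is ordered (first match wins), each ref resolves to a single placeholder token (EXPR, STMT), a body is a single BODY token, and the names match_rule and parse_form are invented. The rule itself is transcribed from the manifest above as a Python dict.

```python
# Placeholder tokens standing in for real sub-parsers (assumption for this sketch).
REF_STANDINS = {"expr": "EXPR", "statement": "STMT"}


def match_rule(node, tokens, pos, captures):
    """Return the position after `node` matches at `pos`, or None on failure."""
    if "token" in node:
        ok = pos < len(tokens) and tokens[pos] == node["token"]
        return pos + 1 if ok else None
    if "ref" in node:
        return match_rule({"token": REF_STANDINS[node["ref"]]}, tokens, pos, captures)
    if "body" in node:
        return match_rule({"token": "BODY"}, tokens, pos, captures)
    if "seq" in node:
        for child in node["seq"]:
            pos = match_rule(child, tokens, pos, captures)
            if pos is None:
                return None
        return pos
    if "choice" in node:
        for alternative in node["choice"]:
            end = match_rule(alternative, tokens, pos, captures)
            if end is not None:
                return end
        return None
    if "optional" in node:
        end = match_rule(node["optional"], tokens, pos, captures)
        return pos if end is None else end
    if "capture" in node:
        end = match_rule(node["capture"]["node"], tokens, pos, captures)
        if end is not None:
            captures[node["capture"]["name"]] = (pos, end)
        return end
    raise ValueError(f"unknown rule node: {node!r}")


def parse_form(rule, tokens):
    """Return capture spans if the rule consumes every token, else None."""
    captures = {}
    end = match_rule(rule, tokens, 0, captures)
    return captures if end == len(tokens) else None


# The if_statement rule from the manifest, as a parsed TOML value would look.
IF_RULE = {"seq": [
    {"token": "KW_IF"},
    {"capture": {"name": "condition", "node": {"choice": [
        {"seq": [{"token": "PUNC_LPAREN"}, {"ref": "expr"}, {"token": "PUNC_RPAREN"}]},
        {"ref": "expr"},
    ]}}},
    {"capture": {"name": "then_body", "node": {"body": {"mode": "required", "owner": "if body"}}}},
    {"optional": {"seq": [
        {"token": "KW_ELSE"},
        {"choice": [
            {"ref": "statement"},
            {"body": {"mode": "required", "owner": "else body"}},
        ]},
    ]}},
]}

print(parse_form(IF_RULE, ["KW_IF", "PUNC_LPAREN", "EXPR", "PUNC_RPAREN", "BODY"]))
print(parse_form(IF_RULE, ["KW_IF", "EXPR", "BODY", "KW_ELSE", "BODY"]))
print(parse_form(IF_RULE, ["KW_IF", "BODY"]))  # no condition, so the full rule rejects it
```

The last call is the interesting one: a rule that matched only the leading KW_IF would accept that input, while the full contract rejects it.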

feature.toml — sema, NIR, backends, REPL

Feature manifests are not all the same shape: one file may define built-ins, routes, operations, or tooling. Typical patterns:

Built-in + LLVM route (Core API bundle) — symbol, lowering component, and a [*.route.hostllvm] (or other host) subsection with provider_id, route_id, artifact_kind, required_extensions, and route source:

[core.input]
symbol = "Input"
source = "features/Input/Input.cpp"
component = "nir.core.input_lowering"
component_class = "core.input.root"
generic_t = "type"
param_prompt = ["string"]
returns = ["t"]
builtin_family = "core.input"

[core.input.route.hostllvm]
provider_id = "provider.core.host_llvm"
route_id = "host.llvm"
artifact_kind = "host.llvm_value"
required_extensions = ["llvm.base.v1", "llvm.runtime_import.v1"]
required_runtime_families = ["io"]
description = "Emits host LLVM IR for core.input."
source = "features/Input/backend/routes/host.llvm/Emit.cpp"
component_class = "core.input.backend.hostllvm"
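One way to read the route subsection is as a set of preconditions the host must satisfy before selecting the route. The sketch below checks required_extensions and required_runtime_families against what a host offers. Whether the real host performs this check, and in what form, is not stated here; the function name and the host-side sets are hypothetical. The route values are copied from the manifest above.

```python
def unmet_requirements(route, host_extensions, linked_families):
    """Return (missing_extensions, missing_runtime_families) for one route."""
    missing_extensions = [
        ext for ext in route.get("required_extensions", []) if ext not in host_extensions
    ]
    missing_families = [
        fam for fam in route.get("required_runtime_families", []) if fam not in linked_families
    ]
    return missing_extensions, missing_families


# Fields of [core.input.route.hostllvm], as a parsed TOML table would look.
INPUT_ROUTE = {
    "provider_id": "provider.core.host_llvm",
    "route_id": "host.llvm",
    "artifact_kind": "host.llvm_value",
    "required_extensions": ["llvm.base.v1", "llvm.runtime_import.v1"],
    "required_runtime_families": ["io"],
}

# Hypothetical host capabilities for two scenarios.
full_host = {"llvm.base.v1", "llvm.runtime_import.v1"}
bare_host = {"llvm.base.v1"}

print(unmet_requirements(INPUT_ROUTE, full_host, {"io", "print"}))
print(unmet_requirements(INPUT_ROUTE, bare_host, set()))
```

An empty pair means the route is usable; anything else names exactly what is missing, which is the kind of message a package-prefixed diagnostic could carry.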

Binary operations (Ops): [operation.*] ties a lexer token request, NIR kind, VM dispatch id, and precedence to a stable op id:

[operation.add]
id = "op.core.add"
mnemonic = "add"
semantic_family = "core.add"
nir_kind = "Add"
token_request_id = "OP_SUM"
vm_dispatch_id = "vm.scalar.add"
requires_non_null_operands = true
infix_precedence = 50
associativity = "left"
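To show how infix_precedence and associativity can drive parsing from data rather than hardcoded branches, here is a small precedence-climbing sketch over an operation table keyed by token request id. It assumes that a larger infix_precedence binds tighter, which the manifest does not state. Only the OP_SUM entry comes from the manifest above; OP_MUL and OP_POW, with their ids and precedences, are hypothetical entries added to exercise the algorithm.

```python
OPERATIONS = {
    # From [operation.add] above.
    "OP_SUM": {"id": "op.core.add", "nir_kind": "Add",
               "infix_precedence": 50, "associativity": "left"},
    # Hypothetical entries, not taken from any Morph manifest.
    "OP_MUL": {"id": "op.example.mul", "nir_kind": "Mul",
               "infix_precedence": 60, "associativity": "left"},
    "OP_POW": {"id": "op.example.pow", "nir_kind": "Pow",
               "infix_precedence": 70, "associativity": "right"},
}


def parse_expr(tokens, pos=0, min_prec=0):
    """Precedence climbing over a well-formed token list.

    Operands are single tokens. Returns (tree, next_pos), where a tree is
    either an operand string or a (nir_kind, lhs, rhs) tuple.
    """
    lhs = tokens[pos]
    pos += 1
    while pos < len(tokens):
        op = OPERATIONS.get(tokens[pos])
        if op is None or op["infix_precedence"] < min_prec:
            break
        prec = op["infix_precedence"]
        # Left-associative ops require a strictly tighter operator on the right.
        next_min = prec + 1 if op["associativity"] == "left" else prec
        rhs, pos = parse_expr(tokens, pos + 1, next_min)
        lhs = (op["nir_kind"], lhs, rhs)
    return lhs, pos


print(parse_expr(["a", "OP_SUM", "b", "OP_SUM", "c"])[0])
print(parse_expr(["a", "OP_SUM", "b", "OP_MUL", "c"])[0])
print(parse_expr(["a", "OP_POW", "b", "OP_POW", "c"])[0])
```

Adding an operator in this model means adding a table row, not touching the parser loop.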

Registration glue is generated and wired from manifests plus src/nir/morph/codegen/* and each package’s plugin/ entry; the ABI types in MorphABI.h describe how those pieces attach to the host.


ABI: phases, roles, backends

From MorphABI.h (summaries, not every enum value):

Phases (MorphlangMorphPhase): OPTIMIZER, LOWERING, CODEGEN, SEMANTIC.

Feature roles (MorphlangMorphFeatureRole): semantic, lowering, sema extension, backend route, syntax extension, REPL behavior, provider, and others. The role is how a feature declares where it plugs into the host.

Backend hosts (MorphlangMorphBackendHostKind): LLVM, GPU_VULKAN, SHADER_SPIRV, NN_GRAPH, REPL, …

Backend artifacts (MorphlangMorphBackendArtifactKind): LLVM values, GPU kernels, shader expressions, NN graph nodes, REPL behavior, …

Together these fields describe a typed edge in the pipeline graph: which IR artifact goes in, what role the feature plays, which host consumes the result.
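The typed-edge idea can be sketched as a lookup table keyed by feature and backend host, where each entry records the artifact kind it produces. This is an illustrative model only, not the ABI in MorphABI.h. The host names follow the list above, but the artifact names (LLVM_VALUE, GPU_KERNEL), the RouteEdge and RouteTable classes, and the emitter callback are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class BackendHost(Enum):
    LLVM = auto()
    GPU_VULKAN = auto()
    SHADER_SPIRV = auto()
    NN_GRAPH = auto()
    REPL = auto()


class ArtifactKind(Enum):
    # Hypothetical names for "LLVM values" and "GPU kernels".
    LLVM_VALUE = auto()
    GPU_KERNEL = auto()


@dataclass(frozen=True)
class RouteEdge:
    """One typed edge: which feature, consumed by which host, producing what."""
    feature: str
    host: BackendHost
    artifact: ArtifactKind


class RouteTable:
    """Host-side dispatch that needs no built-in knowledge of any feature."""

    def __init__(self):
        self._routes = {}

    def register(self, edge, emit):
        key = (edge.feature, edge.host)
        if key in self._routes:
            raise ValueError(f"duplicate route for {key}")
        self._routes[key] = (edge, emit)

    def dispatch(self, feature, host, payload):
        edge, emit = self._routes[(feature, host)]
        return edge.artifact, emit(payload)


table = RouteTable()
input_edge = RouteEdge("core.input", BackendHost.LLVM, ArtifactKind.LLVM_VALUE)
# The emitter is a stand-in; a real route would produce LLVM IR.
table.register(input_edge, lambda payload: f"llvm:{payload}")

print(table.dispatch("core.input", BackendHost.LLVM, "prompt"))
```

Rejecting duplicate keys keeps ownership unambiguous in this model: a given feature has at most one route per host.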


Why this should appear “everywhere” in docs

  • Language tutorials teach surface syntax; the framework chapter teaches who owns that syntax and where it lowers.
  • Toolchain pages (morph, morphc, morph.toml) describe invocation; the framework describes which packages satisfy those commands and routes.
  • Architecture overviews should always mention that morphs/ drives sema/NIR/backend composition; the manifests are not optional metadata.

If a Learn Morph page talks about a construct but never names its owning package, it is incomplete for advanced readers.


Where to go next