AXON is a compiled language with a multi-stage transformation pipeline. Unlike interpreted languages that execute source directly, AXON transforms .axon files through multiple representations before generating backend-specific prompts for LLMs.
Pipeline Overview
1. Lexer (Source → Tokens): character stream becomes structured tokens
2. Parser (Tokens → AST): token stream becomes cognitive syntax tree
3. Type Checker (Semantic Validation): AST validated for type correctness
4. IR Generator (AST → IR): cognitive AST lowered to intermediate representation
5. Backend (IR → Prompts): IR compiled to model-specific prompts
6. Runtime (Execution + Validation): prompts executed, validated, and traced
Stage 1: Lexer (Tokenization)
Purpose
Convert raw .axon source text into a stream of tokens — the atomic units of the language.
Implementation
Location: /axon/compiler/lexer.py:1
Type: Hand-written, single-pass character scanner
```python
class Lexer:
    """Tokenizes AXON source code into a stream of Token objects."""

    def tokenize(self) -> list[Token]:
        """Scan the entire source and return all tokens."""
        while not self._at_end():
            self._skip_whitespace()
            if self._at_end():
                break
            self._scan_token()
        self._tokens.append(Token(TokenType.EOF, "", self._line, self._column))
        return self._tokens
```
Token Types
The lexer recognizes 35 keywords (cognitive primitives) and various symbols:
Cognitive Keywords
```python
KEYWORDS = {
    "persona": TokenType.PERSONA,
    "context": TokenType.CONTEXT,
    "intent": TokenType.INTENT,
    "flow": TokenType.FLOW,
    "reason": TokenType.REASON,
    "anchor": TokenType.ANCHOR,
    "validate": TokenType.VALIDATE,
    "refine": TokenType.REFINE,
    "memory": TokenType.MEMORY,
    "tool": TokenType.TOOL,
    "probe": TokenType.PROBE,
    "weave": TokenType.WEAVE,
    # ... and more
}
```
Features
Comment Stripping: removes // line comments and /* */ block comments
String Escapes: handles \n, \t, \", \\ in string literals
Keyword Discrimination: distinguishes flow (keyword) from flow_name (identifier)
Location Tracking: tracks line and column for error messages
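Keyword discrimination can be sketched as a maximal-munch identifier scan followed by a table lookup; anything absent from the table falls through to a plain identifier. The helper name and the reduced KEYWORDS table below are illustrative, not taken from lexer.py.

```python
# Reduced keyword table for illustration (the real lexer has 35 entries).
KEYWORDS = {"persona": "PERSONA", "flow": "FLOW", "probe": "PROBE"}

def classify_word(source: str, start: int) -> tuple[str, str]:
    """Return (token_type, lexeme) for the word beginning at `start`."""
    end = start
    # Scan the longest run of identifier characters (maximal munch).
    while end < len(source) and (source[end].isalnum() or source[end] == "_"):
        end += 1
    lexeme = source[start:end]
    # Exact table hit -> keyword token; otherwise a plain identifier.
    return KEYWORDS.get(lexeme, "IDENTIFIER"), lexeme
```

This is why `flow` lexes as a keyword while `flow_name` lexes as an identifier: the underscore extends the scan, and the longer lexeme misses the table.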
Example
```axon
persona Expert {
  domain: ["AI", "ML"]
  confidence_threshold: 0.85
}
```

```python
[
    Token(PERSONA, "persona", line=1, col=1),
    Token(IDENTIFIER, "Expert", line=1, col=9),
    Token(LBRACE, "{", line=1, col=16),
    Token(IDENTIFIER, "domain", line=2, col=3),
    Token(COLON, ":", line=2, col=9),
    Token(LBRACKET, "[", line=2, col=11),
    Token(STRING, "AI", line=2, col=12),
    Token(COMMA, ",", line=2, col=16),
    Token(STRING, "ML", line=2, col=18),
    Token(RBRACKET, "]", line=2, col=22),
    # ...
]
```
Stage 2: Parser (AST Construction)
Purpose
Transform the flat token stream into a hierarchical Abstract Syntax Tree (AST) representing the program’s cognitive structure.
Implementation
Location: /axon/compiler/parser.py:1
Algorithm: Recursive descent parser with one method per grammar rule
```python
class Parser:
    """Recursive descent parser for the AXON language."""

    def parse(self) -> ProgramNode:
        """Parse the full program into a ProgramNode."""
        program = ProgramNode(line=1, column=1)
        while not self._check(TokenType.EOF):
            decl = self._parse_declaration()
            if decl is not None:
                program.declarations.append(decl)
        return program
```
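The parse loop leans on small cursor helpers to peek at and consume tokens. A minimal sketch of that machinery (class and method names are illustrative, not taken from parser.py; tokens are modeled as (type, lexeme) tuples):

```python
class Cursor:
    """Token cursor with the peek/consume helpers a recursive
    descent parser is built from."""

    def __init__(self, tokens: list[tuple[str, str]]):
        self.tokens = tokens
        self.pos = 0

    def check(self, token_type: str) -> bool:
        """Peek: does the current token have this type?"""
        return self.tokens[self.pos][0] == token_type

    def match(self, token_type: str) -> bool:
        """Consume the current token if it has this type."""
        if self.check(token_type):
            self.pos += 1
            return True
        return False
```

Each grammar-rule method then reads as a sequence of `match`/`check` calls mirroring the production it implements.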
AST Node Types
Each cognitive primitive has a corresponding AST node:
Declarations: PersonaDefinition, ContextDefinition, AnchorConstraint, FlowDefinition, ToolDefinition, MemoryDefinition, TypeDefinition

Statements: RunStatement, IntentNode, StepNode, ValidateGate, RefineBlock, ReasonChain

Expressions: ProbeDirective, WeaveNode, UseToolNode, RememberNode, RecallNode
Example AST
```axon
flow Analyze(doc: Document) -> Summary {
  step Extract {
    probe doc for [entities, dates]
    output: EntityMap
  }
}
```
```python
FlowDefinition(
    name="Analyze",
    parameters=[
        ParameterNode(name="doc", type_expr="Document"),
    ],
    return_type="Summary",
    steps=[
        StepNode(
            name="Extract",
            body=[
                ProbeDirective(
                    target="doc",
                    extract_fields=["entities", "dates"],
                ),
            ],
            output_type="EntityMap",
        ),
    ],
)
```
Cognitive AST: Unlike traditional ASTs with mechanical nodes (e.g., BinaryExpression), AXON’s AST uses semantic nodes (ReasonChain, ProbeDirective) that map directly to cognitive operations.
Stage 3: Type Checker (Semantic Validation)
Purpose
Validate the semantic correctness of the program using AXON’s epistemic type system.
Implementation
Location: /axon/compiler/type_checker.py:1
Type System: Epistemic partial order lattice
```python
class TypeChecker:
    """Semantic type validation for AXON programs."""

    def check(self) -> list[AxonTypeError]:
        """Validate the entire program and return type errors."""
        self._build_symbol_table()
        self._check_declarations()
        self._check_references()
        return self._errors
```
Validation Rules
Symbol Table Construction: build a registry of all personas, contexts, flows, anchors, and types
Type Compatibility: check Opinion ≰ FactualClaim and other subsumption rules
Reference Resolution: ensure run Analyze(doc) references a defined flow Analyze
Uncertainty Propagation: track Uncertainty taint through operations
Range Validation: verify RiskScore(0.0..1.0) only accepts values in range
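The subsumption rule can be sketched as reachability in a partial order: each type lists its direct parents, and T1 ≤ T2 holds when T2 is reachable from T1. The concrete hierarchy below is illustrative, not AXON's actual lattice.

```python
# Hypothetical fragment of an epistemic hierarchy: type -> direct parents.
PARENTS = {
    "FactualClaim": ["Claim"],
    "Opinion": ["Claim"],
    "Claim": [],
}

def subsumes(sub: str, sup: str) -> bool:
    """True if `sub` <= `sup` in the partial order (reflexive, transitive)."""
    if sub == sup:
        return True
    # Walk upward through parents until we hit `sup` or run out.
    return any(subsumes(parent, sup) for parent in PARENTS.get(sub, []))
```

Under this sketch, `Opinion ≤ Claim` holds but `Opinion ≤ FactualClaim` does not: the two are siblings, which is exactly the mismatch the Type Compatibility rule rejects.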
Type Errors
The type checker returns a list of AxonTypeError objects:
```python
class AxonTypeError(AxonError):
    def __init__(self, message: str, line: int, column: int):
        super().__init__(message)
        self.message = message
        self.line = line
        self.column = column
```
Compile-Time Safety: Type errors prevent compilation. AXON guarantees that well-typed programs satisfy epistemic constraints.
Stage 4: IR Generator (Lowering)
Purpose
Lower the cognitive AST into the AXON Intermediate Representation (IR) — a JSON-serializable format ready for backend compilation.
Implementation
Location: /axon/compiler/ir_generator.py:1
Pattern: Visitor pattern with explicit dispatch
```python
class IRGenerator:
    """Transforms a type-checked AST into AXON IR."""

    def generate(self, program: ast.ProgramNode) -> IRProgram:
        """Generate a complete IR program from a validated AST."""
        self._reset()
        # Phase 1: convert declarations
        for decl in program.declarations:
            self._visit(decl)
        # Phase 2: resolve cross-references
        self._resolve_references()
        return IRProgram(
            personas=list(self._personas.values()),
            flows=list(self._flows.values()),
            runs=self._runs,
            # ...
        )
```
IR Node Types
The IR uses simplified, backend-agnostic nodes:
```python
@dataclass
class IRFlow:
    name: str
    parameters: list[IRParameter]
    return_type: str
    steps: list[IRStep]
    data_edges: list[IRDataEdge]  # dependency graph

@dataclass
class IRStep:
    name: str
    directives: list[IRNode]  # IRReason, IRProbe, IRWeave, etc.
    output_type: str
    refinement: IRRefine | None
```
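Because the IR nodes are plain dataclasses, JSON serialization can fall out of `dataclasses.asdict` directly. A sketch with a pared-down IRStep (directives simplified to a bare list, refinement omitted):

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class IRStep:
    """Simplified stand-in for the IR node above."""
    name: str
    output_type: str
    directives: list = field(default_factory=list)

step = IRStep(name="Extract", output_type="EntityMap")
# asdict recursively converts the dataclass into JSON-ready dicts/lists.
print(json.dumps(asdict(step)))
# → {"name": "Extract", "output_type": "EntityMap", "directives": []}
```

The same round trip in reverse (dict → dataclass) is what makes `axon compile`'s JSON output usable as a cache.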
Cross-Reference Resolution
The IR generator links symbolic references:
```axon
run AnalyzeContract(doc)
  as LegalExpert                    // resolves to IRPersona
  within LegalReview                // resolves to IRContext
  constrained_by [NoHallucination]  // resolves to IRAnchor
```

```python
IRRun(
    flow_name="AnalyzeContract",
    flow_ref=IRFlow(...),        # direct reference
    persona_ref=IRPersona(...),  # direct reference
    context_ref=IRContext(...),  # direct reference
    anchors=[IRAnchor(...)],     # direct references
)
```
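Resolution itself amounts to dictionary lookups over the symbol registries built in phase 1. A sketch with illustrative names (the real generator works on IR objects, not dicts):

```python
class ResolutionError(Exception):
    """Raised when a run references a symbol that was never declared."""

def resolve_run(run: dict, flows: dict, personas: dict) -> dict:
    """Replace symbolic names in a run with direct object references."""
    try:
        run["flow_ref"] = flows[run["flow_name"]]
        run["persona_ref"] = personas[run["persona_name"]]
    except KeyError as missing:
        raise ResolutionError(f"undefined symbol: {missing}") from None
    return run
```

Failed lookups surface as compile-time errors rather than silent dangling references, which is the point of doing resolution before any backend sees the IR.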
Why IR? The IR decouples language design from backend implementation. New backends (e.g., Gemini, Llama) only need to compile IR, not parse AXON source.
Stage 5: Backend (Prompt Compilation)
Purpose
Compile the backend-agnostic IR into model-specific prompts for LLM providers.
Supported Backends
Anthropic: Claude 3.x (Opus, Sonnet, Haiku)
OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5
Gemini: Gemini 1.5 Pro, Flash
Ollama: local models (Llama, Mistral)
Backend Interface
Location: /axon/backends/base_backend.py:1
```python
class BaseBackend(ABC):
    """Abstract interface for all AXON compiler backends."""

    @abstractmethod
    def compile_flow(self, ir_flow: IRFlow) -> str:
        """Compile an IR flow into a model-specific prompt."""

    @abstractmethod
    def compile_persona(self, ir_persona: IRPersona) -> str:
        """Compile an IR persona into a system prompt."""
```
Example: Anthropic Backend
```python
class AnthropicBackend(BaseBackend):
    def compile_persona(self, ir_persona: IRPersona) -> str:
        """Generate a Claude-compatible system prompt."""
        prompt = f"You are {ir_persona.name}.\n"
        prompt += f"Your expertise: {', '.join(ir_persona.domain)}\n"
        prompt += f"Tone: {ir_persona.tone}\n"
        if ir_persona.confidence_threshold:
            prompt += f"Confidence threshold: {ir_persona.confidence_threshold}\n"
        return prompt
```
Backend Customization: Each backend optimizes prompts for its model’s strengths. Anthropic uses XML tags, OpenAI prefers JSON, Gemini uses structured examples.
Stage 6: Runtime (Execution + Validation)
Purpose
Execute the compiled prompts, validate outputs, handle failures, and trace execution.
Runtime Components
The runtime has four components: Executor, SemanticValidator, RetryEngine, and Tracer.
Executor
Location: /axon/runtime/executor.py:1
Orchestrates flow execution:
Resolves step dependencies (DAG)
Invokes model with compiled prompts
Passes outputs between steps
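Dependency-ordered execution can be sketched with a topological sort over the step graph; the stdlib `graphlib` module does the ordering, and the step names and dependency map below are illustrative:

```python
from graphlib import TopologicalSorter

# step -> set of steps whose outputs it needs
deps = {
    "Extract": set(),                      # no prerequisites
    "Classify": {"Extract"},               # needs Extract's output
    "Summarize": {"Extract", "Classify"},  # needs both
}

# static_order yields each step only after all its predecessors.
order = list(TopologicalSorter(deps).static_order())
print(order)
# → ['Extract', 'Classify', 'Summarize']
```

`TopologicalSorter` also raises `CycleError` on circular dependencies, which is the kind of malformed flow an executor wants to reject up front.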
SemanticValidator
Location: /axon/runtime/semantic_validator.py:1
Validates outputs at runtime:
Classifies epistemic type of LLM output
Checks actual_type ≤ declared_type
Raises ValidationError on mismatch
RetryEngine
Location: /axon/runtime/retry_engine.py:1
Handles failures adaptively:
Retries with backoff (none, linear, exponential)
Injects failure context into next attempt
Raises RefineExhaustedError after max attempts
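The three backoff strategies named above can be sketched as a single delay function (the base delay and function name are illustrative, not AXON's actual defaults):

```python
def backoff_delay(strategy: str, attempt: int, base: float = 1.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if strategy == "none":
        return 0.0                         # retry immediately
    if strategy == "linear":
        return base * attempt              # 1s, 2s, 3s, ...
    if strategy == "exponential":
        return base * (2 ** (attempt - 1)) # 1s, 2s, 4s, ...
    raise ValueError(f"unknown strategy: {strategy}")
```

A retry loop would call this between attempts and raise RefineExhaustedError once the attempt count hits the configured maximum.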
Tracer
Location: /axon/runtime/tracer.py:1
Records execution:
14 event types (step start/end, validation, retry, etc.)
JSON trace output for debugging
Performance metrics
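A trace event serialized as a JSON line might look like the sketch below; the field names are illustrative, and the real tracer defines 14 event types:

```python
import json
import time

def trace_event(kind: str, step: str, **data) -> str:
    """Serialize one trace event as a JSON line."""
    record = {"event": kind, "step": step, "ts": time.time(), **data}
    return json.dumps(record)

line = trace_event("validation_failed", "Extract", expected="EntityMap")
```

One JSON object per line keeps the trace streamable and greppable while a flow is still running.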
Execution Flow
CLI Usage
Check (Lex + Parse + Type Check)
axon check program.axon
Outputs type errors without executing.
Compile (Generate IR)
axon compile program.axon -o program.ir.json
Produces JSON IR for inspection or caching.
Run (End-to-End)
axon run program.axon -b anthropic --trace
Executes with chosen backend and saves trace.
Pipeline Comparison
| Stage | Traditional Compiler | AXON |
|---|---|---|
| Input | Source code | .axon source |
| Lexer | Tokens | Tokens (with cognitive keywords) |
| Parser | AST (mechanical) | AST (cognitive nodes) |
| Type Check | Memory safety | Epistemic correctness |
| IR | LLVM IR, bytecode | AXON IR (JSON) |
| Backend | Assembly, machine code | Model-specific prompts |
| Runtime | CPU execution | LLM invocation + validation |
| Output | Binary executable | Validated semantic result |
Next Steps
Cognitive Primitives: learn what gets compiled
Type System: understand type checking
Error Handling: see runtime behavior
CLI Reference: use the compiler tools