Pipelines
A pipeline is a directed acyclic graph (DAG) of blocks that process data. Pipelines define the workflow from data ingestion to export.
Pipeline Types
Mage supports several pipeline types:
Standard (Batch)
Traditional ETL pipelines that run on a schedule or manually. Process data in batches.
Streaming
Real-time data processing pipelines for continuous data streams.
Integration
Pre-built connectors for syncing data between sources and destinations.
DBT
Run dbt models directly inside Mage with full observability.
Creating a Pipeline
Pipelines are created with the Pipeline.create() method or via the CLI:
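A minimal sketch of programmatic creation, assuming Pipeline.create accepts a pipeline name and the project's repo_path; in practice most pipelines are created interactively in the Mage UI, launched with `mage start <project_name>`:

```python
from mage_ai.data_preparation.models.pipeline import Pipeline

# Create an empty pipeline in the current Mage project; its files are
# written under pipelines/<name>/ in the project directory.
pipeline = Pipeline.create(
    name='customer_etl',   # example name
    repo_path='.',         # path to your Mage project
)
print(pipeline.uuid)
```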
Pipeline Structure
Pipelines are stored in your project directory under pipelines/<pipeline_name>/. Each pipeline's metadata.yaml file contains its configuration and block graph:
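An illustrative metadata.yaml for a two-block pipeline (field values are examples, and the exact set of keys varies by Mage version):

```yaml
name: customer_etl
uuid: customer_etl
type: python
blocks:
  - uuid: load_customer_data
    type: data_loader
    upstream_blocks: []
    downstream_blocks:
      - clean_customer_data
  - uuid: clean_customer_data
    type: transformer
    upstream_blocks:
      - load_customer_data
    downstream_blocks: []
```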
Blocks
Blocks are the building units of pipelines. Each block performs a specific function in your data workflow. Blocks are Python, SQL, or R files that execute code and pass data to downstream blocks.
Block Types
Mage has several block types, each serving a different purpose:
- Data Loader
- Transformer
- Data Exporter
- Other Block Types
Data Loader
Data loaders import data from external sources into your pipeline; see the example after this list.
- Load data from APIs
- Read from databases (PostgreSQL, MySQL, BigQuery, etc.)
- Import from files (CSV, Parquet, JSON)
- Fetch from cloud storage (S3, GCS, Azure Blob)
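A sketch of a typical data loader block; the URL and column expectations are placeholders:

```python
import io

import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@data_loader
def load_customer_data(*args, **kwargs) -> pd.DataFrame:
    """Fetch customer records from a placeholder HTTP endpoint."""
    url = 'https://example.com/customers.csv'
    response = requests.get(url)
    return pd.read_csv(io.StringIO(response.text))


@test
def test_output(output, *args) -> None:
    assert output is not None, 'The output is undefined'
    assert len(output) > 0, 'No rows were loaded'
```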
Block Structure
Blocks are stored as Python files in your project:
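A typical layout, with one file per block grouped by type (file names are examples):

```
my_project/
  data_loaders/
    load_customer_data.py
  transformers/
    clean_customer_data.py
  data_exporters/
    export_customer_data.py
  pipelines/
    customer_etl/
      metadata.yaml
```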
Block Decorators
Mage uses Python decorators to define block functionality:

| Decorator | Purpose | Return Type |
|---|---|---|
| @data_loader | Load data from external sources | DataFrame, dict, list, or any serializable object |
| @transformer | Transform data from upstream blocks | DataFrame, dict, list, or any serializable object |
| @data_exporter | Export data to external destinations | None (or optionally return data) |
| @test | Test block output | None (raises AssertionError on failure) |
| @sensor | Check external conditions | bool (True to proceed) |
| @custom | Execute custom logic | Any |
| @callback | Run after block completion | None |
Block Dependencies
Blocks are connected through upstream and downstream relationships:
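Dependencies are recorded per block in the pipeline's metadata.yaml (or set by connecting blocks in the UI); an illustrative fragment:

```yaml
# clean_customer_data runs after load_customer_data
# and feeds export_customer_data.
- uuid: clean_customer_data
  type: transformer
  upstream_blocks:
    - load_customer_data
  downstream_blocks:
    - export_customer_data
```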
Data Flow
Data flows through your pipeline automatically: each block's return value is passed as input to its downstream blocks.
Block Execution
When a block runs, it:
- Receives output from upstream blocks as function arguments
- Executes your code
- Returns data to downstream blocks (see the transformer sketch below)
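For example, a transformer with a single upstream data loader receives that loader's output as its first argument; the column name below is a placeholder:

```python
import pandas as pd

if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def clean_customer_data(df: pd.DataFrame, *args, **kwargs) -> pd.DataFrame:
    """Receive the upstream block's output, clean it, and pass it on."""
    df = df.drop_duplicates()
    df['email'] = df['email'].str.lower()  # placeholder column
    return df  # becomes the input of downstream blocks
```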
Multiple Outputs
Blocks can return multiple outputs:
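A sketch under the assumption that a block returning a list yields one output variable per element; how downstream blocks unpack these depends on your Mage version, so treat it as illustrative:

```python
import pandas as pd

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_orders_and_refunds(*args, **kwargs):
    """Return two datasets from one block as separate outputs."""
    orders = pd.DataFrame({'order_id': [1, 2], 'amount': [10.0, 25.0]})
    refunds = pd.DataFrame({'order_id': [2], 'amount': [25.0]})
    return [orders, refunds]
```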
Variables and Configuration
Global Variables
Pass variables to your pipeline at trigger or run time; blocks read them from kwargs:
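Runtime variables defined on the pipeline or its trigger arrive in every block's kwargs; 'env' below is a hypothetical custom variable, while execution_date is supplied by Mage:

```python
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_daily_slice(*args, **kwargs):
    """Read pipeline variables from kwargs."""
    env = kwargs.get('env', 'dev')                 # custom runtime variable
    execution_date = kwargs.get('execution_date')  # provided by Mage
    return {'env': env, 'execution_date': str(execution_date)}
```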
Block Configuration
Store block-specific config in io_config.yaml:
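An illustrative Postgres profile; the values are placeholders, the keys follow Mage's default io_config template, and secrets are pulled from environment variables:

```yaml
version: 0.1.1
default:
  POSTGRES_DBNAME: analytics
  POSTGRES_SCHEMA: public
  POSTGRES_HOST: db.example.com
  POSTGRES_PORT: 5432
  POSTGRES_USER: "{{ env_var('PG_USER') }}"
  POSTGRES_PASSWORD: "{{ env_var('PG_PASSWORD') }}"
```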
Testing
Every block should include tests:
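Tests are functions decorated with @test in the same file as the block; each receives the block's output and raises AssertionError on failure ('customer_id' is a placeholder column):

```python
if 'test' not in globals():
    from mage_ai.data_preparation.decorators import test


@test
def test_row_count(output, *args) -> None:
    assert output is not None, 'The output is undefined'
    assert len(output) > 0, 'Expected at least one row'


@test
def test_no_duplicate_ids(output, *args) -> None:
    assert output['customer_id'].is_unique, 'Duplicate customer IDs found'
```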
Execution Modes
Sequential Execution
Blocks run one at a time in dependency order: a block starts only after all of its upstream blocks have completed.
Parallel Execution
Blocks with no dependencies on each other can run in parallel, reducing total pipeline runtime.
Block-Level Execution
Run a single block on its own, for example while iterating on one step in the Mage editor, without executing the rest of the pipeline.
Advanced Features
Dynamic Blocks
Generate blocks dynamically at runtime: a block marked as dynamic spawns one run of each downstream block per item in its output.
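A sketch of a dynamic block's output, assuming the block is marked as dynamic in its settings and that Mage expects a [data, metadata] pair in which each metadata entry supplies a block_uuid suffix for the spawned runs:

```python
if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_accounts(*args, **kwargs):
    """Each account spawns one run of every downstream block."""
    accounts = [{'account_id': 1}, {'account_id': 2}, {'account_id': 3}]
    metadata = [
        {'block_uuid': f'account_{a["account_id"]}'} for a in accounts
    ]
    return [accounts, metadata]
```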
Conditional Blocks
Execute blocks conditionally: attach a condition that is evaluated before the block runs; if it returns False, the block is skipped.
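A sketch assuming the condition decorator is importable from mage_ai.data_preparation.decorators; the weekday rule is just an example:

```python
if 'condition' not in globals():
    from mage_ai.data_preparation.decorators import condition


@condition
def only_run_on_weekdays(*args, **kwargs) -> bool:
    """Return False to skip the block this condition is attached to."""
    execution_date = kwargs.get('execution_date')
    return execution_date is not None and execution_date.weekday() < 5
```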
SQL Blocks
Write SQL queries directly: SQL blocks run against a connection profile from io_config.yaml and can reference upstream block output.
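A sketch of an SQL block, assuming the default behavior where upstream output is referenced through templated names such as {{ df_1 }} and the query runs against a profile from io_config.yaml:

```sql
-- {{ df_1 }} refers to the first upstream block's output.
SELECT
  customer_id,
  COUNT(*) AS order_count
FROM {{ df_1 }}
GROUP BY customer_id
```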
Best Practices
Keep blocks focused
Each block should do one thing well. Small, reusable blocks are easier to test and maintain.
Write tests
Always include @test functions to validate your data quality and business logic.
Use descriptive names
Name blocks clearly: load_customer_data, not load_data_1.
Document your code
Add docstrings to explain what each block does and any important assumptions.
Next Steps
- Quick Start: Build your first pipeline
- API Reference: Explore the complete API
- Examples: Browse real-world examples
- Deployment: Deploy to production