
Overview

In this tutorial, you’ll build your first end-to-end data pipeline in Mage. You’ll learn how to:
  • Load data from an API
  • Transform the data
  • Export the results to a database
  • Execute your pipeline
This tutorial takes approximately 10 minutes to complete.

Prerequisites

Before you begin, make sure you have:
  • Mage installed and running (visit http://localhost:6789)
  • Basic knowledge of Python
  • Pandas library (included with Mage)

What You’ll Build

You’ll create a simple ETL pipeline that:
  1. Fetches data from a public API
  2. Cleans and transforms the data
  3. Exports it to a file or database

Step 1: Create a New Pipeline

Navigate to the Mage UI and create a new pipeline:
  1. Click the Pipelines icon in the left sidebar
  2. Click + New pipeline
  3. Select Standard (batch) as the pipeline type
  4. Name your pipeline my_first_pipeline
Mage supports multiple pipeline types: Standard (batch), Streaming, and Data integration. We’ll use Standard for this tutorial.
Step 2: Add a Data Loader Block

Data loader blocks are responsible for fetching data from various sources.
  1. Click + Data loader in the pipeline editor
  2. Select Python > API
  3. Name it load_api_data
Replace the template code with:
```python
import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_api(*args, **kwargs):
    """
    Load data from a public API endpoint.
    """
    # Using the JSONPlaceholder API for the demo
    url = 'https://jsonplaceholder.typicode.com/users'

    response = requests.get(url)
    response.raise_for_status()  # Fail fast on HTTP errors
    data = response.json()

    # Convert the list of user records to a DataFrame
    df = pd.DataFrame(data)

    print(f"Loaded {len(df)} rows")
    return df
```
The @data_loader decorator tells Mage this function is a data loader block. Mage automatically handles execution order and data passing between blocks.
  1. Click Execute block to test your data loader
  2. Verify the output shows user data in the preview panel
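Each record returned by this endpoint nests the company details inside a dictionary, which is why the transformer in the next step has to flatten it. A quick offline sketch of that shape (the field values here are illustrative, abbreviated to the fields this tutorial uses):

```python
import pandas as pd

# One record in the shape the /users endpoint returns,
# trimmed to the fields this tutorial works with.
sample = [{
    'id': 1,
    'name': 'Leanne Graham',
    'email': 'Sincere@april.biz',
    'company': {'name': 'Romaguera-Crona'},
}]

df = pd.DataFrame(sample)

# The nested dict survives as a single object column:
print(df['company'].iloc[0])  # {'name': 'Romaguera-Crona'}
```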
Step 3: Add a Transformer Block

Transformer blocks process and clean your data.
  1. Click + Transformer below your data loader
  2. Select Python > Generic (no template)
  3. Name it transform_data
Add the transformation logic:
```python
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(data, *args, **kwargs):
    """
    Transform the user data:
    - Extract relevant columns
    - Clean company names
    - Add computed fields
    """
    # Select specific columns
    df = data[['id', 'name', 'email', 'company']].copy()

    # Extract company name from nested dict
    df['company_name'] = df['company'].apply(lambda x: x['name'])

    # Add email domain
    df['email_domain'] = df['email'].apply(lambda x: x.split('@')[1])

    # Drop the original company column
    df = df.drop('company', axis=1)

    print(f"Transformed {len(df)} rows with {len(df.columns)} columns")

    return df
```
Transformer blocks automatically receive data from their upstream parent blocks as the first parameter.
  1. Execute the block to see the transformed data
  2. Inspect the output to verify the transformations
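If you prefer, pandas can flatten the nested dictionary in a single call rather than per-column lambdas. A sketch of an equivalent transformation using `pd.json_normalize` (shown on illustrative sample records, not part of the tutorial's block code):

```python
import pandas as pd

# Illustrative records with the same nested shape as the API response
records = [
    {'id': 1, 'name': 'A', 'email': 'a@example.com', 'company': {'name': 'Acme'}},
    {'id': 2, 'name': 'B', 'email': 'b@example.org', 'company': {'name': 'Globex'}},
]

# json_normalize flattens nested keys into dotted column names
df = pd.json_normalize(records)
df = df.rename(columns={'company.name': 'company_name'})

# Vectorized string ops replace the per-row lambda
df['email_domain'] = df['email'].str.split('@').str[1]

print(df.columns.tolist())
# ['id', 'name', 'email', 'company_name', 'email_domain']
```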
Step 4: Add a Data Exporter Block

Data exporter blocks write your processed data to destinations.
  1. Click + Data exporter below your transformer
  2. Select Python > Local file
  3. Name it export_to_file
Add the export logic:
```python
import os

from mage_ai.settings.repo import get_repo_path

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data(data, *args, **kwargs):
    """
    Export data to a CSV file in the project directory.
    """
    # Specify output path
    output_path = os.path.join(get_repo_path(), 'exports', 'users.csv')

    # Create exports directory if it doesn't exist
    os.makedirs(os.path.dirname(output_path), exist_ok=True)

    # Export to CSV
    data.to_csv(output_path, index=False)

    print(f"Exported {len(data)} rows to {output_path}")

    return {'rows_exported': len(data)}
```
You can also export to databases like PostgreSQL, MySQL, or cloud storage like S3 and GCS. Check the template options when creating a data exporter.
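As an illustration of the database path, pandas can write straight to a SQL destination. A minimal sketch using SQLite from the standard library (for PostgreSQL or MySQL you would pass a SQLAlchemy engine and real connection details instead; the `users` table name is an assumption for this example):

```python
import sqlite3

import pandas as pd

# Illustrative data in the shape produced by the transformer
data = pd.DataFrame({
    'id': [1, 2],
    'name': ['Leanne Graham', 'Ervin Howell'],
    'email_domain': ['april.biz', 'melissa.tv'],
})

# pandas accepts a raw sqlite3 connection; other databases
# require a SQLAlchemy connectable instead.
conn = sqlite3.connect(':memory:')
data.to_sql('users', conn, if_exists='replace', index=False)

rows = conn.execute('SELECT COUNT(*) FROM users').fetchone()[0]
print(f'Exported {rows} rows')
```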
Step 5: Execute the Complete Pipeline

Now that all blocks are connected, let’s run the entire pipeline:
  1. Click on the last block (data exporter)
  2. Click Execute with all upstream blocks
  3. Watch as Mage executes each block in dependency order
You’ll see:
  • Green checkmarks on successfully executed blocks
  • Execution time for each block
  • Output data previews
  • Any print statements or logs
Mage automatically determines the execution order based on block dependencies. Blocks with no dependencies run first, followed by their downstream blocks.
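That dependency-ordered execution is essentially a topological sort of the block graph. A rough sketch of the idea using Python's standard-library `graphlib` (an illustration of the concept, not Mage's actual scheduler):

```python
from graphlib import TopologicalSorter

# Each block maps to the set of blocks it depends on (its parents)
graph = {
    'load_api_data': set(),
    'transform_data': {'load_api_data'},
    'export_to_file': {'transform_data'},
}

# static_order() yields blocks so every parent runs before its children
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['load_api_data', 'transform_data', 'export_to_file']
```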
Step 6: Test with Different Blocks

Let’s add a test block to validate your data:
  1. Click the transformer block
  2. Scroll down to the Tests section
  3. Click + Add test
Add a data validation test:
```python
from mage_ai.data_preparation.decorators import test


@test
def test_output(output, *args) -> None:
    """
    Test that the output has the expected columns and no nulls.
    """
    assert output is not None, 'Output is undefined'
    assert len(output) > 0, 'No rows in output'

    # Check required columns exist
    required_columns = ['id', 'name', 'email', 'company_name', 'email_domain']
    for col in required_columns:
        assert col in output.columns, f'Missing column: {col}'

    # Check for null values
    assert output['email'].notna().all(), 'Null values found in email column'

    print('All tests passed!')
```
Execute the test to validate your transformation logic.
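The same assertions can be exercised outside Mage with plain pandas, which is handy when iterating on validation logic; a quick offline check of the transformation's contract (the sample row is illustrative):

```python
import pandas as pd

# A sample row in the shape the transformer should produce
output = pd.DataFrame({
    'id': [1],
    'name': ['Leanne Graham'],
    'email': ['Sincere@april.biz'],
    'company_name': ['Romaguera-Crona'],
    'email_domain': ['april.biz'],
})

required_columns = ['id', 'name', 'email', 'company_name', 'email_domain']
assert output is not None and len(output) > 0, 'No rows in output'
assert all(col in output.columns for col in required_columns), 'Missing column'
assert output['email'].notna().all(), 'Null values found in email column'
print('All tests passed!')
```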

Understanding Block Types

Mage provides several block types for different purposes:
  • Data Loader: Load data from APIs, databases, files, or cloud storage
  • Transformer: Clean, process, and transform your data
  • Data Exporter: Write data to databases, files, or cloud destinations
  • Sensor: Wait for conditions or external events
  • DBT: Run dbt models as part of your pipeline
  • Custom: Write custom Python code for any purpose

Block Decorators

Mage uses Python decorators to identify block types:
```python
from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data(*args, **kwargs):
    # Returns data to downstream blocks
    return data
```
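Conceptually, a block decorator just tags a function so the framework can discover it and wire it into the pipeline. A toy illustration of the mechanism (not Mage's actual implementation):

```python
def data_loader(func):
    """Tag a function as a data loader block (toy version)."""
    func.block_type = 'data_loader'
    return func

@data_loader
def load_data():
    # A real block would return a DataFrame
    return [1, 2, 3]

print(load_data.block_type)  # data_loader
print(load_data())           # [1, 2, 3]
```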

Viewing Pipeline Execution Results

After executing your pipeline, you can:
  1. View block outputs: Click any block to see its output data
  2. Check logs: View print statements and execution logs
  3. Inspect variables: See all variables created during execution
  4. Review execution time: Optimize slow blocks
Use the Tree view to visualize your pipeline’s dependency graph and execution flow.

Scheduling Your Pipeline

To run your pipeline automatically:
  1. Click the Triggers icon in the left sidebar
  2. Click Create new trigger
  3. Select Schedule as the trigger type
  4. Configure:
    • Name: daily_user_sync
    • Frequency: Daily at 9 AM
    • Status: Active
This produces a trigger configuration like:

```yaml
schedule_type: time
schedule_interval: "0 9 * * *"
start_time: 2024-01-01 09:00:00
status: active
```
Mage uses cron syntax for scheduling. The format is: minute hour day month day_of_week
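To make the five cron fields concrete, here is a small sketch that labels each field of the schedule used above:

```python
fields = ['minute', 'hour', 'day', 'month', 'day_of_week']
schedule = '0 9 * * *'  # daily at 09:00

# Pair each cron field with its position in the expression
parsed = dict(zip(fields, schedule.split()))
print(parsed)
# {'minute': '0', 'hour': '9', 'day': '*', 'month': '*', 'day_of_week': '*'}
```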

Next Steps

Congratulations! You’ve built your first Mage pipeline. Here’s what to explore next:
  • ETL Workflow Tutorial: Build a complete ETL workflow with multiple data sources
  • Streaming Pipeline: Process real-time data with streaming pipelines
  • DBT Integration: Integrate dbt models into your pipelines
  • ML Pipeline: Build machine learning pipelines with training and inference

Troubleshooting

If a block fails to execute:
  • Check for syntax errors in your code
  • Ensure upstream blocks have executed successfully
  • Verify all required imports are present
  • Check the logs for error messages
If a package is missing:
  • Mage includes pandas, requests, and other common libraries
  • Install additional packages via pip install in a notebook
  • Add dependencies to requirements.txt for production
If data isn’t passing between blocks:
  • Ensure blocks are properly connected (parent-child relationship)
  • Verify the upstream block returns data
  • Check that the transformer receives data as its first parameter

Learn More

  • Blocks Documentation: Learn about all block types and capabilities
  • Configuration: Configure Mage for production deployments
