
Overview

In this tutorial, you’ll build your first end-to-end data pipeline in Mage. You’ll learn how to:
  • Load data from an API
  • Transform the data
  • Export the results to a database
  • Execute your pipeline
This tutorial takes approximately 10 minutes to complete.

Prerequisites

Before you begin, make sure you have:
  • Mage installed and running (visit http://localhost:6789)
  • Basic knowledge of Python
  • Pandas library (included with Mage)

What You’ll Build

You’ll create a simple ETL pipeline that:
  1. Fetches data from a public API
  2. Cleans and transforms the data
  3. Exports it to a file or database

Step 1: Create a New Pipeline

Navigate to the Mage UI and create a new pipeline:
  1. Click the Pipelines icon in the left sidebar
  2. Click + New pipeline
  3. Select Standard (batch) as the pipeline type
  4. Name your pipeline my_first_pipeline
Mage supports multiple pipeline types: Standard (batch), Streaming, and Data integration. We’ll use Standard for this tutorial.
Step 2: Add a Data Loader Block

Data loader blocks are responsible for fetching data from various sources.
  1. Click + Data loader in the pipeline editor
  2. Select Python > API
  3. Name it load_api_data
Replace the template code with:
```python
import pandas as pd
import requests

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_api(*args, **kwargs):
    """
    Load data from a public API endpoint.
    """
    # Using the JSONPlaceholder API for the demo
    url = 'https://jsonplaceholder.typicode.com/users'

    response = requests.get(url)
    response.raise_for_status()  # Fail fast on HTTP errors
    data = response.json()

    # Convert the list of user records to a DataFrame
    df = pd.DataFrame(data)

    print(f"Loaded {len(df)} rows")
    return df
```
The @data_loader decorator tells Mage this function is a data loader block. Mage automatically handles execution order and data passing between blocks.
  1. Click Execute block to test your data loader
  2. Verify the output shows user data in the preview panel
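Each record returned by this endpoint nests the company details inside a dictionary, which is why the transformer in the next step has to flatten it. A quick offline sketch of that shape (the field values here are illustrative, abbreviated to the fields this tutorial uses):

```python
import pandas as pd

# One record in the shape the /users endpoint returns,
# trimmed to the fields this tutorial works with.
sample = [{
    'id': 1,
    'name': 'Leanne Graham',
    'email': 'Sincere@april.biz',
    'company': {'name': 'Romaguera-Crona'},
}]

df = pd.DataFrame(sample)

# The nested dict survives as a single object column:
print(df['company'].iloc[0])  # {'name': 'Romaguera-Crona'}
```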
Step 3: Add a Transformer Block

Transformer blocks process and clean your data.
  1. Click + Transformer below your data loader
  2. Select Python > Generic (no template)
  3. Name it transform_data
Add the transformation logic:
```python
if 'transformer' not in globals():
    from mage_ai.data_preparation.decorators import transformer


@transformer
def transform(data, *args, **kwargs):
    """
    Transform the user data:
    - Extract relevant columns
    - Clean company names
    - Add computed fields
    """
    # Select specific columns
    df = data[['id', 'name', 'email', 'company']].copy()

    # Extract company name from nested dict
    df['company_name'] = df['company'].apply(lambda x: x['name'])

    # Add email domain
    df['email_domain'] = df['email'].apply(lambda x: x.split('@')[1])

    # Drop the original company column
    df = df.drop('company', axis=1)

    print(f"Transformed {len(df)} rows with {len(df.columns)} columns")

    return df
```
Transformer blocks automatically receive data from their upstream parent blocks as the first parameter.
  1. Execute the block to see the transformed data
  2. Inspect the output to verify the transformations
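If you prefer, pandas can flatten the nested dictionary in a single call rather than per-column lambdas. A sketch of an equivalent transformation using `pd.json_normalize` (shown on illustrative sample records, not part of the tutorial's block code):

```python
import pandas as pd

# Illustrative records with the same nested shape as the API response
records = [
    {'id': 1, 'name': 'A', 'email': 'a@example.com', 'company': {'name': 'Acme'}},
    {'id': 2, 'name': 'B', 'email': 'b@example.org', 'company': {'name': 'Globex'}},
]

# json_normalize flattens nested keys into dotted column names
df = pd.json_normalize(records)
df = df.rename(columns={'company.name': 'company_name'})

# Vectorized string ops replace the per-row lambda
df['email_domain'] = df['email'].str.split('@').str[1]

print(df.columns.tolist())
# ['id', 'name', 'email', 'company_name', 'email_domain']
```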
Step 4: Add a Data Exporter Block

Data exporter blocks write your processed data to destinations.
  1. Click + Data exporter below your transformer
  2. Select Python > Local file
  3. Name it export_to_file
Add the export logic:
```python
import os

from mage_ai.settings.repo import get_repo_path

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data(data, *args, **kwargs):
    """
    Export data to a CSV file in the project directory.
    """
    # Specify output path
    output_path = os.path.join(get_repo_path(), 'exports', 'users.csv')

    # Create exports directory if it doesn't exist
    os.makedirs(os.path.dirname(output_path), exist_ok=True)

    # Export to CSV
    data.to_csv(output_path, index=False)

    print(f"Exported {len(data)} rows to {output_path}")

    return {'rows_exported': len(data)}
```
You can also export to databases like PostgreSQL, MySQL, or cloud storage like S3 and GCS. Check the template options when creating a data exporter.
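As an illustration of the database path, pandas can write straight to a SQL destination. A minimal sketch using SQLite from the standard library (for PostgreSQL or MySQL you would pass a SQLAlchemy engine and real connection details instead; the `users` table name is an assumption for this example):

```python
import sqlite3

import pandas as pd

# Illustrative data in the shape produced by the transformer
data = pd.DataFrame({
    'id': [1, 2],
    'name': ['Leanne Graham', 'Ervin Howell'],
    'email_domain': ['april.biz', 'melissa.tv'],
})

# pandas accepts a raw sqlite3 connection; other databases
# require a SQLAlchemy connectable instead.
conn = sqlite3.connect(':memory:')
data.to_sql('users', conn, if_exists='replace', index=False)

rows = conn.execute('SELECT COUNT(*) FROM users').fetchone()[0]
print(f'Exported {rows} rows')
```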
Step 5: Execute the Complete Pipeline

Now that all blocks are connected, let’s run the entire pipeline:
  1. Click on the last block (data exporter)
  2. Click Execute with all upstream blocks
  3. Watch as Mage executes each block in dependency order
You’ll see:
  • Green checkmarks on successfully executed blocks
  • Execution time for each block
  • Output data previews
  • Any print statements or logs
Mage automatically determines the execution order based on block dependencies. Blocks with no dependencies run first, followed by their downstream blocks.
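That dependency-ordered execution is essentially a topological sort of the block graph. A rough sketch of the idea using Python's standard-library `graphlib` (an illustration of the concept, not Mage's actual scheduler):

```python
from graphlib import TopologicalSorter

# Each block maps to the set of blocks it depends on (its parents)
graph = {
    'load_api_data': set(),
    'transform_data': {'load_api_data'},
    'export_to_file': {'transform_data'},
}

# static_order() yields blocks so every parent runs before its children
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['load_api_data', 'transform_data', 'export_to_file']
```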
Step 6: Test with Different Blocks

Let’s add a test block to validate your data:
  1. Click the transformer block
  2. Scroll down to the Tests section
  3. Click + Add test
Add a data validation test:
```python
from mage_ai.data_preparation.decorators import test


@test
def test_output(output, *args) -> None:
    """
    Test that the output has the expected columns and no nulls.
    """
    assert output is not None, 'Output is undefined'
    assert len(output) > 0, 'No rows in output'

    # Check required columns exist
    required_columns = ['id', 'name', 'email', 'company_name', 'email_domain']
    for col in required_columns:
        assert col in output.columns, f'Missing column: {col}'

    # Check for null values
    assert output['email'].notna().all(), 'Null values found in email column'

    print('All tests passed!')
```
Execute the test to validate your transformation logic.
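The same assertions can be exercised outside Mage with plain pandas, which is handy when iterating on validation logic; a quick offline check of the transformation's contract (the sample row is illustrative):

```python
import pandas as pd

# A sample row in the shape the transformer should produce
output = pd.DataFrame({
    'id': [1],
    'name': ['Leanne Graham'],
    'email': ['Sincere@april.biz'],
    'company_name': ['Romaguera-Crona'],
    'email_domain': ['april.biz'],
})

required_columns = ['id', 'name', 'email', 'company_name', 'email_domain']
assert output is not None and len(output) > 0, 'No rows in output'
assert all(col in output.columns for col in required_columns), 'Missing column'
assert output['email'].notna().all(), 'Null values found in email column'
print('All tests passed!')
```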

Understanding Block Types

Mage provides several block types for different purposes:
  • Data Loader: Load data from APIs, databases, files, or cloud storage
  • Transformer: Clean, process, and transform your data
  • Data Exporter: Write data to databases, files, or cloud destinations
  • Sensor: Wait for conditions or external events
  • DBT: Run dbt models as part of your pipeline
  • Custom: Write custom Python code for any purpose

Block Decorators

Mage uses Python decorators to identify block types:
```python
from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data(*args, **kwargs):
    # Returns data to downstream blocks
    return data
```
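Conceptually, a block decorator just tags a function so the framework can discover it and wire it into the pipeline. A toy illustration of the mechanism (not Mage's actual implementation):

```python
def data_loader(func):
    """Tag a function as a data loader block (toy version)."""
    func.block_type = 'data_loader'
    return func

@data_loader
def load_data():
    # A real block would return a DataFrame
    return [1, 2, 3]

print(load_data.block_type)  # data_loader
print(load_data())           # [1, 2, 3]
```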

Viewing Pipeline Execution Results

After executing your pipeline, you can:
  1. View block outputs: Click any block to see its output data
  2. Check logs: View print statements and execution logs
  3. Inspect variables: See all variables created during execution
  4. Review execution time: Optimize slow blocks
Use the Tree view to visualize your pipeline’s dependency graph and execution flow.

Scheduling Your Pipeline

To run your pipeline automatically:
  1. Click the Triggers icon in the left sidebar
  2. Click Create new trigger
  3. Select Schedule as the trigger type
  4. Configure:
    • Name: daily_user_sync
    • Frequency: Daily at 9 AM
    • Status: Active
This produces a trigger configuration like:

```yaml
schedule_type: time
schedule_interval: "0 9 * * *"
start_time: 2024-01-01 09:00:00
status: active
```
Mage uses cron syntax for scheduling. The format is: minute hour day month day_of_week
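To make the five cron fields concrete, here is a small sketch that labels each field of the schedule used above:

```python
fields = ['minute', 'hour', 'day', 'month', 'day_of_week']
schedule = '0 9 * * *'  # daily at 09:00

# Pair each cron field with its position in the expression
parsed = dict(zip(fields, schedule.split()))
print(parsed)
# {'minute': '0', 'hour': '9', 'day': '*', 'month': '*', 'day_of_week': '*'}
```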

Next Steps

Congratulations! You’ve built your first Mage pipeline. Here’s what to explore next:
  • ETL Workflow Tutorial: Build a complete ETL workflow with multiple data sources
  • Streaming Pipeline: Process real-time data with streaming pipelines
  • DBT Integration: Integrate dbt models into your pipelines
  • ML Pipeline: Build machine learning pipelines with training and inference

Troubleshooting

If a block fails to execute:
  • Check for syntax errors in your code
  • Ensure upstream blocks have executed successfully
  • Verify all required imports are present
  • Check the logs for error messages
If a package is missing:
  • Mage includes pandas, requests, and other common libraries
  • Install additional packages via pip install in a notebook
  • Add dependencies to requirements.txt for production
If data isn’t passing between blocks:
  • Ensure blocks are properly connected (parent-child relationship)
  • Verify the upstream block returns data
  • Check that the transformer receives data as its first parameter

Learn More

  • Blocks Documentation: Learn about all block types and capabilities
  • Configuration: Configure Mage for production deployments
