Partitions allow you to split an asset or job into smaller, independent pieces that can be materialized separately. Backfills enable you to efficiently materialize historical partitions in bulk.

Partitioning is essential for:
- **Processing time-series data**: Handle daily, weekly, or monthly data increments
- **Incremental computation**: Only recompute data that has changed
- **Parallel execution**: Process multiple partitions simultaneously
- **Historical replay**: Backfill data for specific time ranges
- **Cost optimization**: Avoid reprocessing all data when only recent data has changed
Without partitions, you process all data every time:
```python
from dagster import asset

# ❌ Processes all data on every run
@asset
def user_events():
    return query_all_events()  # Gets millions of rows
```
With partitions, you process only what you need:
```python
from dagster import asset, AssetExecutionContext, DailyPartitionsDefinition

# ✅ Processes one day at a time
@asset(partitions_def=DailyPartitionsDefinition(start_date="2024-01-01"))
def user_events(context: AssetExecutionContext):
    date = context.partition_key  # e.g., "2024-01-15"
    return query_events_for_date(date)  # Only that day's data
```

Partitions don't have to be known up front. With dynamic partitions, a sensor can register new partition keys at runtime and request runs for them (this example assumes a dynamic partitions definition named "releases" backs the releases_metadata asset):

```python
from dagster import sensor, RunRequest, SensorEvaluationContext

@sensor(target=[releases_metadata])
def new_release_sensor(context: SensorEvaluationContext):
    # Check for new releases
    new_releases = fetch_new_releases()
    for release in new_releases:
        # Add a new partition
        context.instance.add_dynamic_partitions(
            partitions_def_name="releases",
            partition_keys=[release.tag],
        )
        # Request a run for the new partition
        yield RunRequest(
            partition_key=release.tag,
            run_key=release.tag,
        )
```
When a partitioned asset depends on another partitioned asset, Dagster automatically maps partitions:
```python
from dagster import asset, DailyPartitionsDefinition, AssetExecutionContext

daily = DailyPartitionsDefinition(start_date="2024-01-01")

@asset(partitions_def=daily)
def raw_events(context: AssetExecutionContext):
    date = context.partition_key
    return fetch_raw_events(date)

@asset(partitions_def=daily)
def processed_events(context: AssetExecutionContext, raw_events):
    # raw_events automatically contains data for the same partition
    return process(raw_events)
```
Control how backfills execute using BackfillPolicy:
Single-Run Backfill

```python
from dagster import asset, DailyPartitionsDefinition, BackfillPolicy

@asset(
    partitions_def=DailyPartitionsDefinition(start_date="2024-01-01"),
    backfill_policy=BackfillPolicy.single_run(),
)
def bulk_load_asset(context):
    # All partitions processed in one execution
    partitions = context.partition_keys
    return load_multiple_partitions(partitions)
```

Efficient for assets where loading multiple partitions together is faster.

Multi-Run Backfill

```python
from dagster import asset, DailyPartitionsDefinition, BackfillPolicy

@asset(
    partitions_def=DailyPartitionsDefinition(start_date="2024-01-01"),
    backfill_policy=BackfillPolicy.multi_run(),
)
def incremental_asset(context):
    # Each partition processed in a separate run
    partition = context.partition_key
    return process_single_partition(partition)
```

This is the default behavior: each partition gets its own run.
Process backfills in chunks to avoid overwhelming resources:
```python
from dagster import asset, DailyPartitionsDefinition, BackfillPolicy

@asset(
    partitions_def=DailyPartitionsDefinition(start_date="2024-01-01"),
    backfill_policy=BackfillPolicy.single_run(max_partitions_per_run=7),
)
def chunked_asset(context):
    # Processes up to 7 partitions per run
    partitions = context.partition_keys
    return process_partitions(partitions)
```
Test partitioned assets by providing a partition key:
```python
from dagster import (
    build_asset_context,
    materialize,
    DailyPartitionsDefinition,
)

def test_partitioned_asset():
    # Test a specific partition directly
    context = build_asset_context(partition_key="2024-01-15")
    result = my_daily_asset(context)
    assert result is not None

    # Test a full materialization
    result = materialize(
        [my_daily_asset],
        partition_key="2024-01-15",
    )
    assert result.success
```
Choose the right partition granularity
Balance between too many small partitions (scheduling overhead) and too few large partitions (long execution times). Daily partitions work well for most use cases.
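As a back-of-the-envelope check, here are the run counts for backfilling one year of data (2024, a leap year) at different granularities:

```python
from datetime import date

# Runs needed to backfill one year (2024) at each granularity
days = (date(2025, 1, 1) - date(2024, 1, 1)).days
print("hourly: ", days * 24)  # many runs: heavy scheduling overhead
print("daily:  ", days)       # a reasonable default for most pipelines
print("monthly:", 12)         # few, large runs: long execution times, coarse retries
```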
Use TimeWindowPartitionMapping for cross-granularity dependencies
When daily assets feed weekly assets, use TimeWindowPartitionMapping to automatically map partitions.
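The window arithmetic behind that mapping can be sketched in plain Python (the helper below is illustrative, not a Dagster API): a weekly partition's time window covers seven daily partition keys, so materializing one weekly partition reads seven upstream daily partitions.

```python
from datetime import date, timedelta

def daily_keys_for_weekly_partition(week_start_key: str) -> list[str]:
    # Illustrative helper, not a Dagster API: a weekly partition whose
    # window starts on `week_start_key` maps to the seven daily
    # partition keys that fall inside that window.
    start = date.fromisoformat(week_start_key)
    return [(start + timedelta(days=i)).isoformat() for i in range(7)]

print(daily_keys_for_weekly_partition("2024-01-01"))
# → ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
#    '2024-01-05', '2024-01-06', '2024-01-07']
```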
Implement idempotent partitions
Each partition should be independently recomputable without side effects. This enables safe retries and backfills.
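One simple way to get idempotency is to make each partition's write a full overwrite of that partition's slice of the output, rather than an append. A minimal sketch, with an in-memory dict standing in for a real table or file store:

```python
def write_partition(store: dict, partition_key: str, rows: list) -> None:
    # Overwrite the partition's slice wholesale instead of appending,
    # so re-running the same partition (a retry or backfill) leaves
    # the store in the same state as running it once.
    store[partition_key] = list(rows)

store: dict = {}
write_partition(store, "2024-01-15", [1, 2, 3])
write_partition(store, "2024-01-15", [1, 2, 3])  # retry: no duplicates
print(store["2024-01-15"])  # → [1, 2, 3]
```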
Consider single-run backfills for bulk operations
If loading multiple partitions together is more efficient (e.g., bulk database queries), use BackfillPolicy.single_run().
Monitor partition status
Use the Dagster UI to track which partitions are materialized, failed, or missing. This helps identify gaps in your data.