What is a task?
A Flyte task is characterized by:
- Containerized execution: Each task runs in its own container on a Kubernetes Pod, isolated from all other tasks.
- Strong typing: All inputs and outputs must be annotated with Python type hints. Flyte validates these at compile time and execution time.
- Versioning: Tasks are versioned (typically aligned with the Git SHA) and immutable once registered.
- Independent executability: Tasks can be run individually, outside of a workflow, for testing and development.
A task is a Python function decorated with @task. The type annotations on inputs and outputs are required.
Tasks and workflows must always be called with keyword arguments.
Running a task locally
You can execute a Flyte task just like any regular Python function. You can also run it with the pyflyte run CLI.
Task types
Flyte distinguishes between tasks based on where and how they execute.
Python function tasks (default)
The most common task type. Any Python function decorated with @task becomes a PythonFunctionTask. When run on a cluster, it executes inside a container on a Kubernetes Pod.
Container tasks
Run arbitrary shell commands or binaries inside a container, without writing Python.
SQL / data warehouse tasks
Flyte includes backend plugins for running queries on distributed data warehouses. These tasks do not require a Python function body — they delegate execution to the external service:
- Athena (flytekitplugins-aws-athena)
- BigQuery (flytekitplugins-bigquery)
- Snowflake (flytekitplugins-snowflake)
- Hive (via the Hive backend plugin)
Spark tasks
Submit Apache Spark jobs directly from a Flyte task using the pyspark plugin.
PyTorch / distributed training tasks
Use the Kubeflow PyTorch plugin to run distributed training jobs.
Caching task outputs
Flyte supports memoization of task outputs. When you enable caching, Flyte checks whether an identical invocation (same inputs and cache_version) was executed before and returns the stored output instead of re-running the task.
- cache=True enables caching for this task.
- cache_version is a string you control. Change it when you want to invalidate the cache (for example, after fixing a bug in the task logic).
Retries
Flyte categorizes failures into two types and handles them independently:

| Error type | Description | Configuration |
|---|---|---|
| User errors | Application-level failures: logic errors, invalid inputs, value errors | retries parameter in @task |
| System errors | Infrastructure failures: spot preemptions, network issues, hardware faults | Platform-level config (max-node-retries-system-failures) |
The number of user retries must be 10 or fewer. All user exceptions are considered non-recoverable unless the exception subclasses FlyteRecoverableException.
Spot instances and retries
Tasks marked interruptible=True run on preemptible (spot) instances. Preemptions count against the system retry budget, not your user retry budget. The last system retry automatically runs on a non-preemptible instance, so the final attempt cannot be lost to preemption.
Timeouts
Use the timeout parameter to protect against tasks that hang indefinitely. After the timeout period elapses, the task is marked as failed.
Resource allocation
Flyte tasks can declare their resource requirements using the Resources object. This allows workflows to be composed of tasks with heterogeneous hardware needs.
Use limits to cap the maximum resources the task may use.
Map tasks
Use map_task to parallelize a task over a list of inputs without writing explicit parallelism logic.
What makes a good Flyte task?
When deciding whether a unit of work is a good candidate for a Flyte task, consider:
- Well-defined exit criteria: A task is expected to exit after processing its inputs. Long-running daemons do not fit the task model.
- Repeatability: Under certain circumstances (retries, re-runs), a task may be executed multiple times with the same inputs. It should produce the same output every time. Avoid using random seeds based on the current clock.
- Minimal side effects: Tasks should be pure functions where possible. When side effects are unavoidable (e.g., writing to a database), ensure the operation is idempotent.
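The repeatability and idempotence points can be sketched in plain Python. The functions below are hypothetical and use only the standard library, so they run without Flyte; the same bodies could sit under @task. The seed is passed as an explicit input rather than taken from the clock, and the write is keyed so that a retry overwrites instead of duplicating.

```python
import random
from typing import Dict, List


def sample_ids(seed: int, population: int, k: int) -> List[int]:
    """Draw k distinct ids reproducibly from range(population)."""
    rng = random.Random(seed)  # local generator: no shared global state
    return rng.sample(range(population), k)


def upsert_result(store: Dict[str, float], run_key: str, value: float) -> None:
    """Idempotent write: the same run_key always maps to a single entry."""
    store[run_key] = value


if __name__ == "__main__":
    first = sample_ids(seed=42, population=100, k=5)
    second = sample_ids(seed=42, population=100, k=5)
    print(first == second)  # same inputs give the same output

    results: Dict[str, float] = {}
    upsert_result(results, "run-001", 0.93)
    upsert_result(results, "run-001", 0.93)  # simulated retry
    print(len(results))
```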