The Pipeline class is the primary interface for creating and executing PDAL operations. It can be constructed from JSON strings, sequences of Stage objects, or by piping stages together.

Constructor

Pipeline(
    spec: Union[None, str, Sequence[Stage]] = None,
    arrays: Sequence[np.ndarray] = (),
    loglevel: int = logging.ERROR,
    json: Optional[str] = None,
    dataframes: Sequence[DataFrame] = (),
    stream_handlers: Sequence[Callable[[], int]] = ()
)
spec
Union[None, str, Sequence[Stage]]
default:"None"
Pipeline specification. Can be a JSON string or a sequence of Stage objects.
arrays
Sequence[np.ndarray]
default:"()"
Numpy arrays to use as input data for the pipeline.
loglevel
int
default:"logging.ERROR"
Logging level using Python’s logging module constants (ERROR, WARNING, INFO, DEBUG).
json
Optional[str]
default:"None"
JSON string specification (alternative to spec parameter). Cannot be used together with spec.
dataframes
Sequence[DataFrame]
default:"()"
Pandas DataFrames to use as input data. Will be converted to Numpy structured arrays.
stream_handlers
Sequence[Callable[[], int]]
default:"()"
Functions called to populate input arrays during streaming execution. Must match the number of input arrays/dataframes.

Example

import pdal
import logging

# From JSON string
pipeline = pdal.Pipeline('{"pipeline": ["input.las", {"type": "filters.sort", "dimension": "X"}]}')

# From Stage objects
pipeline = pdal.Pipeline([pdal.Reader.las("input.las"), pdal.Filter.sort(dimension="X")])

# Using pipe operator
pipeline = pdal.Reader.las("input.las") | pdal.Filter.sort(dimension="X")

# With numpy arrays as input
import numpy as np
array = np.array([(0, 0, 0)], dtype=[('X', float), ('Y', float), ('Z', float)])
pipeline = pdal.Filter.sort(dimension="X").pipeline(array)

Properties

stages

@property
stages -> List[Stage]
Returns a list of Stage objects in the pipeline.

streamable

@property
streamable -> bool
Returns True if all stages in the pipeline support streaming execution.

loglevel

@property
loglevel -> int
Gets or sets the logging level. Accepts Python logging module constants.
pipeline.loglevel = logging.INFO

arrays

arrays: List[np.ndarray]
Numpy structured arrays containing the point cloud data after pipeline execution. Each array represents a point view output from the pipeline.
pipeline.execute()
point_data = pipeline.arrays[0]
print(point_data['X'])  # Access X coordinates

meshes

meshes: List[np.ndarray]
Numpy arrays containing mesh data (triangles) from stages like filters.delaunay. Each triangle is a tuple (A, B, C) of indices into the corresponding point view.

metadata

metadata: dict
Dictionary containing metadata from the pipeline execution, automatically parsed from PDAL's JSON metadata output.
pipeline.execute()
print(pipeline.metadata)  # Access metadata as dict

log

log: str
Log output from the pipeline execution.

schema

schema: dict
Dictionary containing the schema information (dimensions and their types) for the point cloud data.
pipeline.execute()
print(pipeline.schema)

pipeline

pipeline: str
JSON string representation of the pipeline configuration. This is the internal representation used by PDAL.

quickinfo

quickinfo: dict
Dictionary containing quick preview information about the data source without fully reading it. Useful for inspecting file headers and metadata.
pipeline = pdal.Reader.las("input.las").pipeline()
info = pipeline.quickinfo
print(info)

srswkt2

srswkt2: str
Spatial reference system in WKT2 format.

Methods

execute

execute(allowed_dims: list = []) -> int
Executes the pipeline in standard (non-streaming) mode.
allowed_dims
list
default:"[]"
Optional list of dimension names to include in the output arrays. If empty, all dimensions are included.
return
int
Total number of points processed.
pipeline = pdal.Reader.las("input.las") | pdal.Filter.sort(dimension="X")
count = pipeline.execute()
print(f"Processed {count} points")

# Only load specific dimensions
count = pipeline.execute(allowed_dims=['X', 'Y', 'Z', 'Intensity'])

execute_streaming

execute_streaming(chunk_size: int = 10000, allowed_dims: list = []) -> int
Executes a streamable pipeline in streaming mode without allocating arrays in memory. Useful when the pipeline has Writer stages and you don’t need to access point data.
chunk_size
int
default:"10000"
Number of points to process per chunk.
allowed_dims
list
default:"[]"
Optional list of dimension names to include. If empty, all dimensions are included.
return
int
Total number of points processed.
pipeline = pdal.Reader.las("input.las") | pdal.Writer.las("output.las")
count = pipeline.execute_streaming(chunk_size=10000)

iterator

iterator(chunk_size: int = 10000, prefetch: int = 0, allowed_dims: list = []) -> Iterator[np.ndarray]
Returns an iterator that yields Numpy arrays of up to chunk_size points at a time. Only works with streamable pipelines.
chunk_size
int
default:"10000"
Maximum number of points per yielded array.
prefetch
int
default:"0"
Number of arrays to prefetch and buffer in parallel.
allowed_dims
list
default:"[]"
Optional list of dimension names to include in yielded arrays. If empty, all dimensions are included.
return
Iterator[np.ndarray]
Iterator yielding Numpy structured arrays.
pipeline = pdal.Reader.las("input.las") | pdal.Filter.range(limits="Intensity[100:200]")
for chunk in pipeline.iterator(chunk_size=5000):
    print(f"Processing {len(chunk)} points")
    # Process chunk...

# Only iterate over specific dimensions
for chunk in pipeline.iterator(chunk_size=5000, allowed_dims=['X', 'Y', 'Z']):
    print(f"Processing {len(chunk)} points with X, Y, Z only")

toJSON

toJSON() -> str
Serializes the pipeline to a JSON string representation.
return
str
JSON string of the pipeline configuration.
pipeline = pdal.Reader.las("input.las") | pdal.Filter.sort(dimension="X")
json_str = pipeline.toJSON()
print(json_str)

get_meshio

get_meshio(idx: int) -> Optional[Mesh]
Creates a meshio Mesh object from the point view and mesh data at the specified index. Requires the meshio package to be installed.
idx
int
Index of the point view to convert.
return
Optional[Mesh]
Meshio Mesh object, or None if no mesh data exists.
import pdal

pipeline = pdal.Reader.las("input.las") | pdal.Filter.delaunay()
pipeline.execute()

mesh = pipeline.get_meshio(0)
if mesh:
    mesh.write('output.obj')

get_dataframe

get_dataframe(idx: int) -> Optional[DataFrame]
Converts the point view at the specified index to a Pandas DataFrame. Requires the pandas package to be installed.
idx
int
Index of the point view to convert.
return
Optional[DataFrame]
Pandas DataFrame containing the point data.
pipeline.execute()
df = pipeline.get_dataframe(0)
print(df.head())

get_geodataframe

get_geodataframe(idx: int, xyz: bool = False, crs: Any = None) -> Optional[GeoDataFrame]
Converts the point view at the specified index to a GeoPandas GeoDataFrame with Point geometries. Requires the geopandas package to be installed.
idx
int
Index of the point view to convert.
xyz
bool
default:"False"
If True, creates 3D points including Z coordinates. Otherwise creates 2D points.
crs
Any
default:"None"
Coordinate reference system to assign to the GeoDataFrame.
return
Optional[GeoDataFrame]
GeoPandas GeoDataFrame with Point geometries.
pipeline.execute()
gdf = pipeline.get_geodataframe(0, xyz=True, crs="EPSG:4326")

Pipeline Composition

Pipelines support the pipe operator (|) for composition:
# Pipe stages together
pipeline = stage1 | stage2 | stage3

# Pipe a stage to an existing pipeline
pipeline |= new_stage

# Pipe pipelines together
combined = pipeline1 | pipeline2
