
Array

Base class for all Arrow array types.
import pyarrow as pa

arr = pa.array([1, 2, 3, 4, 5])
print(arr)

array()

Create a pyarrow.Array instance from a Python object.
pa.array(obj, type=None, mask=None, size=None, from_pandas=None, safe=True, memory_pool=None)
obj
sequence, iterable, ndarray, pandas.Series
Data to convert. May be a single-use iterable if both type and size are specified. If the input is not strongly typed, the Arrow type is inferred for the resulting array. Any Arrow-compatible array that implements the Arrow PyCapsule Protocol can also be passed.
type
pyarrow.DataType
default:"None"
Explicit type to attempt to coerce to, otherwise will be inferred from the data.
mask
array[bool]
default:"None"
Indicate which values are null (True) or not null (False).
size
int64
default:"None"
Number of elements to convert. If the input is longer than size, conversion stops at this length.
from_pandas
bool
default:"None"
Use pandas’s semantics for inferring nulls from values in ndarray-like data. Defaults to False if not passed explicitly, or True if a pandas object is passed in.
safe
bool
default:"True"
Check for overflows or other unsafe conversions.
memory_pool
pyarrow.MemoryPool
default:"None"
If not passed, will allocate memory from the currently-set default memory pool.
array
pyarrow.Array or pyarrow.ChunkedArray
A ChunkedArray instead of an Array is returned if the object data overflowed binary storage or the object’s __arrow_array__ protocol method returned a chunked array.

Properties

type

Return the data type of the array.
arr = pa.array([1, 2, 3])
print(arr.type)  # int64
type
pyarrow.DataType
The Arrow data type of this array.

null_count

Return the number of null values in the array.
arr = pa.array([1, None, 3])
print(arr.null_count)  # 1
null_count
int
The count of null values.

Methods

slice()

Compute zero-copy slice of this array.
array.slice(offset=0, length=None)
offset
int
default:"0"
Offset from start of array to slice.
length
int
default:"None"
Length of slice (default is until end of array from offset).
sliced
pyarrow.Array
A zero-copy slice of the array.

cast()

Cast array values to another data type.
array.cast(target_type, safe=True)
target_type
pyarrow.DataType
Type to cast to.
safe
bool
default:"True"
Check for overflows or other unsafe conversions.
casted
pyarrow.Array
Array with values cast to the target type.

to_pylist()

Convert to a Python list.
arr = pa.array([1, 2, 3])
py_list = arr.to_pylist()
print(py_list)  # [1, 2, 3]
list
list
A Python list with the array values.

to_numpy()

Convert to a NumPy array.
array.to_numpy(zero_copy_only=True, writable=False)
zero_copy_only
bool
default:"True"
If True, raise an exception if conversion requires copying data.
writable
bool
default:"False"
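If True, return a writable NumPy array. Arrays created with zero copy are read-only views on immutable Arrow data, so a writable result requires a copy and cannot be combined with zero_copy_only=True.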
array
numpy.ndarray
A NumPy array with the data.

ChunkedArray

An array-like composed from a collection of pyarrow.Arrays.
import pyarrow as pa

chunked = pa.chunked_array([[1, 2], [3, 4, 5]])
print(chunked)
print(f"Number of chunks: {chunked.num_chunks}")

chunked_array()

Construct a ChunkedArray from a list of arrays.
pa.chunked_array(arrays, type=None)
arrays
list of Array
List of arrays to compose into a ChunkedArray.
type
pyarrow.DataType
default:"None"
Data type of the resulting chunks. Python sequences in arrays are converted to this type. Required when arrays is empty.
chunked_array
pyarrow.ChunkedArray
A ChunkedArray composed of the input arrays.

Properties

num_chunks

Number of underlying chunks.
chunked = pa.chunked_array([[1, 2], [3, 4]])
print(chunked.num_chunks)  # 2
num_chunks
int
The number of chunks.

chunks

List of chunks.
chunked = pa.chunked_array([[1, 2], [3, 4]])
for chunk in chunked.chunks:
    print(chunk)
chunks
list of Array
The underlying chunks as a list.

Methods

chunk()

Select a chunk by its index.
chunked_array.chunk(i)
i
int
Index of the chunk to select.
chunk
pyarrow.Array
The selected chunk.

Typed Array Classes

NumericArray

Base class for all numeric array types (integers and floats).

IntegerArray

Base class for all integer array types.

Int8Array, Int16Array, Int32Array, Int64Array

Signed integer arrays of 8, 16, 32, and 64 bits respectively.
arr = pa.array([1, 2, 3], type=pa.int32())
print(type(arr).__name__)  # Int32Array

UInt8Array, UInt16Array, UInt32Array, UInt64Array

Unsigned integer arrays of 8, 16, 32, and 64 bits respectively.

FloatingPointArray

Base class for floating point array types.

HalfFloatArray, FloatArray, DoubleArray

Floating point arrays for 16-bit (half), 32-bit (float), and 64-bit (double) precision.
arr = pa.array([1.0, 2.5, 3.7], type=pa.float32())
print(type(arr).__name__)  # FloatArray

BooleanArray

Boolean (true/false) array.
arr = pa.array([True, False, True])
print(type(arr).__name__)  # BooleanArray

StringArray

Variable-length UTF-8 string array.
arr = pa.array(["hello", "world", "arrow"])
print(type(arr).__name__)  # StringArray

BinaryArray

Variable-length binary array.
arr = pa.array([b"hello", b"world"], type=pa.binary())
print(type(arr).__name__)  # BinaryArray

ListArray

Array of variable-length lists.
arr = pa.array([[1, 2], [3, 4, 5], [6]])
print(type(arr).__name__)  # ListArray

StructArray

Array of structured (named fields) values.
struct_type = pa.struct([('x', pa.int64()), ('y', pa.float64())])
arr = pa.array([{'x': 1, 'y': 1.5}, {'x': 2, 'y': 2.5}], type=struct_type)
print(type(arr).__name__)  # StructArray

DictionaryArray

Array with dictionary encoding (categorical data).
indices = pa.array([0, 1, 0, 1, 2])
dictionary = pa.array(['cat', 'dog', 'bird'])
dict_array = pa.DictionaryArray.from_arrays(indices, dictionary)
print(dict_array)

TimestampArray

Array of timestamp values with a unit and an optional time zone.
from datetime import datetime
arr = pa.array([datetime(2020, 1, 1), datetime(2021, 1, 1)])
print(type(arr).__name__)  # TimestampArray

Date32Array, Date64Array

Date arrays stored as 32-bit integers (days since the UNIX epoch) or 64-bit integers (milliseconds since the UNIX epoch).

Time32Array, Time64Array

Time-of-day arrays with 32-bit storage (second or millisecond units) or 64-bit storage (microsecond or nanosecond units).

DurationArray

Array of duration (time interval) values.

Utility Functions

nulls()

Create an array of all null values.
pa.nulls(size, type=None, memory_pool=None)
size
int
Number of null values.
type
pyarrow.DataType
default:"None"
Data type (defaults to null type).
memory_pool
pyarrow.MemoryPool
default:"None"
Memory pool for allocation.
array
pyarrow.Array
Array of null values.

repeat()

Create an array by repeating a value.
pa.repeat(value, size, memory_pool=None)
value
scalar-like
Value to repeat.
size
int
Number of times to repeat the value.
memory_pool
pyarrow.MemoryPool
default:"None"
Memory pool for allocation.
array
pyarrow.Array
Array with repeated value.

concat_arrays()

Concatenate multiple arrays into a single array.
pa.concat_arrays(arrays, memory_pool=None)
arrays
list of Array
Arrays to concatenate.
memory_pool
pyarrow.MemoryPool
default:"None"
Memory pool for allocation.
array
pyarrow.Array
Concatenated array.
