Installing Python Library (PyArrow)

PyArrow is the Python library for Apache Arrow, providing a Python API for Arrow’s functionality along with tools for integration with pandas, NumPy, and other Python data tools.

Quick Install

Install PyArrow using pip or conda:

pip install pyarrow

conda install -c conda-forge pyarrow

Either command installs the latest stable version of PyArrow as a pre-compiled binary package for your platform.

Supported Platforms

PyArrow provides pre-built binary wheels for:

Linux: x86_64, aarch64 (ARM64)
macOS: Intel (x86_64) and Apple Silicon (ARM64)
Windows: x86_64

Supported Python versions: 3.9, 3.10, 3.11, 3.12, and 3.13
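The lists above can be turned into a quick pre-flight check. The following is a minimal sketch, not an official compatibility test: the helper name `wheel_likely_available` is hypothetical, and the lookup tables are simply copied from this page, with the machine names adjusted to what Python's `platform` module reports on each system.

```python
import platform
import sys

# Wheel targets and Python versions copied from the lists above;
# a snapshot of this page, not an authoritative compatibility matrix.
SUPPORTED_PLATFORMS = {
    "Linux": {"x86_64", "aarch64"},
    "Darwin": {"x86_64", "arm64"},   # macOS reports itself as "Darwin"
    "Windows": {"AMD64", "x86_64"},  # Windows usually reports x86_64 as "AMD64"
}
SUPPORTED_PYTHONS = {(3, 9), (3, 10), (3, 11), (3, 12), (3, 13)}


def wheel_likely_available(system, machine, python_version):
    """Return True if the combination appears in the lists above."""
    platform_ok = machine in SUPPORTED_PLATFORMS.get(system, set())
    python_ok = tuple(python_version[:2]) in SUPPORTED_PYTHONS
    return platform_ok and python_ok


if __name__ == "__main__":
    ok = wheel_likely_available(
        platform.system(), platform.machine(), sys.version_info
    )
    print("Pre-built wheel expected" if ok else "Combination not in the lists above")
```

Running the script on the target machine before installing gives an early hint about whether a binary wheel is likely to be found.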

Installation Methods

Using pip (PyPI)

Install from the Python Package Index (PyPI):

Install PyArrow

pip install pyarrow

This installs the latest stable version.

Install a specific version

# Install a specific version
pip install pyarrow==15.0.0

# Install the latest version in a major series
pip install "pyarrow>=15.0.0,<16.0.0"
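To see which releases a range such as ">=15.0.0,<16.0.0" admits, the comparison can be sketched in plain Python. This is an illustration only: the helper names are hypothetical, the version strings are sample inputs, and the sketch handles plain numeric releases without modeling the pre-release rules that pip applies to real specifiers.

```python
def parse_version(text):
    """Turn a plain release string such as '15.0.2' into (15, 0, 2)."""
    return tuple(int(part) for part in text.split("."))


def in_major_series(version, lower="15.0.0", upper="16.0.0"):
    """True if lower <= version < upper, mirroring '>=15.0.0,<16.0.0'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Sample version strings, used only to exercise the comparison.
for candidate in ["14.0.2", "15.0.0", "15.0.2", "16.0.0"]:
    print(candidate, in_major_series(candidate))
```

The lower bound is inclusive and the upper bound is exclusive, so any 15.x release qualifies while 16.0.0 does not.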

Verify installation

python -c "import pyarrow as pa; print(pa.__version__)"

Windows Users: If you encounter import errors, you may need to install the Visual C++ Redistributable for Visual Studio.

Using conda (conda-forge)

Install from the conda-forge channel:

Install from conda-forge

conda install pyarrow -c conda-forge

The -c conda-forge flag tells conda to install the package from the conda-forge channel.

Create a new environment with PyArrow

conda create -n arrow-env python=3.11 pyarrow -c conda-forge
conda activate arrow-env

Verify installation

python -c "import pyarrow as pa; print(pa.__version__)"

conda-forge provides binary packages across all supported platforms and often includes additional features like S3 support.

Installing Optional Dependencies

PyArrow has optional dependencies for additional functionality:

# Install PyArrow with pandas
pip install pyarrow pandas
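Code that should keep working when an optional package is absent can guard the import. The sketch below shows one common pattern; the helper name `optional_import` is hypothetical and not part of PyArrow.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


pd = optional_import("pandas")
if pd is None:
    print("pandas is missing; conversions such as Table.to_pandas() need it")
else:
    print(f"pandas {pd.__version__} is available")
```

Checking once at startup gives a clearer message than an ImportError raised later in the middle of a conversion.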

Testing Your Installation

Verify PyArrow is working correctly:

Import PyArrow

import pyarrow as pa
print(f"PyArrow version: {pa.__version__}")

Create a simple array

import pyarrow as pa

# Create an Arrow array
arr = pa.array([1, 2, 3, 4, 5])
print(arr)
print(f"Type: {arr.type}")

Test Parquet support

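import pyarrow as pa
import pyarrow.parquet as pq

# Create a simple table
table = pa.table({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Round-trip through an in-memory Parquet buffer to exercise the writer and reader
sink = pa.BufferOutputStream()
pq.write_table(table, sink)
assert pq.read_table(pa.BufferReader(sink.getvalue())).equals(table)
print("Parquet support available!")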

Test pandas integration

import pyarrow as pa
import pandas as pd

# Create a pandas DataFrame
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Convert to Arrow table
table = pa.Table.from_pandas(df)
print(f"Arrow table shape: {table.shape}")

# Convert back to pandas
df2 = table.to_pandas()
print("Pandas integration working!")

Common Use Cases

Reading and Writing Parquet Files

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Paris']
})

# Write to Parquet
table = pa.Table.from_pandas(df)
pq.write_table(table, 'data.parquet')

# Read from Parquet
table2 = pq.read_table('data.parquet')
df2 = table2.to_pandas()
print(df2)

Working with CSV Files

import pyarrow.csv as csv
import pyarrow as pa

# Read CSV file
table = csv.read_csv('data.csv')

# Convert to pandas if needed
df = table.to_pandas()

Memory-Mapped Files for Large Datasets

import pyarrow.parquet as pq

# Open the file memory-mapped; only the footer metadata is read at this point
parquet_file = pq.ParquetFile('large_data.parquet', memory_map=True)

# Read in batches so that only one batch is held in memory at a time
for batch in parquet_file.iter_batches(batch_size=10000):
    # Process batch
    df = batch.to_pandas()
    # Your processing here

Version Compatibility

PyArrow follows semantic versioning. The Arrow IPC format is stable, but API changes may occur between major versions. Always check the changelog when upgrading.
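Since API changes are tied to major versions, an application can check the installed major version at startup. This is a minimal sketch using the standard library; `installed_major` and `EXPECTED_MAJOR` are hypothetical names, and the value 15 is only an example pin.

```python
from importlib import metadata


def installed_major(dist_name="pyarrow"):
    """Return the installed major version as an int, or None if not installed."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])


EXPECTED_MAJOR = 15  # hypothetical pin, used only for illustration

major = installed_major()
if major is None:
    print("pyarrow is not installed")
elif major != EXPECTED_MAJOR:
    print(f"pyarrow major version is {major}, expected {EXPECTED_MAJOR}; review the changelog")
else:
    print("pyarrow major version matches")
```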

Python Version Support

Python 3.9+: Recommended for the latest PyArrow versions
Python 3.8: Supported through PyArrow 17.x
Python 3.7 and earlier: No longer supported

Troubleshooting

ImportError on Windows

If you get an import error on Windows, install the Visual C++ Redistributable:

Download from Microsoft’s official page
Install the x64 version
Restart your Python environment

Module not found error

Ensure PyArrow is installed in the correct Python environment:

# Check which Python you're using (on Windows, use: where python)
which python

# Check if PyArrow is installed for that interpreter
python -m pip show pyarrow

# Reinstall if needed
python -m pip install --force-reinstall pyarrow
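The same checks can be run from inside Python, which helps when an IDE or notebook launches a different interpreter than the shell does. The sketch below uses only the standard library; `describe_environment` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_environment(module_name="pyarrow"):
    """Collect the facts that usually explain a 'module not found' error."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        # True for venv/virtualenv; conda environments are not detected this way.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_found": spec is not None,
        "module_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `interpreter` points somewhere other than the environment where you ran the install command, that mismatch is the likely cause of the error.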

Version conflicts with other packages

If you have conflicts with NumPy or pandas:

# Update all related packages
pip install --upgrade pyarrow pandas numpy

# Or create a fresh environment
conda create -n fresh-env python=3.11 pyarrow pandas numpy -c conda-forge
conda activate fresh-env

Slow import times

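The first import of PyArrow in a process may be slow because the compiled Arrow libraries have to be loaded. This is normal. Later imports in the same process reuse the already loaded module:

import pyarrow as pa  # First import may take a few seconds
# Later imports in the same process reuse the loaded module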
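To measure the import cost on your own machine, a rough timing sketch follows. It times the standard-library `json` module as a stand-in so that it runs anywhere; `time_import` is a hypothetical helper, and the numbers it prints vary by machine.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds spent importing module_name in this process."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# "json" is a stand-in; swap in "pyarrow" to measure PyArrow itself.
first = time_import("json")
second = time_import("json")  # served from sys.modules, nothing is loaded again
print(f"first: {first:.6f}s, second: {second:.6f}s")
```

Run it in a fresh interpreter for a meaningful first measurement, since a module that is already loaded returns almost immediately.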

Building from Source

Building PyArrow from source is intended for development or custom builds. It requires Arrow C++ to be built first, so this is an advanced option and pre-built wheels are recommended for most users.

# Clone the repository
git clone https://github.com/apache/arrow.git
cd arrow/python

# Install build dependencies
pip install -r requirements-build.txt

# Build and install
python setup.py build_ext --build-type=release install

See the Python Development Guide for detailed instructions.

Next Steps

Now that you have PyArrow installed:

Explore the PyArrow Documentation
Learn about Pandas Integration
Try the PyArrow Cookbook
Read about Performance Tips
