PyArrow is the Python library for Apache Arrow, providing a Python API for Arrow’s functionality along with tools for integration with pandas, NumPy, and other Python data tools.

Quick Install

Install PyArrow using pip or conda:
pip install pyarrow
conda install pyarrow -c conda-forge
Either command installs the latest stable version of PyArrow as a pre-compiled binary for your platform, so no compiler is required.

Supported Platforms

PyArrow provides pre-built binary wheels for:
  • Linux: x86_64, aarch64 (ARM64)
  • macOS: Intel (x86_64) and Apple Silicon (ARM64)
  • Windows: x86_64
Supported Python versions: 3.9, 3.10, 3.11, 3.12, and 3.13
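
To see how your own machine lines up with the lists above, a small standard-library check can help. The sketch below hard-codes the platforms and Python versions named in this section; the constant names and the `wheel_likely_available` helper are illustrative and not part of PyArrow.

```python
import platform
import sys

# Values copied from the lists above; adjust them if the supported set changes.
SUPPORTED_PYTHONS = {(3, 9), (3, 10), (3, 11), (3, 12), (3, 13)}
SUPPORTED_PLATFORMS = {
    ("Linux", "x86_64"),
    ("Linux", "aarch64"),
    ("Darwin", "x86_64"),
    ("Darwin", "arm64"),
    ("Windows", "AMD64"),  # Windows reports x86_64 machines as AMD64
}


def wheel_likely_available(system, machine, python_version):
    """Return True if the (OS, CPU, Python) combination appears in the lists above."""
    platform_ok = (system, machine) in SUPPORTED_PLATFORMS
    python_ok = tuple(python_version[:2]) in SUPPORTED_PYTHONS
    return platform_ok and python_ok


# Report on the interpreter that is running this script.
ok = wheel_likely_available(platform.system(), platform.machine(), sys.version_info)
print(f"{platform.system()} {platform.machine()}, Python {sys.version_info[0]}.{sys.version_info[1]}")
print("Pre-built wheel expected" if ok else "No pre-built wheel listed for this combination")
```

A negative result does not prove that installation will fail; it only means the combination is not in the lists on this page.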

Installation Methods

Using pip (PyPI)

Install from the Python Package Index (PyPI):
1. Install PyArrow

pip install pyarrow
This installs the latest stable version.
2. Install a specific version

# Install a specific version
pip install pyarrow==15.0.0

# Install the latest version in a major series
pip install "pyarrow>=15.0.0,<16.0.0"
3. Verify installation

python -c "import pyarrow as pa; print(pa.__version__)"
Windows Users: If you encounter import errors, you may need to install the Visual C++ Redistributable for Visual Studio.

Using conda (conda-forge)

Install from the conda-forge channel:
1. Install from conda-forge

conda install pyarrow -c conda-forge
The -c conda-forge flag tells conda to install the package from the conda-forge channel, which typically carries the most recent PyArrow release.
2. Create a new environment with PyArrow

conda create -n arrow-env python=3.11 pyarrow -c conda-forge
conda activate arrow-env
3. Verify installation

python -c "import pyarrow as pa; print(pa.__version__)"
conda-forge provides binary packages across all supported platforms and often includes additional features like S3 support.
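
Whether a given build includes S3 support can be checked from Python. The sketch below assumes that importing `S3FileSystem` from `pyarrow.fs` raises `ImportError` when the feature is not compiled in; the `has_s3_support` helper is a hypothetical name used only for illustration.

```python
def has_s3_support():
    """Return True if this PyArrow build exposes S3FileSystem."""
    try:
        from pyarrow.fs import S3FileSystem  # noqa: F401
    except ImportError:
        # Raised when PyArrow is missing or was built without S3 support.
        return False
    return True


print(f"S3 support available: {has_s3_support()}")
```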

Installing Optional Dependencies

PyArrow has optional dependencies for additional functionality:
# Install PyArrow with pandas
pip install pyarrow pandas
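
Before relying on an optional integration, it can be useful to confirm that the companion package is present in the same environment. This is a minimal sketch using only the standard library; the package list and the `installed_packages` helper are assumptions chosen for illustration, not an official list of PyArrow's optional dependencies.

```python
from importlib.util import find_spec

# Hypothetical list: packages this page pairs with PyArrow.
OPTIONAL_PACKAGES = ("pandas", "numpy")


def installed_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


for name, present in installed_packages(OPTIONAL_PACKAGES).items():
    print(f"{name}: {'installed' if present else 'missing'}")
```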

Testing Your Installation

Verify PyArrow is working correctly:
1. Import PyArrow

import pyarrow as pa
print(f"PyArrow version: {pa.__version__}")
2. Create a simple array

import pyarrow as pa

# Create an Arrow array
arr = pa.array([1, 2, 3, 4, 5])
print(arr)
print(f"Type: {arr.type}")
3. Test Parquet support

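import pyarrow as pa
import pyarrow.parquet as pq

# Create a simple table
table = pa.table({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Round-trip the table through an in-memory Parquet buffer
sink = pa.BufferOutputStream()
pq.write_table(table, sink)
assert pq.read_table(pa.BufferReader(sink.getvalue())).equals(table)
print("Parquet support available!")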
4. Test pandas integration

import pyarrow as pa
import pandas as pd

# Create a pandas DataFrame
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Convert to Arrow table
table = pa.Table.from_pandas(df)
print(f"Arrow table shape: {table.shape}")

# Convert back to pandas
df2 = table.to_pandas()
print("Pandas integration working!")

Common Use Cases

Reading and Writing Parquet Files

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Paris']
})

# Write to Parquet
table = pa.Table.from_pandas(df)
pq.write_table(table, 'data.parquet')

# Read from Parquet
table2 = pq.read_table('data.parquet')
df2 = table2.to_pandas()
print(df2)

Working with CSV Files

import pyarrow.csv as csv
import pyarrow as pa

# Read CSV file
table = csv.read_csv('data.csv')

# Convert to pandas if needed
df = table.to_pandas()

Memory-Mapped Files for Large Datasets

import pyarrow as pa
import pyarrow.parquet as pq

# Open the file with memory mapping; only the metadata is read at this point
parquet_file = pq.ParquetFile('large_data.parquet', memory_map=True)

# Read in batches
for batch in parquet_file.iter_batches(batch_size=10000):
    # Process batch
    df = batch.to_pandas()
    # Your processing here

Version Compatibility

PyArrow follows semantic versioning. The Arrow IPC format is stable, but API changes may occur between major versions. Always check the changelog when upgrading.
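
Since API changes are tied to major versions, code can guard against running on an untested series. This is a minimal sketch: `TESTED_MAJOR` and the `major_version` helper are hypothetical names, and the value 15 is only an example.

```python
def major_version(version):
    """Return the leading integer of a version string such as '15.0.2'."""
    return int(version.split(".")[0])


# Hypothetical: the major series your own code was validated against.
TESTED_MAJOR = 15

try:
    import pyarrow as pa
except ImportError:
    print("PyArrow is not installed")
else:
    installed_major = major_version(pa.__version__)
    if installed_major != TESTED_MAJOR:
        print(f"PyArrow {pa.__version__} differs from tested series {TESTED_MAJOR}.x; check the changelog")
    else:
        print(f"PyArrow {pa.__version__} matches the tested series")
```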

Python Version Support

  • Python 3.9+: Recommended for the latest PyArrow versions
  • Python 3.8: Supported through PyArrow 17.x
  • Python 3.7 and earlier: No longer supported

Troubleshooting

If you get an import error on Windows, install the Visual C++ Redistributable:
  1. Download from Microsoft’s official page
  2. Install the x64 version
  3. Restart your Python environment
Ensure PyArrow is installed in the correct Python environment:
# Check which Python you're using (on Windows: where python)
which python

# Check if PyArrow is installed for that interpreter (works on every platform)
python -m pip show pyarrow

# Reinstall if needed
pip install --force-reinstall pyarrow
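
The same environment check can be run from inside Python, which sidesteps shell differences between platforms. The sketch below uses only the standard library; the `installed_version` helper is an illustrative name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# The interpreter that is actually running, which is what matters for imports.
print(f"Interpreter: {sys.executable}")

found = installed_version("pyarrow")
print(f"pyarrow: {found if found is not None else 'not installed in this environment'}")
```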
If you have conflicts with NumPy or pandas:
# Update all related packages
pip install --upgrade pyarrow pandas numpy

# Or create a fresh environment
conda create -n fresh-env python=3.11 pyarrow pandas numpy -c conda-forge
conda activate fresh-env
First import of PyArrow may be slow. This is normal. Subsequent imports are faster:
import pyarrow as pa  # First import may take a few seconds
# Subsequent operations are fast
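
To measure the difference on your own machine, the import can be timed directly. This is a rough sketch: `time_import` is a hypothetical helper, and the first measurement is only meaningful in a fresh interpreter where PyArrow has not been imported yet.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds spent in importlib.import_module for module_name."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


try:
    first = time_import("pyarrow")   # loads the compiled extension modules
    second = time_import("pyarrow")  # served from the sys.modules cache
except ImportError:
    print("PyArrow is not installed")
else:
    print(f"First import: {first:.3f}s, repeated import: {second:.6f}s")
```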

Building from Source

For development or custom builds:
Building PyArrow from source requires Arrow C++ to be built first. This is an advanced option and pre-built wheels are recommended for most users.
# Clone the repository
git clone https://github.com/apache/arrow.git
cd arrow/python

# Install build dependencies
pip install -r requirements-build.txt

# Build and install
python setup.py build_ext --build-type=release install
See the Python Development Guide for detailed instructions.
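
After a source build it is worth confirming that Python is importing the build you just made rather than a previously installed wheel. The sketch below prints where the package was loaded from together with the Python and C++ library versions; `describe_build` is an illustrative helper, not a PyArrow function.

```python
def describe_build():
    """Return basic facts about the imported PyArrow, or None if it is missing."""
    try:
        import pyarrow as pa
    except ImportError:
        return None
    return {
        "location": pa.__file__,         # path of the imported package
        "version": pa.__version__,       # Python package version
        "cpp_version": pa.cpp_version,   # version of the linked Arrow C++ library
    }


info = describe_build()
if info is None:
    print("PyArrow is not importable from this environment")
else:
    for key, value in info.items():
        print(f"{key}: {value}")
```

If `location` points into `site-packages` of another environment instead of your checkout, the source build is probably not the one being imported.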

Next Steps

Now that you have PyArrow installed:
