Masked Arrays Overview

Introduction

Masked arrays are arrays that may have missing or invalid entries. The numpy.ma module provides a nearly work-alike replacement for NumPy that supports data arrays with masks. When performing operations on masked arrays, invalid values are automatically suppressed from computations.

What Are Masked Arrays?

A masked array consists of two components:

Data array: The underlying NumPy array containing the actual data
Mask: A boolean array where True indicates invalid/masked values and False indicates valid values

Masked values are excluded from any computation, allowing you to work with incomplete datasets without propagating invalid values.
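As a minimal sketch of the two components described above, the data and the mask can be inspected through the `data` and `mask` attributes. The variable names and sample values here are illustrative assumptions.

```python
import numpy as np
import numpy.ma as ma

# Illustrative values; the second entry is treated as invalid.
values = np.array([10.0, 20.0, 30.0, 40.0])
m = ma.masked_array(values, mask=[False, True, False, False])

# The two components are exposed as attributes.
data_part = m.data   # plain ndarray, still holding the value hidden by the mask
mask_part = m.mask   # boolean ndarray, True where the entry is masked

# Reductions skip the masked entry: 10 + 30 + 40.
total = m.sum()
print(data_part, mask_part, total)
```

Note that masking hides a value from computations but does not erase it from the underlying data array.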

When to Use Masked Arrays

Masked arrays are particularly useful when:

Missing Data

Your dataset has missing values that should be excluded from calculations

Invalid Measurements

Sensor readings contain outliers or errors (NaN, inf) that need to be ignored
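One way to handle this case is `ma.masked_invalid`, which masks NaN and infinite entries in one call. The readings below are made-up sample values, not real sensor data.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical sensor readings with one NaN and one inf.
readings = np.array([21.5, np.nan, 22.0, np.inf, 21.0])

# masked_invalid masks every NaN and +/-inf entry.
clean = ma.masked_invalid(readings)

n_valid = clean.count()     # number of unmasked entries
mean_valid = clean.mean()   # mean over 21.5, 22.0 and 21.0
print(n_valid, mean_valid)
```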

Conditional Analysis

You need to temporarily exclude certain values based on conditions without modifying the original data
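A sketch of condition-based masking with `ma.masked_where` follows. The plausibility range of -40 to 60 is an arbitrary assumption chosen for the example.

```python
import numpy as np
import numpy.ma as ma

temps = np.array([12.0, 15.0, 99.0, 14.0, -50.0])

# Mask values outside an assumed plausible range; masked_where copies by default.
out_of_range = (temps < -40.0) | (temps > 60.0)
plausible = ma.masked_where(out_of_range, temps)

mean_plausible = plausible.mean()   # mean over 12, 15 and 14
print(mean_plausible)
print(temps)                        # the original array is unchanged
```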

Data Quality Control

Processing scientific data where quality flags indicate unreliable measurements
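When quality flags live in a separate array, the mask can be derived from them directly. The flag convention below (0 means good, anything else means suspect) is a hypothetical one for illustration.

```python
import numpy as np
import numpy.ma as ma

measurements = np.array([0.91, 0.87, 0.12, 0.95])
# Hypothetical convention: 0 = good, nonzero = suspect.
quality_flags = np.array([0, 0, 2, 0])

# Mask every measurement whose flag is nonzero.
good = ma.masked_array(measurements, mask=(quality_flags != 0))

n_good = good.count()
lowest_good = good.min()   # ignores the flagged 0.12
print(n_good, lowest_good)
```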

Basic Example

Consider an array with NaN values:

import numpy as np
import numpy.ma as ma

# Array with missing data
x = np.array([2, 1, 3, np.nan, 5, 2, 3, np.nan])

# A regular mean does not fail, but it propagates NaN
print(np.mean(x))  # Output: nan

# Create masked array
m = ma.masked_array(x, np.isnan(x))
print(repr(m))
# masked_array(data=[2.0, 1.0, 3.0, --, 5.0, 2.0, 3.0, --],
#              mask=[False, False, False,  True, False, False, False,  True],
#        fill_value=1e+20)

# Calculate mean of valid values
print(ma.mean(m))  # Output: 2.6666666666666665

Key Concepts

The Mask

The mask is a boolean array with the same shape as the data:

True: Value is masked (invalid/excluded)
False: Value is unmasked (valid/included)
nomask: Special value indicating no elements are masked
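The sketch below shows how `nomask` might be detected, and how to obtain a full boolean mask regardless. It uses `ma.getmask`, `ma.getmaskarray`, and `ma.is_masked`; the variable names are illustrative.

```python
import numpy.ma as ma

# No mask supplied, so the array carries the nomask sentinel.
x = ma.array([1, 2, 3])
has_no_mask = ma.getmask(x) is ma.nomask

# getmaskarray always returns a full boolean array of the data's shape.
full_mask = ma.getmaskarray(x)

# An array with at least one masked entry.
y = ma.array([1, 2, 3], mask=[0, 1, 0])
any_masked = ma.is_masked(y)
print(has_no_mask, full_mask, any_masked)
```

Prefer `ma.getmaskarray` when later code needs to index the mask element by element, since `nomask` itself is a scalar.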

import numpy.ma as ma

# Create masked array with explicit mask
data = [1, 2, 3, 4, 5]
mask = [False, False, True, False, True]
x = ma.array(data, mask=mask)
print(repr(x))
# masked_array(data=[1, 2, --, 4, --],
#              mask=[False, False,  True, False,  True],
#        fill_value=999999)

Fill Values

Fill values are used to replace masked values when converting back to a regular array:

import numpy.ma as ma

x = ma.array([1, 2, 3, 4], mask=[0, 0, 1, 0])
print(x.fill_value)  # Default: 999999

# Set custom fill value
x.fill_value = -1
print(x.filled())
# Output: [ 1  2 -1  4]

Default fill values depend on the data type:

| Data Type | Default Fill Value |
| --- | --- |
| `bool` | `True` |
| `int` | `999999` |
| `float` | `1.e20` |
| `complex` | `1.e20+0j` |
| `object` | `'?'` |
| `string` | `'N/A'` |
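The defaults in the table can be queried with `ma.default_fill_value`. The sketch below also shows passing a one-off fill value to `filled()` without changing the array's own `fill_value`. The chosen dtypes and sample values are illustrative.

```python
import numpy as np
import numpy.ma as ma

# Look up the default fill value for a few dtypes.
defaults = {
    np.dtype(t).name: ma.default_fill_value(np.dtype(t))
    for t in (np.bool_, np.int64, np.float64, np.complex128)
}
for name, value in defaults.items():
    print(name, value)

# A fill value passed to filled() applies to that call only.
x = ma.array([1.5, 2.5, 3.5], mask=[0, 1, 0])
zeros_filled = x.filled(0.0)
print(zeros_filled, x.fill_value)
```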

Hard vs. Soft Masks

Soft mask (default): Masked values can be unmasked by assigning new values

import numpy.ma as ma

x = ma.array([1, 2, 3], mask=[0, 1, 0])
x[1] = 999  # Unmasks and assigns new value
print(x.mask)  # [False False False]

Hard mask: Once masked, values cannot be unmasked

import numpy.ma as ma

x = ma.array([1, 2, 3], mask=[0, 1, 0], hard_mask=True)
x[1] = 999  # Value remains masked
print(x.mask)  # [False  True False]
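An existing array can also be switched between the two behaviours with the `harden_mask()` and `soften_mask()` methods. The sketch below also masks an entry explicitly by assigning the `ma.masked` constant.

```python
import numpy.ma as ma

x = ma.array([1, 2, 3], mask=[0, 1, 0])

# With a hard mask, assignment to a masked entry leaves it masked.
x.harden_mask()
x[1] = 999
mask_while_hard = x.mask.tolist()

# After softening, the same assignment unmasks the entry.
x.soften_mask()
x[1] = 999
mask_after_soften = x.mask.tolist()

# Assigning ma.masked masks an entry explicitly.
x[0] = ma.masked
final_mask = x.mask.tolist()
print(mask_while_hard, mask_after_soften, final_mask)
```

A hard mask can be a reasonable safeguard when masked positions mark data that must never be reintroduced by accident.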

Common Operations

Masked arrays support most NumPy operations:

import numpy as np
import numpy.ma as ma

# Create masked arrays
x = ma.array([1, 2, 3, 4, 5], mask=[0, 0, 1, 0, 0])
y = ma.array([10, 20, 30, 40, 50], mask=[0, 1, 0, 0, 0])

# Arithmetic operations
print(repr(x + y))
# masked_array(data=[11, --, --, 44, 55],
#              mask=[False,  True,  True, False, False],
#        fill_value=999999)

# Aggregation functions
print(x.sum())   # 12 (excludes masked value)
print(x.mean())  # 3.0
print(x.std())   # 1.581... (population std of 1, 2, 4, 5)

# Comparison operations
print(repr(x > 2))
# masked_array(data=[False, False, --, True, True],
#              mask=[False, False,  True, False, False],
#        fill_value=True)
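To see which entries an aggregation actually used, `count()` reports the number of unmasked values and `compressed()` returns them as a plain array. This sketch reuses the same `x` as above and checks that the masked standard deviation matches the one computed on the compressed values.

```python
import numpy as np
import numpy.ma as ma

x = ma.array([1, 2, 3, 4, 5], mask=[0, 0, 1, 0, 0])

n_valid = x.count()            # number of unmasked entries
valid_only = x.compressed()    # 1-D ndarray of the unmasked values

# Aggregating the masked array agrees with aggregating the valid values.
same_std = np.isclose(x.std(), valid_only.std())
print(n_valid, valid_only, same_std)
```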

Performance Considerations

Masked arrays have some overhead compared to regular NumPy arrays:

Additional memory for the mask array
Mask checks during operations
More complex indexing and broadcasting

For large-scale numerical computations where performance is critical, consider:

Using NaN for missing values in float arrays, together with NaN-aware functions such as np.nanmean
Filtering data before computation
Using specialized libraries like pandas
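The first two alternatives can be compared with the masked-array approach on a small float array. This is only a sketch of the equivalence, not a benchmark, and the sample values are illustrative.

```python
import numpy as np
import numpy.ma as ma

x = np.array([2.0, 1.0, 3.0, np.nan, 5.0])

# Alternative 1: keep NaN as the missing-value marker, use a NaN-aware function.
mean_nan_aware = np.nanmean(x)

# Alternative 2: filter out invalid entries before computing.
mean_filtered = x[~np.isnan(x)].mean()

# Masked-array approach, for comparison.
mean_masked = ma.masked_invalid(x).mean()

print(mean_nan_aware, mean_filtered, mean_masked)
```

All three give the same mean here. The NaN-based options apply only to floating-point data, since integer arrays cannot hold NaN.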

Masked arrays remain a convenient choice when the mask itself carries meaning, such as quality flags, or when the data is not floating point and so cannot hold NaN.
