Introduction

Masked arrays are arrays that may have missing or invalid entries. The numpy.ma module provides a nearly work-alike replacement for NumPy that supports data arrays with masks. When performing operations on masked arrays, invalid values are automatically suppressed from computations.

What Are Masked Arrays?

A masked array consists of two components:
  • Data array: The underlying NumPy array containing the actual data
  • Mask: A boolean array where True indicates invalid/masked values and False indicates valid values
Masked values are excluded from any computation, allowing you to work with incomplete datasets without propagating invalid values.
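
The two components can be inspected directly through the `data` and `mask` attributes. The following is a minimal sketch; the `readings` values and the `-999.0` placeholder are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical readings; the third entry is treated as invalid.
readings = ma.array([10.5, 11.0, -999.0, 12.5],
                    mask=[False, False, True, False])

underlying = readings.data   # plain ndarray, still holding the raw masked value
mask = readings.mask         # boolean ndarray, True where the entry is masked

print(underlying)
print(mask)
print(readings.sum())        # only the three unmasked entries contribute
```

Note that masking does not erase the underlying value; it only hides it from computations.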

When to Use Masked Arrays

Masked arrays are particularly useful in the following situations:
  • Missing Data: Your dataset has missing values that should be excluded from calculations
  • Invalid Measurements: Sensor readings contain outliers or errors (NaN, inf) that need to be ignored
  • Conditional Analysis: You need to temporarily exclude certain values based on conditions without modifying the original data
  • Data Quality Control: You are processing scientific data where quality flags indicate unreliable measurements
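
As a rough illustration of the conditional and quality-flag cases, masks can be built from a condition with `ma.masked_where` or `ma.masked_greater`. The array names, the flag convention (0 = good, 1 = suspect), and the threshold of 50.0 are assumptions chosen for this sketch.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical sensor values and per-sample quality flags (0 = good, 1 = suspect).
temperature = np.array([21.5, 22.0, 85.0, 21.0, 22.5])
quality_flag = np.array([0, 0, 1, 0, 0])

# Mask every sample whose flag marks it as suspect.
flagged = ma.masked_where(quality_flag != 0, temperature)

# Mask by a condition on the values themselves.
plausible = ma.masked_greater(temperature, 50.0)

print(flagged.mean())      # mean over the four unflagged samples
print(plausible.count())   # number of unmasked entries
print(temperature)         # the original array is left unchanged
```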

Basic Example

Consider an array with NaN values:
import numpy as np
import numpy.ma as ma

# Array with missing data
x = np.array([2, 1, 3, np.nan, 5, 2, 3, np.nan])

# A regular mean propagates NaN
print(np.mean(x))  # Output: nan

# Create masked array
m = ma.masked_array(x, np.isnan(x))
print(repr(m))
# masked_array(data=[2.0, 1.0, 3.0, --, 5.0, 2.0, 3.0, --],
#              mask=[False, False, False,  True, False, False, False,  True],
#        fill_value=1e+20)

# Calculate mean of valid values
print(ma.mean(m))  # Output: 2.6666666666666665
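
A mask for invalid floating-point entries can also be built in one step with `ma.masked_invalid`, which masks infinities as well as NaN. This sketch uses a variant of the array above in which the last entry is `inf` instead of NaN.

```python
import numpy as np
import numpy.ma as ma

x = np.array([2, 1, 3, np.nan, 5, 2, 3, np.inf])

# masked_invalid masks both NaN and +/-inf in a single call.
m = ma.masked_invalid(x)

print(m.mask)
print(m.count())   # number of valid entries
print(m.mean())    # mean of the valid entries only
```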

Key Concepts

The Mask

The mask is a boolean array with the same shape as the data:
  • True: Value is masked (invalid/excluded)
  • False: Value is unmasked (valid/included)
  • nomask: Special value indicating no elements are masked
import numpy.ma as ma

# Create masked array with explicit mask
data = [1, 2, 3, 4, 5]
mask = [False, False, True, False, True]
x = ma.array(data, mask=mask)
print(repr(x))
# masked_array(data=[1, 2, --, 4, --],
#              mask=[False, False,  True, False,  True],
#        fill_value=999999)
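
The `nomask` case is worth a short sketch of its own. When no mask is supplied, the array does not allocate a boolean array at all; `ma.getmask` returns the `nomask` sentinel, while `ma.getmaskarray` always returns a full boolean array. The variable names here are illustrative.

```python
import numpy as np
import numpy.ma as ma

unmasked = ma.array([1, 2, 3])                  # no mask given
print(ma.getmask(unmasked) is ma.nomask)        # sentinel, not a per-element array

# getmaskarray always returns a full boolean array of the data's shape.
full_mask = ma.getmaskarray(unmasked)
print(full_mask)

# Assigning ma.masked to an element masks it and creates a real mask array.
unmasked[1] = ma.masked
print(unmasked.mask)
```

Checking for `nomask` first is a cheap way to skip mask handling when nothing is masked.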

Fill Values

Fill values are used to replace masked values when converting back to a regular array:
import numpy.ma as ma

x = ma.array([1, 2, 3, 4], mask=[0, 0, 1, 0])
print(x.fill_value)  # Default: 999999

# Set custom fill value
x.fill_value = -1
print(x.filled())
# Output: [ 1  2 -1  4]
Default fill values depend on the data type:
Data Type    Default Fill Value
bool         True
int          999999
float        1.e20
complex      1.e20+0j
object       '?'
string       'N/A'
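
The defaults listed above can be queried with `ma.default_fill_value`, and a one-off replacement value can be passed straight to `filled()` without changing the array's stored `fill_value`. A brief sketch:

```python
import numpy as np
import numpy.ma as ma

# Look up the default fill value for a few dtypes.
print(ma.default_fill_value(np.dtype(int)))
print(ma.default_fill_value(np.dtype(float)))
print(ma.default_fill_value(np.dtype(complex)))

x = ma.array([1.0, 2.0, 3.0], mask=[0, 1, 0])

# Passing a value to filled() uses it for this call only.
filled_once = x.filled(0.0)
print(filled_once)
print(x.fill_value)   # the stored fill value is unchanged
```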

Hard vs. Soft Masks

Soft mask (default): Masked values can be unmasked by assigning new values
import numpy.ma as ma

x = ma.array([1, 2, 3], mask=[0, 1, 0])
x[1] = 999  # Unmasks and assigns new value
print(x.mask)  # [False False False]
Hard mask: Once masked, values cannot be unmasked by assignment (unless the mask is softened again with soften_mask())
import numpy.ma as ma

x = ma.array([1, 2, 3], mask=[0, 1, 0], hard_mask=True)
x[1] = 999  # Assignment is ignored; value remains masked
print(x.mask)  # [False  True False]
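
An existing array can be switched between the two behaviours with the `harden_mask()` and `soften_mask()` methods. The sketch below records the mask after each assignment; the intermediate variable names are illustrative.

```python
import numpy.ma as ma

x = ma.array([1, 2, 3], mask=[0, 1, 0])

# With a hard mask, assigning to a masked element leaves it masked.
x.harden_mask()
x[1] = 999
hard_result = x.mask.tolist()

# After softening, the same assignment unmasks the element.
x.soften_mask()
x[1] = 999
soft_result = x.mask.tolist()

# Elements can be masked explicitly by assigning ma.masked.
x[0] = ma.masked

print(hard_result)
print(soft_result)
print(x.mask)
```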

Common Operations

Masked arrays support most NumPy operations:
import numpy as np
import numpy.ma as ma

# Create masked arrays
x = ma.array([1, 2, 3, 4, 5], mask=[0, 0, 1, 0, 0])
y = ma.array([10, 20, 30, 40, 50], mask=[0, 1, 0, 0, 0])

# Arithmetic operations
print(repr(x + y))
# masked_array(data=[11, --, --, 44, 55],
#              mask=[False,  True,  True, False, False],
#        fill_value=999999)

# Aggregation functions
print(x.sum())   # 12 (excludes masked value)
print(x.mean())  # 3.0
print(x.std())   # 1.581... (population standard deviation of 1, 2, 4, 5)

# Comparison operations
print(repr(x > 2))
# masked_array(data=[False, False, --, True, True],
#              mask=[False, False,  True, False, False],
#        fill_value=True)
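
To hand the valid values to code that expects a plain array, `count()` and `compressed()` are useful companions to the aggregations above. As a sanity check, this sketch also compares the masked standard deviation with `np.std` applied to the compressed values.

```python
import numpy as np
import numpy.ma as ma

x = ma.array([1, 2, 3, 4, 5], mask=[0, 0, 1, 0, 0])

valid_count = x.count()          # number of unmasked entries
valid_values = x.compressed()    # 1-D ndarray holding only the unmasked entries

# The masked aggregate should agree with the plain one on the valid values.
expected_std = np.std(valid_values)

print(valid_count)
print(valid_values)
print(np.isclose(x.std(), expected_std))
```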

Performance Considerations

Masked arrays have some overhead compared to regular NumPy arrays:
  • Additional memory for the mask array
  • Mask checks during operations
  • More complex indexing and broadcasting
For large-scale numerical computations where performance is critical, consider:
  • Using NaN for missing values in float arrays
  • Filtering data before computation
  • Using specialized libraries like pandas
That said, masked arrays keep the data and its validity information together in a single object, which is often the more convenient interface when missing values must be tracked through several operations.
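
For comparison, the first two alternatives can be sketched next to the masked-array version. The sample values are made up, and no claim is made here about relative speed, which depends on array size and the operation involved.

```python
import numpy as np
import numpy.ma as ma

values = np.array([2.0, 1.0, 3.0, np.nan, 5.0])

# Alternative 1: keep NaN as the missing-value marker and use a NaN-aware function.
nan_aware = np.nanmean(values)

# Alternative 2: filter out invalid entries before computing.
filtered = values[~np.isnan(values)].mean()

# Masked-array version of the same computation.
masked = ma.masked_invalid(values).mean()

print(nan_aware, filtered, masked)
```

All three approaches give the same mean here; they differ in whether the validity information travels with the data afterwards.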
