NumPack is a high-performance array storage library that combines Rust's performance with Python's ease of use. It provides exceptional performance for both reading and writing large NumPy arrays, with special optimizations for in-place modifications.
- 321x faster row replacement than NPY
- 224x faster data append than NPY
- 45x faster lazy loading than NPY mmap
- 1.58x faster full data loading than NPY
- Random access: 1K indices 2.4x slower, 10K indices 16.4x slower than NPY
- 20.4x speedup with Batch Mode for frequent modifications
- 94.8x speedup with Writable Batch Mode
- Zero-copy operations with minimal memory footprint
- Seamless integration with existing NumPy workflows

Adaptive Buffer Sizing:
- Small arrays (<1MB): 256KB buffer → 96% memory saving
- Medium arrays (1-10MB): 4MB buffer → balanced performance
- Large arrays (>10MB): 16MB buffer → maximum throughput

Smart Parallelization:
- Automatically parallelizes only when beneficial (>10MB total data)
- Avoids thread overhead for small datasets
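
The size thresholds above can be read as a small policy function. The following is only a sketch of that policy, not NumPack's actual Rust implementation: the names `choose_buffer_size` and `should_parallelize` are hypothetical, 1MB is assumed to mean 1024 × 1024 bytes, and the handling of the exact boundary values (1MB and 10MB) is a guess.

```python
KB = 1024
MB = 1024 * KB  # assumption: binary megabytes


def choose_buffer_size(array_nbytes: int) -> int:
    """Pick an I/O buffer size (in bytes) from the array size (illustrative only)."""
    if array_nbytes < 1 * MB:
        return 256 * KB   # small arrays: keep memory use low
    if array_nbytes <= 10 * MB:
        return 4 * MB     # medium arrays: balanced
    return 16 * MB        # large arrays: favor throughput


def should_parallelize(total_nbytes: int) -> bool:
    """Use multiple threads only above the 10MB total-data threshold."""
    return total_nbytes > 10 * MB


for nbytes in (100 * KB, 5 * MB, 50 * MB):
    print(nbytes, choose_buffer_size(nbytes), should_parallelize(nbytes))
```

Keeping both decisions as pure functions of the byte count makes the thresholds easy to test and tune in isolation.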

Fast Overwrite Path:
- Same-shape array overwrite: 1.5-2.5x faster
- Uses in-place update instead of file recreation

SIMD Acceleration:
- Large files (>10MB) use SIMD-optimized operations
- Theoretical 2-4x speedup for memory-intensive operations

Batch Mode Intelligence:
- Smart dirty tracking: only flushes modified arrays
- Zero-copy cache detection
- Reduced metadata synchronization
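
As a rough illustration of what "only flushes modified arrays" means, here is a toy write-back cache. It is a sketch under stated assumptions, not NumPack's implementation: the class `DirtyTrackingCache`, its methods, and the dict standing in for on-disk storage are all hypothetical.

```python
class DirtyTrackingCache:
    """Toy write-back cache: only entries marked dirty are written on flush."""

    def __init__(self, store):
        self.store = store    # dict standing in for on-disk storage
        self.cache = {}       # in-memory copies
        self.dirty = set()    # names modified since the last flush
        self.writes = 0       # number of simulated disk writes

    def load(self, name):
        # Read from the backing store only on the first access.
        if name not in self.cache:
            self.cache[name] = list(self.store[name])
        return self.cache[name]

    def save(self, name, values):
        # Saving only updates the cache and marks the entry dirty.
        self.cache[name] = values
        self.dirty.add(name)

    def flush(self):
        # Write back dirty entries only; clean entries are skipped.
        for name in sorted(self.dirty):
            self.store[name] = list(self.cache[name])
            self.writes += 1
        self.dirty.clear()


store = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
cache = DirtyTrackingCache(store)
cache.load("b")                 # read-only access, stays clean
for _ in range(100):
    arr = cache.load("a")
    arr[0] += 1.0
    cache.save("a", arr)        # 100 saves, all absorbed by the cache
cache.flush()
print(cache.writes, store)
```

In this sketch, one hundred saves to the same array collapse into a single write at flush time, and the array that was only read is never written back.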

Overall impact:
- Replace operations now 321x faster than NPY
- Full Load now 1.58x faster than NPY
- System-wide optimizations benefit all operation modes

Features:
- High Performance: Optimized for both reading and writing large numerical arrays
- Lazy Loading Support: Efficient memory usage through on-demand data loading
- In-place Operations: Support for in-place array modifications without full file rewrite
- Batch Processing Modes:
  - Batch Mode: 20.4x speedup for batch operations
  - Writable Batch Mode: 94.8x speedup for frequent modifications
- Multiple Data Types: Supports various numerical data types, including:
  - Boolean
  - Unsigned integers (8-bit to 64-bit)
  - Signed integers (8-bit to 64-bit)
  - Floating point (16-bit, 32-bit and 64-bit)
  - Complex numbers (64-bit and 128-bit)
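
To make the lazy loading idea above concrete, the sketch below reads only a requested row range from a file instead of loading everything. It assumes a headerless, row-major float32 file, which is not NumPack's on-disk format, and the helper `read_rows` is a hypothetical name used only for illustration.

```python
import os
import tempfile
from array import array

ITEMSIZE = array("f").itemsize  # bytes per float32 value (4 on common platforms)


def read_rows(path, n_cols, start, stop):
    """Read rows [start, stop) from a raw row-major float32 file."""
    count = (stop - start) * n_cols
    values = array("f")
    with open(path, "rb") as f:
        f.seek(start * n_cols * ITEMSIZE)   # skip straight to the first requested row
        values.fromfile(f, count)           # read only the requested values
    return [values[i:i + n_cols].tolist() for i in range(0, count, n_cols)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.f32")
    # Write a 5 x 3 matrix holding the values 0.0 .. 14.0.
    with open(path, "wb") as f:
        array("f", [float(i) for i in range(15)]).tofile(f)
    rows = read_rows(path, n_cols=3, start=1, stop=3)

print(rows)
```

Because the read cost depends on the size of the requested slice rather than the size of the file, opening a large array lazily can be nearly free until data is actually touched.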

Requirements:
- Python >= 3.9
- NumPy >= 1.26.0

Install from PyPI:

    pip install numpack

Building from source requires:
- Python >= 3.9
- Rust >= 1.70.0 (required on all platforms; install from rustup.rs)
- NumPy >= 1.26.0
- An appropriate C/C++ compiler
  - Windows: Microsoft C++ Build Tools
  - macOS: Xcode Command Line Tools (`xcode-select --install`)
  - Linux: GCC/Clang (`build-essential` on Ubuntu/Debian)

Build steps:
- Clone the repository:

      git clone https://github.com/BirchKwok/NumPack.git
      cd NumPack

- Install maturin (the quotes keep the shell from treating `>` and `<` as redirections):

      pip install "maturin>=1.0,<2.0"

- Build and install:

      # Install in development mode
      maturin develop
      # Or build wheel package
      maturin build --release
      pip install target/wheels/numpack-*.whl

Basic usage:

    import numpy as np
    from numpack import NumPack

    # Using context manager (recommended)
    with NumPack("data_directory") as npk:
        # Save arrays
        arrays = {
            'array1': np.random.rand(1000, 100).astype(np.float32),
            'array2': np.random.rand(500, 200).astype(np.float32)
        }
        npk.save(arrays)

        # Load arrays - normal mode
        loaded = npk.load("array1")

        # Load arrays - lazy mode
        lazy_array = npk.load("array1", lazy=True)

Advanced operations:

    with NumPack("data_directory") as npk:
        # Replace specific rows
        replacement = np.random.rand(10, 100).astype(np.float32)
        npk.replace({'array1': replacement}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

        # Append new data
        new_data = {'array1': np.random.rand(100, 100).astype(np.float32)}
        npk.append(new_data)

        # Random access operations
        data = npk.getitem('array1', [0, 1, 2])
        data = npk['array1']  # Dictionary-style access

        # Stream loading for large arrays
        for batch in npk.stream_load('array1', buffer_size=1000):
            process_batch(batch)  # process_batch is a placeholder for your own function

        # Drop specific rows or entire arrays (done last: a dropped array can no longer be read)
        npk.drop('array2', [0, 1, 2])  # Drop specific rows
        npk.drop('array1')             # Drop entire array

NumPack provides two high-performance batch modes for scenarios with frequent modifications:

Batch Mode:

    with NumPack("data.npk") as npk:
        with npk.batch_mode():
            for i in range(1000):
                arr = npk.load('data')    # Load from cache
                arr[:10] *= 2.0
                npk.save({'data': arr})   # Save to cache
        # All changes written to disk on exit
        # Now with smart dirty tracking and zero-copy detection

Writable Batch Mode:

    with NumPack("data.npk") as npk:
        with npk.writable_batch_mode() as wb:
            for i in range(1000):
                arr = wb.load('data')   # Memory-mapped view
                arr[:10] *= 2.0         # Direct modification
        # No save needed - changes are automatic

All benchmarks were conducted on macOS (Apple Silicon) using the Rust backend with precise `timeit` measurements.

Core operations (large dataset):

| Operation | NumPack | NPY | NPZ | Zarr | HDF5 | NumPack Advantage |
|---|---|---|---|---|---|---|
| Full Load | 4.11ms | 6.48ms | 168.33ms | 34.29ms | 50.59ms | 1.58x vs NPY |
| Lazy Load | 0.002ms | 0.091ms | N/A | 0.405ms | 0.078ms | 45x vs NPY |
| Replace 100 rows | 0.042ms | 13.49ms | 1514ms | 7.68ms | 0.33ms | 321x vs NPY |
| Append 100 rows | 0.091ms | 20.40ms | 1522ms | 9.15ms | 0.20ms | 224x vs NPY |
| Save | 12.73ms | 6.53ms | 1343ms | 74.02ms | 56.32ms | 1.95x slower |
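
The "NumPack Advantage" column is the ratio of the two timings. The short script below recomputes it from the NumPack and NPY values in the table above; the helper name `advantage` is hypothetical, and because the published timings are rounded, a recomputed ratio can differ slightly from the published one (for example, Lazy Load comes out near 45.5 rather than 45).

```python
# Timings in milliseconds copied from the table above: (NumPack, NPY).
timings_ms = {
    "Full Load": (4.11, 6.48),
    "Lazy Load": (0.002, 0.091),
    "Replace 100 rows": (0.042, 13.49),
    "Append 100 rows": (0.091, 20.40),
    "Save": (12.73, 6.53),
}


def advantage(numpack_ms, npy_ms):
    """Return (ratio, label); the ratio is always >= 1 and the label gives the direction."""
    if numpack_ms <= npy_ms:
        return npy_ms / numpack_ms, "faster"
    return numpack_ms / npy_ms, "slower"


for op, (numpack_ms, npy_ms) in timings_ms.items():
    ratio, label = advantage(numpack_ms, npy_ms)
    print(f"{op}: NumPack is {ratio:.2f}x {label} than NPY")
```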

Random access (large dataset):

| Batch Size | NumPack | NPY (actual read) | NPZ | Zarr | HDF5 | NumPack Advantage |
|---|---|---|---|---|---|---|
| 100 indices | 0.038ms | 0.002ms | 169.54ms | 2.88ms | 0.58ms | 15.4x slower |
| 1K indices | 0.060ms | 0.025ms | 169.04ms | 3.25ms | 4.53ms | 2.4x slower |
| 10K indices | 1.53ms | 0.093ms | 169.80ms | 17.94ms | 511.16ms | 16.4x slower |

Sequential access (large dataset):

| Batch Size | NumPack | NPY (actual read) | NPZ | Zarr | HDF5 | NumPack Advantage |
|---|---|---|---|---|---|---|
| 100 rows | 0.030ms | 0.001ms | 169.85ms | 2.68ms | 0.13ms | 28.5x slower |
| 1K rows | 0.049ms | 0.002ms | 169.52ms | 2.94ms | 0.17ms | 29.7x slower |
| 10K rows | 0.321ms | 0.008ms | 169.17ms | 3.05ms | 0.78ms | 41.2x slower |

Core operations (smaller dataset):

| Operation | NumPack | NPY | NPZ | Zarr | HDF5 | NumPack Advantage |
|---|---|---|---|---|---|---|
| Full Load | 0.326ms | 0.405ms | 17.27ms | 4.96ms | 5.60ms | 1.24x vs NPY |
| Lazy Load | 0.003ms | 0.094ms | N/A | 0.390ms | 0.086ms | 37x vs NPY |
| Replace 100 rows | 0.031ms | 1.21ms | 153.05ms | 3.87ms | 0.31ms | 39x vs NPY |
| Append 100 rows | 0.058ms | 1.83ms | 153.47ms | 4.13ms | 0.21ms | 32x vs NPY |

Random access (smaller dataset):

| Batch Size | NumPack | NPY (actual read) | NPZ | Zarr | HDF5 | NumPack Advantage |
|---|---|---|---|---|---|---|
| 100 indices | 0.032ms | 0.002ms | 17.38ms | 1.31ms | 0.58ms | 13.0x slower |
| 1K indices | 0.057ms | 0.019ms | 17.36ms | 1.63ms | 4.79ms | 3.0x slower |
| 10K indices | 0.274ms | 0.125ms | 17.38ms | 4.81ms | 163.58ms | 2.2x slower |

Sequential access (smaller dataset):

| Batch Size | NumPack | NPY (actual read) | NPZ | Zarr | HDF5 | NumPack Advantage |
|---|---|---|---|---|---|---|
| 100 rows | 0.019ms | 0.001ms | 17.24ms | 1.24ms | 0.12ms | 17.5x slower |
| 1K rows | 0.036ms | 0.002ms | 17.23ms | 1.37ms | 0.16ms | 20.2x slower |
| 10K rows | 0.264ms | 0.008ms | 17.34ms | 1.48ms | 0.63ms | 33.8x slower |

100 consecutive modify operations:

| Mode | Time | Speedup vs Normal |
|---|---|---|
| Normal Mode | 418ms | - |
| Batch Mode | 20.5ms | 20.4x faster |
| Writable Batch Mode | 4.4ms | 94.8x faster |

Note: All modes benefit from I/O optimizations. Speedup ratios are calculated against the Normal Mode baseline.

Data Modification - Exceptional Performance:
- Replace operations: 321x faster than NPY
- Append operations: 224x faster than NPY (large dataset)
- Supports efficient in-place modification without full file rewrite
- NumPack's core advantage for write-heavy workloads

Data Loading - Outstanding Performance:
- Full load: 1.58x faster than NPY (4.11ms vs 6.48ms)
- Lazy load: 45x faster than NPY mmap (0.002ms vs 0.091ms)
- Optimized with adaptive buffering and SIMD acceleration

Batch Processing - Excellent Performance:
- Batch Mode: 20.4x speedup (20.5ms vs 418ms in normal mode)
- Writable Batch Mode: 94.8x speedup (4.4ms vs 418ms in normal mode)
- System-wide I/O optimizations benefit all modes
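
One way to see why a memory-mapped writable view needs no explicit save step is the standard-library `mmap` module: writes to the mapping go to the file's pages directly. The sketch below only illustrates that general mechanism with a raw file of float32 values in native byte order. It makes no claim about how NumPack implements Writable Batch Mode internally.

```python
import mmap
import os
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")

    # Create a file holding ten float32 values: 0.0 .. 9.0.
    with open(path, "wb") as f:
        f.write(struct.pack("10f", *[float(i) for i in range(10)]))

    # Map the file and modify the first three values in place.
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        with memoryview(mm) as raw, raw.cast("f") as view:
            for i in range(3):
                view[i] *= 2.0      # direct modification, no save call
        mm.flush()                  # ask the OS to write dirty pages back

    # Re-read the file through normal I/O to confirm the change persisted.
    with open(path, "rb") as f:
        values = struct.unpack("10f", f.read())

print(values[:4])
```

The memoryviews are released before the mapping is closed because a mapping cannot be closed while buffers exported from it are still alive.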

Sequential Access (smaller-dataset figures unless noted):
- Small batch (100 rows): 17.5x slower than NPY (0.019ms vs 0.001ms)
- Medium batch (1K rows): 20.2x slower (0.036ms vs 0.002ms)
- Large batch (10K rows): 33.8x slower (0.264ms vs 0.008ms)
- Still significantly faster than all other formats (10K rows on the large dataset: Zarr 3.05ms, HDF5 0.78ms, NPZ 169ms, versus 0.321ms for NumPack)
- Note: Tests use real data reads; creating an NPY mmap view without reading it is faster but not practical

Random Access - Significantly Improved (large-dataset figures):
- Small batch (100 indices): 15.4x slower than NPY (0.038ms vs 0.002ms)
- Medium batch (1K indices): 2.4x slower (0.060ms vs 0.025ms), improved from 397x slower
- Large batch (10K indices): 16.4x slower (1.53ms vs 0.093ms), affected by page faults
- However, NumPack is still 334x faster than HDF5 for 10K random access (1.53ms vs 511ms)
- Key trade-off: NPY excels at random reads but is 321x slower on writes
- For mixed read-write workloads, NumPack offers better overall balance

Storage Efficiency:
- File size identical to NPY (38.15MB)
- About 10% larger than the compressed formats (Zarr/NPZ)

Strongly recommended (85% of use cases):
- Machine learning and deep learning pipelines
- Real-time data stream processing
- Data annotation and correction workflows
- Feature stores with dynamic updates
- Any scenario requiring frequent data modifications (321x faster writes)
- Fast data loading requirements (1.58x faster than NPY)
- Balanced read-write workloads
- Sequential data processing workflows

Consider alternatives for:
- Write-once, never modify → Use NPY (1.95x faster write, but 321x slower for updates)
- Frequent random access → Use NPY (2.4x-16x faster for random reads)
- Pure read-only with heavy sequential access → Use NPY mmap (20-41x faster)
- Extreme compression requirements → Use NPZ (10% smaller, but 1000x slower)

Performance trade-offs and insights:
- Write operations: NumPack dominant (321x faster replacements, 224x faster appends)
- Read operations: NPY faster for random/sequential access (2.4x-41x), especially for small batches
- Major improvement: 1K random access improved from 397x slower to 2.4x slower
- Overall balance: NumPack excels in mixed read-write workloads
  - For pure read-heavy workloads (>95% reads), NPY may be better
  - For write-intensive or balanced workloads (>5% writes), NumPack is superior
- Key insight: Tests use real data reads; NPY mmap view-only access is faster but not practical
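
The read/write trade-off above can be turned into a rough break-even estimate. The sketch below assumes a deliberately simplified cost model that is not part of the original benchmarks: every operation is either one read or one "replace 100 rows" write, and the average cost is the mix of the two. The function `breakeven_write_fraction` is a hypothetical helper, and the timings are the large-dataset values from the benchmark tables.

```python
def breakeven_write_fraction(read_numpack, read_npy, write_numpack, write_npy):
    """Write fraction w at which both formats have the same average cost per operation.

    Solves w * write_a + (1 - w) * read_a == w * write_b + (1 - w) * read_b for w.
    """
    read_penalty = read_numpack - read_npy    # extra time NumPack spends per read
    write_gain = write_npy - write_numpack    # time NumPack saves per write
    return read_penalty / (read_penalty + write_gain)


# Replace 100 rows, in milliseconds (NumPack, NPY).
WRITE = (0.042, 13.49)

# Read timings in milliseconds (NumPack, NPY) for three access patterns.
reads = {
    "1K random indices": (0.060, 0.025),
    "10K random indices": (1.53, 0.093),
    "10K sequential rows": (0.321, 0.008),
}

for name, (r_numpack, r_npy) in reads.items():
    w = breakeven_write_fraction(r_numpack, r_npy, WRITE[0], WRITE[1])
    print(f"{name}: NumPack wins above {w:.2%} writes")
```

Under this model the break-even point ranges from well under 1% writes (1K random reads) to roughly 10% writes (10K random reads), so the dominant read pattern matters when choosing a format.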

Use Writable Batch Mode for frequent modifications:

    # 94.8x speedup for frequent modifications
    with NumPack("data.npk") as npk:
        with npk.writable_batch_mode() as wb:
            for i in range(1000):
                arr = wb.load('data')
                arr[:10] *= 2.0
        # Automatic persistence on exit

Use Batch Mode for batch operations:

    # 20.4x speedup for batch processing
    with NumPack("data.npk") as npk:
        with npk.batch_mode():
            for i in range(1000):
                arr = npk.load('data')
                arr[:10] *= 2.0
                npk.save({'data': arr})
        # Single write on exit with smart dirty tracking

Use lazy loading for large datasets:

    with NumPack("large_data.npk") as npk:
        # Only 0.002ms to initialize
        lazy_array = npk.load("array", lazy=True)
        # Data loaded on demand
        subset = lazy_array[1000:2000]

Reuse NumPack instances:

    # Efficient: reuse instance
    with NumPack("data.npk") as npk:
        for i in range(100):
            data = npk.load('array')

    # Inefficient: create new instance each time
    for i in range(100):
        with NumPack("data.npk") as npk:
            data = npk.load('array')

All benchmarks use:
- `timeit` for precise timing
- Multiple repeats, best time selected
- Pure operation time (excluding file open/close overhead)
- Float32 arrays
- macOS Apple Silicon (results may vary by platform)
- Comprehensive testing across multiple formats (NPY, NPZ, Zarr, HDF5, Parquet, Arrow/Feather)
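
The "multiple repeats, best time selected" approach can be sketched with the standard `timeit` module. This is a minimal illustration, not the project's benchmark code: the helper `best_time_ms` and the `repeat`/`number` defaults are assumptions chosen for the example.

```python
import timeit


def best_time_ms(func, repeat=5, number=100):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls each."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(totals) / number * 1000.0


data = list(range(10_000))
elapsed = best_time_ms(lambda: sum(data))
print(f"sum over 10K ints: {elapsed:.4f} ms per call")
```

Any setup that should not be timed, such as opening a file, belongs outside the callable passed to `best_time_ms`, which matches the "pure operation time" rule listed above.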
New in this version:
- Added random access and sequential access benchmarks across different batch sizes (100, 1K, 10K)
- Important: NPY mmap tests force actual data reads using `np.array()` conversion, not just view creation
  - This provides a fair comparison, as NumPack returns actual data
  - Mmap view-only access is faster but not practical for real workloads
  - Results reflect real-world performance when data is actually used
For complete benchmark code, see unified_benchmark.py.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.