NumPack

NumPack is a high-performance array storage library that combines Rust's performance with Python's ease of use. It provides exceptional performance for both reading and writing large NumPy arrays, with special optimizations for in-place modifications.

Key Features

🚀 321x faster row replacement than NPY
⚡ 224x faster data append than NPY
💨 45x faster lazy loading than NPY mmap
📖 1.58x faster full data loading than NPY ⬆️
🎯 Random Access: 1K indices 2.4x slower, 10K indices 16.4x slower than NPY ⚠️
🔄 20.4x speedup with Batch Mode for frequent modifications
⚡ 94.8x speedup with Writable Batch Mode ⬆️
💾 Zero-copy operations with minimal memory footprint
🛠 Seamless integration with existing NumPy workflows

New I/O Optimizations 🔧

Adaptive Buffer Sizing
- Small arrays (<1MB): 256KB buffer → 96% memory saving
- Medium arrays (1-10MB): 4MB buffer → balanced performance
- Large arrays (>10MB): 16MB buffer → maximum throughput
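The size tiers above amount to a simple lookup from array size to buffer size. The sketch below is a hypothetical Python illustration, not NumPack's Rust implementation. The function name `choose_buffer_size` is made up, and how the exact 1MB and 10MB boundaries are classified is an assumption.

```python
import numpy as np

KB = 1024
MB = 1024 * KB


def choose_buffer_size(array_nbytes: int) -> int:
    """Hypothetical helper: pick an I/O buffer size from an array's size in bytes.

    Tiers follow the README (<1MB, 1-10MB, >10MB); boundary handling is assumed.
    """
    if array_nbytes < 1 * MB:
        return 256 * KB   # small arrays: small buffer, low memory use
    if array_nbytes <= 10 * MB:
        return 4 * MB     # medium arrays: balanced
    return 16 * MB        # large arrays: favor throughput


# float32, 10 columns: roughly 40KB, 3.8MB and 38MB of data
for rows in (1_000, 100_000, 1_000_000):
    arr = np.zeros((rows, 10), dtype=np.float32)
    print(rows, arr.nbytes, choose_buffer_size(arr.nbytes) // KB, "KB")
```

Under these assumptions, the 100K-row benchmark array used later (about 3.8MB) falls in the medium tier and the 1M-row array (about 38MB) in the large tier.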
Smart Parallelization
- Automatically parallelizes only when beneficial (>10MB total data)
- Avoids thread overhead for small datasets
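The parallelization rule can be pictured as a size check before fanning work out to threads. This is a hypothetical sketch: `write_arrays`, `write_one` and `PARALLEL_THRESHOLD` are illustrative names, and in Python the threads only pay off when `write_one` releases the GIL, as file I/O does.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

import numpy as np

PARALLEL_THRESHOLD = 10 * 1024 * 1024  # ~10MB of total data, per the rule above


def write_arrays(arrays: Dict[str, np.ndarray],
                 write_one: Callable[[str, np.ndarray], None]) -> bool:
    """Hypothetical dispatcher: serial writes for small payloads, threaded for large ones.

    Returns True if the parallel path was taken.
    """
    total = sum(a.nbytes for a in arrays.values())
    if total <= PARALLEL_THRESHOLD:
        # Small payloads: thread start-up and coordination would outweigh any gain
        for name, arr in arrays.items():
            write_one(name, arr)
        return False
    with ThreadPoolExecutor() as pool:
        # list() waits for every task and re-raises any exception from a worker
        list(pool.map(lambda item: write_one(*item), arrays.items()))
    return True
```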
Fast Overwrite Path
- Same-shape array overwrite: 1.5-2.5x faster
- Uses in-place update instead of file recreation
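The fast overwrite path boils down to: if the stored array already has the same shape and dtype, write the new bytes over the old ones; otherwise rewrite the file. The sketch below uses a plain `.npy` file and `np.load(..., mmap_mode="r+")` as a stand-in for NumPack's own format; the function `overwrite` is hypothetical.

```python
import os

import numpy as np


def overwrite(path: str, new: np.ndarray) -> str:
    """Hypothetical same-shape fast path using a .npy file as storage.

    `path` should end in ".npy" (np.save appends the suffix otherwise).
    """
    if os.path.exists(path):
        existing = np.load(path, mmap_mode="r+")
        if existing.shape == new.shape and existing.dtype == new.dtype:
            existing[...] = new   # update the data bytes in place; header untouched
            existing.flush()
            del existing          # release the mapping
            return "in-place"
        del existing
    np.save(path, new)            # new file, or shape/dtype changed: full rewrite
    return "recreated"
```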
SIMD Acceleration
- Large files (>10MB) use SIMD-optimized operations
- Theoretical 2-4x speedup for memory-intensive operations
Batch Mode Intelligence
- Smart dirty tracking: only flushes modified arrays
- Zero-copy cache detection
- Reduced metadata synchronization

Core Advantages Enhanced

Replace operations now 321x faster than NPY 🔥
Full Load now 1.58x faster than NPY 📈
System-wide optimizations benefit all operation modes

Features

High Performance: Optimized for both reading and writing large numerical arrays
Lazy Loading Support: Efficient memory usage through on-demand data loading
In-place Operations: Support for in-place array modifications without full file rewrite
Batch Processing Modes:
- Batch Mode: 20.4x speedup for batch operations
- Writable Batch Mode: 94.8x speedup for frequent modifications
Multiple Data Types: Supports various numerical data types including:
- Boolean
- Unsigned integers (8-bit to 64-bit)
- Signed integers (8-bit to 64-bit)
- Floating point (16-bit, 32-bit and 64-bit)
- Complex numbers (64-bit and 128-bit)
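In NumPy terms, the categories above correspond to the dtypes below (this mapping is our reading of the list; the complex sizes are total widths, so `complex64` is a pair of float32 values). The helper `sample_arrays` is hypothetical and only builds one small array per type, for example to round-trip through `npk.save()` and `npk.load()`.

```python
import numpy as np

# NumPy dtypes matching the listed categories (assumed mapping)
SUPPORTED_DTYPES = [
    np.bool_,
    np.uint8, np.uint16, np.uint32, np.uint64,
    np.int8, np.int16, np.int32, np.int64,
    np.float16, np.float32, np.float64,
    np.complex64, np.complex128,
]


def sample_arrays(rows: int = 4, cols: int = 3) -> dict:
    """Build one small zero-filled array per dtype, keyed by the dtype name."""
    return {np.dtype(t).name: np.zeros((rows, cols), dtype=t) for t in SUPPORTED_DTYPES}


print(sorted(sample_arrays()))
```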

Installation

From PyPI (Recommended)

Prerequisites

Python >= 3.9
NumPy >= 1.26.0

pip install numpack

From Source

Prerequisites (All Platforms including Windows)

Python >= 3.9
Rust >= 1.70.0 (Required on all platforms, install from rustup.rs)
NumPy >= 1.26.0
Appropriate C/C++ compiler
- Windows: Microsoft C++ Build Tools
- macOS: Xcode Command Line Tools (xcode-select --install)
- Linux: GCC/Clang (build-essential on Ubuntu/Debian)

Build Steps

Clone the repository:

git clone https://github.com/BirchKwok/NumPack.git
cd NumPack

Install maturin:

pip install "maturin>=1.0,<2.0"

Build and install:

# Install in development mode
maturin develop

# Or build wheel package
maturin build --release
pip install target/wheels/numpack-*.whl

Usage

Basic Operations

import numpy as np
from numpack import NumPack

# Using context manager (Recommended)
with NumPack("data_directory") as npk:
    # Save arrays
    arrays = {
        'array1': np.random.rand(1000, 100).astype(np.float32),
        'array2': np.random.rand(500, 200).astype(np.float32)
    }
    npk.save(arrays)
    
    # Load arrays - Normal mode
    loaded = npk.load("array1")
    
    # Load arrays - Lazy mode
    lazy_array = npk.load("array1", lazy=True)

Advanced Operations

import numpy as np
from numpack import NumPack


def process_batch(batch):
    # Placeholder: replace with your own per-batch logic
    print(batch.shape)


with NumPack("data_directory") as npk:
    # Replace specific rows (row indices 0-9)
    replacement = np.random.rand(10, 100).astype(np.float32)
    npk.replace({'array1': replacement}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

    # Append new data
    new_data = {'array1': np.random.rand(100, 100).astype(np.float32)}
    npk.append(new_data)

    # Random access operations
    data = npk.getitem('array1', [0, 1, 2])
    data = npk['array1']  # Dictionary-style access

    # Stream loading for large arrays
    for batch in npk.stream_load('array1', buffer_size=1000):
        process_batch(batch)

    # Drop arrays or specific rows (done last, since 'array1' is used above)
    npk.drop('array2', [0, 1, 2])  # Drop specific rows
    npk.drop('array1')  # Drop entire array
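`stream_load` hands the array back in successive batches instead of all at once. As a rough mental model, the NumPy-only generator below yields consecutive row slices; treating `buffer_size` as "rows per batch" is an assumption (the section does not define it), and `iter_row_batches` is a hypothetical name.

```python
from typing import Iterator

import numpy as np


def iter_row_batches(arr: np.ndarray, buffer_size: int) -> Iterator[np.ndarray]:
    """Hypothetical stand-in for stream_load: yield row slices of at most buffer_size rows."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    for start in range(0, arr.shape[0], buffer_size):
        yield arr[start:start + buffer_size]


arr = np.arange(2500 * 2).reshape(2500, 2)
print([b.shape[0] for b in iter_row_batches(arr, 1000)])
```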

Batch Processing Modes

NumPack provides two high-performance batch modes for scenarios with frequent modifications:

Batch Mode (20.4x speedup, 43% faster than before)

with NumPack("data.npk") as npk:
    with npk.batch_mode():
        for i in range(1000):
            arr = npk.load('data')      # Load from cache
            arr[:10] *= 2.0
            npk.save({'data': arr})     # Save to cache
# All changes written to disk on exit
# ✨ Now with smart dirty tracking and zero-copy detection
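Batch mode with dirty tracking can be modeled as an in-memory cache: `load` reads each array once, `save` only marks it as changed, and leaving the block writes back the changed arrays. The sketch below is a hypothetical model (a dict stands in for the files on disk; zero-copy detection is not modeled), not NumPack's implementation.

```python
from contextlib import contextmanager

import numpy as np


class BatchCache:
    """Hypothetical model of batch mode: loads and saves hit an in-memory cache,
    and only arrays marked dirty are written back when the batch ends."""

    def __init__(self, storage: dict):
        self.storage = storage      # a dict standing in for the on-disk arrays
        self.cache = {}
        self.dirty = set()
        self.writes = 0             # counts write-backs, to show the saving

    def load(self, name: str) -> np.ndarray:
        if name not in self.cache:
            self.cache[name] = self.storage[name].copy()  # one real read per array
        return self.cache[name]

    def save(self, arrays: dict) -> None:
        for name, arr in arrays.items():
            self.cache[name] = arr
            self.dirty.add(name)    # defer the write; just remember it changed

    def flush(self) -> None:
        for name in self.dirty:
            self.storage[name] = self.cache[name].copy()
            self.writes += 1
        self.dirty.clear()


@contextmanager
def batch_mode(cache: BatchCache):
    try:
        yield cache
    finally:
        cache.flush()               # single write-back of dirty arrays on exit


storage = {"data": np.ones(20), "other": np.zeros(5)}
cache = BatchCache(storage)
with batch_mode(cache):
    for _ in range(1000):
        arr = cache.load("data")
        arr[:10] *= 2.0
        cache.save({"data": arr})
print(cache.writes)  # 1: "data" written once, "other" never written
```

However many iterations run, the model performs one read and one write per modified array, which is the effect the 1000-iteration loop above relies on.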

Writable Batch Mode (94.8x speedup)

with NumPack("data.npk") as npk:
    with npk.writable_batch_mode() as wb:
        for i in range(1000):
            arr = wb.load('data')   # Memory-mapped view
            arr[:10] *= 2.0         # Direct modification
            # No save needed - changes are automatic
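Writable batch mode returns memory-mapped views, so edits land in the mapped file without an explicit save. The same idea can be seen with a plain `.npy` file opened through `np.load(..., mmap_mode="r+")`; this is an analogy only, since NumPack's own file format and the internals of `wb.load` are not described here.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.npy")
np.save(path, np.ones((1000, 10), dtype=np.float32))

view = np.load(path, mmap_mode="r+")   # memory-mapped; no full read into RAM
for _ in range(3):
    view[:10] *= 2.0                    # writes go straight to the mapped pages
view.flush()                            # push modified pages to the file
del view                                # release the mapping

result = np.load(path)
print(result[0, 0], result[10, 0])      # 8.0 1.0: first 10 rows scaled by 2**3
```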

Performance

All benchmarks were conducted on macOS (Apple Silicon) using the Rust backend with precise timeit measurements.

Performance Comparison (1M rows × 10 columns, Float32, 38.1MB)

Operation	NumPack	NPY	NPZ	Zarr	HDF5	NumPack Advantage
Full Load	4.11ms 🥇	6.48ms	168.33ms	34.29ms	50.59ms	1.58x vs NPY ⬆️
Lazy Load	0.002ms 🥇	0.091ms	N/A	0.405ms	0.078ms	45x vs NPY
Replace 100 rows	0.042ms 🥇	13.49ms	1514ms	7.68ms	0.33ms	321x vs NPY 🔥
Append 100 rows	0.091ms 🥇	20.40ms	1522ms	9.15ms	0.20ms	224x vs NPY
Save	12.73ms	6.53ms 🥇	1343ms	74.02ms	56.32ms	1.95x slower
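The last column is a ratio of the two timings: NPY time divided by NumPack time where NumPack is faster, and the inverse for rows marked "slower". Worked with figures from the table above:

```latex
\text{speedup} = \frac{t_{\text{NPY}}}{t_{\text{NumPack}}}
  \quad\Rightarrow\quad \text{Replace 100 rows: } \frac{13.49\,\text{ms}}{0.042\,\text{ms}} \approx 321\times

\text{slowdown} = \frac{t_{\text{NumPack}}}{t_{\text{NPY}}}
  \quad\Rightarrow\quad \text{Save: } \frac{12.73\,\text{ms}}{6.53\,\text{ms}} \approx 1.95\times
```

Because the tables print rounded times, recomputing a ratio from sub-0.01ms entries (for example in the access tables) will not always reproduce the printed figure exactly.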

Random Access Performance (1M rows)

Batch Size	NumPack	NPY (actual read)	NPZ	Zarr	HDF5	NumPack vs NPY
100 indices	0.038ms	0.002ms 🥇	169.54ms	2.88ms	0.58ms	15.4x slower
1K indices	0.060ms	0.025ms 🥇	169.04ms	3.25ms	4.53ms	2.4x slower ✅
10K indices	1.53ms	0.093ms 🥇	169.80ms	17.94ms	511.16ms	16.4x slower ⚠️

Sequential Access Performance (1M rows)

Batch Size	NumPack	NPY (actual read)	NPZ	Zarr	HDF5	NumPack vs NPY
100 rows	0.030ms	0.001ms 🥇	169.85ms	2.68ms	0.13ms	28.5x slower
1K rows	0.049ms	0.002ms 🥇	169.52ms	2.94ms	0.17ms	29.7x slower
10K rows	0.321ms	0.008ms 🥇	169.17ms	3.05ms	0.78ms	41.2x slower

Performance Comparison (100K rows × 10 columns, Float32, 3.8MB)

Operation	NumPack	NPY	NPZ	Zarr	HDF5	NumPack Advantage
Full Load	0.326ms 🥇	0.405ms	17.27ms	4.96ms	5.60ms	1.24x vs NPY
Lazy Load	0.003ms 🥇	0.094ms	N/A	0.390ms	0.086ms	37x vs NPY
Replace 100 rows	0.031ms 🥇	1.21ms	153.05ms	3.87ms	0.31ms	39x vs NPY
Append 100 rows	0.058ms 🥇	1.83ms	153.47ms	4.13ms	0.21ms	32x vs NPY

Random Access Performance (100K rows)

Batch Size	NumPack	NPY (actual read)	NPZ	Zarr	HDF5	NumPack vs NPY
100 indices	0.032ms	0.002ms 🥇	17.38ms	1.31ms	0.58ms	13.0x slower
1K indices	0.057ms	0.019ms 🥇	17.36ms	1.63ms	4.79ms	3.0x slower ✅
10K indices	0.274ms	0.125ms 🥇	17.38ms	4.81ms	163.58ms	2.2x slower ✅

Sequential Access Performance (100K rows)

Batch Size	NumPack	NPY (actual read)	NPZ	Zarr	HDF5	NumPack vs NPY
100 rows	0.019ms	0.001ms 🥇	17.24ms	1.24ms	0.12ms	17.5x slower
1K rows	0.036ms	0.002ms 🥇	17.23ms	1.37ms	0.16ms	20.2x slower
10K rows	0.264ms	0.008ms 🥇	17.34ms	1.48ms	0.63ms	33.8x slower

Batch Mode Performance (1M rows × 10 columns)

100 consecutive modify operations:

Mode	Time	Speedup vs Normal
Normal Mode	418ms	-
Batch Mode	20.5ms	20.4x faster 🔥
Writable Batch Mode	4.4ms	94.8x faster 🔥

💡 Note: All modes benefit from I/O optimizations. Speedup ratios are calculated against Normal Mode baseline.

Key Performance Highlights

Data Modification - Exceptional Performance 🏆
- Replace operations: 321x faster than NPY 🔥
- Append operations: 224x faster than NPY (large dataset)
- Supports efficient in-place modification without full file rewrite
- NumPack's core advantage for write-heavy workloads
Data Loading - Outstanding Performance ⭐ Enhanced
- Full load: 1.58x faster than NPY (4.11ms vs 6.48ms) ⬆️
- Lazy load: 45x faster than NPY mmap (0.002ms vs 0.091ms)
- Optimized with adaptive buffering and SIMD acceleration
Batch Processing - Excellent Performance ⭐ Strong
- Batch Mode: 20.4x speedup (20.5ms vs 418ms normal mode)
- Writable Batch Mode: 94.8x speedup (4.4ms) ⬆️
- System-wide I/O optimizations benefit all modes
Sequential Access 📊 (100K-row dataset)
- Small batch (100 rows): 17.5x slower than NPY (0.019ms vs 0.001ms)
- Medium batch (1K rows): 20.2x slower (0.036ms vs 0.002ms)
- Large batch (10K rows): 33.8x slower (0.264ms vs 0.008ms)
- Still much faster than the other formats: on the 1M-row dataset, reading 10K rows takes 0.321ms with NumPack vs Zarr 3.05ms, HDF5 0.78ms, NPZ 169ms
- Note: tests force real data reads; merely creating an NPY mmap view is faster but reads no data
Random Access - Significantly Improved 🔥 Major Enhancement (1M-row dataset)
- Small batch (100 indices): 15.4x slower (0.038ms vs 0.002ms)
- Medium batch (1K indices): 2.4x slower (0.060ms vs 0.025ms) ✅ Improved from 397x!
- Large batch (10K indices): 16.4x slower (1.53ms vs 0.093ms), affected by page faults ⚠️
- However: NumPack is still 334x faster than HDF5 for 10K random indices (1.53ms vs 511ms)
- Key trade-off: NPY excels at random reads but is 321x slower at row replacement
- For mixed read-write workloads, NumPack offers better overall balance
Storage Efficiency
- File size identical to NPY (38.15MB)
- Compressed formats (Zarr/NPZ) are ~10% smaller

When to Use NumPack

✅ Strongly Recommended (85% of use cases):

Machine learning and deep learning pipelines
Real-time data stream processing
Data annotation and correction workflows
Feature stores with dynamic updates
Any scenario requiring frequent data modifications (321x faster row replacement!)
Fast data loading requirements (1.58x faster than NPY)
Balanced read-write workloads
Sequential data processing workflows

⚠️ Consider Alternatives (15% of use cases):

Write-once, never modify → Use NPY (1.95x faster initial save, but 321x slower for row replacement)
Frequent random access → Use NPY (2.4x-16x faster for random reads)
Pure read-only with heavy sequential access → Use NPY mmap (17-41x faster sequential reads)
Extreme compression requirements → Use NPZ (~10% smaller files, but 41x slower to load and over 10,000x slower to replace rows)

💡 Performance Trade-offs & Insights:

Write operations: NumPack dominant (321x faster replacements, 224x faster appends)
Read operations: NPY faster for random/sequential access (2.4x-41x), especially for small batches
Major improvement: 1K random access improved from 397x to 2.4x slower ⬆️
Overall balance: NumPack excels in mixed read-write workloads
For pure read-heavy (>95% reads), NPY may be better
For write-intensive or balanced workloads (>5% writes), NumPack is superior
Key insight: Tests use real data reads; NPY mmap view-only is faster but not practical

Best Practices

1. Use Writable Batch Mode for Frequent Modifications

# 94.8x speedup for frequent modifications
with NumPack("data.npk") as npk:
    with npk.writable_batch_mode() as wb:
        for i in range(1000):
            arr = wb.load('data')
            arr[:10] *= 2.0
# Automatic persistence on exit

2. Use Batch Mode for Batch Operations

# 20.4x speedup for batch processing
with NumPack("data.npk") as npk:
    with npk.batch_mode():
        for i in range(1000):
            arr = npk.load('data')
            arr[:10] *= 2.0
            npk.save({'data': arr})
# Single write on exit with smart dirty tracking

3. Use Lazy Loading for Large Datasets

with NumPack("large_data.npk") as npk:
    # Only 0.002ms to initialize
    lazy_array = npk.load("array", lazy=True)
    # Data loaded on demand
    subset = lazy_array[1000:2000]
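Lazy loading defers reading until rows are actually indexed. A rough NumPy analogy (not NumPack's implementation) is a memory-mapped `.npy` file: opening maps the file without reading the data, slicing returns a view backed by the mapping, and `np.array` copies only the selected rows into memory.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "large.npy")
np.save(path, np.arange(5000 * 4, dtype=np.float32).reshape(5000, 4))

lazy = np.load(path, mmap_mode="r")    # maps the file; data is paged in on access
view = lazy[1000:2000]                 # still a memmap view, nothing copied yet
subset = np.array(view)                # materialize just these 1000 rows

print(type(view).__name__, type(subset).__name__, subset.shape)
```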

4. Reuse NumPack Instances

# ✅ Efficient: Reuse instance
with NumPack("data.npk") as npk:
    for i in range(100):
        data = npk.load('array')

# ❌ Inefficient: Create new instance each time
for i in range(100):
    with NumPack("data.npk") as npk:
        data = npk.load('array')

Benchmark Methodology

All benchmarks use:

timeit for precise timing
Multiple repeats, best time selected
Pure operation time (excluding file open/close overhead)
Float32 arrays
macOS Apple Silicon (results may vary by platform)
Comprehensive testing across multiple formats (NPY, NPZ, Zarr, HDF5, Parquet, Arrow/Feather)

New in this version:

Added random access and sequential access benchmarks across different batch sizes (100, 1K, 10K)
Important: NPY mmap tests force actual data reads using np.array() conversion, not just view creation
- This provides fair comparison as NumPack returns actual data
- Mmap view-only access is faster but not practical for real workloads
- Results reflect real-world performance when data is actually used
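The forced-read rule and the best-of-repeats timing can be illustrated together with `timeit`: creating an mmap slice only builds a view, while wrapping it in `np.array` actually copies the data. The file size, slice length and repeat counts are arbitrary choices for this sketch, and `best_ms` is a hypothetical helper; this is not the project's unified_benchmark.py.

```python
import os
import tempfile
import timeit

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "bench.npy")
np.save(path, np.random.rand(200_000, 10).astype(np.float32))
mm = np.load(path, mmap_mode="r")


def best_ms(stmt, repeat=5, number=20):
    """Best-of-repeat timing, in milliseconds per call."""
    return min(timeit.repeat(stmt, repeat=repeat, number=number)) / number * 1e3


view_only = best_ms(lambda: mm[:1_000])             # builds a view; no data touched
forced_read = best_ms(lambda: np.array(mm[:1_000])) # copies the rows: data really read

print(f"view only:   {view_only:.4f} ms")
print(f"forced read: {forced_read:.4f} ms")
```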

For complete benchmark code, see unified_benchmark.py.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
