🚀 Supercharge Your NumPy Arrays | ⚡️ Instant TB-scale Data Ops | 💾 Zero Memory Overhead | 🔄 Stream Huge Arrays Like Small Ones | 🛡️ Production-Ready


NumPack

NumPack is a high-performance array storage library that combines Rust's performance with Python's ease of use. It provides exceptional performance for both reading and writing large NumPy arrays, with special optimizations for in-place modifications.

Key Features

  • 🚀 321x faster row replacement than NPY
  • ⚡ 224x faster data append than NPY
  • 💨 45x faster lazy loading than NPY mmap
  • 📖 1.58x faster full data loading than NPY ⬆️
  • 🎯 Random Access: 1K indices 2.4x slower, 10K indices 16.4x slower than NPY ⚠️
  • 🔄 20.4x speedup with Batch Mode for frequent modifications
  • ⚡ 94.8x speedup with Writable Batch Mode ⬆️
  • 💾 Zero-copy operations with minimal memory footprint
  • 🛠️ Seamless integration with existing NumPy workflows

New I/O Optimizations 🔧

  1. Adaptive Buffer Sizing (see the sketch after this list)

    • Small arrays (<1MB): 256KB buffer → 96% memory saving
    • Medium arrays (1-10MB): 4MB buffer → balanced performance
    • Large arrays (>10MB): 16MB buffer → maximum throughput
  2. Smart Parallelization

    • Automatically parallelizes only when beneficial (>10MB total data)
    • Avoids thread overhead for small datasets
  3. Fast Overwrite Path

    • Same-shape array overwrite: 1.5-2.5x faster
    • Uses in-place update instead of file recreation
  4. SIMD Acceleration

    • Large files (>10MB) use SIMD-optimized operations
    • Theoretical 2-4x speedup for memory-intensive operations
  5. Batch Mode Intelligence

    • Smart dirty tracking: only flushes modified arrays
    • Zero-copy cache detection
    • Reduced metadata synchronization
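
The buffer tiers in item 1 are selected inside the Rust backend; the Python sketch below is purely illustrative of that selection logic (the function name and code are not part of the NumPack API, they simply mirror the tiers listed above):

def pick_buffer_size(array_nbytes: int) -> int:
    """Illustrative only: mirrors the adaptive buffer tiers described above."""
    MB = 1024 * 1024
    if array_nbytes < 1 * MB:      # small arrays: minimize memory overhead
        return 256 * 1024          # 256KB buffer
    elif array_nbytes <= 10 * MB:  # medium arrays: balance memory vs throughput
        return 4 * MB              # 4MB buffer
    else:                          # large arrays: maximize throughput
        return 16 * MB             # 16MB buffer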

Core Advantages Enhanced

  • Replace operations now 321x faster than NPY 🔥
  • Full Load now 1.58x faster than NPY 📈
  • System-wide optimizations benefit all operation modes

Features

  • High Performance: Optimized for both reading and writing large numerical arrays
  • Lazy Loading Support: Efficient memory usage through on-demand data loading
  • In-place Operations: Support for in-place array modifications without full file rewrite
  • Batch Processing Modes:
    • Batch Mode: 20.4x speedup for batch operations
    • Writable Batch Mode: 94.8x speedup for frequent modifications
  • Multiple Data Types: Supports various numerical data types (see the example after this list), including:
    • Boolean
    • Unsigned integers (8-bit to 64-bit)
    • Signed integers (8-bit to 64-bit)
    • Floating point (16-bit, 32-bit and 64-bit)
    • Complex numbers (64-bit and 128-bit)
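
A quick illustration of these dtypes with the save API from the Usage section below (a minimal sketch; the array names and shapes are arbitrary):

import numpy as np
from numpack import NumPack

with NumPack("dtype_demo") as npk:
    npk.save({
        'flags':   np.zeros((100, 4), dtype=np.bool_),         # boolean
        'counts':  np.ones((100, 4), dtype=np.uint32),         # unsigned integer
        'offsets': np.full((100, 4), -1, dtype=np.int64),      # signed integer
        'weights': np.random.rand(100, 4).astype(np.float16),  # floating point
        'spectra': np.zeros((100, 4), dtype=np.complex64),     # complex
    })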

Installation

From PyPI (Recommended)

Prerequisites

  • Python >= 3.9
  • NumPy >= 1.26.0
pip install numpack

From Source

Prerequisites (All Platforms including Windows)

  • Python >= 3.9
  • Rust >= 1.70.0 (Required on all platforms, install from rustup.rs)
  • NumPy >= 1.26.0
  • Appropriate C/C++ compiler
    • Windows: Microsoft C++ Build Tools
    • macOS: Xcode Command Line Tools (xcode-select --install)
    • Linux: GCC/Clang (build-essential on Ubuntu/Debian)

Build Steps

  1. Clone the repository:
git clone https://github.com/BirchKwok/NumPack.git
cd NumPack
  2. Install maturin:
pip install "maturin>=1.0,<2.0"
  3. Build and install:
# Install in development mode
maturin develop

# Or build wheel package
maturin build --release
pip install target/wheels/numpack-*.whl
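
After installation, a quick import check confirms the extension built correctly:

python -c "from numpack import NumPack; print('NumPack import OK')"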

Usage

Basic Operations

import numpy as np
from numpack import NumPack

# Using context manager (Recommended)
with NumPack("data_directory") as npk:
    # Save arrays
    arrays = {
        'array1': np.random.rand(1000, 100).astype(np.float32),
        'array2': np.random.rand(500, 200).astype(np.float32)
    }
    npk.save(arrays)
    
    # Load arrays - Normal mode
    loaded = npk.load("array1")
    
    # Load arrays - Lazy mode
    lazy_array = npk.load("array1", lazy=True)

Advanced Operations

with NumPack("data_directory") as npk:
    # Replace specific rows
    replacement = np.random.rand(10, 100).astype(np.float32)
    npk.replace({'array1': replacement}, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    
    # Append new data
    new_data = {'array1': np.random.rand(100, 100).astype(np.float32)}
    npk.append(new_data)
    
    # Drop arrays or specific rows
    npk.drop('array1')  # Drop entire array
    npk.drop('array2', [0, 1, 2])  # Drop specific rows
    
    # Random access operations
    data = npk.getitem('array1', [0, 1, 2])
    data = npk['array1']  # Dictionary-style access
    
    # Stream loading for large arrays
    for batch in npk.stream_load('array1', buffer_size=1000):
        process_batch(batch)
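
Stream loading pairs naturally with incremental reductions. A minimal sketch, assuming the stream_load API shown above, that computes a column-wise mean without materializing the full array:

import numpy as np
from numpack import NumPack

with NumPack("data_directory") as npk:
    total = None
    count = 0
    for batch in npk.stream_load('array1', buffer_size=1000):
        batch = np.asarray(batch)
        total = batch.sum(axis=0) if total is None else total + batch.sum(axis=0)
        count += batch.shape[0]
    column_mean = total / count  # mean over all rows, computed one buffer at a time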

Batch Processing Modes

NumPack provides two high-performance batch modes for scenarios with frequent modifications:

Batch Mode (20.4x speedup, 43% faster than the previous release)

with NumPack("data.npk") as npk:
    with npk.batch_mode():
        for i in range(1000):
            arr = npk.load('data')      # Load from cache
            arr[:10] *= 2.0
            npk.save({'data': arr})     # Save to cache
# All changes written to disk on exit
# ✨ Now with smart dirty tracking and zero-copy detection

Writable Batch Mode (94.8x speedup)

with NumPack("data.npk") as npk:
    with npk.writable_batch_mode() as wb:
        for i in range(1000):
            arr = wb.load('data')   # Memory-mapped view
            arr[:10] *= 2.0         # Direct modification
            # No save needed - changes are automatic

Performance

All benchmarks were conducted on macOS (Apple Silicon) using the Rust backend with precise timeit measurements.

Performance Comparison (1M rows × 10 columns, Float32, 38.1MB)

| Operation | NumPack | NPY | NPZ | Zarr | HDF5 | NumPack Advantage |
|---|---|---|---|---|---|---|
| Full Load | 4.11ms 🥇 | 6.48ms | 168.33ms | 34.29ms | 50.59ms | 1.58x vs NPY ⬆️ |
| Lazy Load | 0.002ms 🥇 | 0.091ms | N/A | 0.405ms | 0.078ms | 45x vs NPY |
| Replace 100 rows | 0.042ms 🥇 | 13.49ms | 1514ms | 7.68ms | 0.33ms | 321x vs NPY 🔥 |
| Append 100 rows | 0.091ms 🥇 | 20.40ms | 1522ms | 9.15ms | 0.20ms | 224x vs NPY |
| Save | 12.73ms | 6.53ms 🥇 | 1343ms | 74.02ms | 56.32ms | 1.95x slower |

Random Access Performance

| Batch Size | NumPack | NPY (actual read) | NPZ | Zarr | HDF5 | NumPack vs NPY |
|---|---|---|---|---|---|---|
| 100 indices | 0.038ms | 0.002ms 🥇 | 169.54ms | 2.88ms | 0.58ms | 15.4x slower |
| 1K indices | 0.060ms | 0.025ms 🥇 | 169.04ms | 3.25ms | 4.53ms | 2.4x slower ✅ |
| 10K indices | 1.53ms | 0.093ms 🥇 | 169.80ms | 17.94ms | 511.16ms | 16.4x slower ⚠️ |

Sequential Access Performance

| Batch Size | NumPack | NPY (actual read) | NPZ | Zarr | HDF5 | NumPack vs NPY |
|---|---|---|---|---|---|---|
| 100 rows | 0.030ms | 0.001ms 🥇 | 169.85ms | 2.68ms | 0.13ms | 28.5x slower |
| 1K rows | 0.049ms | 0.002ms 🥇 | 169.52ms | 2.94ms | 0.17ms | 29.7x slower |
| 10K rows | 0.321ms | 0.008ms 🥇 | 169.17ms | 3.05ms | 0.78ms | 41.2x slower |

Performance Comparison (100K rows × 10 columns, Float32, 3.8MB)

| Operation | NumPack | NPY | NPZ | Zarr | HDF5 | NumPack Advantage |
|---|---|---|---|---|---|---|
| Full Load | 0.326ms 🥇 | 0.405ms | 17.27ms | 4.96ms | 5.60ms | 1.24x vs NPY |
| Lazy Load | 0.003ms 🥇 | 0.094ms | N/A | 0.390ms | 0.086ms | 37x vs NPY |
| Replace 100 rows | 0.031ms 🥇 | 1.21ms | 153.05ms | 3.87ms | 0.31ms | 39x vs NPY |
| Append 100 rows | 0.058ms 🥇 | 1.83ms | 153.47ms | 4.13ms | 0.21ms | 32x vs NPY |

Random Access Performance

| Batch Size | NumPack | NPY (actual read) | NPZ | Zarr | HDF5 | NumPack vs NPY |
|---|---|---|---|---|---|---|
| 100 indices | 0.032ms | 0.002ms 🥇 | 17.38ms | 1.31ms | 0.58ms | 13.0x slower |
| 1K indices | 0.057ms | 0.019ms 🥇 | 17.36ms | 1.63ms | 4.79ms | 3.0x slower ✅ |
| 10K indices | 0.274ms | 0.125ms 🥇 | 17.38ms | 4.81ms | 163.58ms | 2.2x slower ✅ |

Sequential Access Performance

| Batch Size | NumPack | NPY (actual read) | NPZ | Zarr | HDF5 | NumPack vs NPY |
|---|---|---|---|---|---|---|
| 100 rows | 0.019ms | 0.001ms 🥇 | 17.24ms | 1.24ms | 0.12ms | 17.5x slower |
| 1K rows | 0.036ms | 0.002ms 🥇 | 17.23ms | 1.37ms | 0.16ms | 20.2x slower |
| 10K rows | 0.264ms | 0.008ms 🥇 | 17.34ms | 1.48ms | 0.63ms | 33.8x slower |

Batch Mode Performance (1M rows × 10 columns)

100 consecutive modify operations:

| Mode | Time | Speedup vs Normal |
|---|---|---|
| Normal Mode | 418ms | - |
| Batch Mode | 20.5ms | 20.4x faster 🔥 |
| Writable Batch Mode | 4.4ms | 94.8x faster 🔥 |

💡 Note: All modes benefit from I/O optimizations. Speedup ratios are calculated against the Normal Mode baseline.
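
A minimal sketch, assuming the batch_mode API shown above, of how the 100-modification comparison could be reproduced (time.perf_counter is used here for brevity; the published numbers come from the timeit-based unified_benchmark.py):

import time
import numpy as np
from numpack import NumPack

def timed_modifications(npk, n=100):
    """Run n load-modify-save cycles and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        arr = npk.load('data')
        arr[:10] *= 2.0
        npk.save({'data': arr})
    return time.perf_counter() - start

with NumPack("bench_dir") as npk:
    npk.save({'data': np.random.rand(1_000_000, 10).astype(np.float32)})

    normal = timed_modifications(npk)  # Normal Mode: every save hits disk

    with npk.batch_mode():             # Batch Mode: changes cached, flushed on exit
        batch = timed_modifications(npk)

    print(f"normal: {normal:.3f}s  batch: {batch:.3f}s")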

Key Performance Highlights

  1. Data Modification - Exceptional Performance 🏆

    • Replace operations: 321x faster than NPY 🔥
    • Append operations: 224x faster than NPY (large dataset)
    • Supports efficient in-place modification without full file rewrite
    • NumPack's core advantage for write-heavy workloads
  2. Data Loading - Outstanding Performance ⭐ Enhanced

    • Full load: 1.58x faster than NPY (4.11ms vs 6.48ms) ⬆️
    • Lazy load: 45x faster than NPY mmap (0.002ms vs 0.091ms)
    • Optimized with adaptive buffering and SIMD acceleration
  3. Batch Processing - Excellent Performance ⭐ Strong

    • Batch Mode: 20.4x speedup (20.5ms vs 418ms normal mode)
    • Writable Batch Mode: 94.8x speedup (4.4ms) ⬆️
    • System-wide I/O optimizations benefit all modes
  4. Sequential Access 📊

    • Small batch (100 rows): 17.5x slower than NPY (0.019ms vs 0.001ms)
    • Medium batch (1K rows): 20.2x slower (0.036ms vs 0.002ms)
    • Large batch (10K rows): 33.8x slower (0.264ms vs 0.008ms)
    • Still significantly faster than all other formats (Zarr: 3.05ms, HDF5: 0.78ms, NPZ: 169ms)
    • Note: Tests use real data reads; NPY mmap view-only is faster but not practical
  5. Random Access - Significantly Improved 🔥 Major Enhancement

    • Small batch (100 indices): 15.4x slower (0.038ms vs 0.002ms)
    • Medium batch (1K indices): 2.4x slower (0.060ms vs 0.025ms) ✅ Improved from 397x!
    • Large batch (10K indices): 16.4x slower (1.53ms vs 0.093ms) - affected by page faults ⚠️
    • However: NumPack still 334x faster than HDF5 for 10K random access (1.53ms vs 511ms)
    • Key trade-off: NPY excels at random read BUT 321x slower on writes
    • For mixed read-write workloads, NumPack offers better overall balance
  6. Storage Efficiency

    • File size identical to NPY (38.15MB)
    • ~10% smaller than Zarr/NPZ (compressed formats)

When to Use NumPack

✅ Strongly Recommended (85% of use cases):

  • Machine learning and deep learning pipelines
  • Real-time data stream processing
  • Data annotation and correction workflows
  • Feature stores with dynamic updates
  • Any scenario requiring frequent data modifications (321x faster writes!)
  • Fast data loading requirements (1.58x faster than NPY)
  • Balanced read-write workloads
  • Sequential data processing workflows

⚠️ Consider Alternatives (15% of use cases):

  • Write-once, never modify → Use NPY (1.95x faster write, but 321x slower for updates)
  • Frequent random access → Use NPY (2.4x-16x faster for random reads)
  • Pure read-only with heavy sequential access → Use NPY mmap (20-41x faster)
  • Extreme compression requirements → Use NPZ (10% smaller, but 1000x slower)

💡 Performance Trade-offs & Insights:

  • Write operations: NumPack dominant (321x faster replacements, 224x faster appends)
  • Read operations: NPY faster for random/sequential access (2.4x-41x), especially for small batches
  • Major improvement: 1K random access improved from 397x to 2.4x slower ⬆️
  • Overall balance: NumPack excels in mixed read-write workloads
  • For pure read-heavy (>95% reads), NPY may be better
  • For write-intensive or balanced workloads (>5% writes), NumPack is superior
  • Key insight: Tests use real data reads; NPY mmap view-only is faster but not practical

Best Practices

1. Use Writable Batch Mode for Frequent Modifications

# 94.8x speedup for frequent modifications
with NumPack("data.npk") as npk:
    with npk.writable_batch_mode() as wb:
        for i in range(1000):
            arr = wb.load('data')
            arr[:10] *= 2.0
# Automatic persistence on exit

2. Use Batch Mode for Batch Operations

# 20.4x speedup for batch processing
with NumPack("data.npk") as npk:
    with npk.batch_mode():
        for i in range(1000):
            arr = npk.load('data')
            arr[:10] *= 2.0
            npk.save({'data': arr})
# Single write on exit with smart dirty tracking

3. Use Lazy Loading for Large Datasets

with NumPack("large_data.npk") as npk:
    # Only 0.002ms to initialize
    lazy_array = npk.load("array", lazy=True)
    # Data loaded on demand
    subset = lazy_array[1000:2000]

4. Reuse NumPack Instances

# βœ… Efficient: Reuse instance
with NumPack("data.npk") as npk:
    for i in range(100):
        data = npk.load('array')

# ❌ Inefficient: Create new instance each time
for i in range(100):
    with NumPack("data.npk") as npk:
        data = npk.load('array')

Benchmark Methodology

All benchmarks use:

  • timeit for precise timing
  • Multiple repeats, best time selected
  • Pure operation time (excluding file open/close overhead)
  • Float32 arrays
  • macOS Apple Silicon (results may vary by platform)
  • Comprehensive testing across multiple formats (NPY, NPZ, Zarr, HDF5, Parquet, Arrow/Feather)

New in this version:

  • Added random access and sequential access benchmarks across different batch sizes (100, 1K, 10K)
  • Important: NPY mmap tests force actual data reads using np.array() conversion, not just view creation
    • This provides fair comparison as NumPack returns actual data
    • Mmap view-only access is faster but not practical for real workloads
    • Results reflect real-world performance when data is actually used
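
For reference, a minimal illustration of the forced-read convention described above (the file name is arbitrary; np.load with mmap_mode returns a view, and np.array() is what forces the actual read):

import numpy as np

mm = np.load("data.npy", mmap_mode="r")  # memory-mapped view; no data read yet
view = mm[1000:2000]                     # basic slicing still returns a lazy view
rows = np.array(mm[1000:2000])           # np.array() forces the bytes to be read from disk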

For complete benchmark code, see unified_benchmark.py.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the Apache License, Version 2.0 - see the LICENSE file for details.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
