High-performance parallel and GPU-accelerated LOWESS (Locally Weighted Scatterplot Smoothing) for Rust — A high-level wrapper around the lowess crate that adds rayon-based parallelism, GPU acceleration, and seamless ndarray integration.
Important
For a minimal, single-threaded, no_std version, use the base lowess crate.
- Parallel by Default: Multi-core regression fits via rayon, achieving multiple orders of magnitude speedups on large datasets.
- ndarray Integration: Native support for `Array1<T>` and `ArrayView1<T>`.
- Robust Statistics: MAD-based scale estimation and IRLS with Bisquare, Huber, or Talwar weighting.
- Uncertainty Quantification: Point-wise standard errors, confidence intervals, and prediction intervals.
- Optimized Performance: Delta optimization for skipping dense regions and streaming/online modes.
- Parameter Selection: Built-in cross-validation for automatic smoothing fraction selection.
Built on the same core as lowess, this implementation is more robust than statsmodels due to two key design choices:
For robustness weight calculations, this crate uses Median Absolute Deviation (MAD) for scale estimation:
s = median(|r_i - median(r)|)
In contrast, statsmodels uses median of absolute residuals:
s = median(|r_i|)
Why MAD is more robust:
- MAD attains the optimal 50% breakdown point: it remains valid even when up to half of the data are outliers.
- The median-centering step removes asymmetric bias from residual distributions.
- MAD provides consistent outlier detection regardless of whether residuals are centered around zero.
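To make the difference concrete, here is a minimal standalone sketch of both estimators (illustrative code, not the crate's internals):

```rust
/// Median of a slice (sketch; assumes non-empty input, averages the two
/// middle elements for even lengths).
fn median(v: &[f64]) -> f64 {
    let mut s = v.to_vec();
    s.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = s.len();
    if n % 2 == 1 { s[n / 2] } else { 0.5 * (s[n / 2 - 1] + s[n / 2]) }
}

/// MAD scale used by this crate: s = median(|r_i - median(r)|).
fn mad_scale(residuals: &[f64]) -> f64 {
    let m = median(residuals);
    let abs_dev: Vec<f64> = residuals.iter().map(|r| (r - m).abs()).collect();
    median(&abs_dev)
}

/// statsmodels-style scale: s = median(|r_i|), with no centering step.
fn mar_scale(residuals: &[f64]) -> f64 {
    let abs_res: Vec<f64> = residuals.iter().map(|r| r.abs()).collect();
    median(&abs_res)
}
```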
This crate applies boundary policies (Extend, Reflect, Zero) at dataset edges:
- Extend: Repeats edge values to maintain local neighborhood size.
- Reflect: Mirrors data symmetrically around boundaries.
- Zero: Pads with zeros (useful for signal processing).
statsmodels does not apply boundary padding, which can lead to:
- Biased estimates near boundaries due to asymmetric local neighborhoods.
- Increased variance at the edges of the smoothed curve.
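For intuition, a minimal sketch of left-edge padding under each policy (illustrative only, not the crate's boundary-policy API; assumes `k < data.len()`):

```rust
/// Illustrative left-edge padding for k extra points.
fn pad_left(data: &[f64], k: usize, policy: &str) -> Vec<f64> {
    let pad: Vec<f64> = match policy {
        // Extend: repeat the first value k times
        "extend" => vec![data[0]; k],
        // Reflect: mirror data[1..=k] around the boundary point data[0]
        "reflect" => (1..=k).rev().map(|i| data[i]).collect(),
        // Zero: pad with zeros
        _ => vec![0.0; k],
    };
    pad.into_iter().chain(data.iter().copied()).collect()
}
```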
For interval estimation (confidence/prediction), residual scale is computed using:
sigma = 1.4826 * MAD
The factor 1.4826 ≈ 1/Φ⁻¹(3/4), where Φ⁻¹ is the standard normal quantile function, makes the estimate consistent with the standard deviation under Gaussian assumptions.
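As a quick numerical check of that factor (a sketch; the MAD of a standard normal distribution is Φ⁻¹(3/4) ≈ 0.67449):

```rust
// For N(0, 1) residuals, the population MAD is Phi^{-1}(3/4) ~= 0.67449,
// so 1.4826 * MAD recovers the true standard deviation of 1.
let mad = 0.67449_f64;
let sigma_hat = 1.4826 * mad;
assert!((sigma_hat - 1.0).abs() < 1e-3);
```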
The fastLowess crate demonstrates massive performance gains over Python's statsmodels. The Rust CPU backend is the decisive winner across almost all standard benchmarks, often achieving multi-hundred-fold speedups.
The table below shows speedups relative to the baseline.
- Standard Benchmarks: Baseline is `statsmodels` (Python).
- Large Scale Benchmarks: Baseline is Rust (Serial) (1x), as `statsmodels` times out.
| Name | statsmodels | R | Rust (CPU)* | Rust (GPU) |
|---|---|---|---|---|
| clustered | 162.77ms | [82.8x]² | [203-433x]¹ | 32.4x |
| constant_y | 133.63ms | [92.3x]² | [212-410x]¹ | 17.5x |
| delta_large | 0.51ms | [0.8x]² | [3.8-2.2x]¹ | 0.1x |
| delta_medium | 0.79ms | [1.3x]² | [4.4-3.4x]¹ | 0.1x |
| delta_none | 414.86ms | 2.5x | [3.8-13x]² | [63.5x]¹ |
| delta_small | 1.45ms | [1.7x]² | [4.3-4.5x]¹ | 0.2x |
| extreme_outliers | 488.96ms | [106.4x]² | [201-388x]¹ | 28.9x |
| financial_1000 | 13.55ms | [76.6x]² | [145-108x]¹ | 4.7x |
| financial_10000 | 302.20ms | [168.3x]² | [453-611x]¹ | 26.3x |
| financial_500 | 6.49ms | [58.0x]¹ | [113-58x]² | 2.7x |
| financial_5000 | 103.94ms | [117.3x]² | [296-395x]¹ | 14.1x |
| fraction_0.05 | 122.00ms | [177.6x]² | [421-350x]¹ | 14.5x |
| fraction_0.1 | 140.59ms | [112.8x]² | [291-366x]¹ | 15.9x |
| fraction_0.2 | 181.57ms | [85.3x]² | [210-419x]¹ | 19.3x |
| fraction_0.3 | 220.98ms | [84.8x]² | [168-380x]¹ | 22.4x |
| fraction_0.5 | 296.47ms | [80.9x]² | [146-415x]¹ | 27.3x |
| fraction_0.67 | 362.59ms | [83.1x]² | [129-413x]¹ | 32.0x |
| genomic_1000 | 17.82ms | [15.9x]² | [19-33x]¹ | 6.5x |
| genomic_10000 | 399.90ms | 3.6x | [5.3-16x]² | [70.3x]¹ |
| genomic_5000 | 138.49ms | 5.0x | [7.0-19x]² | [34.8x]¹ |
| genomic_50000 | 6776.57ms | 2.4x | [3.5-11x]² | [269.2x]¹ |
| high_noise | 435.85ms | [132.6x]² | [134-375x]¹ | 32.3x |
| iterations_0 | 45.18ms | [128.4x]² | [266-405x]¹ | 10.6x |
| iterations_1 | 94.10ms | [114.3x]² | [236-384x]¹ | 14.4x |
| iterations_10 | 495.65ms | [116.0x]² | [204-369x]¹ | 27.0x |
| iterations_2 | 135.48ms | [109.0x]² | [219-432x]¹ | 16.6x |
| iterations_3 | 181.56ms | [108.8x]² | [213-382x]¹ | 18.7x |
| iterations_5 | 270.58ms | [110.4x]² | [208-345x]¹ | 22.7x |
| scale_1000 | 17.95ms | [82.6x]² | [150-107x]¹ | 8.1x |
| scale_10000 | 408.13ms | [178.1x]² | [433-552x]¹ | 76.3x |
| scale_5000 | 139.81ms | [133.6x]² | [289-401x]¹ | 28.8x |
| scale_50000 | 6798.58ms | [661.0x]² | [1077-1264x]¹ | 277.2x |
| scientific_1000 | 19.04ms | [70.1x]² | [113-115x]¹ | 5.4x |
| scientific_10000 | 479.57ms | [190.7x]² | [370-663x]¹ | 35.2x |
| scientific_500 | 8.59ms | [49.6x]² | [91-52x]¹ | 3.2x |
| scientific_5000 | 161.42ms | [124.9x]² | [244-427x]¹ | 17.9x |
| scale_100000** | - | - | 1-1.3x | 0.3x |
| scale_1000000** | - | - | 1-1.3x | 0.3x |
| scale_2000000** | - | - | 1-1.5x | 0.3x |
| scale_250000** | - | - | 1-1.4x | 0.3x |
| scale_500000** | - | - | 1-1.3x | 0.3x |
* Rust (CPU): Shows range Seq - Par. E.g., 12-48x means 12x speedup (Sequential) and 48x speedup (Parallel). Rank determined by Parallel speedup.
** Large Scale: Rust (Serial) is the baseline (1x).
¹ Winner (Fastest implementation) ² Runner-up (Second fastest implementation)
Key Takeaways:
- Rust (Parallel CPU) is the dominant performer for general-purpose workloads, consistently achieving the highest speedups (often 300x-500x over statsmodels).
- R (stats::lowess) is a very strong runner-up, frequently outperforming statsmodels by ~80-150x, but generally trailing Rust Parallel.
- Rust (GPU) excels in specific high-compute scenarios (e.g., `genomic` with large datasets, or `delta_none`, where interpolation is skipped), but carries overhead that makes it slower than the highly optimized CPU backend for smaller datasets.
- Large Scale Scaling: At very large scales (100k - 2M points), the parallel CPU backend maintains a modest lead (1.3x - 1.5x) over the sequential CPU backend, likely bottlenecked by memory bandwidth rather than compute.
- Small vs Large Delta: Setting `delta = 0` (no interpolation, the `delta_none` case) allows the GPU to shine (63.5x speedup), outperforming both CPU variants because the massive O(N²) workload is parallelized across thousands of GPU cores.
Check Benchmarks for detailed results and reproducible benchmarking code.
The default installation includes rayon-based parallelism and ndarray support:
```toml
[dependencies]
fastLowess = "0.4"
```

Or explicitly enable the `cpu` feature:

```toml
[dependencies]
fastLowess = { version = "0.4", features = ["cpu"] }
```

For GPU acceleration using wgpu, enable the `gpu` feature:

```toml
[dependencies]
fastLowess = { version = "0.4", features = ["gpu"] }
```

Note
The GPU backend requires compatible GPU hardware and drivers. See the Backend Comparison section below for feature limitations.
```rust
use fastLowess::prelude::*;
use ndarray::Array1;

fn main() -> Result<(), LowessError> {
    // Data as ndarray Array1
    let x = Array1::linspace(0.0, 10.0, 100);
    let y = x.mapv(|v| v.sin() + 0.1 * v);

    // Build the model (parallel by default)
    let result = Lowess::new()
        .fraction(0.5)
        .adapter(Batch)
        .parallel(true)
        .build()?
        .fit(&x, &y)?;

    println!("Smoothed values: {:?}", result.y);
    Ok(())
}
```

The full set of builder options:

```rust
use fastLowess::prelude::*;

Lowess::new()
    // Smoothing span (0, 1]
    .fraction(0.5)
    // Robustness iterations
    .iterations(3)
    // Interpolation threshold
    .delta(0.01)
    // Kernel selection
    .weight_function(Tricube)
    // Robustness method
    .robustness_method(Bisquare)
    // Zero-weight fallback behavior
    .zero_weight_fallback(UseLocalMean)
    // Boundary handling (for edge effects)
    .boundary_policy(Extend)
    // Confidence intervals
    .confidence_intervals(0.95)
    // Prediction intervals
    .prediction_intervals(0.95)
    // Diagnostics
    .return_diagnostics()
    .return_residuals()
    .return_robustness_weights()
    // Cross-validation (for parameter selection)
    .cross_validate(KFold(5).with_fractions(&[0.3, 0.5, 0.7]).seed(123))
    // Convergence
    .auto_converge(1e-4)
    // Execution mode
    .adapter(Batch)
    // Backend (CPU or GPU)
    .backend(CPU)
    // Parallelism
    .parallel(true)
    // Build the model
    .build()?;
```

| Backend | Use Case | Features | Limitations |
|---|---|---|---|
| CPU | General | All features | None |
| GPU (beta) | High-performance | Special circumstances | Only vanilla LOWESS |
Warning
GPU Backend Limitations: The GPU backend is currently in beta, supports only vanilla LOWESS, and lacks several features of the CPU backend:
- Only Tricube kernel function
- Only Bisquare robustness method
- Only Batch adapter
- No cross-validation
- No intervals
- No edge handling (bias at edges, original LOWESS behavior)
- No zero-weight fallback
- No diagnostics
- No streaming or online mode
- CPU Backend (`Backend::CPU`): The default and recommended choice. It is faster for all standard dense computations, supports all features (cross-validation, intervals, etc.), and has zero setup overhead.
- GPU Backend (`Backend::GPU`): Use only if you have a massive dataset (> 10 million points) AND no or a very small `delta` (e.g., `delta(0.01)`). In that regime the full O(N²) workload lets the GPU scale better than the CPU; for everything else, the CPU is still faster.
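For example, a hypothetical helper encoding this rule of thumb (the threshold and `delta` check are illustrative, not part of the crate):

```rust
use fastLowess::prelude::*;

// Hypothetical helper: pick a backend per the guidance above.
// The 10M-point threshold and the delta test are illustrative; tune for your hardware.
fn pick_backend(n_points: usize, delta: f64) -> Backend {
    if n_points > 10_000_000 && delta <= 0.01 {
        Backend::GPU // massive O(N^2) workload: GPU scales better
    } else {
        Backend::CPU // everything else: faster, with the full feature set
    }
}
```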
Note
GPU vs CPU Precision: Results from the GPU backend are not guaranteed to be identical to the CPU backend due to:
- Different floating-point precision
- No padding at the edges in the GPU backend
- Different scale estimation methods (MAD on the CPU, median of absolute residuals on the GPU)
```rust
pub struct LowessResult<T> {
    /// Sorted x values (independent variable)
    pub x: Vec<T>,
    /// Smoothed y values (dependent variable)
    pub y: Vec<T>,
    /// Point-wise standard errors of the fit
    pub standard_errors: Option<Vec<T>>,
    /// Confidence interval bounds (if computed)
    pub confidence_lower: Option<Vec<T>>,
    pub confidence_upper: Option<Vec<T>>,
    /// Prediction interval bounds (if computed)
    pub prediction_lower: Option<Vec<T>>,
    pub prediction_upper: Option<Vec<T>>,
    /// Residuals (y - fit)
    pub residuals: Option<Vec<T>>,
    /// Final robustness weights from outlier downweighting
    pub robustness_weights: Option<Vec<T>>,
    /// Detailed fit diagnostics (RMSE, R^2, Effective DF, etc.)
    pub diagnostics: Option<Diagnostics<T>>,
    /// Number of robustness iterations actually performed
    pub iterations_used: Option<usize>,
    /// Smoothing fraction used (optimal if selected via CV)
    pub fraction_used: T,
    /// RMSE scores for each fraction tested during CV
    pub cv_scores: Option<Vec<T>>,
}
```

Tip
Using with ndarray: While the result struct uses `Vec<T>` for maximum compatibility, you can convert any field to an `Array1` with `Array1::from_vec(result.y)`.
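For example (a sketch assuming `result` is a `LowessResult<f64>` from an earlier fit):

```rust
use ndarray::Array1;

// Move owned result fields into ndarray containers.
let y_smooth: Array1<f64> = Array1::from_vec(result.y);
let residuals: Option<Array1<f64>> = result.residuals.map(Array1::from_vec);
```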
For datasets that don't fit in memory:
```rust
use fastLowess::prelude::*;

let mut processor = Lowess::new()
    .fraction(0.3)
    .iterations(2)
    .adapter(Streaming)
    .parallel(true) // Enable parallel chunk processing
    .chunk_size(1000)
    .overlap(100)
    .build()?;

// Process data in chunks
for chunk in data_chunks {
    let result = processor.process_chunk(&chunk.x, &chunk.y)?;
}

// Finalize processing
let final_result = processor.finalize()?;
```

For real-time data streams:
```rust
use fastLowess::prelude::*;

let mut processor = Lowess::new()
    .fraction(0.2)
    .iterations(1)
    .adapter(Online)
    .parallel(false) // Sequential for lowest per-point latency
    .window_capacity(100)
    .build()?;

// Process points as they arrive
for (x, y) in data_stream {
    if let Some(output) = processor.add_point(x, y)? {
        println!("Smoothed: {}", output.smoothed);
    }
}
```

Fraction (smoothing span):
- 0.1-0.3: Local, captures rapid changes (wiggly)
- 0.4-0.6: Balanced, general-purpose
- 0.7-1.0: Global, smooth trends only
- Default: 0.67 (2/3, Cleveland's choice)
- Use CV when uncertain (see the sketch below)
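A minimal sketch of CV-driven selection using the builder API shown earlier (`x` and `y` are assumed data arrays):

```rust
// Let k-fold cross-validation choose the fraction.
let result = Lowess::new()
    .cross_validate(KFold(5).with_fractions(&[0.3, 0.5, 0.7]).seed(123))
    .build()?
    .fit(&x, &y)?;

println!("Selected fraction: {}", result.fraction_used);
println!("CV RMSE scores: {:?}", result.cv_scores);
```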
Iterations (robustness):
- 0: Clean data, speed critical
- 1-2: Light contamination
- 3: Default, good balance (recommended)
- 4-5: Heavy outliers
- >5: Diminishing returns
Weight function (kernel):
- Tricube (default): Best all-around, smooth, efficient
- Epanechnikov: Theoretically optimal MSE
- Gaussian: Very smooth, no compact support
- Uniform: Fastest, least smooth (moving average)
Delta (interpolation threshold):
- None: Small datasets (n < 1000)
- 0.01 × range(x): Good starting point for dense data (see the sketch after this list)
- Manual tuning: Adjust based on data density
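A sketch of the `0.01 × range(x)` heuristic (assumes `x` is a non-empty `Vec<f64>` sorted ascending):

```rust
// delta = 0.01 * range(x); x is assumed non-empty and sorted ascending.
let range = x.last().unwrap() - x.first().unwrap();
let model = Lowess::new()
    .fraction(0.5)
    .delta(0.01 * range)
    .build()?;
```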
Check the examples directory for advanced usage:
```sh
cargo run --example batch_smoothing
cargo run --example online_smoothing
cargo run --example streaming_smoothing
```

Requires Rust 1.85.0 or later (2024 Edition).
Validated against:
- Python (statsmodels): Passed on 44 distinct test scenarios.
- Original Paper: Reproduces Cleveland (1979) results.
Check Validation for more information. Small variations in results are expected due to differences in scale estimation and padding.
Contributions are welcome! Please see the CONTRIBUTING.md file for more information.
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
- Cleveland, W.S. (1979). "Robust Locally Weighted Regression and Smoothing Scatterplots". JASA.
- Cleveland, W.S. (1981). "LOWESS: A Program for Smoothing Scatterplots". The American Statistician.