
ML-based row-level anomaly detection #990

Open
vb-dbrks wants to merge 234 commits into main from 957-ml-has_no_anomaly



vb-dbrks commented Jan 8, 2026

Changes

This PR adds ML-based row anomaly detection that automatically finds unusual rows in data (per-record anomalies with explanations) without manually specified thresholds, so you can catch issues that rule-based checks miss. You provide recent good data; DQX trains a model on it and flags rows that don't fit the typical patterns.
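The train-on-good-data, flag-what-doesn't-fit workflow relies on Isolation Forest, whose core idea is that unusual rows are isolated by fewer random splits than typical ones. As an illustration only, here is a minimal single-machine, pure-Python sketch of the standard path-length formulation. It is not DQX's implementation (which, per this PR, trains with Isolation Forest and scores with Spark); `TinyIsolationForest` and its parameters are hypothetical names, and the defaults of 100 trees and 256-row subsamples are common textbook choices, not DQX settings.

```python
import math
import random
from typing import Dict, List, Sequence

Row = Sequence[float]
EULER_GAMMA = 0.5772156649015329


def _c(n: int) -> float:
    """Expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def _build(rows: List[Row], depth: int, max_depth: int, rng: random.Random) -> Dict:
    """Recursively split on a random feature at a random value until rows are isolated."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # a constant feature cannot be split; stop here
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": _build([r for r in rows if r[dim] < split], depth + 1, max_depth, rng),
        "right": _build([r for r in rows if r[dim] >= split], depth + 1, max_depth, rng),
    }


def _path_length(x: Row, node: Dict, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus a correction for the leaf's unsplit rows."""
    if "dim" not in node:
        return depth + _c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return _path_length(x, child, depth + 1)


class TinyIsolationForest:
    """Hypothetical minimal Isolation Forest; scores closer to 1 mean 'easy to isolate'."""

    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed
        self.trees: List[Dict] = []
        self.psi = 0

    def fit(self, rows: List[Row]) -> "TinyIsolationForest":
        if len(rows) < 2:
            raise ValueError("need at least two training rows")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(rows))
        max_depth = math.ceil(math.log2(self.psi))  # height limit from the subsample size
        self.trees = [
            _build(rng.sample(rows, self.psi), 0, max_depth, rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, x: Row) -> float:
        mean_path = sum(_path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / _c(self.psi))


# Train on "recent good data", then score a typical row and an unusual one.
data_rng = random.Random(42)
good_rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
model = TinyIsolationForest(seed=7).fit(good_rows)
typical, unusual = model.score((0.0, 0.0)), model.score((8.0, 8.0))
```

The unusual row should receive the higher score because random splits separate it from the dense cluster after only a few levels.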

Key features:

  • Auto‑discovery of columns and segmentation where appropriate
  • Isolation Forest training with Spark scoring
  • Explainability via SHAP-based feature contributions
  • Unity Catalog / MLflow integration for model storage and versioning
  • New check function: has_no_anomalies() (percentile‑based severity)
  • Production defaults: a severity threshold of 95, feature contributions enabled by default, and ensemble support
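`has_no_anomalies()` reports a percentile-based severity with a default threshold of 95. One plausible reading, sketched here purely under that assumption, is that a row's severity is the percentile rank of its anomaly score within a reference score distribution (for example, scores on the training data), and that a row fails the check once its severity reaches the threshold. `make_severity_fn` and `flag_anomalies` are hypothetical helpers, not part of the DQX API, and the `>=` comparison is likewise an assumption.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def make_severity_fn(reference_scores: Sequence[float]) -> Callable[[float], float]:
    """Return a function mapping a raw anomaly score to a 0-100 percentile severity."""
    ordered = sorted(reference_scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one reference score")

    def severity(score: float) -> float:
        # Share of reference scores at or below this score, as a percentage.
        return 100.0 * bisect_right(ordered, score) / n

    return severity


def flag_anomalies(
    scores: Sequence[float], severity: Callable[[float], float], threshold: float = 95.0
) -> List[bool]:
    """Flag rows whose percentile severity reaches the threshold (assumed >=)."""
    return [severity(s) >= threshold for s in scores]


# Stand-in reference scores 0..99, so severity equals (number of scores <= s).
severity = make_severity_fn(list(range(100)))
flags = flag_anomalies([10, 94, 99, 200], severity)  # [False, True, True, True]
```

A percentile scale makes the threshold independent of the model's raw score range, which is one reason a fixed default such as 95 can be meaningful across datasets.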

What’s included:

  • New AnomalyEngine for training models
  • Feature engineering for numeric, categorical, datetime, and boolean columns
  • Model registry + drift detection metadata
  • Demo notebook updates
  • Documentation updates (guide + reference + install)
  • Test improvements (unit + integration)
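Isolation Forest needs purely numeric inputs, so mixed column types must first be turned into numbers. The sketch below shows one possible encoding for the four column kinds listed above; the specific choices (frequency encoding for categoricals, hour/weekday/month for datetimes, 0/1 for booleans) and the helper names `fit_category_frequencies` and `encode_row` are illustrative assumptions, not DQX's actual feature engineering. It also assumes non-null values.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Dict, List, Mapping


def fit_category_frequencies(
    rows: List[Mapping[str, Any]], schema: Mapping[str, str]
) -> Dict[str, Dict[Any, float]]:
    """Learn, per categorical column, how often each value occurs in the training rows."""
    freqs: Dict[str, Dict[Any, float]] = {}
    for col, kind in schema.items():
        if kind == "categorical":
            counts = Counter(row[col] for row in rows)
            total = sum(counts.values())
            freqs[col] = {value: n / total for value, n in counts.items()}
    return freqs


def encode_row(
    row: Mapping[str, Any], schema: Mapping[str, str], freqs: Dict[str, Dict[Any, float]]
) -> List[float]:
    """Turn one record into a flat numeric feature vector, in schema order."""
    features: List[float] = []
    for col, kind in schema.items():
        value = row[col]
        if kind == "numeric":
            features.append(float(value))
        elif kind == "boolean":
            features.append(1.0 if value else 0.0)
        elif kind == "categorical":
            # Frequency encoding: unseen categories get 0.0, rare ones get small values.
            features.append(freqs[col].get(value, 0.0))
        elif kind == "datetime":
            # Calendar parts let the model learn "usual" times, e.g. business hours.
            features.extend([float(value.hour), float(value.weekday()), float(value.month)])
        else:
            raise ValueError(f"unsupported column kind: {kind}")
    return features


schema = {"amount": "numeric", "country": "categorical", "ts": "datetime", "is_refund": "boolean"}
train = [
    {"amount": 10.0, "country": "US", "ts": datetime(2026, 1, 5, 9, 0), "is_refund": False},
    {"amount": 12.0, "country": "US", "ts": datetime(2026, 1, 6, 10, 0), "is_refund": False},
    {"amount": 11.0, "country": "DE", "ts": datetime(2026, 1, 6, 14, 0), "is_refund": True},
    {"amount": 9.5, "country": "FR", "ts": datetime(2026, 1, 7, 11, 0), "is_refund": False},
]
freqs = fit_category_frequencies(train, schema)
new_row = {"amount": 12.5, "country": "DE", "ts": datetime(2026, 1, 8, 17, 26), "is_refund": False}
vector = encode_row(new_row, schema, freqs)  # [12.5, 0.25, 17.0, 3.0, 1.0, 0.0]
```

Frequency encoding keeps the vector width fixed regardless of cardinality and makes rare categories naturally stand out, at the cost of treating equally frequent categories as identical.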

This feature complements Data Quality Monitoring, which focuses on completeness and freshness.

Resolves #957

Tests

  • manually tested (ran demos on Databricks)
  • added unit tests (new tests for exact distinct + expected_anomaly_rate behavior)
  • added integration tests (training, scoring, segmentation, explainability, drift)
  • added end-to-end tests
  • added performance tests

vb-dbrks linked an issue on Jan 8, 2026 that may be closed by this pull request
vb-dbrks added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on Jan 8, 2026
vb-dbrks marked this pull request as ready for review on January 8, 2026 17:26
vb-dbrks requested a review from a team as a code owner on January 8, 2026 17:26
vb-dbrks requested review from pratikk-databricks and removed the request for a team on January 8, 2026 17:26
github-actions bot commented Jan 8, 2026

✅ 593/593 passed, 36 skipped, 6h27m19s total

Running from acceptance #3996


Development

Successfully merging this pull request may close these issues.

[FEATURE]: ML-based Anomaly Detection for row-level (has_no_anomalies)
