ashvp commented Nov 22, 2025

Metadata
Reference Issue: Fixes #875
New Tests Added: Yes
Documentation Updated: No
Change Log Entry: “Add configurable checksum (digest) checking to downloads”

Details:
This PR implements the feature requested in Issue #875, allowing users to configure whether MD5 digest validation should be performed when downloading data.

What this PR implements

  1. Adds a keyword-only parameter check_digest to _send_request, defaulting to True to maintain current behavior.
  2. Wires the flag into the existing MD5 checksum validation so that the comparison is skipped when check_digest is False.
  3. Extends the global configuration with openml.config.check_digest so users can disable digest checking for all downloads at once.
  4. Adds a dedicated test suite validating:
    • Correct checksum acceptance
    • Mismatch detection
    • Digest-skipping behavior when the flag is turned off
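The gating described in items 1 to 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: verify_md5, handle_download, and the config namespace are hypothetical stand-ins (for the validation inside _send_request and for openml.config), OpenMLHashException is redefined locally so the snippet runs without openml installed, and it assumes callers forward the global setting as the check_digest keyword. The usage lines at the end mirror the three test cases listed above.

```python
import hashlib
from types import SimpleNamespace
from typing import Optional


class OpenMLHashException(Exception):
    """Local stand-in for openml.exceptions.OpenMLHashException."""


# Hypothetical global settings object standing in for openml.config.
config = SimpleNamespace(check_digest=True)


def verify_md5(
    content: bytes,
    md5_checksum: Optional[str],
    *,
    check_digest: bool = True,  # keyword-only, defaults to the current behavior
) -> None:
    """Raise OpenMLHashException if content does not hash to md5_checksum.

    The comparison is skipped when check_digest is False or when no
    expected checksum was supplied.
    """
    if not check_digest or md5_checksum is None:
        return
    actual = hashlib.md5(content).hexdigest()
    if actual != md5_checksum:
        raise OpenMLHashException(
            f"Checksum mismatch: expected {md5_checksum}, got {actual}"
        )


def handle_download(content: bytes, md5_checksum: Optional[str]) -> bytes:
    """Hypothetical caller that forwards the global setting per request."""
    verify_md5(content, md5_checksum, check_digest=config.check_digest)
    return content


# Usage mirroring the three test cases.
payload = b"example dataset bytes"
good = hashlib.md5(payload).hexdigest()

verify_md5(payload, good)  # correct checksum: no exception
try:
    verify_md5(payload, "wrong")  # mismatch: raises
except OpenMLHashException as exc:
    print("mismatch detected:", exc)
verify_md5(payload, "wrong", check_digest=False)  # check skipped

config.check_digest = False  # disable digest checking globally
handle_download(payload, "wrong")  # also skipped, no exception
```

Keeping the per-call keyword defaulting to True means existing call sites keep validating unless the global switch or an explicit argument turns the check off.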

Why this change is needed
The client currently always validates checksums, which can cause performance overhead or failures in workflows where:

  • files are repeatedly downloaded in tight loops,
  • local caching is trusted, or
  • users intentionally skip digest validation for speed.

Making digest checking optional improves flexibility while keeping validation on by default.

How to reproduce the issue being solved
Before:
Calling _send_request(..., md5_checksum="wrong") always raises OpenMLHashException.

After:
Calling _send_request(..., md5_checksum="wrong", check_digest=False) succeeds without raising an exception.

Development

Successfully merging this pull request may close these issues.

OpenML Python API: allow configuring if digest should be checked when downloading data