Thank you for contributing to Apache DataSketches!
The goal of this document is to provide everything you need to start contributing to this core Rust library.
- Fork the DataSketches repository in your own GitHub account.
- Create a new Git branch.
- Make your changes.
- Submit the branch as a pull request to the upstream repo. A DataSketches team member should comment on and/or review your pull request within a few days, although it may take longer depending on the circumstances.
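The steps above can be sketched as a small shell helper. This is only a sketch under stated assumptions: `start_contribution` is a hypothetical function name, and the fork URL, upstream URL, branch name, and target directory are placeholders you supply yourself. None of them are prescribed by the project.

```shell
# Hypothetical helper: clone your fork, register the upstream repo, and start a branch.
# Usage: start_contribution <fork-url> <upstream-url> <branch-name> [<directory>]
start_contribution() {
  fork_url="$1"
  upstream_url="$2"
  branch="$3"
  dir="${4:-datasketches-contrib}"   # assumed default directory name

  # Clone your own fork (not the upstream repo).
  git clone "$fork_url" "$dir" || return 1
  # Register the upstream repo so the branch can be synced with it later.
  git -C "$dir" remote add upstream "$upstream_url" || return 1
  # Create the working branch and switch to it.
  git -C "$dir" checkout -b "$branch" || return 1
}
```

After committing your changes, something like `git -C <directory> push -u origin <branch-name>` publishes the branch to your fork, and the pull request can then be opened from there.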
This repo develops the Apache® DataSketches™ Core Rust Library Component. To build this project, you first need to set up a Rust development environment. We highly recommend using rustup for the setup process.
For Linux or macOS users, use the following command:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
For Windows users, download rustup-init.exe from https://rustup.rs instead.
Rustup will read the rust-toolchain.toml file and set up everything else automatically. To ensure that everything works correctly, run cargo version under the root directory:
cargo version
# cargo 1.85.0 (<hash> 2024-12-31)
To keep code style consistent, run cargo x lint --fix to automatically fix any style issues before committing your changes.
We recommend using cargo x as a single entrypoint (provided by the workspace xtask crate). This repo defines the cargo x alias in .cargo/config.toml, which maps to cargo run --package x -- ....
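To make the alias mapping concrete, here is a minimal sketch of a shell function that expands to the same command line the alias is described as producing. `cargo_x` is a hypothetical name used only for illustration. Inside the repo you would simply type `cargo x ...`.

```shell
# Hypothetical stand-in for the `cargo x` alias described above.
# It forwards all of its arguments to the workspace package named `x`.
cargo_x() {
  cargo run --package x -- "$@"
}

# Example (not executed here): `cargo_x lint --fix` runs
#   cargo run --package x -- lint --fix
```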
Build:
cargo build --workspace
Test:
cargo x test
# or
cargo test --workspace --no-default-features
Lint:
cargo x lint
cargo x lint runs the following steps. Use these directly when you need more control or want to isolate failures:
cargo +nightly clippy --tests --all-features --all-targets --workspace -- -D warnings
cargo +nightly fmt --all --check
taplo format --check
typos
hawkeye check
Automatic fix commands:
cargo +nightly clippy --tests --all-features --all-targets --workspace --allow-staged --allow-dirty --fix
cargo +nightly fmt --all
taplo format
hawkeye format --fail-if-updated=false
Install the extra tools with:
cargo install taplo-cli typos-cli hawkeye
Some tests depend on snapshot files under datasketches/tests/serialization_test_data. If they are missing, tests will fail. Regenerate them with:
python3 ./tools/generate_serialization_test_data.py --all
The script pulls datasketches-java and datasketches-cpp and writes files to:
- datasketches/tests/serialization_test_data/java_generated_files
- datasketches/tests/serialization_test_data/cpp_generated_files
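Before running the tests, it can help to check whether those two directories are populated. The following is a minimal sketch, not part of the repo: `missing_snapshot_flags` is a hypothetical helper, and it assumes that a missing or empty directory means the corresponding data needs regenerating. It prints the script's `--java` or `--cpp` flag for each directory that looks unpopulated.

```shell
# Print the generator flag for each snapshot directory that is missing or empty.
# Usage: missing_snapshot_flags [<base-dir>]
missing_snapshot_flags() {
  base="${1:-datasketches/tests/serialization_test_data}"
  for entry in java:java_generated_files cpp:cpp_generated_files; do
    flag="--${entry%%:*}"      # "--java" or "--cpp"
    dir="$base/${entry#*:}"    # matching output directory
    # Assumption: a missing or empty directory means the data must be regenerated.
    if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
      printf '%s\n' "$flag"
    fi
  done
}
```

Run it from the repository root. If it prints one flag, rerun the generation script with that flag. If it prints both, `--all` covers them.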
You can generate them separately:
python3 ./tools/generate_serialization_test_data.py --java
python3 ./tools/generate_serialization_test_data.py --cpp
The script requires these commands on PATH (and network access):
- Java data: git, java, mvn
- C++ data: git, cmake, ctest
The current datasketches-java generation flow requires JDK >= 25 and Maven >= 3.9.11; otherwise, Maven Enforcer will fail.
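A quick way to confirm the required commands are on PATH before running the generation script is sketched below. `missing_tools` is a hypothetical helper, not something the repo provides. It only checks that each command can be found and does not verify the JDK or Maven versions mentioned above.

```shell
# Print each named command that cannot be found on PATH.
# Usage: missing_tools <command>...
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example (not executed here):
#   missing_tools git java mvn      # commands needed for Java data
#   missing_tools git cmake ctest   # commands needed for C++ data
```

Empty output means every listed command was found.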
We expect all community members to follow our Code of Conduct.