Add GPU acceleration for PQ transform and cluster assignment #4

Merged: afloresep merged 1 commit into master from gpu on Mar 18, 2026

@afloresep (Owner)

Adds optional CUDA support via Triton and PyTorch for the two
compute-heavy pipeline steps: PQ encoding (encoder.transform) and
cluster label assignment (clusterer.predict).

Both methods accept a `device` parameter ('auto'|'gpu'|'cpu').
The default 'auto' uses the GPU when available and falls back to
CPU transparently. All existing tests pass (102/102).
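
The `device` fallback described above could be resolved along these lines. This is a minimal sketch, not the code from this PR: `resolve_device`, `_cuda_available`, and the `cuda_available` override are hypothetical names, and the choice to raise when `'gpu'` is requested without CUDA is an assumption. Only `torch.cuda.is_available()` is a real PyTorch call.

```python
from typing import Optional

VALID_DEVICES = ("auto", "gpu", "cpu")


def _cuda_available() -> bool:
    """Return True if PyTorch is importable and reports a usable CUDA device."""
    try:
        import torch  # optional dependency; absence means CPU only
    except ImportError:
        return False
    return torch.cuda.is_available()


def resolve_device(device: str = "auto", cuda_available: Optional[bool] = None) -> str:
    """Map 'auto' | 'gpu' | 'cpu' to the backend that will actually run.

    `cuda_available` is a hypothetical override used for testing; when it is
    None, availability is detected through PyTorch.
    """
    if device not in VALID_DEVICES:
        raise ValueError(f"device must be one of {VALID_DEVICES}, got {device!r}")
    if device == "cpu":
        return "cpu"
    if cuda_available is None:
        cuda_available = _cuda_available()
    if device == "gpu":
        # Assumption: an explicit 'gpu' request fails loudly instead of falling back.
        if not cuda_available:
            raise RuntimeError("device='gpu' requested but CUDA is not available")
        return "gpu"
    # 'auto': prefer the GPU, otherwise fall back to the CPU path.
    return "gpu" if cuda_available else "cpu"
```

With a helper like this, `encoder.transform` and `clusterer.predict` could each call it once and branch on the result, so the CPU path stays untouched on machines without PyTorch or Triton installed.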

Benchmarked on 20M real molecules (K=100K, RTX 4070 Ti 16 GB):
  - PQ Transform:        7.3 s GPU vs 45.3 s CPU  (6.2x)
  - Cluster Assignment: 29.9 s GPU vs ~879 s CPU (29.4x)
  - Combined, extrapolated to 9.6B molecules: ~5 h GPU vs ~123 h CPU (~25x)
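
The 9.6B figures are consistent with scaling the 20M timings linearly. The sketch below reproduces that arithmetic, assuming runtime grows linearly with the number of molecules and ignoring I/O; the variable and function names are illustrative and not part of the repository.

```python
SAMPLE_SIZE = 20_000_000       # molecules in the benchmark
TARGET_SIZE = 9_600_000_000    # molecules in the full run

# Measured wall-clock seconds on the 20M benchmark, taken from the list above.
timings_s = {
    "gpu": {"pq_transform": 7.3, "cluster_assignment": 29.9},
    "cpu": {"pq_transform": 45.3, "cluster_assignment": 879.0},
}


def extrapolate_hours(per_step_seconds: dict, sample_size: int, target_size: int) -> float:
    """Scale the summed step times linearly from sample_size to target_size."""
    total_seconds = sum(per_step_seconds.values())
    return total_seconds * (target_size / sample_size) / 3600.0


gpu_hours = extrapolate_hours(timings_s["gpu"], SAMPLE_SIZE, TARGET_SIZE)
cpu_hours = extrapolate_hours(timings_s["cpu"], SAMPLE_SIZE, TARGET_SIZE)
speedup = cpu_hours / gpu_hours

print(f"GPU: {gpu_hours:.1f} h")   # GPU: 5.0 h
print(f"CPU: {cpu_hours:.1f} h")   # CPU: 123.2 h
print(f"Speedup: {speedup:.1f}x")  # Speedup: 24.8x
```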

New files:
  chelombus/clustering/_gpu_predict.py  — Triton kernel
  scripts/benchmark_gpu_predict.py      — GPU vs CPU benchmark
  scripts/cluster_smiles.py             — end-to-end pipeline script
  data/10M_smiles.txt.gz                — test SMILES (85 MB gzip)

@afloresep merged commit 0cc33e5 into master on Mar 18, 2026
4 checks passed