What's Changed
- Generalize the CUDA-biased test cases by replacing the hardcoded "cuda" literal with the DEVICE variable by @EikanWang in #775
- Make progress bar prettier by @oulgen in #786
- Upgrade ruff==0.13.3 pyright==1.1.406 by @jansel in #790
- Add hl.split and hl.join by @jansel in #791
- Generalize test_print and test_tensor_descriptor to support different accelerators by @EikanWang in #801
- Limit rebench to 1000 iterations by @jansel in #789
- Turn down autotuner defaults by @jansel in #788
- Enable torch.xpu._XpuDeviceProperties in Helion kernels by @EikanWang in #798
- Better error message for augmented assignment (e.g. +=) on a host tensor without a subscript by @yf225 in #807
- Add Pattern Search autotuning algorithm to docs by @choijon5 in #810
- Support 0-dim tensors in output code printing by @oulgen in #806
- Set range_num_stages <= 1 if using tensor_descriptor, to avoid CUDA misaligned address error by @yf225 in #792
- Add hl.inline_triton API by @jansel in #811
- Add out_dtype arg to hl.dot by @jansel in #813 (see the hl.dot sketch at the end of these notes)
- Add autotune_config_overrides by @jansel in #814 (see the settings sketch after this list)
- Reduce initial_population to 100 by @jansel in #800
- Disable range_num_stages for kernels with aliasing by @jansel in #812
- Add a new setting, autotune_max_generations, that lets users cap the number of autotuning generations by @choijon5 in #796 (covered by the settings sketch after this list)
- Increase tolerance for test_matmul_reshape_m_2 by @jansel in #816
- Update docs by @jansel in #815
- Fix torch version check by @adam-smnk in #818
- [Benchmark] Keep going when a single benchmark fails by @oulgen in #820
- Faster Helion JSD by @PaulZhang12 in #733
- Faster KL Div by @PaulZhang12 in #822
- Normalize device names and decorate CUDA-only test cases by @EikanWang in #819
- Improve log messages for autotuning by @choijon5 in #817
- Simplify range indexing so block size symbols can be reused by @yf225 in #809
- Fix hl.rand to use tile-specific offsets instead of fixed offsets, ensuring unique random numbers per tile by @karthickai in #685
- Match CUDA versions for benchmarks by @oulgen in #828
- Print nvidia-smi/rocminfo by @oulgen in #827
- Dump nvidia-smi/rocminfo on benchmarks by @oulgen in #829
- Add Python 3.14 support by @oulgen in #830
- Remove py312 vanilla test by @oulgen in #831
- Pad to the next power of 2 for hl.specialize'd shape values used in device tensor creation by @yf225 in #804
- Autotune eviction policy by @oulgen in #823
- [Docs] Consistent pre-commit/lint by @oulgen in #836
- [Docs] Recommend venv instead of conda by @oulgen in #837
- [Docs] Helion works on Python 3.10 through 3.14 by @oulgen in #838
- [Docs] Add eviction policy by @oulgen in #839
- Update to use the new attribute setting for tf32 by @choijon5 in #835
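
The two autotuner settings added in this release, autotune_max_generations (#796) and autotune_config_overrides (#814), are most naturally wired up on the kernel decorator. Below is a minimal, hedged sketch: the setting names come from the PRs above, but passing them as @helion.kernel keyword settings and the particular override key shown are assumptions for illustration, not a confirmed API.

```python
import torch
import helion
import helion.language as hl

# Minimal sketch (assumed usage): autotune_max_generations and autotune_config_overrides
# are the setting names from #796/#814; the decorator-kwarg form and the "num_warps"
# override key are assumptions for illustration.
@helion.kernel(
    autotune_max_generations=5,                  # cap the autotuner at 5 generations
    autotune_config_overrides={"num_warps": 4},  # assumed: pin a config field during the search
)
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out
```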
Full Changelog: v0.1.6...v0.1.7
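
For the new out_dtype argument on hl.dot (#813), here is a minimal sketch of how it might be used inside a Helion matmul kernel; the argument name comes from the PR, while the exact signature and the surrounding kernel are assumptions.

```python
import torch
import helion
import helion.language as hl

# Minimal sketch (assumed signature): accumulate low-precision inputs into fp32
# via hl.dot's out_dtype argument from #813.
@helion.kernel(use_default_config=True)
def matmul_fp32_acc(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    m, k = x.size()
    _, n = y.size()
    out = torch.empty([m, n], dtype=torch.float32, device=x.device)
    for tile_m, tile_n in hl.tile([m, n]):
        acc = hl.zeros([tile_m, tile_n], dtype=torch.float32)
        for tile_k in hl.tile(k):
            acc = acc + hl.dot(x[tile_m, tile_k], y[tile_k, tile_n], out_dtype=torch.float32)
        out[tile_m, tile_n] = acc
    return out
```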