Releases: gpu-mode/popcorn-cli
Release v1.3.6
Merge pull request #52 from yf225/docs/move-fp8-quant-to-problem-1
Release v1.3.5
More UI-friendly register flow (part 2)
Release v1.3.4
More UI-friendly register flow
Release v1.3.3
docs: add Discord auth hint after register step (#50)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Release v1.3.2
docs: simplify hackathon scoring to top-3 points system (#49)
Replace the rank-based correctness/performance formula with a simpler top-3 system: 5 pts (1st), 3 pts (2nd), 1 pt (3rd) per scored problem. Mark fp8_quant as an unscored warm-up. Ties are decided by kernel quality.
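The top-3 rule can be illustrated with a small sketch. The problem names other than `fp8_quant`, the participant names, and the `score_hackathon` helper are hypothetical, and the input is assumed to be already ranked (tie-breaking by kernel quality is a judging decision and is not modeled here).

```python
from collections import defaultdict

# Points per rank as described in the release note: 5 / 3 / 1 for the top three.
POINTS_BY_RANK = {1: 5, 2: 3, 3: 1}

# fp8_quant is an unscored warm-up, so it contributes no points.
UNSCORED_PROBLEMS = {"fp8_quant"}


def score_hackathon(rankings):
    """Sum points per participant.

    `rankings` maps a problem name to a list of participants ordered
    best-first (hypothetical input format).
    """
    totals = defaultdict(int)
    for problem, ordered in rankings.items():
        if problem in UNSCORED_PROBLEMS:
            continue
        # Only the first three places earn points.
        for rank, participant in enumerate(ordered[:3], start=1):
            totals[participant] += POINTS_BY_RANK[rank]
    return dict(totals)


rankings = {
    "fp8_quant": ["alice", "bob", "carol"],          # warm-up, ignored
    "problem_a": ["bob", "alice", "carol", "dave"],  # hypothetical problem
    "problem_b": ["alice", "carol", "bob"],          # hypothetical problem
}
print(score_hackathon(rankings))  # {'bob': 6, 'alice': 8, 'carol': 4}
```

Participants outside the top three on every scored problem, such as `dave` here, simply do not appear in the totals.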
Release v1.3.1
docs: mention HELION_BACKEND=tileir alongside ENABLE_TILE=1 (#47)
The TileIR backend requires both env vars to be set. Update the table, step-by-step instructions, and "Which should I use?" section to consistently mention both ENABLE_TILE=1 and HELION_BACKEND=tileir.
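A minimal sketch of setting both variables from Python follows. The variable names and values come from the release note; setting them before importing Helion is an assumption about when they are read, and `tileir_enabled` is a hypothetical helper used only to check the pairing.

```python
import os

# Both variables are required for the TileIR backend according to the note.
# Assumption: they should be set before Helion/Triton is imported.
os.environ["ENABLE_TILE"] = "1"
os.environ["HELION_BACKEND"] = "tileir"


def tileir_enabled(env=os.environ):
    """Return True only when both TileIR variables are set as documented.

    Hypothetical helper, not part of Helion or popcorn-cli.
    """
    return env.get("ENABLE_TILE") == "1" and env.get("HELION_BACKEND") == "tileir"


print(tileir_enabled())                      # True
print(tileir_enabled({"ENABLE_TILE": "1"}))  # False: HELION_BACKEND is missing
```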
Release v1.3.0
feat: create dedicated kernel folder on `popcorn setup` (#46)
* feat: create dedicated kernel folder on `popcorn setup`
  Instead of writing files directly into the current directory (which overwrites existing files), `popcorn setup` now creates a subfolder named after the problem directory (e.g. `softmax/`). If a folder with that name already exists, a `-N` suffix is appended (`softmax-1/`, `softmax-2/`, etc.) to avoid collisions.
* docs: update setup docs to reflect new project folder behavior
* style: fix rustfmt formatting in setup.rs
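The collision-avoiding naming scheme can be sketched as below. This is a Python illustration of the described behavior, not the Rust code in `setup.rs`; the `pick_project_dir` name and the assumption that `N` starts at 1 and increases by one (as the `softmax-1/`, `softmax-2/` examples suggest) are mine.

```python
from pathlib import Path
import tempfile


def pick_project_dir(parent: Path, problem_name: str) -> Path:
    """Create and return a fresh folder for the problem.

    Tries `<name>/` first, then `<name>-1/`, `<name>-2/`, ... until an
    unused name is found. Hypothetical helper for illustration only.
    """
    candidate = parent / problem_name
    n = 1
    while candidate.exists():
        candidate = parent / f"{problem_name}-{n}"
        n += 1
    candidate.mkdir(parents=True)
    return candidate


# Run three "setups" for the same problem in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    names = [pick_project_dir(root, "softmax").name for _ in range(3)]

print(names)  # ['softmax', 'softmax-1', 'softmax-2']
```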
Release v1.2.51
docs: add scoring system, rules, and contribution track (#45)
* docs: add TileIR backend usage guide to helion-hackathon.md
  Documents ENABLE_TILE=0 vs ENABLE_TILE=1 and the TileIR compilation pipeline available via nvtriton on B200 instances. Covers how to enable TileIR with Helion (ENABLE_TILE=1 + HELION_BACKEND=tileir), the different tunables (num_ctas/occupancy vs num_warps/maxnreg), and how to hardcode TileIR configs in submissions.
* docs: restructure ACF + TileIR as optional performance knobs
  Group both sections under a single "Optional: Extra Performance Knobs" heading to emphasize neither is required. Streamline both into step 1 (autotune) / step 2 (hardcode) format. Add a "Which combination" section showing all 4 options to try.
* docs: remove "(Booster Pack)" from ACF heading
* docs: consolidate TileIR env var instructions
  Remove duplicate bash export block; the Python os.environ in the code example is sufficient for both local autotuning and submissions.
* docs: clarify TileIR tunables come from autotuner output
* docs: shorten "Which should I use?" section
* docs: be explicit about ENABLE_TILE=0 vs ENABLE_TILE=1
* docs: simplify TileIR comparison table to just backend names
* docs: add scoring system, rules, and open-ended contribution track
  Add point allocation table, scoring formula (correctness + performance ranking), rules & requirements, and the separate open-ended contribution track for non-kernel Helion contributions.
* docs: allow unlimited submissions, best one counts
* docs: clarify rules to match actual submission format
  Each submission uses one static helion.Config for all shapes, not per-shape configs. Simplified rules to reflect this.
* Revert "docs: clarify rules to match actual submission format"
  This reverts commit 4fac3a8445c66c3a4e7f1994652d19e4c7fe76bf.
* Add per-shape config dispatch pattern to all submissions
  Use a factory function (_make_kernel) to create kernel variants with different helion.Config objects, and dispatch in custom_kernel() based on input tensor shapes. This lets participants optimize each benchmark shape independently.
* docs: update example to show all shapes, remove DEFAULT_CONFIG
* docs: use Config(...) placeholders with distinct TODO comments for test vs benchmark shapes
  Test shapes: TODO to replace with default config or any config that passes correctness. Benchmark shapes: TODO to replace with autotuned config. Also add instructions on getting default config via autotune_effort="none".
* docs: remove references to single-config-for-all-shapes pattern
  Per-shape configs are the recommended approach. Remove mentions of using a single config across all shapes.
* docs: remove references to default config in rules section
  Configs are always participant-provided.
* docs: add tips for version control, tmux, and machine reboots
* docs: move GPU machine tips to standalone section
* docs: fix performance metric description to match actual eval method
  The previous description incorrectly stated geometric mean of 100 runs. The actual helion eval uses CUDA graphs with L2 cache clearing, 10 measurements, and arithmetic mean.
* Replace hard 30% LOC limit with judges' discretion for inline triton/asm
  The LOC-based rule was gameable (denominator inflation with padding code), so switch to a qualitative rule: inline triton/asm is allowed as escape hatches, but predominantly inline submissions may be disqualified.
* Add spawn mode tip for autotuning in GPU machine section
  Spawn mode isolates each autotuner trial in a subprocess with timeout protection, preventing hangs or crashes from killing the entire run.
* Clarify that spawn mode is slower than fork mode
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Release v1.2.50
docs: add TileIR backend (ENABLE_TILE) usage guide (#44)
* docs: add TileIR backend usage guide to helion-hackathon.md
  Documents ENABLE_TILE=0 vs ENABLE_TILE=1 and the TileIR compilation pipeline available via nvtriton on B200 instances. Covers how to enable TileIR with Helion (ENABLE_TILE=1 + HELION_BACKEND=tileir), the different tunables (num_ctas/occupancy vs num_warps/maxnreg), and how to hardcode TileIR configs in submissions.
* docs: restructure ACF + TileIR as optional performance knobs
  Group both sections under a single "Optional: Extra Performance Knobs" heading to emphasize neither is required. Streamline both into step 1 (autotune) / step 2 (hardcode) format. Add a "Which combination" section showing all 4 options to try.
* docs: remove "(Booster Pack)" from ACF heading
* docs: consolidate TileIR env var instructions
  Remove duplicate bash export block; the Python os.environ in the code example is sufficient for both local autotuning and submissions.
* docs: clarify TileIR tunables come from autotuner output
* docs: shorten "Which should I use?" section
* docs: be explicit about ENABLE_TILE=0 vs ENABLE_TILE=1
* docs: simplify TileIR comparison table to just backend names
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Release v1.2.49
docs: add ACF (booster pack) usage guide (#43)
* docs: add ACF (booster pack) usage guide to helion-hackathon.md
  Documents how to use PTXAS Advanced Controls Files from /opt/booster_pack/ during autotuning (autotune_search_acf) and in hardcoded submissions (advanced_controls_file). Includes the important caveat that ACF search only works when the autotuner actually runs.
* docs: remove "How ACFs work" subsection
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>