Releases · gpu-mode/popcorn-cli

14 Mar 21:08

v1.3.6

1d36991

Release v1.3.6 Latest

Merge pull request #52 from yf225/docs/move-fp8-quant-to-problem-1

14 Mar 17:37

github-actions

v1.3.5

fa1291f

Release v1.3.5

More UI-friendly register flow, part 2

14 Mar 17:32

github-actions

v1.3.4

3fd2da9

Release v1.3.4

More UI-friendly register flow

14 Mar 17:31

github-actions

v1.3.3

66daac5

Release v1.3.3

docs: add Discord auth hint after register step (#50)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

14 Mar 17:18

github-actions

v1.3.2

68fd44a

Release v1.3.2

docs: simplify hackathon scoring to top-3 points system (#49)

Replace the rank-based correctness/performance formula with a simpler
top-3 system: 5 pts (1st), 3 pts (2nd), 1 pt (3rd) per scored problem.
Mark fp8_quant as an unscored warm-up. Ties decided by kernel quality.
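The top-3 system can be sketched in a few lines of Python. This is an illustration only, not the organizers' scoring code: the function name, the leaderboard layout, and the participant and problem names (other than `fp8_quant`) are hypothetical, and tie-breaking by kernel quality is not modeled.

```python
from collections import defaultdict

# Points for 1st, 2nd, and 3rd place on each scored problem (from the note above).
POINTS_BY_RANK = (5, 3, 1)


def total_points(rankings, unscored=frozenset({"fp8_quant"})):
    """Sum top-3 points per participant.

    `rankings` maps a problem name to participants ordered best-first.
    """
    totals = defaultdict(int)
    for problem, ordered in rankings.items():
        if problem in unscored:
            continue  # warm-up problems award no points
        # zip stops after three entries, so 4th place and below get nothing
        for points, participant in zip(POINTS_BY_RANK, ordered):
            totals[participant] += points
    return dict(totals)


# Hypothetical leaderboard data, for illustration only.
example = {
    "fp8_quant": ["alice", "bob", "carol"],
    "problem_a": ["alice", "bob", "carol", "dave"],
    "problem_b": ["bob", "alice", "dave"],
}
scores = total_points(example)
```

In this made-up example alice and bob both end on 8 points, which is the situation where the kernel-quality tie-break would apply.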

14 Mar 16:18

github-actions

v1.3.1

5b9f29d

Release v1.3.1

docs: mention HELION_BACKEND=tileir alongside ENABLE_TILE=1 (#47)

The TileIR backend requires both env vars to be set. Update the
table, step-by-step instructions, and "Which should I use?" section
to consistently mention both ENABLE_TILE=1 and HELION_BACKEND=tileir.
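A minimal sketch of setting both variables from Python follows. It assumes the variables should be set before Helion is imported, which the note does not state explicitly; the helper `tileir_enabled` is hypothetical and only checks the environment.

```python
import os

# Both variables are required for the TileIR backend, per the note above.
# Assumption: they are set before Helion is imported.
os.environ["ENABLE_TILE"] = "1"
os.environ["HELION_BACKEND"] = "tileir"


def tileir_enabled(env=os.environ):
    """Return True only when both variables carry the documented values."""
    return env.get("ENABLE_TILE") == "1" and env.get("HELION_BACKEND") == "tileir"
```

Checking both values together guards against the failure mode the note describes, where only one of the two is set.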

14 Mar 04:15

github-actions

v1.3.0

d87db4d

Release v1.3.0

feat: create dedicated kernel folder on `popcorn setup` (#46)

* feat: create dedicated kernel folder on `popcorn setup`

Instead of writing files directly into the current directory (which
overwrites existing files), `popcorn setup` now creates a subfolder
named after the problem directory (e.g. `softmax/`). If a folder with
that name already exists, a `-N` suffix is appended (`softmax-1/`,
`softmax-2/`, etc.) to avoid collisions.
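The naming rule can be sketched as follows. The actual implementation lives in the CLI's Rust code (`setup.rs`), so this Python version is only an illustration of the described behavior; the function name `create_project_dir` is hypothetical.

```python
import tempfile
from pathlib import Path


def create_project_dir(parent: Path, name: str) -> Path:
    """Create `name/` under `parent`, or `name-1/`, `name-2/`, ... if taken."""
    candidate = parent / name
    suffix = 0
    while True:
        try:
            # exist_ok=False makes mkdir fail instead of reusing a folder
            candidate.mkdir(exist_ok=False)
            return candidate
        except FileExistsError:
            suffix += 1
            candidate = parent / f"{name}-{suffix}"


# Running setup three times for the same problem in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    created = [create_project_dir(Path(tmp), "softmax").name for _ in range(3)]
```

Attempting the `mkdir` and reacting to the failure, rather than checking for existence first, avoids a race if two setups run at the same time.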

* docs: update setup docs to reflect new project folder behavior

* style: fix rustfmt formatting in setup.rs

14 Mar 04:15

github-actions

v1.2.51

506fbd4

Release v1.2.51

docs: add scoring system, rules, and contribution track (#45)

* docs: add TileIR backend usage guide to helion-hackathon.md

Documents ENABLE_TILE=0 vs ENABLE_TILE=1 and the TileIR compilation
pipeline available via nvtriton on B200 instances. Covers how to enable
TileIR with Helion (ENABLE_TILE=1 + HELION_BACKEND=tileir), the
different tunables (num_ctas/occupancy vs num_warps/maxnreg), and how
to hardcode TileIR configs in submissions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: restructure ACF + TileIR as optional performance knobs

Group both sections under a single "Optional: Extra Performance Knobs"
heading to emphasize neither is required. Streamline both into
step 1 (autotune) / step 2 (hardcode) format. Add a "Which combination"
section showing all 4 options to try.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove "(Booster Pack)" from ACF heading

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: consolidate TileIR env var instructions

Remove duplicate bash export block — the Python os.environ in the
code example is sufficient for both local autotuning and submissions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify TileIR tunables come from autotuner output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: shorten "Which should I use?" section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: be explicit about ENABLE_TILE=0 vs ENABLE_TILE=1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: simplify TileIR comparison table to just backend names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add scoring system, rules, and open-ended contribution track

Add point allocation table, scoring formula (correctness + performance
ranking), rules & requirements, and the separate open-ended contribution
track for non-kernel Helion contributions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: allow unlimited submissions, best one counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify rules to match actual submission format

Each submission uses one static helion.Config for all shapes, not
per-shape configs. Simplified rules to reflect this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "docs: clarify rules to match actual submission format"

This reverts commit 4fac3a8445c66c3a4e7f1994652d19e4c7fe76bf.

* Add per-shape config dispatch pattern to all submissions

Use a factory function (_make_kernel) to create kernel variants with
different helion.Config objects, and dispatch in custom_kernel() based
on input tensor shapes. This lets participants optimize each benchmark
shape independently.
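The dispatch pattern might look roughly like the sketch below. To stay runnable without Helion or PyTorch, it uses plain dicts as stand-ins for `helion.Config` objects and a `FakeTensor` class in place of real tensors; the shapes, the `block_sizes` values, and the body of `_make_kernel` are all hypothetical.

```python
def _make_kernel(config):
    """Return a kernel variant bound to one config (stand-in body)."""
    def kernel(x):
        # A real variant would run a Helion kernel compiled with `config`.
        return {"config": config, "shape": tuple(x.shape)}
    return kernel


# Hypothetical shapes and tunables; real ones come from the problem
# definition and the autotuner output.
_KERNELS = {
    (1024, 1024): _make_kernel({"block_sizes": [64]}),
    (4096, 4096): _make_kernel({"block_sizes": [128]}),
}


def custom_kernel(x):
    """Dispatch to the variant registered for the input's shape."""
    shape = tuple(x.shape)
    kernel = _KERNELS.get(shape)
    if kernel is None:
        raise ValueError(f"no config registered for shape {shape}")
    return kernel(x)


class FakeTensor:
    """Minimal object exposing `.shape`, standing in for a tensor."""
    def __init__(self, *shape):
        self.shape = shape


result = custom_kernel(FakeTensor(4096, 4096))
```

Failing loudly on an unregistered shape is a choice made for this sketch; a submission could instead fall back to a config known to pass correctness.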

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update example to show all shapes, remove DEFAULT_CONFIG

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: use Config(...) placeholders with distinct TODO comments for test vs benchmark shapes

Test shapes: TODO to replace with default config or any config that passes correctness.
Benchmark shapes: TODO to replace with autotuned config.
Also add instructions on getting default config via autotune_effort="none".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove references to single-config-for-all-shapes pattern

Per-shape configs are the recommended approach. Remove mentions of
using a single config across all shapes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove references to default config in rules section

Configs are always participant-provided.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add tips for version control, tmux, and machine reboots

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: move GPU machine tips to standalone section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix performance metric description to match actual eval method

The previous description incorrectly stated that the metric was the
geometric mean of 100 runs. The actual Helion eval uses CUDA graphs with
L2 cache clearing and reports the arithmetic mean of 10 measurements.
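The choice of mean matters when a run contains outliers. The sketch below compares the two on made-up timings; it does not reproduce the eval harness (no CUDA graphs or cache clearing), and the numbers are hypothetical.

```python
import statistics

# Hypothetical timings in microseconds for 10 measurements; one is an outlier.
timings_us = [100.0] * 9 + [400.0]

# Arithmetic mean: the statistic the corrected description names.
arithmetic = statistics.fmean(timings_us)

# Geometric mean: the statistic the earlier description named.
geometric = statistics.geometric_mean(timings_us)
```

Here the arithmetic mean is 130.0 while the geometric mean is about 114.9, so a single slow measurement weighs more heavily under the arithmetic mean.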

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Replace hard 30% LOC limit with judges' discretion for inline triton/asm

The LOC-based rule was gameable (the denominator could be inflated with
padding code), so switch to a qualitative rule: inline Triton/asm is
allowed as an escape hatch, but predominantly inline submissions may be
disqualified.
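A short worked example shows why a ratio-based limit is gameable. The line counts are invented for illustration and `inline_fraction` is a hypothetical helper, not part of any tooling.

```python
def inline_fraction(inline_loc, other_loc):
    """Share of a submission's lines that are inline Triton/asm."""
    return inline_loc / (inline_loc + other_loc)


# 60 inline lines next to 40 other lines: well over a 30% cap.
before = inline_fraction(60, 40)

# Same 60 inline lines after adding 100 lines of padding code.
after = inline_fraction(60, 140)
```

The inline code is unchanged, yet the fraction drops from 0.6 to 0.3 purely because the denominator grew.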

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add spawn mode tip for autotuning in GPU machine section

Spawn mode isolates each autotuner trial in a subprocess with timeout
protection, preventing hangs or crashes from killing the entire run.
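The isolation idea can be illustrated with the standard library alone. This is not Helion's autotuner code: `run_trial` is a hypothetical helper that runs a snippet in a fresh interpreter and classifies the outcome, so that a hang or crash in one trial cannot take down the caller.

```python
import subprocess
import sys


def run_trial(code: str, timeout_s: float) -> str:
    """Run one trial in a fresh interpreter and classify the outcome."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout_s,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers
        return "timeout"
    return "ok" if proc.returncode == 0 else "crashed"


results = [
    run_trial("pass", timeout_s=30),                     # finishes normally
    run_trial("import sys; sys.exit(1)", timeout_s=30),  # exits with an error
    run_trial("import time; time.sleep(60)", timeout_s=1),  # hangs
]
```

Starting a fresh process per trial costs startup time on every trial, which is the general trade-off behind isolated execution being slower.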

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Clarify that spawn mode is slower than fork mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

14 Mar 04:13

github-actions

v1.2.50

44a8b35

Release v1.2.50

docs: add TileIR backend (ENABLE_TILE) usage guide (#44)

* docs: add TileIR backend usage guide to helion-hackathon.md

Documents ENABLE_TILE=0 vs ENABLE_TILE=1 and the TileIR compilation
pipeline available via nvtriton on B200 instances. Covers how to enable
TileIR with Helion (ENABLE_TILE=1 + HELION_BACKEND=tileir), the
different tunables (num_ctas/occupancy vs num_warps/maxnreg), and how
to hardcode TileIR configs in submissions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: restructure ACF + TileIR as optional performance knobs

Group both sections under a single "Optional: Extra Performance Knobs"
heading to emphasize neither is required. Streamline both into
step 1 (autotune) / step 2 (hardcode) format. Add a "Which combination"
section showing all 4 options to try.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove "(Booster Pack)" from ACF heading

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: consolidate TileIR env var instructions

Remove duplicate bash export block — the Python os.environ in the
code example is sufficient for both local autotuning and submissions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify TileIR tunables come from autotuner output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: shorten "Which should I use?" section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: be explicit about ENABLE_TILE=0 vs ENABLE_TILE=1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: simplify TileIR comparison table to just backend names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

13 Mar 22:32

github-actions

v1.2.49

7d015d3

Release v1.2.49

docs: add ACF (booster pack) usage guide (#43)

* docs: add ACF (booster pack) usage guide to helion-hackathon.md

Documents how to use PTXAS Advanced Controls Files from /opt/booster_pack/
during autotuning (autotune_search_acf) and in hardcoded submissions
(advanced_controls_file). Includes the important caveat that ACF search
only works when the autotuner actually runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: remove "How ACFs work" subsection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
