
Conversation

JessieeeNotLi commented Nov 19, 2025

add cache latency benchmarks

Summary by CodeRabbit

  • New Features

    • Added many new benchmarks: GPU cache microbenchmark, BERT language inference, image-classification, recommendation (DLRM), multiple JAX scientific suites, and expanded linear-algebra benchmarks.
  • Improvements

    • Migrated CI to GitHub Actions for automated linting and reporting.
    • Added local pre-commit and editor settings for consistent Python formatting.
    • Normalized code/style across the benchmark suite and updated benchmark data submodule.


McLavish and others added 30 commits October 30, 2025 08:53
sync: image-classification requirements + add 605.lu benchmark

- Resolve conflicts in 413.image-classification/python/requirements*
- Drop py3.6/py3.7 variants removed upstream; keep/update 3.8–3.11
- Add new 600.linearalgebra/605.lu benchmark (config, input, function, reqs)
- Rename local_deployment.tmp -> 600.linearalgebra/605.lu/config.json
- Update local_deployment.json; add out_benchmark*.json; update out_storage.json
coderabbitai bot commented Nov 19, 2025

Walkthrough

Removes CircleCI config, adds a GitHub Actions lint workflow, updates submodule URL and mypy settings, introduces pre-commit and VSCode settings, and adds many new/updated benchmarks (GPU pointer‑chase, ONNX inference, JAX workloads, PyTorch linear‑algebra plus supporting configs, scripts, and requirements). Many files also have stylistic quote/formatting changes.

Changes

  • CI / Linting (.circleci/config.yml, .github/workflows/lint.yml): Removed CircleCI config; added GitHub Actions lint workflow (Black check, Flake8, deps install, artifact upload).
  • Repo infra & editor (.gitmodules, .mypy.ini, .pre-commit-config.yaml, .vscode/settings.json, requirements.local.txt, benchmarks-data): Updated benchmarks-data submodule URL and commit; broadened mypy ignores for docker.*; added local pre-commit hooks for Black/Flake8; added VSCode Python settings; removed minio pin from local requirements.
  • Global style changes (benchmarks/**, many files): Wide stylistic changes across benchmarks: single→double quotes, reflowed function signatures, removed unused imports, minor path-string normalizations. No intended semantic changes for most files.
  • GPU pointer-chase microbenchmark (benchmarks/000.microbenchmarks/050.gpu-cache-latency/*): New benchmark: config, input generator, PyTorch pointer-chase implementation with pattern modes, CUDA/CPU fallback, and requirements (torch, numpy). A toy sketch of the pointer-chase pattern follows this list.
  • Clock sync enhancement (benchmarks/000.microbenchmarks/030.clock-synchronization/python/function.py): Augmented clock-sync loop: iterative exchanges, failure counters, min-time detection, CSV logging, and improved timing/result payload.
  • Microbenchmarks edits (benchmarks/000.microbenchmarks/*): Formatting, quoting, small unused-import removals; one added module-level size_generators in server-reply/input.py.
  • Webapps & utilities (benchmarks/100.webapps/*, benchmarks/300.utilities/*, benchmarks/200.multimedia/*): Mostly formatting and quoting normalization; minor docstring and path adjustments.
  • Image recognition (benchmarks/400.inference/411.image-recognition/*): Introduced module-level storage client and lazy model load (resnet50), preprocessing tweaks; formatting changes.
  • ONNX BERT language inference (benchmarks/400.inference/412.language-bert/*): New benchmark: config, input helpers, ONNX Runtime-based handler with lazy model extraction/loading, tokenizer/labels handling, CUDA provider enforcement, timing instrumentation, packaging scripts, and multi-version requirements.
  • ONNX image-classification (benchmarks/400.inference/413.image-classification/*): New benchmark: config, input generator, ONNX inference handler with preprocessing, label map JSON, packaging/init scripts, and multiple requirements files.
  • Recommendation (TinyDLRM) (benchmarks/400.inference/413.recommendation/*): New TinyDLRM benchmark: input helpers, model loader, batched CUDA inference handler, timing instrumentation, packaging scripts, and requirements.
  • JAX scientific benchmarks (benchmarks/500.scientific/5xx.*): New JAX-based benchmarks (channel_flow CFD, compute, ResNet-like block) with implementations, inputs, configs, and jax[cuda12] requirements.
  • PyTorch linear-algebra suite (benchmarks/600.linearalgebra/60[1-6].*): Six new CUDA-enabled linear algebra benchmarks (GEMM, AXPY, Jacobi-2D, Cholesky, LU, SPMV) with configs, input generators, handlers, timing, and torch requirements.
  • Misc new benchmark inputs & small changes (benchmarks/400.inference/412.*, 413.*, 5xx.*, 6xx.*): Many added input.py files, config.json manifests, requirements variants, and helper scripts for the new benchmarks.
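To give reviewers a concrete picture of the pointer-chase idea, here is a minimal, hedged sketch of the dependent-load pattern in PyTorch. It is illustrative only: the function name and parameters are assumptions, it is not the benchmark's actual code, and the Python-level loop adds kernel-launch overhead that a real implementation would amortize.

import torch

def pointer_chase_latency_ns(working_set_bytes, iters=10000, device="cuda"):
    # Size the working set from the actual index element size (torch.long)
    elem_bytes = torch.tensor(0, dtype=torch.long).element_size()
    n = max(1, working_set_bytes // elem_bytes)
    # Build a random single-cycle permutation so every load depends on the previous one
    perm = torch.randperm(n, device=device)
    next_idx = torch.empty_like(perm)
    next_idx[perm] = torch.roll(perm, shifts=-1)
    idx = torch.zeros((), dtype=torch.long, device=device)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        idx = next_idx[idx]  # serially dependent gather
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) * 1e6 / iters  # CUDA events report ms; convert to ns per access

Sweeping working_set_bytes across L1-, L2-, and DRAM-sized footprints is what turns this pattern into a cache-latency curve.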

Sequence Diagram(s)

sequenceDiagram
    actor GitHub_Actions
    participant Repo
    participant PythonEnv
    rect rgb(240, 245, 255)
    Note over GitHub_Actions,Repo: Lint workflow (new)
    GitHub_Actions->>Repo: checkout
    GitHub_Actions->>PythonEnv: setup python, cache deps
    PythonEnv->>PythonEnv: install requirements via install.py
    PythonEnv->>Repo: run Black --check, run Flake8 -> artifact
    PythonEnv-->>GitHub_Actions: upload flake report
    end
sequenceDiagram
    participant Handler
    participant Storage
    participant ONNX
    participant GPU

    rect rgb(235, 245, 235)
    Note over Handler,Storage: ONNX inference (language/image)
    Handler->>Storage: download model archive / inputs
    Storage-->>Handler: model archive, input file(s)
    Handler->>Handler: extract if needed, _ensure_model (lazy)
    Handler->>ONNX: create/session load (CUDA provider)
    ONNX->>GPU: initialize on device (if CUDA)
    Handler->>ONNX: prepare inputs (tokenize / preprocess)
    ONNX-->>Handler: logits / outputs
    Handler->>Handler: postprocess (softmax, top-k, map labels)
    Handler-->>Storage: optionally upload results
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • ONNX-based modules (412, 413): model extraction, CUDA provider checks, lazy init, and tokenizer/label handling (a minimal sketch of this pattern follows this list).
  • New CUDA PyTorch benchmarks (600.*, 050.gpu-cache-latency): correct device handling, CUDA event timing, and deterministic seeding.
  • JAX workloads (500.scientific/5xx.*): algorithmic correctness, device_get handling, and stability of iterative solvers.
  • Packaging/init scripts and multiple per-version requirements: ensure packaging steps and version pins are consistent.
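For reference while reviewing the ONNX handlers, a minimal sketch of the lazy-init plus CUDA-provider-check shape is shown below; _SESSION, _ensure_model, and model_path are illustrative names, and the exact checks in 412/413 may differ.

import onnxruntime as ort

_SESSION = None  # module-level cache so warm invocations skip session creation

def _ensure_model(model_path):
    global _SESSION
    if _SESSION is None:
        _SESSION = ort.InferenceSession(
            model_path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
        )
        # Enforce the GPU path: fail fast if the CUDA provider was not actually selected
        if "CUDAExecutionProvider" not in _SESSION.get_providers():
            raise RuntimeError("CUDAExecutionProvider is not available")
    return _SESSION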

Possibly related PRs

  • Add interface for NoSQL storage #214 — related changes to benchmark input signatures adding nosql_func and integration of storage/NoSQL tooling that align with new input generator signatures and benchmark wiring in this PR.

Poem

🐰 I hopped through diffs, rewrote each quote,

From CircleCI’s nest to GitHub’s new moat.
CUDA drums beat, ONNX sings bright,
JAX and PyTorch race into night.
A carrot of benchmarks — hop, run, and float! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 7.89%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title 'Jessie/microbenchmarks' is vague and does not clearly describe the main changes. While it references 'microbenchmarks', it fails to convey specific information about the PR's purpose or scope. Resolution: revise the title to be more descriptive, such as 'Add GPU cache latency benchmark and ONNX inference models' or 'Introduce new microbenchmarks for GPU cache latency, BERT, image classification, and recommendation systems', to clearly reflect the primary changes.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9632860 and a1caa5a.

📒 Files selected for processing (1)
  • requirements.local.txt (0 hunks)
💤 Files with no reviewable changes (1)
  • requirements.local.txt


JessieeeNotLi (Author) commented

add cache latency benchmarks

JessieeeNotLi and others added 3 commits November 19, 2025 20:27
This document provides detailed instructions for running the GPU Cache Latency benchmark, including system requirements, build steps, and example output.
This readme provides detailed instructions for running the GPU Cache Latency benchmark, including system requirements, build steps, and example commands.
coderabbitai bot left a comment

Actionable comments posted: 28

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
benchmarks/000.microbenchmarks/030.clock-synchronization/python/function.py (1)

35-75: Fix inconsistent failure handling after changing timeout threshold

Inside the loop you now abort after 7 consecutive timeouts:

if consecutive_failures == 7:
    print("Can't setup the connection")
    break

but after the loop you still use if consecutive_failures != 5: to decide whether to write/upload results. This makes the “failure” else branch effectively unreachable, and even in the pure-failure case (7 timeouts, no data) you still write and upload a CSV (likely empty).

A cleaner approach is to drive this off whether you actually collected any measurements:

-    if consecutive_failures != 5:
+    if times:
         with open("/tmp/data.csv", "w", newline="") as csvfile:
             writer = csv.writer(csvfile, delimiter=",")
             writer.writerow(["id", "client_send", "client_rcv"])
             for row in times:
                 writer.writerow(row)

         client = storage.storage.get_instance()
         filename = "results-{}.csv".format(request_id)
         key = client.upload(output_bucket, os.path.join(output_prefix, filename), "/tmp/data.csv")
-    else:
-        key = None
+    else:
+        key = None

This restores the intended behavior: only upload when you have data; otherwise return key = None, regardless of how many consecutive failures occurred.

♻️ Duplicate comments (1)
benchmarks/400.inference/413.recommendation/python/requirements.txt.3.11 (1)

1-1: Verify PyTorch 2.2.2 for security vulnerabilities.

Same concern as requirements.txt.3.10: ensure torch 2.2.2 is still supported and free from known security vulnerabilities.

What security vulnerabilities exist for PyTorch 2.2.2?
🧹 Nitpick comments (38)
benchmarks/400.inference/413.image-classification/python/requirements.txt.3.9 (1)

1-2: Consider adding upper bounds to version constraints.

The dependencies numpy>=2.0 and pillow>=10.0 lack upper bounds, which could lead to unexpected breaking changes when new major versions are released. Consider using constraints like numpy>=2.0,<3.0 and pillow>=10.0,<11.0 for more predictable builds.

benchmarks/000.microbenchmarks/040.server-reply/input.py (1)

1-11: Input mapping is consistent; unused parameters are expected interface baggage

size_generators plus generate_input returning {"sleep": size_generators[size]} matches the patterns in other microbenchmarks and keeps the shared generate_input signature.

The unused parameters (data_dir, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func) are expected here to conform to the common interface; Ruff’s ARG001 warnings can be ignored or, if you prefer, silenced by prefixing them with _ (e.g., _data_dir) without changing behavior.

benchmarks/100.webapps/120.uploader/input.py (1)

1-25: Uploader generate_input remains correct; unused args are for interface compatibility

The URL mapping and generate_input logic remain consistent with the rest of the suite: you set object.url from url_generators[size] and fill bucket.bucket/bucket.output from the provided arguments.

The extra parameters (data_dir, input_buckets, upload_func, nosql_func) are unused but needed for the common generate_input signature. If Ruff’s ARG001 warnings become noisy, you can optionally prefix them with _ to keep linters happy without changing behavior.

benchmarks/200.multimedia/220.video-processing/python/function.py (1)

16-66: Video-processing logic is unchanged; consider silencing unused event args

The ffmpeg wrapper and handler flow (download → ffmpeg op → upload + timing) look unchanged and correct. A few minor points:

  • The S603 subprocess warning is not concerning here: you invoke a fixed ffmpeg binary with shell=False, so there’s no shell injection surface beyond normal path handling.
  • The /tmp/... paths for intermediate files are standard for FaaS-style benchmarks; S108 is more of a generic hardening suggestion than a real issue in this context.
  • To silence unused-argument warnings for event while keeping the operations interface (video, duration, event), you can rename the parameter:
-def to_gif(video, duration, event):
+def to_gif(video, duration, _event):
@@
-def watermark(video, duration, event):
+def watermark(video, duration, _event):

No behavioral change, but tools like Ruff/flake8 will stop flagging ARG001 here.

Also applies to: 73-84, 114-121

benchmarks/000.microbenchmarks/020.network-benchmark/input.py (1)

5-12: Network-benchmark generate_input matches the common interface

The returned structure under "bucket" ("bucket" and "output") is consistent with how the UDP benchmark handler consumes its configuration. The extra parameters (data_dir, size, input_paths, upload_func, nosql_func) are unused but required for the shared generate_input signature; if desired, prefix them with _ to quiet ARG001 warnings without changing behavior.

.github/workflows/lint.yml (2)

19-19: Consider pinning a more specific Python version.

Using python-version: '3.x' will install the latest Python 3 release, which may introduce variability across runs. Consider pinning to a specific minor version (e.g., '3.11' or '3.12') for reproducible builds.

Apply this diff to pin to a specific version:

-          python-version: '3.x'
+          python-version: '3.12'

25-25: Cache key includes PR-specific reference.

The cache key includes ${{ github.ref_name }}, which means each PR branch will have its own cache. This prevents cache reuse across PRs and may increase build times. Consider whether the cache should be shared across branches.

If cache sharing is desired, you could remove the ref-specific portion:

-          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}-${{ github.ref_name }}
+          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
           restore-keys: |
             venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}-

However, keeping branch-specific caches can be beneficial if different branches have incompatible dependencies.

benchmarks/400.inference/413.image-classification/python/init.sh (1)

3-10: Quote variables in cp to avoid issues with spaces or globbing

cp ${path} ${DIR} will misbehave if either contains spaces or wildcard characters. Quoting is a low-cost hardening:

- cp ${path} ${DIR}
+ cp "${path}" "${DIR}"
benchmarks/500.scientific/504.dna-visualisation/python/function.py (1)

20-36: Use a context manager when reading the downloaded file

data = open(download_path, "r").read() leaves the file handle to be closed by GC rather than deterministically.

Recommend:

with open(download_path, "r") as f:
    data = f.read()

This avoids leaking file descriptors with essentially no cost.

benchmarks/400.inference/413.image-classification/python/package.sh (1)

1-32: Harden packaging script (shebang, quoting, and find usage)

Non-blocking, but a few small tweaks would make this script more robust:

  • Add an explicit shebang so tools and CI know the target shell:
+#!/bin/bash
 # Stripping package code is based on https://github.com/ryfeus/lambda-packs repo
  • Actually use PACKAGE_DIR and quote it, guarding cd:
-PACKAGE_DIR=$1
-echo "Original size $(du -sh $1 | cut -f1)"
+PACKAGE_DIR=$1
+echo "Original size $(du -sh "${PACKAGE_DIR}" | cut -f1)"

 CUR_DIR=$(pwd)
-cd $1
+cd "${PACKAGE_DIR}" || exit 1
  • Likewise when returning:
-cd ${CUR_DIR}
+cd "${CUR_DIR}" || exit 1
  • Avoid xargs filename pitfalls by using -exec:
-find -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
-find -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
+find . -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -exec strip {} +
+find . -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -exec strip {} +
benchmarks/400.inference/411.image-recognition/python/function.py (1)

28-41: Tidy up file/resource handling and inference context

The handler logic looks correct and the timing breakdown is clear. A few non-blocking improvements:

  • Avoid leaving file descriptors to GC by using context managers:
# At module init
with open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r") as f:
    class_idx = json.load(f)
idx2label = [class_idx[str(k)][1] for k in range(len(class_idx))]
  • Likewise for the image:
process_begin = datetime.datetime.now()
with Image.open(image_path) as input_image:
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0)
    with torch.no_grad():
        output = model(input_batch)
    _, index = torch.max(output, 1)
    ret = idx2label[index]
process_end = datetime.datetime.now()

Using torch.no_grad() also avoids unnecessary grad tracking during inference, which can slightly reduce memory and latency.

Also applies to: 65-80

benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/function.py (2)

13-36: Custom conv2d implementation is consistent for NHWC, stride 1

The explicit lax.dynamic_slice + lax.scan implementation matches a valid “valid” 2D convolution for NHWC with stride 1 and square kernels and should be fine for microbenchmarks, even if not as idiomatic as lax.conv_general_dilated.

If you ever want to compare against a reference implementation, you could add an un-jitted version using lax.conv_general_dilated purely for validation (not necessarily for benchmarking).
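Should such a validation path ever be added, a minimal un-jitted reference might look like this sketch (conv2d_reference is a hypothetical helper; NHWC input, HWIO weights, stride 1, and VALID padding are assumed to match the benchmark's custom kernel):

from jax import lax

def conv2d_reference(x, w):
    # x: (N, H, W, C_in) in NHWC; w: (KH, KW, C_in, C_out) in HWIO
    return lax.conv_general_dilated(
        x,
        w,
        window_strides=(1, 1),
        padding="VALID",  # matches the "valid" convolution described above
        dimension_numbers=("NHWC", "HWIO", "NHWC"),
    )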


68-79: Initialization is fine for benchmarks, but note input name and NumPy RNG

Using NumPy’s default_rng(42) and dense random inputs/weights is appropriate for deterministic microbenchmarks. One minor nit: input shadows the built-in name; consider renaming to x or images if you touch this again.

benchmarks/400.inference/413.image-classification/config.json (1)

1-6: Config is minimal and consistent; just ensure 512 MB is sufficient

The config is straightforward and matches the Python-based image-classification benchmark using storage:

{ "timeout": 60, "memory": 512, "languages": ["python"], "modules": ["storage"] }

This looks fine; just confirm from runs that 512 MB memory is enough for PyTorch + model + dependencies in your target serverless platform.

benchmarks/400.inference/412.language-bert/python/requirements.txt.3.9 (1)

1-3: Pinned deps look reasonable; ensure 3.9 compatibility and keep variants in sync.

The pinned versions make sense for a GPU BERT pipeline, but please double‑check that this exact trio works with your Python 3.9 runtime and GPU drivers, and that it intentionally matches the base requirements.txt for this benchmark so they don’t drift over time.

benchmarks/600.linearalgebra/603.jacobi2d/python/requirements.txt (1)

1-1: Torch pin is clear; consider alignment with other torch-based benchmarks.

Pinning torch==2.4.1 is good for reproducibility. Since other new benchmarks (e.g., image-classification) pin a different Torch version, please confirm there’s a deliberate reason (e.g., tested/perf sweet spot here) and that all target Python/CUDA stacks support this version. If there’s no strong reason, aligning Torch versions across the linear algebra benchmarks could simplify maintenance.

benchmarks/400.inference/412.language-bert/python/requirements.txt (1)

1-3: Base requirements mirror the 3.9 variant; watch for drift.

These pins match the .3.9 file, which is good for consistency. Just be mindful that having two copies means future updates need to touch both; if you later add more per‑version files, you may want a simple generation script or a single source of truth to avoid accidental divergence.
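One possible single-source-of-truth approach, purely as a sketch (the base path and the version list are assumptions):

import shutil
from pathlib import Path

# Copy the base requirements file into the per-Python-version variants
BASE = Path("benchmarks/400.inference/412.language-bert/python/requirements.txt")
for pyver in ("3.9", "3.10", "3.11"):
    shutil.copyfile(BASE, BASE.with_name(f"requirements.txt.{pyver}"))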

benchmarks/400.inference/413.image-classification/python/requirements.txt.3.12 (1)

1-4: Check version strategy: ranges vs pins for reproducible benchmarks.

Torch/torchvision are strictly pinned, but numpy/pillow use >=, which allows environment drift over time. For a benchmark suite, you may prefer fully pinned versions to keep results comparable across runs; if you intentionally want flexibility here, that’s fine, just confirm this combination is supported on Python 3.12 and your target CUDA stack.

benchmarks/400.inference/413.recommendation/python/package.sh (1)

3-4: Remove unused variable assignment or use it consistently.

PACKAGE_DIR is assigned but never referenced; the script uses $1 directly instead. Either remove the variable declaration or replace $1 with $PACKAGE_DIR for consistency.

-PACKAGE_DIR=$1
-echo "DLRM GPU package size $(du -sh $1 | cut -f1)"
+PACKAGE_DIR=$1
+echo "DLRM GPU package size $(du -sh $PACKAGE_DIR | cut -f1)"
benchmarks/600.linearalgebra/603.jacobi2d/input.py (1)

1-7: Input generator is consistent with other linear algebra benchmarks

size_generators and generate_input() match the established pattern in 600.* (size lookup + fixed seed). Unused parameters are expected here due to the shared SEBS interface and don’t need to be removed; if Ruff is enforced, consider per-file ignores instead of changing the signature.

benchmarks/600.linearalgebra/601.matmul/input.py (1)

1-7: Matmul input generator matches project conventions

The size_generators mapping and generate_input() return shape are in line with other 600.* benchmarks (size + deterministic seed). Unused parameters are by design for the common interface; no change needed unless you want to quiet Ruff via ignores.

benchmarks/100.webapps/110.dynamic-html/input.py (1)

1-9: Dynamic-HTML input generator is simple and correct

generate_input() correctly maps the logical size to random_len and keeps the standard SEBS signature. Given inputs are framework-controlled, relying on size_generators[size] without extra validation is fine. Unused parameters are expected for this shared interface; suppressing Ruff is preferable to changing the signature.

benchmarks/600.linearalgebra/601.matmul/python/function.py (1)

6-62: Use gpu_ms in the result and clean up unused locals in handler

Functionally this works, but in handler:

  • C_out and gpu_ms are unused, which Ruff rightfully flags.
  • Other 600.* benchmarks expose gpu_time in the measurement, so you’re missing potentially useful data here.

A minimal alignment with the rest of the suite would be:

-    matmul_begin = datetime.datetime.now()
-    C_out, gpu_ms = kernel_gemm(alpha, beta, C, A, B, reps=1)
-    matmul_end = datetime.datetime.now()
+    matmul_begin = datetime.datetime.now()
+    _C_out, gpu_ms = kernel_gemm(alpha, beta, C, A, B, reps=1)
+    matmul_end = datetime.datetime.now()
@@
-        "measurement": {
-            "generating_time": matrix_generating_time,
-            "compute_time": matmul_time,
-        },
+        "measurement": {
+            "generating_time": matrix_generating_time,
+            "compute_time": matmul_time,
+            "gpu_time": gpu_ms,
+        },

Also, the seed variable computed inside the "seed" in event branch is currently unused; you can either remove that block or actually plumb the seed into tensor initialization if you plan to introduce randomness later.

benchmarks/400.inference/413.recommendation/input.py (1)

1-30: Recommendation input wiring looks coherent

buckets_count(), upload_files(), and generate_input() form a consistent trio: model goes to bucket 0, requests to bucket 1, and the returned cfg["object"] / cfg["bucket"] structure matches the usual pattern for inference benchmarks. Unused parameters in generate_input() are acceptable here because of the shared interface across benchmarks.

benchmarks/600.linearalgebra/603.jacobi2d/python/function.py (1)

6-72: Jacobi2D kernel looks good; tidy up unused outputs and dead seed logic

The Jacobi2D kernel and timing look sound, including the warmup and CUDA event usage. Two minor cleanups would make this sharper:

  • A_out and B_out from kernel_jacobi2d() are never used; prefix them with underscores to satisfy Ruff:

-    matmul_begin = datetime.datetime.now()
-    A_out, B_out, gpu_ms = kernel_jacobi2d(A, B, iters=50)
-    matmul_end = datetime.datetime.now()
+    matmul_begin = datetime.datetime.now()
+    _A_out, _B_out, gpu_ms = kernel_jacobi2d(A, B, iters=50)
+    matmul_end = datetime.datetime.now()

  • The seed computed in the "seed" in event branch is currently unused. Either remove that block or, if you plan to introduce randomness later, thread the seed into initialization to match other benchmarks.

benchmarks/400.inference/412.language-bert/input.py (2)

9-15: Object key construction in upload_files

The traversal and relative-key logic look fine. If you want slightly cleaner keys at the root level, you could special-case prefix == "." to avoid prefixes like "./file", but this is cosmetic and safe to defer.

18-33: Mark unused parameters in generate_input to satisfy Ruff

size, output_paths, and nosql_func are unused but required by the common input-generator signature, hence Ruff’s ARG001 warnings. Renaming them with a leading underscore keeps the API shape while silencing lint.

-def generate_input(
-    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
-):
+def generate_input(
+    data_dir,
+    _size,
+    benchmarks_bucket,
+    input_paths,
+    _output_paths,
+    upload_func,
+    _nosql_func,
+):
benchmarks/400.inference/413.image-classification/input.py (3)

10-15: Rename unused dirs loop variable

dirs is unused in the os.walk loop and triggers Ruff B007. Renaming to _dirs (or _) keeps intent clear and silences the warning.

-    for root, dirs, files in os.walk(data_dir):
+    for root, _dirs, files in os.walk(data_dir):

You may also consider normalizing keys so root-level files don’t get a "./" prefix, but that is purely cosmetic.


18-26: Docstring content is misleading for this benchmark

The free‑standing triple‑quoted comment still refers to a “compression test”, which doesn’t match this image‑classification benchmark. Consider updating or removing it to avoid confusion for future maintainers.


29-51: Unused parameters and workload-size handling in generate_input

  • size, output_paths, and nosql_func are unused (Ruff ARG001). As with other inputs, they exist for interface compatibility, so renaming them with leading underscores is a simple fix.
  • The function currently always selects the first image from val_map.txt; this is fine for a simple microbenchmark, but if you intend different workloads for different size values, you could extend this later.
-def generate_input(
-    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
-):
+def generate_input(
+    data_dir,
+    _size,
+    benchmarks_bucket,
+    input_paths,
+    _output_paths,
+    upload_func,
+    _nosql_func,
+):
benchmarks/500.scientific/5xx.compute_jax_npbench/python/function.py (1)

24-62: Clarify event size expectation and jax.device_get exception handling

handler assumes event["size"] is always present: size, M, and N are only defined under the "size" in event branch but are later used unconditionally (including in the returned payload). This matches patterns in other NPBench handlers, but it does mean a missing "size" key would raise at runtime. If the driver always supplies size, that’s fine; otherwise you may want an explicit assertion or default.

The broad try/except Exception: pass around jax.device_get also mirrors existing code but triggers Ruff S110/BLE001. If you want to keep lint clean, consider either narrowing the exception to the specific JAX error types you expect, or adding an inline # noqa: BLE001,S110 with a short comment explaining why silent failure on host transfer is acceptable here.
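For illustration, the two suggestions could take roughly this shape (read_size and fetch_to_host are hypothetical helpers; the noqa rationale is an assumption about intent):

import jax

def read_size(event):
    # Make the size requirement explicit instead of failing later with a KeyError
    assert "size" in event, "driver must supply 'size'"
    return int(event["size"])

def fetch_to_host(device_array):
    # Hypothetical helper: pull results to host, tolerating transfer failures
    try:
        return jax.device_get(device_array)
    except Exception:  # noqa: BLE001, S110 -- silent failure on host transfer is acceptable here
        return None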

benchmarks/400.inference/413.recommendation/python/function.py (3)

47-52: Unreachable return in _select_device

Given the explicit raise RuntimeError("CUDA is not available"), the final return torch.device("cpu") is never reached. If the intent is to enforce CUDA for this benchmark, you can safely drop that line; if you want a CPU fallback for local testing, you’ll need to reorder the logic accordingly.


54-82: Temporary model file handling and unused MODEL_CACHE

_load_model currently:

  • Ensures MODEL_CACHE exists but never actually uses it to store the model.
  • Writes the checkpoint to /tmp/<uuid>-dlrm_tiny.pt and deletes it after loading.

This is functionally fine for a benchmark, but:

  • You can either remove MODEL_CACHE and the os.makedirs call, or start writing tmp_path into that directory if you want a persistent on-disk cache.
  • Ruff flags hard-coded /tmp as S108; if you care about linting/hardening, consider using tempfile.NamedTemporaryFile(delete=False) or TemporaryDirectory instead, as sketched below.
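A hedged sketch of the tempfile-based alternative (checkpoint_bytes and the torch.load call are illustrative placeholders, not the benchmark's exact code):

import os
import tempfile

import torch

def load_checkpoint_from_bytes(checkpoint_bytes):
    # Write to a managed temporary file instead of a hand-built /tmp path
    with tempfile.NamedTemporaryFile(suffix="-dlrm_tiny.pt", delete=False) as tmp:
        tmp.write(checkpoint_bytes)
        tmp_path = tmp.name
    try:
        return torch.load(tmp_path, map_location="cpu")
    finally:
        os.remove(tmp_path)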

97-144: Handler event assumptions, /tmp usage, and zip(strict=)

A few small points in handler:

  • Event shape: bucket, model_prefix, requests_prefix, and requests_key are all fetched via nested .get calls and assumed to be present. If the harness always provides them, that’s fine; otherwise, you may want an assertion or clearer error if any are None.
  • Temporary request file: req_path lives under /tmp, which triggers Ruff S108. For stricter hygiene, you could switch to tempfile.NamedTemporaryFile or similar instead of manual /tmp paths.
  • Iteration over predictions: Ruff B905 recommends making the zip strict so mismatched lengths don’t silently truncate:
-    for req, score in zip(payloads, scores):
+    for req, score in zip(payloads, scores, strict=True):

Also note that your aggregate download_time and compute_time already include the model’s download and processing times, which are additionally exposed as model_download_time and model_time. That’s fine as long as downstream consumers expect the aggregation to double-count those phases.

benchmarks/600.linearalgebra/604.cholesky/python/function.py (2)

18-31: Confirm intended scaling of Cholesky repetitions

kernel_cholesky runs Cholesky A.size(0) times inside the timed region. For large N, this deliberately amplifies GPU work, which is reasonable for a microbenchmark, but it does mean total runtime grows with an extra factor of N on top of the cost of a single factorization. Just confirm this repetition count matches your intended workload.


34-61: Suppress unused-L lint in handler

In handler, the unpacked L from kernel_cholesky isn’t used, which triggers Ruff RUF059. You can keep the call exactly as-is and simply mark the variable as intentionally unused.

-    L, gpu_ms = kernel_cholesky(A)
+    _L, gpu_ms = kernel_cholesky(A)

Also, the computed seed variable is currently unused beyond seeding Python’s random; if you don’t need reproducible PyTorch RNG, you can simplify that block or extend it to seed torch as well.
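If reproducible PyTorch RNG is desired, the seed block could be extended along these lines (a sketch only; whether torch seeding matters depends on whether the kernel ever draws random tensors):

import random

import torch

def apply_seed(event):
    if "seed" in event:
        seed = int(event["seed"])
        random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)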

benchmarks/600.linearalgebra/605.lu/python/function.py (1)

53-59: Clarify unused outputs in handler (and dead seed logic)

handler unpacks B, gpu_ms = kernel(A) but never uses B, and the local seed derived from event["seed"] is also unused. This is harmless but slightly confusing in a benchmarking context.

You can make the intent explicit and silence linters by ignoring the unused result:

-    if "seed" in event:
-        import random
-
-        random.seed(event["seed"])
-
-        seed = event.get("seed", 42)
-        seed = int(seed)
+    if "seed" in event:
+        import random
+
+        # Keep seeding here only if future randomness will depend on it
+        random.seed(int(event["seed"]))
@@
-    comp_begin = datetime.datetime.now()
-    B, gpu_ms = kernel(A)
+    comp_begin = datetime.datetime.now()
+    _, gpu_ms = kernel(A)

If you don’t plan to use the Python RNG at all, you can drop the whole if "seed" in event block.

Also applies to: 66-66

benchmarks/400.inference/412.language-bert/python/function.py (1)

133-144: Optional: make the zip contract explicit with strict=

for sentence, probs in zip(sentences, probabilities): assumes sentences and probabilities always have the same length (which they should). To make this explicit and avoid silent truncation if shapes ever diverge, you could use:

-    for sentence, probs in zip(sentences, probabilities):
+    for sentence, probs in zip(sentences, probabilities, strict=True):

This will raise if something upstream goes wrong, which is often preferable for benchmarks.

Comment on lines +39 to +42
- name: Black (check)
  run: |
    . python-venv/bin/activate
    black benchmarks --check --config .black.toml

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify Black is only intended to run on benchmarks/.

The Black check only targets the benchmarks directory, but the pre-commit configuration in .pre-commit-config.yaml applies to both sebs/ and benchmarks/. Consider whether the workflow should also check sebs/ for consistency.

Run the following script to check if sebs/ contains Python files that should be formatted:


🏁 Script executed:

#!/bin/bash
# Description: Check if sebs/ directory contains Python files

fd -e py . sebs/ --max-results 10

Length of output: 198


I've confirmed that sebs/ contains Python files. Now I need to verify the pre-commit configuration to see if it actually applies Black to both directories.


---

🏁 Script executed:

```shell
#!/bin/bash
# Description: Check .pre-commit-config.yaml for Black configuration scope

cat .pre-commit-config.yaml | grep -A 20 "black"

Length of output: 506


Update the Black check in the lint workflow to include the sebs/ directory.

The pre-commit configuration applies Black to files: ^(sebs/|benchmarks/), but the workflow only checks benchmarks/. The sebs/ directory contains multiple Python files and should be included in the CI check for consistency.

Update line 42 to:

black sebs benchmarks --check --config .black.toml
🤖 Prompt for AI Agents
.github/workflows/lint.yml around lines 39 to 42: the Black check currently only
runs against the benchmarks/ directory while pre-commit targets both sebs/ and
benchmarks/; update the workflow to run Black against both directories by
changing the command to invoke Black on sebs and benchmarks with --check and the
.black.toml config so CI matches pre-commit.

Comment on lines 47 to +57
     if consecutive_failures != 5:
-        with open('/tmp/data.csv', 'w', newline='') as csvfile:
-            writer = csv.writer(csvfile, delimiter=',')
-            writer.writerow(["id", "client_send", "client_rcv"])
+        with open("/tmp/data.csv", "w", newline="") as csvfile:
+            writer = csv.writer(csvfile, delimiter=",")
+            writer.writerow(["id", "client_send", "client_rcv"])
             for row in times:
                 writer.writerow(row)

         client = storage.storage.get_instance()
-        filename = 'results-{}.csv'.format(request_id)
-        key = client.upload(output_bucket, os.path.join(output_prefix, filename), '/tmp/data.csv')
+        filename = "results-{}.csv".format(request_id)
+        key = client.upload(output_bucket, os.path.join(output_prefix, filename), "/tmp/data.csv")

-    return { 'result': key }
+    return {"result": key}

⚠️ Potential issue | 🔴 Critical

Fix uninitialized key when connection setup fails

If the UDP connection fails 5 times, consecutive_failures becomes 5, the loop breaks, and the if consecutive_failures != 5: block is skipped. In that path key is never assigned, but you still return {"result": key}, which will raise an UnboundLocalError.

Initialize key before the conditional (or add an else branch) so the failure path is well-defined:

-    server_socket.close()
-
-    if consecutive_failures != 5:
+    server_socket.close()
+
+    key = None
+    if consecutive_failures != 5:
         with open("/tmp/data.csv", "w", newline="") as csvfile:
             writer = csv.writer(csvfile, delimiter=",")
             writer.writerow(["id", "client_send", "client_rcv"])
             for row in times:
                 writer.writerow(row)
 
         client = storage.storage.get_instance()
         filename = "results-{}.csv".format(request_id)
         key = client.upload(output_bucket, os.path.join(output_prefix, filename), "/tmp/data.csv")
 
     return {"result": key}

The Ruff S108 warnings about /tmp/data.csv are expected in this serverless benchmark context; using a fixed path under /tmp is generally fine here, so I’d treat that as optional to change.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    key = None
    if consecutive_failures != 5:
        with open("/tmp/data.csv", "w", newline="") as csvfile:
            writer = csv.writer(csvfile, delimiter=",")
            writer.writerow(["id", "client_send", "client_rcv"])
            for row in times:
                writer.writerow(row)

        client = storage.storage.get_instance()
        filename = "results-{}.csv".format(request_id)
        key = client.upload(output_bucket, os.path.join(output_prefix, filename), "/tmp/data.csv")

    return {"result": key}
🧰 Tools
🪛 Ruff (0.14.5)

47-47: Probable insecure usage of temporary file or directory: "/tmp/data.csv"

(S108)


55-55: Probable insecure usage of temporary file or directory: "/tmp/data.csv"

(S108)

🤖 Prompt for AI Agents
In benchmarks/000.microbenchmarks/020.network-benchmark/python/function.py
around lines 46-57, the variable `key` can be uninitialized if
`consecutive_failures == 5` (the upload block is skipped) causing an
UnboundLocalError on `return {"result": key}`; fix by defining `key` before the
conditional (e.g., `key = None` or an empty string) or add an `else` branch that
sets `key` to a sentinel/error value, then ensure the returned value is always
defined and meaningful for the failure path.

Comment on lines +40 to +41
# Number of ints in the working set
n = max(1, working_set_bytes // 4)

⚠️ Potential issue | 🟠 Major

Fix working-set size calculation to match actual tensor memory usage

next_idx is allocated as torch.long, which is typically 8 bytes per element, but n is derived assuming 4 bytes:

# Number of ints in the working set
n = max(1, working_set_bytes // 4)

This means the actual working set is about twice as large as requested, which distorts the cache‑latency measurements.

You can compute n from the actual element size instead:

-    # Number of ints in the working set
-    n = max(1, working_set_bytes // 4)
+    # Number of indices in the working set, based on the actual element size of torch.long
+    elem_bytes = torch.tensor(0, dtype=torch.long).element_size()
+    n = max(1, working_set_bytes // elem_bytes)

This keeps the benchmark’s working_set_bytes parameter consistent with the true memory footprint across platforms.

🤖 Prompt for AI Agents
benchmarks/000.microbenchmarks/050.gpu-cache-latency/python/function.py around
lines 40 to 41, the working-set size n is computed using 4 bytes per element but
next_idx is a torch.long (8 bytes on typical platforms), so the allocated memory
is roughly double the requested size; change the calculation to derive n from
the actual tensor element size (query the dtype element size for torch.long at
runtime and use working_set_bytes // element_size), keeping the max(1, ...)
guard so the benchmark's working_set_bytes parameter matches the real memory
footprint.

Comment on lines +1 to +2
torch
numpy

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Pin dependency versions for reproducible microbenchmarks.

Unpinned torch and numpy dependencies prevent reproducible benchmark results. Since this is a performance microbenchmark measuring GPU cache latency, version differences can significantly impact the measurements.

Apply this diff to pin specific versions:

-torch
-numpy
+torch==2.5.1
+numpy==2.0.2

Verify the latest stable versions and check for security advisories:


🏁 Script executed:

#!/bin/bash
# Check PyPI for latest versions
echo "=== PyTorch latest version ==="
curl -s https://pypi.org/pypi/torch/json | jq -r '.info.version'

echo "=== NumPy latest version ==="
curl -s https://pypi.org/pypi/numpy/json | jq -r '.info.version'

# Check for security vulnerabilities
echo "=== PyTorch security advisories ==="
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "torch") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

echo "=== NumPy security advisories ==="
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "numpy") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 2330


Pin dependency versions to secure, stable releases, not outdated vulnerable versions.

The suggested versions in the original review are problematic. PyTorch 2.5.1 is vulnerable to a CRITICAL remote code execution vulnerability in torch.load (CVE affecting versions < 2.6.0, patched in 2.6.0). Pinning to this version would actually introduce known security risks.

Pin to secure, current versions instead:

-torch
-numpy
+torch==2.9.1
+numpy==2.3.5

These versions are current, stable, and free from known critical vulnerabilities. The older versions suggested elsewhere contain exploitable security issues.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-torch
-numpy
+torch==2.9.1
+numpy==2.3.5
🤖 Prompt for AI Agents
In benchmarks/000.microbenchmarks/050.gpu-cache-latency/python/requirements.txt
(lines 1-2) the dependencies are unpinned which risks installing
outdated/vulnerable releases (notably torch versions <2.6.0 are vulnerable to
RCE via torch.load); update the file to pin to secure, supported releases — at
minimum use torch >= 2.6.0 (or the latest stable torch >=2.6.x) and pin numpy to
a current stable release (e.g., numpy >= 1.25.0 or the project’s chosen
supported minor), and include exact version specifiers (== or >= with upper
bounds) so CI installs known-safe versions and rerun tests to confirm
compatibility.

Comment on lines +18 to +26
"""
Generate test, small and large workload for compression test.
:param data_dir: directory where benchmark data is placed
:param size: workload size
:param input_buckets: input storage containers for this benchmark
:param output_buckets:
:param upload_func: upload function taking three params(bucket_idx, key, filepath)
'''
def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func):
"""

⚠️ Potential issue | 🟡 Minor

Move docstring inside the function.

The docstring is currently floating between functions rather than being placed inside generate_input. Python docstrings must be the first statement after a function definition to be recognized by documentation tools.

Apply this diff to correctly place the docstring:

-
-"""
-    Generate test, small and large workload for compression test.
-
-    :param data_dir: directory where benchmark data is placed
-    :param size: workload size
-    :param input_buckets: input storage containers for this benchmark
-    :param output_buckets:
-    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
-"""
-
-
 def generate_input(
     data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
 ):
+    """
+    Generate test, small and large workload for compression test.
+
+    :param data_dir: directory where benchmark data is placed
+    :param size: workload size
+    :param input_buckets: input storage containers for this benchmark
+    :param output_buckets:
+    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
+    """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def generate_input(
    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
):
    """
    Generate test, small and large workload for compression test.

    :param data_dir: directory where benchmark data is placed
    :param size: workload size
    :param input_buckets: input storage containers for this benchmark
    :param output_buckets:
    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
    """
🤖 Prompt for AI Agents
In benchmarks/300.utilities/311.compression/input.py around lines 18 to 26, the
module-level string currently between functions should be moved so it becomes
the docstring for the generate_input function; open the generate_input
definition and place that triple-quoted string immediately as the first
statement inside the function body (remove the floating string from its current
position), preserving the text and indentation so Python and documentation tools
recognize it as the function docstring.

@@ -0,0 +1 @@
jax[cuda12]

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Pin the JAX version for reproducible builds.

The unpinned jax[cuda12] dependency can lead to non-reproducible benchmark results, as different JAX versions may introduce performance changes or breaking API changes.

Apply this diff to pin a specific version:

-jax[cuda12]
+jax[cuda12]==0.4.35

Verify the latest stable JAX version and check for security advisories:


🏁 Script executed:

#!/bin/bash
# Check PyPI for the latest JAX version
curl -s https://pypi.org/pypi/jax/json | jq -r '.info.version'

# Check for security vulnerabilities
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "jax") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 55


Update JAX version pin to a more recent stable release.

Pinning JAX for reproducible builds is essential. However, the suggested version 0.4.35 is significantly outdated. The latest stable JAX version is 0.8.1, and there are no known security vulnerabilities. Update the pin to:

-jax[cuda12]
+jax[cuda12]==0.8.1

Verify this version is compatible with your CUDA 12 environment and benchmark requirements before merging.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-jax[cuda12]
+jax[cuda12]==0.8.1
🤖 Prompt for AI Agents
In
benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/requirements.txt
lines 1-1, the JAX dependency is unpinned; update the pin to the newer stable
release by replacing the current unpinned entry with the explicit version that
targets CUDA 12 (e.g., jax[cuda12]==0.8.1). Ensure the pinned version is
compatible with the target CUDA 12 toolchain and any other benchmark
dependencies before committing.

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify torch version currency and security.

The pinned version approach is good for reproducibility. However, torch==2.4.1 differs from other benchmarks using torch==2.2.2 and torch==2.5.1. Ensure this version is intentional and verify it's current and secure.


🏁 Script executed:

#!/bin/bash
# Check if torch 2.4.1 is current
echo "=== PyTorch latest version ==="
curl -s https://pypi.org/pypi/torch/json | jq -r '.info.version'

# Check for security vulnerabilities affecting 2.4.1
echo "=== PyTorch security advisories ==="
gh api graphql -f query='
{
  securityVulnerabilities(first: 10, ecosystem: PIP, package: "torch") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 1635


Update torch version to address critical security vulnerabilities.

torch==2.4.1 contains at least two CRITICAL security vulnerabilities with available patches:

  1. "torch.load with weights_only=True leads to remote code execution" (patched in 2.6.0)
  2. "Improper Resource Shutdown or Release vulnerability" (patched in 2.8.0)

Additionally, the latest stable version is 2.9.1. Consider updating to at least 2.6.0 or later to address these security issues, while maintaining consistency with other benchmarks in the repository.

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/601.matmul/python/requirements.txt around lines
1 to 1, the pinned torch==2.4.1 has known critical security issues; update the
requirement to a patched release (minimum torch>=2.6.0, preferably torch==2.9.1
to match latest stable) by replacing the version specifier accordingly, then run
project tests/benchmarks and CI to verify compatibility, update any related
requirements/lock files across the repo to maintain consistency, and regenerate
dependency locks or Docker images if present.

Comment on lines +6 to +59
def initialize_torch(N, dtype=torch.float32, device="cuda", seed=42):
    if seed is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    alpha = torch.randn((), dtype=dtype, device=device)
    x = torch.randn(N, dtype=dtype, device=device)
    y = torch.randn(N, dtype=dtype, device=device)
    return alpha, x, y


def kernel_axpy(alpha, x, y, reps=100):
    torch.cuda.synchronize()
    _ = alpha * x + y  # warmup
    torch.cuda.synchronize()

    start_evt = torch.cuda.Event(enable_timing=True)
    end_evt = torch.cuda.Event(enable_timing=True)
    start_evt.record()
    for _ in range(reps):
        y = alpha * x + y
    end_evt.record()
    torch.cuda.synchronize()
    gpu_ms = float(start_evt.elapsed_time(end_evt))
    return y, gpu_ms


def handler(event):
    size = event.get("size")
    if "seed" in event:
        import random

        random.seed(event["seed"])

        seed = event.get("seed", 42)
        seed = int(seed)

    gen_begin = datetime.datetime.now()
    alpha, x, y = initialize_torch(size, dtype=torch.float32, device="cuda", seed=seed)
    gen_end = datetime.datetime.now()

    comp_begin = datetime.datetime.now()
    y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
    comp_end = datetime.datetime.now()

    gen_us = (gen_end - gen_begin) / datetime.timedelta(microseconds=1)
    comp_us = (comp_end - comp_begin) / datetime.timedelta(microseconds=1)

    return {
        "measurement": {
            "generating_time": gen_us,
            "compute_time": comp_us,
            "gpu_time": gpu_ms,
        }
    }

⚠️ Potential issue | 🔴 Critical

Fix potential seed UnboundLocalError and unused y_out in handler

In handler, seed is only defined inside the "seed" in event branch, but it’s always used in the call to initialize_torch(...). If event ever omits "seed", this will raise an UnboundLocalError.

You can mirror the safer pattern used in other 600.* handlers by providing a default:

 def handler(event):
     size = event.get("size")
     if "seed" in event:
         import random
 
         random.seed(event["seed"])
 
         seed = event.get("seed", 42)
         seed = int(seed)
+    else:
+        seed = 42
@@
-    comp_begin = datetime.datetime.now()
-    y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
-    comp_end = datetime.datetime.now()
+    comp_begin = datetime.datetime.now()
+    _y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
+    comp_end = datetime.datetime.now()

This both removes the potential runtime error and satisfies Ruff’s warning about the unused y_out.

🧰 Tools
🪛 Ruff (0.14.5)

47-47: Unpacked variable y_out is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/602.axpy/python/function.py around lines 6 to
59, handler() may reference seed without it being defined if "seed" is absent
and also y_out is unused; initialize seed with a default before the conditional
(e.g., seed = int(event.get("seed", 42))) or set seed = 42 then override if
present, and suppress the unused variable by assigning the kernel result to _
(e.g., _, gpu_ms = kernel_axpy(...)) or reuse y_out into y to remove the Ruff
warning.

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

CRITICAL: PyTorch 2.4.1 has multiple severe security vulnerabilities.

Static analysis has identified 5 critical security vulnerabilities in torch 2.4.1 (see the OSV Scanner findings below).

Upgrade to the latest patched version of PyTorch to address these vulnerabilities.


🌐 Web query:

What is the latest secure version of PyTorch and when was 2.4.1 patched?

💡 Result:

  • Latest stable PyTorch: 2.7.0 (PyTorch 2.7 release). [1]

  • 2.4.1 patch: the Debian package incorporating fixes was published as version 2.4.1-1 (uploaded Sep 29, 2024). [2]

  • Relevant security advisory: the remote-code-execution issue in torch.load (patched upstream in the 2.6.0+ line) is documented in the PyTorch GHSA/CVE advisories. [3]

Sources:
[1] PyTorch 2.7 release.
[2] Debian upload / changelog showing pytorch 2.4.1-1 (Sep 29, 2024).
[3] PyTorch security advisory / PYSEC entry (GHSA-53q9-r3pm-6pq6 / CVE-2025-32434) and related CVE discussion.


I need to gather more specific information about the vulnerabilities and whether they affect PyTorch 2.4.1 directly.

I need to verify the other vulnerabilities mentioned in the review. Let me search for them specifically.

Upgrade to PyTorch 2.6.0 or later to address a critical RCE vulnerability.

PyTorch 2.4.1 is affected by CVE-2025-32434, a critical Remote Command Execution vulnerability in the torch.load() function when used with weights_only=True. This vulnerability has been patched in PyTorch version 2.6.0 or higher. The vulnerability received a CVSS score of 9.8 (Critical), and any application that loads PyTorch models could be at risk if an attacker can supply a malicious .pt model file.

🧰 Tools
🪛 OSV Scanner (2.2.4)

[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2024-259)


[CRITICAL] 1-1: torch 2.4.1: undefined

(PYSEC-2025-41)


[CRITICAL] 1-1: torch 2.4.1: PyTorch susceptible to local Denial of Service

(GHSA-3749-ghw9-m3mg)


[CRITICAL] 1-1: torch 2.4.1: PyTorch: torch.load with weights_only=True leads to remote code execution

(GHSA-53q9-r3pm-6pq6)


[CRITICAL] 1-1: torch 2.4.1: PyTorch Improper Resource Shutdown or Release vulnerability

(GHSA-887c-mr87-cxwp)

@@ -0,0 +1 @@
torch==2.4.1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify PyTorch version validity and security advisories.

Please ensure that torch==2.4.1 is a valid version and check for any known security vulnerabilities.

Run the following script to verify the PyTorch version:


🏁 Script executed:

#!/bin/bash
# Description: Verify torch==2.4.1 exists and check for security advisories

# Check PyPI for available torch versions
curl -s https://pypi.org/pypi/torch/json | jq -r '.releases | keys[]' | grep "^2\.4\."

# Check for security advisories
gh api graphql -f query='
{
  securityVulnerabilities(first: 10, ecosystem: PIP, package: "torch") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 1574


Upgrade torch to address multiple security vulnerabilities.

torch==2.4.1 is a valid version, but it has three known security vulnerabilities:

  1. CRITICAL: Remote code execution via torch.load with weights_only=True (affecting versions < 2.6.0) — fixed in 2.6.0
  2. MODERATE: Improper resource shutdown or release (affecting versions ≤ 2.7.1) — fixed in 2.8.0
  3. LOW: Local denial of service (affecting versions < 2.7.1-rc1) — fixed in 2.7.1-rc1

Upgrade to torch 2.6.0 or later to address the critical RCE vulnerability, or 2.8.0 to address all known vulnerabilities.

🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/605.lu/python/requirements.txt lines 1-1 the
project pins torch==2.4.1 which contains multiple security vulnerabilities
(including a CRITICAL RCE fixed in 2.6.0); update the requirement to a safe
version (recommend torch==2.8.0 to cover all listed fixes) by replacing the
version pin, then regenerate any lock files or dependency manifests and run the
test suite/CI to verify compatibility and rebuild any containers or artifacts
that install dependencies.

