Jessie/microbenchmarks #262
base: master
Conversation
Feature/bert inference
Hotfix/code quality on benchmarks
sync: image-classification requirements + add 605.lu benchmark - Resolve conflicts in 413.image-classification/python/requirements* - Drop py3.6/py3.7 variants removed upstream; keep/update 3.8–3.11 - Add new 600.linearalgebra/605.lu benchmark (config, input, function, reqs) - Rename local_deployment.tmp -> 600.linearalgebra/605.lu/config.json - Update local_deployment.json; add out_benchmark*.json; update out_storage.json
Removed configuration details for MinIO and ScyllaDB.
Removed sensitive SSH private key from eval command.
Merge benchmarks by Russell and Yuxuan into Development
This reverts commit 4fca4aa.
added recommender benchmark
Walkthrough

Removes CircleCI config, adds a GitHub Actions lint workflow, updates submodule URL and mypy settings, introduces pre-commit and VSCode settings, and adds many new/updated benchmarks (GPU pointer-chase, ONNX inference, JAX workloads, PyTorch linear-algebra plus supporting configs, scripts, and requirements). Many files also have stylistic quote/formatting changes.

Changes
Sequence Diagram(s)

sequenceDiagram
actor GitHub_Actions
participant Repo
participant PythonEnv
rect rgb(240, 245, 255)
Note over GitHub_Actions,Repo: Lint workflow (new)
GitHub_Actions->>Repo: checkout
GitHub_Actions->>PythonEnv: setup python, cache deps
PythonEnv->>PythonEnv: install requirements via install.py
PythonEnv->>Repo: run Black --check, run Flake8 -> artifact
PythonEnv-->>GitHub_Actions: upload flake report
end
sequenceDiagram
participant Handler
participant Storage
participant ONNX
participant GPU
rect rgb(235, 245, 235)
Note over Handler,Storage: ONNX inference (language/image)
Handler->>Storage: download model archive / inputs
Storage-->>Handler: model archive, input file(s)
Handler->>Handler: extract if needed, _ensure_model (lazy)
Handler->>ONNX: create/session load (CUDA provider)
ONNX->>GPU: initialize on device (if CUDA)
Handler->>ONNX: prepare inputs (tokenize / preprocess)
ONNX-->>Handler: logits / outputs
Handler->>Handler: postprocess (softmax, top-k, map labels)
Handler-->>Storage: optionally upload results
end
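
A minimal Python sketch of the inference flow in the second diagram, assuming onnxruntime with a CUDA provider; the handler signature and the _ensure_model helper are illustrative, not the PR's actual code:

import numpy as np
import onnxruntime as ort

_session = None  # cached across warm invocations

def _ensure_model(model_path):
    # Lazily create the session so only cold starts pay for model load.
    global _session
    if _session is None:
        _session = ort.InferenceSession(
            model_path, providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
        )
    return _session

def handler(model_path, batch):
    # batch: a preprocessed numpy array matching the model's first input.
    session = _ensure_model(model_path)
    input_name = session.get_inputs()[0].name
    logits = session.run(None, {input_name: batch.astype(np.float32)})[0]
    # Postprocess: softmax + top-k, mirroring the last steps of the diagram.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    top5 = np.argsort(probs, axis=-1)[..., ::-1][..., :5]
    return {"top5": top5.tolist()}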
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:
add cache latency benchmarks
This document provides detailed instructions for running the GPU Cache Latency benchmark, including system requirements, build steps, and example output.
This readme provides detailed instructions for running the GPU Cache Latency benchmark, including system requirements, build steps, and example commands.
Actionable comments posted: 28
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
benchmarks/000.microbenchmarks/030.clock-synchronization/python/function.py (1)
35-75: Fix inconsistent failure handling after changing the timeout threshold

Inside the loop you now abort after 7 consecutive timeouts:

    if consecutive_failures == 7:
        print("Can't setup the connection")
        break

but after the loop you still use

    if consecutive_failures != 5:

to decide whether to write/upload results. This makes the "failure" else branch effectively unreachable, and even in the pure-failure case (7 timeouts, no data) you still write and upload a CSV (likely empty).

A cleaner approach is to drive this off whether you actually collected any measurements:

-    if consecutive_failures != 5:
+    if times:
         with open("/tmp/data.csv", "w", newline="") as csvfile:
             writer = csv.writer(csvfile, delimiter=",")
             writer.writerow(["id", "client_send", "client_rcv"])
             for row in times:
                 writer.writerow(row)
         client = storage.storage.get_instance()
         filename = "results-{}.csv".format(request_id)
         key = client.upload(output_bucket, os.path.join(output_prefix, filename), "/tmp/data.csv")
-    else:
-        key = None
+    else:
+        key = None

This restores the intended behavior: only upload when you have data; otherwise return key = None, regardless of how many consecutive failures occurred.
♻️ Duplicate comments (1)
benchmarks/400.inference/413.recommendation/python/requirements.txt.3.11 (1)
1-1: Verify PyTorch 2.2.2 for security vulnerabilities.

Same concern as requirements.txt.3.10: ensure torch 2.2.2 is still supported and free from known security vulnerabilities.
What security vulnerabilities exist for PyTorch 2.2.2?
🧹 Nitpick comments (38)
benchmarks/400.inference/413.image-classification/python/requirements.txt.3.9 (1)
1-2: Consider adding upper bounds to version constraints.

The dependencies numpy>=2.0 and pillow>=10.0 lack upper bounds, which could lead to unexpected breaking changes when new major versions are released. Consider using constraints like numpy>=2.0,<3.0 and pillow>=10.0,<11.0 for more predictable builds.

benchmarks/000.microbenchmarks/040.server-reply/input.py (1)
1-11: Input mapping is consistent; unused parameters are expected interface baggage
size_generators plus generate_input returning {"sleep": size_generators[size]} matches the patterns in other microbenchmarks and keeps the shared generate_input signature.

The unused parameters (data_dir, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func) are expected here to conform to the common interface; Ruff's ARG001 warnings can be ignored or, if you prefer, silenced by prefixing them with _ (e.g., _data_dir) without changing behavior.

benchmarks/100.webapps/120.uploader/input.py (1)
1-25: Uploader generate_input remains correct; unused args are for interface compatibility

The URL mapping and generate_input logic remain consistent with the rest of the suite: you set object.url from url_generators[size] and fill bucket.bucket / bucket.output from the provided arguments.

The extra parameters (data_dir, input_buckets, upload_func, nosql_func) are unused but needed for the common generate_input signature. If Ruff's ARG001 warnings become noisy, you can optionally prefix them with _ to keep linters happy without changing behavior.

benchmarks/200.multimedia/220.video-processing/python/function.py (1)
16-66: Video-processing logic is unchanged; consider silencing unused event args

The ffmpeg wrapper and handler flow (download → ffmpeg op → upload + timing) look unchanged and correct. A few minor points:

- The S603 subprocess warning is not concerning here: you invoke a fixed ffmpeg binary with shell=False, so there's no shell injection surface beyond normal path handling.
- The /tmp/... paths for intermediate files are standard for FaaS-style benchmarks; S108 is more of a generic hardening suggestion than a real issue in this context.
- To silence unused-argument warnings for event while keeping the operations interface (video, duration, event), you can rename the parameter:

-def to_gif(video, duration, event):
+def to_gif(video, duration, _event):
@@
-def watermark(video, duration, event):
+def watermark(video, duration, _event):

No behavioral change, but tools like Ruff/flake8 will stop flagging ARG001 here.
Also applies to: 73-84, 114-121
benchmarks/000.microbenchmarks/020.network-benchmark/input.py (1)
5-12: Network-benchmark generate_input matches the common interface

The returned structure under "bucket" ("bucket" and "output") is consistent with how the UDP benchmark handler consumes its configuration. The extra parameters (data_dir, size, input_paths, upload_func, nosql_func) are unused but required for the shared generate_input signature; if desired, prefix them with _ to quiet ARG001 warnings without changing behavior.

.github/workflows/lint.yml (2)
19-19: Consider pinning a more specific Python version.

Using python-version: '3.x' will install the latest Python 3 release, which may introduce variability across runs. Consider pinning to a specific minor version (e.g., '3.11' or '3.12') for reproducible builds.

Apply this diff to pin to a specific version:

-      python-version: '3.x'
+      python-version: '3.12'
25-25: Cache key includes PR-specific reference.

The cache key includes ${{ github.ref_name }}, which means each PR branch will have its own cache. This prevents cache reuse across PRs and may increase build times. Consider whether the cache should be shared across branches.

If cache sharing is desired, you could remove the ref-specific portion:

-          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}-${{ github.ref_name }}
+          key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
           restore-keys: |
             venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}-

However, keeping branch-specific caches can be beneficial if different branches have incompatible dependencies.
benchmarks/400.inference/413.image-classification/python/init.sh (1)
3-10: Quote variables in cp to avoid issues with spaces or globbing

cp ${path} ${DIR} will misbehave if either contains spaces or wildcard characters. Quoting is a low-cost hardening:

-    cp ${path} ${DIR}
+    cp "${path}" "${DIR}"

benchmarks/500.scientific/504.dna-visualisation/python/function.py (1)
20-36: Use a context manager when reading the downloaded file
data = open(download_path, "r").read() leaves the file handle to be closed by GC rather than deterministically.

Recommend:

    with open(download_path, "r") as f:
        data = f.read()

This avoids leaking file descriptors with essentially no cost.

benchmarks/400.inference/413.image-classification/python/package.sh (1)
benchmarks/400.inference/413.image-classification/python/package.sh (1)
1-32: Harden packaging script (shebang, quoting, and find usage)

Non-blocking, but a few small tweaks would make this script more robust:

- Add an explicit shebang so tools and CI know the target shell:

+#!/bin/bash
 # Stripping package code is based on https://github.com/ryfeus/lambda-packs repo

- Actually use PACKAGE_DIR and quote it, guarding cd:

-PACKAGE_DIR=$1
-echo "Original size $(du -sh $1 | cut -f1)"
+PACKAGE_DIR=$1
+echo "Original size $(du -sh "${PACKAGE_DIR}" | cut -f1)"
 CUR_DIR=$(pwd)
-cd $1
+cd "${PACKAGE_DIR}" || exit 1

- Likewise when returning:

-cd ${CUR_DIR}
+cd "${CUR_DIR}" || exit 1

- Avoid xargs filename pitfalls by using -exec:

-find -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
-find -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" | xargs strip
+find . -name "*.so" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -exec strip {} +
+find . -name "*.so.*" -not -path "*/PIL/*" -not -path "*/Pillow.libs/*" -not -path "*libgfortran*" -exec strip {} +

benchmarks/400.inference/411.image-recognition/python/function.py (1)
28-41: Tidy up file/resource handling and inference context

The handler logic looks correct and the timing breakdown is clear. A few non-blocking improvements:

- Avoid leaving file descriptors to GC by using context managers:

    # At module init
    with open(os.path.join(SCRIPT_DIR, "imagenet_class_index.json"), "r") as f:
        class_idx = json.load(f)
    idx2label = [class_idx[str(k)][1] for k in range(len(class_idx))]

- Likewise for the image:

    process_begin = datetime.datetime.now()
    with Image.open(image_path) as input_image:
        input_tensor = preprocess(input_image)
        input_batch = input_tensor.unsqueeze(0)
        with torch.no_grad():
            output = model(input_batch)
        _, index = torch.max(output, 1)
        ret = idx2label[index]
    process_end = datetime.datetime.now()

Using torch.no_grad() also avoids unnecessary grad tracking during inference, which can slightly reduce memory and latency.

Also applies to: 65-80
benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/function.py (2)
13-36: Custom conv2d implementation is consistent for NHWC, stride 1

The explicit lax.dynamic_slice + lax.scan implementation matches a valid "valid" 2D convolution for NHWC with stride 1 and square kernels and should be fine for microbenchmarks, even if not as idiomatic as lax.conv_general_dilated.

If you ever want to compare against a reference implementation, you could add an un-jitted version using lax.conv_general_dilated purely for validation (not necessarily for benchmarking), along the lines of the sketch below.
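
A possible shape for that validation helper, as a sketch only: it assumes NHWC inputs, HWIO weights, stride 1 and "valid" padding, and the name reference_conv2d is made up for illustration.

import jax.numpy as jnp
from jax import lax

def reference_conv2d(x, w):
    # x: (N, H, W, C_in), w: (KH, KW, C_in, C_out); stride 1, "valid" padding.
    return lax.conv_general_dilated(
        x,
        w,
        window_strides=(1, 1),
        padding="VALID",
        dimension_numbers=("NHWC", "HWIO", "NHWC"),
    )

# Hypothetical check against the benchmark's own conv2d:
# assert jnp.allclose(conv2d(x, w), reference_conv2d(x, w), atol=1e-5)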
68-79: Initialization is fine for benchmarks, but note the input name and NumPy RNG

Using NumPy's default_rng(42) and dense random inputs/weights is appropriate for deterministic microbenchmarks. One minor nit: input shadows the built-in name; consider renaming to x or images if you touch this again.

benchmarks/400.inference/413.image-classification/config.json (1)
1-6: Config is minimal and consistent; just ensure 512 MB is sufficient

The config is straightforward and matches the Python-based image-classification benchmark using storage:

    {
        "timeout": 60,
        "memory": 512,
        "languages": ["python"],
        "modules": ["storage"]
    }

This looks fine; just confirm from runs that 512 MB memory is enough for PyTorch + model + dependencies in your target serverless platform.

benchmarks/400.inference/412.language-bert/python/requirements.txt.3.9 (1)
benchmarks/400.inference/412.language-bert/python/requirements.txt.3.9 (1)
1-3: Pinned deps look reasonable; ensure 3.9 compatibility and keep variants in sync.

The pinned versions make sense for a GPU BERT pipeline, but please double-check that this exact trio works with your Python 3.9 runtime and GPU drivers, and that it intentionally matches the base requirements.txt for this benchmark so they don't drift over time.

benchmarks/600.linearalgebra/603.jacobi2d/python/requirements.txt (1)
1-1: Torch pin is clear; consider alignment with other torch-based benchmarks.

Pinning torch==2.4.1 is good for reproducibility. Since other new benchmarks (e.g., image-classification) pin a different Torch version, please confirm there's a deliberate reason (e.g., tested/perf sweet spot here) and that all target Python/CUDA stacks support this version. If there's no strong reason, aligning Torch versions across the linear algebra benchmarks could simplify maintenance.

benchmarks/400.inference/412.language-bert/python/requirements.txt (1)
1-3: Base requirements mirror the 3.9 variant; watch for drift.

These pins match the .3.9 file, which is good for consistency. Just be mindful that having two copies means future updates need to touch both; if you later add more per-version files, you may want a simple generation script or a single source of truth to avoid accidental divergence.

benchmarks/400.inference/413.image-classification/python/requirements.txt.3.12 (1)
1-4: Check version strategy: ranges vs pins for reproducible benchmarks.

Torch/torchvision are strictly pinned, but numpy/pillow use >=, which allows environment drift over time. For a benchmark suite, you may prefer fully pinned versions to keep results comparable across runs; if you intentionally want flexibility here, that's fine, just confirm this combination is supported on Python 3.12 and your target CUDA stack.

benchmarks/400.inference/413.recommendation/python/package.sh (1)
3-4: Remove unused variable assignment or use it consistently.
PACKAGE_DIR is assigned but never referenced; the script uses $1 directly instead. Either remove the variable declaration or replace $1 with $PACKAGE_DIR for consistency.

-PACKAGE_DIR=$1
-echo "DLRM GPU package size $(du -sh $1 | cut -f1)"
+PACKAGE_DIR=$1
+echo "DLRM GPU package size $(du -sh $PACKAGE_DIR | cut -f1)"

benchmarks/600.linearalgebra/603.jacobi2d/input.py (1)
1-7: Input generator is consistent with other linear algebra benchmarks
size_generators and generate_input() match the established pattern in 600.* (size lookup + fixed seed). Unused parameters are expected here due to the shared SEBS interface and don't need to be removed; if Ruff is enforced, consider per-file ignores instead of changing the signature.

benchmarks/600.linearalgebra/601.matmul/input.py (1)
1-7: Matmul input generator matches project conventions

The size_generators mapping and generate_input() return shape are in line with other 600.* benchmarks (size + deterministic seed). Unused parameters are by design for the common interface; no change needed unless you want to quiet Ruff via ignores.

benchmarks/100.webapps/110.dynamic-html/input.py (1)
1-9: Dynamic-HTML input generator is simple and correct
generate_input() correctly maps the logical size to random_len and keeps the standard SEBS signature. Given inputs are framework-controlled, relying on size_generators[size] without extra validation is fine. Unused parameters are expected for this shared interface; suppressing Ruff is preferable to changing the signature.

benchmarks/600.linearalgebra/601.matmul/python/function.py (1)
6-62: Use gpu_ms in the result and clean up unused locals in handler

Functionally this works, but in handler:

- C_out and gpu_ms are unused, which Ruff rightfully flags.
- Other 600.* benchmarks expose gpu_time in the measurement, so you're missing potentially useful data here.

A minimal alignment with the rest of the suite would be:

-    matmul_begin = datetime.datetime.now()
-    C_out, gpu_ms = kernel_gemm(alpha, beta, C, A, B, reps=1)
-    matmul_end = datetime.datetime.now()
+    matmul_begin = datetime.datetime.now()
+    _C_out, gpu_ms = kernel_gemm(alpha, beta, C, A, B, reps=1)
+    matmul_end = datetime.datetime.now()
@@
-        "measurement": {
-            "generating_time": matrix_generating_time,
-            "compute_time": matmul_time,
-        },
+        "measurement": {
+            "generating_time": matrix_generating_time,
+            "compute_time": matmul_time,
+            "gpu_time": gpu_ms,
+        },

Also, the seed variable computed inside the "seed" in event branch is currently unused; you can either remove that block or actually plumb the seed into tensor initialization if you plan to introduce randomness later.

benchmarks/400.inference/413.recommendation/input.py (1)
1-30: Recommendation input wiring looks coherent
buckets_count(), upload_files(), and generate_input() form a consistent trio: model goes to bucket 0, requests to bucket 1, and the returned cfg["object"] / cfg["bucket"] structure matches the usual pattern for inference benchmarks. Unused parameters in generate_input() are acceptable here because of the shared interface across benchmarks.

benchmarks/600.linearalgebra/603.jacobi2d/python/function.py (1)
6-72: Jacobi2D kernel looks good; tidy up unused outputs and dead seed logic

The Jacobi2D kernel and timing look sound, including the warmup and CUDA event usage. Two minor cleanups would make this sharper:

- A_out and B_out from kernel_jacobi2d() are never used; prefix them with underscores to satisfy Ruff:

-    matmul_begin = datetime.datetime.now()
-    A_out, B_out, gpu_ms = kernel_jacobi2d(A, B, iters=50)
-    matmul_end = datetime.datetime.now()
+    matmul_begin = datetime.datetime.now()
+    _A_out, _B_out, gpu_ms = kernel_jacobi2d(A, B, iters=50)
+    matmul_end = datetime.datetime.now()

- The seed computed in the "seed" in event branch is currently unused. Either remove that block or, if you plan to introduce randomness later, thread the seed into initialization to match other benchmarks.

benchmarks/400.inference/412.language-bert/input.py (2)

9-15: Object key construction in upload_files

The traversal and relative-key logic look fine. If you want slightly cleaner keys at the root level, you could special-case prefix == "." to avoid prefixes like "./file", but this is cosmetic and safe to defer.

18-33: Mark unused parameters in generate_input to satisfy Ruff

size, output_paths, and nosql_func are unused but required by the common input-generator signature, hence Ruff's ARG001 warnings. Renaming them with a leading underscore keeps the API shape while silencing lint.

-def generate_input(
-    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
-):
+def generate_input(
+    data_dir,
+    _size,
+    benchmarks_bucket,
+    input_paths,
+    _output_paths,
+    upload_func,
+    _nosql_func,
+):

benchmarks/400.inference/413.image-classification/input.py (3)
10-15: Rename unused dirs loop variable

dirs is unused in the os.walk loop and triggers Ruff B007. Renaming to _dirs (or _) keeps intent clear and silences the warning.

-    for root, dirs, files in os.walk(data_dir):
+    for root, _dirs, files in os.walk(data_dir):

You may also consider normalizing keys so root-level files don't get a "./" prefix, but that is purely cosmetic.

18-26: Docstring content is misleading for this benchmark

The free-standing triple-quoted comment still refers to a "compression test", which doesn't match this image-classification benchmark. Consider updating or removing it to avoid confusion for future maintainers.

29-51: Unused parameters and workload-size handling in generate_input

- size, output_paths, and nosql_func are unused (Ruff ARG001). As with other inputs, they exist for interface compatibility, so renaming them with leading underscores is a simple fix.
- The function currently always selects the first image from val_map.txt; this is fine for a simple microbenchmark, but if you intend different workloads for different size values, you could extend this later.

-def generate_input(
-    data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
-):
+def generate_input(
+    data_dir,
+    _size,
+    benchmarks_bucket,
+    input_paths,
+    _output_paths,
+    upload_func,
+    _nosql_func,
+):

benchmarks/500.scientific/5xx.compute_jax_npbench/python/function.py (1)
24-62: Clarify event size expectation and jax.device_get exception handling

handler assumes event["size"] is always present: size, M, and N are only defined under the "size" in event branch but are later used unconditionally (including in the returned payload). This matches patterns in other NPBench handlers, but it does mean a missing "size" key would raise at runtime. If the driver always supplies size, that's fine; otherwise you may want an explicit assertion or default.

The broad try/except Exception: pass around jax.device_get also mirrors existing code but triggers Ruff S110/BLE001. If you want to keep lint clean, consider either narrowing the exception to the specific JAX error types you expect, or adding an inline # noqa: BLE001,S110 with a short comment explaining why silent failure on host transfer is acceptable here.

benchmarks/400.inference/413.recommendation/python/function.py (3)
47-52: Unreachable return in _select_device

Given the explicit raise RuntimeError("CUDA is not available"), the final return torch.device("cpu") is never reached. If the intent is to enforce CUDA for this benchmark, you can safely drop that line; if you want a CPU fallback for local testing, you'll need to reorder the logic accordingly, e.g. as sketched below.
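
One way that reordering could look, purely as a sketch; the allow_cpu_fallback flag is an assumption, not part of the benchmark:

import torch

def _select_device(allow_cpu_fallback=False):
    # Prefer CUDA; only fall back to CPU when explicitly requested (e.g. local testing).
    if torch.cuda.is_available():
        return torch.device("cuda")
    if allow_cpu_fallback:
        return torch.device("cpu")
    raise RuntimeError("CUDA is not available")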
54-82: Temporary model file handling and unused MODEL_CACHE

_load_model currently:

- Ensures MODEL_CACHE exists but never actually uses it to store the model.
- Writes the checkpoint to /tmp/<uuid>-dlrm_tiny.pt and deletes it after loading.

This is functionally fine for a benchmark, but:

- You can either remove MODEL_CACHE and the os.makedirs call, or start writing tmp_path into that directory if you want a persistent on-disk cache.
- Ruff flags hard-coded /tmp as S108; if you care about linting/hardening, consider using tempfile.NamedTemporaryFile(delete=False) or TemporaryDirectory instead, as in the sketch below.
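
An illustrative sketch of the tempfile-based variant; the function name and the idea of receiving the checkpoint as raw bytes are assumptions, not the PR's code:

import os
import tempfile

import torch

def _load_model_from_bytes(raw_bytes):
    # Write the checkpoint to a uniquely named temporary file instead of a hand-built /tmp path.
    with tempfile.NamedTemporaryFile(suffix=".pt", delete=False) as tmp:
        tmp.write(raw_bytes)
        tmp_path = tmp.name
    try:
        model = torch.load(tmp_path, map_location="cpu")
    finally:
        os.remove(tmp_path)
    return model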
97-144: Handler event assumptions, /tmp usage, and zip(strict=)

A few small points in handler:

- Event shape: bucket, model_prefix, requests_prefix, and requests_key are all fetched via nested .get calls and assumed to be present. If the harness always provides them, that's fine; otherwise, you may want an assertion or clearer error if any are None.
- Temporary request file: req_path lives under /tmp, which triggers Ruff S108. For stricter hygiene, you could switch to tempfile.NamedTemporaryFile or similar instead of manual /tmp paths.
- Iteration over predictions: Ruff B905 recommends making the zip strict so mismatched lengths don't silently truncate:

-    for req, score in zip(payloads, scores):
+    for req, score in zip(payloads, scores, strict=True):

Also note that your aggregate download_time and compute_time already include the model's download and processing times, which are additionally exposed as model_download_time and model_time. That's fine as long as downstream consumers expect the aggregation to double-count those phases.

benchmarks/600.linearalgebra/604.cholesky/python/function.py (2)
18-31: Confirm intended scaling of Cholesky repetitions
kernel_cholesky runs Cholesky A.size(0) times inside the timed region. For large N, this deliberately amplifies GPU work, which is reasonable for a microbenchmark, but it does mean runtime scales quadratically with N. Just confirm this repetition count matches your intended workload.
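
A sketch of that repetition-timed pattern, modeled on the CUDA-event timing used by the other 600.* kernels; the actual kernel_cholesky may differ in details:

import torch

def kernel_cholesky_sketch(A):
    # A: symmetric positive-definite CUDA tensor; repeat the factorization N times
    # inside the timed region to amplify GPU work.
    reps = A.size(0)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(reps):
        L = torch.linalg.cholesky(A)
    end.record()
    torch.cuda.synchronize()
    return L, float(start.elapsed_time(end))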
34-61: Suppress unused-L lint in handler

In handler, the unpacked L from kernel_cholesky isn't used, which triggers Ruff RUF059. You can keep the call exactly as-is and simply mark the variable as intentionally unused.

-    L, gpu_ms = kernel_cholesky(A)
+    _L, gpu_ms = kernel_cholesky(A)

Also, the computed seed variable is currently unused beyond seeding Python's random; if you don't need reproducible PyTorch RNG, you can simplify that block or extend it to seed torch as well.

benchmarks/600.linearalgebra/605.lu/python/function.py (1)
53-59: Clarify unused outputs in handler (and dead seed logic)

handler unpacks B, gpu_ms = kernel(A) but never uses B, and the local seed derived from event["seed"] is also unused. This is harmless but slightly confusing in a benchmarking context.

You can make the intent explicit and silence linters by ignoring the unused result:

-    if "seed" in event:
-        import random
-
-        random.seed(event["seed"])
-
-        seed = event.get("seed", 42)
-        seed = int(seed)
+    if "seed" in event:
+        import random
+
+        # Keep seeding here only if future randomness will depend on it
+        random.seed(int(event["seed"]))
@@
-    comp_begin = datetime.datetime.now()
-    B, gpu_ms = kernel(A)
+    comp_begin = datetime.datetime.now()
+    _, gpu_ms = kernel(A)

If you don't plan to use the Python RNG at all, you can drop the whole if "seed" in event block.

Also applies to: 66-66
benchmarks/400.inference/412.language-bert/python/function.py (1)
133-144: Optional: make the zip contract explicit with strict=

for sentence, probs in zip(sentences, probabilities): assumes sentences and probabilities always have the same length (which they should). To make this explicit and avoid silent truncation if shapes ever diverge, you could use:

-    for sentence, probs in zip(sentences, probabilities):
+    for sentence, probs in zip(sentences, probabilities, strict=True):

This will raise if something upstream goes wrong, which is often preferable for benchmarks.
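
A tiny illustration of the strict= contract (Python 3.10+), using hypothetical data only: mismatched lengths raise instead of silently truncating.

sentences = ["a", "b", "c"]
probabilities = [[0.9, 0.1], [0.2, 0.8]]  # deliberately one element short

try:
    list(zip(sentences, probabilities, strict=True))
except ValueError as err:
    print(err)  # zip() argument 2 is shorter than argument 1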
      - name: Black (check)
        run: |
          . python-venv/bin/activate
          black benchmarks --check --config .black.toml
🧩 Analysis chain
Verify Black is only intended to run on benchmarks/.
The Black check only targets the benchmarks directory, but the pre-commit configuration in .pre-commit-config.yaml applies to both sebs/ and benchmarks/. Consider whether the workflow should also check sebs/ for consistency.
Run the following script to check if sebs/ contains Python files that should be formatted:
🏁 Script executed:
#!/bin/bash
# Description: Check if sebs/ directory contains Python files
fd -e py . sebs/ --max-results 10Length of output: 198
I've confirmed that sebs/ contains Python files. Now I need to verify the pre-commit configuration to see if it actually applies Black to both directories.
---
🏁 Script executed:
#!/bin/bash
# Description: Check .pre-commit-config.yaml for Black configuration scope
cat .pre-commit-config.yaml | grep -A 20 "black"
Length of output: 506
Update the Black check in the lint workflow to include the sebs/ directory.
The pre-commit configuration applies Black to files: ^(sebs/|benchmarks/), but the workflow only checks benchmarks/. The sebs/ directory contains multiple Python files and should be included in the CI check for consistency.
Update line 42 to:
black sebs benchmarks --check --config .black.toml
🤖 Prompt for AI Agents
.github/workflows/lint.yml around lines 39 to 42: the Black check currently only
runs against the benchmarks/ directory while pre-commit targets both sebs/ and
benchmarks/; update the workflow to run Black against both directories by
changing the command to invoke Black on sebs and benchmarks with --check and the
.black.toml config so CI matches pre-commit.
         if consecutive_failures != 5:
-            with open('/tmp/data.csv', 'w', newline='') as csvfile:
-                writer = csv.writer(csvfile, delimiter=',')
-                writer.writerow(["id", "client_send", "client_rcv"])
+            with open("/tmp/data.csv", "w", newline="") as csvfile:
+                writer = csv.writer(csvfile, delimiter=",")
+                writer.writerow(["id", "client_send", "client_rcv"])
                 for row in times:
                     writer.writerow(row)

             client = storage.storage.get_instance()
-            filename = 'results-{}.csv'.format(request_id)
-            key = client.upload(output_bucket, os.path.join(output_prefix, filename), '/tmp/data.csv')
+            filename = "results-{}.csv".format(request_id)
+            key = client.upload(output_bucket, os.path.join(output_prefix, filename), "/tmp/data.csv")

-    return { 'result': key }
+    return {"result": key}
Fix uninitialized key when connection setup fails
If the UDP connection fails 5 times, consecutive_failures becomes 5, the loop breaks, and the if consecutive_failures != 5: block is skipped. In that path key is never assigned, but you still return {"result": key}, which will raise an UnboundLocalError.
Initialize key before the conditional (or add an else branch) so the failure path is well-defined:
- server_socket.close()
-
- if consecutive_failures != 5:
+ server_socket.close()
+
+ key = None
+ if consecutive_failures != 5:
with open("/tmp/data.csv", "w", newline="") as csvfile:
writer = csv.writer(csvfile, delimiter=",")
writer.writerow(["id", "client_send", "client_rcv"])
for row in times:
writer.writerow(row)
client = storage.storage.get_instance()
filename = "results-{}.csv".format(request_id)
key = client.upload(output_bucket, os.path.join(output_prefix, filename), "/tmp/data.csv")
return {"result": key}The Ruff S108 warnings about /tmp/data.csv are expected in this serverless benchmark context; using a fixed path under /tmp is generally fine here, so I’d treat that as optional to change.
🧰 Tools
🪛 Ruff (0.14.5)
47-47: Probable insecure usage of temporary file or directory: "/tmp/data.csv"
(S108)
55-55: Probable insecure usage of temporary file or directory: "/tmp/data.csv"
(S108)
🤖 Prompt for AI Agents
In benchmarks/000.microbenchmarks/020.network-benchmark/python/function.py
around lines 46-57, the variable `key` can be uninitialized if
`consecutive_failures == 5` (the upload block is skipped) causing an
UnboundLocalError on `return {"result": key}`; fix by defining `key` before the
conditional (e.g., `key = None` or an empty string) or add an `else` branch that
sets `key` to a sentinel/error value, then ensure the returned value is always
defined and meaningful for the failure path.
    # Number of ints in the working set
    n = max(1, working_set_bytes // 4)
Fix working-set size calculation to match actual tensor memory usage
next_idx is allocated as torch.long, which is typically 8 bytes per element, but n is derived assuming 4 bytes:
# Number of ints in the working set
n = max(1, working_set_bytes // 4)This means the actual working set is about twice as large as requested, which distorts the cache‑latency measurements.
You can compute n from the actual element size instead:
- # Number of ints in the working set
- n = max(1, working_set_bytes // 4)
+ # Number of indices in the working set, based on the actual element size of torch.long
+ elem_bytes = torch.tensor(0, dtype=torch.long).element_size()
+ n = max(1, working_set_bytes // elem_bytes)This keeps the benchmark’s working_set_bytes parameter consistent with the true memory footprint across platforms.
🤖 Prompt for AI Agents
benchmarks/000.microbenchmarks/050.gpu-cache-latency/python/function.py around
lines 40 to 41, the working-set size n is computed using 4 bytes per element but
next_idx is a torch.long (8 bytes on typical platforms), so the allocated memory
is roughly double the requested size; change the calculation to derive n from
the actual tensor element size (query the dtype element size for torch.long at
runtime and use working_set_bytes // element_size), keeping the max(1, ...)
guard so the benchmark's working_set_bytes parameter matches the real memory
footprint.
torch
numpy
🧩 Analysis chain
Pin dependency versions for reproducible microbenchmarks.
Unpinned torch and numpy dependencies prevent reproducible benchmark results. Since this is a performance microbenchmark measuring GPU cache latency, version differences can significantly impact the measurements.
Apply this diff to pin specific versions:
-torch
-numpy
+torch==2.5.1
+numpy==2.0.2Verify the latest stable versions and check for security advisories:
🏁 Script executed:
#!/bin/bash
# Check PyPI for latest versions
echo "=== PyTorch latest version ==="
curl -s https://pypi.org/pypi/torch/json | jq -r '.info.version'
echo "=== NumPy latest version ==="
curl -s https://pypi.org/pypi/numpy/json | jq -r '.info.version'
# Check for security vulnerabilities
echo "=== PyTorch security advisories ==="
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "torch") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'
echo "=== NumPy security advisories ==="
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "numpy") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'Length of output: 2330
Pin dependency versions to secure, stable releases, not outdated vulnerable versions.
The suggested versions in the original review are problematic. PyTorch 2.5.1 is vulnerable to a CRITICAL remote code execution vulnerability in torch.load (CVE affecting versions < 2.6.0, patched in 2.6.0). Pinning to this version would actually introduce known security risks.
Pin to secure, current versions instead:
-torch
-numpy
+torch==2.9.1
+numpy==2.3.5These versions are current, stable, and free from known critical vulnerabilities. The older versions suggested elsewhere contain exploitable security issues.
🤖 Prompt for AI Agents
In benchmarks/000.microbenchmarks/050.gpu-cache-latency/python/requirements.txt
(lines 1-2) the dependencies are unpinned which risks installing
outdated/vulnerable releases (notably torch versions <2.6.0 are vulnerable to
RCE via torch.load); update the file to pin to secure, supported releases — at
minimum use torch >= 2.6.0 (or the latest stable torch >=2.6.x) and pin numpy to
a current stable release (e.g., numpy >= 1.25.0 or the project’s chosen
supported minor), and include exact version specifiers (== or >= with upper
bounds) so CI installs known-safe versions and rerun tests to confirm
compatibility.
| """ | ||
| Generate test, small and large workload for compression test. | ||
| :param data_dir: directory where benchmark data is placed | ||
| :param size: workload size | ||
| :param input_buckets: input storage containers for this benchmark | ||
| :param output_buckets: | ||
| :param upload_func: upload function taking three params(bucket_idx, key, filepath) | ||
| ''' | ||
| def generate_input(data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func): | ||
| """ |
Move docstring inside the function.
The docstring is currently floating between functions rather than being placed inside generate_input. Python docstrings must be the first statement after a function definition to be recognized by documentation tools.
Apply this diff to correctly place the docstring:
-
-"""
- Generate test, small and large workload for compression test.
-
- :param data_dir: directory where benchmark data is placed
- :param size: workload size
- :param input_buckets: input storage containers for this benchmark
- :param output_buckets:
- :param upload_func: upload function taking three params(bucket_idx, key, filepath)
-"""
-
-
def generate_input(
data_dir, size, benchmarks_bucket, input_paths, output_paths, upload_func, nosql_func
):
+ """
+ Generate test, small and large workload for compression test.
+
+ :param data_dir: directory where benchmark data is placed
+ :param size: workload size
+ :param input_buckets: input storage containers for this benchmark
+ :param output_buckets:
+    :param upload_func: upload function taking three params(bucket_idx, key, filepath)
+    """
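
A small, self-contained illustration (hypothetical functions) of why the placement matters: only a string that is the first statement of the function body becomes __doc__.

def documented():
    """I am the docstring."""
    return 1

def undocumented():
    return 1

"""This floating string is just an expression statement, not a docstring."""

print(documented.__doc__)    # "I am the docstring."
print(undocumented.__doc__)  # None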
🤖 Prompt for AI Agents
In benchmarks/300.utilities/311.compression/input.py around lines 18 to 26, the
module-level string currently between functions should be moved so it becomes
the docstring for the generate_input function; open the generate_input
definition and place that triple-quoted string immediately as the first
statement inside the function body (remove the floating string from its current
position), preserving the text and indentation so Python and documentation tools
recognize it as the function docstring.
@@ -0,0 +1 @@
+jax[cuda12]
🧩 Analysis chain
Pin the JAX version for reproducible builds.
The unpinned jax[cuda12] dependency can lead to non-reproducible benchmark results, as different JAX versions may introduce performance changes or breaking API changes.
Apply this diff to pin a specific version:
-jax[cuda12]
+jax[cuda12]==0.4.35Verify the latest stable JAX version and check for security advisories:
🏁 Script executed:
#!/bin/bash
# Check PyPI for the latest JAX version
curl -s https://pypi.org/pypi/jax/json | jq -r '.info.version'
# Check for security vulnerabilities
gh api graphql -f query='
{
securityVulnerabilities(first: 5, ecosystem: PIP, package: "jax") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'Length of output: 55
Update JAX version pin to a more recent stable release.
Pinning JAX for reproducible builds is essential. However, the suggested version 0.4.35 is significantly outdated. The latest stable JAX version is 0.8.1, and there are no known security vulnerabilities. Update the pin to:
-jax[cuda12]
+jax[cuda12]==0.8.1Verify this version is compatible with your CUDA 12 environment and benchmark requirements before merging.
🤖 Prompt for AI Agents
In
benchmarks/500.scientific/5xx.deep_learning_resnet_jax_npbench/python/requirements.txt
lines 1-1, the JAX dependency is unpinned; update the pin to the newer stable
release by replacing the current unpinned entry with the explicit version that
targets CUDA 12 (e.g., jax[cuda12]==0.8.1). Ensure the pinned version is
compatible with the target CUDA 12 toolchain and any other benchmark
dependencies before committing.
@@ -0,0 +1 @@
+torch==2.4.1
🧩 Analysis chain
Verify torch version currency and security.
The pinned version approach is good for reproducibility. However, torch==2.4.1 differs from other benchmarks using torch==2.2.2 and torch==2.5.1. Ensure this version is intentional and verify it's current and secure.
🏁 Script executed:
#!/bin/bash
# Check if torch 2.4.1 is current
echo "=== PyTorch latest version ==="
curl -s https://pypi.org/pypi/torch/json | jq -r '.info.version'
# Check for security vulnerabilities affecting 2.4.1
echo "=== PyTorch security advisories ==="
gh api graphql -f query='
{
securityVulnerabilities(first: 10, ecosystem: PIP, package: "torch") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'Length of output: 1635
Update torch version to address critical security vulnerabilities.
torch==2.4.1 contains at least two CRITICAL security vulnerabilities with available patches:
- "torch.load with weights_only=True leads to remote code execution" (patched in 2.6.0)
- "Improper Resource Shutdown or Release vulnerability" (patched in 2.8.0)
Additionally, the latest stable version is 2.9.1. Consider updating to at least 2.6.0 or later to address these security issues, while maintaining consistency with other benchmarks in the repository.
🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/601.matmul/python/requirements.txt around lines
1 to 1, the pinned torch==2.4.1 has known critical security issues; update the
requirement to a patched release (minimum torch>=2.6.0, preferably torch==2.9.1
to match latest stable) by replacing the version specifier accordingly, then run
project tests/benchmarks and CI to verify compatibility, update any related
requirements/lock files across the repo to maintain consistency, and regenerate
dependency locks or Docker images if present.
def initialize_torch(N, dtype=torch.float32, device="cuda", seed=42):
    if seed is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    alpha = torch.randn((), dtype=dtype, device=device)
    x = torch.randn(N, dtype=dtype, device=device)
    y = torch.randn(N, dtype=dtype, device=device)
    return alpha, x, y


def kernel_axpy(alpha, x, y, reps=100):
    torch.cuda.synchronize()
    _ = alpha * x + y  # warmup
    torch.cuda.synchronize()

    start_evt = torch.cuda.Event(enable_timing=True)
    end_evt = torch.cuda.Event(enable_timing=True)
    start_evt.record()
    for _ in range(reps):
        y = alpha * x + y
    end_evt.record()
    torch.cuda.synchronize()
    gpu_ms = float(start_evt.elapsed_time(end_evt))
    return y, gpu_ms


def handler(event):
    size = event.get("size")
    if "seed" in event:
        import random

        random.seed(event["seed"])

        seed = event.get("seed", 42)
        seed = int(seed)

    gen_begin = datetime.datetime.now()
    alpha, x, y = initialize_torch(size, dtype=torch.float32, device="cuda", seed=seed)
    gen_end = datetime.datetime.now()

    comp_begin = datetime.datetime.now()
    y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
    comp_end = datetime.datetime.now()

    gen_us = (gen_end - gen_begin) / datetime.timedelta(microseconds=1)
    comp_us = (comp_end - comp_begin) / datetime.timedelta(microseconds=1)

    return {
        "measurement": {
            "generating_time": gen_us,
            "compute_time": comp_us,
            "gpu_time": gpu_ms,
        }
    }
Fix potential seed UnboundLocalError and unused y_out in handler
In handler, seed is only defined inside the "seed" in event branch, but it’s always used in the call to initialize_torch(...). If event ever omits "seed", this will raise an UnboundLocalError.
You can mirror the safer pattern used in other 600.* handlers by providing a default:
def handler(event):
size = event.get("size")
if "seed" in event:
import random
random.seed(event["seed"])
seed = event.get("seed", 42)
seed = int(seed)
+ else:
+ seed = 42
@@
- comp_begin = datetime.datetime.now()
- y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
- comp_end = datetime.datetime.now()
+ comp_begin = datetime.datetime.now()
+ _y_out, gpu_ms = kernel_axpy(alpha, x, y, reps=100)
+ comp_end = datetime.datetime.now()This both removes the potential runtime error and satisfies Ruff’s warning about the unused y_out.
🧰 Tools
🪛 Ruff (0.14.5)
47-47: Unpacked variable y_out is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/602.axpy/python/function.py around lines 6 to
59, handler() may reference seed without it being defined if "seed" is absent
and also y_out is unused; initialize seed with a default before the conditional
(e.g., seed = int(event.get("seed", 42))) or set seed = 42 then override if
present, and suppress the unused variable by assigning the kernel result to _
(e.g., _, gpu_ms = kernel_axpy(...)) or reuse y_out into y to remove the Ruff
warning.
@@ -0,0 +1 @@
+torch==2.4.1
🧩 Analysis chain
CRITICAL: PyTorch 2.4.1 has multiple severe security vulnerabilities.
Static analysis has identified 5 critical security vulnerabilities in torch 2.4.1, including:
- Remote Code Execution (RCE) via torch.load with weights_only=True (GHSA-53q9-r3pm-6pq6)
- Local Denial of Service (GHSA-3749-ghw9-m3mg)
- Improper resource shutdown (GHSA-887c-mr87-cxwp)
- Additional vulnerabilities: PYSEC-2024-259, PYSEC-2025-41
Upgrade to the latest patched version of PyTorch to address these vulnerabilities.
🌐 Web query:
What is the latest secure version of PyTorch and when was 2.4.1 patched?
💡 Result:
- Latest stable PyTorch: 2.7.0 (PyTorch 2.7 release). [1]
- 2.4.1 patch: the Debian package incorporating fixes was published as version 2.4.1-1 (uploaded Sep 29, 2024). [2]
- Relevant security advisory: the remote-code-execution issue in torch.load (patched upstream in the 2.6.0+ line) is documented in the PyTorch GHSA/CVE advisories. [3]
Sources:
[1] PyTorch 2.7 release.
[2] Debian upload / changelog showing pytorch 2.4.1-1 (Sep 29, 2024).
[3] PyTorch security advisory / PYSEC entry (GHSA-53q9-r3pm-6pq6 / CVE-2025-32434) and related CVE discussion.
I need to gather more specific information about the vulnerabilities and whether they affect PyTorch 2.4.1 directly.
I need to verify the other vulnerabilities mentioned in the review. Let me search for them specifically.
Upgrade to PyTorch 2.6.0 or later to address a critical RCE vulnerability.
PyTorch 2.4.1 is affected by CVE-2025-32434, a critical Remote Command Execution vulnerability in the torch.load() function when used with weights_only=True. This vulnerability has been patched in PyTorch version 2.6.0 or higher. The vulnerability received a CVSS score of 9.8 (Critical), and any application that loads PyTorch models could be at risk if an attacker can supply a malicious .pt model file.
🧰 Tools
🪛 OSV Scanner (2.2.4)
[CRITICAL] 1-1: torch 2.4.1: undefined
(PYSEC-2024-259)
[CRITICAL] 1-1: torch 2.4.1: undefined
(PYSEC-2025-41)
[CRITICAL] 1-1: torch 2.4.1: PyTorch susceptible to local Denial of Service
[CRITICAL] 1-1: torch 2.4.1: PyTorch: torch.load with weights_only=True leads to remote code execution
[CRITICAL] 1-1: torch 2.4.1: PyTorch Improper Resource Shutdown or Release vulnerability
@@ -0,0 +1 @@
+torch==2.4.1
🧩 Analysis chain
Verify PyTorch version validity and security advisories.
Please ensure that torch==2.4.1 is a valid version and check for any known security vulnerabilities.
Run the following script to verify the PyTorch version:
🏁 Script executed:
#!/bin/bash
# Description: Verify torch==2.4.1 exists and check for security advisories
# Check PyPI for available torch versions
curl -s https://pypi.org/pypi/torch/json | jq -r '.releases | keys[]' | grep "^2\.4\."
# Check for security advisories
gh api graphql -f query='
{
securityVulnerabilities(first: 10, ecosystem: PIP, package: "torch") {
nodes {
advisory {
summary
severity
publishedAt
}
vulnerableVersionRange
firstPatchedVersion {
identifier
}
}
}
}'Length of output: 1574
Upgrade torch to address multiple security vulnerabilities.
torch==2.4.1 is a valid version, but it has three known security vulnerabilities:
- CRITICAL: Remote code execution via torch.load with weights_only=True (affecting versions < 2.6.0); fixed in 2.6.0
- MODERATE: Improper resource shutdown or release (affecting versions ≤ 2.7.1); fixed in 2.8.0
- LOW: Local denial of service (affecting versions < 2.7.1-rc1) — fixed in 2.7.1-rc1
Upgrade to torch 2.6.0 or later to address the critical RCE vulnerability, or 2.8.0 to address all known vulnerabilities.
🤖 Prompt for AI Agents
In benchmarks/600.linearalgebra/605.lu/python/requirements.txt lines 1-1 the
project pins torch==2.4.1 which contains multiple security vulnerabilities
(including a CRITICAL RCE fixed in 2.6.0); update the requirement to a safe
version (recommend torch==2.8.0 to cover all listed fixes) by replacing the
version pin, then regenerate any lock files or dependency manifests and run the
test suite/CI to verify compatibility and rebuild any containers or artifacts
that install dependencies.
add cache latency benchmarks
Summary by CodeRabbit
New Features
Improvements