
Conversation

@sjarus (Collaborator) commented Aug 21, 2025

Updates the PyTorch version since the old version is no longer downloadable.

  • PyTorch is now at 2.9.0.dev20250820
  • TorchVision is now at 0.24.0.dev20250820

Signed-off-by: Suraj Sudhir <[email protected]>
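
For reference, the bump is roughly equivalent to installing the CPU nightlies directly from the PyTorch nightly index (a minimal sketch; the real pins live in the repo's requirements files, whose exact contents are not reproduced here):

# Minimal sketch: install the bumped CPU nightlies directly.
python -m pip install --pre \
  torch==2.9.0.dev20250820+cpu \
  torchvision==0.24.0.dev20250820+cpu \
  --index-url https://download.pytorch.org/whl/nightly/cpu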
@zjgarvey (Collaborator) commented:

It looks like #4262 is still relevant. Let's try to address this promptly.

If we need to temporarily add these to xfails or no-run sets, let's do this to unblock CI.

@sjarus (Collaborator, Author) commented Aug 25, 2025

@zjgarvey I'm having trouble reproducing the failure "torch._dynamo.exc.InternalTorchDynamoError: TimeoutError: Timeout" locally. It doesn't happen when I build in-tree in a venv, as described in docs/development.md, and then run:
./projects/pt1/tools/e2e_test.sh --config onnx
./projects/pt1/tools/e2e_test.sh --config fx_importer
./projects/pt1/tools/e2e_test.sh --config fx_importer_stablehlo
./projects/pt1/tools/e2e_test.sh --config fx_importer_tosa
This is the case even though the logs indicate the right Torch version, i.e.:
TORCH_VERSION_FOR_COMPARISON = 2.9.0.dev20250820

Is there a pointer somewhere describing how to exactly mimic the CI steps?
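
(One hedged way to approximate the CI steps locally, assuming the workflow lives at .github/workflows/ci.yml and that the matrix torch-version value is "nightly" — neither of which is confirmed in this thread — might be:)

# Sketch of approximating the CI steps locally; workflow file name and the
# install_python_deps.sh argument value are assumptions.
cat .github/workflows/ci.yml                         # inspect the exact steps the runner executes
python3.11 -m venv ci_venv && source ci_venv/bin/activate
bash build_tools/ci/install_python_deps.sh nightly   # torch-version value assumed
# ...build in-tree as in docs/development.md (cmake steps shown later in this thread)...
./projects/pt1/tools/e2e_test.sh --config fx_importer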

@vivekkhandelwal1 (Collaborator) commented Aug 26, 2025

It looks like #4262 is still relevant. Let's try to address this promptly.

If we need to temporarily add these to xfails or no-run sets, let's do this to unblock CI.

@zjgarvey, as per my observation the set of failing tests is not deterministic: different tests fail on different runs. I suspect the runner is using Python 3.10, which could be the reason for the failure, or it could be something else related to the runner. I have never been able to reproduce this issue locally.

@sahas3 (Member) commented Aug 26, 2025

I ran the steps of the CI workflow starting from

bash build_tools/ci/install_python_deps.sh ${{ matrix.torch-version }}

but unfortunately couldn't repro the failure. One thing I noticed is that the error points at python3.10 even though the workflow file specifies python3.11 (locally I have Python 3.11): https://github.com/llvm/torch-mlir/actions/runs/17141703353/job/48630987308?pr=4298#step:9:8705. I am not sure where it's getting 3.10 from and whether that has something to do with the failure.
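
(One way to confirm which interpreter and site-packages a run actually uses — a minimal sketch with standard tooling, run inside the same environment as the job — would be:)

# Print the interpreter in use, its version, and where torch actually got installed.
which python3 && python3 --version
python3 -c "import sys, torch; print(sys.executable); print(torch.__version__, torch.__file__)"
python3 -m pip show torch | grep -E 'Version|Location'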

@sjarus (Collaborator, Author) commented Aug 26, 2025

My working venv uses Python 3.10.12. The pip list is:

cmake             3.31.4
dill              0.3.9
filelock          3.16.1
fsspec            2024.12.0
Jinja2            3.1.5
MarkupSafe        3.0.2
mpmath            1.3.0
multiprocess      0.70.17
nanobind          2.5.0
networkx          3.4.2
ninja             1.11.1.3
numpy             2.2.1
onnx              1.16.1
packaging         24.2
pillow            11.1.0
pip               25.2
protobuf          5.29.3
pybind11          2.13.6
PyYAML            6.0.2
setuptools        59.6.0
sympy             1.13.3
torch             2.9.0.dev20250820+cpu
torchvision       0.24.0.dev20250820+cpu
typing_extensions 4.12.2
wheel             0.45.1

e2e_test.sh works fine:
onnx:

Summary:
    Passed: 941
    Expectedly Failed: 651

fx_importer:

Summary:
    Passed: 1493
    Expectedly Failed: 115

fx_importer_tosa:

Summary:
    Passed: 1173
    Expectedly Failed: 463

Not only did the new torch work, it also removed a bunch of entries from the xfail sets. This is very likely a CI setup problem and not a Torch-MLIR/PyTorch interaction issue as such.
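
(A quick way to spot entries that may now be stale, assuming the usual projects/pt1 layout and using a hypothetical test name, is something like:)

# Hypothetical check: see whether a now-passing test is still listed in the xfail sets.
grep -n "SomeNowPassingTest" projects/pt1/e2e_testing/xfail_sets.py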

@zjgarvey (Collaborator) commented:

Sorry for the late reply. @sjarus, this is a bit strange. It does look like the deps are being installed into a 3.10 site-packages, but then why are we installing python3.11? Something strange is going on here.

@sjarus (Collaborator, Author) commented Aug 27, 2025

Hi @zjgarvey, yes, it seems something about the CI setup scripts mangles the environment to the point that it interferes with the torch fx import step. Internally, we just follow the standard build steps from docs/development.md under Python 3.10.12. For clarity, these are our exact steps:

python -m venv mlir_venv
source mlir_venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -r torchvision-requirements.txt
 
# Update submodules
git submodule update --init --recursive
 
# Create target:
cmake -GNinja -Bbuild \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DPython_FIND_VIRTUALENV=ONLY \
  -DPython3_FIND_VIRTUALENV=ONLY \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_EXTERNAL_PROJECTS="torch-mlir" \
  -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR="$PWD" \
  -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
  -DLLVM_TARGETS_TO_BUILD=host \
  -DTORCH_MLIR_ENABLE_PYTORCH_EXTENSIONS=ON \
  -DTORCH_MLIR_ENABLE_STABLEHLO=OFF \
  externals/llvm-project/llvm
 
# Build
cmake --build build

# Run
./projects/pt1/tools/e2e_test.sh --config fx_importer_tosa
./projects/pt1/tools/e2e_test.sh --config onnx_tosa

This works fine, and internally we have bumped up the PyTorch and TorchVision versions as listed in this PR, with clean runs.

There is a secondary problem with the CI scripts: the stable and nightly configurations report different results, and neither appears to behave quite the same as the CI itself does.

The CI scripts and/or docker instance may need a serious review.

@sahas3 (Member) commented Aug 28, 2025

@sjarus, @zjgarvey Looks like setting up Python was the issue in the CI workflow. I've attempted to fix that and now have a successful nightly build: https://github.com/llvm/torch-mlir/actions/runs/17281729733/job/49051279175.

I do see different tests failing in nightly vs stable in the CI. Building stable locally now to see if it's also a CI only issue.

@sahas3 (Member) commented Aug 28, 2025

Stable test failure was reproducible locally too. Got clean CI in #4301.

@zjgarvey closed this Aug 28, 2025
@zjgarvey (Collaborator) commented:

Closed as #4301 merged this change into main. Thanks!
