[pytorch] Update to 2.9.0.dev20250820 #4298
Conversation
Updates the PyTorch version since the old version is no longer downloadable.
- Now at 2.9.0.dev20250820
- TorchVision at 0.24.0.dev20250820

Signed-off-by: Suraj Sudhir <[email protected]>
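For context, a minimal sketch of pulling these exact nightlies by hand (the index URL is PyTorch's public nightly CPU wheel index; torch-mlir tracks these pins in its own requirements files, so CI may install them differently):

```sh
# Fetch the bumped nightlies straight from PyTorch's nightly CPU index.
# The version strings come from this PR; the rest is illustrative.
python -m pip install --pre \
  torch==2.9.0.dev20250820 \
  torchvision==0.24.0.dev20250820 \
  --index-url https://download.pytorch.org/whl/nightly/cpu
```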
It looks like #4262 is still relevant. Let's try to address this promptly. If we need to temporarily add these to xfails or no-run sets, let's do that to unblock CI.
@zjgarvey I'm having trouble reproducing the failure locally: the "torch._dynamo.exc.InternalTorchDynamoError: TimeoutError: Timeout" doesn't happen when I build in-tree within a venv as described in docs/development.md and then run the tests. Is there a pointer somewhere to how to exactly mimic the CI steps?
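For reference, a rough sketch of the in-tree venv build being described (condensed from docs/development.md; exact flags may have drifted since):

```sh
# In-tree build per docs/development.md: venv, deps, then CMake with
# torch-mlir as an LLVM external project. Treat as a sketch, not the CI recipe.
python3 -m venv mlir_venv && source mlir_venv/bin/activate
python -m pip install -r requirements.txt -r torchvision-requirements.txt
cmake -GNinja -Bbuild \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_EXTERNAL_PROJECTS="torch-mlir" \
  -DLLVM_EXTERNAL_TORCH_MLIR_SOURCE_DIR="$PWD" \
  -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
  externals/llvm-project/llvm
cmake --build build
```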
@zjgarvey, as per my observation the set of failing tests is not deterministic: different runs fail different tests. I suspect the runner is using Python 3.10, and that could be a possible reason for the failure, or it could be something else related to the runner; I have never been able to reproduce this issue locally.
I ran the steps of the CI workflow from torch-mlir/.github/workflows/ci.yml (line 54 at 3b77c0b).
My working venv uses Python 3.10.12. The pip list is:
e2e_test.sh works fine:
fx_importer:
fx_importer_tosa:
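For anyone reproducing the results above, a sketch of the likely invocations (the wrapper path and `--config` flag follow the repo's usual conventions and may differ):

```sh
# Run the e2e suite against the two FX importer configs reported above.
# Script path and flag spelling are assumptions; check the repo's tools dir.
./projects/pt1/tools/e2e_test.sh --config fx_importer -v
./projects/pt1/tools/e2e_test.sh --config fx_importer_tosa -v
```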
Not only did the new torch work, but it removed a bunch of things from xfails. This is very likely a CI setup problem and not a Torch-MLIR/PyTorch interaction issue as such.
Sorry for the late reply. @sjarus this is a bit strange. It does look like deps are being installed to a 3.10 site-packages directory, but then why are we installing python3.11? Something strange is going on here.
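A quick sanity check for this kind of interpreter/site-packages mismatch, runnable in the CI shell or locally:

```sh
# Confirm which interpreter, pip, and site-packages are actually in use;
# a 3.10 site-packages path next to a python3.11 binary confirms the mismatch.
which python3 && python3 --version
python3 -m pip --version   # reports the site-packages it installs into
python3 -c "import site; print(site.getsitepackages())"
python3 -c "import torch; print(torch.__version__, torch.__file__)"
```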
Hi @zjgarvey, yes, it seems something about the CI setup scripts mangles things to such an extent that it somehow interferes with the torch FX import step. Internally, we just do the standard build steps from docs/development.md under Python 3.10.12. For clarity, these are our exact steps:
This works fine, and internally we have bumped the PyTorch and TorchVision versions as listed in this PR, with clean runs. There is a secondary problem with the CI scripts: the stable and nightly configurations report different results, and neither appears to behave quite the same as the CI itself. The CI scripts and/or Docker image may need a serious review.
@sjarus, @zjgarvey Looks like setting up Python was the issue in the CI workflow. I've attempted to fix that and have a successful nightly build: https://github.com/llvm/torch-mlir/actions/runs/17281729733/job/49051279175. I do see different tests failing in nightly vs. stable in the CI. Building stable locally now to see whether it's also a CI-only issue.
The stable test failure was reproducible locally too. Got a clean CI run in #4301.
Closed as #4301 merged this change into main. Thanks! |