Commit a15a3c6

ci: use stable rust and gate on number of gpus
1 parent faf90ec commit a15a3c6

2 files changed, with 9 additions and 4 deletions

.github/workflows/unittest.yaml

Lines changed: 7 additions & 2 deletions
@@ -25,12 +25,17 @@ jobs:
       gpu-arch-type: ${{ matrix.gpu-arch-type }}
       gpu-arch-version: ${{ matrix.gpu-arch-version }}
       script: |
+        set -ex
+
+        # install python and protobuf
         conda create -n venv python=3.10 protobuf -y
         conda activate venv
+        python -m pip install --upgrade pip
 
-        yum install -y rust cargo
+        # install recent version of Rust via rustup
+        curl https://sh.rustup.rs -sSf | sh -s -- --default-toolchain=stable --profile=default -y
+        . "$HOME/.cargo/env"
 
-        python -m pip install --upgrade pip
         pip install -e .[dev] -v
 
         pytest -v

torchft/process_group_test.py

Lines changed: 2 additions & 2 deletions
@@ -110,8 +110,8 @@ def test_dummy(self) -> None:
         m = torch.nn.parallel.DistributedDataParallel(m, process_group=pg)
         m(torch.rand(2, 3))
 
-    @skipUnless(torch.cuda.is_available(), "needs CUDA")
-    def test_baby_nccl(self) -> None:
+    @skipUnless(torch.cuda.device_count() >= 2, "need two CUDA devices")
+    def test_baby_nccl_2gpu(self) -> None:
         store = TCPStore(
             host_name="localhost", port=0, is_master=True, wait_for_workers=False
         )
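
The decorator change above replaces a "CUDA is present" check with a device-count check, so the test is skipped on single-GPU runners and not only on CPU-only ones. A minimal sketch of that gating pattern follows. It is not the project's test: `make_case`, `run`, and the `num_devices` parameter are hypothetical names, and `num_devices` stands in for `torch.cuda.device_count()` so the sketch runs without PyTorch or a GPU.

```python
import unittest


def make_case(num_devices: int) -> type:
    """Build a test case gated on a device count.

    num_devices is a stand-in (assumption) for torch.cuda.device_count();
    it is passed in explicitly so this sketch needs neither PyTorch nor a GPU.
    """

    class GatedTest(unittest.TestCase):
        # Same condition and skip reason as the new decorator in the diff.
        @unittest.skipUnless(num_devices >= 2, "need two CUDA devices")
        def test_needs_two_gpus(self) -> None:
            # Placeholder body; the real test builds a TCPStore and more.
            self.assertGreaterEqual(num_devices, 2)

    return GatedTest


def run(num_devices: int) -> unittest.TestResult:
    """Run the gated test case and return the collected result."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(make_case(num_devices))
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    for count in (0, 1, 2):
        outcome = run(count)
        reasons = [reason for _, reason in outcome.skipped]
        print(count, "skipped:", reasons, "ok:", outcome.wasSuccessful())
```

With one device, a plain availability check would let the test run and fail later, whereas the count-based condition reports it as skipped with the reason string.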
