forked from tensorflow/tensorflow
Root cause: tensorflow@f734ee8
Initial fix: d29b6d6 or tensorflow#59501
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //tensorflow/dtensor/python/tests:multi_client_test_nccl_local_2gpus
-----------------------------------------------------------------------------
2023-01-31 11:16:27.156744: E tensorflow/tsl/lib/monitoring/collection_registry.cc:81] Cannot register 2 metrics with the same name: /tensorflow/core/bfc_allocator_delay
2023-01-31 11:16:27.170465: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Check per client log in Test artifacts.
2023-01-31 11:16:28.129654: E tensorflow/tsl/lib/monitoring/collection_registry.cc:81] Cannot register 2 metrics with the same name: /tensorflow/core/bfc_allocator_delay
2023-01-31 11:16:28.143067: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
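The log shows the same `collection_registry.cc` error twice, once at each startup. For illustration only, the rule behind that message can be modeled as a registry keyed by metric name that rejects a second registration of a name it already holds. The `MetricRegistry` class below is hypothetical. It is not TensorFlow's `CollectionRegistry` implementation, and the log does not show which code path performs the second registration.

```python
# Hypothetical, simplified model of a name-keyed metric registry.
# It is NOT TensorFlow's CollectionRegistry; it only mimics the rule
# behind the logged message: each metric name may be registered once.


class MetricRegistry:
    def __init__(self):
        self._metrics = {}  # metric name -> description

    def register(self, name, description):
        """Register a metric. Return an error string if the name is taken."""
        if name in self._metrics:
            return f"Cannot register 2 metrics with the same name: {name}"
        self._metrics[name] = description
        return None


registry = MetricRegistry()
name = "/tensorflow/core/bfc_allocator_delay"
first = registry.register(name, "BFC allocator delay")
second = registry.register(name, "BFC allocator delay")  # duplicate name
print(first)   # None
print(second)  # Cannot register 2 metrics with the same name: /tensorflow/core/bfc_allocator_delay
```

In this model the first registration succeeds. The second is rejected and leaves the registry with a single entry for that name.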
Could it be that AMD GPUs do not support multiple NCCL managers?
Related: tensorflow#58090