boomanaiden154
Contributor

@boomanaiden154 boomanaiden154 commented Oct 7, 2025

Downloading the lit timing files is a performance optimization and does not impact test fidelity. There have been some flakes where this script fails to download files and exits with code 1, causing the job to fail before it even starts running tests. This is undesirable because the tests only run 10-15% slower without the timing files, so catch the exceptions and emit a warning that we can track later in the rare case that we cannot download the timing files.

This fixes #162294.

@boomanaiden154 boomanaiden154 requested a review from cmtice October 7, 2025 16:18
@llvmbot llvmbot added the infrastructure Bugs about LLVM infrastructure label Oct 7, 2025
@llvmbot
Member

llvmbot commented Oct 7, 2025

@llvm/pr-subscribers-infrastructure

Author: Aiden Grossman (boomanaiden154)


Full diff: https://github.com/llvm/llvm-project/pull/162316.diff

1 Files Affected:

  • (modified) .ci/cache_lit_timing_files.py (+16-2)
diff --git a/.ci/cache_lit_timing_files.py b/.ci/cache_lit_timing_files.py
index 2f43e46fc0e56..27a5cf6b0fda3 100644
--- a/.ci/cache_lit_timing_files.py
+++ b/.ci/cache_lit_timing_files.py
@@ -17,6 +17,7 @@
 import glob
 
 from google.cloud import storage
+from google.api_core import exceptions
 
 GCS_PARALLELISM = 100
 
@@ -50,7 +51,14 @@ def _maybe_download_timing_file(blob):
 
 def download_timing_files(storage_client, bucket_name: str):
     bucket = storage_client.bucket(bucket_name)
-    blobs = bucket.list_blobs(prefix="lit_timing")
+    try:
+        blobs = bucket.list_blobs(prefix="lit_timing")
+    except exceptions.ClientError as client_error:
+        print(
+            "::warning file=cache_lit_timing_files.py::Failed to list blobs "
+            "in bucket."
+        )
+        sys.exit(0)
     with multiprocessing.pool.ThreadPool(GCS_PARALLELISM) as thread_pool:
         futures = []
         for timing_file_blob in blobs:
@@ -60,7 +68,13 @@ def download_timing_files(storage_client, bucket_name: str):
                 )
             )
         for future in futures:
-            future.get()
+            future.wait()
+            if not future.successful():
+                print(
+                    "::warning file=cache_lit_timing_files.py::Failed to "
+                    "download lit timing file."
+                )
+                continue
     print("Done downloading")
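
The wait-then-check pattern in the second hunk can be tried in isolation. The following is a minimal sketch, not the script itself: `fetch` and `fetch_all` are hypothetical stand-ins for the per-blob download and the download loop, and no Google Cloud Storage calls are made. It only illustrates how `wait()` followed by `successful()` lets one failed task produce a warning instead of raising, which is what `get()` would do.

```python
import multiprocessing.pool


def fetch(name: str) -> str:
    # Hypothetical stand-in for a per-blob download; fails for one input.
    if name == "bad":
        raise OSError("simulated download failure")
    return name.upper()


def fetch_all(names, parallelism: int = 4):
    """Run fetch() on every name, tolerating individual failures."""
    results = []
    failures = 0
    with multiprocessing.pool.ThreadPool(parallelism) as thread_pool:
        futures = [thread_pool.apply_async(fetch, (name,)) for name in names]
        for future in futures:
            # wait() blocks until the task finishes but, unlike get(),
            # does not re-raise an exception thrown inside the task.
            future.wait()
            if not future.successful():
                print("::warning::Failed to fetch one item.")
                failures += 1
                continue
            # Safe here: the task is known to have completed without error.
            results.append(future.get())
    return results, failures


if __name__ == "__main__":
    print(fetch_all(["a", "bad", "b"]))
```

Counting failures, as this sketch does, is an addition for illustration; the diff above only prints a warning and moves on to the next future.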
 
 

@boomanaiden154 boomanaiden154 merged commit 93f2e0a into llvm:main Oct 7, 2025
12 checks passed
@boomanaiden154 boomanaiden154 deleted the lit-timing-file-caching-resilience-downloading branch October 7, 2025 18:20
Development

Successfully merging this pull request may close these issues.

[CI] Reliability issues with lit timing file caching
