rebase

boomanaiden154 · boomanaiden154 · commit 13fb28eb669b · 2025-07-20T19:54:15.000Z
Created using spr 1.3.4
diff --git a/premerge/architecture.md b/premerge/architecture.md
@@ -20,9 +20,9 @@ To balance cost/performance, we keep both types.
  - building & testing LLVM shall be done on self-hosted runners.
 
 LLVM has several flavor of self-hosted runners:
- - libcxx runners.
  - MacOS runners for HLSL managed by Microsoft.
  - GCP windows/linux runners managed by Google.
+ - GCP linux runners set up for libcxx, managed by Google.
 
 This document only focuses on Google's GCP hosted runners.
 
@@ -47,10 +47,11 @@ Any relevant differences are explicitly enumerated.
 
 Our runners are hosted on GCP Kubernetes clusters, and use the
 [Action Runner Controller (ARC)](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/about-actions-runner-controller).
-The clusters have 3 pools:
+The clusters have 4 main pools:
   - llvm-premerge-linux
   - llvm-premerge-linux-service
-  - llvm-premerge-windows
+  - llvm-premerge-windows-2022
+  - llvm-premerge-libcxx
 
 **llvm-premerge-linux-service** is a fixed pool, only used to host the
 services required to manage the premerge infra (controller, listeners,
@@ -60,10 +61,15 @@ monitoring). Today, this pool has three `e2-highcpu-4` machine.
 VMs. This pool runs the Linux workflows. In the US West cluster, the machines
 are `n2d-standard-64` due to quota limitations.
 
-**llvm-premerge-windows** is a auto-scaling pool with large `n2-standard-32`
+**llvm-premerge-windows-2022** is an auto-scaling pool with large `n2-standard-32`
 VMs. Similar to the Linux pool, but this time it runs Windows workflows. In the
 US West cluster, the machines are `n2d-standard-32` due to quota limitations.
 
+**llvm-premerge-libcxx** is an auto-scaling pool with `n2-standard-32`
+VMs. This is similar to the Linux pool but with smaller machines tailored
+to the libcxx testing workflows. In the US West cluster, the machines are
+`n2d-standard-32` due to quota limitations.
+
 ### Service pool: llvm-premerge-linux-service
 
 This pool runs all the services managing the presubmit infra.
@@ -87,7 +93,7 @@ How a job is run:
  - If the instance is not reused in the next 10 minutes, the autoscaler
    will turn down the instance, freeing resources.
 
-### Worker pools : llvm-premerge-linux, llvm-premerge-windows
+### Worker pools : llvm-premerge-linux, llvm-premerge-windows-2022, llvm-premerge-libcxx
 
 To make sure each runner pod is scheduled on the correct pool (linux or
 windows, avoiding the service pool), we use labels and taints.
@@ -98,6 +104,7 @@ So if we do not enforce limits, the controller could schedule 2 runners on
 the same instance, forcing containers to share resources.
 
 Those bits are configures in the
-[linux runner configuration](linux_runners_values.yaml) and
-[windows runner configuration](windows_runner_values.yaml).
+[linux runner configuration](linux_runners_values.yaml),
+[windows runner configuration](windows_runner_values.yaml), and
+[libcxx runner configuration](libcxx_runners_values.yaml).
 
diff --git a/premerge/cluster-management.md b/premerge/cluster-management.md
@@ -33,11 +33,10 @@ The main part you want too look into is `Menu > Kubernetes Engine > Clusters`.
 
-Currently, we have 4 clusters:
+Currently, we have 3 clusters:
  - `llvm-premerge-checks`: the cluster hosting BuildKite Linux runners.
- - `windows-cluster`: the cluster hosting BuildKite Windows runners.
  - `llvm-premerge-cluster-us-central`: The first cluster for GCP hosted runners.
  - `llvm-premerge-cluster-us-west`: The second cluster for GCP hosted runners.
 
-`llvm-premerge-checks` and `windows-cluster` are part of the old Buildkite
+`llvm-premerge-checks` is part of the old Buildkite
 infrastructure. For the new infrastructure, we have two clusters,
 `llvm-premerge-cluster-us-central` and `llvm-premerge-cluster-us-west` for GCP
 hosted runners to form a high availability setup. They both load balance, and
@@ -56,7 +55,8 @@ If you click on `llvm-premerge-cluster-us-central`, and go to the `Nodes` tab, y
-will see 3 node pools:
+will see 4 node pools:
 - llvm-premerge-linux
 - llvm-premerge-linux-service
-- llvm-premerge-windows
+- llvm-premerge-windows-2022
+- llvm-premerge-libcxx
 
 Definitions for each pool are in [Architecture overview](architecture.md).
 
@@ -95,10 +95,12 @@ To apply any changes to the cluster:
 ```
 terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_linux_service
 terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_linux
-terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_windows
+terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_windows_2022
+terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_libcxx
 terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_linux_service
 terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_linux
-terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_windows
+terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_windows_2022
+terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_libcxx
 terraform apply
 ```
 
@@ -144,7 +146,10 @@ on a kubernetes destroy command:
 
 ```bash
 terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_linux
-terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_windows
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_windows_2022
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_libcxx
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_libcxx_release
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_libcxx_next
 ```
 
 These should complete, but if they do not, we are still able to get things
@@ -156,7 +161,10 @@ commands by deleting the kubernetes namespaces all the resources live in:
 
 ```bash
 terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_linux_runners
-terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_windows_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_windows_2022_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_libcxx_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_libcxx_release_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_libcxx_next_runners
 ```
 
 If things go smoothly, these should complete quickly. If they do not complete,