
Commit 13fb28e

rebase
Created using spr 1.3.4
2 parents 35bc2ca + 913cf2e commit 13fb28e

2 files changed: +29 -14 lines changed

premerge/architecture.md

Lines changed: 14 additions & 7 deletions
@@ -20,9 +20,9 @@ To balance cost/performance, we keep both types.
 - building & testing LLVM shall be done on self-hosted runners.
 
 LLVM has several flavor of self-hosted runners:
-- libcxx runners.
 - MacOS runners for HLSL managed by Microsoft.
 - GCP windows/linux runners managed by Google.
+- GCP linux runners setup for libcxx managed by Google.
 
 This document only focuses on Google's GCP hosted runners.
 
@@ -47,10 +47,11 @@ Any relevant differences are explicitly enumerated.
 
 Our runners are hosted on GCP Kubernetes clusters, and use the
 [Action Runner Controller (ARC)](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/about-actions-runner-controller).
-The clusters have 3 pools:
+The clusters have 4 main pools:
 - llvm-premerge-linux
 - llvm-premerge-linux-service
-- llvm-premerge-windows
+- llvm-premerge-windows-2022
+- llvm-premerge-libcxx
 
 **llvm-premerge-linux-service** is a fixed pool, only used to host the
 services required to manage the premerge infra (controller, listeners,
@@ -60,10 +61,15 @@ monitoring). Today, this pool has three `e2-highcpu-4` machine.
 VMs. This pool runs the Linux workflows. In the US West cluster, the machines
 are `n2d-standard-64` due to quota limitations.
 
-**llvm-premerge-windows** is a auto-scaling pool with large `n2-standard-32`
+**llvm-premerge-windows-2022** is a auto-scaling pool with large `n2-standard-32`
 VMs. Similar to the Linux pool, but this time it runs Windows workflows. In the
 US West cluster, the machines are `n2d-standard-32` due to quota limitations.
 
+**llvm-premerge-libcxx** is a auto-scaling pool with large `n2-standard-32`
+VMs. This is similar to the Linux pool but with smaller machines tailored
+to the libcxx testing workflows. In the US West Cluster, the machines are
+`n2d-standard-32` due to quota limitations.
+
 ### Service pool: llvm-premerge-linux-service
 
 This pool runs all the services managing the presubmit infra.
@@ -87,7 +93,7 @@ How a job is run:
 - If the instance is not reused in the next 10 minutes, the autoscaler
 will turn down the instance, freeing resources.
 
-### Worker pools : llvm-premerge-linux, llvm-premerge-windows
+### Worker pools : llvm-premerge-linux, llvm-premerge-windows-2022, llvm-premerge-libcxx
 
 To make sure each runner pod is scheduled on the correct pool (linux or
 windows, avoiding the service pool), we use labels and taints.
@@ -98,6 +104,7 @@ So if we do not enforce limits, the controller could schedule 2 runners on
 the same instance, forcing containers to share resources.
 
 Those bits are configures in the
-[linux runner configuration](linux_runners_values.yaml) and
-[windows runner configuration](windows_runner_values.yaml).
+[linux runner configuration](linux_runners_values.yaml),
+[windows runner configuration](windows_runner_values.yaml), and
+[libcxx runner configuration](libcxx_runners_values.yaml).
 
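The worker-pool section relies on labels and taints to keep each runner pod on its own pool. A minimal sketch for spot-checking that from a shell, assuming `kubectl` is already pointed at one of the premerge clusters and using the `cloud.google.com/gke-nodepool` label that GKE sets on every node. The `run` wrapper and the `DRY_RUN` switch are hypothetical conveniences, not part of the repository; by default they only print each command.

```bash
set -eu

# Node pools listed in the updated architecture.md.
POOLS="llvm-premerge-linux llvm-premerge-linux-service llvm-premerge-windows-2022 llvm-premerge-libcxx"

# Hypothetical helper: execute only when DRY_RUN=0, otherwise print the command.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# List each pool's nodes together with their taints.
for pool in $POOLS; do
  run kubectl get nodes -l "cloud.google.com/gke-nodepool=${pool}" \
    -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
done
```

Run as-is, it only prints the four `kubectl get nodes` commands; with `DRY_RUN=0` it lists each pool's nodes and their taints.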
premerge/cluster-management.md

Lines changed: 15 additions & 7 deletions
@@ -33,11 +33,10 @@ The main part you want too look into is `Menu > Kubernetes Engine > Clusters`.
 
 Currently, we have 4 clusters:
 - `llvm-premerge-checks`: the cluster hosting BuildKite Linux runners.
-- `windows-cluster`: the cluster hosting BuildKite Windows runners.
 - `llvm-premerge-cluster-us-central`: The first cluster for GCP hosted runners.
 - `llvm-premerge-cluster-us-west`: The second cluster for GCP hosted runners.
 
-`llvm-premerge-checks` and `windows-cluster` are part of the old Buildkite
+`llvm-premerge-checks` is part of the old Buildkite
 infrastructure. For the new infrastructure, we have two clusters,
 `llvm-premerge-cluster-us-central` and `llvm-premerge-cluster-us-west` for GCP
 hosted runners to form a high availability setup. They both load balance, and
@@ -56,7 +55,8 @@ If you click on `llvm-premerge-cluster-us-central`, and go to the `Nodes` tab, y
 will see 3 node pools:
 - llvm-premerge-linux
 - llvm-premerge-linux-service
-- llvm-premerge-windows
+- llvm-premerge-windows-2022
+- llvm-premerge-libcxx
 
 Definitions for each pool are in [Architecture overview](architecture.md).
 
@@ -95,10 +95,12 @@ To apply any changes to the cluster:
 ```
 terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_linux_service
 terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_linux
-terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_windows
+terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_windows_2022
+terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_libcxx
 terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_linux_service
 terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_linux
-terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_windows
+terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_windows_2022
+terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_libcxx
 terraform apply
 ```
 
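After this change the targeted `apply` list repeats the same four node pools for both clusters. A minimal sketch that generates those commands in the same order, assuming only the module and resource names visible in the diff. `run`, `DRY_RUN`, `CLUSTERS`, `NODE_POOLS`, and `apply_node_pools` are hypothetical names; by default the script prints the commands instead of running them.

```bash
set -eu

# Module and node-pool resource names taken from the diff above.
CLUSTERS="premerge_cluster_us_central premerge_cluster_us_west"
NODE_POOLS="llvm_premerge_linux_service llvm_premerge_linux llvm_premerge_windows_2022 llvm_premerge_libcxx"

# Hypothetical helper: execute only when DRY_RUN=0, otherwise print the command.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

apply_node_pools() {
  # One targeted apply per (cluster, node pool) pair.
  for cluster in $CLUSTERS; do
    for pool in $NODE_POOLS; do
      run terraform apply -target "module.${cluster}.google_container_node_pool.${pool}"
    done
  done
  # Final untargeted apply for everything else.
  run terraform apply
}

apply_node_pools
```

Keeping the pool names in one list means a future pool is added once rather than once per cluster.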
@@ -144,7 +146,10 @@ on a kubernetes destroy command:
 
 ```bash
 terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_linux
-terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_windows
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_windows_2022
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_libcxx
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_libcxx_release
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_libcxx_next
 ```
 
 These should complete, but if they do not, we are still able to get things
@@ -156,7 +161,10 @@ commands by deleting the kubernetes namespaces all the resources live in:
 
 ```bash
 terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_linux_runners
-terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_windows_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_windows_2022_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_libcxx_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_libcxx_release_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_libcxx_next_runners
 ```
 
 If things go smoothly, these should complete quickly. If they do not complete,

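The teardown instructions now list five runner sets in `us-central`, each with a matching namespace. A minimal sketch that derives both destroy passes from one list, assuming the naming pattern visible in the diff (`github_actions_runner_set_<name>` and `llvm_premerge_<name>_runners`). The helper functions, `RUNNER_SETS`, and the `DRY_RUN` switch are hypothetical. The namespace pass is only meant as the fallback the document describes for when the Helm releases do not destroy cleanly.

```bash
set -eu

MODULE="module.premerge_cluster_us_central_resources"
# Runner-set suffixes taken from the diff above.
RUNNER_SETS="linux windows_2022 libcxx libcxx_release libcxx_next"

# Hypothetical helper: execute only when DRY_RUN=0, otherwise print the command.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# First pass: destroy the Helm releases for each runner set.
destroy_runner_sets() {
  for set_name in $RUNNER_SETS; do
    run terraform destroy -target "${MODULE}.helm_release.github_actions_runner_set_${set_name}"
  done
}

# Fallback pass: destroy the namespaces the runner resources live in.
destroy_runner_namespaces() {
  for set_name in $RUNNER_SETS; do
    run terraform destroy -target "${MODULE}.kubernetes_namespace.llvm_premerge_${set_name}_runners"
  done
}

destroy_runner_sets
```

Call `destroy_runner_namespaces` only if the first pass does not complete; with the default `DRY_RUN` both functions just print the commands.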