@jriv01 jriv01 commented Jul 11, 2025

These resources define a CronJob that runs the container ghcr.io/llvm/operations-metrics:latest daily at 07:00 UTC. The container scrapes daily metrics on LLVM's commit volume and uploads them for visualization in Grafana.

The changes were made to the existing Terraform files because many of the same resources are reused. This keeps all relevant changes in one place instead of splitting them across two Terraform directories that access and modify shared resources.

Since the container needs access to the BigQuery Google Cloud API, an IAM service account and a Kubernetes service account are used to grant that access via Workload Identity Federation for GKE. More details at https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity

jriv01 commented Jul 11, 2025

@boomanaiden154 @lnihlen

premerge/main.tf Outdated
}

# The container for scraping LLVM commits needs persistent storage
# for a local check-out of llvm/llvm-project
Contributor

Why does this need to be stored persistently? It's pretty cheap to clone LLVM, and I think a PVC adds unnecessary complexity, on top of making the job stateful.

Contributor Author

I neglected to mention this, but there's also a persistent file that keeps track of the last commits we've seen. Originally, the script was meant to run at a more frequent cadence, so we wanted to keep track of the commits we'd seen to avoid reprocessing them.

Now that the script only scrapes a day's worth of data at a time, maybe we don't need persistent state to keep track of the commits we've seen, although it might still be valuable for ensuring the quality of the commit data between iterations.

Contributor

Using a PVC for a persistent file would make more sense.

I still think it's a bit of an antipattern though. If you want to ensure you're only looking at new commits and it's a cron job, you can just look at the last 24 hours of commits (which it seems like you're already doing?). Making this stateless makes things quite a bit simpler and aligns more closely with how k8s expects things to work.
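
For illustration, a minimal sketch of the stateless approach: instead of persisting the last-seen commit, each run keeps only the commits whose timestamps fall in the 24 hours before the run time. The `Commit` record and the `commits_in_last_day` helper are hypothetical names for this example, not taken from the actual scraper, and the sketch assumes timezone-aware UTC timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, NamedTuple


class Commit(NamedTuple):
    # Hypothetical minimal record; the real scraper's data model is not shown here.
    sha: str
    committed_at: datetime  # assumed to be timezone-aware (UTC)


def commits_in_last_day(commits: Iterable[Commit], now: datetime) -> list[Commit]:
    """Return commits with now - 24h <= committed_at < now."""
    window_start = now - timedelta(days=1)
    return [c for c in commits if window_start <= c.committed_at < now]


# Example: a run at 07:00 UTC keeps only commits from the preceding 24 hours.
run_time = datetime(2025, 7, 22, 7, 0, tzinfo=timezone.utc)
history = [
    Commit("aaa111", run_time - timedelta(hours=30)),  # older than the window
    Commit("bbb222", run_time - timedelta(hours=24)),  # exactly at window start: kept
    Commit("ccc333", run_time - timedelta(hours=1)),   # inside the window: kept
    Commit("ddd444", run_time),                        # at the run time: left for the next run
]
recent = commits_in_last_day(history, run_time)
```

The half-open window means that consecutive runs neither overlap nor leave gaps, provided `now` is the scheduled run time (for example 07:00 UTC) rather than the wall-clock time at which the job happens to start.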

Contributor Author

The dependency on persistent storage was removed in #501.

Contributor

@boomanaiden154 boomanaiden154 left a comment

Overall LGTM.

@boomanaiden154
Contributor

Please fix the Terraform formatting before landing this though.

At some point we might want to refactor this out into its own TF module, but for now I think this is reasonable enough.

@boomanaiden154 boomanaiden154 merged commit 5d8aab5 into llvm:main Jul 22, 2025
3 checks passed