@jriv01 jriv01 commented Jul 28, 2025

As we are already making calls to the GitHub GraphQL API for data validation, we can just remove the added complexity of using GitHub Archive BigQuery as a data source and query the API directly. Using BigQuery has the advantage of not being rate-limited, but we often have to query for 50-70 commits via the API anyway due to missing records of events in GitHub Archive. With more than half of the BigQuery data points needing amending, it makes more sense to use the API as the original data source.
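For illustration, here is a minimal sketch of how several commits could be looked up in one GraphQL request by giving each commit its own alias. It is not the code from this PR: the helper names (`build_commit_query`, `build_request_body`) and the selected fields are assumptions chosen for the example. It only builds the request body. Sending it would be an authenticated POST to `https://api.github.com/graphql`, which is where rate limits apply.

```python
import json
import re

# Hypothetical field selection, for illustration only; the real script may
# request different fields.
COMMIT_FIELDS = """... on Commit {
        oid
        committedDate
        associatedPullRequests(first: 1) { nodes { number reviewDecision } }
      }"""

_SHA_RE = re.compile(r"[0-9a-f]{40}")


def build_commit_query(owner: str, repo: str, shas: list[str]) -> str:
    """Build one GraphQL query that fetches every commit in `shas` via aliases."""
    for sha in shas:
        # Reject anything that is not a full hex SHA so that untrusted text is
        # never spliced into the query string.
        if not _SHA_RE.fullmatch(sha):
            raise ValueError(f"not a full commit SHA: {sha!r}")
    aliases = "\n".join(
        '    commit_%s: object(oid: "%s") { %s }' % (sha, sha, COMMIT_FIELDS)
        for sha in shas
    )
    return 'query {\n  repository(owner: "%s", name: "%s") {\n%s\n  }\n}' % (
        owner,
        repo,
        aliases,
    )


def build_request_body(owner: str, repo: str, shas: list[str]) -> str:
    """Serialize the query as the JSON body expected by a GraphQL endpoint."""
    return json.dumps({"query": build_commit_query(owner, repo, shas)})
```

Batching commits under aliases keeps the number of round trips low, which matters once the API is the primary data source and no longer just the fallback for records missing from GitHub Archive.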

jriv01 commented Jul 30, 2025

@boomanaiden154 boomanaiden154 left a comment

This seems reasonable enough to me.

Can you also put up a PR to remove the service account from the terraform?

jriv01 commented Jul 30, 2025

> Can you also put up a PR to remove the service account from the terraform?

We'll still want the service account around: I have an upcoming PR that will export the processed data to our own BigQuery dataset, so we'll still need some IAM infra. But I could at least remove the binding for the current BigQuery role, since those permissions will be unused.

@boomanaiden154 boomanaiden154 merged commit 772b264 into llvm:main Jul 30, 2025
5 checks passed
boomanaiden154 pushed a commit that referenced this pull request Aug 5, 2025
…cs (#535)

This change reintroduces a BigQuery role binding that was removed in
#525. Now that our CronJob is also querying past data to determine the
number of unique LLVM contributors over time, we must grant the
associated service account `roles/bigquery.jobUser` so that the BigQuery
client can create query jobs.

This is the error without this binding:

```
google.api_core.exceptions.Forbidden: 403 POST: Access Denied: User does not have bigquery.jobs.create permission in project llvm-premerge-checks.
```
vvereschaka pushed a commit to vvereschaka/llvm-zorg that referenced this pull request Sep 25, 2025
vvereschaka pushed a commit to vvereschaka/llvm-zorg that referenced this pull request Sep 25, 2025
…cs (llvm#535)
