[CI] Use GraphQL API instead of BigQuery to get review data #525
boomanaiden154
left a comment
This seems reasonable enough to me.
Can you also put up a PR to remove the service account from the terraform?
We'll still want the service account around: I have an upcoming PR that will export the processed data to our own BigQuery dataset, so we'll still need some IAM infra. But I could at least remove the binding for the current BigQuery role, since those permissions will be unused.
…cs (#535)
This change reintroduces a BigQuery role binding that was removed in #525. Now that our CronJob also queries past data to determine the number of unique LLVM contributors over time, we must grant the associated service account `roles/bigquery.jobUser` so that the BigQuery client can create query jobs. This is the error without this binding:
```
google.api_core.exceptions.Forbidden: 403 POST: Access Denied: User does not have bigquery.jobs.create permission in project llvm-premerge-checks.
```
As we are already making calls to the GitHub GraphQL API for data validation, we can drop the added complexity of using GitHub Archive BigQuery as a data source and query the API directly. BigQuery has the advantage of not being rate-limited, but we often have to query 50-70 commits via the API anyway because GitHub Archive is missing records of events. With more than half of the BigQuery data points needing to be amended, it makes more sense to use the API as the original data source.
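To illustrate what querying the API directly could look like, here is a minimal sketch that batches several commit lookups into one GraphQL request using aliases, then reads back the review state of each commit's pull request. The helper names `build_review_query` and `extract_review_states`, the `commit_` alias prefix, and the choice of the `reviewDecision` field are assumptions for illustration; the query this PR actually sends may differ.

```python
def build_review_query(owner, name, commit_shas):
    """Build one GraphQL query that looks up the pull request behind each commit."""
    lines = [
        "query {",
        f'  repository(owner: "{owner}", name: "{name}") {{',
    ]
    for sha in commit_shas:
        # GraphQL aliases must start with a letter or underscore,
        # so each SHA gets a hypothetical "commit_" prefix.
        lines += [
            f'    commit_{sha}: object(oid: "{sha}") {{',
            "      ... on Commit {",
            "        associatedPullRequests(first: 1) {",
            "          nodes { number reviewDecision }",
            "        }",
            "      }",
            "    }",
        ]
    lines += ["  }", "}"]
    return "\n".join(lines)


def extract_review_states(response):
    """Map each commit SHA to (PR number, review decision), or None if it has no PR."""
    prefix = "commit_"
    results = {}
    for alias, commit in response["data"]["repository"].items():
        sha = alias[len(prefix):]
        # A commit that was pushed without a pull request has no nodes.
        nodes = (commit or {}).get("associatedPullRequests", {}).get("nodes", [])
        if nodes:
            results[sha] = (nodes[0]["number"], nodes[0]["reviewDecision"])
        else:
            results[sha] = None
    return results
```

The resulting string would be sent as the `query` field of a POST to the GraphQL endpoint with an authenticated token. Batching by alias keeps the number of requests low, which matters because the API, unlike BigQuery, is rate-limited.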