[GH-2351] [CI] Fix R CI flakiness with Spark download from PySpark #2352
Downloading from the Apache Archive is currently very slow. Tested locally:
Also, the Spark package cache cannot be found on GitHub: `gh api repos/apache/sedona/actions/caches --jq '.actions_caches[].key'`
Maybe we could download the full distribution via PySpark instead? See how python.yml does it to avoid touching archive.apache.org: https://github.com/apache/sedona/blob/master/.github/workflows/python.yml#L157
Great suggestion! I implemented it the same way as python.yml, and it works well without timeouts. The R tests now finish in around 10 minutes.
Pull Request Overview
This PR addresses CI flakiness by changing the Spark installation method from using cached Spark installations to using PySpark with a pre-installed Spark setup.
- Replaces Spark caching mechanism with PySpark installation and direct JAR management
- Adds support for using SPARK_HOME environment variable in R test connections
- Implements retry logic with exponential backoff for downloading JAI libraries
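The retry-with-exponential-backoff idea in the last bullet can be sketched as follows. This is a minimal illustration, not the code from `r.yml`: the function names (`backoff_delays`, `retry_with_backoff`, `download`), the attempt count, and the 2-second base delay are all assumptions chosen for the example.

```python
import time
import urllib.request


def backoff_delays(max_attempts, base_delay=2.0, factor=2.0):
    """Return the wait in seconds before each retry: base, base*factor, ..."""
    return [base_delay * factor ** i for i in range(max_attempts - 1)]


def retry_with_backoff(action, max_attempts=5, base_delay=2.0, factor=2.0,
                       sleep=time.sleep):
    """Call action() until it succeeds or max_attempts is exhausted.

    Only OSError is retried (urllib's URLError is a subclass of it);
    the last failure is re-raised so the CI step still fails visibly.
    """
    delays = backoff_delays(max_attempts, base_delay, factor)
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except OSError as exc:
            if attempt == max_attempts:
                raise
            wait = delays[attempt - 1]
            print(f"attempt {attempt} failed ({exc}); retrying in {wait:.0f}s")
            sleep(wait)


def download(url, dest):
    """Download url to dest, retrying on network errors (hypothetical helper)."""
    return retry_with_backoff(lambda: urllib.request.urlretrieve(url, dest))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in a workflow step the default `time.sleep` would be used.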
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
.github/workflows/r.yml | Replaces cached Spark installation with PySpark setup, adds JAI library downloads with retry logic, and sets SPARK_HOME |
R/tests/testthat/helper-initialize.R | Adds SPARK_HOME support for test connections and conditional logic for local vs CI environments |
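The local-versus-CI logic described for the test helper amounts to preferring an explicit `SPARK_HOME` and falling back otherwise. A rough sketch of that decision, written in Python rather than R and assuming that a pip-installed PySpark package directory can serve as the Spark home (the function name `resolve_spark_home` is hypothetical):

```python
import importlib.util
import os


def resolve_spark_home(env=None):
    """Pick a Spark home: explicit SPARK_HOME first, then the PySpark package.

    Returns None when neither is available, in which case a caller could
    fall back to its own download-and-cache path.
    """
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        return spark_home
    # Assumption: the pip-installed pyspark package bundles the Spark
    # distribution, so its directory is usable as SPARK_HOME.
    spec = importlib.util.find_spec("pyspark")
    if spec is not None and spec.origin:
        return os.path.dirname(spec.origin)
    return None
```

In CI, the workflow would export `SPARK_HOME` once, so the first branch is taken; a local run without the variable reaches the fallback.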
Did you read the Contributor Guide?
Is this PR related to a ticket?
`[GH-XXX] my subject`. Closes #<issue_number>
What changes were proposed in this PR?
Add retry logic with exponential backoff for Spark downloads to make the workflow resilient to temporary network issues.
How was this patch tested?
Tested on a CI run.
Did this PR include necessary documentation updates?