
Conversation

zhangfengcdt
Member

Did you read the Contributor Guide?

Is this PR related to a ticket?

  • Yes, and the PR name follows the format [GH-XXX] my subject. Closes #<issue_number>

What changes were proposed in this PR?

Add retry logic with exponential backoff for Spark downloads to make the workflow resilient to temporary network issues.
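For reference, here is a minimal sketch of that retry pattern; the URL, attempt count, and backoff base are illustrative, not the exact values in the workflow:

    # Sketch of a download retry loop with exponential backoff.
    # SPARK_URL and MAX_ATTEMPTS are illustrative, not the exact values in r.yml.
    SPARK_URL="https://archive.apache.org/dist/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz"
    MAX_ATTEMPTS=5
    for attempt in $(seq 1 "$MAX_ATTEMPTS"); do
      curl --fail --location --output spark.tgz "$SPARK_URL" && break
      echo "Download failed (attempt $attempt), retrying..."
      sleep $((2 ** attempt))  # back off: 2s, 4s, 8s, 16s, ...
    done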

How was this patch tested?

Tested on a CI run.

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API so no need to change the documentation.

@zhangfengcdt
Member Author

Downloading from the Apache Archive is currently very slow.

Tested locally:

  • Speed: ~44 KB/s (43,968 bytes/sec)
  • 10MB would take: ~4 minutes
  • 400MB Spark would take: ~2.5 hours!

Also, the Spark package cache cannot be found on GitHub:

    gh api repos/apache/sedona/actions/caches --jq '.actions_caches[].key'

@jiayuasu
Member

Maybe we should see if we can download the full distribution via PySpark. See how we do it in the Python workflow to avoid touching archive.apache.org: https://github.com/apache/sedona/blob/master/.github/workflows/python.yml#L157

@zhangfengcdt zhangfengcdt changed the title [GH-2351] [CI] Fix R CI flakiness with Spark download retry logic and timeout [GH-2351] [CI] Fix R CI flakiness with Spark download from PySpark Sep 16, 2025
@zhangfengcdt
Member Author

Maybe we should see if we can download the full distribution via PySpark. See how we do it in the Python workflow to avoid touching archive.apache.org: https://github.com/apache/sedona/blob/master/.github/workflows/python.yml#L157

Great suggestion! I implemented it the same way as python.yml and it works well without timeouts. The R tests now finish in around 10 minutes.
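For context, a minimal sketch of the PySpark-based setup, assuming it mirrors the pattern in python.yml (the version pin is illustrative):

    # Install Spark via the PySpark wheel instead of downloading from archive.apache.org.
    pip install pyspark==3.5.3
    # The pip package ships a full Spark distribution; point SPARK_HOME at it
    # so the R tests (via sparklyr) can pick it up.
    export SPARK_HOME=$(python -c "import pyspark; print(pyspark.__path__[0])")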

@Copilot
Contributor


Pull Request Overview

This PR addresses CI flakiness by replacing the cached Spark installation with a Spark setup provided by PySpark.

  • Replaces Spark caching mechanism with PySpark installation and direct JAR management
  • Adds support for using SPARK_HOME environment variable in R test connections
  • Implements retry logic with exponential backoff for downloading JAI libraries (sketched after this list)
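A rough sketch of that JAI download step, assuming the same backoff pattern; JAI_BASE_URL and the jar list are placeholders, not the exact values in r.yml:

    # Download JAI jars into the Spark jars directory, retrying with exponential backoff.
    # JAI_BASE_URL and the jar names are assumptions for illustration.
    JAI_BASE_URL="https://example.org/jai"
    for jar in jai_core-1.1.3.jar jai_codec-1.1.3.jar; do
      for attempt in 1 2 3; do
        curl --fail --location --output "$SPARK_HOME/jars/$jar" "$JAI_BASE_URL/$jar" && break
        sleep $((2 ** attempt))
      done
    done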

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • .github/workflows/r.yml: Replaces the cached Spark installation with a PySpark setup, adds JAI library downloads with retry logic, and sets SPARK_HOME
  • R/tests/testthat/helper-initialize.R: Adds SPARK_HOME support for test connections and conditional logic for local vs. CI environments


@jiayuasu jiayuasu merged commit 6c8f66c into apache:master Sep 17, 2025
13 checks passed