refactor(pipelines/tidb): use OCI artifacts and per-stage agents in br pipelines #4186

Open
wuhuizuo wants to merge 1 commit into main from refactor/download-pingcap-oci-artifact-3

Conversation

@wuhuizuo

pipelines

Add OCI_TAG_* variables for pd, tikv, tiflash, and other dependencies (ycsb, fake-gcs-server, kes, minio), with a special case for ticdc; replace FILE_SERVER_URL with OCI_ARTIFACT_HOST. Set the top-level agent to none and run a Kubernetes agent per "Checkout & Prepare" stage. Merge the debug, checkout, and prepare steps into a single stage, increase the checkout timeout to 15m, use REFS.repo as the checkout directory, and update caching. Fix the tiflash symlink typo across releases.

@ti-chi-bot

ti-chi-bot bot commented Feb 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign wuhuizuo for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wuhuizuo changed the title from "refactor(pipelines/tidb): use OCI artifacts and per-stage agents in br" to "refactor(pipelines/tidb): use OCI artifacts and per-stage agents in br pipelines" on Feb 26, 2026
@ti-chi-bot added the size/XXL label on Feb 26, 2026

@ti-chi-bot ti-chi-bot bot left a comment

I have already done a preliminary review for you, and I hope to help you do a better job.

Summary

The PR refactors multiple pipeline scripts for TiDB integration tests, replacing FILE_SERVER_URL with OCI_ARTIFACT_HOST, adding OCI tag computation for dependencies, and consolidating "Checkout" and "Prepare" stages into a single "Checkout & Prepare" stage. The changes streamline artifact handling, improve caching, and update timeout configurations. The overall quality is good, but several areas could benefit from improved error handling, testing coverage, and adherence to best practices.


Critical Issues

  1. Error Handling in Artifact Download

    • File: Multiple Groovy scripts (e.g., pipelines/pingcap/tidb/release-6.1/pull_br_integration_test.groovy, line 55)
    • Issue: The artifact download script (download_pingcap_oci_artifact.sh) relies on retries but lacks proper error handling or validation to confirm successful downloads. If the script fails repeatedly, the pipeline will proceed without verifying all artifacts are available.
    • Suggestion: Add validation steps to ensure all required artifacts are downloaded successfully before proceeding to later stages. Example:
      sh label: "verify artifact download", script: """
          if [ ! -f ./pd-server ] || [ ! -f ./tikv-server ] || [ ! -f ./tiflash ]; then
              echo "Error: Required artifacts missing. Exiting."
              exit 1
          fi
      """
  2. Caching Key Consistency

    • File: Multiple Groovy scripts (e.g., pipelines/pingcap/tidb/release-6.5/pull_br_integration_test.groovy, line 87)
    • Issue: The caching keys (prow.getCacheKey) for binary and workspace caching are inconsistent across scripts. This inconsistency could lead to redundant builds or cache invalidation.
    • Suggestion: Standardize cache keys for binaries and workspace across all scripts by introducing a common key structure (e.g., using ${REFS.repo}-${BUILD_TAG}).
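
The artifact check suggested in item 1 could also live in a small reusable shell helper. The following is a minimal sketch, not code from the PR: the function name verify_artifacts is hypothetical, and the artifact names are the ones used in the example above. It tests with -x instead of -f so that a dangling tiflash symlink or a non-executable binary is reported as well, and it lists every missing artifact instead of stopping at the first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): report every required artifact
# that is missing or not executable.
# Usage: verify_artifacts DIR NAME...
verify_artifacts() {
    local dir="$1"; shift
    local missing=0 name
    for name in "$@"; do
        # -x follows symlinks, so a dangling symlink is reported too.
        if [ ! -x "${dir}/${name}" ]; then
            echo "missing or not executable: ${dir}/${name}" >&2
            missing=1
        fi
    done
    # Non-zero status tells the caller at least one artifact failed the check.
    return "${missing}"
}
```

A pipeline step could call it as `verify_artifacts . pd-server tikv-server tiflash || exit 1` right after the download.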

Code Improvements

  1. Refactoring Repeated Code for Artifact Download

    • File: Multiple Groovy scripts
    • Issue: The artifact download logic is repeated across several scripts, leading to code duplication.
    • Suggestion: Extract the artifact download logic into a reusable function or script and call it across all pipelines. This reduces duplication and centralizes updates. Example:
      def downloadArtifacts() {
          sh """
              ${WORKSPACE}/scripts/artifacts/download_pingcap_oci_artifact.sh \
                  --pd=${OCI_TAG_PD} \
                  --tikv=${OCI_TAG_TIKV} \
                  --tiflash=${OCI_TAG_TIFLASH} \
                  --ycsb=${OCI_TAG_YCSB} \
                  --fake-gcs-server=${OCI_TAG_FAKE_GCS_SERVER} \
                  --kes=${OCI_TAG_KES} \
                  --minio=${OCI_TAG_MINIO}
          """
      }
  2. Timeout Optimization for Artifact Download

    • File: Multiple Groovy scripts (e.g., pipelines/pingcap/tidb/release-6.1/pull_br_integration_test.groovy, line 55)
    • Issue: The retry(2) mechanism for downloading artifacts does not consider a timeout for each attempt, which could delay pipeline execution if downloads hang.
    • Suggestion: Add a timeout to the artifact download step to prevent indefinite hangs:
      timeout(time: 10, unit: 'MINUTES') {
          retry(2) {
              sh label: "download tidb components", script: """
                  ${WORKSPACE}/scripts/artifacts/download_pingcap_oci_artifact.sh ...
              """
          }
      }
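
If the retry logic is kept inside the shell script instead of the Jenkins timeout and retry steps, the same per-attempt limit can be sketched as below. This assumes the coreutils timeout command is available on the agent image, which the PR does not state; retry_with_timeout is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): run a command up to ATTEMPTS times,
# giving each attempt at most SECONDS seconds.
# Usage: retry_with_timeout ATTEMPTS SECONDS COMMAND [ARG...]
retry_with_timeout() {
    local attempts="$1" seconds="$2"; shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # timeout stops the command once the limit is reached and then
        # exits with a non-zero status, so a hung download counts as a failure.
        if timeout "$seconds" "$@"; then
            return 0
        fi
        echo "attempt ${i}/${attempts} failed: $*" >&2
        i=$((i + 1))
    done
    return 1
}
```

Bounding each attempt separately keeps the worst case at ATTEMPTS times SECONDS, whereas a single outer timeout lets one hung attempt consume the whole budget.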

Best Practices

  1. Testing Coverage for Edge Cases

    • File: All scripts
    • Issue: There is no explicit testing for edge cases, such as missing or corrupted artifacts, which could disrupt subsequent stages.
    • Suggestion: Add unit tests or pre-checks in the pipeline stages to validate artifacts before proceeding (e.g., checksum validation for downloaded files).
  2. Environment Variable Documentation

    • File: All scripts
    • Issue: Newly introduced environment variables (e.g., OCI_ARTIFACT_HOST) lack documentation or inline comments explaining their purpose.
    • Suggestion: Add comments explaining the purpose and usage of each environment variable. Example:
      environment {
          OCI_ARTIFACT_HOST = 'hub-zot.pingcap.net/mirrors/hub'  // Cache mirror to store OCI artifacts for faster access
      }
  3. Improve Logging for Debugging

    • File: All scripts
    • Issue: The debug information stage was removed, making it harder to troubleshoot issues during pipeline execution.
    • Suggestion: Add concise logging or a minimal debug step to capture critical environment variables and configurations. Example:
      stage('Debug Info') {
          steps {
              sh label: "Debugging", script: """
                  echo 'Environment Variables:'
                  printenv | grep 'OCI_'
                  echo 'Git Repo: ${REFS.repo}'
              """
          }
      }
  4. Naming Conventions for Variables

    • File: Multiple scripts
    • Issue: Variable names like OCI_TAG_TIKV and OCI_TAG_PD are inconsistent with broader naming conventions.
    • Suggestion: Rename variables to follow consistent naming patterns (e.g., OCI_TAG_<COMPONENT> or TAG_<COMPONENT>).
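
The checksum validation mentioned in item 1 might look like the sketch below. It assumes a SHA-256 manifest (here called SHA256SUMS) is available next to the downloaded files; the source does not say that such a manifest exists, and verify_checksums is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): verify files in DIR against a
# sha256sum-style manifest. A relative MANIFEST path is resolved inside DIR.
# Usage: verify_checksums DIR MANIFEST
verify_checksums() {
    local dir="$1" manifest="$2"
    # Run in a subshell so the caller's working directory is unchanged.
    # sha256sum -c exits non-zero if any listed file is missing or altered.
    ( cd "$dir" && sha256sum -c "$manifest" )
}
```

A corrupted or truncated download then fails the stage immediately instead of surfacing later as an obscure test failure.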

Conclusion

The PR introduces meaningful improvements in pipeline efficiency and artifact management. Addressing the critical issues regarding error handling and caching consistency, refactoring duplicated code, and adhering to best practices will significantly enhance the robustness and maintainability of the pipelines.

@gemini-code-assist

Summary of Changes

Hello @wuhuizuo, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the Jenkins pipelines for TiDB's BR (Backup & Restore) integration tests and e2e tests. The primary goal is to modernize dependency management by transitioning to OCI artifacts and to enhance pipeline efficiency through optimized agent allocation and stage consolidation. These changes aim to improve build reliability and reduce execution times by leveraging OCI for artifact distribution and refining the build process flow.

Highlights

  • OCI Artifact Adoption: The pipelines now utilize OCI (Open Container Initiative) artifacts for downloading dependencies like PD, TiKV, TiFlash, YCSB, Fake-GCS-Server, KES, Minio, and TiCDC, replacing the previous file server URL approach.
  • Pipeline Agent Refactoring: The top-level Jenkins agent has been set to 'none', and a Kubernetes agent is now dynamically provisioned for each 'Checkout & Prepare' stage, optimizing resource allocation.
  • Stage Consolidation and Optimization: The 'Debug info', 'Checkout', and 'Prepare' stages have been merged into a single 'Checkout & Prepare' stage, streamlining the pipeline execution flow. The checkout timeout for this stage has also been increased to 15 minutes.
  • Improved Caching and Directory Management: The checkout directory now consistently uses REFS.repo, and caching mechanisms have been updated to improve efficiency and reliability across pipeline runs.
  • Symlink Typo Correction: A persistent tiflash symlink typo has been corrected across multiple release pipelines, ensuring correct binary execution.

Changelog

  • pipelines/pingcap/tidb/release-6.1/pull_br_integration_test.groovy
    • Added OCI tag variables for PD, TiKV, TiFlash, YCSB, Fake-GCS-Server, KES, and Minio.
    • Replaced FILE_SERVER_URL with OCI_ARTIFACT_HOST.
    • Changed the main pipeline agent to none.
    • Merged 'Debug info', 'Checkout', and 'Prepare' stages into a new 'Checkout & Prepare' stage.
    • Increased the timeout for the 'Checkout & Prepare' stage to 15 minutes.
    • Updated the checkout directory to REFS.repo.
    • Modified artifact download logic to use download_pingcap_oci_artifact.sh with OCI tags.
    • Fixed the tiflash symlink creation to ln -s tiflash_dir/tiflash tiflash.
    • Updated the workspace cache key.
    • Added touch rev-${REFS.pulls[0].sha} for cache validation.
    • Updated the tar command in the post-failure step with --warning=no-file-changed.
  • pipelines/pingcap/tidb/release-6.5-20241101-v6.5.7/pull_br_integration_test.groovy
    • Fixed the tiflash symlink creation typo.
  • pipelines/pingcap/tidb/release-6.5-fips/pull_br_integration_test.groovy
    • Fixed the tiflash symlink creation typo.
  • pipelines/pingcap/tidb/release-6.5/pull_br_integration_test.groovy
    • Added OCI tag variables for PD, TiKV, TiFlash, YCSB, Fake-GCS-Server, KES, and Minio.
    • Replaced FILE_SERVER_URL with OCI_ARTIFACT_HOST.
    • Changed the main pipeline agent to none.
    • Merged 'Debug info', 'Checkout', and 'Prepare' stages into a new 'Checkout & Prepare' stage.
    • Increased the timeout for the 'Checkout & Prepare' stage to 15 minutes.
    • Updated the checkout directory to REFS.repo.
    • Modified artifact download logic to use download_pingcap_oci_artifact.sh with OCI tags.
    • Fixed the tiflash symlink creation to ln -s tiflash_dir/tiflash tiflash.
    • Updated the workspace cache key.
    • Added touch rev-${REFS.pulls[0].sha} for cache validation.
    • Updated the tar command in the post-failure step with --warning=no-file-changed.
  • pipelines/pingcap/tidb/release-7.1/pull_br_integration_test.groovy
    • Added OCI tag variables for PD, TiKV, TiFlash, YCSB, Fake-GCS-Server, KES, and Minio.
    • Replaced FILE_SERVER_URL with OCI_ARTIFACT_HOST.
    • Changed the main pipeline agent to none.
    • Merged 'Debug info', 'Checkout', and 'Prepare' stages into a new 'Checkout & Prepare' stage.
    • Increased the timeout for the 'Checkout & Prepare' stage to 15 minutes.
    • Updated the checkout directory to REFS.repo.
    • Modified artifact download logic to use download_pingcap_oci_artifact.sh with OCI tags.
    • Fixed the tiflash symlink creation to ln -s tiflash_dir/tiflash tiflash.
    • Updated the workspace cache key.
    • Added touch rev-${REFS.pulls[0].sha} for cache validation.
    • Updated the tar command in the post-failure step with --warning=no-file-changed.
  • pipelines/pingcap/tidb/release-7.3/pull_br_integration_test.groovy
    • Fixed the tiflash symlink creation typo.
  • pipelines/pingcap/tidb/release-7.4/pull_br_integration_test.groovy
    • Fixed the tiflash symlink creation typo.
  • pipelines/pingcap/tidb/release-7.5/pull_br_integration_test.groovy
    • Added OCI tag variables for PD, TiKV, TiFlash, YCSB, Fake-GCS-Server, KES, and Minio.
    • Replaced FILE_SERVER_URL with OCI_ARTIFACT_HOST.
    • Changed the main pipeline agent to none.
    • Merged 'Debug info', 'Checkout', and 'Prepare' stages into a new 'Checkout & Prepare' stage.
    • Increased the timeout for the 'Checkout & Prepare' stage to 15 minutes.
    • Updated the checkout directory to REFS.repo.
    • Modified artifact download logic to use download_pingcap_oci_artifact.sh with OCI tags.
    • Fixed the tiflash symlink creation to ln -s tiflash_dir/tiflash tiflash.
    • Updated the workspace cache key.
    • Removed the run_group.sh others check from the prepare stage.
    • Adjusted the formatting of TEST_GROUP values.
    • Added touch rev-${REFS.pulls[0].sha} for cache validation.
    • Added the run_group.sh others check back into the test stage.
    • Updated the tar command in the post-failure step with --warning=no-file-changed.
  • pipelines/pingcap/tidb/release-8.5/pull_integration_e2e_test.groovy
    • Added OCI tag variables for PD, TiKV, TiFlash, and TiCDC.
    • Replaced FILE_SERVER_URL with OCI_ARTIFACT_HOST.
    • Removed the 'Debug info' stage.
    • Updated the checkout directory to REFS.repo.
    • Removed the 'Prepare' stage.
    • Integrated artifact download directly into the 'Tests' stage using download_pingcap_oci_artifact.sh.
    • Fixed the tiflash symlink creation to ln -s tiflash_dir/tiflash tiflash.
    • Updated the artifact archiving path and added allowEmptyArchive.
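
The changelog mentions `touch rev-${REFS.pulls[0].sha}` for cache validation. One way such a marker file can be used is sketched below, under the assumption that a restored workspace counts as valid only when the marker for the current commit is present; the summary does not spell out the actual check, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the PR); names and layout are illustrative.

# mark_rev DIR SHA: record that DIR was prepared for commit SHA.
mark_rev() {
    touch "$1/rev-$2"
}

# cache_is_for_rev DIR SHA: succeed only if DIR carries the marker for SHA.
cache_is_for_rev() {
    [ -f "$1/rev-$2" ]
}
```

A stage restoring the cache could skip the rebuild when `cache_is_for_rev . "$sha"` succeeds, and otherwise rebuild and call `mark_rev . "$sha"`.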

Activity

  • The pull request introduces significant refactoring to multiple Jenkins pipelines, indicating a planned architectural improvement.
  • The changes are consistent across several release branches, suggesting a broad application of the new OCI artifact and agent strategy.
  • The author, wuhuizuo, has implemented a comprehensive update to how dependencies are managed and how pipeline stages are structured.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the br integration test pipelines to use OCI artifacts and per-stage agents, which is a great improvement for maintainability and efficiency. The changes are applied consistently across multiple release branches, and they simplify the pipeline structure by merging checkout and preparation steps. The changes look solid.
