refactor(pipelines/tidb): use OCI artifacts and per-stage agents in br pipelines #4186
Add OCI_TAG_* variables for pd/tikv/tiflash and other deps (ycsb, fake-gcs-server, kes, minio), and handle the ticdc special case; replace FILE_SERVER_URL with OCI_ARTIFACT_HOST. Change the top-level agent to none and run a kubernetes agent per "Checkout & Prepare" stage; merge the debug, checkout, and prepare steps into a single stage; increase the checkout timeout to 15m; use REFS.repo for the checkout dir; and update caching. Fix the tiflash symlink typo across releases.
I have already done a preliminary review for you, and I hope to help you do a better job.
Summary
The PR refactors multiple pipeline scripts for TiDB integration tests, replacing FILE_SERVER_URL with OCI_ARTIFACT_HOST, adding OCI tag computation for dependencies, and consolidating "Checkout" and "Prepare" stages into a single "Checkout & Prepare" stage. The changes streamline artifact handling, improve caching, and update timeout configurations. The overall quality is good, but several areas could benefit from improved error handling, testing coverage, and adherence to best practices.
Critical Issues
- Error Handling in Artifact Download
  - File: Multiple Groovy scripts (e.g., `pipelines/pingcap/tidb/release-6.1/pull_br_integration_test.groovy`, line 55)
  - Issue: The artifact download script (`download_pingcap_oci_artifact.sh`) relies on retries but lacks proper error handling or validation to confirm successful downloads. If the script fails repeatedly, the pipeline will proceed without verifying all artifacts are available.
  - Suggestion: Add validation steps to ensure all required artifacts are downloaded successfully before proceeding to later stages. Example:

        sh label: "verify artifact download", script: """
            if [ ! -f ./pd-server ] || [ ! -f ./tikv-server ] || [ ! -f ./tiflash ]; then
                echo "Error: Required artifacts missing. Exiting."
                exit 1
            fi
        """

- Caching Key Consistency
  - File: Multiple Groovy scripts (e.g., `pipelines/pingcap/tidb/release-6.5/pull_br_integration_test.groovy`, line 87)
  - Issue: The caching keys (`prow.getCacheKey`) for binary and workspace caching are inconsistent across scripts. This inconsistency could lead to redundant builds or cache invalidation.
  - Suggestion: Standardize cache keys for binaries and workspace across all scripts by introducing a common key structure (e.g., using `${REFS.repo}-${BUILD_TAG}`).
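The verification step suggested above stops at the first missing file. A minimal sketch of a variant that reports every missing artifact in one pass is shown here; `check_artifacts` is a hypothetical helper name, not something defined in the PR, and the artifact names passed to it are taken from the review's own example.

```shell
# Hypothetical helper: check that each named artifact exists in a directory
# and is executable, reporting all failures instead of stopping at the first.
check_artifacts() {
    local dir="$1"
    shift
    local missing=0
    local name
    for name in "$@"; do
        if [ ! -x "${dir}/${name}" ]; then
            echo "Error: required artifact missing or not executable: ${dir}/${name}" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when nothing was missing.
    [ "$missing" -eq 0 ]
}

# Example call (assumed layout): check_artifacts . pd-server tikv-server tiflash
```

Called from an `sh` step, a non-zero return would fail the stage before any test starts, and the log would list every absent binary.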
Code Improvements
- Refactoring Repeated Code for Artifact Download
  - File: Multiple Groovy scripts
  - Issue: The artifact download logic is repeated across several scripts, leading to code duplication.
  - Suggestion: Extract the artifact download logic into a reusable function or script and call it across all pipelines. This reduces duplication and centralizes updates. Example:

        def downloadArtifacts() {
            sh """
                ${WORKSPACE}/scripts/artifacts/download_pingcap_oci_artifact.sh \
                    --pd=${OCI_TAG_PD} \
                    --tikv=${OCI_TAG_TIKV} \
                    --tiflash=${OCI_TAG_TIFLASH} \
                    --ycsb=${OCI_TAG_YCSB} \
                    --fake-gcs-server=${OCI_TAG_FAKE_GCS_SERVER} \
                    --kes=${OCI_TAG_KES} \
                    --minio=${OCI_TAG_MINIO}
            """
        }

- Timeout Optimization for Artifact Download
  - File: Multiple Groovy scripts (e.g., `pipelines/pingcap/tidb/release-6.1/pull_br_integration_test.groovy`, line 55)
  - Issue: The `retry(2)` mechanism for downloading artifacts does not consider a timeout for each attempt, which could delay pipeline execution if downloads hang.
  - Suggestion: Add a timeout to the artifact download step to prevent indefinite hangs:

        timeout(time: 10, unit: 'MINUTES') {
            retry(2) {
                sh label: "download tidb components", script: """
                    ${WORKSPACE}/scripts/artifacts/download_pingcap_oci_artifact.sh ...
                """
            }
        }
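The timeout concern above is about each attempt, not the download as a whole. One way to bound every attempt inside the shell step itself is sketched here, assuming the agent image ships the GNU coreutils `timeout` command; `retry_with_timeout` is a hypothetical helper name, not part of the PR.

```shell
# Hypothetical helper: run a command up to <attempts> times and kill any
# single attempt that runs longer than <seconds>.
# Usage: retry_with_timeout <attempts> <seconds> <command> [args...]
retry_with_timeout() {
    local attempts="$1"
    local seconds="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # coreutils `timeout` exits non-zero if the command fails or is killed.
        if timeout "$seconds" "$@"; then
            return 0
        fi
        echo "attempt ${i}/${attempts} failed or timed out" >&2
        i=$((i + 1))
    done
    return 1
}
```

Because `timeout` launches an external program, the wrapped command has to be a script or binary (such as the download script) rather than a shell function.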
Best Practices
- Testing Coverage for Edge Cases
  - File: All scripts
  - Issue: There is no explicit testing for edge cases, such as missing or corrupted artifacts, which could disrupt subsequent stages.
  - Suggestion: Add unit tests or pre-checks in the pipeline stages to validate artifacts before proceeding (e.g., checksum validation for downloaded files).
- Environment Variable Documentation
  - File: All scripts
  - Issue: Newly introduced environment variables (e.g., `OCI_ARTIFACT_HOST`) lack documentation or inline comments explaining their purpose.
  - Suggestion: Add comments explaining the purpose and usage of each environment variable. Example:

        environment {
            OCI_ARTIFACT_HOST = 'hub-zot.pingcap.net/mirrors/hub' // Cache mirror to store OCI artifacts for faster access
        }

- Improve Logging for Debugging
  - File: All scripts
  - Issue: The debug information stage was removed, making it harder to troubleshoot issues during pipeline execution.
  - Suggestion: Add concise logging or a minimal debug step to capture critical environment variables and configurations. Example:

        stage('Debug Info') {
            steps {
                sh label: "Debugging", script: """
                    echo 'Environment Variables:'
                    printenv | grep 'OCI_'
                    echo 'Git Repo: ${REFS.repo}'
                """
            }
        }

- Naming Conventions for Variables
  - File: Multiple scripts
  - Issue: Variable names like `OCI_TAG_TIKV` and `OCI_TAG_PD` are inconsistent with broader naming conventions.
  - Suggestion: Rename variables to follow consistent naming patterns (e.g., `OCI_TAG_<COMPONENT>` or `TAG_<COMPONENT>`).
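The checksum pre-check mentioned under "Testing Coverage for Edge Cases" could look roughly like the sketch below. It assumes `sha256sum` and `awk` are available on the agent and that an expected digest is published somewhere alongside each artifact; neither the digest source nor the helper name `verify_sha256` comes from the PR.

```shell
# Hypothetical pre-check: compare a file's SHA-256 digest with an expected value.
# Usage: verify_sha256 <file> <expected-hex-digest>
verify_sha256() {
    local file="$1"
    local expected="$2"
    local actual
    # If the file is unreadable, sha256sum prints nothing and the comparison fails.
    actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for ${file}: got '${actual}', expected '${expected}'" >&2
        return 1
    fi
    return 0
}
```

A corrupted or truncated download then fails the stage with a clear message instead of surfacing later as an unrelated test failure.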
Conclusion
The PR introduces meaningful improvements in pipeline efficiency and artifact management. Addressing the critical issues regarding error handling and caching consistency, refactoring duplicated code, and adhering to best practices will significantly enhance the robustness and maintainability of the pipelines.
Summary of Changes
Hello @wuhuizuo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors the Jenkins pipelines for TiDB's BR (Backup & Restore) integration tests and e2e tests. The primary goal is to modernize dependency management by transitioning to OCI artifacts and to enhance pipeline efficiency through optimized agent allocation and stage consolidation. These changes aim to improve build reliability and reduce execution times by leveraging OCI for artifact distribution and refining the build process flow.
Code Review
This pull request refactors the br integration test pipelines to use OCI artifacts and per-stage agents, which is a great improvement for maintainability and efficiency. The changes are applied consistently across multiple release branches, and they simplify the pipeline structure by merging checkout and preparation steps. The changes look solid.