chore(e2e): rhidp-9016 - Create automated script to provision RHDH Orchestrator integrated #3616
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has not been approved yet. It needs approval from an approver in each of these files; approvers can indicate their approval by writing the appropriate command in a comment. The full list of commands accepted by this bot can be found here.
The image is available at: |
|
/retest |
|
The image is available at: |
|
The image is available at: |
costmetrics_operator_source: redhat-operators
costmetrics_operator_source_namespace: openshift-marketplace
costmetrics_client_id: "e989874e-279e-4291-b104-60fab5d7f9bc"
Just to note - this client ID has been revoked.
Are you including the Ansible roles on purpose? If so, some of the roles are not directly related to RHDH / Orchestrator, so you can remove them.
echo "=== Waiting for database initialization ==="
oc wait job -l job-name --for=condition=Complete -n ${NAMESPACE} --timeout=60s 2> /dev/null || true
Should job-name be something specific?
Added an app: init-orchestrator-db label to the Job and updated the wait command to use it.
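For reference, a minimal sketch of what the label-based wait could look like, assuming the Job now carries an app=init-orchestrator-db label in the target namespace (the exact surrounding script context is not shown here):

```bash
echo "=== Waiting for database initialization ==="
# Select the Job by its app label instead of an unset job-name selector (assumed label).
oc wait job -l app=init-orchestrator-db \
  --for=condition=Complete \
  -n "${NAMESPACE}" \
  --timeout=60s 2>/dev/null || true
```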
# Clone the workflows repository
TEMP_DIR=$(mktemp -d)
echo "Cloning workflows repository to ${TEMP_DIR}..."
git clone "${WORKFLOW_REPO}" "${TEMP_DIR}/workflows"
Cloning the workflows repo is only necessary when installing the greeting workflow, which I do not see happening in this script.
You're absolutely right, I should remove those roles.
8b88ea8 to b5b8b4c
…ntegrated with RHDH Signed-off-by: Gustavo Lira <[email protected]>
Signed-off-by: Gustavo Lira <[email protected]>
- Changed loop variable from 'i' to '_' in deploy-orchestrator.sh for clarity.
- Updated git clone command in 04-deploy-workflows.sh to use quotes around variables for better handling of paths.
Signed-off-by: Gustavo Lira <[email protected]>
- Modified the condition for creating a namespace to ensure it only executes when helm_managed_rhdh is false and the workflow_namespace is not "sonataflow-infra".
- Removed redundant condition check to streamline the task logic.
Signed-off-by: Gustavo Lira <[email protected]>
… to adhere to best practices.
- Updated various shell commands and YAML configurations for improved readability and consistency.
- Ensured proper formatting in multiple files, including deployment scripts and task definitions.
Signed-off-by: Gustavo Lira <[email protected]>
- Remove unused roles from flight-path-auto-tests base:
  - deploy-cost-metrics-operator
  - deploy-optimizer-app
  - deploy-orchestrator (old)
  - deploy-resource-optimization-plugin
  - deploy-resource-optimization-workflow
  - odf-node-recovery
  - post-mortem
- Remove unnecessary workflow repo cloning in 04-deploy-workflows.sh (workflows are created inline, no repo cloning needed)
- Fix PostgreSQL Job initialization:
  - Added app: init-orchestrator-db label to Job
  - Updated wait command to use correct label
Addresses feedback from @chadcrum
…eating SonataFlowPlatform The SonataFlowPlatform CRD was being created before the Serverless Logic Operator was ready, causing 'Failed to find exact match for sonataflow.org/v1alpha08.SonataFlowPlatform' error. Moved the wait for serverless operator components to run immediately after the plugin-infra.sh script execution, before creating PostgreSQL and SonataFlowPlatform resources.
…d of relying on external script
The external plugin-infra.sh script was failing with GitHub rate limits (429). Replace it with direct operator installation using Kubernetes resources.
Changes:
- Install Red Hat OpenShift Serverless Operator via Subscription
- Install Red Hat OpenShift Serverless Logic Operator via Subscription
- Create required namespaces before operator installation
- Add proper wait logic for CSVs to be in Succeeded state
- Remove dependency on external GitHub script
This makes the deployment more reliable and predictable.
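As a rough illustration of the Subscription-based approach described above, a sketch of installing the Serverless Operator directly with oc; the channel name and the wait logic are assumptions, not the script's exact contents:

```bash
# Create the operator namespace (idempotent).
oc create namespace openshift-serverless --dry-run=client -o yaml | oc apply -f -

# Subscribe to the Serverless Operator; the channel below is an assumption.
cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: serverless-operator
  namespace: openshift-serverless
spec:
  channel: stable
  name: serverless-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

# Wait for the resulting CSV to reach Succeeded (an OperatorGroup may also be required).
until oc get csv -n openshift-serverless 2>/dev/null | grep -q Succeeded; do
  echo "Waiting for the Serverless Operator CSV to reach Succeeded..."
  sleep 10
done
```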
…nstalling Logic Operator
The Logic Operator requires Knative Serving and Knative Eventing to be ready before it can install properly.
Changes:
- Create KnativeServing instance in knative-serving namespace
- Create KnativeEventing instance in knative-eventing namespace
- Wait for Knative components to be Ready before installing Logic Operator
- Add explicit wait for Logic Operator deployment to be Available
- Add wait for SonataFlow CRDs to be created before attempting to use them
This ensures the correct operator installation sequence and prevents 'Failed to find exact match for SonataFlowPlatform' errors.
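A sketch of the Knative ordering described above; the apiVersion may differ by operator release, so treat the resource definitions as illustrative:

```bash
# Create KnativeServing and KnativeEventing instances (default names and namespaces).
cat <<'EOF' | oc apply -f -
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
EOF

# Block until both components report Ready before installing the Logic Operator.
oc wait knativeserving/knative-serving -n knative-serving --for=condition=Ready --timeout=10m
oc wait knativeeventing/knative-eventing -n knative-eventing --for=condition=Ready --timeout=10m
```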
The Logic Operator subscription needs an OperatorGroup in its namespace to be properly resolved and installed.
Changes:
- Add OperatorGroup for openshift-serverless-logic namespace
- Remove wait for non-existent logic-operator deployment (Logic Operator is managed by Serverless Operator, no separate deployment)
- Increase CSV wait timeout from 5 to 10 minutes
- Improve error messages in wait loops
This ensures the Logic Operator CSV is created and the SonataFlow CRDs are properly installed in the cluster.
…fra namespace
The vars_files entry was loading role defaults AFTER our vars definition, causing rhdh_ns to be overwritten with 'rhdh-operator' instead of 'orchestrator-infra'.
Changes:
- Remove the vars_files that was overriding our namespace variable
- Add explicit kubeconfig_path variable definition
- This ensures PostgreSQL and SonataFlowPlatform are created in the correct 'orchestrator-infra' namespace
Fixes the issue where all infrastructure was being created in the wrong namespaces.
The Logic Operator does not support OwnNamespace mode and was failing with 'OwnNamespace InstallModeType not supported' error.
Changes:
- Remove targetNamespaces from OperatorGroup (enables AllNamespaces mode)
- Update CSV wait to accept both Succeeded and Failed phases
- Add note that Failed phase is acceptable if CRDs are installed
- The Logic Operator successfully installs CRDs even when CSV shows Failed
This is expected behavior - the Logic Operator installs the required SonataFlow CRDs regardless of the CSV phase.
…Operator
The Logic Operator requires an OperatorGroup with AllNamespaces mode.
Changes:
- Rename OperatorGroup from 'openshift-serverless-logic' to 'global-operators'
- Keep empty spec to enable AllNamespaces mode
- Add wait for Logic Operator controller pod to be ready
- Add dbMigrationStrategy to SonataFlow Platform persistence config
- Improve wait conditions for all deployments
- Add KUBECONFIG environment to all shell commands
- Wait for deployments to be created before checking their status
This ensures the Logic Operator CSV reaches Succeeded state and the controller pod is running before attempting to create SonataFlowPlatform.
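For illustration, the AllNamespaces-mode OperatorGroup described above might look like this (a sketch; the name and namespace mirror the commit message):

```bash
# An OperatorGroup with an empty spec (no targetNamespaces) selects AllNamespaces mode.
cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: global-operators
  namespace: openshift-serverless-logic
spec: {}
EOF
```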
… troubleshooting
Changes:
- Update script name from deploy.sh to deploy-orchestrator.sh
- Update command-line flags to match actual implementation
- Add troubleshooting section for Logic Operator CSV Failed state
- Document that Failed CSV state is expected when CRDs are installed
- Add verification commands for Logic Operator controller
…ler pod
The Logic Operator controller pod uses the label 'app.kubernetes.io/name=sonataflow-operator', not 'app.kubernetes.io/name=logic-operator-rhel8'. This fixes the wait condition that was failing to find the controller pod.
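A minimal sketch of the corrected wait, assuming the controller runs in the openshift-serverless-logic namespace:

```bash
# Wait for the Logic Operator controller pod using the label it actually carries.
oc wait pod -l app.kubernetes.io/name=sonataflow-operator \
  -n openshift-serverless-logic \
  --for=condition=Ready \
  --timeout=5m
```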
… playbook The variable orchestrator_db_name was missing from the main playbook vars after removing the vars_files that was loading defaults/main.yml. This fixes the error: 'orchestrator_db_name' is undefined
…ests_subpath variables These variables are required for workflow deployment but were missing from the main playbook vars after removing vars_files. This fixes the error: 'workflow_repo' is undefined
The SonataFlowPlatform uses the 'Succeed' condition type, not 'Ready'. This was causing the wait task to time out unnecessarily.
Changes:
- Updated jsonpath to look for conditions[?(@.type=="Succeed")]
- Updated variable name from READY to SUCCEED for clarity
- Updated log messages to reflect the correct condition name
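A sketch of checking the Succeed condition with jsonpath; the platform and namespace names here are assumptions based on the surrounding commits:

```bash
# Read the status of the SonataFlowPlatform "Succeed" condition (assumed resource name).
SUCCEED=$(oc get sonataflowplatform sonataflow-platform -n orchestrator-infra \
  -o jsonpath='{.status.conditions[?(@.type=="Succeed")].status}')
if [ "${SUCCEED}" = "True" ]; then
  echo "SonataFlowPlatform reports Succeed=True"
fi
```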
…flows
Workflows were trying to connect to a hostname extracted from the secret but should use the hardcoded service name 'postgresql'. This fixes UnknownHostException: sonataflow-psql-postgresql.orchestrator-infra.
Changes:
- Set dynamic_psql_svc_name to 'postgresql' directly
- Removed the regex extraction that was producing an incorrect hostname
- Workflows will now connect to postgresql.orchestrator-infra correctly
The namespace deletion wasn't waiting long enough and wasn't cleaning up SonataFlow resources first, which can block namespace deletion.
Changes:
- Delete SonataFlow and SonataFlowPlatform resources before the namespace
- Improved wait loop with better feedback (shows progress every 5s)
- Added force cleanup for stuck resources after a 60s timeout
- Better logging to show deletion progress
This ensures clean reinstallation when running without the --no-clean flag.
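A sketch of the cleanup order, under the assumption that everything lives in orchestrator-infra; names and timings are illustrative:

```bash
NAMESPACE=orchestrator-infra   # assumed namespace

# Delete SonataFlow resources first so their finalizers cannot block namespace deletion.
oc delete sonataflow --all -n "${NAMESPACE}" --ignore-not-found
oc delete sonataflowplatform --all -n "${NAMESPACE}" --ignore-not-found

# Delete the namespace and poll with periodic feedback instead of one long blocking wait.
oc delete namespace "${NAMESPACE}" --ignore-not-found --wait=false
for i in $(seq 1 12); do
  oc get namespace "${NAMESPACE}" >/dev/null 2>&1 || { echo "Namespace deleted"; break; }
  echo "Still deleting ${NAMESPACE}... (${i}/12)"
  sleep 5
done
```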
…credentials
Workflows were failing with 'couldn't find key POSTGRES_USER in Secret' because the secret uses POSTGRESQL_USER and POSTGRESQL_PASSWORD, not POSTGRES_USER and POSTGRES_PASSWORD.
Changes:
- Updated dynamic_psql_user_key to 'POSTGRESQL_USER'
- Updated dynamic_psql_password_key to 'POSTGRESQL_PASSWORD'
- Fixed extraction shell command to read POSTGRESQL_USER from secret
- Added error handling for missing POSTGRES_HOST key
This fixes CreateContainerConfigError in workflow pods.
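For reference, a sketch of reading the credentials with the corrected keys; the secret name here is an assumption:

```bash
# Extract the PostgreSQL credentials using the keys the secret actually contains.
PSQL_USER=$(oc get secret sonataflow-psql-postgresql -n orchestrator-infra \
  -o jsonpath='{.data.POSTGRESQL_USER}' | base64 -d)
PSQL_PASSWORD=$(oc get secret sonataflow-psql-postgresql -n orchestrator-infra \
  -o jsonpath='{.data.POSTGRESQL_PASSWORD}' | base64 -d)
```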
…abase
The sonataflow user doesn't have permission to create databases. Need to use postgres superuser instead.
Changes:
- Updated CREATE DATABASE command to use 'postgres' user
- Added OWNER sonataflow to ensure correct ownership
- This ensures the database is created successfully on first run
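A sketch of the superuser-owned database creation; the deployment, database, and user names are assumptions:

```bash
# Create the orchestrator database as the postgres superuser, owned by the sonataflow user.
oc exec -n orchestrator-infra deploy/postgresql -- \
  psql -U postgres -c 'CREATE DATABASE sonataflow OWNER sonataflow;'
```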
Allow users to override workflow images via variables. Defaults to 'latest' tag but can be changed to specific versions if image pull issues occur.
Changes:
- Added user_onboarding_image variable (defaults to latest)
- Added image patch task for user-onboarding workflow
- Image can be overridden at runtime: -e user_onboarding_image=<image:tag>
- Automatically restarts pods when image is changed
Example usage:
./deploy-orchestrator.sh
# Or with specific tag:
ansible-playbook ... -e user_onboarding_image=quay.io/orchestrator/demo-user-onboarding:656e56bd
…NSES files Deleted outdated CHANGELOG.md and REVIEW_RESPONSES.md files to streamline the repository and eliminate unnecessary clutter. Updated deploy.yml to fix whitespace issues for better readability.
…meout
The wait tasks were hanging indefinitely when pods didn't exist or were in error states like ImagePullBackOff.
Changes:
- Added loop to check pod existence before waiting (24 iterations x 5s = 2min)
- Check pod count first, then wait for readiness with short timeout
- Always exit 0 (success) after timeout to allow deployment to continue
- Better logging to show progress during wait
- Prevents Ansible from hanging on workflow pod issues
This ensures the deployment continues even if workflow pods have temporary issues, allowing for manual troubleshooting without blocking the entire deploy.
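A bounded-wait sketch matching the description above; the label, namespace, and iteration counts are assumptions:

```bash
# Poll up to 24 times (about 2 minutes) instead of blocking indefinitely on missing pods.
for _ in $(seq 1 24); do
  COUNT=$(oc get pods -l app=user-onboarding -n orchestrator-infra --no-headers 2>/dev/null | wc -l)
  if [ "${COUNT}" -gt 0 ]; then
    # Pods exist: give readiness a short window and stop looping once they are Ready.
    oc wait pod -l app=user-onboarding -n orchestrator-infra \
      --for=condition=Ready --timeout=10s && break
  fi
  echo "Waiting for workflow pods..."
  sleep 5
done
exit 0   # never fail here, so the deployment can continue and be troubleshot manually
```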
The script was showing incorrect port (8080) for Data Index Service. The actual service port is 80, not 8080.
Changes:
- Updated help text to show correct URL with /graphql endpoint
- Updated final success message with correct port (80)
- Added alternative full URL format for clarity
- Removed incorrect :8080 port reference
Correct URLs:
- Short: http://sonataflow-platform-data-index-service.orchestrator-infra/graphql
- Full: http://sonataflow-platform-data-index-service.orchestrator-infra.svc.cluster.local:80/graphql
This fixes RHDH connection errors to Data Index Service.
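As a quick sanity check of the corrected endpoint, something like the following could be run against the cluster; the throwaway curl image is an assumption:

```bash
# Hit the Data Index GraphQL endpoint on port 80 from inside the cluster and print the HTTP status.
oc run data-index-check --rm -i --restart=Never --image=curlimages/curl -- \
  curl -s -o /dev/null -w '%{http_code}\n' \
  http://sonataflow-platform-data-index-service.orchestrator-infra/graphql
```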
…port
Updated documentation to show correct Data Index Service URL. The service uses port 80, not 8080, and requires /graphql endpoint.
Changes:
- Updated example configuration with correct URL
- Added both short and full URL formats
- Removed incorrect :8080 port reference
This matches the actual service configuration and prevents connection errors.
…ntegrated Signed-off-by: Gustavo Lira <[email protected]>
b5b8b4c to f25f7ad
@gustavolira: The following test failed, say
This PR is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 21 days.
Description
Introduces a new automated script to provision the RHDH Orchestrator.
A dedicated shell script that sets up the Orchestrator backend with default values and a single command.
Configuration adjustments to streamline deployment and reduce manual steps.
Extended logging and validation to ensure correct integration of the Orchestrator module.
Which issue(s) does this PR fix?
https://issues.redhat.com/browse/RHIDP-9016
PR acceptance criteria
Please make sure that the following steps are complete:
How to test changes / Special notes to the reviewer