Drop rabbitmq-cluster-operator dependency and manage RabbitMQ directly #551
Conversation
Force-pushed 0e13654 to 472692b
retest
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ef64cbd98adf4d5f850016ae4677b0bc ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 26m 35s
Force-pushed 472692b to 39526df
This change depends on a change that failed to merge. Change openstack-k8s-operators/install_yamls#1140 is needed.
retest
/test infra-operator-build-deploy-kuttl
/test infra-operator-build-deploy-kuttl
/test infra-operator-build-deploy-kuttl
Force-pushed 5dc2a5d to 2f0db6d
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ef446e528f6942b69d23863d3272635c ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 33m 50s
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/cbc313e663c54e5a843813e91c0915a9 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 35m 11s
/test infra-operator-build-deploy-kuttl
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/15bd33ab5841446c89f03c64a290ccd9 ❌ openstack-k8s-operators-content-provider FAILURE in 9m 28s
recheck
/test infra-operator-build-deploy-kuttl
/test functional
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2b49c72f6bb5489d89ca612b6070a481 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 26m 51s
recheck
/test infra-operator-build-deploy-kuttl
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Force-pushed aec7490 to a56e071
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
recheck
Merge Failed. This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
recheck
This change depends on a change that failed to merge. Change openstack-k8s-operators/openstack-operator#1857 is needed.
recheck
Force-pushed a56e071 to 2937300
This change depends on a change that failed to merge. Change openstack-k8s-operators/openstack-operator#1857 is needed.
recheck
/test functional
Remove the dependency on the external rabbitmq-cluster-operator and have the infra-operator manage RabbitMQ StatefulSets, Services, ConfigMaps, and Secrets directly.

Core controller changes:
- Direct StatefulSet management with volume mounts, config generation, and TLS support (client and inter-node)
- Service creation for client (AMQP/AMQPS) and headless node discovery
- ConfigMap generation for server config, plugins, and config-data
- Secret management for default-user credentials and Erlang cookie
- PodDisruptionBudget for multi-replica deployments
- Fix stale ownerReferences in volumeClaimTemplates from adopted StatefulSets (orphan-delete + recreate with annotation-based storage class preservation)
- Label pods with skipPreStopChecks before StatefulSet deletion so the Downward API volume is populated when cascade deletion triggers the PreStop hook, preventing 7-day termination hangs
- Set ObservedGeneration at the end of reconciliation so consumers only see it after the spec has been fully processed
- Add nil-safe DefaultUser checks in checkClusterReadiness and all reconcile-delete paths to prevent nil dereference during startup
- Handle the error from helper.NewHelper in sub-resource controllers

Version upgrade workflow (3.x to 4.x):
- State machine with phases: None -> DeletingResources -> WaitingForCluster -> None
- Detect targetVersion changes and trigger a storage wipe when crossing major versions (required by RabbitMQ for 3.x -> 4.x upgrades)
- Set wipeReason=VersionUpgrade in status to track upgrade progress
- Delete the StatefulSet to stop all pods atomically, then recreate with a wipe-data init container that clears /var/lib/rabbitmq on the existing PVs (marker files prevent re-wipes across pod restarts)
- Track currentVersion in status after a successful upgrade
- Reject version downgrades in the validation webhook
- Reject scale-down in the validation webhook (both RabbitMq and SpecCore)

Queue type migration (Mirrored to Quorum):
- Support migrating from classic mirrored (ha-all policy) queues to quorum queues via a spec.queueType change
- Trigger a storage wipe with wipeReason=QueueTypeMigration
- Manage the ha-all policy lifecycle: apply for Mirrored (replicas > 1), remove when transitioning away from Mirrored
- Force queueType from Mirrored to Quorum in the defaulting webhook when targetVersion is 4.x+, since mirrored queues are not supported in RabbitMQ 4.x. This enables the openstack-operator to upgrade from 3.x (Mirrored) to 4.x and have the migration handled automatically
- Reject Mirrored+4.x in the validation webhook as a safety net after defaulting

AMQP proxy sidecar:
- Python-based TCP proxy injected as a sidecar container when status.proxyRequired is true (after a version upgrade or queue migration)
- Rewrite AMQP queue.declare frames to force durable=True and x-queue-type=quorum, and exchange.declare frames to force durable=True
- Listen on port 5672 (plain) or 5671 (TLS) depending on TLS config
- Forward connections to the RabbitMQ backend on port 5673
- Remove via the clients-reconfigured annotation once consumers reconnect
- RabbitMQProxyActive condition provides visibility into proxy state with an actionable message explaining how to clear it
- Include liveness/readiness probes and TLS certificate mounting

Migration from rabbitmq-cluster-operator:
- Detect migration by checking for an existing RabbitmqCluster CR with the same name; if none is found (or the CRD is not installed), the controller skips all migration logic and sets OldCRCleaned=True, allowing both operators to run side by side managing their own independent resources without conflict
- Reparent existing StatefulSets, Services, and Secrets from the old RabbitmqCluster owner to the new RabbitMq CR
- Strip old ownerReferences from PVCs before deleting the old CR to prevent cascade garbage collection
- Clean up the old RabbitmqCluster CR after successful adoption
- Fix stale volumeClaimTemplate ownerReferences that cause new PVCs to be garbage-collected when scaling up adopted StatefulSets
- Migrate deprecated fields (persistence, rabbitmq, override.service) to new fields in the webhook; remove the CRD-level default from storage to avoid double-defaulting during migration

Transport URL:
- URL-encode the username and password in the transport URL to handle special characters in user-provided credentials

Testing:
- Functional tests (envtest): RabbitMQ controller reconciliation, RabbitMQPolicy lifecycle, TransportURL (plain, TLS, custom user/vhost, credential rotation), VCT fix, combined version+queue migration, proxy sidecar, operator coexistence (OldCRCleaned set when no old CR exists, unrelated RabbitmqCluster CRs not touched), webhook migration (persistence, config, override.service field migration with defaults)
- Transport URL secret unit tests for password URL-encoding with special characters
- Kuttl integration tests: basic cluster deployment, cluster resource ownership, credential rotation with cleanup-blocked finalizer, deletion with dependent resources, plugin enable/disable, policy enforcement via rabbitmqctl, queue migration (Mirrored to Quorum) with AMQP proxy rewrite verification (classic non-durable -> quorum durable), resource management (vhost/user/policy), scale-up with PDB, TLS configuration, TLS TransportURL, custom TransportURL, migration from old operator, version upgrades (3.9->4.2 with/without TLS, Mirrored upgrade), operator coexistence (both operators managing independent clusters without interference)
- Kuttl tests use quay.io/openstack-k8s-operators/rabbitmq:3.9 for 3.x and quay.io/podified-antelope-centos9/openstack-rabbitmq:current-podified for 4.x

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
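The durable-bit rewrite the proxy applies to queue.declare frames can be sketched against the AMQP 0-9-1 wire format. This is a minimal illustration, not the PR's actual proxy code: the function name `force_durable` and the pass-through behavior are assumptions, and the real sidecar additionally rewrites the arguments field table (to inject x-queue-type=quorum) and fixes up the frame size, which is omitted here.

```python
# Sketch only: rewrite an AMQP 0-9-1 queue.declare method payload so the
# durable flag is always set. Payload layout (per the 0-9-1 spec):
#   class-id (uint16 = 50) | method-id (uint16 = 10) | reserved-1 (uint16)
#   | queue name (shortstr: 1-byte length + bytes) | flags octet | arguments
# The durable flag is bit 1 (0x02) of the flags octet.
import struct

QUEUE_DECLARE = (50, 10)

def force_durable(payload: bytes) -> bytes:
    class_id, method_id = struct.unpack_from(">HH", payload, 0)
    if (class_id, method_id) != QUEUE_DECLARE:
        return payload  # not a queue.declare; pass through untouched
    name_len = payload[6]              # shortstr length after 2-byte reserved field
    flags_off = 7 + name_len           # flags octet follows the queue name
    flags = payload[flags_off] | 0x02  # OR in the durable bit
    return payload[:flags_off] + bytes([flags]) + payload[flags_off + 1:]
```

Because only one bit in an existing octet changes, this particular rewrite leaves the frame length intact; forcing x-queue-type=quorum does not, which is why the real proxy must also patch the enclosing frame header.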
Force-pushed 2937300 to e4e3813
This change depends on a change that failed to merge. Change openstack-k8s-operators/openstack-operator#1857 is needed.
/test infra-operator-build-deploy-kuttl
recheck
/test infra-operator-build-deploy-kuttl
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: lmiccini, stuggi.
Merged f57b3a5 into openstack-k8s-operators:main
FTR, this seems to have broken the telemetry-operator, which uses the rabbitmqv1.RabbitmqCluster CRs created under the hood: https://github.com/openstack-k8s-operators/telemetry-operator/blob/2e5cb810e29db97a9036d6710bf8e0cc39f983e5/internal/controller/metricstorage_controller.go#L769C17-L769C43 The deployment CI job is not enabling MetricStorage in telemetry: https://logserver.rdoproject.org/859/rdoproject.org/859776a774cb412a9db7c3ff47a24aaf/controller/ci-framework-data/logs/openstack-must-gather/quay-io-openstack-k8s-operators-openstack-must-gather-sha256-58cb6caac54870493f224cc37a6252a09ea9ccbbda2851efaaba72cc8a6f1339/namespaces/openstack/crs/ It may be good to add a job deploying additional services in infra-operator?
Switch RabbitMQ scrape config discovery from rabbitmq.com/RabbitmqCluster (rabbitmq-cluster-operator) to rabbitmq.openstack.org/RabbitMq (infra-operator), following the removal of the rabbitmq-cluster-operator from infra-operator in openstack-k8s-operators/infra-operator#551.
- Replace rabbitmq/cluster-operator imports with infra-operator rabbitmq/v1beta1
- Update RBAC from rabbitmq.com/rabbitmqclusters to rabbitmq.openstack.org/rabbitmqs
- Remove rabbitmqclusterv1 scheme registration
- Bump infra-operator, keystone-operator, and heat-operator
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The rabbitmq-cluster-operator has been removed from infra-operator (openstack-k8s-operators/infra-operator#551). RabbitMQ is now managed natively by infra-operator via the rabbitmq.openstack.org/RabbitMq CRD.
Update all rabbitmq targets to use infra-operator instead of the standalone rabbitmq-cluster-operator:
- Use KIND=RabbitMq and the infra-operator sample CR
- Remove rabbitmq operator install/cleanup targets (infra handles it)
- Remove rabbitmq-cluster special-casing from helper scripts
- Remove the rabbitmq operator dep from kuttl_common_prep (already installed via kuttl_db_prep -> infra)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The rabbitmq-cluster-operator has been removed from infra-operator (openstack-k8s-operators/infra-operator#551). RabbitMQ is now managed natively by infra-operator via the rabbitmq.openstack.org/RabbitMq CRD.
- Replace rabbitmq_deploy with an inline RabbitMq CR applied directly via oc apply, instead of cloning repos and running kustomize
- Remove rabbitmq operator install/cleanup targets (infra handles it)
- Remove rabbitmq-cluster special-casing from helper scripts
- Remove the rabbitmq operator dep from kuttl_common_prep (already installed via kuttl_db_prep -> infra)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
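As a rough sketch, the inline CR applied via oc apply might look like the following. The apiVersion and kind come from the commit messages above (rabbitmq.openstack.org/RabbitMq, v1beta1); the metadata values and the replicas field are illustrative assumptions, not the actual sample CR shipped with infra-operator.

```yaml
# Hypothetical minimal RabbitMq CR; field values are illustrative
apiVersion: rabbitmq.openstack.org/v1beta1
kind: RabbitMq
metadata:
  name: rabbitmq
  namespace: openstack
spec:
  replicas: 1
```

Applying it inline (e.g. from a heredoc piped to `oc apply -f -`) avoids cloning the operator repo and running kustomize just to deploy a single resource.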
Remove the dependency on the external rabbitmq-cluster-operator and have the infra-operator manage RabbitMQ StatefulSets, Services, ConfigMaps, and Secrets directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Depends-on: openstack-k8s-operators/openstack-operator#1857
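The transport-URL credential encoding described in the PR can be illustrated with Python's standard library. The operator itself is Go, so this is only the equivalent behavior; the function name, the rabbit:// scheme, and the host values are illustrative assumptions. The key detail is passing safe="" to quote() so that '/' is escaped too, which it would not be by default.

```python
from urllib.parse import quote

def transport_url(user: str, password: str, host: str, port: int, vhost: str) -> str:
    # Percent-encode user-provided credentials so characters such as
    # '@', ':' and '/' cannot break the URL structure. safe="" makes
    # quote() escape '/' as well as the other reserved characters.
    u = quote(user, safe="")
    p = quote(password, safe="")
    v = quote(vhost, safe="")
    return f"rabbit://{u}:{p}@{host}:{port}/{v}"
```

For example, a password containing '@' or ':' would otherwise be parsed as part of the host section and break every consumer of the generated secret.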