
Drop rabbitmq-cluster-operator dependency and manage RabbitMQ directly #551

Merged
openshift-merge-bot[bot] merged 1 commit into openstack-k8s-operators:main from
lmiccini:drop_clusterop_squashed
Apr 7, 2026

@lmiccini (Contributor) commented Mar 18, 2026

Remove the dependency on the external rabbitmq-cluster-operator and have
the infra-operator manage RabbitMQ StatefulSets, Services, ConfigMaps,
and Secrets directly.

Core controller changes:

  • Direct StatefulSet management with proper volume mounts, config
    generation, and TLS support (both client and inter-node)
  • Service creation for client (AMQP/AMQPS) and headless node discovery
  • ConfigMap generation for server config, plugins, and config-data
  • Secret management for default-user credentials and Erlang cookie
  • PodDisruptionBudget for multi-replica deployments
  • Fix stale ownerReferences in volumeClaimTemplates from adopted
    StatefulSets (orphan-delete + recreate with annotation-based storage
    class preservation)
  • Label pods with skipPreStopChecks before StatefulSet deletion so the
    Downward API volume is populated when cascade deletion triggers the
    PreStop hook, preventing 7-day termination hangs
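The skipPreStopChecks ordering matters because a Downward API volume exposes pod labels as a file that the PreStop hook reads at termination time. Below is a minimal Python sketch of the hook-side decision. It assumes the standard Downward API labels format (one `key="value"` line per label) and a label value of `"true"`. The function names are hypothetical, and this is not the operator's actual hook script.

```python
def parse_downward_labels(text: str) -> dict:
    """Parse a Downward API labels file: one key="value" entry per line."""
    labels = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, raw = line.partition("=")
        value = raw.strip()
        # Values are written double-quoted; escape sequences are not handled here.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        labels[key.strip()] = value
    return labels


def prestop_action(labels_text: str) -> str:
    """Decide whether a (hypothetical) PreStop hook should run its checks."""
    labels = parse_downward_labels(labels_text)
    # Assumed label value: the PR only names the skipPreStopChecks label.
    if labels.get("skipPreStopChecks") == "true":
        return "skip"  # return at once so deletion is not held up
    return "run-checks"
```

If the label is missing from the file when the hook runs, the hook falls through to "run-checks" and can block termination. That is the hang the change avoids by labeling the pods before the StatefulSet is deleted.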

Version upgrade workflow (3.x to 4.x):

  • State machine with phases: None -> DeletingResources -> WaitingForCluster -> None
  • Detects targetVersion changes and triggers storage wipe when crossing
    major versions (required by RabbitMQ for 3.x -> 4.x upgrades)
  • Sets wipeReason=VersionUpgrade in status to track upgrade progress
  • Deletes StatefulSet to stop all pods atomically, then recreates it
    with a wipe-data init container that clears /var/lib/rabbitmq on the
    existing PVs (marker files prevent re-wipes across pod restarts)
  • Tracks currentVersion in status after successful upgrade
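The upgrade phases can be modelled as a small state machine that advances once per reconcile. This is a hedged Python sketch, not the controller's Go code. The `UpgradeStatus` fields, the `step` signature, and clearing `wipe_reason` on completion are assumptions. Only the phase names, `wipeReason=VersionUpgrade`, and the major-version rule come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpgradeStatus:
    """Hypothetical subset of the RabbitMq status used by the upgrade flow."""
    phase: str = "None"
    current_version: Optional[str] = None
    wipe_reason: Optional[str] = None


def major(version: str) -> int:
    return int(version.split(".")[0])


def needs_wipe(current: Optional[str], target: str) -> bool:
    # A fresh cluster (no currentVersion yet) never needs a wipe.
    return current is not None and major(current) != major(target)


def step(st: UpgradeStatus, target: str, sts_exists: bool, cluster_ready: bool) -> UpgradeStatus:
    """Advance the upgrade state machine by one reconcile (mutates st)."""
    if st.phase == "None":
        if needs_wipe(st.current_version, target):
            st.phase = "DeletingResources"
            st.wipe_reason = "VersionUpgrade"
        elif cluster_ready:
            st.current_version = target
    elif st.phase == "DeletingResources":
        # The StatefulSet is deleted so every pod stops at once; wait until it is gone.
        if not sts_exists:
            st.phase = "WaitingForCluster"
    elif st.phase == "WaitingForCluster":
        # The recreated StatefulSet's wipe-data init container clears the data
        # directory; the upgrade completes once the new cluster is ready.
        if cluster_ready:
            st.phase = "None"
            st.current_version = target
            st.wipe_reason = None
    return st
```

Because only the major component is compared, a 3.9 to 3.13 bump updates currentVersion without a wipe. A 3.9 to 4.2 change goes through DeletingResources and WaitingForCluster.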

Queue type migration (Mirrored to Quorum):

  • Supports migrating from classic mirrored (ha-all policy) queues to
    quorum queues via spec.queueType change
  • Triggers storage wipe with wipeReason=QueueTypeMigration
  • Manages ha-all policy lifecycle: applies for Mirrored (replicas > 1),
    removes when transitioning away from Mirrored
  • Defaulting webhook forces queueType from Mirrored to Quorum when
    targetVersion is 4.x+, since mirrored queues are not supported in
    RabbitMQ 4.x. This enables the openstack-operator to upgrade from
    3.x (Mirrored) to 4.x and have the migration handled automatically
  • Validation webhook rejects Mirrored+4.x as a safety net after
    defaulting
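Below is a sketch of the defaulting and validation rules, plus the ha-all decision, written as plain Python functions over a spec dict. The `queueType` and `targetVersion` keys mirror the field names above. The function names, the error text, and returning "remove" for a single-replica Mirrored cluster are assumptions.

```python
def major(version: str) -> int:
    return int(version.split(".")[0])


def default_queue_type(spec: dict) -> dict:
    """Defaulting webhook: mirrored queues do not exist in 4.x, so use Quorum."""
    if spec.get("queueType") == "Mirrored" and major(spec["targetVersion"]) >= 4:
        spec["queueType"] = "Quorum"
    return spec


def validate_queue_type(spec: dict) -> list:
    """Validation webhook: safety net in case defaulting did not run."""
    if spec.get("queueType") == "Mirrored" and major(spec["targetVersion"]) >= 4:
        return ["queueType Mirrored is not supported with RabbitMQ 4.x"]
    return []


def ha_all_policy_action(queue_type: str, replicas: int) -> str:
    """Desired state of the ha-all policy for the given spec."""
    if queue_type == "Mirrored" and replicas > 1:
        return "apply"
    # Deleting a policy that does not exist is treated as a no-op.
    return "remove"
```

Defaulting runs before validation. That ordering is what lets a 3.x Mirrored spec bumped to 4.x be accepted: the defaulter rewrites it to Quorum, so the validator's rejection only fires if defaulting was skipped.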

AMQP proxy sidecar:

  • Python-based TCP proxy injected as a sidecar container when
    status.proxyRequired is true (after version upgrade or queue migration)
  • Rewrites AMQP queue.declare frames to force durable=True and
    x-queue-type=quorum, and exchange.declare frames to force durable=True
  • Listens on port 5672 (plain) or 5671 (TLS) depending on TLS config
  • Forwards connections to RabbitMQ backend on port 5673
  • Removed via clients-reconfigured annotation once consumers reconnect
  • Includes liveness/readiness probes and TLS certificate mounting

Migration from rabbitmq-cluster-operator:

  • Detects migration by checking for an existing RabbitmqCluster CR with
    the same name; if none is found (or the CRD is not installed), the
    controller skips all migration logic and sets OldCRCleaned=True,
    allowing both operators to run side-by-side managing their own
    independent resources without conflict
  • Adoption logic reparents existing StatefulSets, Services, and Secrets
    from old RabbitmqCluster owner to new RabbitMq CR
  • Strips old ownerReferences from PVCs before deleting old CR to prevent
    cascade garbage collection
  • Cleans up old RabbitmqCluster CR after successful adoption
  • Fixes stale volumeClaimTemplate ownerReferences that cause new PVCs
    to be garbage-collected when scaling up adopted StatefulSets

Testing:

  • Functional tests (envtest): RabbitMQ controller reconciliation,
    RabbitMQPolicy lifecycle, TransportURL (plain, TLS, custom user/vhost,
    credential rotation), VCT fix, combined version+queue migration,
    proxy sidecar, operator coexistence (OldCRCleaned set when no old CR
    exists, unrelated RabbitmqCluster CRs not touched)
  • Kuttl integration tests: basic cluster deployment, cluster resource
    ownership, credential rotation with cleanup-blocked finalizer,
    deletion with dependent resources, plugin enable/disable, policy
    enforcement via rabbitmqctl, queue migration (Mirrored to Quorum)
    with AMQP proxy rewrite verification (classic non-durable -> quorum
    durable), resource management (vhost/user/policy), scale-up with PDB,
    TLS configuration, TLS TransportURL, custom TransportURL, migration
    from old operator, version upgrades (3.9->4.2 with/without TLS,
    Mirrored upgrade), operator coexistence (both operators managing
    independent clusters without interference)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Depends-on: openstack-k8s-operators/openstack-operator#1857

@lmiccini (Contributor, Author)

retest

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ef64cbd98adf4d5f850016ae4677b0bc

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 26m 35s
podified-multinode-edpm-deployment-crc FAILURE in 1h 05m 04s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 11m 24s

@softwarefactory-project-zuul

This change depends on a change that failed to merge.

Change openstack-k8s-operators/install_yamls#1140 is needed.

@lmiccini (Contributor, Author)

retest

@lmiccini (Contributor, Author)

/test infra-operator-build-deploy-kuttl

@lmiccini (Contributor, Author)

/test infra-operator-build-deploy-kuttl

@lmiccini (Contributor, Author)

/test infra-operator-build-deploy-kuttl

lmiccini force-pushed the drop_clusterop_squashed branch 3 times, most recently from 5dc2a5d to 2f0db6d (March 19, 2026 15:39)
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ef446e528f6942b69d23863d3272635c

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 33m 50s
podified-multinode-edpm-deployment-crc FAILURE in 1h 05m 31s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 16m 15s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/cbc313e663c54e5a843813e91c0915a9

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 35m 11s
podified-multinode-edpm-deployment-crc FAILURE in 1h 04m 02s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 19m 12s

@lmiccini (Contributor, Author)

/test infra-operator-build-deploy-kuttl

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/15bd33ab5841446c89f03c64a290ccd9

openstack-k8s-operators-content-provider FAILURE in 9m 28s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider

@lmiccini (Contributor, Author)

recheck

@lmiccini (Contributor, Author)

/test infra-operator-build-deploy-kuttl

@lmiccini (Contributor, Author)

/test functional

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2b49c72f6bb5489d89ca612b6070a481

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 26m 51s
podified-multinode-edpm-deployment-crc FAILURE in 1h 06m 09s
cifmw-crc-podified-edpm-baremetal FAILURE in 1h 09m 44s

@lmiccini (Contributor, Author)

recheck

@lmiccini (Contributor, Author)

/test infra-operator-build-deploy-kuttl

@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 1857,08c8ff59ad3e5f3b8525c613b0bc177cb629ef5a

lmiccini force-pushed the drop_clusterop_squashed branch from aec7490 to a56e071 (April 2, 2026 08:15)
@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 1857,bb09200101795a5081cd8fb16938d211156f1aab

@lmiccini (Contributor, Author) commented Apr 2, 2026

recheck

@softwarefactory-project-zuul

Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/openstack-operator for 1857,8c40eed7ff4f09d918678c548c661e8273d14e85

@lmiccini (Contributor, Author) commented Apr 2, 2026

recheck

@softwarefactory-project-zuul

This change depends on a change that failed to merge.

Change openstack-k8s-operators/openstack-operator#1857 is needed.

@lmiccini (Contributor, Author) commented Apr 2, 2026

recheck

lmiccini force-pushed the drop_clusterop_squashed branch from a56e071 to 2937300 (April 2, 2026 09:57)
@softwarefactory-project-zuul

This change depends on a change that failed to merge.

Change openstack-k8s-operators/openstack-operator#1857 is needed.

@lmiccini (Contributor, Author) commented Apr 2, 2026

recheck

@lmiccini (Contributor, Author) commented Apr 2, 2026

/test functional

Remove the dependency on the external rabbitmq-cluster-operator and have
the infra-operator manage RabbitMQ StatefulSets, Services, ConfigMaps,
and Secrets directly.

Core controller changes:
- Direct StatefulSet management with volume mounts, config generation,
  and TLS support (client and inter-node)
- Service creation for client (AMQP/AMQPS) and headless node discovery
- ConfigMap generation for server config, plugins, and config-data
- Secret management for default-user credentials and Erlang cookie
- PodDisruptionBudget for multi-replica deployments
- Fix stale ownerReferences in volumeClaimTemplates from adopted
  StatefulSets (orphan-delete + recreate with annotation-based storage
  class preservation)
- Label pods with skipPreStopChecks before StatefulSet deletion so the
  Downward API volume is populated when cascade deletion triggers the
  PreStop hook, preventing 7-day termination hangs
- Set ObservedGeneration at end of reconciliation so consumers only see
  it after the spec has been fully processed
- Add nil-safe DefaultUser checks in checkClusterReadiness and all
  reconcile-delete paths to prevent nil dereference during startup
- Handle error from helper.NewHelper in sub-resource controllers

Version upgrade workflow (3.x to 4.x):
- State machine with phases: None -> DeletingResources -> WaitingForCluster -> None
- Detect targetVersion changes and trigger storage wipe when crossing
  major versions (required by RabbitMQ for 3.x -> 4.x upgrades)
- Set wipeReason=VersionUpgrade in status to track upgrade progress
- Delete StatefulSet to stop all pods atomically, then recreate with
  a wipe-data init container that clears /var/lib/rabbitmq on the
  existing PVs (marker files prevent re-wipes across pod restarts)
- Track currentVersion in status after successful upgrade
- Reject version downgrades in validation webhook
- Reject scale-down in validation webhook (both RabbitMq and SpecCore)
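The two new validation rules can be sketched as a comparison between the old and new spec on update. This sketch assumes the webhook compares `targetVersion` and `replicas` of the old and new objects; it could equally compare against status.currentVersion. The function names and error messages are illustrative.

```python
def parse_version(v: str) -> tuple:
    return tuple(int(part) for part in v.split("."))


def validate_update(old: dict, new: dict) -> list:
    """Return validation errors for an update from old spec to new spec."""
    errors = []
    old_v, new_v = old.get("targetVersion"), new.get("targetVersion")
    if old_v and new_v and parse_version(new_v) < parse_version(old_v):
        errors.append(f"version downgrade from {old_v} to {new_v} is not allowed")
    # Assumed default of 1 replica when the field is unset.
    if new.get("replicas", 1) < old.get("replicas", 1):
        errors.append("scaling down replicas is not allowed")
    return errors
```

Versions are compared as integer tuples so that 3.13 sorts after 3.9. A plain string comparison would get this wrong.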

Queue type migration (Mirrored to Quorum):
- Support migrating from classic mirrored (ha-all policy) queues to
  quorum queues via spec.queueType change
- Trigger storage wipe with wipeReason=QueueTypeMigration
- Manage ha-all policy lifecycle: apply for Mirrored (replicas > 1),
  remove when transitioning away from Mirrored
- Force queueType from Mirrored to Quorum in defaulting webhook when
  targetVersion is 4.x+, since mirrored queues are not supported in
  RabbitMQ 4.x. This enables the openstack-operator to upgrade from
  3.x (Mirrored) to 4.x and have the migration handled automatically
- Reject Mirrored+4.x in validation webhook as a safety net after
  defaulting

AMQP proxy sidecar:
- Python-based TCP proxy injected as a sidecar container when
  status.proxyRequired is true (after version upgrade or queue migration)
- Rewrite AMQP queue.declare frames to force durable=True and
  x-queue-type=quorum, and exchange.declare frames to force durable=True
- Listen on port 5672 (plain) or 5671 (TLS) depending on TLS config
- Forward connections to RabbitMQ backend on port 5673
- Remove via clients-reconfigured annotation once consumers reconnect
- RabbitMQProxyActive condition provides visibility into proxy state
  with actionable message explaining how to clear it
- Include liveness/readiness probes and TLS certificate mounting
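The proxy's on/off decision and port selection can be summarised in a short Python sketch, written under some assumptions. The annotation key is written as a bare `clients-reconfigured`; the real key, and whether it carries a prefix or requires a specific value, is not given here. The condition's message text is illustrative, and the helper names are hypothetical.

```python
from dataclasses import dataclass

PLAIN_PORT, TLS_PORT, BACKEND_PORT = 5672, 5671, 5673
# Hypothetical key: the real annotation name/prefix is not stated in the PR.
RECONFIGURED_ANNOTATION = "clients-reconfigured"


@dataclass
class ProxyPlan:
    inject: bool
    listen_port: int = 0
    backend_port: int = 0


def plan_proxy(proxy_required: bool, annotations: dict, tls_enabled: bool) -> ProxyPlan:
    """Decide whether to inject the AMQP proxy sidecar and which ports it uses."""
    if not proxy_required or RECONFIGURED_ANNOTATION in annotations:
        return ProxyPlan(inject=False)
    # Assumed layout: the proxy holds the client-facing port and forwards to 5673.
    listen = TLS_PORT if tls_enabled else PLAIN_PORT
    return ProxyPlan(inject=True, listen_port=listen, backend_port=BACKEND_PORT)


def proxy_condition(plan: ProxyPlan) -> dict:
    """Illustrative RabbitMQProxyActive condition; the message text is made up."""
    if plan.inject:
        return {
            "type": "RabbitMQProxyActive",
            "status": "True",
            "message": "AMQP proxy active; set the clients-reconfigured "
                       "annotation after all clients have reconnected",
        }
    return {"type": "RabbitMQProxyActive", "status": "False"}
```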

Migration from rabbitmq-cluster-operator:
- Detect migration by checking for an existing RabbitmqCluster CR with
  the same name; if none is found (or the CRD is not installed), the
  controller skips all migration logic and sets OldCRCleaned=True,
  allowing both operators to run side-by-side managing their own
  independent resources without conflict
- Reparent existing StatefulSets, Services, and Secrets from old
  RabbitmqCluster owner to new RabbitMq CR
- Strip old ownerReferences from PVCs before deleting old CR to prevent
  cascade garbage collection
- Clean up old RabbitmqCluster CR after successful adoption
- Fix stale volumeClaimTemplate ownerReferences that cause new PVCs
  to be garbage-collected when scaling up adopted StatefulSets
- Migrate deprecated fields (persistence, rabbitmq, override.service)
  to new fields in webhook; remove CRD-level default from storage to
  avoid double-defaulting during migration

Transport URL:
- URL-encode username and password in transport URL to handle special
  characters in user-provided credentials
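Encoding matters because characters such as `@`, `:` and `/` are delimiters in a URL's authority section. A raw password containing them produces a URL that parses to the wrong user, host, or path. Below is a minimal sketch using Python's `urllib.parse.quote` with `safe=""`. The `rabbit://user:pass@host:port/vhost` layout follows the oslo.messaging transport URL convention. The function name and defaults are assumptions, and TLS query parameters are omitted.

```python
from urllib.parse import quote


def transport_url(user: str, password: str, hosts: list, port: int = 5672, vhost: str = "") -> str:
    """Build a rabbit:// URL with percent-encoded credentials (one entry per host)."""
    u = quote(user, safe="")
    p = quote(password, safe="")
    netlocs = ",".join(f"{u}:{p}@{h}:{port}" for h in hosts)
    return f"rabbit://{netlocs}/{vhost}"
```

For example, the password `p@ss:w/rd` is emitted as `p%40ss%3Aw%2Frd`, so the only literal `@` left in a single-host URL is the one separating credentials from the host.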

Testing:
- Functional tests (envtest): RabbitMQ controller reconciliation,
  RabbitMQPolicy lifecycle, TransportURL (plain, TLS, custom user/vhost,
  credential rotation), VCT fix, combined version+queue migration,
  proxy sidecar, operator coexistence (OldCRCleaned set when no old CR
  exists, unrelated RabbitmqCluster CRs not touched), webhook migration
  (persistence, config, override.service field migration with defaults)
- Transport URL secret unit tests for password URL-encoding with special
  characters
- Kuttl integration tests: basic cluster deployment, cluster resource
  ownership, credential rotation with cleanup-blocked finalizer,
  deletion with dependent resources, plugin enable/disable, policy
  enforcement via rabbitmqctl, queue migration (Mirrored to Quorum)
  with AMQP proxy rewrite verification (classic non-durable -> quorum
  durable), resource management (vhost/user/policy), scale-up with PDB,
  TLS configuration, TLS TransportURL, custom TransportURL, migration
  from old operator, version upgrades (3.9->4.2 with/without TLS,
  Mirrored upgrade), operator coexistence (both operators managing
  independent clusters without interference)
- Kuttl tests use quay.io/openstack-k8s-operators/rabbitmq:3.9 for 3.x
  and quay.io/podified-antelope-centos9/openstack-rabbitmq:current-podified
  for 4.x

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lmiccini force-pushed the drop_clusterop_squashed branch from 2937300 to e4e3813 (April 2, 2026 12:58)
@softwarefactory-project-zuul

This change depends on a change that failed to merge.

Change openstack-k8s-operators/openstack-operator#1857 is needed.

@lmiccini (Contributor, Author) commented Apr 2, 2026

/test infra-operator-build-deploy-kuttl

@lmiccini (Contributor, Author) commented Apr 2, 2026

recheck

@lmiccini (Contributor, Author) commented Apr 2, 2026

/test infra-operator-build-deploy-kuttl

@dciabrin (Contributor) commented Apr 7, 2026

/lgtm

@openshift-ci bot (Contributor) commented Apr 7, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lmiccini, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-merge-bot[bot] merged commit f57b3a5 into openstack-k8s-operators:main on Apr 7, 2026
7 checks passed
@amoralej commented Apr 8, 2026

lmiccini added a commit to lmiccini/telemetry-operator that referenced this pull request Apr 8, 2026
Switch RabbitMQ scrape config discovery from rabbitmq.com/RabbitmqCluster
(rabbitmq-cluster-operator) to rabbitmq.openstack.org/RabbitMq
(infra-operator), following the removal of the rabbitmq-cluster-operator
from infra-operator in openstack-k8s-operators/infra-operator#551.

- Replace rabbitmq/cluster-operator imports with infra-operator rabbitmq/v1beta1
- Update RBAC from rabbitmq.com/rabbitmqclusters to rabbitmq.openstack.org/rabbitmqs
- Remove rabbitmqclusterv1 scheme registration
- Bump infra-operator, keystone-operator, and heat-operator

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lmiccini added a commit to lmiccini/install_yamls that referenced this pull request Apr 8, 2026
The rabbitmq-cluster-operator has been removed from infra-operator
(openstack-k8s-operators/infra-operator#551). RabbitMQ is now managed
natively by infra-operator via the rabbitmq.openstack.org/RabbitMq CRD.

Update all rabbitmq targets to use infra-operator instead of the
standalone rabbitmq-cluster-operator:
- Use KIND=RabbitMq and infra-operator sample CR
- Remove rabbitmq operator install/cleanup targets (infra handles it)
- Remove rabbitmq-cluster special-casing from helper scripts
- Remove rabbitmq operator dep from kuttl_common_prep (already
  installed via kuttl_db_prep -> infra)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lmiccini added a commit to lmiccini/install_yamls that referenced this pull request Apr 8, 2026
The rabbitmq-cluster-operator has been removed from infra-operator
(openstack-k8s-operators/infra-operator#551). RabbitMQ is now managed
natively by infra-operator via the rabbitmq.openstack.org/RabbitMq CRD.

- Replace rabbitmq_deploy with an inline RabbitMq CR applied directly
  via oc apply, instead of cloning repos and running kustomize
- Remove rabbitmq operator install/cleanup targets (infra handles it)
- Remove rabbitmq-cluster special-casing from helper scripts
- Remove rabbitmq operator dep from kuttl_common_prep (already
  installed via kuttl_db_prep -> infra)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>