Skip to content

Conversation

@G8XSU
Copy link
Contributor

@G8XSU G8XSU commented Mar 13, 2025

Note: It is only enabled if we enable events-rabbitmq feature.

RabbitMQ was selected because it is one of the most robust and battle-tested open-source messaging systems, allowing for durable, buffered messages and flexible exchange options.

This ensures that events are not lost, even in the case of restarts, network failures, or consumer failures, while keeping consumers decoupled.

  • Feature-based: Only enabled if we enable events-rabbitmq feature, this allows us to support multiple implementations and no strict coupling with a single impl while keep our dependencies to the minimum.
  • Durable Publishing: Uses publisher confirms and persistent delivery mode to guarantee events are stored by RabbitMQ before returning Ok(()).
  • Lazy Initialized connection: Wraps Connection and Channel in Option for lazy connection init via ensure_connected.
  • Reconnection Handling: Implements a single reconnect attempt per publish call. Event publishing is already retried, so this keeps it simple for now.

Depends on #51
As part of #37

@G8XSU G8XSU requested a review from jkczyz March 13, 2025 18:04
@G8XSU G8XSU marked this pull request as ready for review March 18, 2025 00:40
@G8XSU G8XSU force-pushed the 2025-03-12-rabbitmq-impl branch from 64989d9 to ddf96fa Compare March 18, 2025 19:37
@G8XSU
Copy link
Contributor Author

G8XSU commented Mar 18, 2025

Rebased on main after #51 merged.

This was referenced Mar 18, 2025
Comment on lines 22 to 23
# Required for RabittMQ based EventPublisher. Only enabled for `events-rabbitmq` feature.
lapin = { version = "2.5.1", optional = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is require a change of the MSRV to 1.81.0 with the feature enabled.

Copy link
Contributor Author

@G8XSU G8XSU Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I can fix the msrv to 1.75 for this commit,
Fwiw, longer term, I don't think we can make a commitment to msrv when these optional features are enabled.
This is because most of these deps and rust ecosystem doesn't follow the same msrv principle.
rustls recently bumped their msrv due to which ldk-node, bdk and everyone had to bump theirs.
aws sdk for rust doesn't follow a conservative msrv, and older versions risk being incompatible with apis.
Might need to come up with a separate msrv policy when optional features are enabled.

fixed msrv build for 1.75. for now.

Comment on lines +109 to +118
let event_publisher: Arc<dyn EventPublisher> = {
let rabbitmq_config = RabbitMqConfig {
connection_string: config_file.rabbitmq_connection_string,
exchange_name: config_file.rabbitmq_exchange_name,
};
Arc::new(RabbitMqEventPublisher::new(rabbitmq_config))
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we open a connection to ensure the config parameters are valid?

Copy link
Contributor Author

@G8XSU G8XSU Mar 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good question.
I was doing that earlier, then moved to lazy connection initialization. Opening a connection in construction of RabbitMqEventPublisher makes it fallible, it has its pros and cons.

It is mainly about : should we fail ldk-server startup if we can't reach rabbitmq?
In case of production deployments, should deployments of ldk-server fail if a dependency service is experiencing an outage?

Argument for non-fallible: we should keep deployments of different systems decoupled.
Argument for fallible: we check availability and correctness of endpoint upfront.

I tend to favor the non-fallible approach since rabbitmq unavailability shouldn't create a ldk-server outage(in case of restarts/deployments etc.). Even with event-publishing degraded, uptime of lightning node is important.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, though how would an operator be aware of the rabbitmq unavailability? IIUC, unavailability would cause the event queue to be backed up. The underlying node would still progress, but from the user's perspective the payment store would be stale since event publishing failure would prevent persisting the payment.

This doesn't need to hold up the PR, but just wondering how this would be monitored.

Copy link
Contributor Author

@G8XSU G8XSU Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are going to have metrics for this and persistence failures, ideally those should be monitored and alerted on. We can configure pages on metrics, that way we don't have to shutdown the service to bring attention.

Maybe we can also configure some threshold(configurable) at which the process will shutdown instead of accumulating more backlog.

@G8XSU G8XSU force-pushed the 2025-03-12-rabbitmq-impl branch from 661aa18 to 368c398 Compare March 19, 2025 22:22
@G8XSU G8XSU requested a review from jkczyz March 19, 2025 22:24
@G8XSU G8XSU force-pushed the 2025-03-12-rabbitmq-impl branch from 38d271b to b11e719 Compare March 20, 2025 21:42
@G8XSU
Copy link
Contributor Author

G8XSU commented Mar 20, 2025

Rebased due to merge conflicts with main and squashed current fixups as well.

Copy link
Contributor

@jkczyz jkczyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

G8XSU added 3 commits March 20, 2025 15:05
RabbitMQ was selected because it is one of the most robust and
battle-tested open-source messaging systems, allowing for
durable, buffered messages and flexible exchange options.

This ensures that events are not lost, even in the case of
restarts, network failures, or consumer failures, while
keeping consumers decoupled.

* Feature-based: Only enabled if we enable `events-rabbitmq`
  feature, this allows us to support multiple implementations
  and no strict coupling with a single impl while keep our
  dependencies to the minimum.
* Durable Publishing: Uses publisher confirms and persistent
  delivery mode to guarantee events are stored by RabbitMQ
  before returning Ok(()).
* Lazy Initialized connection: Wraps Connection and Channel
  in Option for lazy connection init via ensure_connected.
* Reconnection Handling: Implements a single reconnect
  attempt per publish call. Event publishing is already
  retried, so this keeps it simple for now.
@G8XSU G8XSU force-pushed the 2025-03-12-rabbitmq-impl branch from b11e719 to d6c3618 Compare March 20, 2025 22:07
@G8XSU
Copy link
Contributor Author

G8XSU commented Mar 20, 2025

Fixed and squashed fixup.

@G8XSU G8XSU requested a review from jkczyz March 20, 2025 22:10
@G8XSU G8XSU added the Weekly Goal Someone wants to land this this week label Mar 20, 2025
@G8XSU G8XSU merged commit b61af2e into lightningdevkit:main Mar 20, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Weekly Goal Someone wants to land this this week

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants