Skip to content

Commit d7cbe9f

Browse files
tishunuglideggivokiryazovi-redisCopilot
authored
[Hitless Upgrades] React to maintenance events #3345 (#3354)
* [Hitless Upgrades] React to maintenance events (#3345) * v0.1 * Simple reconnect now working * Bind address from message is now considered * Self-register the handler * Format code * Filter push messages in a more stable way * (very hacky) Relax comand expire timers globbaly * Configure if timeout relaxing should be applied * Proper way to close channel * Configure the timneout relaxing * Sequential handover implemented * Did not address formatting * Prolong the rebind windwow for relaxed tiemouts * PubSub no longer required; CommandExpiryWriter is now channel aware; Polishing * Use the new MOVING push message from the RE server * Unit test was not chaining delgates in the same way that the RedisClient/RedisClusterClient was * Fix REBIND message validation * Fixed the expiry mechanism * Polishing * Fix NPE. Seems like AttributeMap.attr is not accurate and actually return's null causing some unit test failures. * Add support for MIGRATING/MIGRATED message handling in command expiry This commit adds the ability to listen for MIGRATING and MIGRATED messages and trigger extended command expiry timeouts during Redis shard migration. Key changes: - Enhanced RebindAwareConnectionWatchdog to detect MIGRATING/MIGRATED messages - RebindAwareExpiryWriter to trigger timeout relaxation whenever MIGRATING message is received This feature allows commands to have relaxed timeouts during shard migration operations, preventing unnecessary timeouts when Redis is temporarily busy with migration tasks. * formating * Fix Disabling relaxTimeouts after upgrade can interfere with an ongoing one from re-bind * Additional fix for timeout relaxing disabled * Fix push message listener registered multiple times after rebind. * Fix: Report correct command timeout when relaxTimeout is configured * Disable relaxedTimeout after configured grace period - Introduce a delay before disabling relaxedTimeout - Grace period duration is provided via push notification * Code clean up - Remove reading from pub/sub chanel and relay only on push notifications * Add FAILING_OVER/FAILED_OVER * Polishing : Rename components to use the word 'maintenace' --------- Co-authored-by: Igor Malinovskiy <[email protected]> Co-authored-by: ggivo <[email protected]> # Conflicts: # src/main/java/io/lettuce/core/ClientOptions.java * [Hitless upgrades] Add unit tests for the newly introduced classes #3355 (#3356) * Unit tests for the maintanence aware classes * Did not format properly * Proper license * Cae 633 add functional tests notifications (#3357) - excluding JsonExample.java * Cae 1130 timeout tests (#3377) * initial WIP, with lots of debugging, and some non-working tests * debug * more attemtps at debugging * Refactor: Move cluster state management methods from MaintenanceNotificationTest to RedisEnterpriseConfig - Moved refreshClusterConfig, recordOriginalClusterState, and restoreOriginalClusterState methods - Updated call sites in MaintenanceNotificationTest to use RedisEnterpriseConfig methods - Added required imports and static variables to RedisEnterpriseConfig - Maintained existing functionality while improving code organization Improvements to RelaxedTimeoutConfigurationTest: - Simplified traffic generation logic by removing complex multi-phase testing - Streamlined BLPOP command execution with better timeout detection - Added relaxed timeout detection and recording during maintenance events - Improved logging and error handling for timeout analysis - Enhanced test assertions to focus on relaxed timeout detection rather than success counts - Added MOVING operation duration tracking for better test analysis * Improve test reliability and cleanup: Add @AfterEach cleanup, enhance endpoint tracking, and improve logging * add un-relaxed tests. will investigate further why they got broken at some point via diff * CAE-1130: Update timeout configuration test and watchdog implementation * Reset MaintenanceAwareConnectionWatchdog.java and log4j2-test.xml to upstream versions * Clean up debug info and outdated comments from timeout tests - Remove large debug block with reflection-based debugging in RelaxedTimeoutConfigurationTest - Simplify excessive debug logging and verbose markers (*** and ===) - Clean up maintenance notification test logging - Improve push notification monitor message formatting - Maintain all test functionality while improving code readability * Refactor: Move inline comments above code and fix string comparisons - Move all inline comments to be above the code they reference in: * RelaxedTimeoutConfigurationTest.java * RedisEnterpriseConfig.java * MaintenanceNotificationTest.java - Replace string != comparisons with .equals() for proper string comparison - Apply code formatting via Maven formatter This improves code readability and follows Java best practices for string comparison. * [Hitless Upgrades] Zero-server-side configuration with client-side opt-in (#3380) * Support for Client-side opt-in A client can tell the server if it wants to receive maintenance push notifications via the following command: CLIENT MAINT_NOTIFICATIONS <ON | OFF> [parameter value parameter value ...] * update maintenance events to latest format - MIGRATING <seq_number> <time> <shard_id-s>: A shard migration is going to start within <time> seconds. - MIGRATED <seq_number> <shard_id-s>: A shard migration ended. - FAILING_OVER <seq_number> <time> <shard_id-s>: A shard failover of a healthy shard started. - FAILED_OVER <seq_number> <shard_id-s>: A shard failover of a healthy shard ended. - MOVING <seq_number> <time> <endpoint>: A specific endpoint is going to move to another node within <time> seconds * clean up * Update FAILED_OVER & MIGRATED to include additional time field * update is private reserver check & add unit tests update is private reserver check * add unit tests for handshake with enabled maintenance events * add missing copyrights/docs * format * address review comments * Revert address after rebind operation expires * Update event's validation spec - MIGRATING <seq_number> <time> <shard_id-s>: A shard migration is going to start within <time> seconds. - MIGRATED <seq_number> <shard_id-s>: A shard migration ended. - FAILING_OVER <seq_number> <time> <shard_id-s>: A shard failover of a healthy shard started. - FAILED_OVER <seq_number> <shard_id-s>: A shard failover of a healthy shard ended. - MOVING <seq_number> <time> <endpoint>: A specific endpoint is going to move to another node within <time> seconds * rebase * format after rebase * Apply suggestions from code review Co-authored-by: Tihomir Krasimirov Mateev <[email protected]> * javadoc updated * Update src/main/java/io/lettuce/core/internal/NetUtils.java Co-authored-by: Tihomir Krasimirov Mateev <[email protected]> --------- Co-authored-by: Tihomir Krasimirov Mateev <[email protected]> * [hitless upgrade] Support for none moving-endpoint-type in maintenance event notifications (CAE-1285) (#3396) * support MOVING with none none indicates that the MOVING message doesn’t need to contain an endpoint. In such a case, the client is expected to schedule a graceful reconnect to its currently configured endpoint after half of the grace period that was communicated by the server is over. * formatting * Apply suggestions from code review Co-authored-by: Copilot <[email protected]> * Fix NPE * Add test to cover null rebindAddress null for rebind adress can be returned as part of MOVING notification if client is connected using 'moving-endpoint-type=none' * Add java docs to RebindAwareAddressSupplier --------- Co-authored-by: Copilot <[email protected]> * [Hitless Upgrades] Enable maintenance events support by default (CAE-1303) (#3415) * set default relaxed timeout to 10s * Enable maintenance events support by default * Enable maintenance events support by default * fix tests - ensure MaintenanceAwareExpiryWriter is registered for events when wathcdog is created - command timeout was not applied * fix tests - sporadic failure because of timeout of 50ms RedisHandshakeUnitTests.handshakeDelayedCredentialProvider:153 » ConditionTimeout - new command introduced during handshake, increase the timeout to 100ms * resolve errors after rebase on main - reset() - removed with issue#3328 - remove deprecated code from issue#907 (#3395) * Remove LettuceMaintenanceEventsDemo.java - no longer needed. --------- Co-authored-by: Igor Malinovskiy <[email protected]> Co-authored-by: ggivo <[email protected]> Co-authored-by: kiryazovi-redis <[email protected]> Co-authored-by: Copilot <[email protected]>
1 parent 1daf120 commit d7cbe9f

37 files changed

+6515
-64
lines changed

.gitignore

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -16,4 +16,5 @@ dependency-reduced-pom.xml
1616
.idea
1717
.flattened-pom.xml
1818
*.java-version
19-
*.DS_Store
19+
*.DS_Store
20+
.vscode*

src/main/java/io/lettuce/core/AbstractRedisClient.java

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,8 @@
2020
package io.lettuce.core;
2121

2222
import java.io.Closeable;
23+
import java.net.InetAddress;
24+
import java.net.InetSocketAddress;
2325
import java.net.SocketAddress;
2426
import java.time.Duration;
2527
import java.util.ArrayList;
@@ -34,6 +36,7 @@
3436
import java.util.concurrent.atomic.AtomicBoolean;
3537
import java.util.concurrent.atomic.AtomicInteger;
3638

39+
import io.lettuce.core.MaintenanceEventsOptions.AddressTypeSource;
3740
import reactor.core.publisher.Mono;
3841
import io.lettuce.core.event.command.CommandListener;
3942
import io.lettuce.core.event.connection.ConnectEvent;
@@ -586,8 +589,16 @@ private CompletableFuture<Void> closeClientResources(long quietPeriod, long time
586589
}
587590

588591
protected RedisHandshake createHandshake(ConnectionState state) {
592+
AddressTypeSource source = null;
593+
if (clientOptions.getMaintenanceEventsOptions().supportsMaintenanceEvents()) {
594+
LettuceAssert.notNull(clientOptions.getMaintenanceEventsOptions().getAddressTypeSource(),
595+
"Address type source must not be null");
596+
597+
source = clientOptions.getMaintenanceEventsOptions().getAddressTypeSource();
598+
}
599+
589600
return new RedisHandshake(clientOptions.getConfiguredProtocolVersion(), clientOptions.isPingBeforeActivateConnection(),
590-
state);
601+
state, source);
591602
}
592603

593604
}

src/main/java/io/lettuce/core/ClientOptions.java

Lines changed: 37 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,8 @@ public class ClientOptions implements Serializable {
5151

5252
public static final boolean DEFAULT_AUTO_RECONNECT = true;
5353

54+
public static final MaintenanceEventsOptions DEFAULT_MAINTENANCE_EVENTS_OPTIONS = MaintenanceEventsOptions.enabled();
55+
5456
public static final Predicate<RedisCommand<?, ?, ?>> DEFAULT_REPLAY_FILTER = (cmd) -> false;
5557

5658
public static final int DEFAULT_BUFFER_USAGE_RATIO = 3;
@@ -93,6 +95,8 @@ public class ClientOptions implements Serializable {
9395

9496
private final boolean autoReconnect;
9597

98+
private final MaintenanceEventsOptions maintenanceEventsOptions;
99+
96100
private final Predicate<RedisCommand<?, ?, ?>> replayFilter;
97101

98102
private final DecodeBufferPolicy decodeBufferPolicy;
@@ -127,6 +131,7 @@ public class ClientOptions implements Serializable {
127131

128132
protected ClientOptions(Builder builder) {
129133
this.autoReconnect = builder.autoReconnect;
134+
this.maintenanceEventsOptions = builder.maintenanceEventsOptions;
130135
this.replayFilter = builder.replayFilter;
131136
this.decodeBufferPolicy = builder.decodeBufferPolicy;
132137
this.disconnectedBehavior = builder.disconnectedBehavior;
@@ -147,6 +152,7 @@ protected ClientOptions(Builder builder) {
147152

148153
protected ClientOptions(ClientOptions original) {
149154
this.autoReconnect = original.isAutoReconnect();
155+
this.maintenanceEventsOptions = original.getMaintenanceEventsOptions();
150156
this.replayFilter = original.getReplayFilter();
151157
this.decodeBufferPolicy = original.getDecodeBufferPolicy();
152158
this.disconnectedBehavior = original.getDisconnectedBehavior();
@@ -200,6 +206,8 @@ public static class Builder {
200206

201207
private boolean autoReconnect = DEFAULT_AUTO_RECONNECT;
202208

209+
private MaintenanceEventsOptions maintenanceEventsOptions = DEFAULT_MAINTENANCE_EVENTS_OPTIONS;
210+
203211
private Predicate<RedisCommand<?, ?, ?>> replayFilter = DEFAULT_REPLAY_FILTER;
204212

205213
private DecodeBufferPolicy decodeBufferPolicy = DecodeBufferPolicies.ratio(DEFAULT_BUFFER_USAGE_RATIO);
@@ -247,6 +255,21 @@ public Builder autoReconnect(boolean autoReconnect) {
247255
return this;
248256
}
249257

258+
/**
259+
* Configure whether the driver should listen for server events that notify on current maintenance activities. When
260+
* enabled, this option will help with the connection handover and reduce the number of failed commands. This feature
261+
* requires the server to support maintenance events. Defaults to {@code false}. See
262+
* {@link #DEFAULT_MAINTENANCE_EVENTS_OPTIONS}.
263+
*
264+
* @param maintenanceEventsOptions true/false
265+
* @return {@code this}
266+
* @since 7.0
267+
*/
268+
public Builder supportMaintenanceEvents(MaintenanceEventsOptions maintenanceEventsOptions) {
269+
this.maintenanceEventsOptions = maintenanceEventsOptions;
270+
return this;
271+
}
272+
250273
/**
251274
* When {@link #autoReconnect(boolean)} is set to true, this {@link Predicate} is used to filter commands to replay when
252275
* the connection is reestablished after a disconnect. Returning <code>false</code> means the command will not be
@@ -507,7 +530,8 @@ public ClientOptions build() {
507530
public ClientOptions.Builder mutate() {
508531
Builder builder = new Builder();
509532

510-
builder.autoReconnect(isAutoReconnect()).replayFilter(getReplayFilter()).decodeBufferPolicy(getDecodeBufferPolicy())
533+
builder.autoReconnect(isAutoReconnect()).supportMaintenanceEvents(getMaintenanceEventsOptions())
534+
.replayFilter(getReplayFilter()).decodeBufferPolicy(getDecodeBufferPolicy())
511535
.disconnectedBehavior(getDisconnectedBehavior()).reauthenticateBehavior(getReauthenticateBehaviour())
512536
.readOnlyCommands(getReadOnlyCommands()).publishOnScheduler(isPublishOnScheduler())
513537
.pingBeforeActivateConnection(isPingBeforeActivateConnection()).protocolVersion(getConfiguredProtocolVersion())
@@ -531,6 +555,18 @@ public boolean isAutoReconnect() {
531555
return autoReconnect;
532556
}
533557

558+
/**
559+
* Returns the {@link MaintenanceEventsOptions} to listen for server events that notify on current maintenance activities.
560+
*
561+
* @return {@link MaintenanceEventsOptions}
562+
* @since 7.0
563+
* @see #DEFAULT_MAINTENANCE_EVENTS_OPTIONS
564+
* @see #getMaintenanceEventsOptions()
565+
*/
566+
public MaintenanceEventsOptions getMaintenanceEventsOptions() {
567+
return maintenanceEventsOptions;
568+
}
569+
534570
/**
535571
* Controls which {@link RedisCommand} will be replayed after a re-connect. The {@link Predicate} returns <code>true</code>
536572
* if command should be filtered out and not replayed. Defaults to {@link #DEFAULT_REPLAY_FILTER}.

src/main/java/io/lettuce/core/ConnectionBuilder.java

Lines changed: 17 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,8 @@
2727
import java.util.function.Function;
2828
import java.util.function.Supplier;
2929

30+
import io.lettuce.core.protocol.MaintenanceAwareComponent;
31+
import io.lettuce.core.protocol.MaintenanceAwareConnectionWatchdog;
3032
import jdk.net.ExtendedSocketOptions;
3133
import reactor.core.publisher.Mono;
3234
import io.lettuce.core.internal.LettuceAssert;
@@ -153,9 +155,21 @@ protected ConnectionWatchdog createConnectionWatchdog() {
153155
LettuceAssert.assertState(bootstrap != null, "Bootstrap must be set for autoReconnect=true");
154156
LettuceAssert.assertState(socketAddressSupplier != null, "SocketAddressSupplier must be set for autoReconnect=true");
155157

156-
ConnectionWatchdog watchdog = new ConnectionWatchdog(clientResources.reconnectDelay(), clientOptions, bootstrap,
157-
clientResources.timer(), clientResources.eventExecutorGroup(), socketAddressSupplier, reconnectionListener,
158-
connection, clientResources.eventBus(), endpoint);
158+
ConnectionWatchdog watchdog;
159+
if (clientOptions.getMaintenanceEventsOptions().supportsMaintenanceEvents()) {
160+
MaintenanceAwareConnectionWatchdog maintenanceAwareWatchdog = new MaintenanceAwareConnectionWatchdog(
161+
clientResources.reconnectDelay(), clientOptions, bootstrap, clientResources.timer(),
162+
clientResources.eventExecutorGroup(), socketAddressSupplier, reconnectionListener, connection,
163+
clientResources.eventBus(), endpoint);
164+
if (connection.getChannelWriter() instanceof MaintenanceAwareComponent) {
165+
maintenanceAwareWatchdog.setMaintenanceEventListener((MaintenanceAwareComponent) connection.getChannelWriter());
166+
}
167+
watchdog = maintenanceAwareWatchdog;
168+
} else {
169+
watchdog = new ConnectionWatchdog(clientResources.reconnectDelay(), clientOptions, bootstrap,
170+
clientResources.timer(), clientResources.eventExecutorGroup(), socketAddressSupplier, reconnectionListener,
171+
connection, clientResources.eventBus(), endpoint);
172+
}
159173

160174
endpoint.registerConnectionWatchdog(watchdog);
161175

src/main/java/io/lettuce/core/ConnectionMetadata.java

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,8 @@ class ConnectionMetadata {
1111

1212
private volatile String libraryVersion;
1313

14+
private volatile boolean sslEnabled;
15+
1416
public ConnectionMetadata() {
1517
}
1618

@@ -23,13 +25,15 @@ public void apply(RedisURI redisURI) {
2325
setClientName(redisURI.getClientName());
2426
setLibraryName(redisURI.getLibraryName());
2527
setLibraryVersion(redisURI.getLibraryVersion());
28+
setSslEnabled(redisURI.isSsl());
2629
}
2730

2831
public void apply(ConnectionMetadata metadata) {
2932

3033
setClientName(metadata.getClientName());
3134
setLibraryName(metadata.getLibraryName());
3235
setLibraryVersion(metadata.getLibraryVersion());
36+
setSslEnabled(metadata.isSslEnabled());
3337
}
3438

3539
protected void setClientName(String clientName) {
@@ -56,4 +60,12 @@ String getLibraryVersion() {
5660
return libraryVersion;
5761
}
5862

63+
boolean isSslEnabled() {
64+
return sslEnabled;
65+
}
66+
67+
void setSslEnabled(boolean sslEnabled) {
68+
this.sslEnabled = sslEnabled;
69+
}
70+
5971
}

0 commit comments

Comments
 (0)