
Jpersaud/reload plugin hot swap #2559

Draft
justinpersaud wants to merge 9 commits into main from
jpersaud/reload-plugin-hot-swap

@justinpersaud
Contributor

Details

Fixed Issues

Fixes GH_LINK

Tests


Internal Testing Reminder: when changing bedrock, please compile auth against your new changes

justinpersaud and others added 9 commits March 23, 2026 13:25
Store the dlopen handle and filesystem path for each plugin loaded
via shared library. These are saved in new static maps on
BedrockPlugin (g_pluginDLHandles and g_pluginPaths), keyed by the
upper-cased plugin name. This is prerequisite infrastructure for
hot-reloading plugins without restarting Bedrock.
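
A minimal sketch of the bookkeeping described above, assuming plain `std::map` containers. Only `g_pluginDLHandles` and `g_pluginPaths` are names from this commit; `upperKey`, `rememberPlugin`, and `handleFor` are hypothetical helpers added for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Stand-ins for the static maps on BedrockPlugin (container types are assumed).
static std::map<std::string, void*> g_pluginDLHandles;
static std::map<std::string, std::string> g_pluginPaths;

// Hypothetical helper: upper-case a plugin name so lookups ignore case.
std::string upperKey(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return name;
}

// Hypothetical helper: record the dlopen() handle and the path it came from.
void rememberPlugin(const std::string& name, void* handle, const std::string& path) {
    assert(handle != nullptr);
    const std::string key = upperKey(name);
    g_pluginDLHandles[key] = handle;
    g_pluginPaths[key] = path;
}

// Hypothetical helper: returns nullptr for plugins with no recorded handle,
// such as built-in plugins that were never loaded from a shared library.
void* handleFor(const std::string& name) {
    auto it = g_pluginDLHandles.find(upperKey(name));
    return it == g_pluginDLHandles.end() ? nullptr : it->second;
}
```

A missing entry is what lets a later reload request tell a built-in plugin apart from one loaded via `dlopen`.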

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Track in-flight commands per plugin via an atomic activeCommandCount
on BedrockPlugin. Incremented in the BedrockCommand constructor and
decremented in the destructor. This allows the ReloadPlugin handler
to drain only the target plugin's commands rather than waiting for
all commands globally.
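
The counting scheme can be sketched as an RAII pair plus a polling drain loop. `PluginSketch`, `CommandSketch`, and `waitForDrain` are hypothetical stand-ins, not the actual Bedrock classes; only `activeCommandCount` is a name from this commit.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical, trimmed-down plugin: only the counter matters here.
struct PluginSketch {
    std::atomic<int> activeCommandCount{0};
};

// Stand-in for the BedrockCommand constructor/destructor pair.
class CommandSketch {
  public:
    explicit CommandSketch(PluginSketch& plugin) : _plugin(plugin) {
        _plugin.activeCommandCount.fetch_add(1);
    }
    ~CommandSketch() {
        const int previous = _plugin.activeCommandCount.fetch_sub(1);
        assert(previous > 0); // every decrement must match an earlier increment
        (void)previous;
    }
    CommandSketch(const CommandSketch&) = delete;
    CommandSketch& operator=(const CommandSketch&) = delete;

  private:
    PluginSketch& _plugin;
};

// Poll until this plugin has no in-flight commands, or give up at the timeout.
bool waitForDrain(const PluginSketch& plugin, std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (plugin.activeCommandCount.load() > 0) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}
```

Because the counter lives on the plugin, commands belonging to other plugins never delay the drain.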

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implement a new control port command that hot-reloads a dynamically
loaded plugin (.so) without restarting Bedrock. The database stays
open and mmap'd, the cluster membership persists, and only the plugin
code is swapped.

The reload follows a 7-phase sequence:
1. Validate the plugin exists and has a dlopen handle
2. Block the command port and reject new commands for the plugin
3. Drain in-flight commands (120s timeout, based on DEFAULT_TIMEOUT)
4. Destroy the old plugin instance and dlclose the old .so
5. dlopen the new .so and instantiate the new plugin
6. Run upgradeDatabase (if LEADING) and stateChanged
7. Unblock the command port

On failure, attempts to roll back by re-loading the old .so. If
rollback also fails, the plugin becomes unavailable but Bedrock
continues serving other plugins.

Also adds a shared_mutex (_pluginsMutex) to protect the plugins map
during the brief swap window, with shared locks on all plugin
iteration paths (getCommandFromPlugins, notifyStateChangeToPlugins,
_upgradeDB, postPoll timer loop).
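
A minimal sketch of the locking pattern, assuming C++17 `std::shared_mutex`. `PluginRegistrySketch` and `PluginSketch` are hypothetical; only `_pluginsMutex` is a name from this commit.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical plugin type carrying just enough state to observe a swap.
struct PluginSketch {
    std::string version;
};

class PluginRegistrySketch {
  public:
    // Iteration paths take a shared lock, so readers do not block each other.
    template <typename Fn>
    void forEach(Fn&& fn) const {
        std::shared_lock<std::shared_mutex> lock(_pluginsMutex);
        for (const auto& entry : _plugins) {
            fn(entry.first, *entry.second);
        }
    }

    // The reload path holds the exclusive lock only while the map entry changes.
    void swap(const std::string& name, std::unique_ptr<PluginSketch> replacement) {
        assert(replacement);
        std::unique_lock<std::shared_mutex> lock(_pluginsMutex);
        _plugins[name] = std::move(replacement);
    }

  private:
    mutable std::shared_mutex _pluginsMutex;
    std::map<std::string, std::unique_ptr<PluginSketch>> _plugins;
};
```

Readers pay only for a shared lock in the common case, and the exclusive lock is held just long enough to replace one pointer.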

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Test coverage for the ReloadPlugin control command:
- Happy path reload with command verification
- Drain of in-flight commands before reload completes
- Rejection of built-in plugin reload (DB)
- Rejection of nonexistent plugin reload
- Database state preservation across reloads
- Follower node reload
- Serialization of concurrent reload requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The ReloadPlugin control command uses a "Plugin" header to specify
which plugin to reload. Without whitelisting this param name, the
logging system throws an exception in dev mode when trying to log
the response, crashing the process.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two issues caused a crash when sending ReloadPlugin:

1. The "Plugin" header name collided with the internal "plugin"
   header that _reply() uses to route responses to plugin port
   handlers. SData uses case-insensitive key comparison, so
   request["Plugin"] matched request["plugin"]. When _reply()
   found this non-empty value, it tried to look up the plugin by
   name in the plugins map, failed, and hit SERROR (abort).
   Fix: rename the header to "PluginName" which does not collide.

2. The stored .so path was the bare filename passed to -plugins
   (e.g., "auth.so"), not the resolved absolute path. SFileExists
   failed because "auth.so" does not exist relative to the CWD;
   dlopen had found it through the library search path
   (/usr/lib). Fix: call dlinfo(RTLD_DI_LINKMAP) after dlopen to
   get the path the library was actually loaded from, then
   realpath() to resolve symlinks.

Also updates the log params whitelist and test suite to use
"PluginName".
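
The header collision can be reproduced with any map that compares keys case-insensitively. The sketch below uses a `std::map` with a custom comparator as a stand-in for SData; `CaseInsensitiveLess`, `HeaderMap`, and `hasRoutingHeader` are hypothetical names, not Bedrock APIs.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Comparator that ignores ASCII case, approximating SData's key comparison.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) { return std::tolower(x) < std::tolower(y); });
    }
};

using HeaderMap = std::map<std::string, std::string, CaseInsensitiveLess>;

// Stand-in for the check in _reply(): a non-empty "plugin" header means
// "route this response to a plugin port handler".
bool hasRoutingHeader(const HeaderMap& request) {
    auto it = request.find("plugin");
    return it != request.end() && !it->second.empty();
}
```

With this comparator, a request that sets `"Plugin"` satisfies `hasRoutingHeader`, while one that sets `"PluginName"` does not.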

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Auth's stateChanged() spawns a detached thread that captures the
SQLite& db reference and uses it asynchronously (waiting for
isUpgradeComplete(), then querying maxOnyxUpdateID). When called
from the ReloadPlugin handler, the db reference came from a
SQLiteScopedHandle that was destroyed at the end of the block,
leaving the detached thread with a dangling reference and causing
a segfault.

Fix: acquire a DB handle from the pool without scoping it, call
stateChanged, then spawn a background thread that waits for
isUpgradeComplete() before returning the handle to the pool. This
ensures the db reference remains valid for the full duration of
the plugin's async initialization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The dynamic linker caches loaded libraries by path, and dlclose()
is not guaranteed to unload them. Calling dlclose() followed by
dlopen() on the same path returned the old in-memory copy rather
than re-reading from disk, so ReloadPlugin appeared to succeed but
was still running the old code.

Fix: copy the .so to a unique temporary path (appending a timestamp)
before calling dlopen, then delete the temp file. Since the library
is already mapped into memory after dlopen, deleting the file is
safe. This forces the dynamic linker to load fresh code from disk
on every reload.
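
The copy step might look like the sketch below, which uses only standard streams. `copyToUniquePath` is a hypothetical helper, and the `<source>.<timestamp>` naming is an assumption about how the unique path is built.

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <string>

// Copy `source` to "<source>.<timestamp>" and return the new path.
// Returns an empty string on failure. An empty source file is also reported
// as a failure, which is acceptable for a shared library.
std::string copyToUniquePath(const std::string& source) {
    const auto sinceEpoch = std::chrono::system_clock::now().time_since_epoch();
    const auto stamp = std::chrono::duration_cast<std::chrono::nanoseconds>(sinceEpoch).count();
    const std::string target = source + "." + std::to_string(stamp);

    std::ifstream in(source, std::ios::binary);
    if (!in) {
        return "";
    }
    std::ofstream out(target, std::ios::binary | std::ios::trunc);
    if (!out) {
        return "";
    }
    out << in.rdbuf(); // stream the whole file
    out.close();
    if (!out) {
        std::remove(target.c_str()); // do not leave a partial copy behind
        return "";
    }
    return target;
}
```

The caller would then pass the returned path to `dlopen` and remove the file afterwards, as the commit describes.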

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The top-level version string reported by the Status command was not
updated after a plugin reload — it still contained the old plugin
version hash. Rebuild it from all current plugins after each
successful reload so the version string stays consistent with the
per-plugin version data.
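
One way to rebuild the string is to recompute it from the per-plugin data each time rather than patching the old value. The format below (base version followed by `:` and each plugin's version) is purely an assumption for illustration, and `rebuildVersion` is a hypothetical helper.

```cpp
#include <map>
#include <string>

// Hypothetical format: "<base>:<pluginVersion>:<pluginVersion>...", in key order.
// Plugins that report no version contribute nothing.
std::string rebuildVersion(const std::string& baseVersion,
                           const std::map<std::string, std::string>& pluginVersions) {
    std::string version = baseVersion;
    for (const auto& entry : pluginVersions) {
        if (entry.second.empty()) {
            continue;
        }
        version += ":" + entry.second;
    }
    return version;
}
```

Recomputing from the current plugin set avoids having to find and replace the old hash inside the existing string.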

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>