Skip to content

[KYUUBI #7301] Fix engine not killed after deregisterEngine#7302

Open
taylor12805 wants to merge 2 commits intoapache:masterfrom
taylor12805:deregisterEngine
Open

[KYUUBI #7301] Fix engine not killed after deregisterEngine#7302
taylor12805 wants to merge 2 commits intoapache:masterfrom
taylor12805:deregisterEngine

Conversation

@taylor12805
Copy link

(cherry picked from commit b6b05e9)

Why are the changes needed?

Fix old engine not killed bug: #7301

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug where non-CONNECTION share level engines were not killed when deregistered, leaving zombie engine processes running. The fix adds logic to kill the engine application process when deregistering engines with share levels USER, GROUP, SERVER_LOCAL, or SERVER.

Changes:

  • Added engine process killing logic in the deregister method for non-CONNECTION share level engines
  • The kill attempt is wrapped in a try-catch block to handle failures gracefully

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 371 to 374
if (shareLevel != CONNECTION && builder != null) {
try {
val appMgrInfo = builder.appMgrInfo()
engineManager.killApplication(appMgrInfo, engineRefId, Some(appUser))
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing null check for engineManager. The condition should be updated to check both builder and engineManager for null before attempting to call killApplication. Without this check, a NullPointerException will occur when engineManager is null, such as in test scenarios where EngineRef is instantiated with a null engineManager parameter.

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link

codecov-commenter commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (526a8aa) to head (b1501cf).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...ain/scala/org/apache/kyuubi/engine/EngineRef.scala 0.00% 5 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##           master   #7302   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         698     698           
  Lines       43646   43654    +8     
  Branches     5894    5895    +1     
======================================
- Misses      43646   43654    +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@taylor12805
Copy link
Author

I suppose the failing test is not correlated to this PR

@ruanwenjun
Copy link
Member

ruanwenjun commented Jan 22, 2026

It might be better to make the Engine can know whether it is zombie, and stop itself. Since the killApplication operation might also not be response?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +371 to +380
if (shareLevel != CONNECTION && builder != null && engineManager != null) {
try {
val appMgrInfo = builder.appMgrInfo()
engineManager.killApplication(appMgrInfo, engineRefId, Some(appUser))
info(s"Killed engine process for $engineRefId with share level $shareLevel")
} catch {
case e: Exception =>
warn(s"Error killing engine process for $engineRefId", e)
}
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical bug: The condition builder != null will prevent killing engines in the most common deregister scenarios. The builder field is only set when this EngineRef instance creates a new engine (line 216 in the create method). However, when an existing engine is discovered via getOrCreate, the builder remains null. This means when deregister is called for discovered engines (the exact scenario described in issue #7301), the engine process won't be killed.

Additionally, even if builder were non-null, calling builder.appMgrInfo() is incorrect here because the appMgrInfo should be retrieved from the ServiceNodeInfo that was just fetched from ZooKeeper. The correct pattern (as shown in AdminResource.scala:312) is to get appMgrInfo from sn.attributes.get(KyuubiReservedKeys.KYUUBI_ENGINE_APP_MGR_INFO_KEY).

The fix should:

  1. Get the ServiceNodeInfo from ZooKeeper
  2. Extract appMgrInfo from sn.attributes
  3. Use that to kill the application, without depending on the builder at all

Copilot uses AI. Check for mistakes.
Comment on lines +371 to +380
if (shareLevel != CONNECTION && builder != null && engineManager != null) {
try {
val appMgrInfo = builder.appMgrInfo()
engineManager.killApplication(appMgrInfo, engineRefId, Some(appUser))
info(s"Killed engine process for $engineRefId with share level $shareLevel")
} catch {
case e: Exception =>
warn(s"Error killing engine process for $engineRefId", e)
}
}
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing test for deregister (line 377-398 in EngineRefTests.scala) doesn't verify that the engine process is actually killed when deregister is called. The test only verifies that the engine is removed from ZooKeeper. Since this PR's purpose is to fix the bug where engines are not killed after deregistration, a test should be added to verify that killApplication is called with the correct parameters when deregister succeeds. This could be done by mocking the engineManager and verifying the killApplication call.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants