
fix: prevent CPU runaway in HttpNodesStatsAction JSON parsing#18

Merged: marevol merged 2 commits into main from fix/nodes-stats-parser-infinite-loop on Mar 22, 2026



@marevol (contributor) commented on Mar 22, 2026

Summary

  • Fix an infinite CPU loop caused by a missing parser.nextToken() call in three parse methods (parseSearchBackpressureStats, parseTaskCancellationStats, parseSearchPipelineStats) introduced in #14 (feat(action): add HTTP action implementations for cluster and node APIs)
  • Add defensive null-token (EOF) guards in the critical parsing loops, so unexpected end of input throws an IOException instead of spinning
  • Add START_ARRAY handling in consumeObject and remove an orphaned new ArrayList<>() in parseNodeStats
  • Add 61 unit tests covering the Node Stats parsing logic

Root Cause

When consumeObject() or a sub-parse method was called while the parser was still on START_OBJECT (that is, without first calling parser.nextToken() to advance into the object), consumeObject() consumed not only the target object but also every sibling field up to the parent's END_OBJECT. The resulting scope corruption cascaded: parseNodeStats exited at the wrong END_OBJECT, parseNodes read past the end of the JSON stream, and because the parser returns a null token at EOF, the loop condition null != END_OBJECT stayed true forever, producing an infinite busy loop.
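The overrun can be reproduced with a small, self-contained model. This is a sketch under stated assumptions, not the project's code: Token, TokenStream, sample() and at() are hypothetical stand-ins for a pull parser, and consumeObject here is assumed to expect the stream on the first token inside the object and to loop while currentToken() != END_OBJECT. Under those assumptions, skipping from START_OBJECT ends on the parent's END_OBJECT instead of the target's, and at EOF the loop never exits (the demo's read limit turns that spin into an exception).

```java
import java.util.List;

// Self-contained model of the failure mode. Token and TokenStream are hypothetical
// stand-ins for a pull parser; they are NOT the OpenSearch XContentParser API.
class OverrunDemo {
    enum Token { START_OBJECT, END_OBJECT, FIELD_NAME, VALUE }

    static final class TokenStream {
        private final List<Token> tokens;
        private final int readLimit; // demo-only cap so a runaway loop ends in an exception
        private int pos = -1;        // -1: nothing read yet
        private int reads = 0;

        TokenStream(List<Token> tokens, int readLimit) {
            this.tokens = tokens;
            this.readLimit = readLimit;
        }

        // Advances and returns the current token; keeps returning null once past the end.
        Token nextToken() {
            if (++reads > readLimit) {
                throw new IllegalStateException("runaway loop after " + readLimit + " reads");
            }
            if (pos < tokens.size()) pos++;
            return currentToken();
        }

        Token currentToken() {
            return (pos >= 0 && pos < tokens.size()) ? tokens.get(pos) : null;
        }

        int position() { return pos; }
    }

    // Hypothetical unguarded skip helper. Assumed contract: the stream sits on the first
    // token INSIDE the object, and the helper leaves it on that object's END_OBJECT.
    static void consumeObject(TokenStream p) {
        while (p.currentToken() != Token.END_OBJECT) { // null at EOF never equals END_OBJECT
            if (p.currentToken() == Token.START_OBJECT) {
                p.nextToken();    // enter the nested object
                consumeObject(p); // skip it; returns on its END_OBJECT
            }
            p.nextToken();
        }
    }

    // {"target": {"x": 1}, "sibling": 2}, one entry per token
    static List<Token> sample() {
        return List.of(
            Token.START_OBJECT,                   // 0  parent {
            Token.FIELD_NAME, Token.START_OBJECT, // 1  "target", 2  target {
            Token.FIELD_NAME, Token.VALUE,        // 3  "x", 4  1
            Token.END_OBJECT,                     // 5  target }
            Token.FIELD_NAME, Token.VALUE,        // 6  "sibling", 7  2
            Token.END_OBJECT);                    // 8  parent }
    }

    // Returns a stream already advanced to the given token index (capped at end of input).
    static TokenStream at(List<Token> tokens, int index) {
        TokenStream p = new TokenStream(tokens, 1_000);
        int target = Math.min(index, tokens.size());
        while (p.position() < target) p.nextToken();
        return p;
    }

    public static void main(String[] args) {
        TokenStream advanced = at(sample(), 3); // caller advanced into the object first
        consumeObject(advanced);
        System.out.println("advanced first: stops at " + advanced.position()); // 5, target's END_OBJECT

        TokenStream notAdvanced = at(sample(), 2); // caller left the stream on START_OBJECT
        consumeObject(notAdvanced);
        System.out.println("not advanced: stops at " + notAdvanced.position()); // 8, parent's END_OBJECT

        TokenStream eof = at(sample(), sample().size()); // past the end: currentToken() == null
        try {
            consumeObject(eof);
        } catch (IllegalStateException e) {
            System.out.println("at EOF: " + e.getMessage());
        }
    }
}
```

Under the assumed contract, the only difference between the correct run and the corrupted one is whether the caller advanced one token before calling the helper, which is the shape of the one-line parser.nextToken() fix.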

Each periodic _nodes/stats API call spawned a new thread that then got stuck in this loop, so the spinning threads accumulated until CPU usage reached 1098% (roughly eleven fully busy cores).

Test plan

  • All 61 unit tests pass (token boundary, field ordering, multiple nodes, truncated JSON, concurrent parsing)
  • mvn formatter:format && mvn license:format applied
  • Deploy and verify CPU usage normalizes during crawling

marevol added 2 commits on March 22, 2026 at 11:39
Three parse methods introduced in #14 were missing `parser.nextToken()`
before calling `consumeObject()` or sub-parse methods when the current
token was START_OBJECT. This caused `consumeObject()` to consume tokens
beyond its scope (including sibling fields), corrupting the parser state
and eventually leading to an infinite loop at EOF — spinning all eshttp
threads at 100% CPU each.

Fixes:
- Add `parser.nextToken()` in parseSearchBackpressureStats,
  parseTaskCancellationStats, and parseSearchPipelineStats
- Add null-token (EOF) guards in fromXContent, parseNodes,
  parseNodeStats, and consumeObject to throw IOException instead of
  spinning
- Add START_ARRAY handling in consumeObject
- Remove orphan `new ArrayList<>()` in parseNodeStats
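The fixes listed above can be sketched in the same kind of hypothetical token model (again not the OpenSearch XContentParser API; Token, TokenStream, requireToken, skipContainer, skipStatsValue, sample and at are illustrative names). The sub-parser advances into the object before skipping, the skip helpers handle both objects and arrays, and every loop checks for a null token and throws IOException rather than spinning.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical token model (not the OpenSearch XContentParser API), extended with arrays.
class GuardedSkipDemo {
    enum Token { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, FIELD_NAME, VALUE }

    static final class TokenStream {
        private final List<Token> tokens;
        private int pos = -1; // -1: nothing read yet
        TokenStream(List<Token> tokens) { this.tokens = tokens; }
        Token nextToken() {
            if (pos < tokens.size()) pos++;
            return currentToken();
        }
        Token currentToken() { return (pos >= 0 && pos < tokens.size()) ? tokens.get(pos) : null; }
        int position() { return pos; }
    }

    // EOF guard: a null token means the input ended early, so fail instead of looping.
    static Token requireToken(TokenStream p) throws IOException {
        Token t = p.currentToken();
        if (t == null) throw new IOException("unexpected end of input at token " + p.position());
        return t;
    }

    // Positioned on the first token inside an object; leaves the stream on its END_OBJECT.
    static void consumeObject(TokenStream p) throws IOException {
        while (requireToken(p) != Token.END_OBJECT) {
            skipContainer(p);
            p.nextToken();
        }
    }

    // Positioned on the first token inside an array; leaves the stream on its END_ARRAY.
    static void consumeArray(TokenStream p) throws IOException {
        while (requireToken(p) != Token.END_ARRAY) {
            skipContainer(p);
            p.nextToken();
        }
    }

    // If the current token opens an object or array, skip the whole container and stop
    // on its closing token; field names and scalar values are left alone.
    private static void skipContainer(TokenStream p) throws IOException {
        Token t = p.currentToken();
        if (t == Token.START_OBJECT) { p.nextToken(); consumeObject(p); }
        else if (t == Token.START_ARRAY) { p.nextToken(); consumeArray(p); }
    }

    // Hypothetical stats sub-parser, called with the stream on the value's START_OBJECT.
    // The first nextToken() is the step the three buggy methods were missing.
    static int skipStatsValue(TokenStream p) throws IOException {
        if (requireToken(p) != Token.START_OBJECT) throw new IOException("expected START_OBJECT");
        p.nextToken();
        consumeObject(p);
        return p.position();
    }

    // {"target": {"list": [{"a": 1}, 2]}, "sibling": 3}
    static List<Token> sample() {
        return List.of(
            Token.START_OBJECT,                   // 0  parent {
            Token.FIELD_NAME, Token.START_OBJECT, // 1  "target", 2  target {
            Token.FIELD_NAME, Token.START_ARRAY,  // 3  "list", 4  [
            Token.START_OBJECT, Token.FIELD_NAME, // 5  {, 6  "a"
            Token.VALUE, Token.END_OBJECT,        // 7  1, 8  }
            Token.VALUE, Token.END_ARRAY,         // 9  2, 10 ]
            Token.END_OBJECT,                     // 11 target }
            Token.FIELD_NAME, Token.VALUE,        // 12 "sibling", 13 3
            Token.END_OBJECT);                    // 14 parent }
    }

    static TokenStream at(List<Token> tokens, int index) {
        TokenStream p = new TokenStream(tokens);
        int target = Math.min(index, tokens.size());
        while (p.position() < target) p.nextToken();
        return p;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("stops at " + skipStatsValue(at(sample(), 2))); // stops at 11
        try {
            skipStatsValue(at(sample().subList(0, 7), 2)); // input cut off after "a"
        } catch (IOException e) {
            System.out.println("truncated: " + e.getMessage());
        }
    }
}
```

With the input truncated after the "a" field name, requireToken throws instead of letting the loop spin. The recursion keeps the sketch short; an explicit depth counter would avoid deep call stacks on pathologically nested input.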

Tests: add 61 unit tests covering token boundary verification, field
ordering, multiple nodes, partial sub-fields, deeply nested structures,
truncated JSON, and concurrent parsing.
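One way to make a runaway loop show up as a test failure rather than a hung build is to run parses on a thread pool with a deadline. The helper below is a hypothetical sketch (DeadlineHarness and countUnfinished are not from this PR) built on the standard ExecutorService.invokeAll(tasks, timeout, unit), which cancels any task that has not completed when the timeout expires. The three Callables are stand-ins for a truncated-JSON parse that fails fast, a normal parse, and a parse stuck in a loop.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical test helper: runs parse tasks concurrently and counts how many were
// still running when the deadline expired.
class DeadlineHarness {
    static <T> int countUnfinished(List<Callable<T>> tasks, long timeoutMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tasks.size()), r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a task that ignores interruption must not keep the JVM alive
            return t;
        });
        try {
            // invokeAll waits up to the timeout and cancels every task that has not completed.
            List<Future<T>> futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            int unfinished = 0;
            for (Future<T> f : futures) {
                if (f.isCancelled()) unfinished++; // tasks that threw are done, not cancelled
            }
            return unfinished;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a guarded parser on truncated JSON: fails fast with an IOException.
        Callable<String> failsFast = () -> { throw new IOException("unexpected end of input"); };
        // Stand-in for a well-formed parse that completes normally.
        Callable<String> completes = () -> "parsed";
        // Stand-in for the runaway loop; it only stops because it checks the interrupt flag.
        Callable<String> spins = () -> {
            while (!Thread.currentThread().isInterrupted()) Thread.onSpinWait();
            return "interrupted";
        };

        System.out.println(countUnfinished(List.of(failsFast, completes), 1_000));        // 0
        System.out.println(countUnfinished(List.of(failsFast, completes, spins), 1_000)); // 1
    }
}
```

A task that throws still counts as finished, which is the desired outcome for truncated JSON. Cancellation only interrupts a thread; a loop that never checks the interrupt flag keeps spinning, which is why the pool uses daemon threads.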
- Fix fromXContent to call nextToken() when the parser has not read any token yet
  (currentToken is null before the first read), preventing a false EOF
  detection that broke the integration tests
- Change test log level from ALL to INFO in 5 integration test classes
  to reduce output from ~30k lines to ~1k lines for GitHub Actions
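The fromXContent change in the second commit hinges on a null current token meaning two different things: nothing has been read yet, or the input has ended. A minimal sketch of the difference, using a hypothetical TokenStream that mimics that behavior (StartTokenDemo, naiveStart and start are illustrative names, not the project's code): the naive guard reports a false EOF on a fresh parser, while the fixed guard reads one token first and only then treats null as end of input.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical model: currentToken() is null both before the first read and after EOF,
// so a null token alone cannot be treated as EOF.
class StartTokenDemo {
    enum Token { START_OBJECT, END_OBJECT }

    static final class TokenStream {
        private final List<Token> tokens;
        private int pos = -1; // -1: nothing read yet

        TokenStream(List<Token> tokens) { this.tokens = tokens; }

        Token nextToken() {
            if (pos < tokens.size()) pos++;
            return currentToken();
        }

        Token currentToken() {
            return (pos >= 0 && pos < tokens.size()) ? tokens.get(pos) : null;
        }
    }

    // Naive guard: rejects a fresh parser because nothing has been read yet (false EOF).
    static Token naiveStart(TokenStream p) throws IOException {
        Token t = p.currentToken();
        if (t == null) throw new IOException("unexpected EOF");
        return t;
    }

    // Guard in the spirit of the fix: read one token first if the parser is fresh,
    // and only then treat null as a real end of input.
    static Token start(TokenStream p) throws IOException {
        Token t = p.currentToken();
        if (t == null) t = p.nextToken();
        if (t == null) throw new IOException("unexpected EOF: empty input");
        if (t != Token.START_OBJECT) throw new IOException("expected START_OBJECT but got " + t);
        return t;
    }

    public static void main(String[] args) {
        List<Token> json = List.of(Token.START_OBJECT, Token.END_OBJECT); // {}
        try {
            naiveStart(new TokenStream(json));
        } catch (IOException e) {
            System.out.println("naive guard on fresh parser: " + e.getMessage());
        }
        try {
            System.out.println("fixed guard on fresh parser: " + start(new TokenStream(json)));
            TokenStream advanced = new TokenStream(json);
            advanced.nextToken(); // caller already read START_OBJECT
            System.out.println("fixed guard on advanced parser: " + start(advanced));
            start(new TokenStream(List.of()));
        } catch (IOException e) {
            System.out.println("fixed guard on empty input: " + e.getMessage());
        }
    }
}
```

Because the fixed guard only advances when the current token is null, it accepts both a fresh parser and one the caller has already positioned on START_OBJECT, while genuinely empty input still fails with an IOException.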
@marevol merged commit 3b2df40 into main on Mar 22, 2026
1 check passed