
Fix an issue where the AI Assistant was not retaining conversation context between messages, with chat history compaction to manage token budgets.#9737

Open

dpage wants to merge 2 commits into pgadmin-org:master from dpage:chat_context

Conversation

@dpage (Contributor) commented Mar 11, 2026

Summary by CodeRabbit

  • Bug Fixes

    • AI Assistant now retains conversation context between messages, preserving chat awareness across exchanges.
  • New Features

    • Conversation history compaction to optimize token usage while protecting key messages and tool-response pairs.
    • NLQ chat now accepts and returns filtered conversation history and includes conversation IDs in streaming responses.
    • Client UI preserves and manages conversation history, supports clears, and ignores late stream events.
  • Tests

    • Added comprehensive tests for compaction, serialization, filtering, and history-aware chat flows.
  • Documentation

    • Release notes and test guidance updated for history compaction.

Fix an issue where the AI Assistant was not retaining conversation context between messages, with chat history compaction to manage token budgets.
@coderabbitai

coderabbitai bot commented Mar 11, 2026

Walkthrough

Adds conversation-history compaction for the AI Assistant: a new compaction module with token-estimation, message classification, serialization, and filtering; integrates history deserialization/compaction into NLQ chat backend; frontend sends/receives history; tests and release notes added.

Changes

Cohort / File(s) Summary
Release Notes & Test Docs
docs/en_US/release_notes_9_14.rst, web/pgadmin/llm/tests/README.md
Added bug-fix release note for issue #9736 (conversation context retention and history compaction) and documented compaction tests.
Compaction Module
web/pgadmin/llm/compaction.py
New module implementing token estimation, per-message token estimation, message importance classification, tool-call/tool-result pair handling, history compaction algorithm, serialization/deserialization, and conversational filtering.
Compaction Tests
web/pgadmin/llm/tests/test_compaction.py
New comprehensive unit tests for token estimation, compaction behavior (preserve first/recent, drop low-importance, tool-pair integrity), deserialization, and conversational filtering.
NLQ Chat Backend Integration
web/pgadmin/tools/sqleditor/__init__.py
nlq_chat_stream now accepts optional history, deserializes and compacts history before calling chat_with_database, and returns filtered conversational history plus conversation_id in SSE events.
NLQ Chat Frontend
web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
Added conversationHistory state, includes history in POST payloads, updates local history from SSE events, and clears/cancels streams correctly when conversation is cleared.
NLQ Chat Tests
web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py
Added scenario "NLQ Chat - With History" and test adjustments to assert conversation_history is passed to chat_with_database when provided.
Manifest
package.json
Manifest entries in package.json updated as part of the PR.
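The compaction behavior summarized above (token estimation, protecting the first and most recent messages, dropping low-importance middle messages until the history fits the budget) can be sketched minimally. The function names, the 4-characters-per-token heuristic, and the dict-based message shape below are assumptions for illustration, not the actual `web/pgadmin/llm/compaction.py` implementation:

```python
# Illustrative sketch only: protect the first message and a recent
# window, then drop unprotected middle messages (oldest first) until
# the estimated token count fits the budget.

def estimate_tokens(text):
    # Common rough heuristic: ~4 characters per token (assumption).
    return max(1, len(text) // 4)

def compact_history(messages, max_tokens, recent_window=4):
    total = len(messages)
    # Indices that must survive: the first message plus the recent window.
    protected = {0} | set(range(max(1, total - recent_window), total))
    kept = list(range(total))

    def budget(indices):
        return sum(estimate_tokens(messages[i]['content']) for i in indices)

    # Drop unprotected messages, oldest first, until within budget.
    for i in range(1, total):
        if budget(kept) <= max_tokens:
            break
        if i not in protected and i in kept:
            kept.remove(i)
    return [messages[i] for i in kept]
```

The real module additionally classifies messages by importance and keeps tool-call/tool-result pairs together, which this sketch omits.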

Sequence Diagram

sequenceDiagram
    participant Frontend as NLQChatPanel
    participant Backend as nlq_chat_stream
    participant Compaction as compaction.py
    participant LLM as chat_with_database

    Frontend->>Backend: POST (message + optional history)
    activate Backend

    Backend->>Compaction: deserialize_history(history_data)
    activate Compaction
    Compaction-->>Backend: Message[] objects
    deactivate Compaction

    Backend->>Compaction: compact_history(messages, max_tokens)
    activate Compaction
    Note over Compaction: Classify messages by importance<br/>Preserve first & recent messages<br/>Maintain tool-call/result pairs<br/>Drop low-importance messages until within budget
    Compaction-->>Backend: compacted Message[]
    deactivate Compaction

    Backend->>LLM: chat_with_database(conversation_history)
    activate LLM
    LLM-->>Backend: updated_history
    deactivate LLM

    Backend->>Compaction: filter_conversational(updated_history)
    activate Compaction
    Note over Compaction: Remove internal tool messages<br/>Keep user & final assistant messages
    Compaction-->>Backend: filtered Message[]
    deactivate Compaction

    Backend-->>Frontend: SSE event (response + history + conversation_id)
    deactivate Backend

    Frontend->>Frontend: update conversationHistory state
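The final filtering step in the diagram (remove internal tool messages, keep user and final assistant messages) could look like the following sketch; the role strings and dict-based message shape are illustrative assumptions, not the actual pgAdmin `Message` types:

```python
# Illustrative sketch of filter_conversational from the diagram:
# strip internal tool traffic so only the user-visible exchange is
# returned to the client.

def filter_conversational(history):
    visible = []
    for msg in history:
        if msg['role'] == 'tool':
            continue  # tool results are internal
        if msg['role'] == 'assistant' and msg.get('tool_calls'):
            continue  # assistant turns that only request tool calls
        visible.append(msg)
    return visible
```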

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 75.68%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly and accurately summarizes the main change, fixing conversation context retention and adding chat history compaction to manage token budgets.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/pgadmin/llm/compaction.py`:
- Around line 185-212: _find_tool_pair_indices currently only records one
partner index per assistant turn, which loses additional tool-result indices
when a single assistant message has multiple Message.tool_calls; change it to
return (or build) groupings of indices so each index maps to the full set of
indices that belong to that tool-use turn (e.g., map an assistant index and all
matching tool-result indices to the same set/list of indices), update the
function logic to collect all matching j indices for every assistant i (use a
collection like set for tool_call_ids and matched_tool_indices), and then return
a mapping where every index in that group points to the complete group (and
update compact_history to consult the group for protection/removal instead of
expecting a single partner).
- Around line 249-258: The current protected set (protected, protected.add(0),
recent_window) can itself exceed max_tokens and leave candidates empty; add a
fallback after building protected that computes the token count of protected
messages (use the same tokenization logic used elsewhere) and, if
protected_tokens > max_tokens, iteratively relax protections by shrinking the
recent window (e.g., decrement recent_window or remove the highest-index entries
from protected, preserving index 0) until protected_tokens <= max_tokens or only
the first message remains protected; then rebuild candidates and proceed so
compaction can produce a trimmed history even when the initially protected set
is over budget.

In `@web/pgadmin/llm/tests/test_compaction.py`:
- Around line 138-168: The test_drops_low_value fixture accidentally inflates
low-value messages with the shared filler so they no longer classify as
low-importance; update the test so the transient/routine messages (the ones with
contents 'ok', 'thanks', etc.) remain short (remove the appended filler for
those specific Message.user/Message.assistant entries) or alternatively stub the
token estimation used by compact_history/_classify_message to force those
entries to be low-value, then assert that those exact short entries are removed
while the first and recent messages remain.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`:
- Around line 408-412: The Clear handler (handleClearConversation) currently
resets state via setMessages, setConversationId and setConversationHistory but
does not prevent in-flight stream handlers from later re-appending old events;
add a silent-cancel token/guard: store a mutable requestId or AbortController in
component state (e.g., currentRequestToken/currentAbortController), increment or
replace it inside handleClearConversation, and update all
streaming/read/complete event handlers to compare the incoming event's token
against the latest token (or check abortController.signal) and return early if
it doesn’t match, so late "Generation stopped." or "complete" events are ignored
and cannot repopulate messages/conversationId/conversationHistory after Clear.

In `@web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py`:
- Around line 50-65: The test case for "NLQ Chat - With History" doesn't verify
that nlq_chat_stream() passes the provided history to chat_with_database() or
that the SSE output includes the history; update the test to assert that
chat_with_database() was called with a non-empty conversation_history (or
specifically equals the provided history) and/or parse the emitted SSE event
from nlq_chat_stream() to assert the returned payload contains the history field
matching the input; reference the test helper/mock for chat_with_database(), the
nlq_chat_stream() invocation, and the "history" variable in the test to locate
where to add these assertions.
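The grouping change requested for `_find_tool_pair_indices` above could look like this minimal sketch; the dict-based message shape and helper name are illustrative assumptions, not the actual pgAdmin dataclasses:

```python
# Illustrative sketch: instead of recording one partner index per
# assistant turn, map every index in a tool-use turn to the complete
# set of indices in that turn, so multi-tool turns stay intact.

def find_tool_groups(messages):
    groups = {}
    for i, msg in enumerate(messages):
        call_ids = {tc['id'] for tc in msg.get('tool_calls', [])}
        if not call_ids:
            continue
        # Collect every tool-result message answering this turn's calls.
        members = {i}
        for j in range(i + 1, len(messages)):
            results = messages[j].get('tool_results', [])
            if any(tr['tool_call_id'] in call_ids for tr in results):
                members.add(j)
        # Every member points at the whole group, so protecting (or
        # dropping) one index applies to all of them.
        for idx in members:
            groups[idx] = members
    return groups
```

With this shape, the drop loop in `compact_history` can consult `groups[i]` and either keep or remove the entire turn at once.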

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a096758c-1b91-42eb-8f4f-519c35414927

📥 Commits

Reviewing files that changed from the base of the PR and between c8bd75c and c30e633.

📒 Files selected for processing (7)
  • docs/en_US/release_notes_9_14.rst
  • web/pgadmin/llm/compaction.py
  • web/pgadmin/llm/tests/README.md
  • web/pgadmin/llm/tests/test_compaction.py
  • web/pgadmin/tools/sqleditor/__init__.py
  • web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py

@dpage (Contributor, Author) commented Mar 12, 2026

The failed test here appears to be the one fixed in #9723

- Track tool-use turns as groups instead of one-to-one pairs, so
  multi-tool assistant messages don't leave orphaned results.
- Add fallback to shrink the recent window when protected messages
  alone exceed the token budget, preventing compaction no-ops.
- Fix low-value test fixtures to keep transient messages short so
  they actually classify as low-importance.
- Guard Clear button against in-flight stream race conditions by
  adding a clearedRef flag and cancelling active streams.
- Assert that conversation history is actually passed through to
  chat_with_database in the "With History" test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx (1)

409-424: ⚠️ Potential issue | 🟠 Major

Make Clear request-scoped instead of relying on one shared clearedRef.

Line 464 resets clearedRef.current for every new submit. If request A is cleared and request B starts before A finishes unwinding, A's later cleanup can still clobber B's active refs/loading state and potentially reapply stale conversation state. Please gate cleanup and SSE handling with a per-request token (or current controller identity), not a single component-wide boolean.

💡 Suggested direction
+  const requestIdRef = useRef(0);
+
   const handleClearConversation = () => {
+    requestIdRef.current += 1;
     clearedRef.current = true;
     ...
   };

   const handleSubmit = async () => {
+    const requestId = ++requestIdRef.current;
     stoppedRef.current = false;
     clearedRef.current = false;
     ...
-              handleSSEEvent(data, thinkingId);
+              if (requestIdRef.current === requestId) {
+                handleSSEEvent(data, thinkingId);
+              }
     ...
-    } finally {
-      setIsLoading(false);
-      setThinkingMessageId(null);
-    }
+    } finally {
+      if (requestIdRef.current === requestId) {
+        setIsLoading(false);
+        setThinkingMessageId(null);
+      }
+    }
   };

Also applies to: 462-464, 579-582

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`
around lines 409 - 424, The component-wide clearedRef flow causes race
conditions; replace it with a request-scoped identity (e.g., a requestId or
attach an id to the AbortController) and use that id to gate cleanup and SSE
handlers instead of clearedRef.current. On submit create a new id (or use
abortControllerRef.current) and store it as the “currentRequestId” on a ref;
when starting/handling streams (and in any async callbacks) compare the
callback’s captured requestId to the live currentRequestId before mutating
readerRef, abortControllerRef, setMessages, setIsLoading, setConversationId, or
setConversationHistory. Modify handleClearConversation to cancel/abort the
specific controller and advance/clear the currentRequestId so only that
request’s callbacks will run cleanup, and ensure all SSE/stream handlers check
the request-scoped id before applying state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/pgadmin/llm/compaction.py`:
- Around line 256-279: The recent-window protection logic builds the protected
set (protected, recent_window, recent_start) but can split tool-use groups
leaving orphaned Role.TOOL or assistant tool-call messages; before computing
protected_tokens (and before the while budget loop) expand protected to include
entire tool-use groups or mark any group that intersects protected as
undroppable so all partner messages are included; specifically, after building
protected (and similarly in the other similar blocks around the code handling
indices 311-318 and 333-339), iterate the message group boundaries (using
messages, their roles and any tool-call markers used by your grouping logic) to
union in all indices of any group that intersects protected, then compute
protected_tokens via estimate_message_tokens(provider) on the expanded set so
the budgeting loop never leaves an orphaned tool message.
- Around line 349-400: deserialize_history currently assumes history_data and
nested items are dicts and calls .get() directly, which will crash if any item
is None or not a dict; update deserialize_history to validate shapes before
accessing keys: ensure history_data is an iterable of dicts (skip non-dict
items), treat item.get('tool_calls') and item.get('tool_results') as
(item.get(...) or []) and iterate only over entries that are dicts, and check
each tc_data/tr_data is a dict before constructing ToolCall/ToolResult; also
validate role_str is a string before passing to Role(role_str) and skip
malformed entries so Message, Role, ToolCall, ToolResult creation only occurs
for well-formed dicts.

In `@web/pgadmin/llm/tests/test_compaction.py`:
- Around line 133-136: The test only compares overlapping elements so it can
pass if compact_history() returns fewer than four messages; update the assertion
to ensure the full four-message tail is preserved by first asserting
len(result[-4:]) == 4 (or len(result) >= 4) and then asserting the two slices
are equal (compare last_4_original == last_4_result) so the test fails when any
protected recent message is missing; locate this change around compact_history()
usage and the variables last_4_original / last_4_result in test_compaction.py.

---

Duplicate comments:
In `@web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx`:
- Around line 409-424: The component-wide clearedRef flow causes race
conditions; replace it with a request-scoped identity (e.g., a requestId or
attach an id to the AbortController) and use that id to gate cleanup and SSE
handlers instead of clearedRef.current. On submit create a new id (or use
abortControllerRef.current) and store it as the “currentRequestId” on a ref;
when starting/handling streams (and in any async callbacks) compare the
callback’s captured requestId to the live currentRequestId before mutating
readerRef, abortControllerRef, setMessages, setIsLoading, setConversationId, or
setConversationHistory. Modify handleClearConversation to cancel/abort the
specific controller and advance/clear the currentRequestId so only that
request’s callbacks will run cleanup, and ensure all SSE/stream handlers check
the request-scoped id before applying state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 49df7280-efd9-4ef9-9139-880143b82c2a

📥 Commits

Reviewing files that changed from the base of the PR and between c30e633 and 11ea048.

📒 Files selected for processing (4)
  • web/pgadmin/llm/compaction.py
  • web/pgadmin/llm/tests/test_compaction.py
  • web/pgadmin/tools/sqleditor/static/js/components/sections/NLQChatPanel.jsx
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • web/pgadmin/tools/sqleditor/tests/test_nlq_chat.py

Comment on lines +256 to +279
    # Determine protected indices
    protected = set()

    # Always protect the first message
    protected.add(0)

    # Always protect the recent window
    recent_start = max(1, total - recent_window)
    for i in range(recent_start, total):
        protected.add(i)

    # If protected messages alone exceed the budget, shrink the
    # recent window until we have room for compaction candidates.
    while recent_window > 0:
        protected_tokens = sum(
            estimate_message_tokens(messages[i], provider)
            for i in protected
        )
        if protected_tokens <= max_tokens:
            break
        recent_window -= 1
        recent_start = max(1, total - recent_window)
        protected = {0} | set(range(recent_start, total))


⚠️ Potential issue | 🟠 Major

A protected recent window can still split a tool-use turn.

If one member of a tool group lands in protected and another does not, the drop loop can still remove the unprotected side because it only skips protected partners. That leaves an orphaned assistant tool-call or Role.TOOL message in the compacted history. Expand protected to whole groups before budgeting, or treat any group intersecting protected as undroppable.

Also applies to: 311-318, 333-339

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/compaction.py` around lines 256 - 279, The recent-window
protection logic builds the protected set (protected, recent_window,
recent_start) but can split tool-use groups leaving orphaned Role.TOOL or
assistant tool-call messages; before computing protected_tokens (and before the
while budget loop) expand protected to include entire tool-use groups or mark
any group that intersects protected as undroppable so all partner messages are
included; specifically, after building protected (and similarly in the other
similar blocks around the code handling indices 311-318 and 333-339), iterate
the message group boundaries (using messages, their roles and any tool-call
markers used by your grouping logic) to union in all indices of any group that
intersects protected, then compute protected_tokens via
estimate_message_tokens(provider) on the expanded set so the budgeting loop
never leaves an orphaned tool message.
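The expansion step described in the comment above (widen the protected set so any tool-use group that touches it is included whole) could be sketched like this; `groups` here is an assumed index-to-group mapping, not the module's actual data structure:

```python
# Illustrative sketch: before budgeting, union every tool-use group
# that intersects the protected set into it, so no assistant tool-call
# or tool-result message is left orphaned by the drop loop.

def expand_protected(protected, groups):
    expanded = set(protected)
    for idx in protected:
        expanded |= groups.get(idx, set())
    return expanded
```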

Comment on lines +349 to +400
def deserialize_history(
    history_data: list[dict]
) -> list[Message]:
    """Deserialize conversation history from JSON request data.

    Converts a list of message dictionaries (from the frontend) into
    Message objects suitable for passing to chat_with_database().

    Args:
        history_data: List of dicts with 'role' and 'content' keys,
            and optionally 'tool_calls' and 'tool_results'.

    Returns:
        List of Message objects.
    """
    messages = []
    for item in history_data:
        role_str = item.get('role', '')
        content = item.get('content', '')

        try:
            role = Role(role_str)
        except ValueError:
            continue  # Skip unknown roles

        # Reconstruct tool calls if present
        tool_calls = []
        for tc_data in item.get('tool_calls', []):
            tool_calls.append(ToolCall(
                id=tc_data.get('id', ''),
                name=tc_data.get('name', ''),
                arguments=tc_data.get('arguments', {})
            ))

        # Reconstruct tool results if present
        from pgadmin.llm.models import ToolResult
        tool_results = []
        for tr_data in item.get('tool_results', []):
            tool_results.append(ToolResult(
                tool_call_id=tr_data.get('tool_call_id', ''),
                content=tr_data.get('content', ''),
                is_error=tr_data.get('is_error', False)
            ))

        messages.append(Message(
            role=role,
            content=content,
            tool_calls=tool_calls,
            tool_results=tool_results
        ))

    return messages

⚠️ Potential issue | 🟡 Minor

Validate deserialized history before calling .get().

This helper consumes request JSON, but it assumes every history, tool_calls, and tool_results entry is a dict. A single null or string value will raise here and fail the whole turn instead of just ignoring the bad element. Guard the top-level and nested items with shape checks, and use or [] for nullable lists.

💡 Suggested fix
 def deserialize_history(
     history_data: list[dict]
 ) -> list[Message]:
@@
-    messages = []
+    if not isinstance(history_data, list):
+        return []
+
+    messages = []
     for item in history_data:
+        if not isinstance(item, dict):
+            continue
+
         role_str = item.get('role', '')
         content = item.get('content', '')
@@
-        for tc_data in item.get('tool_calls', []):
+        for tc_data in item.get('tool_calls') or []:
+            if not isinstance(tc_data, dict):
+                continue
             tool_calls.append(ToolCall(
                 id=tc_data.get('id', ''),
                 name=tc_data.get('name', ''),
                 arguments=tc_data.get('arguments', {})
             ))
@@
-        for tr_data in item.get('tool_results', []):
+        for tr_data in item.get('tool_results') or []:
+            if not isinstance(tr_data, dict):
+                continue
             tool_results.append(ToolResult(
                 tool_call_id=tr_data.get('tool_call_id', ''),
                 content=tr_data.get('content', ''),
                 is_error=tr_data.get('is_error', False)
             ))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/compaction.py` around lines 349 - 400, deserialize_history
currently assumes history_data and nested items are dicts and calls .get()
directly, which will crash if any item is None or not a dict; update
deserialize_history to validate shapes before accessing keys: ensure
history_data is an iterable of dicts (skip non-dict items), treat
item.get('tool_calls') and item.get('tool_results') as (item.get(...) or []) and
iterate only over entries that are dicts, and check each tc_data/tr_data is a
dict before constructing ToolCall/ToolResult; also validate role_str is a string
before passing to Role(role_str) and skip malformed entries so Message, Role,
ToolCall, ToolResult creation only occurs for well-formed dicts.

Comment on lines +133 to +136
        last_4_original = messages[-4:]
        last_4_result = result[-4:]
        for orig, res in zip(last_4_original, last_4_result):
            self.assertEqual(orig.content, res.content)

⚠️ Potential issue | 🟡 Minor

Assert the full protected tail, not just the overlapping prefix.

This comparison still passes when compact_history() returns fewer than four trailing messages, because the loop only checks as many elements as result[-4:] happens to contain. Compare the two four-message slices directly, or assert the tail length first, so the test fails when any protected recent message disappears.

💡 Suggested fix
-        last_4_original = messages[-4:]
-        last_4_result = result[-4:]
-        for orig, res in zip(last_4_original, last_4_result):
-            self.assertEqual(orig.content, res.content)
+        self.assertGreaterEqual(len(result), 5)
+        self.assertEqual(
+            [m.content for m in result[-4:]],
+            [m.content for m in messages[-4:]],
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        last_4_original = messages[-4:]
-        last_4_result = result[-4:]
-        for orig, res in zip(last_4_original, last_4_result):
-            self.assertEqual(orig.content, res.content)
+        last_4_original = messages[-4:]
+        self.assertGreaterEqual(len(result), 5)
+        self.assertEqual(
+            [m.content for m in result[-4:]],
+            [m.content for m in messages[-4:]],
+        )
🧰 Tools
🪛 Ruff (0.15.5)

[warning] 135-135: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/tests/test_compaction.py` around lines 133 - 136, The test
only compares overlapping elements so it can pass if compact_history() returns
fewer than four messages; update the assertion to ensure the full four-message
tail is preserved by first asserting len(result[-4:]) == 4 (or len(result) >= 4)
and then asserting the two slices are equal (compare last_4_original ==
last_4_result) so the test fails when any protected recent message is missing;
locate this change around compact_history() usage and the variables
last_4_original / last_4_result in test_compaction.py.
