Skip to content

test(agent): add agent loop conversion test suite#1028

Open
BloggerBust wants to merge 7 commits intotrycua:mainfrom
BloggerBust:tests/issue-944
Open

test(agent): add agent loop conversion test suite#1028
BloggerBust wants to merge 7 commits intotrycua:mainfrom
BloggerBust:tests/issue-944

Conversation

@BloggerBust
Copy link

@BloggerBust BloggerBust commented Feb 6, 2026

  • Add unit tests for Anthropic, shared message conversion utilities, and UITARS
  • Drop user messages that contain only filtered content to avoid invalid API payloads
  • Guard computer_call_output handling for non dict outputs
  • Narrow agent Python version constraint and add optional test dependencies

Closes #944

Summary by CodeRabbit

  • Bug Fixes

    • Fixed message handling to properly drop empty filtered image content
    • Enhanced type validation for tool outputs
  • Tests

    • Added comprehensive test coverage for message conversion workflows
    • Added extensive tests for anthropic and UITARS integration
  • Chores

    • Updated test dependencies and configuration

@vercel
Copy link
Contributor

vercel bot commented Feb 6, 2026

@BloggerBust is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 6, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a5602a7a-d179-4fbc-a300-2b2bab024bc5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This pull request adds comprehensive unit test coverage for agent loop implementations (Anthropic, message conversion utilities, and UITARS) while applying targeted bug fixes to message handling logic, including defensive type checking in the Anthropic loop message conversion.

Changes

Cohort / File(s) Summary
Bug Fixes in Anthropic Loop
libs/python/agent/agent/loops/anthropic.py
Added isinstance guard for dict type checking before accessing output properties; modified logic to only append "user" messages with non-empty converted_content, effectively dropping messages with filtered-only images.
Test Infrastructure Setup
libs/python/agent/pyproject.toml, pyproject.toml
Added test optional dependency group with pytest, pytest-asyncio, and pytest-mock; simplified root-level test dependencies by consolidating pytest configuration.
Agent Loop Unit Tests
libs/python/agent/tests/test_anthropic.py, libs/python/agent/tests/test_message_conversion.py, libs/python/agent/tests/test_uitars.py
Added three new comprehensive test modules covering message conversion, formatting, merging behavior, edge cases, and action serialization across Anthropic, generic message conversion utilities, and UITARS agent loops.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop along through tests galore,
Message conversions to the core!
Anthropic, UITARS, merged with care,
Empty messages? They won't be there!
Defensive checks and edge cases squared,
Agent loops are now declared prepared!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title 'test(agent): add agent loop conversion test suite' is clear, concise, and directly describes the main change—adding comprehensive test coverage for agent loop conversion utilities.
Linked Issues check ✅ Passed PR implements all coding-related requirements from issue #944: comprehensive unit tests for test_anthropic.py, test_message_conversion.py, and test_uitars.py covering message formatting, merging, edge cases, and serialization with pytest.
Out of Scope Changes check ✅ Passed All changes align with issue #944 objectives: new test modules, test dependency updates in pyproject.toml, and targeted fixes to anthropic.py (dropping filtered messages, guarding dict checks) directly supporting test validation needs.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
libs/python/agent/pyproject.toml (1)

30-34: Pin minimum versions for test dependencies.

pytest, pytest-asyncio, and pytest-mock are unpinned. This can cause unexpected breakage when major new releases introduce incompatible defaults (e.g., pytest-asyncio's default async mode changed between versions).

🔧 Suggested minimum version pins
 test = [
-  "pytest",
-  "pytest-asyncio",
-  "pytest-mock",
+  "pytest>=8.0",
+  "pytest-asyncio>=0.23",
+  "pytest-mock>=3.12",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/pyproject.toml` around lines 30 - 34, The test dependencies
in pyproject.toml (the "test" list containing "pytest", "pytest-asyncio", and
"pytest-mock") are unpinned; update those entries to include safe minimum
version pins (for example "pytest>=X.Y", "pytest-asyncio>=A.B",
"pytest-mock>=C.D") to avoid breakage from incompatible major releases—edit the
"test" array to replace the plain package names with pinned minimum versions and
run a quick install/test to verify compatibility.
libs/python/agent/tests/test_message_conversion.py (1)

199-224: Consider whether the placeholder text assertion is too implementation-specific.

Line 222 hard-codes "[Execution completed" — the verbatim placeholder string emitted by convert_responses_items_to_completion_messages when allow_images_in_tool_results=False. This is fine as a behavioral regression guard, but it will fail if the placeholder text ever changes. A looser check like assert result[1]["role"] == "tool" plus assert result[1]["content"] (non-empty) would make the test more resilient to message copy changes while still verifying structure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/tests/test_message_conversion.py` around lines 199 - 224,
The test asserts a specific placeholder string "[Execution completed" in
convert_responses_items_to_completion_messages output which is brittle; change
the assertion to verify the tool message exists and has non-empty content
instead of exact text. In the
test_computer_call_output_with_image_separate_user_message case, keep the checks
that result[1]["role"] == "tool" and replace the exact placeholder assertion
with something like asserting result[1]["content"] is truthy/non-empty (and
optionally contains expected structural hints), while retaining the user image
assertions for result[2].
libs/python/agent/tests/test_uitars.py (1)

249-270: Strengthen drag assertion to verify coordinates.

assert "drag" in text passes even if the coordinates are wrong. The implementation formats start_box and end_box explicitly, so the assertion can be more precise.

🔧 Suggested improvement
 assert len(result) == 1
 text = result[0]["content"][0]["text"]
-assert "drag" in text
+assert "drag(start_box='(100,150)', end_box='(200,250)')" in text
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/tests/test_uitars.py` around lines 249 - 270, Update the
test_drag_action_format to assert the exact formatted coordinates rather than
only checking for the word "drag": in the test function
(test_drag_action_format) use the output from convert_uitars_messages_to_litellm
(result -> text) to assert that the expected "start_box" and "end_box"
substrings with the specific numbers (e.g., start_x:100 start_y:150 and
end_x:200 end_y:250) appear in text, ensuring the formatted coordinate pairs
produced by the implementation are present and correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/python/agent/pyproject.toml`:
- Line 27: The pyproject change tightened requires-python to ">=3.12,<3.13"
which unintentionally excludes Python 3.13 (and 3.14); revert or document
intentionally. Either restore the original range to ">=3.12,<3.14" in
libs/python/agent/pyproject.toml (the requires-python field) to allow 3.12 and
3.13, or if exclusion of 3.13 is intentional, add a short comment in the
repo/docs and a justification in the pyproject or a README stating which
dependency or incompatibility forces the upper bound to <3.13.

---

Nitpick comments:
In `@libs/python/agent/pyproject.toml`:
- Around line 30-34: The test dependencies in pyproject.toml (the "test" list
containing "pytest", "pytest-asyncio", and "pytest-mock") are unpinned; update
those entries to include safe minimum version pins (for example "pytest>=X.Y",
"pytest-asyncio>=A.B", "pytest-mock>=C.D") to avoid breakage from incompatible
major releases—edit the "test" array to replace the plain package names with
pinned minimum versions and run a quick install/test to verify compatibility.

In `@libs/python/agent/tests/test_message_conversion.py`:
- Around line 199-224: The test asserts a specific placeholder string
"[Execution completed" in convert_responses_items_to_completion_messages output
which is brittle; change the assertion to verify the tool message exists and has
non-empty content instead of exact text. In the
test_computer_call_output_with_image_separate_user_message case, keep the checks
that result[1]["role"] == "tool" and replace the exact placeholder assertion
with something like asserting result[1]["content"] is truthy/non-empty (and
optionally contains expected structural hints), while retaining the user image
assertions for result[2].

In `@libs/python/agent/tests/test_uitars.py`:
- Around line 249-270: Update the test_drag_action_format to assert the exact
formatted coordinates rather than only checking for the word "drag": in the test
function (test_drag_action_format) use the output from
convert_uitars_messages_to_litellm (result -> text) to assert that the expected
"start_box" and "end_box" substrings with the specific numbers (e.g.,
start_x:100 start_y:150 and end_x:200 end_y:250) appear in text, ensuring the
formatted coordinate pairs produced by the implementation are present and
correct.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0fdfa3a and 5adf1c2.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • libs/python/agent/agent/loops/anthropic.py
  • libs/python/agent/pyproject.toml
  • libs/python/agent/tests/test_anthropic.py
  • libs/python/agent/tests/test_message_conversion.py
  • libs/python/agent/tests/test_uitars.py

@BloggerBust
Copy link
Author

Just checking in! I am happy to revise if needed or adjust scope. Let me know if there is anything blocking review.

- Add unit tests for Anthropic, shared message conversion utilities, and UITARS
- Drop user messages that contain only filtered content to avoid invalid API payloads
- Guard computer_call_output handling for non dict outputs
- Narrow agent Python version constraint and add optional test dependencies

Closes trycua#944
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
libs/python/agent/pyproject.toml (1)

30-34: Consider pinning minimum versions for test dependencies.

The root pyproject.toml pins minimum versions for test dependencies (e.g., pytest>=8.0.0), but this package uses bare names. For consistency and reproducibility, consider adding minimum version constraints.

♻️ Suggested version pins
 test = [
-  "pytest",
-  "pytest-asyncio",
-  "pytest-mock",
+  "pytest>=8.0.0",
+  "pytest-asyncio>=0.21.1",
+  "pytest-mock>=3.10.0",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/pyproject.toml` around lines 30 - 34, Update the test
extras in pyproject.toml to pin minimum versions for the test dependencies;
replace the bare names "pytest", "pytest-asyncio", and "pytest-mock" with
version-constrained entries like "pytest>=8.0.0", "pytest-asyncio>=0.20.0",
"pytest-mock>=3.10.0" (or other chosen minima) so the package matches the root
project's reproducibility policy and avoids pulling unbounded versions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Around line 55-59: The test dependency list in pyproject.toml is missing
pytest-cov which causes pytest to fail when CI runs with --cov flags; update the
test extras list (the "test" array) to include "pytest-cov>=X.Y.Z" (pick a
compatible minimum, e.g., "pytest-cov>=4.0.0") so the pytest invocation in CI
recognizes the --cov and --cov-report options; modify the same "test" dependency
block shown in the diff to add the pytest-cov entry.

---

Nitpick comments:
In `@libs/python/agent/pyproject.toml`:
- Around line 30-34: Update the test extras in pyproject.toml to pin minimum
versions for the test dependencies; replace the bare names "pytest",
"pytest-asyncio", and "pytest-mock" with version-constrained entries like
"pytest>=8.0.0", "pytest-asyncio>=0.20.0", "pytest-mock>=3.10.0" (or other
chosen minima) so the package matches the root project's reproducibility policy
and avoids pulling unbounded versions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4daccd0f-71cc-4a98-be9c-bad3fafb96ba

📥 Commits

Reviewing files that changed from the base of the PR and between 65f9701 and d98fa2d.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • libs/python/agent/agent/loops/anthropic.py
  • libs/python/agent/pyproject.toml
  • libs/python/agent/tests/test_anthropic.py
  • libs/python/agent/tests/test_message_conversion.py
  • libs/python/agent/tests/test_uitars.py
  • pyproject.toml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhancement: Add Unit Tests for Agent Loop Implementations

1 participant