test(agent): add agent loop conversion test suite by BloggerBust · Pull Request #1028 · trycua/cua

BloggerBust · 2026-02-06T00:53:58Z

Add unit tests for Anthropic, shared message conversion utilities, and UITARS
Drop user messages that contain only filtered content to avoid invalid API payloads
Guard computer_call_output handling for non dict outputs
Narrow agent Python version constraint and add optional test dependencies

Closes #944

Summary by CodeRabbit

Bug Fixes
- Fixed message handling to properly drop empty filtered image content
- Enhanced type validation for tool outputs
Tests
- Added comprehensive test coverage for message conversion workflows
- Added extensive tests for anthropic and UITARS integration
Chores
- Updated test dependencies and configuration

vercel · 2026-02-06T00:54:03Z

@BloggerBust is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-02-06T00:54:08Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a5602a7a-d179-4fbc-a300-2b2bab024bc5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This pull request adds comprehensive unit test coverage for agent loop implementations (Anthropic, message conversion utilities, and UITARS) while applying targeted bug fixes to message handling logic, including defensive type checking in the Anthropic loop message conversion.

Changes

Cohort / File(s)	Summary
Bug Fixes in Anthropic Loop `libs/python/agent/agent/loops/anthropic.py`	Added isinstance guard for dict type checking before accessing output properties; modified logic to only append "user" messages with non-empty converted_content, effectively dropping messages with filtered-only images.
Test Infrastructure Setup `libs/python/agent/pyproject.toml`, `pyproject.toml`	Added test optional dependency group with pytest, pytest-asyncio, and pytest-mock; simplified root-level test dependencies by consolidating pytest configuration.
Agent Loop Unit Tests `libs/python/agent/tests/test_anthropic.py`, `libs/python/agent/tests/test_message_conversion.py`, `libs/python/agent/tests/test_uitars.py`	Added three new comprehensive test modules covering message conversion, formatting, merging behavior, edge cases, and action serialization across Anthropic, generic message conversion utilities, and UITARS agent loops.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop along through tests galore,
Message conversions to the core!
Anthropic, UITARS, merged with care,
Empty messages? They won't be there!
Defensive checks and edge cases squared,
Agent loops are now declared prepared! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	Title 'test(agent): add agent loop conversion test suite' is clear, concise, and directly describes the main change—adding comprehensive test coverage for agent loop conversion utilities.
Linked Issues check	✅ Passed	PR implements all coding-related requirements from issue `#944`: comprehensive unit tests for test_anthropic.py, test_message_conversion.py, and test_uitars.py covering message formatting, merging, edge cases, and serialization with pytest.
Out of Scope Changes check	✅ Passed	All changes align with issue `#944` objectives: new test modules, test dependency updates in pyproject.toml, and targeted fixes to anthropic.py (dropping filtered messages, guarding dict checks) directly supporting test validation needs.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

libs/python/agent/pyproject.toml (1)
30-34: Pin minimum versions for test dependencies.

pytest, pytest-asyncio, and pytest-mock are unpinned. This can cause unexpected breakage when major new releases introduce incompatible defaults (e.g., pytest-asyncio's default async mode changed between versions).
🔧 Suggested minimum version pins
 test = [
-  "pytest",
-  "pytest-asyncio",
-  "pytest-mock",
+  "pytest>=8.0",
+  "pytest-asyncio>=0.23",
+  "pytest-mock>=3.12",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/pyproject.toml` around lines 30 - 34, The test dependencies
in pyproject.toml (the "test" list containing "pytest", "pytest-asyncio", and
"pytest-mock") are unpinned; update those entries to include safe minimum
version pins (for example "pytest>=X.Y", "pytest-asyncio>=A.B",
"pytest-mock>=C.D") to avoid breakage from incompatible major releases—edit the
"test" array to replace the plain package names with pinned minimum versions and
run a quick install/test to verify compatibility.
libs/python/agent/tests/test_message_conversion.py (1)
199-224: Consider whether the placeholder text assertion is too implementation-specific.

Line 222 hard-codes "[Execution completed" — the verbatim placeholder string emitted by convert_responses_items_to_completion_messages when allow_images_in_tool_results=False. This is fine as a behavioral regression guard, but it will fail if the placeholder text ever changes. A looser check like assert result[1]["role"] == "tool" plus assert result[1]["content"] (non-empty) would make the test more resilient to message copy changes while still verifying structure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/tests/test_message_conversion.py` around lines 199 - 224,
The test asserts a specific placeholder string "[Execution completed" in
convert_responses_items_to_completion_messages output which is brittle; change
the assertion to verify the tool message exists and has non-empty content
instead of exact text. In the
test_computer_call_output_with_image_separate_user_message case, keep the checks
that result[1]["role"] == "tool" and replace the exact placeholder assertion
with something like asserting result[1]["content"] is truthy/non-empty (and
optionally contains expected structural hints), while retaining the user image
assertions for result[2].
libs/python/agent/tests/test_uitars.py (1)
249-270: Strengthen drag assertion to verify coordinates.

assert "drag" in text passes even if the coordinates are wrong. The implementation formats start_box and end_box explicitly, so the assertion can be more precise.
🔧 Suggested improvement
 assert len(result) == 1
 text = result[0]["content"][0]["text"]
-assert "drag" in text
+assert "drag(start_box='(100,150)', end_box='(200,250)')" in text
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/tests/test_uitars.py` around lines 249 - 270, Update the
test_drag_action_format to assert the exact formatted coordinates rather than
only checking for the word "drag": in the test function
(test_drag_action_format) use the output from convert_uitars_messages_to_litellm
(result -> text) to assert that the expected "start_box" and "end_box"
substrings with the specific numbers (e.g., start_x:100 start_y:150 and
end_x:200 end_y:250) appear in text, ensuring the formatted coordinate pairs
produced by the implementation are present and correct.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/python/agent/pyproject.toml`:
- Line 27: The pyproject change tightened requires-python to ">=3.12,<3.13"
which unintentionally excludes Python 3.13 (and 3.14); revert or document
intentionally. Either restore the original range to ">=3.12,<3.14" in
libs/python/agent/pyproject.toml (the requires-python field) to allow 3.12 and
3.13, or if exclusion of 3.13 is intentional, add a short comment in the
repo/docs and a justification in the pyproject or a README stating which
dependency or incompatibility forces the upper bound to <3.13.

---

Nitpick comments:
In `@libs/python/agent/pyproject.toml`:
- Around line 30-34: The test dependencies in pyproject.toml (the "test" list
containing "pytest", "pytest-asyncio", and "pytest-mock") are unpinned; update
those entries to include safe minimum version pins (for example "pytest>=X.Y",
"pytest-asyncio>=A.B", "pytest-mock>=C.D") to avoid breakage from incompatible
major releases—edit the "test" array to replace the plain package names with
pinned minimum versions and run a quick install/test to verify compatibility.

In `@libs/python/agent/tests/test_message_conversion.py`:
- Around line 199-224: The test asserts a specific placeholder string
"[Execution completed" in convert_responses_items_to_completion_messages output
which is brittle; change the assertion to verify the tool message exists and has
non-empty content instead of exact text. In the
test_computer_call_output_with_image_separate_user_message case, keep the checks
that result[1]["role"] == "tool" and replace the exact placeholder assertion
with something like asserting result[1]["content"] is truthy/non-empty (and
optionally contains expected structural hints), while retaining the user image
assertions for result[2].

In `@libs/python/agent/tests/test_uitars.py`:
- Around line 249-270: Update the test_drag_action_format to assert the exact
formatted coordinates rather than only checking for the word "drag": in the test
function (test_drag_action_format) use the output from
convert_uitars_messages_to_litellm (result -> text) to assert that the expected
"start_box" and "end_box" substrings with the specific numbers (e.g.,
start_x:100 start_y:150 and end_x:200 end_y:250) appear in text, ensuring the
formatted coordinate pairs produced by the implementation are present and
correct.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0fdfa3a and 5adf1c2.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (5)

libs/python/agent/agent/loops/anthropic.py
libs/python/agent/pyproject.toml
libs/python/agent/tests/test_anthropic.py
libs/python/agent/tests/test_message_conversion.py
libs/python/agent/tests/test_uitars.py

libs/python/agent/pyproject.toml

BloggerBust · 2026-03-03T17:49:11Z

Just checking in! I am happy to revise if needed or adjust scope. Let me know if there is anything blocking review.

- Add unit tests for Anthropic, shared message conversion utilities, and UITARS - Drop user messages that contain only filtered content to avoid invalid API payloads - Guard computer_call_output handling for non dict outputs - Narrow agent Python version constraint and add optional test dependencies Closes trycua#944

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

libs/python/agent/pyproject.toml (1)
30-34: Consider pinning minimum versions for test dependencies.

The root pyproject.toml pins minimum versions for test dependencies (e.g., pytest>=8.0.0), but this package uses bare names. For consistency and reproducibility, consider adding minimum version constraints.
♻️ Suggested version pins
 test = [
-  "pytest",
-  "pytest-asyncio",
-  "pytest-mock",
+  "pytest>=8.0.0",
+  "pytest-asyncio>=0.21.1",
+  "pytest-mock>=3.10.0",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/agent/pyproject.toml` around lines 30 - 34, Update the test
extras in pyproject.toml to pin minimum versions for the test dependencies;
replace the bare names "pytest", "pytest-asyncio", and "pytest-mock" with
version-constrained entries like "pytest>=8.0.0", "pytest-asyncio>=0.20.0",
"pytest-mock>=3.10.0" (or other chosen minima) so the package matches the root
project's reproducibility policy and avoids pulling unbounded versions.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Around line 55-59: The test dependency list in pyproject.toml is missing
pytest-cov which causes pytest to fail when CI runs with --cov flags; update the
test extras list (the "test" array) to include "pytest-cov>=X.Y.Z" (pick a
compatible minimum, e.g., "pytest-cov>=4.0.0") so the pytest invocation in CI
recognizes the --cov and --cov-report options; modify the same "test" dependency
block shown in the diff to add the pytest-cov entry.

---

Nitpick comments:
In `@libs/python/agent/pyproject.toml`:
- Around line 30-34: Update the test extras in pyproject.toml to pin minimum
versions for the test dependencies; replace the bare names "pytest",
"pytest-asyncio", and "pytest-mock" with version-constrained entries like
"pytest>=8.0.0", "pytest-asyncio>=0.20.0", "pytest-mock>=3.10.0" (or other
chosen minima) so the package matches the root project's reproducibility policy
and avoids pulling unbounded versions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4daccd0f-71cc-4a98-be9c-bad3fafb96ba

📥 Commits

Reviewing files that changed from the base of the PR and between 65f9701 and d98fa2d.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (6)

libs/python/agent/agent/loops/anthropic.py
libs/python/agent/pyproject.toml
libs/python/agent/tests/test_anthropic.py
libs/python/agent/tests/test_message_conversion.py
libs/python/agent/tests/test_uitars.py
pyproject.toml

pyproject.toml

BloggerBust force-pushed the tests/issue-944 branch from 2ec5be6 to 5adf1c2 Compare February 6, 2026 03:51

coderabbitai bot reviewed Feb 23, 2026

View reviewed changes

libs/python/agent/pyproject.toml Outdated Show resolved Hide resolved

BloggerBust force-pushed the tests/issue-944 branch from 5adf1c2 to d1c5e93 Compare February 24, 2026 16:46

BloggerBust added 5 commits March 8, 2026 21:11

fix(agent): restore Python support range to <3.14

82faf79

chore(agent): pin test extra minimum versions

a62689a

test(agent): assert drag start/end coordinates in UITARS conversion

ce2b02b

test(agent): adapt anthropic loop tests to tuple return contract

d98fa2d

BloggerBust force-pushed the tests/issue-944 branch from 5cbe49f to d98fa2d Compare March 9, 2026 04:23

coderabbitai bot reviewed Mar 9, 2026

View reviewed changes

pyproject.toml Show resolved Hide resolved

BloggerBust added 2 commits March 8, 2026 22:32

chore(deps): update uv.lock

0de8e2b

fix(agent): restore test dependencies

3e4994f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(agent): add agent loop conversion test suite#1028

test(agent): add agent loop conversion test suite#1028
BloggerBust wants to merge 7 commits intotrycua:mainfrom
BloggerBust:tests/issue-944

BloggerBust commented Feb 6, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

vercel bot commented Feb 6, 2026

Uh oh!

coderabbitai bot commented Feb 6, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

BloggerBust commented Mar 3, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

BloggerBust commented Feb 6, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

vercel bot commented Feb 6, 2026

Uh oh!

coderabbitai bot commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

BloggerBust commented Mar 3, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

BloggerBust commented Feb 6, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 6, 2026 •

edited

Loading