test(agent): add agent loop conversion test suite#1028
test(agent): add agent loop conversion test suite#1028BloggerBust wants to merge 7 commits intotrycua:mainfrom
Conversation
|
@BloggerBust is attempting to deploy a commit to the Cua Team on Vercel. A member of the Team first needs to authorize it. |
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis pull request adds comprehensive unit test coverage for agent loop implementations (Anthropic, message conversion utilities, and UITARS) while applying targeted bug fixes to message handling logic, including defensive type checking in the Anthropic loop message conversion. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
2ec5be6 to
5adf1c2
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
libs/python/agent/pyproject.toml (1)
30-34: Pin minimum versions for test dependencies.
pytest,pytest-asyncio, andpytest-mockare unpinned. This can cause unexpected breakage when major new releases introduce incompatible defaults (e.g.,pytest-asyncio's default async mode changed between versions).🔧 Suggested minimum version pins
test = [ - "pytest", - "pytest-asyncio", - "pytest-mock", + "pytest>=8.0", + "pytest-asyncio>=0.23", + "pytest-mock>=3.12", ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/python/agent/pyproject.toml` around lines 30 - 34, The test dependencies in pyproject.toml (the "test" list containing "pytest", "pytest-asyncio", and "pytest-mock") are unpinned; update those entries to include safe minimum version pins (for example "pytest>=X.Y", "pytest-asyncio>=A.B", "pytest-mock>=C.D") to avoid breakage from incompatible major releases—edit the "test" array to replace the plain package names with pinned minimum versions and run a quick install/test to verify compatibility.libs/python/agent/tests/test_message_conversion.py (1)
199-224: Consider whether the placeholder text assertion is too implementation-specific.Line 222 hard-codes
"[Execution completed"— the verbatim placeholder string emitted byconvert_responses_items_to_completion_messageswhenallow_images_in_tool_results=False. This is fine as a behavioral regression guard, but it will fail if the placeholder text ever changes. A looser check likeassert result[1]["role"] == "tool"plusassert result[1]["content"](non-empty) would make the test more resilient to message copy changes while still verifying structure.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/python/agent/tests/test_message_conversion.py` around lines 199 - 224, The test asserts a specific placeholder string "[Execution completed" in convert_responses_items_to_completion_messages output which is brittle; change the assertion to verify the tool message exists and has non-empty content instead of exact text. In the test_computer_call_output_with_image_separate_user_message case, keep the checks that result[1]["role"] == "tool" and replace the exact placeholder assertion with something like asserting result[1]["content"] is truthy/non-empty (and optionally contains expected structural hints), while retaining the user image assertions for result[2].libs/python/agent/tests/test_uitars.py (1)
249-270: Strengthen drag assertion to verify coordinates.
assert "drag" in textpasses even if the coordinates are wrong. The implementation formatsstart_boxandend_boxexplicitly, so the assertion can be more precise.🔧 Suggested improvement
assert len(result) == 1 text = result[0]["content"][0]["text"] -assert "drag" in text +assert "drag(start_box='(100,150)', end_box='(200,250)')" in text🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/python/agent/tests/test_uitars.py` around lines 249 - 270, Update the test_drag_action_format to assert the exact formatted coordinates rather than only checking for the word "drag": in the test function (test_drag_action_format) use the output from convert_uitars_messages_to_litellm (result -> text) to assert that the expected "start_box" and "end_box" substrings with the specific numbers (e.g., start_x:100 start_y:150 and end_x:200 end_y:250) appear in text, ensuring the formatted coordinate pairs produced by the implementation are present and correct.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/python/agent/pyproject.toml`:
- Line 27: The pyproject change tightened requires-python to ">=3.12,<3.13"
which unintentionally excludes Python 3.13 (and 3.14); revert or document
intentionally. Either restore the original range to ">=3.12,<3.14" in
libs/python/agent/pyproject.toml (the requires-python field) to allow 3.12 and
3.13, or if exclusion of 3.13 is intentional, add a short comment in the
repo/docs and a justification in the pyproject or a README stating which
dependency or incompatibility forces the upper bound to <3.13.
---
Nitpick comments:
In `@libs/python/agent/pyproject.toml`:
- Around line 30-34: The test dependencies in pyproject.toml (the "test" list
containing "pytest", "pytest-asyncio", and "pytest-mock") are unpinned; update
those entries to include safe minimum version pins (for example "pytest>=X.Y",
"pytest-asyncio>=A.B", "pytest-mock>=C.D") to avoid breakage from incompatible
major releases—edit the "test" array to replace the plain package names with
pinned minimum versions and run a quick install/test to verify compatibility.
In `@libs/python/agent/tests/test_message_conversion.py`:
- Around line 199-224: The test asserts a specific placeholder string
"[Execution completed" in convert_responses_items_to_completion_messages output
which is brittle; change the assertion to verify the tool message exists and has
non-empty content instead of exact text. In the
test_computer_call_output_with_image_separate_user_message case, keep the checks
that result[1]["role"] == "tool" and replace the exact placeholder assertion
with something like asserting result[1]["content"] is truthy/non-empty (and
optionally contains expected structural hints), while retaining the user image
assertions for result[2].
In `@libs/python/agent/tests/test_uitars.py`:
- Around line 249-270: Update the test_drag_action_format to assert the exact
formatted coordinates rather than only checking for the word "drag": in the test
function (test_drag_action_format) use the output from
convert_uitars_messages_to_litellm (result -> text) to assert that the expected
"start_box" and "end_box" substrings with the specific numbers (e.g.,
start_x:100 start_y:150 and end_x:200 end_y:250) appear in text, ensuring the
formatted coordinate pairs produced by the implementation are present and
correct.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
libs/python/agent/agent/loops/anthropic.pylibs/python/agent/pyproject.tomllibs/python/agent/tests/test_anthropic.pylibs/python/agent/tests/test_message_conversion.pylibs/python/agent/tests/test_uitars.py
5adf1c2 to
d1c5e93
Compare
|
Just checking in! I am happy to revise if needed or adjust scope. Let me know if there is anything blocking review. |
- Add unit tests for Anthropic, shared message conversion utilities, and UITARS - Drop user messages that contain only filtered content to avoid invalid API payloads - Guard computer_call_output handling for non dict outputs - Narrow agent Python version constraint and add optional test dependencies Closes trycua#944
5cbe49f to
d98fa2d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
libs/python/agent/pyproject.toml (1)
30-34: Consider pinning minimum versions for test dependencies.The root
pyproject.tomlpins minimum versions for test dependencies (e.g.,pytest>=8.0.0), but this package uses bare names. For consistency and reproducibility, consider adding minimum version constraints.♻️ Suggested version pins
test = [ - "pytest", - "pytest-asyncio", - "pytest-mock", + "pytest>=8.0.0", + "pytest-asyncio>=0.21.1", + "pytest-mock>=3.10.0", ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/python/agent/pyproject.toml` around lines 30 - 34, Update the test extras in pyproject.toml to pin minimum versions for the test dependencies; replace the bare names "pytest", "pytest-asyncio", and "pytest-mock" with version-constrained entries like "pytest>=8.0.0", "pytest-asyncio>=0.20.0", "pytest-mock>=3.10.0" (or other chosen minima) so the package matches the root project's reproducibility policy and avoids pulling unbounded versions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Around line 55-59: The test dependency list in pyproject.toml is missing
pytest-cov which causes pytest to fail when CI runs with --cov flags; update the
test extras list (the "test" array) to include "pytest-cov>=X.Y.Z" (pick a
compatible minimum, e.g., "pytest-cov>=4.0.0") so the pytest invocation in CI
recognizes the --cov and --cov-report options; modify the same "test" dependency
block shown in the diff to add the pytest-cov entry.
---
Nitpick comments:
In `@libs/python/agent/pyproject.toml`:
- Around line 30-34: Update the test extras in pyproject.toml to pin minimum
versions for the test dependencies; replace the bare names "pytest",
"pytest-asyncio", and "pytest-mock" with version-constrained entries like
"pytest>=8.0.0", "pytest-asyncio>=0.20.0", "pytest-mock>=3.10.0" (or other
chosen minima) so the package matches the root project's reproducibility policy
and avoids pulling unbounded versions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4daccd0f-71cc-4a98-be9c-bad3fafb96ba
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
libs/python/agent/agent/loops/anthropic.pylibs/python/agent/pyproject.tomllibs/python/agent/tests/test_anthropic.pylibs/python/agent/tests/test_message_conversion.pylibs/python/agent/tests/test_uitars.pypyproject.toml
Closes #944
Summary by CodeRabbit
Bug Fixes
Tests
Chores