Hi IDEA team,
Thanks for the amazing work on DINO-X! I noticed that your paper and online demo mention support for multiple output heads, including:
- Region-level/object-level captioning
- Pose estimation with keypoints for persons and hands
Could you please clarify:
- Are these modules accessible via the DDS Cloud API?
- If not yet public, is there a plan/timeline to release these features?
- Are there any examples or references for testing these features locally?
Really appreciate your work — this model has been very helpful for our robotics perception project!
Best regards,
Tony