[Release] How to access object-level caption and pose estimation from DINO-X? #36

Hi IDEA team,

Thanks for the amazing work on DINO-X! I noticed that your paper and online demo mention support for multiple output heads including:

- Region-level/object-level **captioning**
- **Pose estimation** with keypoints for person/hands

Could you please clarify:
1. Are these modules accessible via the DDS Cloud API?
2. If they are not yet public, is there a plan or timeline for releasing them?
3. Are there any examples or references for testing these features locally?

Really appreciate your work — this model has been very helpful for our robotics perception project!

Best regards,  
Tony

