Benchmark shared UI schema with multiple popular models including non-Google ones #312

@jacobsimionato

Let's make sure the schema we're working with performs well with multiple models from Google, Anthropic, OpenAI, etc.

The things to test are:

  • The model accepts the schema in structured output mode, i.e. the schema doesn't use features the model's API doesn't support.
  • The model generates valid UI responses for a range of sample UI use cases, where validity means:
    • It only refers to widgets that exist in the catalog.
    • The tree structure is valid, including ID references etc.
    • The UI structure is reasonable given the use case. This is harder to evaluate objectively, but we can check that it includes key pieces of information and uses key widgets.
  • The model can do the above in larger-scale use cases, e.g. with a large number of custom widgets (say 50-100), or with longer, deeply nested outputs such as a full-screen view with many nested components.
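
The two objective validity checks above (catalog membership and tree structure) could be automated along these lines. This is only a minimal sketch: it assumes the response is a flat list of components, each with hypothetical `id`, `type`, and `children` fields, and the widget names are made up for illustration. The real schema may well differ.

```python
from typing import Any


def validate_ui_response(
    components: list[dict[str, Any]],
    catalog: set[str],
    root_id: str,
) -> list[str]:
    """Return a list of problems found; an empty list means the response passed.

    Assumed (hypothetical) shape of each component:
    {"id": str, "type": str, "children": [child ids]}.
    """
    errors: list[str] = []
    by_id: dict[str, dict[str, Any]] = {}

    for comp in components:
        comp_id = comp.get("id")
        if not isinstance(comp_id, str):
            errors.append(f"component without a string id: {comp!r}")
            continue
        if comp_id in by_id:
            errors.append(f"duplicate id: {comp_id}")
        by_id[comp_id] = comp
        # Check 1: only widgets from the catalog may be used.
        if comp.get("type") not in catalog:
            errors.append(f"unknown widget type: {comp.get('type')} (id {comp_id})")

    if root_id not in by_id:
        errors.append(f"missing root: {root_id}")
        return errors

    # Check 2: walk the tree from the root. Every child reference must
    # resolve, and reaching a component twice means a cycle or a shared child.
    visited: set[str] = set()
    stack = [root_id]
    while stack:
        current = stack.pop()
        if current in visited:
            errors.append(f"component reached more than once: {current}")
            continue
        visited.add(current)
        for child_id in by_id[current].get("children", []):
            if child_id not in by_id:
                errors.append(f"dangling child reference: {current} -> {child_id}")
            else:
                stack.append(child_id)

    # Components never reached from the root are orphans.
    for orphan in sorted(set(by_id) - visited):
        errors.append(f"unreachable component: {orphan}")
    return errors


if __name__ == "__main__":
    catalog = {"Column", "Text", "Button"}  # hypothetical widget catalog
    response = [
        {"id": "root", "type": "Column", "children": ["title", "ok"]},
        {"id": "title", "type": "Text"},
        {"id": "ok", "type": "Button"},
    ]
    print(validate_ui_response(response, catalog, "root"))
```

Returning a list of problems rather than a single pass/fail flag makes it easier to compare models by the kind of mistake they make. The subjective "reasonable structure" check would still need a separate approach, such as asserting that specific key widgets appear.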

Metadata

Labels

P1: A high-priority issue. Someone should be assigned and actively working on it.
