
Use eval results with an LLM to automatically improve an agent's system prompt #292

@therealjohn

Description


Help devs use eval results to automatically improve the agents the evals were created for. The eval results are provided as context to a new feature that uses an LLM to iteratively update the agent's system prompt and re-run the eval.

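A minimal sketch of the loop this could implement. `run_eval` and `propose_prompt` are hypothetical callables standing in for the existing eval runner and the LLM call that rewrites the system prompt; the names and signatures are illustrative only, not an existing API.

```python
def optimize_system_prompt(
    system_prompt: str,
    run_eval,          # hypothetical: (system_prompt) -> (score: float, results: list)
    propose_prompt,    # hypothetical: (system_prompt, results) -> revised system_prompt
    target_score: float,
    max_iterations: int,
) -> str:
    """Iteratively revise a system prompt until the target score or max iterations is reached."""
    best_prompt = system_prompt
    best_score, results = run_eval(best_prompt)
    for _ in range(max_iterations):
        if best_score >= target_score:
            break  # target reached, stop early
        # Give the LLM the latest eval results as context and ask for a revision.
        candidate = propose_prompt(best_prompt, results)
        score, results = run_eval(candidate)
        # Keep the revision only if it improves the eval score.
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt
```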

Why this matters

  • It's currently unclear what devs should do with eval results once they have them
  • Evals should drive concrete improvements as part of the dev journey

Key scenarios this enables

  • Detecting prompt regressions in agents
  • Bulk model and prompt experimentation

MVP

  • New tool, Agent Optimizer, available under the Agent and Workflow tools section
  • Select an existing eval result
  • Select an agent, or provide a system prompt directly
  • Provide a system prompt for the LLM judge, or use the provided default
  • Specify the output schema for the LLM judge
  • Specify the max iterations and target score; the optimizer runs until either is reached (see the sketch after this list)
  • The agent can be modified and saved directly, or the updated system prompt can be copied and pasted to wherever the dev keeps it.
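
One way the judge output schema and stopping criteria above might look; every field name here is an assumption for illustration, not an existing configuration format.

```python
# Illustrative judge output schema (JSON Schema style) and optimizer settings.
judge_output_schema = {
    "type": "object",
    "properties": {
        "score": {"type": "number", "minimum": 0, "maximum": 1},
        "reasoning": {"type": "string"},
    },
    "required": ["score", "reasoning"],
}

optimizer_settings = {
    "max_iterations": 5,   # stop after this many prompt revisions...
    "target_score": 0.9,   # ...or as soon as the judge score reaches this
}
```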
