Description
Help devs use eval results to improve their agents by providing those results as context to a new feature that uses an LLM to iteratively update the agent's system prompt and re-run the eval.
Why this matters
- It's not clear what devs should do with eval results right now
- Evals should drive improvements as part of the dev journey
Key scenarios this enables
- Detecting prompt regressions in agents
- Bulk model and prompt experimentation
MVP
- A new Agent Optimizer tool under the Agent and Workflow tools section
- Select an existing eval result
- Select an agent, or provide the system prompt directly
- Provide a system prompt for the LLM judge, or use the provided default
- Specify the output schema for the LLM judge
- Specify the max iterations and a target score; optimization runs until either is reached (see the sketch after this list)
- The agent can be modified and saved directly, or system prompt updates can be copied and pasted to wherever the dev keeps them
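
To make the stopping behavior concrete, here is a minimal sketch of the optimization loop under the assumptions above. Everything here is hypothetical, not an existing API: `optimize_system_prompt`, `JudgeResult`, and the `run_eval` / `judge` / `revise` callables are placeholders for the eval runner, the LLM judge, and the prompt-rewriting LLM call.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical judge output schema: a score plus feedback that the
# optimizer feeds into the next prompt revision.
@dataclass
class JudgeResult:
    score: float
    feedback: str


def optimize_system_prompt(
    system_prompt: str,
    run_eval: Callable[[str], list[str]],       # runs the eval suite with a candidate prompt
    judge: Callable[[list[str]], JudgeResult],  # LLM judge scoring the eval outputs
    revise: Callable[[str, str], str],          # LLM rewriting the prompt given judge feedback
    max_iterations: int = 5,
    target_score: float = 0.9,
) -> tuple[str, float]:
    """Iteratively update a system prompt, stopping when the target score
    or the iteration budget is reached, whichever comes first."""
    candidate = system_prompt
    best_prompt, best_score = system_prompt, float("-inf")
    for _ in range(max_iterations):
        result = judge(run_eval(candidate))
        # Keep the best-scoring prompt seen so far, since a revision
        # is not guaranteed to improve on its predecessor.
        if result.score > best_score:
            best_prompt, best_score = candidate, result.score
        if result.score >= target_score:
            break
        candidate = revise(candidate, result.feedback)
    return best_prompt, best_score
```

In the MVP flow, the `judge` and `revise` steps would presumably both be LLM calls, with the judge constrained to the user-specified output schema so its score and feedback can be parsed reliably on every iteration.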