
Use eval results with an LLM to automatically improve an agent's system prompt #292

@therealjohn

Description


Help devs use eval results to automatically improve the agents the evals were created for. The eval results are provided as context to a new feature that uses an LLM to iteratively update the agent's system prompt and re-run the eval.

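A minimal sketch of the loop this could implement. `run_eval` and `propose_prompt` are hypothetical callables standing in for the existing eval runner and the LLM call that rewrites the system prompt; the names and signatures are illustrative only, not an existing API.

```python
def optimize_system_prompt(
    system_prompt: str,
    run_eval,          # hypothetical: (system_prompt) -> (score: float, results: list)
    propose_prompt,    # hypothetical: (system_prompt, results) -> revised system_prompt
    target_score: float,
    max_iterations: int,
) -> str:
    """Iteratively revise a system prompt until the target score or max iterations is reached."""
    best_prompt = system_prompt
    best_score, results = run_eval(best_prompt)
    for _ in range(max_iterations):
        if best_score >= target_score:
            break  # target reached, stop early
        # Give the LLM the latest eval results as context and ask for a revision.
        candidate = propose_prompt(best_prompt, results)
        score, results = run_eval(candidate)
        # Keep the revision only if it improves the eval score.
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt
```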

Why this matters

  • It's currently unclear what devs should do with eval results once they have them
  • Evals should drive concrete improvements as part of the dev journey

Key scenarios this enables

  • Detecting prompt regressions in agents
  • Bulk model and prompt experimentation

MVP

  • New tool, Agent Optimizer, available under the Agent and Workflow tools section
  • Select an existing eval result
  • Select an agent, or provide a system prompt directly
  • Provide a system prompt for the LLM judge, or use the provided default
  • Specify the output schema for the LLM judge
  • Specify the max iterations and target score; the optimizer runs until either is reached (see the sketch after this list)
  • The agent can be modified and saved directly, or the updated system prompt can be copied and pasted to wherever the dev keeps it.
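
One way the judge output schema and stopping criteria above might look; every field name here is an assumption for illustration, not an existing configuration format.

```python
# Illustrative judge output schema (JSON Schema style) and optimizer settings.
judge_output_schema = {
    "type": "object",
    "properties": {
        "score": {"type": "number", "minimum": 0, "maximum": 1},
        "reasoning": {"type": "string"},
    },
    "required": ["score", "reasoning"],
}

optimizer_settings = {
    "max_iterations": 5,   # stop after this many prompt revisions...
    "target_score": 0.9,   # ...or as soon as the judge score reaches this
}
```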
