feat: LLM EvalKit - A practical framework for building an open-source application for prompt engineering #2341
Conversation
Summary of Changes
Hello @Michael-Santoro, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request delivers a powerful new tool, the LLM EvalKit, which streamlines the entire prompt engineering process. It provides a robust framework for developers to efficiently create, test, evaluate, and optimize LLM prompts, significantly simplifying interactions with Vertex AI and enabling a more focused approach to achieving high-quality, reliable results from large language models.
Code Review
This pull request introduces the LLM EvalKit, a comprehensive framework for managing, evaluating, and optimizing LLM prompts. The contribution is substantial, adding a full Streamlit application with multiple pages, backend logic for interacting with Google Cloud services, and a tutorial notebook. My review focuses on ensuring adherence to the provided style guide, correctness of the implementation, and the usability of the tutorial. I've identified some critical issues, including a missing dependency and a bug in response generation, that need to be addressed. There are also several violations of the style guide regarding SDK usage and recommended model versions, along with some issues in the tutorial notebook that would prevent users from running the application correctly. Overall, this is a great addition, and with these fixes, it will be a very powerful tool.
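The style guide the review refers to isn't quoted in the thread, but samples in this repository generally standardize on the unified Google Gen AI SDK and current Gemini model versions. As a hedged illustration of the pattern such reviews usually ask for (the project ID, location, and model name below are assumptions, not values from this PR):

```python
# Minimal sketch of generating a response with the Google Gen AI SDK,
# routed through Vertex AI. All concrete values are illustrative.
from google import genai

client = genai.Client(
    vertexai=True,
    project="your-project-id",   # placeholder, not from the PR
    location="us-central1",      # placeholder, not from the PR
)

response = client.models.generate_content(
    model="gemini-2.0-flash",  # a current model version, chosen as an example
    contents="Summarize the benefits of systematic prompt evaluation.",
)
print(response.text)
```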
Line 74 of llmevalkit/pages/1_Prompt_Management.py included this
Is there any way this could be condensed down/made easier to follow? Since you already have a Jupyter Notebook for the main flow, could the other utility functions be added into it?
Yes, that's alright.
@Michael-Santoro please fix the spelling. Thanks.
@Michael-Santoro: can you please 1) move the code under /tools? 2) add a brief summary in the readme file to describe what it is and what it does for developers? Thanks.
Don't worry about the current spelling test failure. I'll add an exception for this because the smart quote is required.
Force-pushed from 1d0e1ca to 97315ba
General question about this sample app. Most, if not all of the steps in this tutorial can be accomplished using the Cloud Console Vertex AI Evaluation page instead of this Streamlit App.
https://console.cloud.google.com/vertex-ai/evaluation/create
I'm not sure this extra UI wrapper for the APIs is needed. I wonder if it would make sense to either restructure the Notebook to show how the API calls for this example would work (see the sketch after this comment), or create a tutorial showing how to do this in the cloud console.
Maybe look at updating this tutorial in the docs to follow what you're doing here: https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-genai-console
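For reference, a direct call to the Gen AI evaluation service (the API behind the console page linked above) is fairly compact. This is a hedged sketch, not code from the PR; the project ID, dataset contents, metric choice, and experiment name are assumptions:

```python
# Sketch: evaluating pre-generated responses with the Vertex AI
# Gen AI evaluation service. All concrete values are illustrative.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples

vertexai.init(project="your-project-id", location="us-central1")

# Bring-your-own-responses: each row pairs a prompt with a model response.
eval_dataset = pd.DataFrame({
    "prompt": ["Explain what prompt engineering is."],
    "response": ["Prompt engineering is the practice of crafting model inputs..."],
})

eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[MetricPromptTemplateExamples.Pointwise.FLUENCY],
    experiment="llm-evalkit-example",  # illustrative experiment name
)
result = eval_task.evaluate()
print(result.summary_metrics)
```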
Summary
This project is a powerful tool designed for the systematic evaluation and optimization of prompts for Large Language Models (LLMs). It provides a robust Python-based framework specifically for assessing prompt performance against a defined problem, enabling users to identify the most effective prompt variations.
The core purpose of this tool is to abstract away some of the complexities of Google Cloud's Vertex AI platform. By simplifying interactions with the underlying infrastructure, it empowers users to concentrate on the critical task of prompt engineering and performance analysis.
Ultimately, this project enables teams to rigorously evaluate and deploy the best possible prompts for their applications, ensuring higher quality and more reliable results from their Large Language Models.
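To make "the most effective prompt variations" concrete, here is a hedged sketch of comparing two candidate templates with the evaluation SDK. The templates, metric, and model name are assumptions for illustration, not the EvalKit's actual API:

```python
# Hypothetical comparison of two prompt templates; dataset columns fill
# the {question} placeholder in each template. Names are illustrative.
import pandas as pd
import vertexai
from vertexai.evaluation import EvalTask, MetricPromptTemplateExamples
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-project-id", location="us-central1")

dataset = pd.DataFrame({"question": ["What is prompt engineering?"]})

candidates = {
    "terse": "Answer briefly: {question}",
    "guided": "You are a patient tutor. Explain step by step: {question}",
}

model = GenerativeModel("gemini-1.5-flash")  # model choice is an assumption

for name, template in candidates.items():
    result = EvalTask(
        dataset=dataset,
        metrics=[MetricPromptTemplateExamples.Pointwise.INSTRUCTION_FOLLOWING],
        experiment="prompt-variant-comparison",
    ).evaluate(model=model, prompt_template=template)
    print(name, result.summary_metrics)
```

Whichever variant scores higher on the chosen metric would be the one to promote, which is the workflow this project wraps in a UI.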
Description
Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- Follow the CONTRIBUTING Guide.
- You are listed in CODEOWNERS for the file(s).
- Run nox -s format from the repository root to format.

Fixes #<issue_number_goes_here> 🦕