src/cleanlab_codex/project.py
Lines changed: +14 −18 (14 additions & 18 deletions)
@@ -38,7 +38,7 @@ def __str__(self) -> str:
 class Project:
     """Represents a Codex project.
 
-    To integrate a Codex project into your RAG/Agentic system, we recommend using one of our abstractions such as [`Validator`](/codex/api/python/validator).
+    To integrate a Codex project into your RAG/Agentic system, we recommend using the [`Project.validate()` method](/codex/api/python/project#method-validate).
@@ -160,29 +160,25 @@
-        """Evaluate the quality of an AI-generated response using the structured message history, query, and retrieved context.
+        """Evaluate the quality of an AI-generated `response` based on the same exact inputs that your LLM used to generate the response.
 
-        This method runs validation on an AI response using the full `messages` history (formatted as OpenAI-style chat messages),
-        which should include the latest user query and any preceding system or assistant messages.
+        Supply the same `messages` that your LLM used to generate its response (formatted as OpenAI-style chat messages),
+        including all past user/assistant messages and any preceding system messages (with any retrieved context).
 
-        **For single-turn Q&A apps, `messages` can be a minimal list with one user message. For multi-turn conversations, provide the full dialog
-        leading up to the final response.
-
-        The function assesses the trustworthiness and quality of the AI `response` in light of the provided `context` and
-        `query`, which should align with the most recent user message in `messages`.
+        **For single-turn Q&A apps, `messages` can be a minimal list with one user message containing all relevant info that was supplied to your LLM.
+        For multi-turn conversations, provide the full dialog leading up to the final response (not including the final response).
 
         If your AI response is flagged as problematic, then this method will:
         - return an expert answer if one was previously provided for a similar query
        - otherwise log this query for future SME review (to consider providing an expert answer) in the Web interface.
 
         Args:
-            messages (list[ChatCompletionMessageParam]): The full message history from the AI conversation, formatted for OpenAI-style chat completion.
-                This must include the final user message that triggered the AI response. All other arguments—`query`, `context`, and `response`—
-                must correspond specifically to this final user message.
-            response (ChatCompletion | str): Your AI-response that was generated based on the given `messages`. This is the response being evaluated, and should not appear in the `messages`.
-            query (str): The user query that the `response` is answering. This query should be the latest user message in `messages`.
-            context (str): The retrieved context (e.g., from your RAG system) that was supplied to the AI when generating the `response` to the final user query in `messages`.
-            rewritten_query (str, optional): An optional reformulation of `query` (e.g. made self-contained w.r.t multi-turn conversations) to improve retrieval quality.
-            metadata (object, optional): Arbitrary metadata to associate with this validation for logging or analysis inside the Codex project.
+            messages (list[ChatCompletionMessageParam]): The full prompt given to your LLM that generated the response, in the OpenAI Chat Completions format.
+                This must include the final user message that triggered the AI response, along with all of the state that was supplied to your LLM (full conversation history, system instructions/prompt, retrieved context, etc.).
+            response (ChatCompletion | str): Your AI response that was generated by your LLM given the same `messages`. This is the response being evaluated, and should not appear in the `messages`.
+            query (str): The core user query that the `response` is answering, i.e. the latest user message in `messages`. Specifying the `query` (as part of the full `messages` object) enables Cleanlab to match it against other users' queries (e.g. for serving expert answers), run certain Evals, and display the query in the Web Interface.
+            context (str): All retrieved context (e.g., from your RAG/retrieval/search system) that was supplied as part of `messages` when generating the LLM `response`. Specifying the `context` (as part of the full `messages` object) enables Cleanlab to run certain Evals and display the retrieved context in the Web Interface.
+            rewritten_query (str, optional): An optional reformulation of `query` (e.g. to form a self-contained question out of a multi-turn conversation history) to improve retrieval quality. If you are using a query rewriter in your RAG system, you can provide its output here. If not provided, Cleanlab may internally do its own query rewrite when necessary.
+            metadata (object, optional): Arbitrary metadata to associate with this LLM `response` for logging/analytics inside the Project.
             tools (list[ChatCompletionToolParam], optional): Optional tools that were used to generate the response. This is useful for validating correct tool usage in the response.
-            eval_scores (dict[str, float], optional): Precomputed evaluation scores to bypass automatic scoring. Providing `eval_scores` for specific evaluations bypasses automated scoring and uses the supplied scores instead. Consider providing these scores if you already have them precomputed to reduce runtime.
+            eval_scores (dict[str, float], optional): Pre-computed evaluation scores to bypass automatic scoring. Providing `eval_scores` for specific evaluations bypasses automated scoring and uses the supplied scores instead. If you already have these scores pre-computed, supplying them can reduce runtime.
 
         Returns:
             ProjectValidateResponse: A structured object with the following fields:
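
Below is a minimal usage sketch of the `validate()` call documented above, for a single-turn RAG app. It assumes (not shown in this diff) that a `Project` instance is obtained via `Project.from_access_key()` and that the returned `ProjectValidateResponse` exposes an `expert_answer` field; treat both names as illustrative placeholders and consult the cleanlab_codex docs for the exact API.

```python
# Minimal sketch (assumptions flagged in comments) of calling
# Project.validate() with the arguments documented in the docstring above.
from cleanlab_codex.project import Project  # module path taken from this diff

project = Project.from_access_key("<your-project-access-key>")  # assumed constructor

context = "Connection timeouts are configurable up to 30 seconds."  # from your retriever
query = "What is the maximum connection timeout?"

# The exact prompt your LLM saw, in OpenAI Chat Completions format,
# including system instructions and the retrieved context:
messages = [
    {"role": "system", "content": f"Answer using only this context:\n{context}"},
    {"role": "user", "content": query},
]
llm_response = "The maximum connection timeout is 30 seconds."  # your LLM's output

result = project.validate(
    messages=messages,
    response=llm_response,
    query=query,
    context=context,
)

# If the response was flagged and an SME previously supplied an expert answer
# for a similar query, serve that instead (field name assumed for illustration):
final_response = result.expert_answer or llm_response
```

For a multi-turn app, `messages` would instead carry the full prior dialog (minus the final response), optionally with `rewritten_query` set to your query-rewriter's output.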