
Commit caaf8a1

validate docstring
1 parent 82c4a87 commit caaf8a1

1 file changed: +13 −17 lines


src/cleanlab_codex/project.py

Lines changed: 13 additions & 17 deletions
@@ -156,31 +156,27 @@ def validate(
         metadata: Optional[object] = None,
         eval_scores: Optional[Dict[str, float]] = None,
     ) -> ProjectValidateResponse:
-        """Evaluate the quality of an AI-generated response using the structured message history, query, and retrieved context.
+        """Evaluate the quality of an AI-generated `response` based on the same exact inputs that your LLM used to generate the response.
 
-        This method runs validation on an AI response using the full `messages` history (formatted as OpenAI-style chat messages),
-        which should include the latest user query and any preceding system or assistant messages.
+        Supply the same `messages` that your LLM used to generate its response (formatted as OpenAI-style chat messages),
+        including all past user/assistant messages, and any preceding system messages (including any retrieved context).
 
-        **For single-turn Q&A apps, `messages` can be a minimal list with one user message. For multi-turn conversations, provide the full dialog
-        leading up to the final response.
-
-        The function assesses the trustworthiness and quality of the AI `response` in light of the provided `context` and
-        `query`, which should align with the most recent user message in `messages`.
+        **For single-turn Q&A apps, `messages` can be a minimal list with one user message containing all relevant info that was supplied to your LLM.
+        For multi-turn conversations, provide the full dialog leading up to the final response (not including the final response).
 
         If your AI response is flagged as problematic, then this method will:
         - return an expert answer if one was previously provided for a similar query
         - otherwise log this query for future SME review (to consider providing an expert answer) in the Web interface.
 
         Args:
-            messages (list[ChatCompletionMessageParam]): The full message history from the AI conversation, formatted for OpenAI-style chat completion.
-                This must include the final user message that triggered the AI response. All other arguments—`query`, `context`, and `response`—
-                must correspond specifically to this final user message.
-            response (ChatCompletion | str): Your AI-response that was generated based on the given `messages`. This is the response being evaluated, and should not appear in the `messages`.
-            query (str): The user query that the `response` is answering. This query should be the latest user message in `messages`.
-            context (str): The retrieved context (e.g., from your RAG system) that was supplied to the AI when generating the `response` to the final user query in `messages`.
-            rewritten_query (str, optional): An optional reformulation of `query` (e.g. made self-contained w.r.t multi-turn conversations) to improve retrieval quality.
-            metadata (object, optional): Arbitrary metadata to associate with this validation for logging or analysis inside the Codex project.
-            eval_scores (dict[str, float], optional): Precomputed evaluation scores to bypass automatic scoring. Providing `eval_scores` for specific evaluations bypasses automated scoring and uses the supplied scores instead. Consider providing these scores if you already have them precomputed to reduce runtime.
+            messages (list[ChatCompletionMessageParam]): The full prompt given to your LLM that generated the response, in the OpenAI Chat Completions format.
+                This must include the final user message that triggered the AI response. This must include all of the state that was supplied to your LLM (including: full conversation history, system instructions/prompt, retrieved context, etc.).
+            response (ChatCompletion | str): Your AI response that was generated by your LLM given the same `messages`. This is the response being evaluated, and should not appear in the `messages`.
+            query (str): The core user query that the `response` is answering, i.e. the latest user message in `messages`. Specifying the `query` (as a part of the full `messages` object) enables Cleanlab to: match this against other users' queries (e.g. for serving expert answers), run certain Evals, and display the query in the Web Interface.
+            context (str): All retrieved context (e.g., from your RAG/retrieval/search system) that was supplied as part of `messages` for generating the LLM `response`. Specifying the `context` (as a part of the full `messages` object) enables Cleanlab to run certain Evals and display the retrieved context in the Web Interface.
+            rewritten_query (str, optional): An optional reformulation of `query` (e.g. to form a self-contained question out of a multi-turn conversation history) to improve retrieval quality. If you are using a query-rewriter in your RAG system, you can provide its output here. If not provided, Cleanlab may internally do its own query rewrite when necessary.
+            metadata (object, optional): Arbitrary metadata to associate with this LLM `response` for logging/analytics inside the Project.
+            eval_scores (dict[str, float], optional): Pre-computed evaluation scores to bypass automatic scoring. Providing `eval_scores` for specific evaluations bypasses automated scoring and uses the supplied scores instead. If you already have them pre-computed, this can reduce runtime.
 
         Returns:
             ProjectValidateResponse: A structured object with the following fields:
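
For reference, a minimal usage sketch consistent with the revised docstring (not part of this commit): it assumes `project` is an existing cleanlab_codex Project instance (construction omitted), that an OpenAI client generated the response, and that the model name, context, and query strings are placeholders.

# Minimal sketch of calling validate() with the inputs the docstring describes.
# Assumptions: `project` is an existing cleanlab_codex Project instance;
# model name and example strings are placeholders.
from openai import OpenAI

client = OpenAI()

context = "Acme Inc. accepts returns of unused items within 30 days."  # retrieved by your RAG system
query = "What is your return policy?"                                  # latest user message

# The exact messages your LLM sees, including system instructions and retrieved context.
messages = [
    {"role": "system", "content": f"Answer using only this context:\n{context}"},
    {"role": "user", "content": query},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)

result = project.validate(
    messages=messages,   # full prompt given to the LLM
    response=response,   # ChatCompletion object; a plain string also works
    query=query,         # the core user query from `messages`
    context=context,     # retrieved context that also appears in `messages`
)

Per the docstring, `result` (a ProjectValidateResponse) indicates whether the response was flagged and may include an expert answer previously provided for a similar query.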
