
Commit caaf8a1

validate docstring
1 parent 82c4a87 commit caaf8a1

1 file changed: +13 −17 lines


src/cleanlab_codex/project.py

Lines changed: 13 additions & 17 deletions
@@ -156,31 +156,27 @@ def validate(
         metadata: Optional[object] = None,
         eval_scores: Optional[Dict[str, float]] = None,
     ) -> ProjectValidateResponse:
-        """Evaluate the quality of an AI-generated response using the structured message history, query, and retrieved context.
+        """Evaluate the quality of an AI-generated `response` based on the same exact inputs that your LLM used to generate the response.
 
-        This method runs validation on an AI response using the full `messages` history (formatted as OpenAI-style chat messages),
-        which should include the latest user query and any preceding system or assistant messages.
+        Supply the same `messages` that your LLM used to generate its response (formatted as OpenAI-style chat messages),
+        including all past user/assistant messages, and any preceding system messages (including any retrieved context).
 
-        **For single-turn Q&A apps, `messages` can be a minimal list with one user message. For multi-turn conversations, provide the full dialog
-        leading up to the final response.
-
-        The function assesses the trustworthiness and quality of the AI `response` in light of the provided `context` and
-        `query`, which should align with the most recent user message in `messages`.
+        **For single-turn Q&A apps, `messages` can be a minimal list with one user message containing all relevant info that was supplied to your LLM.
+        For multi-turn conversations, provide the full dialog leading up to the final response (not including the final response).
 
         If your AI response is flagged as problematic, then this method will:
         - return an expert answer if one was previously provided for a similar query
         - otherwise log this query for future SME review (to consider providing an expert answer) in the Web interface.
 
         Args:
-            messages (list[ChatCompletionMessageParam]): The full message history from the AI conversation, formatted for OpenAI-style chat completion.
-                This must include the final user message that triggered the AI response. All other arguments—`query`, `context`, and `response`—
-                must correspond specifically to this final user message.
-            response (ChatCompletion | str): Your AI-response that was generated based on the given `messages`. This is the response being evaluated, and should not appear in the `messages`.
-            query (str): The user query that the `response` is answering. This query should be the latest user message in `messages`.
-            context (str): The retrieved context (e.g., from your RAG system) that was supplied to the AI when generating the `response` to the final user query in `messages`.
-            rewritten_query (str, optional): An optional reformulation of `query` (e.g. made self-contained w.r.t multi-turn conversations) to improve retrieval quality.
-            metadata (object, optional): Arbitrary metadata to associate with this validation for logging or analysis inside the Codex project.
-            eval_scores (dict[str, float], optional): Precomputed evaluation scores to bypass automatic scoring. Providing `eval_scores` for specific evaluations bypasses automated scoring and uses the supplied scores instead. Consider providing these scores if you already have them precomputed to reduce runtime.
+            messages (list[ChatCompletionMessageParam]): The full prompt given to your LLM that generated the response, in the OpenAI Chat Completions format.
+                This must include the final user message that triggered the AI response. This must include all of the state that was supplied to your LLM (including: full conversation history, system instructions/prompt, retrieved context, etc.).
+            response (ChatCompletion | str): Your AI response that was generated by your LLM given the same `messages`. This is the response being evaluated, and should not appear in the `messages`.
+            query (str): The core user query that the `response` is answering, i.e. the latest user message in `messages`. Specifying the `query` (as a part of the full `messages` object) enables Cleanlab to: match this against other users' queries (e.g. for serving expert answers), run certain Evals, and display the query in the Web Interface.
+            context (str): All retrieved context (e.g., from your RAG/retrieval/search system) that was supplied as part of `messages` for generating the LLM `response`. Specifying the `context` (as a part of the full `messages` object) enables Cleanlab to run certain Evals and display the retrieved context in the Web Interface.
+            rewritten_query (str, optional): An optional reformulation of `query` (e.g. to form a self-contained question out of a multi-turn conversation history) to improve retrieval quality. If you are using a query-rewriter in your RAG system, you can provide its output here. If not provided, Cleanlab may internally do its own query rewrite when necessary.
+            metadata (object, optional): Arbitrary metadata to associate with this LLM `response` for logging/analytics inside the Project.
+            eval_scores (dict[str, float], optional): Pre-computed evaluation scores to bypass automatic scoring. Providing `eval_scores` for specific evaluations bypasses automated scoring and uses the supplied scores instead. If you already have them pre-computed, this can reduce runtime.
 
         Returns:
             ProjectValidateResponse: A structured object with the following fields:
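
For reference, a minimal usage sketch consistent with the revised docstring (not part of this commit): it assumes `project` is an existing cleanlab_codex Project instance (construction omitted), that an OpenAI client generated the response, and that the model name, context, and query strings are placeholders.

# Minimal sketch of calling validate() with the inputs the docstring describes.
# Assumptions: `project` is an existing cleanlab_codex Project instance;
# model name and example strings are placeholders.
from openai import OpenAI

client = OpenAI()

context = "Acme Inc. accepts returns of unused items within 30 days."  # retrieved by your RAG system
query = "What is your return policy?"                                  # latest user message

# The exact messages your LLM sees, including system instructions and retrieved context.
messages = [
    {"role": "system", "content": f"Answer using only this context:\n{context}"},
    {"role": "user", "content": query},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)

result = project.validate(
    messages=messages,   # full prompt given to the LLM
    response=response,   # ChatCompletion object; a plain string also works
    query=query,         # the core user query from `messages`
    context=context,     # retrieved context that also appears in `messages`
)

Per the docstring, `result` (a ProjectValidateResponse) indicates whether the response was flagged and may include an expert answer previously provided for a similar query.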
