src/cleanlab_codex/project.py
Lines changed: +14 −18 (14 additions & 18 deletions)
@@ -38,7 +38,7 @@ def __str__(self) -> str:
 class Project:
     """Represents a Codex project.
 
-    To integrate a Codex project into your RAG/Agentic system, we recommend using one of our abstractions such as [`Validator`](/codex/api/python/validator).
+    To integrate a Codex project into your RAG/Agentic system, we recommend using the [`Project.validate()` method](/codex/api/python/project#method-validate).
@@ -160,29 +160,25 @@
-        """Evaluate the quality of an AI-generated response using the structured message history, query, and retrieved context.
+        """Evaluate the quality of an AI-generated `response` based on the same exact inputs that your LLM used to generate the response.
 
-        This method runs validation on an AI response using the full `messages` history (formatted as OpenAI-style chat messages),
-        which should include the latest user query and any preceding system or assistant messages.
+        Supply the same `messages` that your LLM used to generate its response (formatted as OpenAI-style chat messages),
+        including all past user/assistant messages and any preceding system messages (with any retrieved context).
 
-        **For single-turn Q&A apps, `messages` can be a minimal list with one user message. For multi-turn conversations, provide the full dialog
-        leading up to the final response.
-
-        The function assesses the trustworthiness and quality of the AI `response` in light of the provided `context` and
-        `query`, which should align with the most recent user message in `messages`.
+        **For single-turn Q&A apps, `messages` can be a minimal list with one user message containing all relevant info that was supplied to your LLM.
+        For multi-turn conversations, provide the full dialog leading up to the final response (not including the final response).
 
         If your AI response is flagged as problematic, then this method will:
         - return an expert answer if one was previously provided for a similar query
        - otherwise log this query for future SME review (to consider providing an expert answer) in the Web interface.
 
         Args:
-            messages (list[ChatCompletionMessageParam]): The full message history from the AI conversation, formatted for OpenAI-style chat completion.
-                This must include the final user message that triggered the AI response. All other arguments—`query`, `context`, and `response`—
-                must correspond specifically to this final user message.
-            response (ChatCompletion | str): Your AI-response that was generated based on the given `messages`. This is the response being evaluated, and should not appear in the `messages`.
-            query (str): The user query that the `response` is answering. This query should be the latest user message in `messages`.
-            context (str): The retrieved context (e.g., from your RAG system) that was supplied to the AI when generating the `response` to the final user query in `messages`.
-            rewritten_query (str, optional): An optional reformulation of `query` (e.g. made self-contained w.r.t multi-turn conversations) to improve retrieval quality.
-            metadata (object, optional): Arbitrary metadata to associate with this validation for logging or analysis inside the Codex project.
+            messages (list[ChatCompletionMessageParam]): The full prompt given to your LLM that generated the response, in the OpenAI Chat Completions format.
+                This must include the final user message that triggered the AI response, along with all of the state that was supplied to your LLM (full conversation history, system instructions/prompt, retrieved context, etc.).
+            response (ChatCompletion | str): Your AI response that was generated by your LLM given the same `messages`. This is the response being evaluated, and should not appear in the `messages`.
+            query (str): The core user query that the `response` is answering, i.e. the latest user message in `messages`. Specifying the `query` (as part of the full `messages` object) enables Cleanlab to match it against other users' queries (e.g. for serving expert answers), run certain Evals, and display the query in the Web Interface.
+            context (str): All retrieved context (e.g., from your RAG/retrieval/search system) that was supplied as part of `messages` when generating the LLM `response`. Specifying the `context` (as part of the full `messages` object) enables Cleanlab to run certain Evals and display the retrieved context in the Web Interface.
+            rewritten_query (str, optional): An optional reformulation of `query` (e.g. to form a self-contained question out of a multi-turn conversation history) to improve retrieval quality. If you are using a query rewriter in your RAG system, you can provide its output here. If not provided, Cleanlab may internally do its own query rewrite when necessary.
+            metadata (object, optional): Arbitrary metadata to associate with this LLM `response` for logging/analytics inside the Project.
             tools (list[ChatCompletionToolParam], optional): Optional tools that were used to generate the response. This is useful for validating correct tool usage in the response.
-            eval_scores (dict[str, float], optional): Precomputed evaluation scores to bypass automatic scoring. Providing `eval_scores` for specific evaluations bypasses automated scoring and uses the supplied scores instead. Consider providing these scores if you already have them precomputed to reduce runtime.
+            eval_scores (dict[str, float], optional): Pre-computed evaluation scores to bypass automatic scoring. Providing `eval_scores` for specific evaluations bypasses automated scoring and uses the supplied scores instead. If you already have these scores pre-computed, supplying them can reduce runtime.
 
         Returns:
             ProjectValidateResponse: A structured object with the following fields:
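
Below is a minimal usage sketch of the `validate()` call documented above, for a single-turn RAG app. It assumes (not shown in this diff) that a `Project` instance is obtained via `Project.from_access_key()` and that the returned `ProjectValidateResponse` exposes an `expert_answer` field; treat both names as illustrative placeholders and consult the cleanlab_codex docs for the exact API.

```python
# Minimal sketch (assumptions flagged in comments) of calling
# Project.validate() with the arguments documented in the docstring above.
from cleanlab_codex.project import Project  # module path taken from this diff

project = Project.from_access_key("<your-project-access-key>")  # assumed constructor

context = "Connection timeouts are configurable up to 30 seconds."  # from your retriever
query = "What is the maximum connection timeout?"

# The exact prompt your LLM saw, in OpenAI Chat Completions format,
# including system instructions and the retrieved context:
messages = [
    {"role": "system", "content": f"Answer using only this context:\n{context}"},
    {"role": "user", "content": query},
]
llm_response = "The maximum connection timeout is 30 seconds."  # your LLM's output

result = project.validate(
    messages=messages,
    response=llm_response,
    query=query,
    context=context,
)

# If the response was flagged and an SME previously supplied an expert answer
# for a similar query, serve that instead (field name assumed for illustration):
final_response = result.expert_answer or llm_response
```

For a multi-turn app, `messages` would instead carry the full prior dialog (minus the final response), optionally with `rewritten_query` set to your query-rewriter's output.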