Commit 81bdbab

Commit message: update tools

Merge commit with 2 parents: fe0f74e + bb2d31c

File tree: 3 files changed (+21 −20 lines)

- CHANGELOG.md
- src/cleanlab_codex/__about__.py
- src/cleanlab_codex/project.py


CHANGELOG.md

Lines changed: 6 additions & 1 deletion
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.0.25] 2025-07-17
+
+- Fix broken link in docstring
+
 ## [1.0.24] 2025-07-10
 
 - Remove `Validator` class, move `validate()` functionality to `Project` class. Adds conversational support for validation.
@@ -116,7 +120,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - Initial release of the `cleanlab-codex` client library.
 
-[Unreleased]: https://github.com/cleanlab/cleanlab-codex/compare/v1.0.24...HEAD
+[Unreleased]: https://github.com/cleanlab/cleanlab-codex/compare/v1.0.25...HEAD
+[1.0.25]: https://github.com/cleanlab/cleanlab-codex/compare/v1.0.24...v1.0.25
 [1.0.24]: https://github.com/cleanlab/cleanlab-codex/compare/v1.0.23...v1.0.24
 [1.0.23]: https://github.com/cleanlab/cleanlab-codex/compare/v1.0.22...v1.0.23
 [1.0.22]: https://github.com/cleanlab/cleanlab-codex/compare/v1.0.21...v1.0.22

src/cleanlab_codex/__about__.py

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 # SPDX-License-Identifier: MIT
-__version__ = "1.0.24"
+__version__ = "1.0.25"

src/cleanlab_codex/project.py

Lines changed: 14 additions & 18 deletions
@@ -38,7 +38,7 @@ def __str__(self) -> str:
 class Project:
     """Represents a Codex project.
 
-    To integrate a Codex project into your RAG/Agentic system, we recommend using one of our abstractions such as [`Validator`](/codex/api/python/validator).
+    To integrate a Codex project into your RAG/Agentic system, we recommend using the [`Project.validate()` method](/codex/api/python/project#method-validate).
     """
 
     def __init__(self, sdk_client: _Codex, project_id: str, *, verify_existence: bool = True):
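The first hunk above redirects readers from the removed `Validator` abstraction to `Project.validate()`. As a quick orientation, here is a minimal sketch of constructing the `Project` handle per the `__init__` signature shown above; the `Codex` import path, its `api_key` parameter, and both placeholder credentials are assumptions for illustration, not anything this commit specifies.

```python
# Minimal construction sketch (assumptions: `Codex` client import path,
# `api_key` parameter name, and placeholder credential values).
from codex import Codex  # underlying SDK client (assumed import path)

from cleanlab_codex import Project  # assumed top-level export

sdk_client = Codex(api_key="YOUR_CODEX_API_KEY")  # placeholder credential
project = Project(
    sdk_client,
    project_id="YOUR_PROJECT_ID",  # placeholder ID
    verify_existence=True,  # per the signature above: check the project exists
)
```

The second hunk below rewrites the `validate()` docstring itself.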
@@ -157,32 +157,28 @@ def validate(
         tools: Optional[list[ChatCompletionToolParam]] = None,
         eval_scores: Optional[Dict[str, float]] = None,
     ) -> ProjectValidateResponse:
-        """Evaluate the quality of an AI-generated response using the structured message history, query, and retrieved context.
+        """Evaluate the quality of an AI-generated `response` based on the exact same inputs that your LLM used to generate the response.
 
-        This method runs validation on an AI response using the full `messages` history (formatted as OpenAI-style chat messages),
-        which should include the latest user query and any preceding system or assistant messages.
+        Supply the same `messages` that your LLM used to generate its response (formatted as OpenAI-style chat messages),
+        including all past user/assistant messages and any preceding system messages (including any retrieved context).
 
-        For single-turn Q&A apps, `messages` can be a minimal list with one user message. For multi-turn conversations, provide the full dialog
-        leading up to the final response.
-
-        The function assesses the trustworthiness and quality of the AI `response` in light of the provided `context` and
-        `query`, which should align with the most recent user message in `messages`.
+        For single-turn Q&A apps, `messages` can be a minimal list with one user message containing all relevant info that was supplied to your LLM.
+        For multi-turn conversations, provide the full dialog leading up to the final response (not including the final response).
 
         If your AI response is flagged as problematic, then this method will:
         - return an expert answer if one was previously provided for a similar query
         - otherwise log this query for future SME review (to consider providing an expert answer) in the Web interface.
 
         Args:
-            messages (list[ChatCompletionMessageParam]): The full message history from the AI conversation, formatted for OpenAI-style chat completion.
-                This must include the final user message that triggered the AI response. All other arguments—`query`, `context`, and `response`—
-                must correspond specifically to this final user message.
-            response (ChatCompletion | str): Your AI-response that was generated based on the given `messages`. This is the response being evaluated, and should not appear in the `messages`.
-            query (str): The user query that the `response` is answering. This query should be the latest user message in `messages`.
-            context (str): The retrieved context (e.g., from your RAG system) that was supplied to the AI when generating the `response` to the final user query in `messages`.
-            rewritten_query (str, optional): An optional reformulation of `query` (e.g. made self-contained w.r.t multi-turn conversations) to improve retrieval quality.
-            metadata (object, optional): Arbitrary metadata to associate with this validation for logging or analysis inside the Codex project.
+            messages (list[ChatCompletionMessageParam]): The full prompt given to your LLM that generated the response, in the OpenAI Chat Completions format.
+                This must include the final user message that triggered the AI response, as well as all other state supplied to your LLM (full conversation history, system instructions/prompt, retrieved context, etc.).
+            response (ChatCompletion | str): Your AI response that your LLM generated from the same `messages`. This is the response being evaluated, and should not appear in the `messages`.
+            query (str): The core user query that the `response` is answering, i.e. the latest user message in `messages`. Specifying the `query` separately (even though it also appears in `messages`) enables Cleanlab to match it against other users' queries (e.g. for serving expert answers), run certain Evals, and display the query in the Web Interface.
+            context (str): All retrieved context (e.g., from your RAG/retrieval/search system) that was supplied as part of `messages` for generating the LLM `response`. Specifying the `context` separately (even though it also appears in `messages`) enables Cleanlab to run certain Evals and display the retrieved context in the Web Interface.
+            rewritten_query (str, optional): An optional reformulation of `query` (e.g. to form a self-contained question out of a multi-turn conversation history) to improve retrieval quality. If you are using a query rewriter in your RAG system, you can provide its output here. If not provided, Cleanlab may internally do its own query rewrite when necessary.
+            metadata (object, optional): Arbitrary metadata to associate with this LLM `response` for logging/analytics inside the Project.
             tools (list[ChatCompletionToolParam], optional): Optional tools that were used to generate the response. This is useful for validating correct tool usage in the response.
-            eval_scores (dict[str, float], optional): Precomputed evaluation scores to bypass automatic scoring. Providing `eval_scores` for specific evaluations bypasses automated scoring and uses the supplied scores instead. Consider providing these scores if you already have them precomputed to reduce runtime.
+            eval_scores (dict[str, float], optional): Pre-computed evaluation scores for specific evaluations; these bypass automated scoring and are used instead. If you already have scores pre-computed, providing them can reduce runtime.
 
         Returns:
             ProjectValidateResponse: A structured object with the following fields:
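To make the rewritten docstring concrete, here is a hedged single-turn usage sketch of `validate()`, following the Args described above. The OpenAI model name, the message contents, and the `expert_answer` field access at the end are illustrative assumptions rather than part of this commit.

```python
# Single-turn RAG validation sketch; `project` is the Project instance
# constructed earlier. Contents and model name are placeholders.
from openai import OpenAI

openai_client = OpenAI()

context = "Our return window is 30 days from delivery."  # from your retriever (placeholder)
query = "What is your return policy?"

# The exact same messages your LLM sees, including the retrieved context.
messages = [
    {"role": "system", "content": f"Answer using only this context:\n{context}"},
    {"role": "user", "content": query},
]

# Generate the AI response from those messages.
response = openai_client.chat.completions.create(model="gpt-4o-mini", messages=messages)

# Validate against the same inputs; note the response being evaluated is
# passed separately and does not appear in `messages`.
result = project.validate(
    messages=messages,
    response=response,  # ChatCompletion object (a plain string also works, per the docstring)
    query=query,
    context=context,
)

# Per the docstring, an expert answer may be returned if a similar query was
# previously answered; the field name here is an assumption, not from this diff.
if getattr(result, "expert_answer", None):
    print(result.expert_answer)
```

Note how `query` and `context` repeat information already inside `messages`: per the Args above, passing them separately is what lets Cleanlab match the query against other users' queries, run retrieval-specific Evals, and display both fields in the Web Interface.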
