feat: Auto-rater allows results of previous checkers as context #57
Summary
This pull request introduces a mechanism that lets individual rating functions share their results with subsequent ratings. The primary motivation is to enable more context-aware evaluations: specifically, the LLM-based code quality auto-rater can now be influenced by the results of earlier ratings, such as the `safety-web` security scan.

Key Changes
Introduced `ratingsContext`:
- A `ratingsContext` object is created in the main `rateGeneratedCode` orchestration function (`runner/ratings/rate-code.ts`).
- Each rating's result is stored in the `ratingsContext` object, keyed by the rating's unique ID.
- […] (`runPerBuildRating`, `runPerFileRating`, `runLlmBasedRating`).

Created a Centralized `RatingsContext` Type:
- `RatingsContext` was created in `runner/ratings/rating-types.ts`.
- […] `Record<>` type.

Plumbed Context to Auto-Raters:
- `ratingsContext` is now passed through `codeQualityRating` (`runner/ratings/built-in-ratings/code-quality-rating.ts`) to the underlying `autoRateCode` function (`runner/ratings/autoraters/code-rater.ts`).
- It is also passed through `autoRateFiles` (`runner/ratings/autoraters/rate-files.ts`) to ensure it's available when the auto-rater is called from other entry points.

Implemented Context-Aware Prompting:
- In `autoRateCode`, the code now specifically checks the `ratingsContext` for the result of the `safety-web` rating.
- If the `safety-web` rating has executed, its entire result object is serialized to a JSON string.
- That string is exposed as the `SAFETY_WEB_RESULTS_JSON` variable, making the detailed security scan results available for interpolation in custom code rating prompts.

How to Use
A custom code rating prompt can now be configured in an environment's `config.js` and can access the security results like this:

    You are a code quality evaluator. Please assess the following code.
    A security scan was previously run on this code. Here are the results:
    {{ SAFETY_WEB_RESULTS_JSON }}
    Please take these security findings into account during your evaluation.
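
For reviewers who want a feel for the data flow, here is a minimal, self-contained sketch of the idea described above. It is not the code from this PR. The `RatingResult` fields, the helper names (`recordResult`, `safetyWebResultsJson`, `interpolate`), the empty-string fallback, and the regex-based placeholder substitution are all assumptions made for illustration; the real types live in `runner/ratings/rating-types.ts` and the real prompt templating may work differently.

```typescript
// Hypothetical result shape; the actual fields in the repo may differ.
interface RatingResult {
  id: string;
  score: number; // assumed field, for illustration only
  message?: string; // assumed field, for illustration only
}

// Assumed to be a Record keyed by rating ID, per the PR description.
type RatingsContext = Record<string, RatingResult | undefined>;

// Store a finished rating's result so that later ratings can read it.
function recordResult(context: RatingsContext, result: RatingResult): void {
  context[result.id] = result;
}

// Serialize the safety-web result if that rating has already run.
// Returning an empty string otherwise is an assumption of this sketch.
function safetyWebResultsJson(context: RatingsContext): string {
  const result = context['safety-web'];
  return result ? JSON.stringify(result, null, 2) : '';
}

// Stand-in for prompt templating: replaces {{ NAME }} placeholders and
// leaves unknown placeholders untouched.
function interpolate(template: string, vars: Map<string, string>): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (match: string, name: string) => vars.get(name) ?? match,
  );
}

// Simulate an earlier rating finishing, then build the auto-rater prompt.
const ratingsContext: RatingsContext = {};
recordResult(ratingsContext, {
  id: 'safety-web',
  score: 0.5,
  message: 'example finding',
});

const prompt = interpolate(
  'A security scan was previously run. Results: {{ SAFETY_WEB_RESULTS_JSON }}',
  new Map([['SAFETY_WEB_RESULTS_JSON', safetyWebResultsJson(ratingsContext)]]),
);
console.log(prompt);
```

Keying the context by rating ID means a later rating only needs to know the ID of the rating it depends on (here `safety-web`), and a missing entry simply signals that the earlier rating has not run.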