This HoarePrompt-Experiments guide provides instructions for running the HoarePrompt tool over an entire dataset.
Before running the project, ensure that the necessary environment variables are set, including GROQ_API_KEY and OPENAI_API_KEY. These keys are essential for accessing the respective APIs that power the LLM services used in this experiment.
It is recommended to create a virtual environment to ensure dependency isolation and avoid conflicts with other projects. Follow these steps:
```bash
# Create a virtual environment
python3 -m venv hoareprompt-env

# Activate the virtual environment
source hoareprompt-env/bin/activate
```

Once your virtual environment is activated, install the necessary dependencies:

```bash
pip install -r requirements.txt
```

Additionally, install the hoareprompt package locally using an editable install:

```bash
pip install -e /path/to/hoareprompt
```

Replace `/path/to/hoareprompt` with the actual path to the `hoareprompt` directory from the GitHub repository. The `-e` flag enables editable mode, meaning any changes made to the `hoareprompt` package are immediately reflected without requiring reinstallation.
Depending on the LLM service you intend to use, set the appropriate environment variables:
For OpenAI models:

```bash
export OPENAI_API_KEY="your-openai-api-key"
```

For Groq models:

```bash
export GROQ_API_KEY="your-groq-api-key"
```

To persist these variables across terminal sessions, add the above export commands to your `.bashrc` or `.zshrc` file.
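As a quick sanity check before launching an experiment, you can verify that the relevant key is actually visible to the process. The helper below is a small illustrative sketch, not part of the HoarePrompt codebase:

```python
import os

def missing_keys(required=("OPENAI_API_KEY", "GROQ_API_KEY")):
    """Return the names of required API-key variables that are unset or empty."""
    return [key for key in required if not os.environ.get(key)]

# Example: warn about any key that still needs to be exported.
for key in missing_keys():
    print(f"warning: {key} is not set")
```

If you only use one LLM service, pass just that key name in `required`.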
This guide shows how to efficiently run batches of test cases supplied in JSON format. Configuration options are the same as those detailed in the main HoarePrompt README (../README.md).
The input data file should be a JSON file containing test cases with the following fields:
- `description`: Natural language description of the problem.
- `correct`: Ground truth label indicating correctness.
- `unique_id`: Unique identifier for the test case.
- `task_id`: Identifier related to the problem set.
- `generated_code`: The program code to be evaluated.
You can use our dataset located at ../Results/Dataset.
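A minimal input file holding a single test case might look like the sketch below; all field values are invented for illustration and are not taken from the actual dataset:

```python
import json

# One illustrative test case with the five required fields.
# Every value here is a placeholder, not real dataset content.
test_case = {
    "description": "Return the sum of two integers a and b.",
    "correct": True,
    "unique_id": "example-0001",
    "task_id": "task-0",
    "generated_code": "def add(a, b):\n    return a + b",
}

# The data file is a JSON list of such test cases.
with open("input.json", "w") as f:
    json.dump([test_case], f, indent=2)
```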
Execute the following command, specifying the data file, log directory, and configuration file paths:
```bash
python3 -m src.main_verify --data /path/to/data/file --log /path/to/log/dir --config /path/to/config/file
```

- `--data`: Path to the input JSON data file.
- `--log`: Directory where logs will be stored.
- `--config`: Path to the configuration file.

For example:

```bash
python3 -m src.main_verify --data data/input.json --log logs/experiment1 --config configs/custom_config.json
```

If you do not provide the `--config` argument, the default configuration file (`default_config.json`) will be used.
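Before starting a long run, it can save time to confirm that every test case carries the expected fields. The validator below is a hypothetical helper based on the field list above; it is not part of the experiment scripts:

```python
import json

# Field names taken from the input-format description in this guide.
REQUIRED_FIELDS = {"description", "correct", "unique_id", "task_id", "generated_code"}

def invalid_cases(path):
    """Return the indices of test cases that lack one or more required fields."""
    with open(path) as f:
        cases = json.load(f)
    return [i for i, case in enumerate(cases)
            if not REQUIRED_FIELDS <= set(case)]
```

Running this check on your data file before invoking `main_verify` surfaces malformed entries up front rather than partway through an experiment.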
The main_verify script should be used for running HoarePrompt with all available classifiers except for tester.
If you want to use the tester classifier, run the following command with the corresponding tester configuration:
```bash
python3 -m src.main_tester --data /path/to/data/file --log /path/to/log/dir --config /path/to/config/file
```

This ensures compatibility when testing HoarePrompt or HoarePrompt-No-Unroll configurations, depending on the specified setup.
This guide provides a streamlined approach for executing HoarePrompt experiments efficiently. Refer to the main HoarePrompt documentation (../README.md) for more in-depth details on configuration options and parameter settings.
The HoarePrompt project consists of multiple repositories, each serving a specific function:
- The core repository containing the implementation of HoarePrompt, including the logic for analyzing programs, state propagation, and correctness verification.
- A dedicated repository for storing experimental results and datasets related to HoarePrompt evaluations, including correctness assessments, counterexamples, and other findings from our testing.
- HoarePrompt-experiments (this repository): scripts and configurations to run large-scale experiments with HoarePrompt on various datasets, including automated evaluation scripts and batch processing tools.
- CoCoClaNeL: the dataset and associated documentation.
This dataset is released under the MIT License; see the LICENSE file for details.
