Scratchpad 3Q Reasoning: Surfacing Assumptions to Mitigate Hallucination and Improve Truthfulness in Large Language Models
A lightweight prompt-level framework that asks an LLM to produce a short, delimited scratchpad alongside a single concise final answer. The scratchpad exposes three short sections (What I Know, What I Need, What I Am Assuming) to make model reasoning auditable and to reduce hallucination in user-visible responses. The project includes a small reference prompt parser (utils/scratchpad.py), caching utilities, an optional local model server, and an evaluation harness using TruthfulQA.
- Prompt-level intervention (no model retraining required).
- Produces a machine-parseable scratchpad and a final answer separated by explicit markers (an illustrative layout is sketched after this list).
- Logs scratchpads for audit, analysis, and downstream hallucination detection research.
- Includes example integration with an evaluation harness (TruthfulQA).
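
For illustration only, a response in this format might look like the sketch below; the exact marker strings and section wording are defined by the prompt in utils/scratchpad.py, so treat these as placeholders:

```
=== SCRATCHPAD ===
What I Know: ...
What I Need: ...
What I Am Assuming: ...
=== FINAL ANSWER ===
...
```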
The project uses Python and depends on a few packages. Exact requirements depend on your chosen execution mode (OpenAI API vs. local model server).
Typical dependencies (install into a virtualenv):
```bash
pip install -r requirements.txt
```

- Create and activate a Python virtual environment.
- Install the dependencies listed above.
- Set environment variables:
- For OpenAI API usage (default mode):
```bash
export OPENAI_MODEL="gpt-4"      # or whichever model name you want to use
export OPENAI_API_KEY="sk-..."   # set your OpenAI API key in env
```

Example (Python):
```python
from utils.scratchpad import get_scratchpad_response

query = "Is it possible for water to flow uphill?"
final = get_scratchpad_response(query, max_tokens=200, temperature=0.0)
print("Final answer:\n", final)
```

If you prefer to access the full parsed scratchpad, you can call wrap_with_scratchpad_instruction() and parse_scratchpad_response() directly (see utils/scratchpad.py).
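
A minimal sketch of that lower-level flow follows; the signatures, keyword usage, and return type shown here are assumptions, so check utils/scratchpad.py for the real interface:

```python
from utils.scratchpad import wrap_with_scratchpad_instruction, parse_scratchpad_response

# Illustrative only: signatures and return type are assumptions, not the real API.
query = "Is it possible for water to flow uphill?"
prompt = wrap_with_scratchpad_instruction(query)  # assumed to add the 3Q scratchpad instructions to the query

def call_model(text: str) -> str:
    """Placeholder: swap in your OpenAI API call or local model server request."""
    raise NotImplementedError

raw = call_model(prompt)                 # raw completion containing scratchpad + final answer
parsed = parse_scratchpad_response(raw)  # assumed to separate the scratchpad sections from the final answer
```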
utils/eval_TruthfulQA.py provides a small harness that evaluates the model with and without the scratchpad. It uses a CustomModel wrapper that calls get_response() or get_scratchpad_response() depending on the scratchpad flag, as sketched below.
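
A minimal sketch of that flag-based dispatch; the import path for get_response(), the method name, and the keyword arguments are assumptions rather than the harness's actual code (the real wrapper also adapts this interface to deepeval):

```python
from utils.scratchpad import get_response, get_scratchpad_response  # import path for get_response is an assumption

class CustomModel:
    """Illustrative wrapper: routes a query through the scratchpad prompt when the flag is set."""

    def __init__(self, use_scratchpad: bool = False):
        self.use_scratchpad = use_scratchpad

    def generate(self, query: str) -> str:  # method name chosen for illustration
        if self.use_scratchpad:
            return get_scratchpad_response(query, max_tokens=200, temperature=0.0)
        return get_response(query, max_tokens=200, temperature=0.0)
```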
Run the evaluation (requires deepeval and the TruthfulQA dataset available via that package):
```bash
python utils/eval_TruthfulQA.py
```

The script will print the standard and scratchpad evaluation results, along with per-task scores.
