This repository is made for TogetherCrew's LLM bot.
Run our RAG evaluations locally or in GitHub Actions. Results are written to `results.csv` and `results_cost.json`.
Prerequisites:
- Create a `.env` file with your `OPENAI_API_KEY` (and any other required envs), for example:
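A minimal `.env` might look like this (assuming only the OpenAI key is required; add whatever other variables your setup needs):

```env
# Required: used by the evaluation run to call OpenAI models
OPENAI_API_KEY=your-openai-api-key
```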
Run:
`docker compose -f docker-compose.evaluation.yml up --build`

This will:
- Start a local Qdrant at port 6333
- Run `evaluation/evaluation.py --community-id 1234 --platform-id 4321`
- Persist `results.csv` and `results_cost.json` to the repo root on your host
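For reference, a rough sketch of what `docker-compose.evaluation.yml` does; service names, the build context, and the volume mount are illustrative assumptions, and the file checked into the repo is authoritative:

```yaml
services:
  qdrant:
    image: qdrant/qdrant      # local vector store, exposed on the default port
    ports:
      - "6333:6333"

  evaluation:
    build: .                  # illustrative; the real build context/Dockerfile may differ
    env_file:
      - .env                  # supplies OPENAI_API_KEY and any other required envs
    depends_on:
      - qdrant
    volumes:
      - ./:/app               # illustrative mount so results land in the repo root on the host
    command: >
      python evaluation/evaluation.py
      --community-id 1234 --platform-id 4321
```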
GitHub Actions:
- Workflow: RAG Evaluation (manual trigger)
- Steps performed:
- Boot a Qdrant service
- Install Python dependencies and spaCy model
- Run the evaluation
- Compute and publish averages (faithfulness, answer_relevancy, context_precision, context_recall) to the job summary
- Upload `results.csv` and `results_cost.json` as artifacts
Ensure `OPENAI_API_KEY` is set as a repository secret.
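A rough sketch of what such a workflow looks like; action versions, the Python version, the dependency-install commands, the spaCy model name, and the summary helper script are all assumptions, and the workflow file in the repo is the source of truth:

```yaml
name: RAG Evaluation
on: workflow_dispatch  # manual trigger

jobs:
  evaluate:
    runs-on: ubuntu-latest
    services:
      qdrant:
        image: qdrant/qdrant
        ports:
          - 6333:6333
    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies and spaCy model
        run: |
          pip install -r requirements.txt          # assumption: pip + requirements.txt
          python -m spacy download en_core_web_sm  # assumption: model name
      - name: Run evaluation
        run: python evaluation/evaluation.py --community-id 1234 --platform-id 4321
      - name: Publish metric averages to the job summary
        # hypothetical helper step; see the Python snippet further below
        run: python scripts/summarize_results.py >> "$GITHUB_STEP_SUMMARY"
      - uses: actions/upload-artifact@v4
        with:
          name: evaluation-results
          path: |
            results.csv
            results_cost.json
```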
Outputs:
- `results.csv`: per-sample evaluation results
- `results_cost.json`: aggregate token/cost info
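To inspect the outputs locally, something like the following works; the column names are assumed to match the metric names listed above, and the layout of the cost file depends on how `evaluation.py` writes it:

```python
import json

import pandas as pd

# Per-sample metrics: one row per evaluated question/answer pair.
results = pd.read_csv("results.csv")
metrics = ["faithfulness", "answer_relevancy", "context_precision", "context_recall"]
print(results[metrics].mean())  # the same averages the workflow publishes to its job summary

# Aggregate token/cost info; exact keys depend on the evaluation script.
with open("results_cost.json") as f:
    print(json.dumps(json.load(f), indent=2))
```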
Planned improvements:
- Fetch the Qdrant snapshot from S3 and persist it in the Docker Compose evaluation
- Fetch the test dataset from S3 and update `evaluation/evaluation.py` to load from S3 (configurable root); a sketch of the S3 piece follows this list
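A hypothetical sketch of that S3 piece, using `boto3` plus Qdrant's snapshot-upload endpoint; the bucket, object keys, local paths, and collection name are placeholders, and the restore mechanism is one of several options:

```python
import boto3
import requests

# Placeholders: real bucket/key/collection names would come from configuration.
BUCKET = "togethercrew-evaluation"            # hypothetical bucket
SNAPSHOT_KEY = "qdrant/collection.snapshot"   # hypothetical object key
DATASET_KEY = "datasets/test_dataset.json"    # hypothetical object key
COLLECTION = "community_1234"                 # hypothetical collection name

s3 = boto3.client("s3")

# Fetch the test dataset so evaluation.py can load it from a configurable root.
s3.download_file(BUCKET, DATASET_KEY, "evaluation/test_dataset.json")

# Fetch the Qdrant snapshot and restore it into the local Qdrant started by Compose.
s3.download_file(BUCKET, SNAPSHOT_KEY, "collection.snapshot")
with open("collection.snapshot", "rb") as snap:
    resp = requests.post(
        f"http://localhost:6333/collections/{COLLECTION}/snapshots/upload",
        files={"snapshot": snap},
    )
resp.raise_for_status()
```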