
Remove beta KL divergence from training loss#607

Merged
arcticfly merged 2 commits into main from fix/remove-beta-kl-divergence on Mar 10, 2026
Conversation

@arcticfly (Collaborator)

Summary

  • Remove the beta parameter and the Schulman KL divergence estimator, exp(r − n) − (r − n) − 1, where r and n are the reference and new per-token log-probabilities; this term was added directly to the training loss
  • The kl_penalty_coef mechanism (zero-mean advantage adjustment) remains as the preferred approach for KL regularization
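For context, a minimal sketch of the two mechanisms this PR distinguishes. The `schulman_k3_kl` function is the standard k3 estimator named in the summary (the term being removed from the loss); `kl_adjusted_advantages` is a hypothetical illustration of how a `kl_penalty_coef`-style zero-mean advantage adjustment can work — the function name and exact form are assumptions for illustration, not ART's actual implementation.

```python
import math


def schulman_k3_kl(ref_logprobs, new_logprobs):
    """Schulman k3 KL estimator, averaged over tokens.

    Per token: exp(r - n) - (r - n) - 1, where r and n are the
    reference and new log-probabilities. This is the quantity that
    was previously multiplied by beta and added to the training loss.
    """
    kls = [
        math.exp(r - n) - (r - n) - 1
        for r, n in zip(ref_logprobs, new_logprobs)
    ]
    return sum(kls) / len(kls)


def kl_adjusted_advantages(advantages, per_token_kl, kl_penalty_coef):
    """Hypothetical sketch of the retained mechanism: subtract a scaled
    KL penalty from each advantage, then re-center so the adjusted
    advantages have zero mean (keeping the gradient signal unbiased
    in expectation)."""
    adjusted = [a - kl_penalty_coef * k for a, k in zip(advantages, per_token_kl)]
    mean = sum(adjusted) / len(adjusted)
    return [a - mean for a in adjusted]
```

Note that the k3 estimator is always non-negative and is exactly zero when the policies agree, which is why it is attractive as an additive loss term; the advantage-adjustment route instead folds the penalty into the policy-gradient weights.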

Changes

  • src/art/types.py: Remove beta field from TrainConfig
  • src/art/loss.py: Remove mean_kl from Loss class and the KL divergence computation
  • src/art/local/backend.py: Remove beta parameter from LocalBackend.train()
  • src/art/serverless/backend.py: Remove beta parameter from ServerlessBackend.train()
  • src/art/unsloth/train.py: Remove beta * mean_kl loss addition and kl_div metric logging
  • src/art/megatron/train.py: Remove beta * mean_kl loss addition
  • src/art/preprocessing/inputs.py: Remove beta from warmup config override

Test plan

  • uv run prek run --all-files passes locally (ruff, ruff format, ty)
  • test_backend_train_api.py passed on an H200 GPU cluster — model registration, trajectory gathering, training, and logging all succeeded

🤖 Generated with Claude Code

Remove the Schulman KL estimator (beta * KL) that was added directly
to the training loss. The kl_penalty_coef mechanism (advantage
adjustment) remains as the preferred approach for KL regularization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@arcticfly arcticfly requested a review from corbt March 9, 2026 17:12
@arcticfly arcticfly merged commit d69345e into main Mar 10, 2026
5 checks passed
@arcticfly arcticfly deleted the fix/remove-beta-kl-divergence branch March 10, 2026 19:40