
Remove beta KL divergence from training loss#607

Merged
arcticfly merged 2 commits into main from fix/remove-beta-kl-divergence on Mar 10, 2026
Conversation

@arcticfly (Collaborator)

Summary

  • Remove the beta parameter and the Schulman KL divergence estimator, exp(r − n) − (r − n) − 1, where r and n are the reference and new per-token log-probabilities; this term was added directly to the training loss
  • The kl_penalty_coef mechanism (zero-mean advantage adjustment) remains as the preferred approach for KL regularization
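For context, a minimal sketch of the two mechanisms this PR distinguishes. The `schulman_k3_kl` function is the standard k3 estimator named in the summary (the term being removed from the loss); `kl_adjusted_advantages` is a hypothetical illustration of how a `kl_penalty_coef`-style zero-mean advantage adjustment can work — the function name and exact form are assumptions for illustration, not ART's actual implementation.

```python
import math


def schulman_k3_kl(ref_logprobs, new_logprobs):
    """Schulman k3 KL estimator, averaged over tokens.

    Per token: exp(r - n) - (r - n) - 1, where r and n are the
    reference and new log-probabilities. This is the quantity that
    was previously multiplied by beta and added to the training loss.
    """
    kls = [
        math.exp(r - n) - (r - n) - 1
        for r, n in zip(ref_logprobs, new_logprobs)
    ]
    return sum(kls) / len(kls)


def kl_adjusted_advantages(advantages, per_token_kl, kl_penalty_coef):
    """Hypothetical sketch of the retained mechanism: subtract a scaled
    KL penalty from each advantage, then re-center so the adjusted
    advantages have zero mean (keeping the gradient signal unbiased
    in expectation)."""
    adjusted = [a - kl_penalty_coef * k for a, k in zip(advantages, per_token_kl)]
    mean = sum(adjusted) / len(adjusted)
    return [a - mean for a in adjusted]
```

Note that the k3 estimator is always non-negative and is exactly zero when the policies agree, which is why it is attractive as an additive loss term; the advantage-adjustment route instead folds the penalty into the policy-gradient weights.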

Changes

  • src/art/types.py: Remove beta field from TrainConfig
  • src/art/loss.py: Remove mean_kl from Loss class and the KL divergence computation
  • src/art/local/backend.py: Remove beta parameter from LocalBackend.train()
  • src/art/serverless/backend.py: Remove beta parameter from ServerlessBackend.train()
  • src/art/unsloth/train.py: Remove beta * mean_kl loss addition and kl_div metric logging
  • src/art/megatron/train.py: Remove beta * mean_kl loss addition
  • src/art/preprocessing/inputs.py: Remove beta from warmup config override

Test plan

  • uv run prek run --all-files passes locally (ruff, ruff format, ty)
  • test_backend_train_api.py passed on an H200 GPU cluster — model registration, trajectory gathering, training, and logging all succeeded

🤖 Generated with Claude Code

Remove the Schulman KL estimator (beta * KL) that was added directly
to the training loss. The kl_penalty_coef mechanism (advantage
adjustment) remains as the preferred approach for KL regularization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@arcticfly arcticfly requested a review from corbt March 9, 2026 17:12
@arcticfly arcticfly merged commit d69345e into main Mar 10, 2026
5 checks passed
@arcticfly arcticfly deleted the fix/remove-beta-kl-divergence branch March 10, 2026 19:40