Skip to content

Conversation

@BlueCrescent
Copy link
Collaborator

What does this PR do?

Mainly fixes a bug causing RotaryTransform to be initialized wrongly when using deferred initialization.
Previously, the inverse frequencies would be initialized to zero (or NaN when using deterministic algorithms).

General Changes

  • Fixed RotaryTransform bug.
  • Added a corresponding test for deferred initialization.
  • Fixed bug in CausalSelfAttention init().
  • Removed some typos.
  • Added utility functions that were very helpful in debugging this.

Breaking Changes

  • None

Checklist before submitting final PR

  • My PR is minimal and addresses one issue in isolation
  • I have merged the latest version of the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have checked that all tests run through (python tests/tests.py)
  • I have updated the internal changelog (CHANGELOG_DEV.md)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a critical bug in the RotaryTransform class that caused incorrect initialization when using deferred initialization (meta device), where inverse frequencies would be initialized to zero or NaN. The PR also corrects several typos and adds debugging utilities.

Key changes:

  • Fixed RotaryTransform to properly handle deferred initialization by creating a reset_parameters method that respects the device context
  • Added a test to verify deferred initialization produces the same weights as eager initialization
  • Corrected attribute name from attention_config.attention_config to attention_config.qk_norm_config in CausalSelfAttention
  • Fixed multiple spelling errors: "stoppping" → "stopping" and "savig" → "saving"

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/modalities/models/gpt2/gpt2_model.py Fixed RotaryTransform deferred initialization bug and corrected CausalSelfAttention attribute name typo
tests/nn/model_initialization/test_deferred_initialization.py Added comprehensive test to verify deferred vs eager initialization produces identical results
tests/utility.py Added debug utilities: deterministic CUDA context manager and NaN detection hooks
src/modalities/gym.py Fixed typo in parameter name: early_stoppping → early_stopping
src/modalities/checkpointing/checkpoint_saving_strategies.py Fixed typo in parameter name across multiple methods
src/modalities/checkpointing/checkpoint_saving.py Fixed typo in parameter name and docstring

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants