
Weight processing/position emebeddings attention #1218

Merged
jlarson4 merged 2 commits into dev-3.x-canary from
weight-processing/position-emebeddings_attention
Mar 28, 2026

Conversation

@jlarson4
Collaborator

Description

Fixed a series of weight-processing bugs. Most were caused in HookedTransformer by the transition to transformers v5.

Hooked Transformer changes

  • Pythia/GPTNeoX & Phi – Explicit setting of default_prepend_bos=False
  • Qwen3 & Phi – Add Qwen3ForCausalLM to the add_bos_token exclusion list to avoid transformers v5 error
  • Phi – Added a try/except block so that a missing pad_token_id defaults to None (pad_token_id is required in transformers v5)
  • OLMo2 – Re-enabled center_unembed and fold_value_biases, which were erroneously disabled for all models
  • Gemma3 larger models – The 4B+ models use linear RoPE scaling (factor = 8.0) for global attention layers, but HookedTransformer applied standard frequencies. Added a rotary_scaling_factor config field, applied it in calculate_sin_cos_rotary for non-local layers, and set rotary_scaling_factor: 8.0 in the 4B, 12B, and 27B configs
  • Gemma 1 – Changed "gelu_new" to "gelu" in all four Gemma 1 config blocks to match the HuggingFace implementation used by TransformerBridge
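
The pad_token_id fix above can be sketched as follows. This is a minimal illustration of the fallback pattern the bullet describes, not the actual HookedTransformer source; the helper name and dummy tokenizer classes are invented for the example.

```python
def get_pad_token_id(tokenizer):
    # transformers v5 requires pad_token_id; fall back to None
    # when the tokenizer does not define it.
    try:
        return tokenizer.pad_token_id
    except AttributeError:
        return None

class DummyTokenizer:
    pad_token_id = 7

class BareTokenizer:
    pass  # no pad_token_id attribute at all

assert get_pad_token_id(DummyTokenizer()) == 7
assert get_pad_token_id(BareTokenizer()) is None
```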
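
The linear RoPE scaling change can be illustrated with a small sketch. This mirrors the idea described for Gemma3's global attention layers (divide positions by the scaling factor before computing angles); the function name and signature here are for illustration and do not match HookedTransformer's calculate_sin_cos_rotary exactly.

```python
import numpy as np

def sin_cos_rotary(rotary_dim, n_ctx, base=10000.0, scaling_factor=1.0):
    # Standard RoPE inverse frequencies: base^(-2i/d) for even i.
    inv_freq = base ** (-np.arange(0, rotary_dim, 2) / rotary_dim)
    positions = np.arange(n_ctx, dtype=np.float64)
    # Linear scaling divides positions by the factor (e.g. 8.0 for
    # Gemma3 4B+ global layers), stretching the rotation period.
    angles = np.outer(positions / scaling_factor, inv_freq)
    return np.sin(angles), np.cos(angles)

sin_std, cos_std = sin_cos_rotary(8, 16)
sin_scaled, cos_scaled = sin_cos_rotary(8, 16, scaling_factor=8.0)
# With factor 8, position 8 gets the same angles as position 1 unscaled.
assert np.allclose(sin_scaled[8], sin_std[1])
assert np.allclose(cos_scaled[8], cos_std[1])
```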

TransformerBridge changes

  • Adjusted positional embedding bridge to differentiate between models that use different canonical names for their o_proj
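
The o_proj differentiation could look something like the sketch below. The architecture-to-attribute mapping is hypothetical (these are common HuggingFace naming conventions, not taken from the PR diff), and the function name is invented for the example.

```python
# Hypothetical lookup: different HF architectures expose the attention
# output projection under different canonical attribute names.
O_PROJ_NAMES = {
    "LlamaForCausalLM": "o_proj",
    "GPT2LMHeadModel": "c_proj",
    "GPTNeoXForCausalLM": "dense",
}

def resolve_o_proj_name(architecture: str) -> str:
    # Fall back to the most common modern name when the
    # architecture is not in the table.
    return O_PROJ_NAMES.get(architecture, "o_proj")

assert resolve_o_proj_name("GPTNeoXForCausalLM") == "dense"
assert resolve_o_proj_name("SomeNewModel") == "o_proj"
```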

Type of change


  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jlarson4 jlarson4 merged commit 2c41b6c into dev-3.x-canary Mar 28, 2026
51 of 54 checks passed
