  fuse_qk_norm_rope,  # If fuse_qk_norm_rope is true, do not apply fused RoPE in attention OP, and self.rotary_emb will be skipped in the overridden apply_rope.
  layer_idx=layer_idx,
  dtype=config.torch_dtype,
  dense_bias=config.attention_bias,
  config=model_config,
  )

- # If fuse_qk_norm_rope is true, we pass pos_embd_params=None to super().__init__,
- # so we need to do assignment to record the actual pos_embd_params.
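The comments removed in this hunk describe how fuse_qk_norm_rope was handled: when it is true, pos_embd_params=None is passed to super().__init__ so the attention OP applies no RoPE, and the actual pos_embd_params is then recorded by a plain assignment. Below is a minimal, self-contained sketch of that pattern; BaseAttention, PosEmbdParams, and FusedQkNormRopeAttention are illustrative stand-ins, not the repository's actual classes.

from dataclasses import dataclass
from typing import Optional

@dataclass
class PosEmbdParams:  # stand-in for the real positional embedding parameters
    rope_theta: float = 10000.0

class BaseAttention:  # stand-in for the Attention base class touched by this diff
    def __init__(self, pos_embd_params: Optional[PosEmbdParams] = None):
        # When pos_embd_params is None, the attention OP applies no RoPE.
        self.pos_embd_params = pos_embd_params

class FusedQkNormRopeAttention(BaseAttention):
    def __init__(self, pos_embd_params: PosEmbdParams, fuse_qk_norm_rope: bool = True):
        # Pass pos_embd_params=None to the base class when QK norm and RoPE
        # are fused outside the attention OP.
        super().__init__(pos_embd_params=None if fuse_qk_norm_rope else pos_embd_params)
        if fuse_qk_norm_rope:
            # Record the actual pos_embd_params so an overridden apply_rope
            # can still read the rotary parameters for the fused path.
            self.pos_embd_params = pos_embd_params

attn = FusedQkNormRopeAttention(PosEmbdParams())
assert attn.pos_embd_params is not None  # recorded despite passing None to the base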
  num_key_value_heads (int): The number of key value heads.
  max_position_embeddings (int): The maximum position embeddings.
  bias (bool): Whether to use bias in the linear layers.
- pos_embd_params (PositionalEmbeddingParams): The positional embedding parameters.
- qk_norm_type (QkNormType): The type of QK normalization.
- layer_idx (int): The layer index.
+ pos_embd_params (Optional[PositionalEmbeddingParams]): The positional embedding parameters.
+ rope_fusion (Optional[bool]): Whether to fuse RoPE into the attention OP and skip applying unfused RoPE. If None, fusion is decided by the capability of the attention backend.
+ layer_idx (Optional[int]): The layer index.
  dtype (torch.dtype): The data type.
- dense_bias (bool): Whether to use bias in the output projection layer.
- config (ModelConfig): The model configuration.
+ dense_bias (Optional[bool]): Whether to use bias in the output projection layer.
+ config (Optional[ModelConfig]): The model configuration.
  q_scaling (float): The scaling factor for qk_scale. The definition is $O = softmax(QK^T * qk_scale) * V$, where $qk_scale = 1 / (sqrt(head_dim) * q_scaling)$. The default value is 1.0.
- attention_chunk_size (int): See [Chunked Attention] below.
+ attention_chunk_size (Optional[int]): See [Chunked Attention] below.
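To make the q_scaling entry above concrete, here is a small self-contained sketch of how qk_scale follows from head_dim and q_scaling; the function name is illustrative and not part of the documented API.

import math

def qk_scale(head_dim: int, q_scaling: float = 1.0) -> float:
    # qk_scale = 1 / (sqrt(head_dim) * q_scaling), used as O = softmax(Q K^T * qk_scale) V
    return 1.0 / (math.sqrt(head_dim) * q_scaling)

# With the default q_scaling of 1.0 this reduces to the usual 1/sqrt(head_dim).
assert math.isclose(qk_scale(128), 1.0 / math.sqrt(128))
# A larger q_scaling shrinks the scale applied to Q K^T before the softmax.
assert math.isclose(qk_scale(128, q_scaling=2.0), 0.5 / math.sqrt(128))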