README.md (1 addition, 1 deletion)
@@ -144,7 +144,7 @@ torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
### Warning
`fsdp_transformer_layer_cls_to_wrap` must be set to the name of the specific decoder layer.
The LLaMA Hugging Face PR is not stable.
- Earlier commits used the name `LLaMADecoderLayer` for their decoder layer (the commit hash our code is based on this).
+ Earlier commits used the name `LLaMADecoderLayer` for their decoder layer (as did the commit hash our code is based on).
More recent commits use `LlamaDecoderLayer` (note the capitalization difference).
Not setting `fsdp_transformer_layer_cls_to_wrap` to the correct name will lead to drastic slowdowns in training.
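
A minimal way to check which name your installed LLaMA code actually uses is to list the decoder-layer classes in the modeling module. This is a sketch under one assumption: that the LLaMA port is already merged into your `transformers` install at the standard module path; older checkouts ship the modeling code elsewhere, so adjust the import accordingly.

```python
import inspect

# Module path assumes a transformers version that includes the merged
# LLaMA port; adjust for earlier commits of the LLaMA PR.
import transformers.models.llama.modeling_llama as modeling_llama

# Print every class whose name ends in "DecoderLayer", e.g.
# LlamaDecoderLayer (recent commits) or LLaMADecoderLayer (earlier ones).
for name, _ in inspect.getmembers(modeling_llama, inspect.isclass):
    if name.endswith("DecoderLayer"):
        print(name)
```

Pass whatever name this prints to `fsdp_transformer_layer_cls_to_wrap` in the `torchrun` command above.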
