Added SFT LoRA notebook #4244
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lgtm! I just simplified the notebook (removed the metadata): 18k lines down to 500.
Excellent work, @sergiopaniego!! 🤗
This notebook is a fantastic initiative, with the right balance between accessibility and technical relevance. It is especially focused on beginner users, giving them a truly hands-on way to experiment with large models without needing expensive hardware or prior setup knowledge.
A few standout strengths:
- Beginner-friendly and engaging: The explanations, structure, and use of Colab make it easy for newcomers to learn by doing.
- Practical and relevant: LoRA and QLoRA are modern, efficient fine-tuning methods, and showing both is very instructive.
- Great ecosystem integration: Featuring Hugging Face tools like TRL, Transformers, and PEFT highlights the synergy within the ecosystem and encourages best practices.
- Well-scoped: Covers the essential steps (setup → fine-tuning → inference) without overwhelming users.
- Community-oriented: The tone and links to official TRL resources invite readers to explore and contribute further.
In summary, this notebook is not just a demo, but an ideal learning resource for new TRL users, both educational and inspiring! 💡🚀
| "source": [ | ||
| "Learn how to perform **Supervised Fine-Tuning (SFT)** with **LoRA/QLoRA** using **TRL**." | ||
| ] | ||
| }, |
What about adding an introductory section explaining the main concepts? This would help beginner users understand the key ideas before diving into installation or code. On the other hand, I would keep this section short (concise, bullet-style, perhaps with links to more extensive explanations), so users don't skip it for being too long and can grasp the essentials quickly before jumping into the code.
Something like:
## Key concepts
- SFT: Trains models from example input-output pairs to align behavior with human preferences.
- LoRA: Updates only a few low-rank parameters, reducing training cost and memory.
- QLoRA: A quantized version of LoRA that enables even larger models to fit on small GPUs.
- TRL: The Hugging Face library that makes fine-tuning and reinforcement learning simple and efficient.

Additionally, we could also define base model and LoRA/QLoRA adapter, which are mentioned in the notebook without being explained (see the configuration sketch below for how these pieces fit together in code).
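To make these concepts concrete, here is a minimal sketch of how they could fit together in TRL, assuming a recent TRL/PEFT installation; the model and dataset names are placeholders, not necessarily the ones used in the notebook:

```python
# Minimal sketch: SFT with a LoRA adapter via TRL + PEFT.
# Model and dataset names below are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=16,                         # rank of the low-rank update matrices
    lora_alpha=32,                # scaling factor applied to the update
    target_modules="all-linear",  # attach adapters to every linear layer
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",         # base model (placeholder)
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-lora-demo"),
    peft_config=peft_config,                    # only adapter weights are trained
)
trainer.train()
```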
| "outputs": [], | ||
| "source": [ | ||
| "# If using QLoRA\n", | ||
| "!pip install -Uq bitsandbytes\n", |
Maybe worth adding a sentence in the markdown above explaining what bitsandbytes is?
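For context, bitsandbytes is the library that provides the low-bit quantization QLoRA relies on. A hedged sketch of how it is typically wired in through Transformers (the model name is a placeholder, and the notebook's exact settings may differ):

```python
# Sketch: QLoRA-style 4-bit loading with bitsandbytes via Transformers.
# Requires a CUDA GPU with bitsandbytes installed; model is a placeholder.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",   # placeholder base model
    quantization_config=bnb_config,
)
```

The frozen 4-bit base model can then be combined with a LoRA adapter exactly as in the sketch above, which is what lets even larger models fit on small GPUs.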
Co-authored-by: Albert Villanova del Moral <[email protected]>
Thanks for the feedback, I've updated the notebook based on it! I still have a few todos to cover (I've also added them at the top of the notebook). Once they're covered, we're ready for a final review 😄

@burtenshaw I've updated the dataset to
@qgallouedec should we completely clean the outputs in the notebook? From a reader's view, they might miss some details that are in the outputs (training traces, prints...)
@albertvillanova thanks for the review! I've included your suggestions!

TODOS:
I think you can keep the outputs that facilitate understanding and delete the others (pip installation logs, etc.).
Ready for final review! For the record, I cleaned the notebook following this answer to prevent GitHub from failing to render it: https://github.com/orgs/community/discussions/155944#discussioncomment-14611780
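One way to perform that cleanup, sketched here as an assumption based on the linked discussion (the usual culprit is stale ipywidgets state in the notebook metadata; the path is a placeholder):

```python
# Sketch: strip the ipywidgets state that can break GitHub's rendering.
# Assumes the issue is the "widgets" block in the notebook metadata.
import nbformat

path = "examples/notebooks/sft_trl_lora_qlora.ipynb"  # placeholder path
nb = nbformat.read(path, as_version=4)
nb.metadata.pop("widgets", None)  # remove the ipywidgets state block
nbformat.write(nb, path)
```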
Amazing work! Thanks, @sergiopaniego!
I think there is an issue with the image: https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/sft-lora-notebook-trackio.png
    request_id: 01K80DK5CG94YGDQW5KGG5KZ2H; (10) DB Error: dispatch; code: 2
Apart from that, everything OK from my side.




What does this PR do?
Add SFT LoRA notebook
Colab: https://colab.research.google.com/github/huggingface/trl/blob/sft-lora-notebook/examples/notebooks/sft_trl_lora_qlora.ipynb
The notebook is ready for early review.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.