Added SFT LoRA notebook #4244
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lgtm! I just simplified the notebook (removed the metadata): 18k lines down to 500.
Excellent work, @sergiopaniego!! 🤗
This notebook is a fantastic initiative, with the right balance between accessibility and technical relevance. It is especially focused on beginner users, giving them a truly hands-on way to experiment with large models without needing expensive hardware or prior setup knowledge.
A few standout strengths:
- Beginner-friendly and engaging: The explanations, structure, and use of Colab make it easy for newcomers to learn by doing.
- Practical and relevant: LoRA and QLoRA are modern, efficient fine-tuning methods, and showing both is very instructive.
- Great ecosystem integration: Featuring Hugging Face tools like TRL, Transformers, and PEFT highlights the synergy within the ecosystem and encourages best practices.
- Well-scoped: Covers the essential steps (setup → fine-tuning → inference) without overwhelming users.
- Community-oriented: The tone and links to official TRL resources invite readers to explore and contribute further.
In summary, this notebook is not just a demo, but an ideal learning resource for new TRL users, both educational and inspiring! 💡🚀
| "source": [ | ||
| "Learn how to perform **Supervised Fine-Tuning (SFT)** with **LoRA/QLoRA** using **TRL**." | ||
| ] | ||
| }, |
What about adding an introductory section explaining the main concepts? This would help beginner users understand the key ideas before diving into installation or code. On the other hand, I would keep this section short (concise, bullet-style, perhaps with links to more extensive explanations), so users don't skip it for being too long and can grasp the essentials quickly before jumping into the code.
Something like:
## Key concepts
- SFT: Trains models from example input-output pairs to align behavior with human preferences.
- LoRA: Updates only a few low-rank parameters, reducing training cost and memory.
- QLoRA: A quantized version of LoRA that enables even larger models to fit on small GPUs.
- TRL: The Hugging Face library that makes fine-tuning and reinforcement learning simple and efficient.

Additionally, we could also define base model and LoRA/QLoRA adapter, which are mentioned in the notebook without being explained (see the configuration sketch below for how these pieces fit together in code).
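To make these concepts concrete, here is a minimal sketch of how they could fit together in TRL, assuming a recent TRL/PEFT installation; the model and dataset names are placeholders, not necessarily the ones used in the notebook:

```python
# Minimal sketch: SFT with a LoRA adapter via TRL + PEFT.
# Model and dataset names below are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=16,                         # rank of the low-rank update matrices
    lora_alpha=32,                # scaling factor applied to the update
    target_modules="all-linear",  # attach adapters to every linear layer
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",         # base model (placeholder)
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-lora-demo"),
    peft_config=peft_config,                    # only adapter weights are trained
)
trainer.train()
```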
| "outputs": [], | ||
| "source": [ | ||
| "# If using QLoRA\n", | ||
| "!pip install -Uq bitsandbytes\n", |
Maybe worth adding a sentence in the markdown above explaining what bitsandbytes is?
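For context, bitsandbytes is the library that provides the low-bit quantization QLoRA relies on. A hedged sketch of how it is typically wired in through Transformers (the model name is a placeholder, and the notebook's exact settings may differ):

```python
# Sketch: QLoRA-style 4-bit loading with bitsandbytes via Transformers.
# Requires a CUDA GPU with bitsandbytes installed; model is a placeholder.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as used by QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",   # placeholder base model
    quantization_config=bnb_config,
)
```

The frozen 4-bit base model can then be combined with a LoRA adapter exactly as in the sketch above, which is what lets even larger models fit on small GPUs.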
Co-authored-by: Albert Villanova del Moral <[email protected]>
Thanks for the feedback, I've updated the notebook based on it! I still have a few todos to cover (I've also added them at the top of the notebook). Once they're covered, we're ready for a final review 😄

@burtenshaw I've updated the dataset to
@qgallouedec should we completely clean the outputs in the notebook? From a reader's view, they might miss some details that are in the outputs (training traces, prints...)
@albertvillanova thanks for the review! I've included your suggestions!

TODOS:
I think you can keep the outputs that facilitate understanding and delete the others (pip installation logs, etc.).
Ready for final review! For the record, I cleaned the notebook following this answer to prevent GitHub from failing to render it: https://github.com/orgs/community/discussions/155944#discussioncomment-14611780
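One way to perform that cleanup, sketched here as an assumption based on the linked discussion (the usual culprit is stale ipywidgets state in the notebook metadata; the path is a placeholder):

```python
# Sketch: strip the ipywidgets state that can break GitHub's rendering.
# Assumes the issue is the "widgets" block in the notebook metadata.
import nbformat

path = "examples/notebooks/sft_trl_lora_qlora.ipynb"  # placeholder path
nb = nbformat.read(path, as_version=4)
nb.metadata.pop("widgets", None)  # remove the ipywidgets state block
nbformat.write(nb, path)
```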
Amazing work! Thanks, @sergiopaniego!
I think there is an issue with the image: https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/sft-lora-notebook-trackio.png
    request_id: 01K80DK5CG94YGDQW5KGG5KZ2H; (10) DB Error: dispatch; code: 2
Apart from that, everything OK from my side.




What does this PR do?
Add SFT LoRA notebook
Colab: https://colab.research.google.com/github/huggingface/trl/blob/sft-lora-notebook/examples/notebooks/sft_trl_lora_qlora.ipynb
The notebook is ready for early review.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.