
Member

@albertvillanova commented Oct 2, 2025

Replace setup.cfg with pyproject.toml, as discussed internally in the HF open-source team.

This PR replaces the legacy setup.cfg configuration with a modern pyproject.toml setup.

Additionally, this PR fixes a long-standing packaging issue: unintended modules (examples, scripts, and tests) were being included in the distributed package. For example, installing trl placed a top-level scripts directory in site-packages:

In [1]: !pip install trl

In [2]: !ls /usr/local/lib/python3.12/dist-packages/scripts
Out[2]:
add_copyrights.py             generate_zen_image_dataset.py
generate_harmony_dataset.py   generate_zen_multi_image_dataset.py
generate_tiny_models.py       log_example_reports.py
generate_toolcall_dataset.py  log_reports.py
generate_zen_dataset.py       __pycache__

These have now been properly excluded to ensure only the intended source code is packaged.
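The listing above shows why this matters: everything at the root of a wheel is installed directly into site-packages, so a stray scripts/ directory in the wheel becomes a top-level scripts package on the user's system. Below is a minimal Python sketch (standard library only) of a check for this. It assumes the only expected top-level package is trl; the helper names, EXPECTED_TOP_LEVEL, and the fake wheel contents are hypothetical, and an in-memory zip stands in for a real .whl file.

```python
import io
import zipfile

# Hypothetical allow-list: the only top-level package we expect to ship.
EXPECTED_TOP_LEVEL = {"trl"}


def top_level_names(wheel_bytes: bytes) -> set[str]:
    """First path component of every entry in a wheel (a wheel is a zip archive)."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        return {name.split("/", 1)[0] for name in zf.namelist()}


def unexpected_top_level(wheel_bytes: bytes) -> set[str]:
    """Top-level names that are neither expected packages nor wheel metadata.

    *.dist-info and *.data directories are part of the wheel format, so they are ignored.
    """
    return {
        name
        for name in top_level_names(wheel_bytes)
        if name not in EXPECTED_TOP_LEVEL and not name.endswith((".dist-info", ".data"))
    }


def make_fake_wheel(paths: list[str]) -> bytes:
    """Build an in-memory zip standing in for a real .whl file (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path in paths:
            zf.writestr(path, "")
    return buf.getvalue()


if __name__ == "__main__":
    leaky = make_fake_wheel(
        [
            "trl/__init__.py",
            "scripts/log_reports.py",  # mirrors the listing above
            "examples/scripts/sft.py",  # hypothetical path
            "trl-0.0.0.dist-info/METADATA",  # hypothetical version
        ]
    )
    print(sorted(unexpected_top_level(leaky)))  # ['examples', 'scripts']
```

On a real build, one could instead read the bytes of the wheel that python -m build writes into dist/ and pass them to unexpected_top_level.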

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines 117 to 119
[tool.setuptools]
include-package-data = true

Member


Suggested change
[tool.setuptools]
include-package-data = true
[tool.setuptools.package-data]
"trl" = [
"templates/*.md",
"accelerate_configs/*.yaml",
"LICENSE",
"CONTRIBUTING.md",
"README.md",
]
[tool.setuptools.exclude-package-data]
"*" = ["__pycache__"]

Maybe we can also drop MANIFEST.in and keep everything here.

pyproject.toml
version = { file = "VERSION" }

[tool.setuptools.packages.find]
where = ["."]
Member


Suggested change
where = ["."]
where = ["trl*"]

Otherwise, the build will include the examples, which we don't want.

Member


and maybe you don't even need exclude = ["tests*"] in this case
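For reference, in setuptools' [tool.setuptools.packages.find] table, where lists the directories to search, while include and exclude hold glob patterns matched against dotted package names. The Python sketch below is a simplified model of that include/exclude filtering (assuming case-sensitive, fnmatch-style matching); it is not setuptools' own code, and the discovered package list is hypothetical. It shows why, once only trl* packages are included, an extra exclude = ["tests*"] changes nothing.

```python
from fnmatch import fnmatchcase


def filter_packages(packages, include=("*",), exclude=()):
    """Keep dotted package names that match any include pattern and no exclude pattern.

    Simplified model of setuptools' include/exclude filtering; not its actual code.
    """
    return [
        pkg
        for pkg in packages
        if any(fnmatchcase(pkg, pat) for pat in include)
        and not any(fnmatchcase(pkg, pat) for pat in exclude)
    ]


if __name__ == "__main__":
    # Hypothetical packages found under the project root.
    discovered = [
        "trl", "trl.trainer", "trl.models",
        "examples", "examples.scripts", "scripts", "tests",
    ]

    # Only excluding tests still ships examples and scripts.
    print(filter_packages(discovered, exclude=["tests*"]))
    # Restricting to trl* drops examples, scripts, and tests in one go.
    print(filter_packages(discovered, include=["trl*"]))  # ['trl', 'trl.trainer', 'trl.models']
    # Adding exclude=["tests*"] on top yields the same result.
    print(filter_packages(discovered, include=["trl*"], exclude=["tests*"]))
```

The design point is that an allow-list (include) is more robust than a deny-list (exclude): new top-level directories such as scripts are left out by default instead of having to be excluded one by one.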

Member


I'm realizing that setup.cfg was actually mistakenly including examples and scripts.

@albertvillanova
Member Author

I'm realizing that setup.cfg was actually mistakenly including examples and scripts.

Yes, @qgallouedec!!

Initially, my goal was just to replicate the previous behavior while migrating from setup.cfg to pyproject.toml.

But after looking deeper, I noticed that we were previously packaging and distributing unintended modules!

I did some investigation and found that we're not the only ones. For example, see the issue I opened in spanner-graph-notebook (which is preinstalled in all Colab environments):

So this PR will actually fix a long-standing packaging issue on our side as well.

@albertvillanova
Member Author

albertvillanova commented Oct 6, 2025

In the end, I had to keep MANIFEST.in because it is the only way to exclude tests from being packaged, given the default behavior of the setuptools backend. See: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html#controlling-files-in-the-distribution

More precisely, the following files are included in a source distribution by default:
...

  • Files that match the following glob patterns: tests/test*.py, test/test*.py;
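To make the quoted default concrete, here is a small Python sketch that models it: the two default globs pick up top-level test modules, and a MANIFEST.in prune directive removes everything under a directory. It assumes MANIFEST.in directives are applied after the default file list is assembled (which is what lets MANIFEST.in remove these files at all) and uses a line such as prune tests purely as an example; the actual directives in this PR's MANIFEST.in are not shown here, so the helper names and file lists are illustrative only.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Default sdist globs for test modules, as quoted from the setuptools docs.
DEFAULT_TEST_GLOBS = ("tests/test*.py", "test/test*.py")


def glob_match(path: str, pattern: str) -> bool:
    """Match path against pattern one component at a time, so '*' never crosses '/'."""
    path_parts = PurePosixPath(path).parts
    pattern_parts = PurePosixPath(pattern).parts
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(p, q) for p, q in zip(path_parts, pattern_parts)
    )


def default_test_files(files: list[str]) -> list[str]:
    """Files that the default test globs alone would add to the sdist."""
    return [f for f in files if any(glob_match(f, g) for g in DEFAULT_TEST_GLOBS)]


def prune(files: list[str], directory: str) -> list[str]:
    """Model of a MANIFEST.in 'prune <directory>' line: drop every file under it."""
    prefix = directory.rstrip("/") + "/"
    return [f for f in files if not f.startswith(prefix)]


if __name__ == "__main__":
    # Hypothetical project files.
    project_files = [
        "trl/__init__.py",
        "tests/test_sft_trainer.py",
        "tests/testing_utils.py",
        "tests/data/sample.json",
    ]
    # Package sources come from other rules; only the test globs are modeled here.
    sdist = ["trl/__init__.py"] + default_test_files(project_files)
    print(sdist)  # ['trl/__init__.py', 'tests/test_sft_trainer.py', 'tests/testing_utils.py']
    print(prune(sdist, "tests"))  # ['trl/__init__.py']
```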

@albertvillanova changed the title from "Replace setup with pyproject" to "Replace setup with pyproject and fix packaging unintended modules" Oct 6, 2025
Member

@qgallouedec left a comment


Ok, this solution works for me. I’ve tested it locally and can confirm that the packaged version contains exactly the required files.
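A local check like the one described can be approximated by listing the files in the built source distribution and flagging anything that should not be there. The Python sketch below assumes an sdist packaged as a .tar.gz with a single <name>-<version>/ root directory and treats tests/, examples/ and scripts/ as forbidden prefixes; FORBIDDEN_PREFIXES, the helpers, and the fake archive are hypothetical. A real run would read the archive that python -m build --sdist writes to dist/.

```python
import io
import tarfile

# Hypothetical list of path prefixes that should never appear in the sdist.
FORBIDDEN_PREFIXES = ("tests/", "examples/", "scripts/")


def sdist_files(sdist_bytes: bytes) -> set[str]:
    """Regular files in a .tar.gz sdist, with the leading '<name>-<version>/' stripped."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as tf:
        return {
            member.name.split("/", 1)[1]
            for member in tf.getmembers()
            if member.isfile() and "/" in member.name
        }


def forbidden_files(files, prefixes=FORBIDDEN_PREFIXES) -> list[str]:
    """Sorted list of packaged files that start with any forbidden prefix."""
    return sorted(f for f in files if f.startswith(prefixes))


def make_fake_sdist(root: str, paths: list[str]) -> bytes:
    """Build an in-memory .tar.gz with empty files under root/ (illustration only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for path in paths:
            info = tarfile.TarInfo(name=f"{root}/{path}")
            info.size = 0
            tf.addfile(info, io.BytesIO(b""))
    return buf.getvalue()


if __name__ == "__main__":
    archive = make_fake_sdist(
        "trl-0.0.0",  # hypothetical name-version root
        ["pyproject.toml", "trl/__init__.py", "tests/test_utils.py"],
    )
    print(forbidden_files(sdist_files(archive)))  # ['tests/test_utils.py']
```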

@albertvillanova merged commit 56a8f11 into huggingface:main Oct 6, 2025
2 of 10 checks passed
qgallouedec added a commit that referenced this pull request Oct 6, 2025
commit ae6837f
Author: Sergio Paniego Blanco <[email protected]>
Date:   Mon Oct 6 18:40:18 2025 +0200

    Removed tokenizer/processor creation from example scripts (#4211)

commit 56a8f11
Author: Albert Villanova del Moral <[email protected]>
Date:   Mon Oct 6 17:45:44 2025 +0200

    Replace setup with pyproject and fix packaging unintended modules (#4194)

commit 5291015
Author: Sergio Paniego Blanco <[email protected]>
Date:   Mon Oct 6 16:04:06 2025 +0200

    Remove `Optional` from `processing_class` in `PPOTrainer` (#4212)

commit 0588b1f
Author: Sergio Paniego Blanco <[email protected]>
Date:   Mon Oct 6 15:57:17 2025 +0200

    Updated vLLM integration guide (#4162)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 45ee98b
Author: Albert Villanova del Moral <[email protected]>
Date:   Mon Oct 6 11:14:54 2025 +0200

    Replace unittest with pytest (#4188)

commit 3800a6e
Author: Albert Villanova del Moral <[email protected]>
Date:   Mon Oct 6 11:13:21 2025 +0200

    Hotfix: Exclude transformers 4.57.0 for Python 3.9 (#4209)

    Co-authored-by: Sergio Paniego Blanco <[email protected]>

commit 7ad9ce8
Author: Sergio Paniego Blanco <[email protected]>
Date:   Mon Oct 6 11:04:20 2025 +0200

    Remove tokenizer creation from `sft` example script (#4197)

commit 0c2dc14
Author: Albert Villanova del Moral <[email protected]>
Date:   Mon Oct 6 08:31:58 2025 +0200

    Remove custome_container for building the docs (#4198)

commit ced8b33
Author: burtenshaw <[email protected]>
Date:   Mon Oct 6 08:23:11 2025 +0200

    [DOCS/FIX] lora without regrets - fix lr (#4207)
qgallouedec added a commit that referenced this pull request Oct 6, 2025
commit 65eb45c
Author: Quentin Gallouédec <[email protected]>
Date:   Mon Oct 6 13:07:18 2025 -0600

    Apply style and revert change in `sft_video_llm` example (#4214)
