Add Zagros and ZagrosNext model architectures to Transformers #41135
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, zagros, zagros_next
hi @ZagrosLLMModel, we generally don't accept PRs for new architectures without significant pre-trained checkpoints! Are there any pre-trained Zagros models available?
Hi, yes sir, we have:
What does this PR do?
This PR introduces two new model architectures, Zagros and ZagrosNext, to the Transformers library. The implementation includes:
Model architecture files (modeling_zagros.py and modeling_zagros_next.py) in src/transformers/models/zagros/ and src/transformers/models/zagros_next/.
Configuration files (configuration_zagros.py and configuration_zagros_next.py) for both models.
Comprehensive tests for both models in tests/models/zagros/ and tests/models/zagros_next/.
Updated documentation in docs/source/en/ with usage examples and details for Zagros and ZagrosNext.
Both models follow the standard structure and conventions of existing Transformers models, ensuring compatibility with the library's pipelines and utilities.
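To illustrate the intended integration, here is a minimal usage sketch. It assumes the PR exports `ZagrosConfig` and `ZagrosModel` under the usual Transformers naming conventions and that the configuration exposes standard fields such as `vocab_size` and `hidden_size`; the sizes below are illustrative, not the model's actual defaults.

```python
# Minimal sketch, assuming this PR exports ZagrosConfig and ZagrosModel
# following standard Transformers conventions. Sizes are illustrative only.
import torch
from transformers import ZagrosConfig, ZagrosModel

config = ZagrosConfig(hidden_size=256, num_hidden_layers=2)  # toy sizes for a quick smoke test
model = ZagrosModel(config)  # randomly initialized; no pretrained weights loaded

input_ids = torch.randint(0, config.vocab_size, (1, 8))  # dummy batch of 8 token ids
with torch.no_grad():
    outputs = model(input_ids=input_ids)

# Standard base models return last_hidden_state of shape (batch, seq_len, hidden_size)
print(outputs.last_hidden_state.shape)
```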
Motivation and Context
The Zagros and ZagrosNext models are designed to [briefly describe the purpose, e.g., "enhance performance on specific NLP tasks with novel architectural improvements"].
These models leverage standard Transformer conventions, making them easy to integrate into existing workflows.
The implementation is fully compatible with the Transformers library and supports all standard functionalities (e.g., training, inference, and pipeline integration).
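As a sketch of the claimed pipeline integration, text generation could look like the following. The checkpoint id is a placeholder rather than a published model, and it assumes a `ZagrosForCausalLM` class is registered with the auto classes so the pipeline can resolve it.

```python
# Hypothetical pipeline sketch: "ZagrosLLMModel/zagros-base" is a placeholder
# repo id, and ZagrosForCausalLM is assumed to be registered with
# AutoModelForCausalLM so the text-generation pipeline can resolve it.
from transformers import pipeline

generator = pipeline("text-generation", model="ZagrosLLMModel/zagros-base")
print(generator("The Zagros mountains are", max_new_tokens=20)[0]["generated_text"])
```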
Dependencies
No additional dependencies are required beyond the standard Transformers setup.
Tested with Python 3.9+ and PyTorch 2.0+.
Before submitting
This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?
Who can review?
@ArthurZucker (for text models)
@Rocketknight1 (for pipelines and library compatibility)
@stevhliu (for documentation)
Thank you for reviewing!