Add mixed-attention Core ML mask support for stateful generation#331

Open
Skyline-23 wants to merge 3 commits into huggingface:main from Skyline-23:feat/mixed-attention-coreml-masks

Conversation

Contributor

@Skyline-23 commented Mar 8, 2026

What

Add support for stateful Core ML language models that require multiple attention masks during generation.

Why

The current runtime only handles attentionMask / causalMask, which is not sufficient for mixed-attention Core ML exports that need separate masks for different layer types.

This change allows the stateful generation path to populate:

  • fullAttentionMask
  • slidingAttentionMask

when those inputs are present in the Core ML model description.
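As a minimal sketch of that detection step (hypothetical helper name, not the PR's actual code): given the set of input names declared by the Core ML model description, the stateful generation path can decide whether to supply the extra masks.

```swift
// Hypothetical helper, a sketch only. In the real runtime the input
// names would come from the Core ML model description
// (`MLModelDescription.inputDescriptionsByName.keys`); here they are
// passed in as a plain Set so the logic is self-contained.
func requiresMixedAttentionMasks(inputNames: Set<String>) -> Bool {
    inputNames.contains("fullAttentionMask")
        && inputNames.contains("slidingAttentionMask")
}
```

Models that only declare `attentionMask` / `causalMask` would fail this check and keep taking the existing single-mask path.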

Implementation

  • add fullAttentionMask and slidingAttentionMask keys to LanguageModel.Keys
  • detect those inputs from modelDescription
  • build additive full-attention and sliding-window masks in the stateful generation path
  • resolve the sliding window size from Core ML metadata first, and fall back to Hugging Face config when needed
  • factor stateful generation input assembly into a reusable helper for test coverage
  • keep existing single-mask models working unchanged
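The additive masking rule behind the third bullet can be sketched as follows (pure Swift with hypothetical helper names; the PR itself would build `MLMultiArray` inputs, but the rule is the same): allowed positions contribute 0 to the attention logits, masked positions contribute a large negative value that vanishes after softmax.

```swift
// Assumption: a large negative stand-in for -inf in the additive mask.
let maskedOut: Float = -1e9

// Full (causal) attention: query i may attend to any key j <= i.
func additiveFullAttentionMask(length: Int) -> [[Float]] {
    (0..<length).map { i in
        (0..<length).map { j in j <= i ? 0 : maskedOut }
    }
}

// Sliding-window attention: query i may attend only to the last
// `window` keys up to and including itself, i.e. i - window < j <= i.
func additiveSlidingAttentionMask(length: Int, window: Int) -> [[Float]] {
    (0..<length).map { i in
        (0..<length).map { j in (j <= i && j > i - window) ? 0 : maskedOut }
    }
}
```

The sliding variant is the only place the resolved window size is needed, which is why the window can be looked up lazily (Core ML metadata first, Hugging Face config as fallback) only when the model actually declares `slidingAttentionMask`.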

Tests

  • add regression tests for additive full-attention mask construction
  • add regression tests for sliding-window mask construction
  • add an integration-style test that verifies the full input dictionary for a mixed-attention model contract
  • verify with:
    swift test --filter LanguageModelCoreMLMaskTests

Scope clarification

This PR is intended to support explicit multi-mask Core ML generation contracts in the runtime.

It does not attempt to fix exporter-side approaches that reconstruct multiple masks inside a Core ML graph from a single causalMask input.

Additional context

Closes #330

Example converted Core ML repo using the explicit multi-mask contract:
https://huggingface.co/Skyline23/translategemma-4b-it-coreml

- add support for fullAttentionMask and slidingAttentionMask model inputs in the stateful generation path
- derive sliding window masks from model metadata or config when needed
- add regression tests for additive full and sliding attention mask construction
- add fullAttentionMask and slidingAttentionMask handling to the stateful generation path
- resolve sliding window size from model metadata or config for mixed-attention models
- factor stateful generation input assembly into a reusable helper
- verify full and sliding attention mask keys, shapes, and additive values
- keep single-mask generation behavior unchanged while covering mixed-attention inputs
Member

@pcuenca left a comment


Very interesting and cool PR @Skyline-23! I won't be able to properly test and review it until the end of the week. Meanwhile, a couple of questions:

  • The converted example model seems to be using float32 instead of float16 (because of this line, and because the repo takes ~16 GB). Did you try to convert to float16? Did you try any quantization options?
  • Are you using or planning to use this Core ML model in a downstream app?

Thanks a lot for the contribution!

@Skyline-23
Contributor Author

@pcuenca Sorry for the late reply! No problem at all, please take your time with the review.
The conversion script was not optimized for Core ML; I fixed it and added options with 4-bit and 8-bit quantized models. The large repo size was a problem with float32 attention.
I'm planning to use this with an on-device translation app, currently in a test bed. That app could still be canceled, but in my opinion this PR moves the repo in a good direction for supporting a wider variety of models.
Thanks!



Development

Successfully merging this pull request may close these issues.

Stateful Core ML generation does not support mixed-attention models that require multiple mask inputs
