
audio tester class#45391

Open
tarekziade wants to merge 3 commits into main from tarekziade-audio-test

Conversation

@tarekziade
Collaborator

What does this PR do?

Similar to the VLM tester, this patch introduces an audio tester class, used in:

  • Qwen2Audio
  • AudioFlamingo3
  • GraniteSpeech

Adding a new audio-language model using this will require ~8-20 lines for the tester (vs ~100-160 before). The boilerplate (config introspection, input preparation, SDPA dispatch test, common skips) lives in one place.
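To make the "~8-20 lines per model" claim concrete, here is a hypothetical sketch (not code from this PR) of the pattern described above: the shared base holds the boilerplate, and a per-model tester only declares which classes it exercises. All names below are illustrative placeholders.

```python
# Hypothetical sketch of the shared-tester pattern; names are placeholders,
# not the actual classes added by this PR.

class AudioModelTesterSketch:
    """Stand-in for the shared audio tester base class."""
    config_class = None
    conditional_generation_class = None

    def __init__(self):
        # Fail loudly if a subclass forgot to declare its model classes.
        for required in ("config_class", "conditional_generation_class"):
            if getattr(self, required) is None:
                raise ValueError(
                    f"You have inherited from AudioModelTesterSketch "
                    f"but did not set the {required} attribute."
                )


class Qwen2AudioTesterSketch(AudioModelTesterSketch):
    # The real tester would point at the actual config/model classes.
    config_class = "Qwen2AudioConfig"
    conditional_generation_class = "Qwen2AudioForConditionalGeneration"
```

With this shape, a new model's tester is just the subclass declaring its classes; the config introspection, input preparation, and common skips come from the base.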

@tarekziade tarekziade self-assigned this Apr 13, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tarekziade
Collaborator Author

run-slow: audioflamingo3, granite_speech, qwen2_audio

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/audioflamingo3", "models/granite_speech", "models/qwen2_audio"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 5472faa4 workflow commit (merge commit)
PR 0817bdbd branch commit (from PR)
main a5533957 base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

Contributor

@eustlb eustlb left a comment


This is cool!! 🔥
Models that should be covered by this PR:

  • audioflamingo3
  • glmasr
  • granite_speech
  • higgs_audio_v2
  • kyutai_speech_to_text
  • qwen2_audio
  • vibevoice_asr
  • voxtral
  • voxtral_realtime
  • musicflamingo

might:

  • gemma3n
  • gemma4
  • qwen2_5_omni
  • qwen3_omni_moe

Comment on lines +35 to +37

class AudioModelTester:
# If the model follows standard naming conventions, only `config_class` and
Contributor


Even if the ALM acronym is not as conventionally accepted as VLM, the notation is used, so I'd rather align fully with VLMs here; it would be clearer imo.

Suggested change
- class AudioModelTester:
+ class ALMModelTester:
  # If the model follows standard naming conventions, only `config_class` and

and rename the test file to alm_tester.py

Comment on lines +41 to +43
base_model_class = None
sequence_classification_class = None

Contributor


I don't see a model for which sequence_classification_class would be set. Not sure we should keep it.

Comment on lines +86 to +88
kwargs.setdefault("audio_token_id", 0)
kwargs.setdefault("audio_token_index", 0) # Alias for models that use this name
kwargs.setdefault("ignore_index", -100)
Contributor


I would rather have it added directly where `__init__` is overridden.

Suggested change
  kwargs.setdefault("audio_token_id", 0)
- kwargs.setdefault("audio_token_index", 0)  # Alias for models that use this name
  kwargs.setdefault("ignore_index", -100)

Member


+1, I think we won't even need `audio_token_index` because the config has an `attribute_map`.
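For context on the `attribute_map` point: transformers' `PretrainedConfig` lets a config class declare an `attribute_map` that aliases one attribute name to another, so both `audio_token_index` and `audio_token_id` can resolve to the same value. The following is a minimal self-contained sketch of that aliasing behavior, not the real `PretrainedConfig` implementation.

```python
class ConfigWithAliases:
    """Simplified sketch of transformers' `attribute_map` aliasing:
    reads and writes on an aliased name are redirected to the canonical one."""

    attribute_map = {"audio_token_index": "audio_token_id"}

    def __init__(self, audio_token_id=0):
        self.audio_token_id = audio_token_id

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        mapped = type(self).attribute_map.get(name)
        if mapped is not None:
            return getattr(self, mapped)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Redirect writes on an alias to the canonical attribute.
        name = type(self).attribute_map.get(name, name)
        super().__setattr__(name, value)
```

With this in place, a tester only ever needs to set `audio_token_id`; models that read `audio_token_index` still see the same value.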

raise ValueError(
    f"You have inherited from AudioModelTester but did not set the {required_attribute} attribute."
)

Contributor


We should remove the defaults for `text_config` and `audio_config`, no? And rather use the same approach as for VLMs:

Suggested change
for required_attribute in [
    "base_model_class",
    "config_class",
    "conditional_generation_class",
    "text_config_class",
    "audio_config_class",
]:
    if getattr(self, required_attribute) is None:
        raise ValueError(
            f"You have inherited from VLMModelTester but did not set the {required_attribute} attribute."
        )

Comment on lines +156 to +160
def get_num_audio_tokens(self, audio_features):
    """Compute number of audio placeholder tokens from features. Override for different subsampling."""
    # Default: 2-stage pooling (common for Whisper-style encoders)
    input_length = (audio_features.shape[-1] - 1) // 2 + 1
    return (input_length - 2) // 2 + 1
Contributor


We shouldn't put Whisper defaults here; we should force subclasses to implement this method.
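As a sanity check of the arithmetic in the quoted default: the two integer divisions model Whisper-style subsampling, where a stride-2 convolution roughly halves the mel-frame count and a second stride-2 stage halves it again. The frame counts below are illustrative.

```python
def num_audio_tokens(num_mel_frames: int) -> int:
    """Two-stage subsampling as in the quoted default: a stride-2 stage
    halves the sequence length, then a second stride-2 stage halves it again."""
    input_length = (num_mel_frames - 1) // 2 + 1  # after the first stride-2 stage
    return (input_length - 2) // 2 + 1            # after the second stride-2 stage

# Example: 30 s of audio at 100 mel frames/s gives 3000 frames,
# which this formula reduces to 1500 and then 750 placeholder tokens.
```

The reviewer's point stands: this particular formula only holds for Whisper-style encoders, so models with different subsampling would silently get the wrong token count unless forced to override.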

Comment on lines +210 to +212
config = self.get_config()
audio_features = self.create_audio_features()
num_audio_tokens = self.get_num_audio_tokens(audio_features)
Contributor


Some models take `input_values` (e.g. vibevoice_asr). We should handle those too.
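One hedged way the base tester could accommodate both input styles is an overridable input name, so raw-waveform models swap `input_features` for `input_values` in one line. This is a hypothetical sketch, not code from this PR; the class names are placeholders.

```python
class AudioInputPreparerSketch:
    """Hypothetical hook: subclasses override `audio_input_name` when the
    model consumes raw waveforms (`input_values`) instead of mel features."""

    audio_input_name = "input_features"

    def prepare_audio_inputs(self, audio_tensor):
        # Route the audio tensor under whichever kwarg the model expects.
        return {self.audio_input_name: audio_tensor}


class RawWaveformPreparerSketch(AudioInputPreparerSketch):
    # For models like vibevoice_asr that take raw waveforms.
    audio_input_name = "input_values"
```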

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: audioflamingo3, granite_speech, qwen2_audio

Comment on lines +91 to +106
# Text config defaults (small Qwen2-style backbone)
kwargs.setdefault(
    "text_config",
    {
        "model_type": "qwen2",
        "intermediate_size": 36,
        "initializer_range": 0.02,
        "hidden_size": 32,
        "max_position_embeddings": 52,
        "num_hidden_layers": 2,
        "num_attention_heads": 4,
        "num_key_value_heads": 2,
        "vocab_size": 99,
        "pad_token_id": 1,
    },
)
Member


Can we really have a default for text/audio configs?

Comment on lines +276 to +278
def test_sdpa_can_dispatch_composite_models(self):
    """Verify SDPA toggles propagate correctly to audio and text sub-modules."""
    if not self.has_attentions:
Member


Hmm, why is this not handled by the mixin? AFAIR it gets the base model and the common attributes.

