AudioMCQ: A 571k audio multiple-choice question dataset for post-training Large Audio Language Models with dual CoT annotations and audio-contribution filtering. πŸ† 1st place in DCASE 2025 Challenge.

AudioMCQ: Audio Multiple-Choice Question Dataset

Official repository for the paper "Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models"

Overview

AudioMCQ is a comprehensive audio multiple-choice question dataset with 571k samples designed for post-training Large Audio Language Models (LALMs). The dataset features dual chain-of-thought annotations and audio-contribution filtering, achieving state-of-the-art results in audio understanding tasks.

Overview of dataset construction pipeline.

Distribution analysis of AudioMCQ dataset.

Randomly sampled questions from four distinct question types.

Key Highlights

  • 571k high-quality samples across sound, music, speech, and temporal domains
  • Dual CoT annotations: Structured and unstructured reasoning paths
  • Audio-contribution filtering: weak (54.8%) and strong (45.2%) splits
  • Pre-trained models available: Weak-to-Strong and Mixed-to-Strong paradigms
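
From the 571k total and the stated proportions, the approximate sizes of the two audio-contribution splits can be back-computed. This is only arithmetic on the published percentages; exact counts are on the dataset card:

```python
# Approximate split sizes from the published proportions.
# 571k is the stated total; the percentages come from the highlights above.
total = 571_000
weak = round(total * 0.548)    # weak-audio-contribution split
strong = round(total * 0.452)  # strong-audio-contribution split

print(f"weak:   ~{weak:,}")    # ~312,908
print(f"strong: ~{strong:,}")  # ~258,092
print(f"sum:     {weak + strong:,}")
```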

Dataset Access

For complete dataset information, statistics, data format, and download instructions, please visit the dataset's repository on Hugging Face, which contains:

  • Full dataset documentation
  • Detailed statistics and examples
  • Data format specifications
  • Download links for audio files
  • Usage instructions
  • Model checkpoints
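
Once downloaded, individual samples can be rendered into text prompts for a LALM. The field names below (`question`, `choices`, `answer`) are assumptions for illustration, not the documented schema; consult the Hugging Face dataset card for the actual format:

```python
# Hedged sketch: the sample fields used here are hypothetical,
# chosen only to illustrate prompt construction for an MCQ sample.

def format_mcq_prompt(sample: dict) -> str:
    """Render one multiple-choice sample as a text prompt."""
    letters = "ABCD"
    lines = [sample["question"]]
    for letter, choice in zip(letters, sample["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)

# To stream the dataset itself (repository id assumed, not confirmed here):
# from datasets import load_dataset
# ds = load_dataset("inclusionAI/AudioMCQ", split="train", streaming=True)

example = {
    "question": "Which instrument is most prominent in the clip?",
    "choices": ["Piano", "Violin", "Drums", "Flute"],
    "answer": "A",
}
print(format_mcq_prompt(example))
```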

Model Checkpoints

We provide trained model checkpoints for two post-training paradigms:

Training Paradigm    Hugging Face Link
Weak-to-Strong       inclusionAI/AudioMCQ-Weak-To-Strong
Mixed-to-Strong      inclusionAI/AudioMCQ-Mixed-To-Strong

Training Scripts

All training code used for this project can be found in the /training_scripts directory.

News

  • [2025.09] Paper published on arXiv
  • [2025.09] AudioMCQ dataset released with 571k samples
  • [2025.07] Achieved 1st place in DCASE 2025 Audio-Question-Answering challenge

Citation

If you find AudioMCQ useful in your research, please cite:

@article{he2025audiomcq,
  title={Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models},
  author={He, Haolin and others},
  journal={arXiv preprint arXiv:2509.21060},
  year={2025}
}

Acknowledgements

We thank the organizers of DCASE 2025 and the research community for their valuable feedback and support.
