Loading audio file with torchaudio fails (memory crash) #3919

@noorbraik

🐛 Describe the bug

When I load a 43-second .wav file, memory consumption increases until the session crashes. The machine has about 12 GB of RAM.
This is the code I am running:

from transformers import ClapProcessor, ClapModel
import torchaudio
import torch
# Setup device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load CLAP model and processor
model = ClapModel.from_pretrained("laion/clap-htsat-unfused").to(device)
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")
# Load audio file
audio, sr = torchaudio.load("/content/temp_audio_6169.wav")
# Resample to 48kHz if needed
if sr != 48000:
    audio = torchaudio.transforms.Resample(sr, 48000)(audio)
# Convert stereo to mono
if audio.shape[0] > 1:
    audio = audio.mean(dim=0)
# Limit to 10 seconds (CLAP expects max 480000 samples at 48kHz)
audio = audio[:480000]
# Process audio
inputs = processor(audios=audio, sampling_rate=48000, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}  # Move to the selected device (GPU if available)
# Extract audio embedding
with torch.no_grad():
    audio_embedding = model.get_audio_features(**inputs)
print(":loud_sound: Audio embedding shape:", audio_embedding.shape)
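
One detail in the snippet above may be worth checking. By default `torchaudio.load` returns a tensor shaped `(channels, samples)`, so for a mono file the stereo-to-mono branch is skipped and `audio[:480000]` slices the channel dimension rather than the samples, leaving the full clip in place. The sketch below uses a hypothetical helper, `to_mono_clip` (not part of torchaudio or transformers), and zero-filled stand-in tensors instead of the attached file. It is not claimed to be the cause of the crash.

```python
import torch


def to_mono_clip(audio: torch.Tensor, max_samples: int = 480_000) -> torch.Tensor:
    """Collapse a (channels, samples) waveform to 1-D and keep the first max_samples.

    Hypothetical helper for illustration only.
    """
    if audio.dim() == 2:
        # Averaging over dim 0 works for one channel as well as several.
        audio = audio.mean(dim=0)
    # Slice along the last (sample) dimension, never the channel dimension.
    return audio[..., :max_samples]


# Stand-ins for 43 s of audio at 48 kHz (43 * 48000 = 2,064,000 samples).
mono_in = torch.zeros(1, 2_064_000)
stereo_in = torch.zeros(2, 2_064_000)

print(mono_in[:480_000].shape)        # torch.Size([1, 2064000]): nothing was trimmed
print(to_mono_clip(mono_in).shape)    # torch.Size([480000])
print(to_mono_clip(stereo_in).shape)  # torch.Size([480000])
```

With this shape handling, the processor receives a 1-D, 10-second waveform whether the file is mono or stereo.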

audio file
temp_audio_6169.zip
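
For scale, a rough estimate of the decoded waveform size is sketched below. It assumes a stereo file decoded to float32 at 48 kHz. The channel count and dtype are assumptions, since the attached file was not inspected, and `waveform_bytes` is a hypothetical helper.

```python
def waveform_bytes(seconds: float, sample_rate: int, channels: int, bytes_per_sample: int) -> int:
    """Size in bytes of an uncompressed waveform held in memory."""
    return int(seconds * sample_rate) * channels * bytes_per_sample


# 43 s, 48 kHz, 2 channels, 4 bytes per float32 sample (assumed values).
size = waveform_bytes(43, 48_000, 2, 4)
print(f"{size / 1e6:.1f} MB")  # 16.5 MB
```

Under these assumptions the waveform is orders of magnitude smaller than 12 GB, which suggests the memory growth comes from a later step rather than from holding the decoded audio.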
