eoffermann
Summary

Adds a self-contained Gradio front-end that turns text prompts into audio clips using the existing diffusion sampler and BigVGAN vocoder. Launch it with one command, explore results in the browser, and download clips on demand; nothing is written to disk unless the user clicks Download.


Highlights

  • Zero configuration: python app.py opens http://127.0.0.1:7860.
  • Five inputs: prompt, DDIM steps, duration, guidance scale, sample count (up to 10).
  • Parallel previews: up to 10 audio players appear dynamically; each has a built-in download button.
  • Stateless: all artefacts remain in RAM; nothing persists after the session.
  • Efficient cold-start: models load once at import; subsequent generations reuse them.

How to run

python app.py

That’s it; the default browser opens automatically.


Implementation notes

  • Tested locally on CUDA 12.4 GPU and on a CPU-only machine.
  • generate_and_update always returns a list of exactly MAX_AUDIO_PLAYERS gr.update objects, keeping Gradio’s diffing predictable.
  • TODOs are embedded in the docstring (GPU OOM handling, input validation, seed control).
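
The fixed-length return value mentioned above can be sketched as follows. This is a hypothetical reconstruction, not the PR's actual code: `fake_update` stands in for `gr.update(...)` so the snippet runs without Gradio installed, and the audio synthesis step is stubbed out. The point it illustrates is that the handler always returns one update per player slot, hiding the unused slots, so the output list length never varies.

```python
MAX_AUDIO_PLAYERS = 10  # fixed number of audio components in the UI


def fake_update(**kwargs):
    """Stand-in for gr.update(); the real app would return Gradio updates."""
    return kwargs


def generate_and_update(prompt: str, num_samples: int):
    """Return exactly MAX_AUDIO_PLAYERS updates, one per player slot.

    The first `num_samples` slots become visible and receive a clip;
    the remaining slots are hidden. Synthesis is stubbed with strings.
    """
    num_samples = max(1, min(num_samples, MAX_AUDIO_PLAYERS))
    clips = [f"clip-{i}" for i in range(num_samples)]  # stub for real waveforms
    updates = [fake_update(value=clip, visible=True) for clip in clips]
    # Pad with hidden updates so the list is always the same length,
    # which keeps the framework's output diffing predictable.
    updates += [fake_update(value=None, visible=False)] * (
        MAX_AUDIO_PLAYERS - len(updates)
    )
    return updates
```

Because the output arity is constant, the UI wiring never has to special-case "fewer players than last time"; Gradio simply applies ten updates on every generation.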

Checklist

  • Code follows project style and PEP 8.
  • Comprehensive docstrings and inline comments.
  • No new runtime dependencies (except gradio).
  • Manual tests: GPU (CUDA 12.4) and CPU paths.
