error in first train gen loss=0.0 

Hi, thank you very much for your amazing work! It's truly incredible. I want to train from scratch. When I run the first training session, everything seems to be going well, but all the epochs show loss=0.0, disc loss=0.0, etc. Only the Mel Loss updates. Here is an image showing how it looks after completing the 50 epochs:

![er1](https://github.com/yl4579/StyleTTS2/assets/147736764/3683270c-5c33-4b31-9431-d2eac25d0283)

The 50 epochs took four hours on the test dataset.

I have now started the second training session, but I ran into an error. It refuses to start and drops into a trace prompt, leaving me stuck. The terminal or notebook freezes, so I have to close it.
```
            # The run stops in this part of the code
            running_loss += loss_mel.item()
            g_loss.backward()
            if torch.isnan(g_loss):
                from IPython.core.debugger import set_trace
                set_trace()

            optimizer.step('bert_encoder')
            optimizer.step('bert')
            optimizer.step('predictor')
            optimizer.step('predictor_encoder')
```
Here's a screenshot showing how it looks when it gets stuck during the second training session. 

![er2](https://github.com/yl4579/StyleTTS2/assets/147736764/9f53166c-478d-4abc-b600-733149e90b1b)
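One thing I noticed: `set_trace()` opens an interactive debugger prompt and waits for input, which may be why the session looks frozen when the loss becomes NaN. Below is a minimal sketch of a non-interactive check that raises an error instead. The helper name `assert_finite_loss` and the `step` counter are my own assumptions, not part of the repository.

```python
import math


def assert_finite_loss(loss_value: float, step: int) -> None:
    """Raise an error (instead of opening a debugger) when the loss is NaN or inf."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(
            f"non-finite generator loss {loss_value!r} at step {step}"
        )


# Hypothetical use inside the training loop, in place of set_trace():
#   assert_finite_loss(g_loss.item(), step)
assert_finite_loss(0.25, step=0)  # a finite value passes silently
```

With this, a NaN loss would stop the run with a traceback that names the step, rather than blocking the terminal or notebook.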

I also tried the 'accelerate launch --mixed_precision=fp16' command both with and without '--mixed_precision', and even tried running without Accelerate using plain Python commands, but encountered the same issue.

I'm using the default 'config.yml' with a batch size of 20 and a maximum length of 300. I experimented with different batch sizes (e.g., 16) and maximum lengths (e.g., 100), but the problem persisted.

I've tested on rented GPUs such as the A6000 and on my own machine with an RTX 4090 using WSL Linux, but encountered the same issue each time. I also tried different datasets from a single speaker, with varying durations (e.g., 1, 4, and 6 hours).

For the dataset format, I have around 5000 samples for training and 500 for validation, with each line structured as follows:

`filename.wav|transcription|speaker`
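To rule out formatting problems in my file lists, a check along these lines could be used. It is only a sketch: it assumes exactly three non-empty, pipe-separated fields per line as described above, and the helper name `find_malformed_lines` is made up for illustration.

```python
from typing import List, Tuple


def find_malformed_lines(
    lines: List[str], expected_fields: int = 3
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for lines that are not 'file|text|speaker'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        fields = text.split("|")
        # Flag lines with the wrong field count or with an empty field.
        if len(fields) != expected_fields or any(not f.strip() for f in fields):
            bad.append((number, text))
    return bad


sample = [
    "filename.wav|transcription|speaker",
    "broken.wav|missing speaker field",
]
print(find_malformed_lines(sample))
```

For a real list, the lines could be read with `open("data/train.txt", encoding="utf-8").readlines()` and passed to the same function.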

I replaced the default dataset with my own, using 'data/train.txt' for training and 'data/val.txt' for validation. However, I'm unsure about the purpose of the 'OOD_text.txt' file. Should I modify or include this file in some way?

Could someone please help me understand what I might be doing wrong here?

