## Automatic Speech Recognition

This is an example of using [Nvidia's NeMo toolkit](https://github.com/NVIDIA/NeMo) for creating ASR/NLU/TTS pre-labels.

With ASR models, you can do audio pre-annotations drawn within a text area, aka _transcriptions_.

<div style="margin:auto; text-align:center; width:100%"><img src="/images/nemo-asr.png" style="opacity: 0.7"/></div>

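Under the hood, such pre-annotations are ordinary Label Studio predictions that pair a `TextArea` result with the audio it transcribes. A minimal sketch of building one such prediction in Python — the `transcription`/`audio` names must match the tag names used in the labeling config shown later, and the transcript text and score here are purely illustrative:

```python
def make_textarea_prediction(transcript, score):
    """Build one Label Studio prediction carrying an ASR transcript.

    `from_name`/`to_name` must match the <TextArea> and audio tag
    names in the labeling config ("transcription" and "audio" below).
    """
    return {
        "result": [{
            "from_name": "transcription",
            "to_name": "audio",
            "type": "textarea",
            # TextArea values are lists of strings, one per text row
            "value": {"text": [transcript]},
        }],
        "score": score,  # model confidence, illustrative here
    }

# Example: a prediction for a hypothetical decoded utterance
prediction = make_textarea_prediction("hello world", 0.87)
```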
## Start using it

1. Follow [this installation guide](https://github.com/NVIDIA/NeMo#installation) to set up the NeMo environment.

2. Download <a href="https://github.com/heartexlabs/label-studio/tree/master/label_studio/ml/examples/nemo/asr.py">asr.py</a> from GitHub into the current directory (or use `label_studio/ml/examples/nemo/asr.py` from the LS package) and initialize the Label Studio machine learning backend:
    ```bash
    label-studio-ml init my_model --from asr.py
    ```

3. Start the machine learning backend:
    ```bash
    label-studio-ml start my_model
    ```
    Wait until the ML backend app starts on the default 9090 port.

4. Open the project Settings page in Label Studio.

5. From the template list, select `Speech Transcription`. You can also create your own with `<TextArea>` and `<Audio>` tags, or copy this labeling config into LS:
    ```xml
    <View>
      <Header value="Listen to the audio and write the transcription" />
      <AudioPlus name="audio" value="$audio" />
      <TextArea name="transcription" toName="audio" editable="true"
                rows="4" transcription="true" maxSubmissions="1" />

      <Style>
        [dataneedsupdate]>div:first-child{flex-grow:1;order:2}
        [dataneedsupdate]>div:last-child{margin-top:0 !important;margin-right:1em}
      </Style>
    </View>
    ```

6. Open the Model page in Label Studio.
    > Note: The NeMo engine downloads models automatically. This can take some time and could cause the Label Studio UI to hang on the Model page while the models download.

7. Add the ML backend using this address: `http://localhost:9090`
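Before pointing Label Studio at that address, you can sanity-check that the backend is actually listening. A small sketch, assuming the `/health` endpoint that `label-studio-ml` backends typically expose (verify the endpoint against your version):

```python
from urllib.request import urlopen
from urllib.error import URLError


def check_backend(url, timeout=5):
    """Return True if the ML backend answers its health endpoint."""
    try:
        # /health is the assumed label-studio-ml status endpoint
        with urlopen(f"{url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # connection refused, timeout, DNS failure, etc.
        return False


if __name__ == "__main__":
    print("backend up:", check_backend("http://localhost:9090"))
```

If this prints `backend up: False`, check the terminal where you ran `label-studio-ml start` for errors before continuing.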