Commit 2bed56f

Fix docs about ASR and NeMo
1 parent 8523ba9 commit 2bed56f

File tree: 2 files changed, +32 -28 lines changed
Lines changed: 31 additions & 26 deletions
@@ -1,40 +1,45 @@
-This an example of using [Nvidia's NeMo toolkit](https://github.com/NVIDIA/NeMo) for creating ASR/NLU/TTS pre-labels.
-
 ## Automatic Speech Recognition
 
+This is an example of using [Nvidia's NeMo toolkit](https://github.com/NVIDIA/NeMo) for creating ASR/NLU/TTS pre-labels.
+
 With ASR models, you can do audio pre-annotations drawn within a text area, aka _transcriptions_.
 
 <div style="margin:auto; text-align:center; width:100%"><img src="/images/nemo-asr.png" style="opacity: 0.7"/></div>
 
+## Start using it
+
+1. Follow [this installation guide](https://github.com/NVIDIA/NeMo#installation) to set up the NeMo environment.
 
-1. Follow [this installation guide](https://github.com/NVIDIA/NeMo#installation) to set up NeMo environment
-2. Initialize Label Studio machine learning backend
+2. Download <a href="https://github.com/heartexlabs/label-studio/tree/master/label_studio/ml/examples/nemo/asr.py">asr.py</a> from GitHub into the current directory (or use `label_studio/ml/examples/nemo/asr.py` from the LS package) and initialize the Label Studio machine learning backend:
 ```bash
-label-studio-ml init my_model --from label_studio/ml/examples/nemo/asr.py
+label-studio-ml init my_model --from asr.py
 ```
+
 3. Start machine learning backend:
 ```bash
 label-studio-ml start my_model
 ```
+Wait until the ML backend app starts on the default 9090 port.
 
-After this app starts on the default 9090 port, configure the template for ASR:
-1. In Label Studio, open the project settings page.
-2. From the templates list, select `Speech Transcription`. You can also create your own with `<TextArea>` and `<Audio>` tags.
-
-Or copy this labeling config into LS:
-```
-<View>
-  <Header value="Listen to the audio and write the transcription" />
-  <AudioPlus name="audio" value="$audio" />
-  <TextArea name="transcription" toName="audio" editable="true"
-            rows="4" transcription="true" maxSubmissions="1" />
-
-
-  <Style>
-    [dataneedsupdate]>div:first-child{flex-grow:1;order:2}
-    [dataneedsupdate]>div:last-child{margin-top:0 !important;margin-right:1em}
-  </Style>
-</View>
-```
-
-> Note: The NeMo engine downloads models automatically. This can take some time and could cause Label Studio UI to hang on the Model page while the models download.
+4. Open the project Settings page in Label Studio.
+
+5. From the template list, select `Speech Transcription`. You can also create your own with `<TextArea>` and `<Audio>` tags. Or copy this labeling config into LS:
+```xml
+<View>
+  <Header value="Listen to the audio and write the transcription" />
+  <AudioPlus name="audio" value="$audio" />
+  <TextArea name="transcription" toName="audio" editable="true"
+            rows="4" transcription="true" maxSubmissions="1" />
+
+
+  <Style>
+    [dataneedsupdate]>div:first-child{flex-grow:1;order:2}
+    [dataneedsupdate]>div:last-child{margin-top:0 !important;margin-right:1em}
+  </Style>
+</View>
+```
+
+6. Open the Model page in Label Studio.
+> Note: The NeMo engine downloads models automatically. This can take some time and could cause the Label Studio UI to hang on the Model page while the models download.
+
+7. Add the ML backend using this address: `http://localhost:9090`
+
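For context, a pre-annotation sent back by the ML backend has to reference the same names the labeling config declares (`name="transcription"`, `toName="audio"`). The sketch below shows the rough shape of such a payload; the field layout follows Label Studio's textarea results format, and the transcription text is an illustrative placeholder, not real model output.

```python
# Sketch of a pre-annotation for the <TextArea name="transcription"
# toName="audio"> tag in the config above. The transcription string
# is made up for illustration.
prediction = {
    "result": [{
        "from_name": "transcription",  # must match <TextArea name="...">
        "to_name": "audio",            # must match <TextArea toName="...">
        "type": "textarea",
        "value": {"text": ["hello world"]},
    }],
}
```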
label_studio/ml/examples/nemo/asr.py

Lines changed: 1 addition & 2 deletions
@@ -4,7 +4,6 @@
 import nemo.collections.asr as nemo_asr
 
 from label_studio.ml import LabelStudioMLBase
-from label_studio.ml.utils import get_image_local_path
 
 
 logger = logging.getLogger(__name__)
@@ -22,7 +21,7 @@ def __init__(self, model_name='QuartzNet15x5Base-En', **kwargs):
         self.model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name=model_name)
 
     def predict(self, tasks, **kwargs):
-        audio_path = get_image_local_path(tasks[0]['data'][self.value])
+        audio_path = self.get_local_path(tasks[0]['data'][self.value])
         transcription = self.model.transcribe(paths2audio_files=[audio_path])[0]
         return [{
             'result': [{
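The fix above swaps the image-specific `get_image_local_path` helper for the base class's generic `get_local_path` method. A self-contained sketch of the same predict flow, with hypothetical stand-ins for the NeMo model and the Label Studio base class (so no checkpoint download or `label_studio` install is needed), illustrates the contract:

```python
# Hypothetical stand-ins, only to show the shape of the predict() contract
# from the diff above; they are not the real NeMo or Label Studio classes.

class FakeASRModel:
    def transcribe(self, paths2audio_files):
        # A real nemo_asr EncDecCTCModel returns one transcription per file.
        return ["transcribed text"] * len(paths2audio_files)

class FakeASRBackend:
    value = "audio"  # task data key holding the audio file reference

    def __init__(self):
        self.model = FakeASRModel()

    def get_local_path(self, url):
        # LabelStudioMLBase.get_local_path resolves the task's file to a
        # local path; this stub just passes the value through.
        return url

    def predict(self, tasks, **kwargs):
        audio_path = self.get_local_path(tasks[0]["data"][self.value])
        transcription = self.model.transcribe(paths2audio_files=[audio_path])[0]
        return [{
            "result": [{
                "from_name": "transcription",
                "to_name": "audio",
                "type": "textarea",
                "value": {"text": [transcription]},
            }]
        }]

preds = FakeASRBackend().predict([{"data": {"audio": "/tmp/sample.wav"}}])
```

The `from_name`/`to_name` values here assume the `Speech Transcription` labeling config shown in the README diff.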
