Commit 2bed56f

Fix docs about ASR and NeMo
1 parent 8523ba9 commit 2bed56f

File tree: 2 files changed, +32 -28 lines changed
Lines changed: 31 additions & 26 deletions
@@ -1,40 +1,45 @@
-This an example of using [Nvidia's NeMo toolkit](https://github.com/NVIDIA/NeMo) for creating ASR/NLU/TTS pre-labels.
-
 ## Automatic Speech Recognition
 
+This is an example of using [Nvidia's NeMo toolkit](https://github.com/NVIDIA/NeMo) for creating ASR/NLU/TTS pre-labels.
+
 With ASR models, you can do audio pre-annotations drawn within a text area, aka _transcriptions_.
 
 <div style="margin:auto; text-align:center; width:100%"><img src="/images/nemo-asr.png" style="opacity: 0.7"/></div>
 
+## Start using it
+
+1. Follow [this installation guide](https://github.com/NVIDIA/NeMo#installation) to set up the NeMo environment.
 
-1. Follow [this installation guide](https://github.com/NVIDIA/NeMo#installation) to set up NeMo environment
-2. Initialize Label Studio machine learning backend
+2. Download <a href="https://github.com/heartexlabs/label-studio/tree/master/label_studio/ml/examples/nemo/asr.py">asr.py</a> from GitHub into the current directory (or use `label_studio/ml/examples/nemo/asr.py` from the LS package) and initialize the Label Studio machine learning backend:
 ```bash
-label-studio-ml init my_model --from label_studio/ml/examples/nemo/asr.py
+label-studio-ml init my_model --from asr.py
 ```
+
 3. Start machine learning backend:
 ```bash
 label-studio-ml start my_model
 ```
+Wait until the ML backend app starts on the default 9090 port.
 
-After this app starts on the default 9090 port, configure the template for ASR:
-1. In Label Studio, open the project settings page.
-2. From the templates list, select `Speech Transcription`. You can also create your own with `<TextArea>` and `<Audio>` tags.
-
-Or copy this labeling config into LS:
-```
-<View>
-  <Header value="Listen to the audio and write the transcription" />
-  <AudioPlus name="audio" value="$audio" />
-  <TextArea name="transcription" toName="audio" editable="true"
-            rows="4" transcription="true" maxSubmissions="1" />
-
-
-  <Style>
-    [dataneedsupdate]>div:first-child{flex-grow:1;order:2}
-    [dataneedsupdate]>div:last-child{margin-top:0 !important;margin-right:1em}
-  </Style>
-</View>
-```
-
-> Note: The NeMo engine downloads models automatically. This can take some time and could cause Label Studio UI to hang on the Model page while the models download.
+4. Open the project Settings page in Label Studio.
+
+5. From the template list, select `Speech Transcription`. You can also create your own with `<TextArea>` and `<Audio>` tags. Or copy this labeling config into LS:
+```xml
+<View>
+  <Header value="Listen to the audio and write the transcription" />
+  <AudioPlus name="audio" value="$audio" />
+  <TextArea name="transcription" toName="audio" editable="true"
+            rows="4" transcription="true" maxSubmissions="1" />
+
+
+  <Style>
+    [dataneedsupdate]>div:first-child{flex-grow:1;order:2}
+    [dataneedsupdate]>div:last-child{margin-top:0 !important;margin-right:1em}
+  </Style>
+</View>
+```
+
+6. Open the Model page in Label Studio.
+> Note: The NeMo engine downloads models automatically. This can take some time and could cause the Label Studio UI to hang on the Model page while the models download.
+
+7. Add the ML backend using this address: `http://localhost:9090`
+
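For context, a pre-annotation sent back by the ML backend has to reference the same names the labeling config declares (`name="transcription"`, `toName="audio"`). The sketch below shows the rough shape of such a payload; the field layout follows Label Studio's textarea results format, and the transcription text is an illustrative placeholder, not real model output.

```python
# Sketch of a pre-annotation for the <TextArea name="transcription"
# toName="audio"> tag in the config above. The transcription string
# is made up for illustration.
prediction = {
    "result": [{
        "from_name": "transcription",  # must match <TextArea name="...">
        "to_name": "audio",            # must match <TextArea toName="...">
        "type": "textarea",
        "value": {"text": ["hello world"]},
    }],
}
```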
label_studio/ml/examples/nemo/asr.py

Lines changed: 1 addition & 2 deletions
@@ -4,7 +4,6 @@
 import nemo.collections.asr as nemo_asr
 
 from label_studio.ml import LabelStudioMLBase
-from label_studio.ml.utils import get_image_local_path
 
 
 logger = logging.getLogger(__name__)
@@ -22,7 +21,7 @@ def __init__(self, model_name='QuartzNet15x5Base-En', **kwargs):
         self.model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name=model_name)
 
     def predict(self, tasks, **kwargs):
-        audio_path = get_image_local_path(tasks[0]['data'][self.value])
+        audio_path = self.get_local_path(tasks[0]['data'][self.value])
         transcription = self.model.transcribe(paths2audio_files=[audio_path])[0]
         return [{
             'result': [{
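The fix above swaps the image-specific `get_image_local_path` helper for the base class's generic `get_local_path` method. A self-contained sketch of the same predict flow, with hypothetical stand-ins for the NeMo model and the Label Studio base class (so no checkpoint download or `label_studio` install is needed), illustrates the contract:

```python
# Hypothetical stand-ins, only to show the shape of the predict() contract
# from the diff above; they are not the real NeMo or Label Studio classes.

class FakeASRModel:
    def transcribe(self, paths2audio_files):
        # A real nemo_asr EncDecCTCModel returns one transcription per file.
        return ["transcribed text"] * len(paths2audio_files)

class FakeASRBackend:
    value = "audio"  # task data key holding the audio file reference

    def __init__(self):
        self.model = FakeASRModel()

    def get_local_path(self, url):
        # LabelStudioMLBase.get_local_path resolves the task's file to a
        # local path; this stub just passes the value through.
        return url

    def predict(self, tasks, **kwargs):
        audio_path = self.get_local_path(tasks[0]["data"][self.value])
        transcription = self.model.transcribe(paths2audio_files=[audio_path])[0]
        return [{
            "result": [{
                "from_name": "transcription",
                "to_name": "audio",
                "type": "textarea",
                "value": {"text": [transcription]},
            }]
        }]

preds = FakeASRBackend().predict([{"data": {"audio": "/tmp/sample.wav"}}])
```

The `from_name`/`to_name` values here assume the `Speech Transcription` labeling config shown in the README diff.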
