# Lab 3: Create, enable and manage customizations using OCI Console
## Introduction
When the Live Transcribe service is used as is, it may not provide perfect transcriptions for domain-specific words, acronyms and proper nouns.
Speech Customizations can be enabled when using the Live Transcribe service to improve the transcription accuracy in such cases.
In this session, we will help users get familiar with customizations and how to create and manage them using the OCI Console.

***Estimated Lab Time***: 30 minutes
### Objectives
## Task 1: Navigate to Speech Overview Page
Log into the OCI Cloud Console. Using the Burger Menu in the top left corner, open the Analytics and AI menu, and then select Speech under AI Services.

![](images/analytics-ai-menu.png)
This will navigate you to the Speech overview page.
On the left, you can access the various features of the OCI Speech service: Transcription Jobs, Live Transcribe, Customizations, and Text to Speech.
Under Documentation, you can find helpful links relevant to the OCI Speech service.

![](images/speech-overview-page.png)
1. Let's create an ObjectStorageDataset from the console. For EBITDA, let's create a JSON file like so:

```
<copy>
{
    "datasetType": "ENTITY_LIST",
    "entityList": [
        {
            "entityType": "financial",
            "entities": [
                {
                    "entityValue": "EBITDA"
                }
            ]
        }
    ]
}
</copy>
```
2. Upload this JSON file to an Object Storage bucket.
5. If you want to provide an audio file as the pronunciation, first upload the audio file to Object Storage. You can provide multiple audio files for the same entity. The JSON file can look like:

```
<copy>
{
    "datasetType": "ENTITY_LIST",
    "entityList": [
    ]
}
</copy>
```
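Since a malformed dataset file is easy to upload by accident, it can help to sanity-check the JSON locally before putting it in the bucket. The sketch below is illustrative only (plain Python, not part of the OCI SDK); the shape it checks is inferred from the example datasets in this lab.

```python
import json

def validate_entity_list_dataset(path):
    """Lightweight local sanity check for an ENTITY_LIST dataset file.

    The shape checked here is inferred from the examples in this lab:
    a top-level "datasetType" of "ENTITY_LIST" and an "entityList" of
    groups, each with an "entityType" and a list of "entities" that
    each carry an "entityValue" (pronunciations are optional).
    """
    with open(path) as f:
        dataset = json.load(f)  # raises ValueError if the JSON is malformed

    assert dataset.get("datasetType") == "ENTITY_LIST", "datasetType must be ENTITY_LIST"
    for group in dataset.get("entityList", []):
        assert "entityType" in group, "each entity list needs an entityType"
        for entity in group.get("entities", []):
            assert "entityValue" in entity, "each entity needs an entityValue"
    return dataset
```

Running a check like this before the upload step catches malformed JSON early, instead of discovering it only after the customization has been created.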
## Task 4: Create and enable a customization with custom pronunciations and reference examples
Sometimes, even when a customization with custom pronunciations is enabled, Live Transcribe may still produce mis-transcriptions. For example, let's say you have an organization that's abbreviated as "DSCRD" and is pronounced as "discord". Even if you create and enable a customization for these entities, the transcription might still not contain the right entity every time.
Say you created a customization for DSCRD using the below dataset:

```
<copy>
{
    "datasetType": "ENTITY_LIST",
    "entityList": [
        {
            "entityType": "organizations",
            "entities": [
                {
                    "entityValue": "DSCRD",
                    "pronunciations": [
                        {
                            "soundsLike": "discord"
                        }
                    ]
                }
            ]
        }
    ]
}
</copy>
```
When enabling this customization with the Live Transcribe service, there might still be instances where the transcription does not have DSCRD. For example, in the screenshot below the transcript says "What is the return on equity for discard".
![](images/dscrd.png)

Here is where Reference Examples come in.

You can provide simple examples of sentences where you expect to see entities of a given type. For example, since DSCRD has an entity type of "organizations", here are some sample reference examples for the "organizations" entity type:

* organization called `<organizations>`
* return on equity for `<organizations>`
* work for `<organizations>`
* statement for `<organizations>`
1. You can add these examples to the JSON file from before and upload the JSON file to Object Storage:
```
<copy>
{
    "datasetType": "ENTITY_LIST",
    "entityList": [
        {
            "entityType": "organizations",
            "entities": [
                {
                    "entityValue": "DSCRD",
                    "pronunciations": [
                        {
                            "soundsLike": "discord"
                        }
                    ]
                }
            ]
        }
    ],
    "referenceExamples": [
        "organization called <organizations>",
        "return on equity for <organizations>",
        "work for <organizations>",
        "statement for <organizations>"
    ]
}
</copy>
```
2. Use this dataset when creating the customization. The Customizations service will create two customizations: one for the reference examples (we call this the Main Customization) and one for the entity list that you provide (we call this the Slot Customization). The Slot Customization, i.e. the customization created for the entity list, will have "`--<entity-type>`" in its display name.
![](images/customizations-dscrd-refexamples.png)

If you open the details of the Main Customization, which in this case has the display name "dsrcd-customization", you will see that it has an Entities section that shows which Slot Customization it refers to for the "organizations" entity type.

![](images/main-customization-entities.png)
If you open the details of the Slot Customization, which in this case has the display name "dsrcd-customization--organizations", you will see that it does not have an entities section.
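Because the "`--<entity-type>`" suffix is the only difference in the display names, you can tell Main and Slot Customizations apart when scanning a list of customizations. The helper below is a hypothetical sketch (not an OCI SDK call), assuming only the naming convention described above.

```python
def split_customization_name(display_name: str):
    """Split a customization display name into (base_name, entity_type).

    Slot Customizations carry a "--<entity-type>" suffix (for example,
    "dsrcd-customization--organizations"); Main Customizations do not,
    so entity_type comes back as None for them.
    """
    base, sep, entity_type = display_name.rpartition("--")
    if not sep:  # no "--" suffix: this is a Main Customization
        return display_name, None
    return base, entity_type
```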
3. It is important to enable the Main Customization when making the Live Transcribe call. Think of the Main Customization as encapsulating both the reference examples and the entity lists. Enabling the Slot Customization means that Live Transcribe will use just the entity list and not the reference examples. The Live Transcribe output with the Main Customization enabled looks like:

![](images/live-transcribe-dscrd-refexamples.png)

We see that "DSCRD" appears as expected for the first 5 sentences. The goal of reference examples is to add more context for the custom entities that you have defined.
Note that the transcript may not have the custom entity for utterances that are NOT included in any of the reference examples. For example, in the above screenshot, the last line "The discord app is amazing" could very well have been "The DSCRD app is amazing". If you want to increase the chances of the model transcribing it as "The DSCRD app is amazing", you can add the reference example "`<organizations>` app" into the dataset.
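If you maintain the dataset file programmatically, extending the reference examples is a small change to the JSON before re-uploading. A minimal sketch, assuming the dataset has already been loaded into a Python dict (the helper name is illustrative):

```python
def add_reference_example(dataset: dict, example: str) -> dict:
    """Append a reference example such as "<organizations> app" to a
    dataset dict, creating the "referenceExamples" array if missing."""
    examples = dataset.setdefault("referenceExamples", [])
    if example not in examples:  # avoid duplicate entries
        examples.append(example)
    return dataset
```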
## Task 5: Create and enable a customization with multiple entity lists, custom pronunciations and reference examples
One of the best use-cases for Live Transcribe is in the healthcare domain for doctor-patient conversations. Let's say you are a hospital using OCI Live Transcribe and you have the following requirements.
- You have 2 patients named Daniel and Sorabh, and you want Live Transcribe to accurately transcribe both names. As a bonus requirement, if the doctor utters "Dan" or "Danny", Live Transcribe should still transcribe it as Daniel.
When used as is, the Live Transcribe produces an output like so:

![](images/live-transcribe-medical-no-customization.png)

Let's use Speech Customizations to make this transcription better.

1. Create a customization using the below dataset. This dataset has two entity lists: one for the "names" entity type and one for the "medical" entity type. Reference examples have been added for both entity types. Note that you can have multiple entity types in a single reference example; for instance, "hi `<names>`, have you been taking `<medical>`".

```
<copy>
{
    "datasetType": "ENTITY_LIST",
    "entityList": [
        {
            "entityType": "names",
            "entities": [
                {
                    "entityValue": "Daniel",
                    "pronunciations": [
                        {
                            "soundsLike": "Danny"
                        },
                        {
                            "soundsLike": "Dan"
                        }
                    ]
                },
                {
                    "entityValue": "Sorabh",
                    "pronunciations": [
                        {
                            "soundsLike": "Saurabh"
                        },
                        {
                            "soundsLike": "so rub"
                        },
                        {
                            "soundsLike": "so raab"
                        }
                    ]
                }
            ]
        },
        {
            "entityType": "medical",
            "entities": [
                {
                    "entityValue": "PeriCare",
                    "pronunciations": [
                        {
                            "soundsLike": "Perry care"
                        },
                        {
                            "soundsLike": "Paris care"
                        }
                    ]
                },
                {
                    "entityValue": "procapil",
                    "pronunciations": [
                        {
                            "soundsLike": "pro sepil"
                        },
                        {
                            "soundsLike": "pro capill"
                        }
                    ]
                },
                {
                    "entityValue": "EpiCeram",
                    "pronunciations": [
                        {
                            "soundsLike": "epic serum"
                        },
                        {
                            "soundsLike": "a PC rum"
                        }
                    ]
                }
            ]
        }
    ],
    "referenceExamples": [
        "hi <names>",
        "hello <names>",
        "good morning <names>",
        "good afternoon <names>",
        "tell me <names>",
        "prescribe <medical>",
        "take <medical>",
        "apply <medical>"
    ]
}
</copy>
```
2. This dataset would create 3 customizations: one Main Customization and two Slot Customizations.
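The count follows directly from the dataset's structure: one Main Customization for the dataset as a whole, plus one "`--<entity-type>`" Slot Customization per entity type. A small illustrative sketch (the base display name is a placeholder you choose, not something the service prescribes):

```python
import json

def expected_customizations(dataset_json: str, base_name: str):
    """Return the display names we expect a dataset to produce: the Main
    Customization plus one "--<entity-type>" Slot Customization per
    entity type in the entity list."""
    dataset = json.loads(dataset_json)
    entity_types = [group["entityType"] for group in dataset.get("entityList", [])]
    return [base_name] + [f"{base_name}--{etype}" for etype in entity_types]
```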