Skip to content

Conversation

@lenisha
Copy link
Contributor

@lenisha lenisha commented Dec 19, 2023

Purpose

Add MultiChannel Support for audio files in batch transcription

Pull Request Type

What kind of change does this Pull Request introduce?
Support to set MultiChannel transcription feature

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

#2185

}
},
"Channels": {
"defaultValue": "1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default value of transcription is [0,1]. i.e. If no channels is specified, both channel 0 and channel 1 of stereo will be transcribed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not what happens if we do not pass Channels in the Transcription request properties , it results with Invalid Data error .Tested on multichannel audios.

"channels": [
      0,
      1
    ],

"TextAnalyticsEndpoint": {
"defaultValue": "",
"type": "String",
"allowedValues": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we support TextAnalytics in any region now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to latest release of ARM in this repo yes Endpoint is supported specifically to support Private Endpoints that are not regional. Not sure why latest Tag is not merged in master

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/ingestion-v2.0.11/samples/ingestion/ingestion-client/Setup/ArmTemplateBatch.json


public static readonly string StartTranscriptionServiceBusConnectionString = Environment.GetEnvironmentVariable(nameof(StartTranscriptionServiceBusConnectionString), EnvironmentVariableTarget.Process);

public static readonly int[] Channels = int.TryParse(Environment.GetEnvironmentVariable(nameof(Channels), EnvironmentVariableTarget.Process), out int result) && result == 1 ? Constants.Channels : new int[] { result };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading channels from environment variables looks weird. Do we ask customers to change the environment variable value for each audio to be transcribed? How about if multiple audios are submitted but with different channel settings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's global setting for Azure Function applicable to all audios

string locale,
IEnumerable<string> contentUrls,
Dictionary<string, string> properties,
Dictionary<string, object> properties,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird. Why do we need to change this to "object"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Properties for transcription request not always just string , for example

"properties": {
    "diarizationEnabled": false,
    "wordLevelTimestampsEnabled": true,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "languageIdentification": {
      "candidateLocales": [
        "en-US",
        "de-DE",
        "es-ES"
      ]
    }
  },

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription-create?pivots=rest-api#create-a-transcription-job


public const string SummarizationSupportedLocalePrefix = "en";

public static readonly int[] Channels = new int[] { 0, 1 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that I understand the purpose of this PR. Multichannel support should be enabled by default in batch transcription. Only if a customer wants to transcribe a specific channel, he needs to specify the channel number in request property. For example, he only wants to transcribe channel 1, but not channel 0, of a stereo audio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test the default settings - it always returns "Invalid Data" for multi Channel audios, it does not do that by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants