Open
Description
Describe the bug
The SpeechRecognizer returns truncated final text unless the audio contains enough leading silence before the spoken words.
To Reproduce
Steps to reproduce the behavior:
- Unzip the attached TestSpeechToText.zip.
- Edit Program.cs and supply a valid subscription key in the SpeechConfig.FromSubscription call.
- Build and run the project.
- Observe the unexpected output.
- Comment out the line corresponding to PhoneSurgeryBad.raw and uncomment the line corresponding to PhoneSurgeryGood.raw.
- Run the project and observe the expected output.
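For readers without the attachment, the steps above can be approximated with a minimal sketch like the one below. This is not the attached Program.cs, whose contents are not shown here; it is an assumed reconstruction that feeds a headerless 16 kHz, 16-bit, mono file through a push stream and prints the same kinds of events as the output in this report. The key and region strings are placeholders, and the choice of a push stream with continuous recognition is an assumption.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

internal static class Program
{
    private static async Task Main()
    {
        // Placeholders: substitute a real subscription key and region.
        var config = SpeechConfig.FromSubscription("<subscription-key>", "<region>");

        // Swap the comment between these two lines to switch inputs.
        var path = "../../../PhoneSurgeryBad.raw";
        // var path = "../../../PhoneSurgeryGood.raw";

        // Raw PCM has no header, so the format is declared explicitly.
        var format = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);
        using var pushStream = AudioInputStream.CreatePushStream(format);
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
        using var recognizer = new SpeechRecognizer(config, audioConfig);

        var stopped = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        recognizer.SessionStarted += (_, _) =>
            Console.WriteLine($"Session started for {path}.");
        recognizer.Recognizing += (_, e) =>
            Console.WriteLine($"Recognizing: {e.Result.Text}");
        recognizer.Recognized += (_, e) =>
            Console.WriteLine($"Recognized: {e.Result.Text}");
        recognizer.Canceled += (_, e) =>
        {
            // End of stream also arrives as a cancellation; only report real errors.
            if (e.Reason == CancellationReason.Error)
                Console.WriteLine($"Error: {e.ErrorDetails}");
            stopped.TrySetResult(true);
        };
        recognizer.SessionStopped += (_, _) =>
        {
            Console.WriteLine("Session stopped");
            stopped.TrySetResult(true);
        };

        // Push the whole file, then close the stream to signal end of audio.
        pushStream.Write(File.ReadAllBytes(path));
        pushStream.Close();

        await recognizer.StartContinuousRecognitionAsync();
        await stopped.Task;
        await recognizer.StopContinuousRecognitionAsync();
    }
}
```

Running this requires the Microsoft.CognitiveServices.Speech NuGet package and network access to the service.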
PhoneSurgeryBad.raw audio produces the recognized text "Phone" instead of "Phone surgery":
Session started for ../../../PhoneSurgeryBad.raw.
Recognizing: phone surgery
Recognized: Phone.
Session stopped
PhoneSurgeryGood.raw audio produces the expected recognized text "Phone surgery":
Session started for ../../../PhoneSurgeryGood.raw.
Recognizing: phone surgery
Recognized: Phone surgery.
Session stopped
The only difference between the audio files is that there is slightly more leading silence in the PhoneSurgeryGood.raw file.
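To experiment with how much leading silence is needed, one option is to pad the raw audio before handing it to the recognizer. The sketch below is an illustration only, assuming headerless 16 kHz, 16-bit, mono PCM; the class and method names (PcmSilence, BytesForDuration, PrependSilence) are hypothetical helpers and not part of the Speech SDK.

```csharp
using System;

public static class PcmSilence
{
    // Bytes occupied by `seconds` of audio in the given PCM format.
    public static int BytesForDuration(
        double seconds, int sampleRate = 16000, int bitsPerSample = 16, int channels = 1)
    {
        int frames = (int)Math.Round(seconds * sampleRate);
        return frames * (bitsPerSample / 8) * channels;
    }

    // Returns a new buffer with `seconds` of digital silence in front of `pcm`.
    public static byte[] PrependSilence(
        byte[] pcm, double seconds, int sampleRate = 16000, int bitsPerSample = 16, int channels = 1)
    {
        int pad = BytesForDuration(seconds, sampleRate, bitsPerSample, channels);

        // A new byte array is zero-filled, and zero is silence for signed 16-bit PCM.
        var result = new byte[pad + pcm.Length];
        Buffer.BlockCopy(pcm, 0, result, pad, pcm.Length);
        return result;
    }
}
```

For example, `PcmSilence.PrependSilence(File.ReadAllBytes(path), 0.03)` adds 480 samples (960 bytes), which is the 0.03 s gap between the two files described in this report. This only helps explore the threshold; it does not address the underlying recognizer behavior.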
Version of the Cognitive Services Speech SDK
- 1.45.0
Platform, Operating System, and Programming Language
- Windows 11
- x64
- C#
Additional context
- TestSpeechToText.zip file contents:
  - TestSpeechToText.sln: Visual Studio 2022 v17.14.13 Solution File.
  - TestSpeechToText.csproj: Visual Studio 2022 v17.14.13 Project File.
  - Program.cs: Main source code.
  - PhoneSurgeryBad.raw: 16 kHz, 16-bit, mono audio containing speech "Phone Surgery" with 0.20s of leading silence. Speech was generated using SpeechSynthesizer.
  - PhoneSurgeryGood.raw: 16 kHz, 16-bit, mono audio containing speech "Phone Surgery" with 0.23s of leading silence. Speech was generated using SpeechSynthesizer.
  - log-bad.txt: SPX_DBG logging when executing the program with PhoneSurgeryBad.raw.
  - log-good.txt: SPX_DBG logging when executing the program with PhoneSurgeryGood.raw.
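The leading-silence figures above can be checked independently of the recognizer. The following is a minimal sketch, assuming little-endian signed 16-bit mono samples; the helper name and the amplitude threshold are hypothetical, and a suitable threshold depends on how quiet the synthesized lead-in really is.

```csharp
using System;

public static class PcmInspector
{
    // Seconds of audio before the first sample whose absolute amplitude exceeds `threshold`.
    // Returns the full duration if no sample exceeds it.
    public static double LeadingSilenceSeconds(byte[] pcm, int threshold = 0, int sampleRate = 16000)
    {
        int sampleCount = pcm.Length / 2;
        for (int i = 0; i < sampleCount; i++)
        {
            // Assemble a little-endian 16-bit sample, widened to int so Abs cannot overflow.
            int sample = (short)(pcm[2 * i] | (pcm[2 * i + 1] << 8));
            if (Math.Abs(sample) > threshold)
                return (double)i / sampleRate;
        }
        return (double)sampleCount / sampleRate;
    }
}
```

At 16 kHz, 0.20 s corresponds to 3200 samples and 0.23 s to 3680 samples, so the two files should differ by 480 samples of lead-in if the description above is accurate.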