SpeechRecognizer does not properly recognize text unless there is enough silence before spoken words. #2921

@amubiera

Describe the bug

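The SpeechRecognizer returns a truncated final result unless the audio has enough leading silence before the spoken words. The intermediate Recognizing event contains the full phrase, but the final Recognized result drops the last word.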

To Reproduce

Steps to reproduce the behavior:

  1. Unzip the attached TestSpeechToText.zip
  2. Edit Program.cs and replace the subscription key passed to SpeechConfig.FromSubscription with a valid one.
  3. Build and run the project.
  4. Observe the unexpected output.
  5. Comment out the line corresponding to PhoneSurgeryBad.raw and uncomment the line corresponding to PhoneSurgeryGood.raw.
  6. Run the project and observe the expected output.

PhoneSurgeryBad.raw audio produces recognized text "Phone" instead of "Phone surgery":

Session started for ../../../PhoneSurgeryBad.raw.
Recognizing: phone surgery
Recognized: Phone.
Session stopped

PhoneSurgeryGood.raw audio produces expected recognized text "Phone surgery":

Session started for ../../../PhoneSurgeryGood.raw.
Recognizing: phone surgery
Recognized: Phone surgery.
Session stopped

The only difference between the audio files is that there is a bit more leading silence in the PhoneSurgeryGood.raw file.
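
For anyone who wants to check the leading silence of the two files independently, here is a minimal sketch. It assumes the .raw files are headerless, little-endian, signed 16-bit mono PCM at 16 kHz (the issue states the rate, bit depth, and channel count; the byte order is an assumption). The function name and the `threshold` parameter are hypothetical and not part of the Speech SDK.

```python
import struct

SAMPLE_RATE = 16_000   # Hz, as stated for both files in this issue
BYTES_PER_SAMPLE = 2   # 16-bit mono

def leading_silence_seconds(pcm: bytes, threshold: int = 0) -> float:
    """Return the duration of the leading run of samples with |amplitude| <= threshold.

    Assumes little-endian signed 16-bit samples (an assumption, not confirmed here).
    """
    n = len(pcm) // BYTES_PER_SAMPLE
    samples = struct.unpack(f"<{n}h", pcm[: n * BYTES_PER_SAMPLE])
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            return i / SAMPLE_RATE
    # The whole buffer is at or below the threshold.
    return n / SAMPLE_RATE
```

It could be called as `leading_silence_seconds(open("PhoneSurgeryBad.raw", "rb").read())`. If the synthesized audio has low-level noise rather than exact zeros at the start, a small nonzero `threshold` may be needed.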

Version of the Cognitive Services Speech SDK

  • 1.45.0

Platform, Operating System, and Programming Language

  • Windows 11
  • x64
  • C#

Additional context

  • TestSpeechToText.zip file contents:
    • TestSpeechToText.sln: Visual Studio 2022 v17.14.13 Solution File.
    • TestSpeechToText.csproj: Visual Studio 2022 v17.14.13 Project File.
    • Program.cs: Main source code.
    • PhoneSurgeryBad.raw: 16 kHz, 16-bit, mono audio containing speech "Phone Surgery" with 0.20 s of leading silence. Speech was generated using SpeechSynthesizer.
    • PhoneSurgeryGood.raw: 16 kHz, 16-bit, mono audio containing speech "Phone Surgery" with 0.23 s of leading silence. Speech was generated using SpeechSynthesizer.
    • log-bad.txt: SPX_DBG logging when executing the program with PhoneSurgeryBad.raw.
    • log-good.txt: SPX_DBG logging when executing the program with PhoneSurgeryGood.raw.
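
To put the two leading-silence values in concrete terms, the sketch below converts them to sample and byte counts for 16 kHz, 16-bit mono audio, and shows one way to prepend digital silence to a raw buffer for further experiments. The helper names are hypothetical, and whether padding the "bad" file changes the recognition result is not something this report has verified.

```python
SAMPLE_RATE = 16_000   # Hz, as stated for both files
BYTES_PER_SAMPLE = 2   # 16-bit mono

def silence_bytes(seconds: float) -> int:
    """Number of bytes occupied by `seconds` of 16 kHz, 16-bit mono audio."""
    return round(seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE

def prepend_silence(pcm: bytes, seconds: float) -> bytes:
    """Return `pcm` with `seconds` of zero-valued samples added at the start."""
    return b"\x00" * silence_bytes(seconds) + pcm

bad_bytes = silence_bytes(0.20)    # 3200 samples = 6400 bytes
good_bytes = silence_bytes(0.23)   # 3680 samples = 7360 bytes
extra_bytes = good_bytes - bad_bytes  # 480 samples = 960 bytes (30 ms)
```

Under these assumptions, the two files differ by only 480 samples (30 ms) of leading silence.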
