## Environment information

- Framework: React (TypeScript)
- AWS Amplify Version: 6.6.6 (@aws-amplify/[email protected])
- Browser: Chrome (latest, desktop)
- OS: Windows 11
- Device: Desktop
- Audio Capture: `navigator.mediaDevices.getUserMedia` with an `AudioWorklet`
- Encoding: PCM16 WAV, 16 kHz, mono
# 🐞 Bug Report: Predictions.convert Speech-to-Text
## Problem Description
When calling `Predictions.convert` with short WAV audio (1–3 seconds), the result is always:
```json
{
  "fullText": ""
}
```

Occasionally, instead of empty text, the call fails with:

```
Error from AWS Predictions: Error: Your stream is too big. Reduce the frame size and try your request again
```

This happens even with very small audio clips (~70–140 KB).
## 🔍 Notes

- Audio was tested at 16 kHz, mono, PCM16.
- Other sample rates and stereo were also tested → still empty.
- Verified that the WAVs play back correctly in the browser via an `Audio()` element (see the playback sketch after this list).
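For reference, the playback check looked roughly like the following (a minimal sketch, assuming `wavBytes` is the `Uint8Array` produced by the `encodeWAV` helper shown further down):

```ts
// Sketch of the playback verification: wrap the encoded WAV bytes in a Blob
// and play them through an Audio element. Assumes wavBytes comes from encodeWAV().
const playWav = (wavBytes: Uint8Array) => {
  const blob = new Blob([wavBytes], { type: "audio/wav" });
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.onended = () => URL.revokeObjectURL(url); // release the object URL when done
  void audio.play();
};
```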
## 🔬 Steps to Reproduce

- Record microphone input using `navigator.mediaDevices.getUserMedia`.
- Capture raw PCM samples with an `AudioWorkletNode`.
- Merge the Int16 samples and encode them into a 16-bit PCM WAV file at 16 kHz.
- Pass the resulting `ArrayBuffer` to `Predictions.convert`.
### Recording Code

```ts
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 16000 });
await audioContext.audioWorklet.addModule("/recorderWorklet.js");

const source = audioContext.createMediaStreamSource(stream);
const workletNode = new AudioWorkletNode(audioContext, "recorder-processor");

workletNode.port.onmessage = (event) => {
  const int16Array = event.data;
  audioBuffer.addData(int16Array);
};

source.connect(workletNode).connect(audioContext.destination);
```
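The worklet module itself is not included in the report; a minimal sketch of what `/recorderWorklet.js` could look like (assuming it converts each 128-frame Float32 block to Int16 and posts it to the main thread) is:

```js
// Hypothetical recorderWorklet.js (not part of the original report):
// converts the Float32 input block to Int16 and forwards it to the main thread.
class RecorderProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const channel = inputs[0] && inputs[0][0]; // mono: first channel of first input
    if (channel) {
      const int16 = new Int16Array(channel.length);
      for (let i = 0; i < channel.length; i++) {
        const s = Math.max(-1, Math.min(1, channel[i])); // clamp to [-1, 1]
        int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;      // scale to 16-bit range
      }
      this.port.postMessage(int16);
    }
    return true; // keep the processor alive
  }
}

registerProcessor("recorder-processor", RecorderProcessor);
```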
### Buffer Management

```ts
const getBuffer = () => {
  let buffer: any[] = [];

  const add = (raw: any) => {
    if (Array.isArray(raw)) {
      buffer = buffer.concat(raw);
    } else {
      buffer.push(raw);
    }
    return buffer;
  };

  const reset = () => { buffer = []; };

  return {
    reset,
    addData: add,
    getData: () => buffer,
  };
};
```
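The `audioBuffer` referenced in the recording code is presumably an instance returned by this factory; the wiring isn't shown in the report, but it would amount to something like:

```ts
// Assumed wiring (not shown in the original snippets).
const audioBuffer = getBuffer();
```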
### Convert to WAV and Call Predictions

```ts
const convertFromBuffer = async () => {
  const mergedPCM = mergeInt16Arrays(audioBuffer.getData());

  // Encode to WAV (PCM16 LE)
  const wavBytes = encodeWAV(mergedPCM, 16000);
  const wavArrayBuffer: ArrayBuffer = wavBytes.buffer.slice(
    wavBytes.byteOffset,
    wavBytes.byteOffset + wavBytes.byteLength
  );

  console.log("WAV length (bytes):", wavBytes.byteLength);
  console.log("ArrayBuffer length:", wavArrayBuffer.byteLength);

  try {
    const { transcription } = await Predictions.convert({
      transcription: {
        source: { bytes: wavArrayBuffer },
        language: "en-US",
      },
    });
    console.log("Transcription result:", transcription);
  } catch (error) {
    console.error("AWS Predictions Error:", error);
  }
};
```
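`mergeInt16Arrays` is not included in the report; assuming the buffer holds one `Int16Array` per worklet message, a minimal sketch would be:

```ts
// Hypothetical helper (not in the original report): concatenates the Int16Array
// chunks collected from the worklet into one contiguous array.
const mergeInt16Arrays = (chunks: Int16Array[]): Int16Array => {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const merged = new Int16Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    merged.set(chunk, offset);
    offset += chunk.length;
  }
  return merged;
};
```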
### WAV Encoder

```ts
function encodeWAV(int16Array: Int16Array, sampleRate: number): Uint8Array {
  const buffer = new ArrayBuffer(44 + int16Array.length * 2);
  const view = new DataView(buffer);

  const writeString = (view: DataView, offset: number, str: string) => {
    for (let i = 0; i < str.length; i++) {
      view.setUint8(offset + i, str.charCodeAt(i));
    }
  };

  const bytesPerSample = 2;

  // RIFF/WAVE header
  writeString(view, 0, "RIFF");
  view.setUint32(4, 36 + int16Array.length * bytesPerSample, true); // chunk size
  writeString(view, 8, "WAVE");

  // fmt sub-chunk
  writeString(view, 12, "fmt ");
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // audio format: PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);                  // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate (mono)
  view.setUint16(32, bytesPerSample, true);              // block align (mono)
  view.setUint16(34, 16, true);                          // bits per sample

  // data sub-chunk
  writeString(view, 36, "data");
  view.setUint32(40, int16Array.length * bytesPerSample, true); // data size

  let offset = 44;
  for (let i = 0; i < int16Array.length; i++, offset += 2) {
    view.setInt16(offset, int16Array[i], true); // PCM16 little-endian samples
  }

  return new Uint8Array(buffer);
}
```
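As a sanity check (not part of the original report), the header produced by `encodeWAV` can be parsed back to confirm the fields a PCM16 / 16 kHz / mono WAV should carry:

```ts
// Illustrative header check for a WAV produced by encodeWAV(); purely a
// debugging aid, not part of the original report.
const inspectWavHeader = (wavBytes: Uint8Array) => {
  const view = new DataView(wavBytes.buffer, wavBytes.byteOffset, wavBytes.byteLength);
  const tag = (start: number) =>
    String.fromCharCode(...Array.from(wavBytes.slice(start, start + 4)));
  return {
    riff: tag(0),                            // expected "RIFF"
    wave: tag(8),                            // expected "WAVE"
    audioFormat: view.getUint16(20, true),   // expected 1 (PCM)
    channels: view.getUint16(22, true),      // expected 1 (mono)
    sampleRate: view.getUint32(24, true),    // expected 16000
    bitsPerSample: view.getUint16(34, true), // expected 16
    dataBytes: view.getUint32(40, true),     // PCM payload size
  };
};

console.log(inspectWavHeader(encodeWAV(new Int16Array(16000), 16000)));
```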
## ⚠️ Observed Behavior

- Always returns an empty transcription: `{ fullText: "" }`.
- Sometimes errors with `"Your stream is too big"`.
- Happens even with small WAVs (2–3 seconds, 70–140 KB).
## ✅ Expected Behavior

- Short WAVs (≤3 seconds, ≤200 KB) should return valid transcriptions.
- If the format is invalid, the API should return a descriptive error instead of silently returning `{ fullText: "" }`.
## ❓ Questions for AWS Team

- What is the maximum supported audio size/duration for `Predictions.convert`?
- Which formats are supported? The docs suggest PCM16 WAV or FLAC. Are others (WebM/Opus, MP3) valid?
- Why does `{ fullText: "" }` appear with no error? Does this indicate a decoding failure (bad WAV header)?
- Why is a ~140 KB (3-second) WAV sometimes rejected as "too big"?