Problem Description
While doing secondary development based on go2rtc, I found that when a Xiaomi-protocol stream is converted to an RTSP URL and pulled with FFmpeg, the audio disappears after about 10 minutes.
Verification
- Surface symptom
When pulling the RTSP stream with FFmpeg, I found that the audio track drifts too far from the video track, causing audio frames to be dropped.
- VLC verification
When playing with VLC, I can see that after buffer 1544, audio data starts being dropped, while video data is not dropped.
Root Cause
After reviewing the Xiaomi-related code in the go2rtc project, I found that pkg/xiaomi/miss/producer.go uses a fixed RTP timestamp increment per audio packet:
timestamp40ms = 48000 * 0.040 = 1,920
with the comment:
// known cameras sends packets with 40ms long
However, after adding log output, I found that Xiaomi stream packets already carry a Timestamp.
I tried converting it using TimeToRTP(pkt.Timestamp, 48000), and the logs are as follows:
The logs show that an audio frame is not always 40ms long, so the RTP timestamp increment is not always 1920. In most cases it is less than 1920.
If we always add 1920 for every packet, the audio timeline runs progressively ahead of the video timeline. After enough time, the accumulated A/V drift becomes large enough that audio frames are dropped.
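The drift mechanism can be sketched in a few lines of Go. This is only an illustration, not the go2rtc code: `msToRTP` and `driftTicks` are hypothetical helpers, the sketch assumes packet timestamps are in milliseconds, and the sample timestamps are invented for the example rather than taken from the logs.

```go
package main

import "fmt"

// fixedStep mirrors the constant described above: 40 ms at a 48 kHz clock.
const fixedStep = 48000 * 40 / 1000 // 1920 ticks

// msToRTP is a hypothetical helper (not go2rtc's TimeToRTP).
// It converts a millisecond timestamp into RTP clock ticks.
func msToRTP(ms, clockRate uint64) uint32 {
	return uint32(ms * clockRate / 1000)
}

// driftTicks reports how many ticks a fixed-step timeline is ahead of a
// timeline derived from the packets' own timestamps (assumed milliseconds).
func driftTicks(timestampsMS []uint64, clockRate uint64) int64 {
	if len(timestampsMS) == 0 {
		return 0
	}
	base := timestampsMS[0]
	var fixed, derived uint32
	for i, ts := range timestampsMS {
		if i > 0 {
			fixed += fixedStep // what a constant increment would produce
		}
		derived = msToRTP(ts-base, clockRate) // what the packet timestamp implies
	}
	return int64(fixed) - int64(derived)
}

func main() {
	// Invented sample: packet intervals of 40, 38, 38 and 40 ms.
	sample := []uint64{0, 40, 78, 116, 156}
	d := driftTicks(sample, 48000)
	fmt.Printf("fixed-step timeline is ahead by %d ticks (%.1f ms)\n", d, float64(d)/48.0)
}
```

With these invented intervals the fixed-step timeline is already 192 ticks (4 ms) ahead after five packets. The gap only grows as long as the real intervals stay at or below 40 ms, which matches the behaviour described above.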