Skip to content

Conversation

@AkshDass
Copy link

When video frames are significantly behind schedule (i.e. when the check in line 176 passes), self._next_frame_time should not be handled any differently.

By setting self._next_frame_time = time.perf_counter(), we don't allow for frames to be played at a faster frame-rate to be able to catch up. This is especially a problem whenever the video falls significantly behind the audio.

Removing this line means that until self.next_frame_time catches up to current_time (defined in line 164), we don't wait before playing available video frames. This allows us to have some semblance of ground truth w.r.t. video frames and allows us to "catch up" to where the video should be if we fall significantly behind schedule.

@CLAassistant
Copy link

CLAassistant commented May 12, 2025

CLA assistant check
All committers have signed the CLA.

@davidzhao davidzhao requested a review from longcw May 12, 2025 23:57
@longcw
Copy link
Contributor

longcw commented May 13, 2025

This is intended. Otherwise, when the video catches up after it was behinds the audio for a while, the sleep time is < 0 for long time where the stream fps is not limited at all.

You can disable it by increasing the max_delay_tolerance_ms.

@AkshDass
Copy link
Author

This is intended. Otherwise, when the video catches up after it was behinds the audio for a while, the sleep time is < 0 for long time where the stream fps is not limited at all.

You can disable it by increasing the max_delay_tolerance_ms.

I may be using this class incorrectly then, but I'm still a bit confused as to why this behavior would be intended. Specifically, I'm using this class in the context of livekit agents for an AI avatar. If our video stream falls behind the audio stream, beyond max_delay_tolerance_ms, then each iteration where sleep time is < 0 means that we "catch up" 1 frame closer to what the frame should be.

After we reset next_frame_time, we then go back to sleeping by self._frame_interval seconds, which means that in order for our video to align with our audio properly we need to manually omit pushing frames to the AVSynchronizer.

To illustrate, here's an example. Imagine my video queue is 300 ms and my audio queue is 1000 ms. I am submitting 10 seconds of video and 10 seconds of audio to be played through the AV synchronizer. If my video falls behind for 800 ms (i.e. a frame is not ready for 500 ms) then if we do not reset _next_frame_time, as long as my other frames are ready in faster than self._frame_interval time my video will quickly catch up to where it is supposed to be in line with the audio. However, if we reset self._frame_interval, then yes we go back to playing video at the original FPS, but it is always 500ms behind (since we play out the 300ms in my video queue then wait another 500 ms before we have another frame available).

Assuming you have frames 1 - N in a stream that get pushed in order to the AVSynchronizer, I don't understand why you would want to reset self._next_frame_time unless you manually skip over what frame must be pushed.

@longcw
Copy link
Contributor

longcw commented May 13, 2025

So first, there is already a max_delay_tolerance_ms default 0.3s, that is to say it's fine to have video behind audio for 0.3s and we allow the video to catch up with sleep_time <= 0.

When the video behind the audio too much, allowing unlimited tolerance, either you will see a insync in the frontend, or when the video generation runs faster, it streams in much higher fps to catch up which is not expected as the it actually cannot stream or display that many frames at a time (the fps is much higher than expected since the sleep time is still 0 for a while as we didn't rest the next frame time).

Resting the next frame time will cause audio lag, basically it makes the audio wait for the video. It's fine if it happens occasionally, assuming the video generation is faster than realtime for most of the time. If it happens frequently, you should optimize the video generation or decrease your target fps instead.

Or if you realy don't want it, just increase the max_delay_tolerance_ms or set it to inf.

@longcw
Copy link
Contributor

longcw commented May 13, 2025

I think the misunderstanding part is

then yes we go back to playing video at the original FPS, but it is always 500ms behind

It's not video always behind audio 500ms, it's the audio playing will have a lag to wait for the video then they still are in sync with the expected fps.

@longcw longcw closed this May 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants