add pre-connected audio buffer #2171
❌ Invalid Changeset Format Detected: one or more changeset files in this PR have an invalid format. Please ensure they adhere to the expected format.
```python
logger.debug(
    "pre-connect audio connected",
    extra={
        "sample_rate": sample_rate,
        "num_channels": num_channels,
        "participant": participant_id,
    },
)
```
I would remove this log; we're not very verbose for other stuff, and it doesn't seem important.
I have a PR where I add an LK_DEBUG=1 or LK_DEBUG=2, ... env var.
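(A rough sketch of that idea, not the actual PR: the exact LK_DEBUG semantics below are assumed.)

```python
import logging
import os

logger = logging.getLogger("livekit.agents")

# Assumed semantics: LK_DEBUG=1 turns on verbose debug logs, higher values add more detail.
_LK_DEBUG = int(os.environ.get("LK_DEBUG", "0") or "0")


def debug_log(msg: str, **fields) -> None:
    # Emit verbose logs only when the developer explicitly opted in via LK_DEBUG.
    if _LK_DEBUG >= 1:
        logger.debug(msg, extra=fields)
```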
It's hard to know whether the buffer was used or not without any logs. I also saw a case where the byte stream was received but never closed; we would never know that happened without a log.
We could add a warning for the timeout instead?
BTW, I agree that logs which print multiple times are annoying, but this one is a one-time log. If we try to avoid debug logs altogether, I'm wondering how people are supposed to check that a component works and get debug info when they run it in dev mode.
Sounds good, replaced this one with a timeout warning.
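(A minimal sketch of what the timeout warning could look like on the agent side; the helper name and wiring are illustrative, not the exact code in this PR.)

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("livekit.agents")


async def wait_for_pre_connect_audio(
    fut: "asyncio.Future[bytes]", participant_id: str, timeout: float
) -> Optional[bytes]:
    try:
        # shield() so the underlying buffer future isn't cancelled if we give up waiting
        return await asyncio.wait_for(asyncio.shield(fut), timeout)
    except asyncio.TimeoutError:
        # one-time warning instead of the removed debug log
        logger.warning(
            "pre-connect audio timed out, ignoring buffer",
            extra={"participant": participant_id, "timeout": timeout},
        )
        return None
```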
livekit-agents/livekit/agents/voice/room_io/_pre_connect_audio.py (outdated, resolved)
```python
pre_connect_audio_timeout: float = 5.0
"""The pre-connect audio will be ignored if it doesn't arrive within this time."""
```
Can we remove the timeout? Ideally we know whether we should wait for it based on the attributes. (So exposing a configuration for it isn't very useful; we can have a hard limit for the worst-case scenario.)
The byte stream is not stable in my tests, so a short timeout is needed IMO; otherwise the user will need to wait tens of seconds to get a response from the agent. I think it's better to expose this to users so they can decrease or increase the value based on their needs.
Right, my intuition is that it should just work OOTB since it's going to be a feature implemented in all our client SDKs, so this value could just be internal.
I still prefer to expose this option, to let users know that enabling this feature is not free: there is a timeout if the buffer is missing.
Sounds like a client-side bug? If the attribute is set, that means the client will send the stream. If the stream never comes, that's definitely a bug and we should fix it. Do you have more info on what you mean by "the byte stream is not stable in my test"?
Is the configurable timeout still needed with the other fix to the SFU?
Why wouldn't it be needed? There is still a chance we can't receive the buffer because of the network, right?
Just asking back to @theomonnom's original question: do users need to think about this, or can it just be internal? It's confusing anyway because it's not clear from the name whether this covers the time from agent start to stream arrival, or to stream end; it actually turns out it's just the time from the stream header to the stream trailer.
No strong feelings here, just re-raising the original question.
It's to stream end. Sometimes the stream head is received but the tail is lost; we don't rely on the stream head for anything.
I still think it's better to expose the timeout option, mainly to notify the user that there is a timeout if the buffer doesn't arrive in time: if the buffer is lost, there will be a side effect.
The missing tail seems to be correlated with the way of sending (frame-by-frame vs. max stream chunk size).
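(For reference, if the option stays public, tuning it would look roughly like this; this assumes the RoomInputOptions fields shown in the diffs above.)

```python
from livekit.agents.voice.room_io import RoomInputOptions  # assumed import path, matching the files in this PR

# Shorten the wait if lost buffers are common in your deployment,
# or lengthen it if clients legitimately need more time to flush the byte stream.
room_input_options = RoomInputOptions(
    pre_connect_audio=True,
    pre_connect_audio_timeout=3.0,
)
```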
livekit-agents/livekit/agents/voice/room_io/_pre_connect_audio.py (outdated, resolved)
```python
@utils.log_exceptions(logger=logger)
async def _read_audio_task(self, reader: rtc.ByteStreamReader, participant_id: str):
    if (fut := self._buffers.get(participant_id)) and fut.done():
```
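(For readers without the full diff: a rough, illustrative sketch of the per-participant future bookkeeping the snippet above implies. Only the `_buffers` dict and the guard shown in the diff come from the PR; everything else is assumed.)

```python
import asyncio


class _PreConnectAudioBufferSketch:
    """Illustrative only; not the code from the PR."""

    def __init__(self) -> None:
        # participant identity -> future resolved with the raw pre-connect audio bytes
        self._buffers: dict[str, asyncio.Future] = {}

    async def _read_audio_task(self, reader, participant_id: str) -> None:
        # Skip late or duplicate streams once a buffer has already been delivered.
        fut = self._buffers.get(participant_id)
        if fut is not None and fut.done():
            return
        if fut is None:
            fut = asyncio.get_running_loop().create_future()
            self._buffers[participant_id] = fut

        data = bytearray()
        async for chunk in reader:  # assumes the byte stream reader yields chunks of bytes
            data.extend(chunk)
        if not fut.done():
            fut.set_result(bytes(data))
```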
I don't see any technical limitations here; ofc @lukasIO, open for feedback as always 😸
Just FYI, I'm mostly struggling with managing the Track lifecycle here reliably: previously it was just created during connect(), but now we need to maintain its identity without introducing any surprises (e.g. recording some other track, publishing on the newly created one, audio engine state, etc.).
@bcherry yeah, that's what I meant by #2156 (comment).
The problem is that we don't have a track SID before publication time on the publishing client, so we'd need to use e.g. the MediaStreamTrack id instead.
> don't have a track sid before publication time on the publishing client

@lukasIO not sure if I get it - the stream attributes are set just before sending (on iOS), whereas track attributes will be set while publishing.
Basically I wanted to decouple the track buffer (which depends on LocalTrack) from LocalTrackPublication's state.
It would be a bit faster if we didn't have to wait for the track publish response and instead started sending the byte stream in parallel with the track publication request.
Something along the lines of this (still WIP): the promise to send the byte stream is not awaited, so the add-track request can start in parallel.
Thinking about it, a potential downside of my idea is that we might get a short gap in audio between the buffered audio that's sent and the point where the publication starts delivering valid data :/
That might be worse than the additional time spent waiting for the response, but it would need some comparison testing to be sure.
Yeah, you shouldn't send the byte stream until after the remote participant has subscribed to your track. That also gives the agents framework a good moment to start counting toward a buffer timeout (the moment the subscription occurs).
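(To illustrate that ordering on the agent side: a hypothetical sketch that arms the countdown only once the subscription happens. The track_subscribed event comes from the livekit rtc Python API; the buffer dict and function names are made up.)

```python
import asyncio

from livekit import rtc


def arm_pre_connect_timeout(
    room: rtc.Room, buffers: "dict[str, asyncio.Future[bytes]]", timeout: float = 5.0
) -> None:
    def _on_subscribed(
        track: rtc.Track, pub: rtc.RemoteTrackPublication, participant: rtc.RemoteParticipant
    ) -> None:
        # Start counting only once we are actually subscribed; the client should
        # not begin the byte stream before this point.
        fut = buffers.get(participant.identity)
        if fut is None or fut.done():
            return

        def _expire() -> None:
            if not fut.done():
                fut.cancel()  # stop waiting; the agent falls back to live audio only

        asyncio.get_running_loop().call_later(timeout, _expire)

    room.on("track_subscribed", _on_subscribed)
```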
iOS PR (WIP) livekit/client-sdk-swift#685 @longcw for your testing 🧪

Waiting for a new release from livekit/python-sdks#433
```python
participant_identity: NotGivenOr[str] = NOT_GIVEN
"""The participant to link to. If not provided, link to the first participant.
Can be overridden by the `participant` argument of RoomIO constructor or `set_participant`."""
pre_connect_audio: bool = False
```
Can we enable that by default (or remove the flag)? There shouldn't be any significant penalty for doing so.
Per discussion with @bcherry - should we just let the client decide?
Looks like the Swift example instabilities are caused by sending the stream "too early", as discussed.
I'm fine with unblocking this PR; I wasn't able to break it yet 🤞
```python
# The process works in three steps:
# 1. RoomIO is set up with pre_connect_audio=True
# 2. When connecting to the room, the client sends any audio spoken before connection
# 3. This pre-connection audio is combined with new audio after connection is established
```
I'd consider just omitting the specific example and ensuring the other examples are compatible instead
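(For reference, enabling the feature in an existing voice agent example would look roughly like this; a sketch assuming the AgentSession / RoomInputOptions API, with the agent's models omitted for brevity.)

```python
from livekit import agents
from livekit.agents.voice.room_io import RoomInputOptions  # assumed import path, matching the files in this PR


class MyAgent(agents.Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice assistant.")


async def entrypoint(ctx: agents.JobContext) -> None:
    await ctx.connect()

    # STT/LLM/TTS configuration omitted for brevity.
    session = agents.AgentSession()

    # Step 1: opt in to pre-connect audio; steps 2-3 (the client buffering audio before
    # connecting and the agent merging it with live audio) then happen automatically.
    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(pre_connect_audio=True),
    )
```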
@longcw are there any remaining blockers to merge + release?

Replaces #2156