add pre-connected audio buffer #2171
❌ Invalid Changeset Format Detected: one or more changeset files in this PR have an invalid format. Please ensure they adhere to the expected format.
```python
logger.debug(
    "pre-connect audio connected",
    extra={
        "sample_rate": sample_rate,
        "num_channels": num_channels,
        "participant": participant_id,
    },
)
```
I would remove this log; we're not very verbose for other stuff, and it doesn't seem important.
I have a PR where I add an LK_DEBUG=1 or LK_DEBUG=2, ... env var.
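(A rough sketch of that idea, not the actual PR: the exact LK_DEBUG semantics below are assumed.)

```python
import logging
import os

logger = logging.getLogger("livekit.agents")

# Assumed semantics: LK_DEBUG=1 turns on verbose debug logs, higher values add more detail.
_LK_DEBUG = int(os.environ.get("LK_DEBUG", "0") or "0")


def debug_log(msg: str, **fields) -> None:
    # Emit verbose logs only when the developer explicitly opted in via LK_DEBUG.
    if _LK_DEBUG >= 1:
        logger.debug(msg, extra=fields)
```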
It's hard to know whether the buffer was used or not without any logs. I also saw a case where the byte stream was received but never closed; we would never know that happened without a log.
We could add a warning for the timeout instead?
BTW, I agree that logs which print multiple times are annoying, but this one is a one-time log. If we try to avoid debug logs altogether, I'm wondering how people are supposed to check that a component works and get debug info when they run it in dev mode.
Sounds good, replaced this one with a timeout warning.
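(A minimal sketch of what the timeout warning could look like on the agent side; the helper name and wiring are illustrative, not the exact code in this PR.)

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("livekit.agents")


async def wait_for_pre_connect_audio(
    fut: "asyncio.Future[bytes]", participant_id: str, timeout: float
) -> Optional[bytes]:
    try:
        # shield() so the underlying buffer future isn't cancelled if we give up waiting
        return await asyncio.wait_for(asyncio.shield(fut), timeout)
    except asyncio.TimeoutError:
        # one-time warning instead of the removed debug log
        logger.warning(
            "pre-connect audio timed out, ignoring buffer",
            extra={"participant": participant_id, "timeout": timeout},
        )
        return None
```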
livekit-agents/livekit/agents/voice/room_io/_pre_connect_audio.py (outdated, resolved)
```python
pre_connect_audio_timeout: float = 5.0
"""The pre-connect audio will be ignored if it doesn't arrive within this time."""
```
Can we remove the timeout? Ideally we know whether we should wait for it based on the attributes. (So exposing a configuration for it isn't very useful; we can have a hard limit for the worst-case scenario.)
The byte stream is not stable in my tests, so a short timeout is needed IMO; otherwise the user will need to wait tens of seconds to get a response from the agent. I think it's better to expose this to users so they can decrease or increase the value based on their needs.
Right, my intuition is that it should just work OOTB since it's going to be a feature implemented in all our client SDKs, so this value could just be internal.
I still prefer to expose this option, to let users know that enabling this feature is not free: there is a timeout if the buffer is missing.
Sounds like a client-side bug? If the attribute is set, that means the client will send the stream. If the stream never comes, that's definitely a bug and we should fix it. Do you have more info on what you mean by "the byte stream is not stable in my test"?
Is the configurable timeout still needed with the other fix to the SFU?
Why wouldn't it be needed? There is still a chance we can't receive the buffer because of the network, right?
Just asking back to @theomonnom's original question: do users need to think about this, or can it just be internal? It's confusing anyway because it's not clear from the name whether this covers the time from agent start to stream arrival, or to stream end; it actually turns out it's just the time from the stream header to the stream trailer.
No strong feelings here, just re-raising the original question.
It's to stream end. Sometimes the stream head is received but the tail is lost; we don't rely on the stream head for anything.
I still think it's better to expose the timeout option, mainly to notify the user that there is a timeout if the buffer doesn't arrive in time: if the buffer is lost, there will be a side effect.
The missing tail seems to be correlated with the way of sending (frame-by-frame vs. max stream chunk size).
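(For reference, if the option stays public, tuning it would look roughly like this; this assumes the RoomInputOptions fields shown in the diffs above.)

```python
from livekit.agents.voice.room_io import RoomInputOptions  # assumed import path, matching the files in this PR

# Shorten the wait if lost buffers are common in your deployment,
# or lengthen it if clients legitimately need more time to flush the byte stream.
room_input_options = RoomInputOptions(
    pre_connect_audio=True,
    pre_connect_audio_timeout=3.0,
)
```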
livekit-agents/livekit/agents/voice/room_io/_pre_connect_audio.py (outdated, resolved)
```python
@utils.log_exceptions(logger=logger)
async def _read_audio_task(self, reader: rtc.ByteStreamReader, participant_id: str):
    if (fut := self._buffers.get(participant_id)) and fut.done():
```
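(For readers without the full diff: a rough, illustrative sketch of the per-participant future bookkeeping the snippet above implies. Only the `_buffers` dict and the guard shown in the diff come from the PR; everything else is assumed.)

```python
import asyncio


class _PreConnectAudioBufferSketch:
    """Illustrative only; not the code from the PR."""

    def __init__(self) -> None:
        # participant identity -> future resolved with the raw pre-connect audio bytes
        self._buffers: dict[str, asyncio.Future] = {}

    async def _read_audio_task(self, reader, participant_id: str) -> None:
        # Skip late or duplicate streams once a buffer has already been delivered.
        fut = self._buffers.get(participant_id)
        if fut is not None and fut.done():
            return
        if fut is None:
            fut = asyncio.get_running_loop().create_future()
            self._buffers[participant_id] = fut

        data = bytearray()
        async for chunk in reader:  # assumes the byte stream reader yields chunks of bytes
            data.extend(chunk)
        if not fut.done():
            fut.set_result(bytes(data))
```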
I don't see any technical limitations here; ofc @lukasIO, open for feedback as always 😸
Just FYI, I'm mostly struggling with managing the Track lifecycle here reliably: previously it was just created during connect(), but now we need to maintain its identity without introducing any surprises (e.g. recording some other track, publishing on the newly created one, audio engine state, etc.).
@bcherry yeah, that's what I meant by #2156 (comment).
The problem is that we don't have a track SID before publication time on the publishing client, so we'd need to use e.g. the MediaStreamTrack id instead.
> don't have a track sid before publication time on the publishing client

@lukasIO not sure if I get it - the stream attributes are set just before sending (on iOS), whereas track attributes will be set while publishing.
Basically I wanted to decouple the track buffer (which depends on LocalTrack) from LocalTrackPublication's state.
It would be a bit faster if we didn't have to wait for the track publish response and instead started sending the byte stream in parallel with the track publication request.
Something along the lines of this (still WIP): the promise to send the byte stream is not awaited, so the add-track request can start in parallel.
Thinking about it, a potential downside of my idea is that we might get a short gap in audio between the buffered audio that's sent and the point where the publication starts delivering valid data :/
That might be worse than the additional time spent waiting for the response, but it would need some comparison testing to be sure.
Yeah, you shouldn't send the byte stream until after the remote participant has subscribed to your track. That also gives the agents framework a good moment to start counting toward a buffer timeout (the moment the subscription occurs).
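(To illustrate that ordering on the agent side: a hypothetical sketch that arms the countdown only once the subscription happens. The track_subscribed event comes from the livekit rtc Python API; the buffer dict and function names are made up.)

```python
import asyncio

from livekit import rtc


def arm_pre_connect_timeout(
    room: rtc.Room, buffers: "dict[str, asyncio.Future[bytes]]", timeout: float = 5.0
) -> None:
    def _on_subscribed(
        track: rtc.Track, pub: rtc.RemoteTrackPublication, participant: rtc.RemoteParticipant
    ) -> None:
        # Start counting only once we are actually subscribed; the client should
        # not begin the byte stream before this point.
        fut = buffers.get(participant.identity)
        if fut is None or fut.done():
            return

        def _expire() -> None:
            if not fut.done():
                fut.cancel()  # stop waiting; the agent falls back to live audio only

        asyncio.get_running_loop().call_later(timeout, _expire)

    room.on("track_subscribed", _on_subscribed)
```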
iOS PR (WIP) livekit/client-sdk-swift#685 @longcw for your testing 🧪

Waiting for a new release from livekit/python-sdks#433
```python
participant_identity: NotGivenOr[str] = NOT_GIVEN
"""The participant to link to. If not provided, link to the first participant.
Can be overridden by the `participant` argument of RoomIO constructor or `set_participant`."""
pre_connect_audio: bool = False
```
Can we enable that by default (or remove the flag)? There shouldn't be any significant penalty for doing so.
Per discussion with @bcherry - should we just let the client decide?
Looks like the Swift example instabilities are caused by sending the stream "too early", as discussed.
I'm fine with unblocking this PR; I wasn't able to break it yet 🤞
```python
# The process works in three steps:
# 1. RoomIO is set up with pre_connect_audio=True
# 2. When connecting to the room, the client sends any audio spoken before connection
# 3. This pre-connection audio is combined with new audio after connection is established
```
I'd consider just omitting the specific example and ensuring the other examples are compatible instead
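(For reference, enabling the feature in an existing voice agent example would look roughly like this; a sketch assuming the AgentSession / RoomInputOptions API, with the agent's models omitted for brevity.)

```python
from livekit import agents
from livekit.agents.voice.room_io import RoomInputOptions  # assumed import path, matching the files in this PR


class MyAgent(agents.Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice assistant.")


async def entrypoint(ctx: agents.JobContext) -> None:
    await ctx.connect()

    # STT/LLM/TTS configuration omitted for brevity.
    session = agents.AgentSession()

    # Step 1: opt in to pre-connect audio; steps 2-3 (the client buffering audio before
    # connecting and the agent merging it with live audio) then happen automatically.
    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(pre_connect_audio=True),
    )
```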
@longcw are there any remaining blockers to merge + release?

Replaces #2156