Skip to content

Conversation

@longcw
Copy link
Contributor

@longcw longcw commented May 1, 2025

replace #2156

@github-actions
Copy link
Contributor

github-actions bot commented May 1, 2025

❌ Invalid Changeset Format Detected

One or more changeset files in this PR have an invalid format. Please ensure they adhere to:

  • Start with --- and include a closing --- on its own line.
  • Each package line must be in the format:
    "package-name": patch|minor|major
  • No duplicate package entries allowed.
  • A non-empty change description must follow the front matter.

Error details:
.github/next-release/changeset-c58eb3b3.md: Failed to read file from git branch 'pr_head'.

Comment on lines 88 to 95
logger.debug(
"pre-connect audio connected",
extra={
"sample_rate": sample_rate,
"num_channels": num_channels,
"participant": participant_id,
},
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this log, we're not very verbose for other stuff, it doesn't seems important

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a PR where I have a LK_DEBUG=1 or LK_DEBUG=2, ... env var

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to know the buffer is used or not without any logs? I also saw the case there was the byte stream received but never closed. We will never know that happened without log...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add warnings for the timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I agree with that logs that print multiple times is annoying, but for this one it's a one time log, and I am wondering if we try to avoid debugging logs, how ppl to check if a component works and get the debug info when they run it in dev mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg, replaced this one with a timeout warning

Comment on lines 76 to 77
pre_connect_audio_timeout: float = 5.0
"""The pre-connect audio will be ignored if it doesn't arrive within this time."""
Copy link
Member

@theomonnom theomonnom May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the timeout? ideally we know if we should wait for it based on the attributes. (So exposing a configuration for it isn't very useful, we can have a hard limit in the worst scenario)

Suggested change
pre_connect_audio_timeout: float = 5.0
"""The pre-connect audio will be ignored if it doesn't arrive within this time."""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the byte stream is not stable in my test, a short timeout is needed IMO. otherwise user will need to wait for tens of seconds to get the response from the agent. I think it's better expose this to user if they want to decrease or increase the value based on needs.

Copy link
Member

@theomonnom theomonnom May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my intuition is that it should just work OOTB since it's going to be a features implemented in all our client SDKs, so this value could just be internal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still prefer to expose this option, to let users know that enabling this feature is not free, there is a timeout if the buffer is missing

Copy link
Contributor

@bcherry bcherry May 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like a client side bug? If the attribute is set that means the client will send the stream. if the stream never comes that's definitely a bug and we should fix it. do you have more info on what you mean by "the byte stream is not stable in my test"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the configurable timeout still needed with the other fixed to SFU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it's not needed? There is still a chance we cannot receive the buffer bc of the network right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just asking back to @theomonnom 's original question: do users need to think about this or can it just be internal? it's confusing anyways because it's not clear from the name if this covers the time from agent start to stream arrive or stream end. it actually turns out it's just the time from stream header to stream trailer.

no strong feelings here just re-raising the original question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to stream end. Sometimes stream head received but the tail lost, we don't rely on the stream head for anything.

I still think it's better to expose the timeout option, mainly to notify the user there is a timeout if the buffer is not arrived in time, if the buffer lost there will be a side effect.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing tail seems to be correlated with the way of sending (frame-by-frame vs max stream chunk size).


@utils.log_exceptions(logger=logger)
async def _read_audio_task(self, reader: rtc.ByteStreamReader, participant_id: str):
if (fut := self._buffers.get(participant_id)) and fut.done():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukasIO @pblazej are we planning to send the track id on the stream attributes as well? would be good to use that as the key here if so (assuming we're using the track features to drive the flag for this feature rather than a setting on roomio)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any technical limitations here, ofc @lukasIO open for feedback as always 😸

image

Copy link

@pblazej pblazej May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, I'm mostly struggling with managing the Track lifecycle here reliably, as previously it was just created during connect(), now we need to maintain its identity without introducing any surprises (e.g. record some other track, publish on the newly created one, audio engine state, etc.).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bcherry yeah, that's what I meant by #2156 (comment)
the problem is that we don't have a track sid before publication time on the publishing client, so we'd need to use e.g. the mediastreamtrack id instead.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't have a track sid before publication time on the publishing client

@lukasIO not sure if I get it - the stream attributes are set just before sending (on iOS) vs track attribute will be set while publishing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically I wanted to decouple track buffer (depends on LocalTrack) from depending on LocalTrackPublication's state.

It would be a bit faster if we don't have to wait for the track publish response and instead start sending the byte stream already in parallel to the track publication request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something along the lines of this (still WIP)

the promise to send the byte stream is not awaited so the add track request can start in parallel

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about it a potential downside of my idea is that we might have a short gap in audio between audio buffer that's sent until the publication is sending valid data :/
might be worse than the additional time waiting for the response, but would need some comparison testing to make sure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah you shouldn't send the byte stream until after the remote participant has subscribed to your track. that also gives the agents framework a good time to start counting for a buffer timeout (the moment the subscription occurs)

@pblazej
Copy link

pblazej commented May 5, 2025

iOS PR (WIP) livekit/client-sdk-swift#685 @longcw for your testing 🧪

@longcw
Copy link
Contributor Author

longcw commented May 8, 2025

waiting for a new release from livekit/python-sdks#433

participant_identity: NotGivenOr[str] = NOT_GIVEN
"""The participant to link to. If not provided, link to the first participant.
Can be overridden by the `participant` argument of RoomIO constructor or `set_participant`."""
pre_connect_audio: bool = False
Copy link

@pblazej pblazej May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we enable that by default (or remove the flag)? There shouldn't be any significant penalty for doing so.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion with @bcherry - should we just let the client decide?

Copy link

@pblazej pblazej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the Swift example instabilities are caused by sending the stream "too early" as discussed.

I'm fine with unblocking this PR, I wasn't able to break it yet 🤞

# The process works in three steps:
# 1. RoomIO is set up with pre_connect_audio=True
# 2. When connecting to the room, the client sends any audio spoken before connection
# 3. This pre-connection audio is combined with new audio after connection is established
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider just omitting the specific example and ensuring the other examples are compatible instead

@bcherry
Copy link
Contributor

bcherry commented May 14, 2025

@longcw are there any remaining blockers to merge + release?

@longcw longcw merged commit 45987b6 into main May 15, 2025
14 of 20 checks passed
@longcw longcw deleted the longc/pre-connect-audio-clean branch May 15, 2025 01:35
jayesh-mivi pushed a commit to mivi-dev-org/custom-livekit-agents that referenced this pull request Jun 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants