-
Notifications
You must be signed in to change notification settings - Fork 132
Add explicit failure when agent disconnects from the room #1228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This should hopefully make the agents sdk more robust to cases like a local agent server being terminated, a hosted agent leaving a call due to it being connected longer than the drain threshold, etc. It still is an open question to a degree how we can / should disambiguate between purposeful agent disconnects and outright crashes. It doesn't seem like today there is enough metadata to do this so right now I am assuming all non user triggered disconnects to be of the "crash" variety.
🦋 Changeset detectedLatest commit: 1087076 The changes in this PR will be included in the next version bump. This PR includes changesets to release 5 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
size-limit report 📦
|
| }; | ||
| }, [room]); | ||
|
|
||
| // If the agent participant disconnects in the middle of a conversation unexpectedly, mark that as an explicit failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to rethink the disambiguation here, IMO this is fine for single agent use case (no "handoff", (re)joining, etc.)
|
Don't wanna repeat myself: either going deeper with this or removing absent agent from the picture and moving this problem to the consumer. I'm fine with both, we just need to make sure that we may not be able to cover/predict all cases in the 1st approach. |
|
Digging around in the python agents sdk, I found that there is a mechanism used to indicate whether it shutting down intentionally or not to the SFU via a |
pblazej
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the discussion(s), let's treat disconnection as unconditional failure for now 👍
lukasIO
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
- Adds more computed `is_` props to the (private for now) agent state - JS livekit/components-js#1231 - `buffering` state was already handled inside the `.connecting` enum case (associated value is less error-prone) - Adds disconnected == failed concept - JS livekit/components-js#1228 - Checks actual dispatch response when waiting for the agent - JS livekit/components-js#1226
This should hopefully make the agents sdk more robust to cases like a local agent server being terminated, a hosted agent leaving a call due to it being connected longer than the drain threshold, etc.
It still is an open question how we can / should disambiguate between purposeful agent disconnects and outright crashes. It doesn't seem like today there is enough metadata to do this, so right now I am assuming all non user triggered disconnects to be of the "crash" variety.
I think if the metadata to disambiguate this properly becomes available, then it would be worth making this difference clear somehow - maybe purposeful disconnects go into
disconnected(and maybe there's an event that is emitted that a user can tap into and reset their app's connection state since right now that would be managed out of band), and crashes retain thisfailedstatus with the reason.