Skip to content

Conversation

@shikokuchuo
Copy link
Member

Summarising the changes in this PR:

  • Uses nanonext::write_stdout() rather than cat() for robustness and efficiency (I happened to implement this a couple of days ago for something else!)
  • For send_aio() temporarily assign the aio object to prevent finalisers being run too early (ultimately we may decide to make these synchronous)
  • Avoids the overhead of R serialisation altogether as we're dealing with json strings throughout.
  • Open stdin in blocking mode to avoid the potentially buggy behaviour that @wch found on MacOS. It's safe to do this, as we only read when there's activity detected on the FD. There's possibly a more robust solution doing it from C, which we can substitute in later if we find it.

Copy link
Collaborator

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for such clearly delineated commits + the thorough description. Very much appreciated.

With this PR, I'm currently seeing an indefinite hang once a tool request is made. When I start the server before the proxy (i.e. mcp_serve() and then launch Claude Desktop), I intermittently see:

Warning: data could not be converted to a character string
<simpleError: parse error: trailing garbage
                                    42 0a 03 00 00 00 00 05 04 00 00 0
                     (right here) ------^

...in my server R session.

@shikokuchuo
Copy link
Member Author

Did you install the package (as opposed to load_all) and restart Claude etc.? 😏 That seems to be a mismatch between the server and proxy versions, as I switched from using serialisation to not.

@simonpcouch
Copy link
Collaborator

Able to reproduce the issue on main and with a commit from last week so I've clearly botched something in my dev environment😆 Will go ahead and merge and try to revisit my own setup later today.

@simonpcouch simonpcouch merged commit c4bd3f8 into posit-dev:main Apr 30, 2025
6 checks passed
@shikokuchuo shikokuchuo deleted the dev branch April 30, 2025 13:56
@wch
Copy link
Collaborator

wch commented Apr 30, 2025

@shikokuchuo I'm curious about the comment about send_aio() and finalizers: what is the potential problem, and is it something that users of send_aio() in general need to worry about?

@shikokuchuo
Copy link
Member Author

@shikokuchuo I'm curious about the comment about send_aio() and finalizers: what is the potential problem, and is it something that users of send_aio() in general need to worry about?

send_aio() returns an Aio object. Theoretically if garbage collection occurs immediately after the call, then the external pointer finalizer may free the C-level aio before it's completed the send. I don't actually know if this happens or not -previously I thought I'd come across this, but there may have been other confounding factors.

As our loops here don't seem to be that tight, just keeping the reference should be safe enough. 100% safe usage would be to wait for the result e.g. by send_aio()[]. Practically this wouldn't add much time as it returns as soon as the message is accepted by the system socket.

This also means that we can mostly replace with a send() as this doesn't actually block in most cases. I've added a TODO in the code - I didn't want to make a change until I'd investigated the above.

@wch
Copy link
Collaborator

wch commented Apr 30, 2025

send_aio() returns an Aio object. Theoretically if garbage collection occurs immediately after the call, then the external pointer finalizer may free the C-level aio before it's completed the send. I don't actually know if this happens or not -previously I thought I'd come across this, but there may have been other confounding factors.

That sounds dangerous. Maybe nanonext should keep references to these objects until the send happens?

@shikokuchuo
Copy link
Member Author

That sounds dangerous. Maybe nanonext should keep references to these objects until the send happens?

Sorry I should have been clearer - it's not dangerous in the way you're thinking. It implicitly stops the aio safely before deallocation happens, so at worst the send fails. It'd be nice to ensure it completes, but that would involve the finaliser waiting, and I wouldn't want to block the garbage collector.

@wch
Copy link
Collaborator

wch commented May 1, 2025

It still sounds like it could be the source of surprising and hard-to-debug problems -- I have dealt with surprising and random-looking errors related to GC in the past and it is not fun!

If someone calls send_aio() without saving a reference to the return value, they should be able to be confident that the message won't get lost if a GC happens at an unlucky time. Maybe nanonext could keep a reference to the object until the send actually happens?

@shikokuchuo
Copy link
Member Author

I hear what you're saying. Previously I'd probably have said 'user error' in not keeping a reference, but I agree that eliminating this potential failure point makes for more robust (and predictable) behaviour. I'm opening a PR at nanonext that amends this, thanks for the suggestion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants