Skip to content

Conversation

Aklakan
Copy link
Contributor

@Aklakan Aklakan commented Sep 23, 2025

GitHub issue resolved #3471

This is the async HTTP part factored out from #3184 .

Pull request Description:

  • Update HTTP(s) machinery to use async requests. This allows for QueryExecHTTP.abort() to immediately return even if the server has not yet responded to the connection request.
  • Fixes resource leak in Service.java
  • Adds (nullable) getUpdateString and getUpdateRequest methods to UpdateProcessor so that the underlying request may be exposed.

  • [ ] Tests are included (no additional tests added)
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

@Aklakan Aklakan force-pushed the 20250922_async-http branch from 6380a1c to 8a59aba Compare September 23, 2025 12:45
Copy link
Member

@rvesse rvesse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of making some of the HTTP stuff Async, though I wonder what the knock on effect will be on end users as I know some users in the past, myself included, have used Jena's HTTP helpers directly, so the HttpLib changes in particular may have some unexpected side effects for downstream users.

There seems to be some leftover debugging code around exception handling that looks like it needs tidying up

@Aklakan
Copy link
Contributor Author

Aklakan commented Sep 23, 2025

[...] the HttpLib changes in particular may have some unexpected side effects for downstream users.

The main change should be only that the original sync methods now internally wait for the completion of a CompletableFuture. My feeling is that maintaining an additional non-async implementation based on the sync java.net.http API might eventually cause issues as it duplicates code that may diverge. The original test cases pass that way, but I am not sure how well tested this part is. Perhaps your previous use of the API could be added as (integration) tests?

@rvesse
Copy link
Member

rvesse commented Sep 24, 2025

[...] the HttpLib changes in particular may have some unexpected side effects for downstream users.

The main change should be only that the original sync methods now internally wait for the completion of a CompletableFuture. My feeling is that maintaining an additional non-async implementation based on the sync java.net.http API might eventually cause issues as it duplicates code that may diverge. The original test cases pass that way, but I am not sure how well tested this part is.

Totally agree, and no problems with this general approach

Perhaps your previous use of the API could be added as (integration) tests?

That was many years ago now and against a much older Jena version, the code in question has bit-rotted horribly. These days I use other APIs for HTTP stuff rather than leaning on Jena's APIs.

This might be a non-issue, just something to highlight and make sure we communicate to users in the release this eventually lands in.

@afs
Copy link
Member

afs commented Sep 24, 2025

It is already async!

The JDK implementation of HttpClient.send (HttpClientImpl.send) is a call to HttpClientImpl.sendAsync then CompletableFuture.get with some exception management around that.

General use of HTTP/2 and HTTP/3 makes it necessary because they both multiplex the underlying network connection.

@Aklakan Aklakan force-pushed the 20250922_async-http branch 4 times, most recently from 017f631 to 791fa78 Compare September 25, 2025 13:35
@Aklakan
Copy link
Contributor Author

Aklakan commented Sep 25, 2025

It is already async!

Right but that's the JDK internals. For clarification: Jena's HttpLib.executeJDK relied on HttpClient.send and this PR changes this and HttpLib's surrounding logic to HttpClient.sendAsync.

I removed the breadcrumb exceptions (the clean way to retain the breadcrumbs seems to be wrapping with custom checked exceptions).

@Aklakan Aklakan force-pushed the 20250922_async-http branch 3 times, most recently from 91e2943 to 30b70dd Compare September 25, 2025 15:38
}
// Note: Can't reuse AsyncHttpRDF.handleRuntimeException because of this HttpException.
throw new HttpException(httpRequest.method()+" "+httpRequest.uri().toString(), cause);
} else if (cause instanceof RuntimeException re) {
Copy link
Member

@afs afs Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is a change to exception handling in the sync case. (I hope I read the code correctly.) Does this matter?

AsyncHttpRDF.getOrElseThrow passes up runtime exceptions untouched. That means the app gets the exception from the async thread and a stacktrace does not indicate where in the application code their synchronous call was made.

HttpClientImpl.send builds an exception with the other threads exception as the cause. The app gets an exception with the app stack. When printed, the stacktrace identifies the app code that makes the sync call.

The HttpClientImpl.send exception rebuild is a bit yuk - it has to enumerate the possible classes.

We could have a HttpException, even HttpAsyncException < HttpException, as a carrier to the app call stack making it robust against any future exception introduced by the JDK library. The change would be here, although in AsyncHttpRDF.getOrElseThrow is a possibility as well.

Copy link
Contributor Author

@Aklakan Aklakan Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means the app gets the exception from the async thread and a stacktrace does not indicate where in the application code their synchronous call was made.

Yes, that's the issue.

I think your suggestion to wrap an exception with the already existing unchecked HttpException is the clean way forward. Consolidating this exception handling in AsyncHttpRDF.getOrElseThrow seems reasonable.

A variant of getOrElseThrow with an additional HttpRequest argument could be used to add the request URIs to the exceptions (which is what happens in HttpLib):

throw new HttpException(httpRequest.method()+" "+httpRequest.uri().toString(), cause);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code was revised and is ready for review.

@Aklakan Aklakan force-pushed the 20250922_async-http branch 3 times, most recently from 2638547 to bf0cf8d Compare October 1, 2025 14:01
@Aklakan Aklakan force-pushed the 20250922_async-http branch from bf0cf8d to 45b75b8 Compare October 1, 2025 14:11
@Aklakan
Copy link
Contributor Author

Aklakan commented Oct 1, 2025

I have completed my revisions:

  • There is now a unified AsyncHttpRDF.getOrElseThrow(CompletableFuture<T> cf, HttpRequest httpRequest) method that wraps any exception as an HTTPException and in some cases also adds information from the optional httpRequest argument. Existing references were updated to use this method.
  • QueryExecHTTP.abort() no longer closes the physical HTTP InputStream but just a wrapper. Only the close() method (assumed to be called from the thread that started the query exec) will close the physical input stream and thereby consume any remaining data.

@Aklakan Aklakan force-pushed the 20250922_async-http branch from 45b75b8 to 1f466ee Compare October 1, 2025 14:22
Comment on lines 678 to 731
this.retainedConnectionView = new ProxyInputStream(in) {
@Override
protected void beforeRead(int n) throws IOException {
checkNotAborted();
super.beforeRead(n);
}
Copy link
Contributor Author

@Aklakan Aklakan Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a note: This is still not the nicest solution because it pulls the rug from under the parsers' feet - but the original solution also just closed the physical HTTP input stream.
Now abort() just sets a flag (instant action) that causes ProxyInputStream's next read to fail - whereas only close() closes the physical stream (may take time to consume remaining data).
If really needed, further revisions could go into future PRs.

@afs
Copy link
Member

afs commented Oct 1, 2025

  • QueryExecHTTP.abort() no longer closes the physical HTTP InputStream but just a wrapper. Only the close() method (assumed to be called from the thread that started the query exec) will close the physical input stream and thereby consume any remaining data.

As long as someone ™️ does an exhaustive close to the HTTP connection.

@afs
Copy link
Member

afs commented Oct 1, 2025

One conflict otherwise looks good.

@Aklakan Aklakan force-pushed the 20250922_async-http branch from 1f466ee to 41d7165 Compare October 1, 2025 15:35
@Aklakan
Copy link
Contributor Author

Aklakan commented Oct 1, 2025

Conflict resolved.

@Aklakan Aklakan force-pushed the 20250922_async-http branch from 41d7165 to f66ace2 Compare October 1, 2025 15:49
@afs afs merged commit f903806 into apache:main Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mitigate cases where QueryExecHTTP.abort() hangs.
3 participants