-
Notifications
You must be signed in to change notification settings - Fork 806
SOLR-17436: Create a v2 equivalent for /admin/metrics #4057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add interface MetricsApi and class GetMetrics. Add MetricsUtil to merge the List<MetricSnapshot> in either MetricsHandler or GetMetrics. Implementations in MessageBodyWriters: not sure if needed. Note in SplitShardCmd: this appears to have been broken before. The Solr response might always be null now. More testing and cleaning-up to do.
Fix compilation and add unit tests
Adjust documentation. Set ResponseParser in MetricsRequest constructor. Clean-up.
Last clean-up after testing proxyToNodes() on a local cluster.
add changelog, and commit result of tidy
…olr.git into SOLR-17436-V2-Metrics-API
gerlowskija
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was very excited to see this PR @igiguere , thanks for putting this together!
I've only reviewed about half the PR but wanted to publish what I have so far.
Everything looks great overall so far, though there were a few small things I think will need tweaked that I've tried to call out inline. Lmk if any of those comments are unclear or you disagree and I'll try to respond when I come back for the second half later!
| summary = "Retrieve metrics gathered by Solr.", | ||
| tags = {"metrics"}, | ||
| extensions = { | ||
| @Extension(properties = {@ExtensionProperty(name = RAW_OUTPUT_PROPERTY, value = "true")}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[+1] Just came here from SOLR-17436 where you left a comment or two about struggling to get the template-generated code to compile. Glad you were able to find "StreamingOutput" and this property, that's exactly the right approach!
| * | ||
| * <p>This class could be used if a json output is ever needed again? | ||
| */ | ||
| public class MetricsResponse extends SolrJerseyResponse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[-0] I'll miss our metrics' JSON support, but if there's no active plan to support that then IMO we should take this file out of the PR 😦 . We can always resurrect it later from this PR if adding JSON support comes back up...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'll remove it. If Json is needed again, it's not a big file to re-create.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a path where we will bring JSON metrics back unless some popular format for reading metrics in such a way happens with OTEL or we for some reason need to bridge JSON metrics again. I'd rather avoid that latter though.
| Number size = (Number) rsp.getResponse()._get(List.of("metrics", indexSizeMetricName), null); | ||
| if (size == null) { | ||
| log.warn("cannot verify information for parent shard leader"); | ||
| log.warn("missing index size information for parent shard leader"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[+1] Great little logging clarification; thanks!
| SolrException.ErrorCode.INVALID_STATE, "SolrMetricManager instance not initialized"); | ||
| } | ||
|
|
||
| SolrParams params = solrQueryRequest.getParams(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[-1] Solr's v1 APIs rely on "wt" to indicate what response should be sent to users, but in our v2 APIs we're trying to use the more HTTP standard "Accept" header for this sort of response-format negotiation thing.
(In fact, we have a "pre-request filter" that runs before any v2 API that attempts to convert any user-provided "wt" params into an "Accept" header equivalent so that our v2 code only has to check the header.)
So, rather than fetching "wt" here out of SolrQueryRequest, IMO we should add a parameter to this method representing the accept header, and then key off of that instead. Something like the code below would work I think:
@HeaderParam("Accept") String acceptHeader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
God to know about the pre-request filter. I'll fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case see https://prometheus.io/docs/instrumenting/content_negotiation/#accept-header-construction for the accept header for prometheus vs openmetrics. I had done a content type and accept header for OpenMetrics here. I guess this is a good time to actually do it for prometheus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mlbiscoc :
I found the Prometheus and OpenMetrics MediaTypes in class PrometheusResponseWriter. That was probably you ;) Thanks.
| return Set.of(); | ||
| } | ||
|
|
||
| List<String> paramSet = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] Could this be a Set to save on the "copyOf" later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-exisiting code, move here, but, might as well improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was me who put it there originally probably. Please improve anything you think can use it. Thank you!
|
|
||
| SolrParams params = solrQueryRequest.getParams(); | ||
|
|
||
| Set<String> metricNames = MetricsUtil.readParamsAsSet(params, MetricsUtil.METRIC_NAME_PARAM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[-1] One of the goals of the JAX-RS framework is to make the inputs and outputs of each API as clear as possible.
It'll undermine the code re-use you were trying to get out of the "MetricsUtil" stuff (which I very much appreciate!), but IMO we need to declare this METRIC_NAME_PARAM input in the method signature. This makes the code here easier to read as folks know at a glance what the inputs are, and it ensures that the OpenAPI spec (and the Java, Python, Javascript, etc. clients generated from that) know about the parameters this API expects.
Something like:
@QueryParam("metricNames") String[] metricNames
should work I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, should work, for that and other params. It will need some re-writing because the method to implement will have params, but I'll do that.
gerlowskija
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, finished my review. Left a few more in-line comments, though things look very good overall. Just a few smaller things I think we'll need to change.
Let me know if you have any questions or I can help with anything!
| register(MessageBodyWriters.XmlMessageBodyWriter.class, 5); | ||
| register(MessageBodyWriters.CsvMessageBodyWriter.class, 5); | ||
| register(MessageBodyWriters.RawMessageBodyWriter.class, 5); | ||
| // Not sure if required. Ref. org.apache.solr.handler.admin.api.GetMetrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[+1] This is definitely correct and needed; feel free to remove this comment and the others like it in MessageBodyWriters.java
| import org.apache.solr.common.util.StrUtils; | ||
|
|
||
| /** Utility methods for Metrics */ | ||
| public class MetricsUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] I'm doing a good bit of skimming over this class in my review, as (afaict) it's mostly code that preexists your PR and that is only being moved from MetricsHandler.
Is that correct? Are there particular changes or sections of this file that I should pay closer attention to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all pre-existing code that was moved.
| writeMetricSnapshots(out, request, snapshots); | ||
| } | ||
|
|
||
| /** Write MetricSnapshots in Prometheus or OpenMertics format */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[0] Typo: "OpenMertics" -> "OpenMetrics"
| StreamingOutput output = get.getMetrics(); | ||
| Assert.assertNotNull(output); | ||
|
|
||
| Path tmpFile = Files.createTempFile(outputPath, "test-", "-GetMetricsDefault"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[-1] SolrTestCaseJ4 inherits "createTempDir" and "createTempFile" methods that will track the files you've created in your test and clean them up for you at the end. Would you mind using those for file creation instead? They tend to be a little safer as they'll make sure the file is still cleaned up even if an exception is thrown in your test method, an assertion fails, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do!
| SolrQueryRequest req = new SolrQueryRequestBase(h.getCore(), new ModifiableSolrParams()) {}; | ||
| SolrQueryResponse resp = new SolrQueryResponse(); | ||
|
|
||
| GetMetrics get = new GetMetrics(cc, req, resp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[-0] Is there a reason you had this test instantiate the "GetMetrics" class and invoke it directly, as opposed to, say, having the test start a Solr node that we can send an HTTP request to?
Most of our tests take that latter approach. It's not always the right one, but it has the benefit of testing a wider chunk of functionality. For example, as-is this test doesn't really validate that the new v2 API looks the way we'd expect it to (available at the expected HTTP method and path, takes the query-params that we expect, etc.). It doesn't validate any the SolrJ classes generated from the annotations in MetricsApi, etc.
I don't think we have to switch necessarily, but if we're going the less common route there should probably be a reason or some other plan for testing those other aspects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that at least some of the tests for the JAX-RS V2 APIs (GetSchemaAPITest, RelaodCoreAPITest, maybe others) also instantiate the API implementation class, and invoke it directly.
To send a HTTP request, the most useful classes and methods are deprecated, and in the process of being changed (there's another PR for that), so adding more tests using these methods would mean more refactoring at the time of merging this.
I'll see what I can do without using the existing deprecated HTTP request helpers... But, if I have too much trouble, I'll leave the tests as they are.
|
|
||
| [NOTE] | ||
| ==== | ||
| The V1 `/admin/metrics` endpoint is deprecated and will be removed in a future version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[-1] I wish : (
I hope to change this soon but for now our v2 API is still "experimental". We can't deprecate any of the v1 functionality until v2 has graduated from that designation. I'd drop this note in the docs.
| === OpenMetrics | ||
|
|
||
| OpenMetrics format is available from the `/admin/metrics` endpoint by providing the `wt=openmetrics` parameter or by passing the Accept header `application/openmetrics-text;version=1.0.0`. OpenMetrics is an extension of the Prometheus format that adds additional metadata and exemplars. | ||
| OpenMetrics format is available from the `/metrics` endpoint by providing the `wt=openmetrics` parameter or by passing the Accept header `application/openmetrics-text;version=1.0.0`. OpenMetrics is an extension of the Prometheus format that adds additional metadata and exemplars. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] Huh - I'm surprised that these docs mention the "Accept" header even before you added a v2 endpoint in this PR. I didn't think anything on the v1 side looked at "Accept". I guess that's a good thing, I'm just surprised is all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had written something custom to check content-type but just for OpenMetrics but not prometheus. See my comment above.
|
|
||
| [source,text] | ||
| http://localhost:8983/solr/admin/metrics?collection=foobar | ||
| http://localhost:8983/api/metrics?collection=foobar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[-0] Hmm, a lot of these little examples reference query parameters that I didn't notice when reviewing GetMetrics above: "collection", "category", "core", etc.
Assuming I just overlooked those and they are valid inputs to the v2 API, then they should probably each get a method parameter in the MetricsApi.getMetrics(...) signature. See this comment above where I described this in a bit more detail for the metricNames param
|
|
||
| /** Request to "/admin/metrics" */ | ||
| public class MetricsRequest extends SolrRequest<SolrResponse> { | ||
| public class MetricsRequest extends SolrRequest<InputStreamResponse> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Q] Why change the SolrJ class that corresponds to the v1 metrics API? Your PR only modifies the v2 API, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
V1, since Solr 10, returns a stream (prometheus or openmetrics), not a SolrResponse (where I would expect a NamedList... right?).
If I recall, the only place in Solr code where MetricsRequest is used is a method somewhere in the CLI tools that stars with something like : if (true) return; The method is disabled, and there's a couple of comments about tickets to be done.
I think the type param InputStreamResponse makes it clearer that this is not a typical Solr response.
| scrape_configs: | ||
| - job_name: 'solr' | ||
| metrics_path: "/solr/admin/metrics" | ||
| metrics_path: "/api/metrics" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[0] I'm fine with switching the "path" in these examples to "/api/metrics" as you've done here, but it might be nice to have a little [NOTE] or some similar blurb that explains that the two paths "/api/metrics" and "/solr/admin/metrics" are synonymous, in case any users are confused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll re-write the existing note, so it says that the 2 paths are V1, V2 equivalents, and examples on the page show V2 only.
mlbiscoc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @igiguere for doing this! The timing was great as the metrics api just had a huge overhaul in 10 so getting up to date on V2 gets us closer to a more modern path.
| * | ||
| * <p>This class could be used if a json output is ever needed again? | ||
| */ | ||
| public class MetricsResponse extends SolrJerseyResponse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a path where we will bring JSON metrics back unless some popular format for reading metrics in such a way happens with OTEL or we for some reason need to bridge JSON metrics again. I'd rather avoid that latter though.
| SolrException.ErrorCode.INVALID_STATE, "SolrMetricManager instance not initialized"); | ||
| } | ||
|
|
||
| SolrParams params = solrQueryRequest.getParams(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case see https://prometheus.io/docs/instrumenting/content_negotiation/#accept-header-construction for the accept header for prometheus vs openmetrics. I had done a content type and accept header for OpenMetrics here. I guess this is a good time to actually do it for prometheus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a MetricUtils class under a util package and now MetricsUtil. Kinda awkward having both of these exist. I would merge or keep one or the other. I would get rid of one of them and merge these into one file. But also many of these utility methods are so specific to this metrics thing like merging the snapshots, I am not sure if someone would use these utility classes anywhere else and maybe just keeping it static in GetMetrics class works as well so MetricsHandler has access but not sure if this is a bad pattern to do. Up to you, but I definitely think at the minimum drop one of the Util classes.
| return Set.of(); | ||
| } | ||
|
|
||
| List<String> paramSet = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was me who put it there originally probably. Please improve anything you think can use it. Thank you!
| try (InputStream tmpIn = Files.newInputStream(tmpFile, StandardOpenOption.READ)) { | ||
| byte[] bytes = tmpIn.readNBytes(1024); | ||
| String str = new String(bytes, StandardCharsets.UTF_8); | ||
| // System.out.println(str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left in comment here and a few places. Lets clean it up
| } | ||
|
|
||
| @Test | ||
| public void testGetMetricsOpenMetrics() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my tests in TestPrometheusResponseWriter. For this OpenMetrics format, the easiest way I know if openmetrics format exists is if # EOF actually exists. Otherwise this test from what I can see would still pass for regular prometheus format. (They are very very similar and only way you'll see a major difference is if you turn on distributed trace as well. You can see that in TestMetricExemplars for that test but not saying you need to make something like that here.)
| === OpenMetrics | ||
|
|
||
| OpenMetrics format is available from the `/admin/metrics` endpoint by providing the `wt=openmetrics` parameter or by passing the Accept header `application/openmetrics-text;version=1.0.0`. OpenMetrics is an extension of the Prometheus format that adds additional metadata and exemplars. | ||
| OpenMetrics format is available from the `/metrics` endpoint by providing the `wt=openmetrics` parameter or by passing the Accept header `application/openmetrics-text;version=1.0.0`. OpenMetrics is an extension of the Prometheus format that adds additional metadata and exemplars. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had written something custom to check content-type but just for OpenMetrics but not prometheus. See my comment above.
Address review comments. Add query params to the API method. Fix implementation and tests. Adjust documentation and comments. Remove class MetricsResponse. TODO : AdminHandlersProxy only supports V1
|
@gerlowskija, @mlbiscoc gradlew clean check in progress, locally. |
https://issues.apache.org/jira/browse/SOLR-17436
Description
Create a JAX-RS V2 equivalent for /admin/metrics.
Solution
Add interface MetricsApi, and implementation GetMetrics, as an extension of JerseyResource (via AdminAPIBase).
The new full URL path is /api/metrics. The metrics response is a StreamingOutput.
Add class MetricsUtil to re-group logic used by both MetricsHandler and GetMetrics. Both classes get their information directly from SolrMetricManager.
Adjust documentation.
Tests
Added unit tests for the GetMetrics implementation of MetricsApi.
Functional tests on "devSlim", with a mini cluster of 2 nodes.
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.