SOLR-18060: Add Prometheus metrics to CrossDC Consumer. #4063

sigram · 2026-01-19T14:10:40Z

This PR replaces Dropwizard JSON metrics with Prometheus metrics in the CrossDC Consumer, using the Prometheus client_java API directly. It also removes the Dropwizard dependency.

mlbiscoc

Can you post a sample of all these metrics, either inline here or as a text file? That would make it easier to review the metric names and labels.

mlbiscoc · 2026-01-20T15:13:21Z

gradle/libs.versions.toml

 ow2-asm-tree = { module = "org.ow2.asm:asm-tree", version.ref = "ow2-asm" }
 # @keep transitive dependency for version alignment
 perfmark-api = { module = "io.perfmark:perfmark-api", version.ref = "perfmark" }
+prometheus-metrics-core = { module = "io.prometheus:prometheus-metrics-core", version.ref = "prometheus-metrics" }


You should be able to do this with the OTEL SDK/API, which also offers Prometheus right out of the box. Is there a reason you went with re-introducing this Prometheus dependency? This is much better than the Dropwizard -> Prometheus bridge, so I won't call it a blocker, but having alignment would be nice.

mlbiscoc · 2026-01-20T15:25:06Z

solr/cross-dc-manager/src/test/org/apache/solr/crossdc/manager/SolrAndKafkaIntegrationTest.java

+
+    client.commit(COLLECTION);
+
+    System.out.println("Sent producer record");


Use log.info instead, or just don't print this. It doesn't seem like we need it.

It's a copy-paste from another test in this suite; I'll clean up all of these.

mlbiscoc · 2026-01-20T15:31:29Z

solr/cross-dc-manager/src/java/org/apache/solr/crossdc/manager/consumer/PrometheusMetrics.java

+
+    outputBatchSizeHistogram =
+        Histogram.builder()
+            .name("consumer_output_batch_size_histogram")


You can drop the histogram suffix. The # TYPE line already states that it is a histogram, and the name gets _bucket appended, which is the only suffix this will need. The same goes for the ones below.

mlbiscoc · 2026-01-20T15:35:10Z

solr/cross-dc-manager/src/java/org/apache/solr/crossdc/manager/consumer/PrometheusMetrics.java

+
+    outputTimeHistogram =
+        Histogram.builder()
+            .name("consumer_output_time_histogram")


What unit are these times in? Milliseconds? Instead of the histogram suffix, append the unit of time. I would also confirm that the buckets of these histograms are useful for whatever the time unit is.
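To make the unit suggestion above concrete, here is a minimal sketch, assuming the timings are measured in nanoseconds (as the firstAttemptTimeNs parameter elsewhere in this PR suggests) and that seconds is the target unit, which is the base unit Prometheus naming conventions recommend. DurationUnits and both of its methods are hypothetical names for illustration; they are not part of the PR or of client_java.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of the PR: unit handling for time-based metrics. */
final class DurationUnits {
  private static final double NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);

  private DurationUnits() {}

  /** Converts a duration measured in nanoseconds to fractional seconds. */
  static double nanosToSeconds(long nanos) {
    return nanos / NANOS_PER_SECOND;
  }

  /** Appends the unit suffix to a metric name unless it is already present. */
  static String withUnitSuffix(String baseName, String unit) {
    String suffix = "_" + unit;
    return baseName.endsWith(suffix) ? baseName : baseName + suffix;
  }
}
```

With this, a name such as consumer_output_time would become consumer_output_time_seconds, and the value passed to observe would be the converted seconds rather than raw nanoseconds. Whether seconds or milliseconds fits better is the author's call; the point is only that the name and the observed value agree on one unit, and that the buckets are chosen for that unit.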

mlbiscoc · 2026-01-20T15:36:52Z

solr/cross-dc-manager/src/java/org/apache/solr/crossdc/manager/consumer/PrometheusMetrics.java

+        Counter.builder()
+            .name("consumer_input_total")
+            .help("Total number of input messages")
+            .labelNames("type", "subtype")


I question whether most of these metrics really need the type label. What is its cardinality, and what are the possible combinations? I see in the test that UPDATE is one. Is there also QUERY or something along those lines?

Yes, there's ADMIN and CONFIGSET.

Hmmm ok. I am not a fan of this label being called type. I think the name should give some context about what it means, since type and subtype can be very generic. Is it an operation, or maybe message_type? And then what can subtype be? In core I made it category, but it is debatable whether we should just remove that label/attribute from metrics altogether. If you rename type to something more specific, then maybe you can rename subtype to type. Again, seeing a sample text output of these metrics would help if you can provide one.

mlbiscoc · 2026-01-20T15:53:26Z

...cross-dc-manager/src/java/org/apache/solr/crossdc/manager/consumer/KafkaCrossDcConsumer.java

@@ -408,19 +408,19 @@ boolean pollAndProcessRequests() {
              List<SolrInputDocument> docs = update.getDocuments();
              if (docs != null) {
                updateReqBatch.add(docs);
-                metrics.counter(MetricRegistry.name(type.name(), "add")).inc(docs.size());
+                metrics.incrementInputCounter(type.name(), "add");


This and the metric below now increment by only 1 instead of docs.size(). Is that right?
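The difference raised above can be sketched with a small stand-in. InputCounterSketch is hypothetical and uses a plain LongAdder map rather than the Prometheus Counter, so it shows only the shape of an amount-taking overload, not the PR's actual PrometheusMetrics class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Stand-in for the PR's metrics wrapper; not the real PrometheusMetrics class. */
final class InputCounterSketch {
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

  /** Adds {@code amount} to the counter identified by the two label values. */
  void incrementInputCounter(String type, String subtype, long amount) {
    if (amount < 0) {
      throw new IllegalArgumentException("counters only go up: " + amount);
    }
    counts.computeIfAbsent(type + "/" + subtype, k -> new LongAdder()).add(amount);
  }

  /** Counts a single event, matching the two-argument call in the diff. */
  void incrementInputCounter(String type, String subtype) {
    incrementInputCounter(type, subtype, 1);
  }

  /** Returns the current total for the label pair, or 0 if never incremented. */
  long value(String type, String subtype) {
    LongAdder adder = counts.get(type + "/" + subtype);
    return adder == null ? 0 : adder.sum();
  }
}
```

Under this sketch, calling incrementInputCounter(type.name(), "add", docs.size()) would preserve the old per-document count, while the two-argument form counts requests instead. Which of the two the metric is meant to report is the question for the author.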

mlbiscoc · 2026-01-20T15:55:24Z

solr/test-framework/src/java/org/apache/solr/util/SolrKafkaTestsIgnoredThreadsFilter.java

@@ -45,6 +45,11 @@ public boolean reject(Thread t) {
      return true;
    }

+    // Prometheus Scheduler doesn't provide any method to shut down its worker threads
+    if (t.isDaemon()) {


Won't this filter every daemon thread? Does the thread have something like prometheus in its name?
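A narrower check along the lines suggested above might look like the following sketch. It assumes the Prometheus worker threads carry a recognizable name prefix, which has not been confirmed in this thread, so the prefix is passed in as a parameter. DaemonThreadFilterSketch is a hypothetical name and is not the real SolrKafkaTestsIgnoredThreadsFilter.

```java
/** Hypothetical sketch: ignore only daemon threads whose name matches a known prefix. */
final class DaemonThreadFilterSketch {
  private DaemonThreadFilterSketch() {}

  /**
   * Returns true only for daemon threads whose name starts with {@code namePrefix}.
   * The actual prefix used by the Prometheus scheduler threads is an assumption
   * that would need to be checked against a thread dump.
   */
  static boolean reject(Thread t, String namePrefix) {
    String name = t.getName();
    return t.isDaemon() && name != null && name.startsWith(namePrefix);
  }
}
```

If the scheduler threads turn out to have no distinctive name, a name-based check will not work and another way to identify them would be needed.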

mlbiscoc · 2026-01-20T16:00:58Z

solr/cross-dc-manager/src/java/org/apache/solr/crossdc/manager/consumer/PrometheusMetrics.java

+  public void recordOutputFirstAttemptSize(MirroredSolrRequest.Type type, long firstAttemptTimeNs) {
+    outputFirstAttemptHistogram.labelValues(type.name()).observe(firstAttemptTimeNs);
+  }


The name says "size", but it is recording a time? Also, based on the name, this records only a first-attempt time. Does that mean this metric is recorded literally once and never observed again? If so, I question whether these metrics are worth having. Metrics are for aggregation across a time series, and if this records just once, then I think plain logging has more value.

mlbiscoc · 2026-01-20T16:03:35Z

...cross-dc-manager/src/java/org/apache/solr/crossdc/manager/consumer/KafkaCrossDcConsumer.java

              }
              List<String> deletes = update.getDeleteById();
              if (deletes != null) {
                updateReqBatch.deleteById(deletes);
-                metrics.counter(MetricRegistry.name(type.name(), "dbi")).inc(deletes.size());
+                metrics.incrementInputCounter(type.name(), "dbi");


I think it would be better for this to say delete_by_id instead of dbi. Same for the one below. Feel free to disagree.

sigram added 3 commits January 15, 2026 11:47

WIP.

a99dd06

SOLR-18060: Add Prometheus metrics to CrossDC Consumer.

aff9064

Add unit test, fix some bugs.

a42205a

sigram requested a review from mlbiscoc January 19, 2026 14:10

github-actions bot added dependencies Dependency upgrades test-framework tool:build tests labels Jan 19, 2026

sigram mentioned this pull request Jan 19, 2026

SOLR-18060: CrossDC Consumer - add Prometheus metrics. #4044

Closed

sigram added 4 commits January 20, 2026 13:25

Tidy.

5eef271

Add license and notice.

c569ff6

Fix a logging call.

82531d5

Remove unused licenses.

a7eee83

mlbiscoc reviewed Jan 20, 2026


2 participants