Break up the huge `MetricsService` interface #543
Merged
aditya1702 merged 9 commits into `feature/finalize-metrics` on Mar 30, 2026
Phase 1 of the metrics refactor: create domain-specific metric structs (`DBMetrics`, `RPCMetrics`, `IngestionMetrics`, `HTTPMetrics`, `GraphQLMetrics`, `AuthMetrics`) with constructors that take a `prometheus.Registerer`. Add pool registration functions. Rewrite `metrics.go` to compose the sub-structs in a top-level `Metrics` struct. The legacy `MetricsService` interface is kept temporarily and now delegates to the new structs.
Phase 2: Replace the `MetricsService` interface with `*metrics.DBMetrics` in all 11 data model structs. Call sites now use the Prometheus API directly (e.g., `m.Metrics.QueryDuration.WithLabelValues(...).Observe(...)`). Add a `DBMetrics()` bridge method to the legacy `MetricsService` interface for callers that still construct it via `NewMetricsService()`. Update the `NewModels()` signature and all wiring in `serve.go`, `ingest.go`, and `loadtest/runner.go`.
Phase 3: Replace the `MetricsService` interface with `*metrics.RPCMetrics` in `rpcService`. Call sites now use the Prometheus API directly (e.g., `r.metrics.MethodCallsTotal.WithLabelValues(...).Inc()`). Add an `RPCMetrics()` bridge method to the legacy `MetricsService` interface. Update all `NewRPCService` callers.
Phase 4: Replace the `MetricsService` interface in all middleware:
- `MetricsMiddleware`: accepts `*metrics.HTTPMetrics`
- `GraphQLFieldMetrics`: accepts `*metrics.GraphQLMetrics`
- `ComplexityLogger`: accepts `*metrics.GraphQLMetrics`
- `AuthenticationMiddleware`: accepts `*metrics.AuthMetrics`

Update `serve.go` wiring to pass sub-structs from `*metrics.Metrics`.
Phase 5+7: Replace `MetricsService` in the ingestion pipeline:
- `IngestServiceConfig.Metrics` now holds `*metrics.Metrics`
- `ingestService` uses `m.appMetrics.Ingestion.*` for all metric calls
- `Indexer` accepts `*metrics.IngestionMetrics` directly
- All processors accept `*metrics.IngestionMetrics` instead of `MetricsServiceInterface` and call `StateChangeProcessingDuration` directly
- `loadtest/runner.go` and `ingest/ingest.go` create `*metrics.Metrics` directly instead of going through the legacy interface
Phase 6: Replace `MockMetricsService` + `.On().Maybe()` chains with a real `prometheus.NewRegistry()` + `metrics.NewMetrics(reg)` in all 23 test files. Delete the `MetricsService` interface, the `metricsService` struct, `mocks.go`, `processors/metrics.go`, and `metrics_test.go` (to be rewritten). Update `resolver.go` to accept `*metrics.Metrics` directly. Remove the legacy `MetricsService` field from `handlerDeps` in `serve.go`. Update `cmd/channel_account` to use `*metrics.Metrics`. Net effect: 2,050 lines of mock boilerplate removed.
aristidesstaffieri approved these changes on Mar 30, 2026
aditya1702 added a commit that referenced this pull request on Mar 30, 2026
* Break up the huge `MetricsService` interface (#543)
* metrics: add concrete metric structs with wallet_ namespace prefix (Phase 1)
* metrics: migrate data models to use concrete `*DBMetrics` struct (Phase 2)
* metrics: migrate RPC service to use concrete `*RPCMetrics` struct (Phase 3)
* metrics: migrate middleware to use concrete metric structs (Phase 4)
* metrics: migrate ingestion, indexer, and processors to concrete structs (Phase 5+7)
* metrics: migrate all tests to real registries, delete legacy interface (Phase 6)
* make check
* Add metrics tests
* Add CollectAndCompare tests
* Fix all metrics (#545), which repeats the six `metrics:` commits above
* refactor db metrics
* make check
* Add metrics tests
* Add CollectAndCompare tests
* fix db test
* Add operation-level GraphQL metrics and middleware: Introduce operation-level Prometheus collectors (operation duration histogram, operations counter, in-flight gauge, response size histogram) and rename the constructor to `NewGraphQLMetrics`. Replace heavy per-field timing/counters with a lightweight deprecated-field counter and complexity/response histograms to reduce cardinality and provide SLO-friendly metrics. Add `GraphQLOperationMetrics` middleware to record duration, throughput, errors, and response size; add tests for the operation and field middleware and update existing tests and registrations. Wire the new operation and field middleware into the server handler.
* Create graphql_field_metrics_test.go
* make check
* Add comments for DB metrics
* Refactor ingestion metrics; add retries/errors: Refactor the Prometheus ingestion metrics and update instrumentation across the ingestion code. `Duration` changes from a `HistogramVec` to a `Histogram` (calls updated), several metrics are renamed (ledgers/transactions/operations totals), `BatchSize` is removed, and new metrics are added: `LagLedgers`, `LedgerFetchDuration`, `RetriesTotal`, `RetryExhaustionsTotal`, and `ErrorsTotal` (the `Participants` metric name and buckets are also adjusted). Instrumentation now observes ledger fetch duration, increments the retry and exhaustion counters in the fetch/flush/persist paths, reports errors on live ingestion failures, and updates lag when it is available. Tests are updated to match the new metric types and bucket counts, and include unit tests for the new metrics.
* Enhance RPC metrics with histograms and gauges: Refactor and expand the RPC Prometheus instrumentation for better SLOs and observability.
  - Replace per-endpoint summary metrics and separate success/failure counters with:
    - `wallet_rpc_request_duration_seconds` (`HistogramVec` by method)
    - explicit `rpcDurationBuckets` for both `wallet_rpc_request_duration_seconds` and `wallet_rpc_method_duration_seconds`
    - `wallet_rpc_requests_total` with `(method, status)` labels for success/failure
  - Add `wallet_rpc_in_flight_requests` (`Gauge`) and `wallet_rpc_response_size_bytes` (`HistogramVec`)
  - Convert `MethodDuration` to a histogram and keep the `MethodErrorsTotal` and `MethodCallsTotal` counters
  - Update registration to include the new collectors and remove deprecated ones
  - Update tests to assert the new metrics, add histogram and bucket checks, and switch transport counter tests to `(method, status)` labels
  - RPC service changes:
    - Remove the heartbeat channel accessor from the interface and implementation
    - `GetHealth` now sets `ServiceHealth` and `LatestLedger` from the response and sets health to 0 on errors
    - `sendRPCRequest` now tracks `InFlightRequests`, observes `RequestDuration`, records `ResponseSizeBytes`, and increments `RequestsTotal` with success/failure labels instead of the old endpoint counters

  These changes improve latency and size visibility, simplify success/error accounting, and provide gauges useful for detecting RPC node stalls or connection exhaustion.
* Update rpc.go
* Rename pool label and expand pool/DB metrics: Replace the pond pool "channel" label with a clearer "pool_name" label and rename the `RegisterPoolMetrics` parameter accordingly. Update the pool metrics (use `wallet_pool_tasks_dropped_total` instead of `tasks_completed`) and tests to reflect the label and name changes. Add extensive documentation comments and new Prometheus metrics for pgxpool (`constructing_conns` gauge, acquire/empty-acquire counters, wait-time counters, and new_conns/canceled/max_lifetime/max_idle destroy counters), and improve the help text for several metrics to give better visibility into pool and DB connection behavior.
* Add QueryExecMode to DB pool config: Expose `pgx.QueryExecMode` on `PoolConfig` and apply it when opening the connection pool. If non-zero, the value is copied into `cfg.ConnConfig.DefaultQueryExecMode`, so callers can override pgx's default (cached prepared statements). The serve config now sets `QueryExecMode` to `Exec` (importing `github.com/jackc/pgx/v5`) to avoid server-side prepared statement caching, which conflicts with PgBouncer in transaction pooling mode (SQLSTATE 42P05).
* Refactor GraphQL metrics and remove RPC heartbeat: Ensure the GraphQL operation metrics decrement `InFlightOperations` exactly once by adding a responded guard and a `defer`. Normalize GraphQL error labels: unrecognized extension codes now map to "unknown" (the comment documents the closed set). Remove `heartbeatChannel` from `rpcService` and its mock/tests, simplifying the RPC service surface and cleaning up related test assertions.
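A minimal, standard-library-only sketch of mapping GraphQL error extension codes onto a closed label set. The codes listed are hypothetical examples, not the project's actual set; the point is that any code outside the set becomes "unknown", which keeps the error label's cardinality bounded even if an unexpected code reaches the middleware.

```go
package main

import "fmt"

// knownErrorCodes is a hypothetical closed set of extension codes allowed as label values.
var knownErrorCodes = map[string]struct{}{
	"GRAPHQL_PARSE_FAILED":      {},
	"GRAPHQL_VALIDATION_FAILED": {},
	"BAD_USER_INPUT":            {},
	"UNAUTHENTICATED":           {},
	"INTERNAL_SERVER_ERROR":     {},
}

// errorLabel maps a GraphQL error's extensions["code"] to a bounded label value.
// Missing, non-string, or unrecognized codes all become "unknown".
func errorLabel(extensions map[string]interface{}) string {
	code, ok := extensions["code"].(string)
	if !ok {
		return "unknown"
	}
	if _, known := knownErrorCodes[code]; !known {
		return "unknown"
	}
	return code
}

func main() {
	fmt.Println(errorLabel(map[string]interface{}{"code": "BAD_USER_INPUT"})) // BAD_USER_INPUT
	fmt.Println(errorLabel(map[string]interface{}{"code": "SOME_NEW_CODE"}))  // unknown
	fmt.Println(errorLabel(nil))                                              // unknown
}
```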
aditya1702 added a commit that referenced this pull request on Mar 30, 2026
* metrics: add concrete metric structs with wallet_ namespace prefix
* metrics: migrate data models to use concrete `*DBMetrics` struct
* metrics: migrate RPC service to use concrete `*RPCMetrics` struct
* metrics: migrate middleware to use concrete metric structs
* metrics: migrate ingestion, indexer, and processors to concrete structs
* metrics: migrate all tests to real registries, delete legacy interface
* refactor db metrics
* make check
* Add metrics tests
* Add CollectAndCompare tests
* fix db test
* Add operation-level GraphQL metrics and middleware
* Create graphql_field_metrics_test.go
* make check
* Add comments for DB metrics
* Refactor ingestion metrics; add retries/errors
* Enhance RPC metrics with histograms and gauges
* Update rpc.go
* Rename pool label and expand pool/DB metrics
* Add QueryExecMode to DB pool config
* remove envelope_xdr and meta_xdr - 1
* fix all tests
* Add back the envelopeXDR and metaXDR temporarily for tests
* Refactor GraphQL metrics and remove RPC heartbeat
* Break up the huge `MetricsService` interface (#543), which squashes the same six `metrics:` commits, make check, Add metrics tests, Add CollectAndCompare tests, and Fix all metrics (#545)
Closes #547
Break up the big `MetricsService` interface into a struct backed by individual, service-level metric files. This makes it easy to add new metrics for existing services, as well as metrics for future features.