Observability | Bulk Analyzer #5535

aarshi0301 · 2025-10-21T05:04:21Z

Change description

Add Observability Metrics for Bulk Operations

Adds comprehensive observability metrics for bulk create/update operations to help identify performance bottlenecks.

Features Added

- Timing metrics for diff calculation, lineage calculation, validation, ingestion, notification, and audit logging
- Payload metrics (entity count and bytes)
- Relationship and array attribute tracking
- Operation counters and gauges
- Error tracking with error type classification
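
For reviewers who want a picture of the per-phase timing, here is a minimal JDK-only sketch. `PhaseTimings` and its methods are hypothetical names, not classes from this PR, and the PR itself records timings through Micrometer rather than a plain map. The sketch only shows the pattern of recording elapsed time per named phase, including when the phase throws.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not part of the PR: records elapsed time per named phase.
class PhaseTimings {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Runs the phase, records its elapsed time even if it throws, and returns its result.
    <T> T time(String phase, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            // Repeated calls for the same phase accumulate.
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Elapsed nanoseconds for a phase, or 0 if the phase never ran.
    long nanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    // Copy of all recorded phases in the order they first ran.
    Map<String, Long> snapshot() {
        return new LinkedHashMap<>(nanosByPhase);
    }
}
```

A caller would wrap each stage, for example `timings.time("validation", () -> validate(entities))`, where `validate` stands for whatever the stage does, and then publish `snapshot()` once at the end of the operation.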

Type of change

Bug fix (fixes an issue)
New feature (adds functionality)

Related issues

Fix #1

Helm Config Changes for Running Tests (Staging PR)

Does this PR require Helm config changes for testing?

Tests are NOT required for this commit. (You can proceed with the PR.) ✅
No, Helm config changes are not needed. (You can proceed with the PR.) ✅
Yes, I have already updated the config-values on enpla9up36. (You can proceed with the PR.) ✅
Yes, but I have NOT updated the config-values. (Please update them before proceeding; or, tests will run with default values.)⚠️

Checklists

Development

Lint rules pass locally
Application changes have been tested thoroughly
Automated tests covering modified code pass

Security

Security impact of change has been considered
Code follows company security practices and guidelines

Code review

Pull request has a descriptive title and context useful to a reviewer. Screenshots or screencasts are attached as necessary
"Ready for review" label attached and reviewers assigned
Changes have been reviewed by at least one other contributor
Pull request linked to task tracker where applicable

Note

Adds Micrometer-based observability for bulk create/update (payload, timing, relationships, gauges, errors) and lineage timing, with docs and minimal API hooks.

Observability (Micrometer):
- Introduces AtlasObservabilityService and AtlasObservabilityData to emit timers/counters/distributions and gauges (operations_in_progress, total_operations).
- Records duration, payload size/bytes, relationship/attribute counts, operation status/errors; structured error logs with MDC.
Payload analysis:
- Adds PayloadAnalyzer to compute array_relationships, per-relationship counts, and array attribute counts from AtlasEntitiesWithExtInfo.
Create/Update instrumentation:
- Instruments AtlasEntityStoreV2#createOrUpdate: operation start/end/failure, timings (validation, diff, ingestion), payload metrics, RequestContext counters; integrates PayloadAnalyzer and records Prometheus-safe tags.
- Exposes AtlasEntityStream#getEntitiesWithExtInfo for analysis.
Lineage timing:
- Captures and accumulates lineage calculation time in DeleteHandlerV1 and EntityGraphMapper; adds lineageCalcTime fields/methods to RequestContext.
Diff timing:
- Wraps AtlasEntityComparator#getDiffResult with perf metric.
Docs:
- Adds observability.mdc with full metric catalog, Grafana queries, dashboards, and alerting examples.

Written by Cursor Bugbot for commit b151f12. This will update automatically on new commits.

- Add AtlasObservabilityData class with all required metrics fields
- Add AtlasObservabilityService using Micrometer for metrics recording
- Add PayloadAnalyzer for array relationship and attribute analysis
- Instrument AtlasEntityStoreV2.createOrUpdate() with timing and payload metrics
- Add observability implementation plan documentation
- Capture trace_id, client_origin, timing metrics, and array counts
- Support both relationship arrays and regular attribute arrays

- Remove traceId, agentId, vertexIds, assetGuids from Prometheus metrics to prevent cardinality explosion
- Keep high-cardinality fields only for error logging via logErrorDetails()
- Update observability implementation plan with cardinality management guidelines
- Add error handling around observability metrics recording
- Add array attributes analysis similar to relationship attributes
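
As a rough illustration of the cardinality guideline, the sketch below maps free-form values onto a small, bounded set before they are used as metric tags. `MetricTags`, the allow-list contents, and the 64-character cap are all assumptions made for the example. The PR does not state which values it accepts or how it normalizes them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: keeps metric tag values low-cardinality and conservatively formatted.
class MetricTags {
    // Assumed allow-list; the PR does not list the accepted client origins.
    private static final Set<String> KNOWN_ORIGINS =
            new HashSet<>(Arrays.asList("ui", "sdk", "workflow"));

    // Maps any client origin onto a small fixed set so the tag cannot grow unbounded.
    static String clientOrigin(String raw) {
        if (raw == null) {
            return "unknown";
        }
        String v = raw.trim().toLowerCase(Locale.ROOT);
        return KNOWN_ORIGINS.contains(v) ? v : "other";
    }

    // Lowercases, replaces characters outside [a-z0-9_] with '_', and caps the length.
    static String sanitize(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return "unknown";
        }
        String v = raw.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
        return v.length() > 64 ? v.substring(0, 64) : v;
    }
}
```

Identifiers such as trace IDs or GUIDs never pass through these helpers into a tag. They stay in log output, which is where the commit above keeps them.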

- Set MDC filter key 'atlas-observability' for error logging
- Use OBSERVABILITY logger for proper log routing
- Add proper MDC cleanup in finally block
- Follows same pattern as other Atlas loggers (audit, metrics, tasks)

- Replace manual MDC.put/remove with MDCScope.of()
- Follows Atlas best practices for MDC management
- Automatically restores previous MDC state on close
- Cleaner and more robust than try/finally approach
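
To show why a scope object is more robust than manual put/remove, here is a self-contained sketch of the restore-on-close pattern. `ScopedContext` is a hypothetical stand-in that uses a `ThreadLocal` map instead of SLF4J's MDC, and its `of(key, value)` signature is an assumption. The actual `MDCScope.of()` in Atlas may look different.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an MDC scope; backed by a ThreadLocal map, not SLF4J's MDC.
final class ScopedContext implements AutoCloseable {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private final String key;
    private final String previous; // value to restore on close; null means the key was absent

    private ScopedContext(String key, String value) {
        this.key = key;
        // Map.put returns the prior value, which is what close() restores.
        this.previous = CONTEXT.get().put(key, value);
    }

    static ScopedContext of(String key, String value) {
        return new ScopedContext(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    @Override
    public void close() {
        if (previous == null) {
            CONTEXT.get().remove(key);
        } else {
            CONTEXT.get().put(key, previous);
        }
    }
}
```

Used in a try-with-resources block, the previous value comes back even when the body throws, and nested scopes unwind in the right order. A bare `remove` in a `finally` block would instead wipe out a value set by an outer caller.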

- Record individual relationship types (process, inputs, outputs, etc.) with counts
- Record individual attribute types with counts
- Use Counter metrics with relationship_name/attribute_name tags
- Enables detailed analysis: process:1, inputs:3, outputs:2, etc.
- Maintains existing total count metrics for aggregation
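
The per-relationship counts described above could be computed along these lines. This is a sketch under stated assumptions: entities are modeled as plain maps from relationship name to value, `RelationshipCounts` is a hypothetical class, and counting a single non-collection value as 1 is inferred from the `process:1` example rather than confirmed by the PR.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each entity is modeled as a map of relationship name -> value.
class RelationshipCounts {
    // Sums, per relationship name, how many elements the entities carry.
    // A collection contributes its size; any other non-null value counts as 1 (assumption).
    static Map<String, Integer> count(List<Map<String, Object>> entities) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> relationships : entities) {
            for (Map.Entry<String, Object> e : relationships.entrySet()) {
                Object value = e.getValue();
                if (value == null) {
                    continue;
                }
                int n = (value instanceof Collection) ? ((Collection<?>) value).size() : 1;
                counts.merge(e.getKey(), n, Integer::sum);
            }
        }
        return counts;
    }

    // Aggregate across all relationship names, matching the existing total-count metric.
    static int total(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```

Each map entry would then feed one counter increment tagged with the relationship name, while `total` feeds the aggregate metric.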

repository/src/main/java/org/apache/atlas/observability/AtlasObservabilityService.java

repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java

server-api/src/main/java/org/apache/atlas/RequestContext.java

repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityComparator.java

Copilot

Pull Request Overview

This PR adds comprehensive Micrometer-based observability instrumentation for Atlas bulk create/update operations, introducing metrics tracking for payload analysis, timing breakdowns, and operational status monitoring.

Key Changes:

- Introduced AtlasObservabilityService with Micrometer metrics (gauges, timers, counters, distribution summaries) for tracking operation performance, payload characteristics, and errors
- Added timing instrumentation across entity pipeline stages (validation, diff calculation, ingestion, lineage calculation)
- Created PayloadAnalyzer to compute payload metrics including sizes, relationships, and attributes

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| `repository/src/main/java/org/apache/atlas/observability/AtlasObservabilityService.java` | New service providing Micrometer-based metrics recording with SLOs, percentiles, and low-cardinality tags for Prometheus compatibility |
| `repository/src/main/java/org/apache/atlas/observability/PayloadAnalyzer.java` | New analyzer for extracting payload metrics including entity counts, relationships, and attributes |
| `common/src/main/java/org/apache/atlas/observability/AtlasObservabilityData.java` | New data model holding observability metrics including timing, payload, and request metadata |
| `repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStoreV2.java` | Instrumented createOrUpdate method with timing capture, payload analysis, and metrics recording with error handling |
| `server-api/src/main/java/org/apache/atlas/RequestContext.java` | Added lineage calculation timing accumulation fields and methods |
| `repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java` | Added lineage calculation timing in addHasLineage method |
| `repository/src/main/java/org/apache/atlas/repository/store/graph/v1/DeleteHandlerV1.java` | Added lineage calculation timing in deletion handlers |
| `repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityComparator.java` | Wrapped diff calculation with performance metric recording |
| `repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasEntityStream.java` | Added accessor method for entitiesWithExtInfo to enable payload analysis |
| `observability.mdc` | Comprehensive documentation of all metrics with PromQL queries, dashboard recommendations, and alerting rules |
| `.github/workflows/maven.yml` | Whitespace-only formatting changes to CI workflow |
