Description
What are you trying to achieve?
This proposal introduces a new semantic convention attribute, `service.criticality`, to classify services by their operational importance. With this attribute, observability platforms can implement criticality-aware tracing and sampling strategies.
What did you expect to see?
| Value | Description | Use Cases |
|---|---|---|
| `critical` | Service is business-critical; downtime directly impacts revenue, user experience, or core functionality | Payment processing, authentication, primary user-facing APIs |
| `high` | Service is important but has degradation tolerance or fallback mechanisms | Shopping cart, search, recommendation engines |
| `medium` | Service provides supplementary functionality; degradation has limited user impact | Analytics, reporting, non-essential integrations |
| `low` | Service is non-essential to core operations; used for background tasks or internal tools | Batch processors, cleanup jobs, internal dashboards |
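To make the intended value set concrete, here is a minimal Python sketch of how a consumer might validate the attribute. The names `ServiceCriticality` and `parse_criticality` are hypothetical and not part of any OpenTelemetry API. Returning `None` for unknown values is an assumption about how a consumer might behave, not something the proposal specifies.

```python
from enum import Enum
from typing import Optional


class ServiceCriticality(str, Enum):
    """The four values proposed for service.criticality (hypothetical helper type)."""

    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def parse_criticality(resource_attributes: dict) -> Optional[ServiceCriticality]:
    """Return the criticality found in a resource attribute map, or None.

    Assumption: missing or unrecognized values are treated as "unclassified"
    (None) instead of raising, so unlabelled services keep working.
    """
    raw = resource_attributes.get("service.criticality")
    try:
        return ServiceCriticality(raw)
    except ValueError:
        return None
```

For example, `parse_criticality({"service.criticality": "high"})` yields `ServiceCriticality.HIGH`, while an empty attribute map yields `None`.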
Additional context.
By introducing a standardized criticality attribute, one can:
- Implement adaptive sampling rates (e.g., 100% for critical, 10% for low-priority services)
- Optimize telemetry costs by reducing data from non-critical services
- Improve incident response by surfacing critical service traces first
- Enable better capacity planning and resource allocation
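As a rough illustration of the adaptive sampling point, the sketch below maps each criticality level to a sampling percentage and makes a deterministic keep/drop decision per trace. The percentages are illustrative and match the collector policies in this issue. The function names, the fallback rate for unclassified services, and the modulo-based bucketing are all assumptions made for this example. It is not how the collector implements probabilistic sampling.

```python
# Illustrative percentages per criticality level (same numbers as the
# collector policies in this issue; not prescribed by the proposal).
SAMPLING_PERCENTAGE = {
    "critical": 100.0,
    "high": 50.0,
    "medium": 10.0,
    "low": 1.0,
}

# Assumption: services with a missing or unknown criticality get this rate.
DEFAULT_PERCENTAGE = 10.0


def sampling_percentage(criticality):
    """Look up the sampling percentage for a criticality value."""
    return SAMPLING_PERCENTAGE.get(criticality, DEFAULT_PERCENTAGE)


def should_sample(trace_id: int, criticality) -> bool:
    """Deterministically decide whether to keep a trace.

    The trace ID is reduced to a bucket in [0, 9999]; the trace is kept when
    the bucket falls below the percentage expressed in hundredths of a percent.
    The same trace ID therefore always gets the same decision.
    """
    bucket = trace_id % 10_000
    return bucket < sampling_percentage(criticality) * 100
```

With these numbers, every trace from a `critical` service is kept, while a `low` service keeps only trace IDs whose bucket is below 100.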
Below is a sample OpenTelemetry Collector configuration for the `tail_sampling` processor that shows how the proposed attribute could be used:
# OpenTelemetry Collector Configuration
# Tail-based sampling using service.criticality
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: critical-services-policy
        type: string_attribute
        string_attribute:
          key: service.criticality
          values:
            - critical
          enabled_regex_matching: false
          invert_match: false
      - name: high-criticality-services
        type: and
        and:
          and_sub_policy:
            - name: is-high-criticality
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - high
            - name: probabilistic-50
              type: probabilistic
              probabilistic:
                sampling_percentage: 50
      - name: medium-criticality-services
        type: and
        and:
          and_sub_policy:
            - name: is-medium-criticality
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - medium
            - name: probabilistic-10
              type: probabilistic
              probabilistic:
                sampling_percentage: 10
      - name: low-criticality-services
        type: and
        and:
          and_sub_policy:
            - name: is-low-criticality
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - low
            - name: probabilistic-1
              type: probabilistic
              probabilistic:
                sampling_percentage: 1
      - name: error-traces
        type: status_code
        status_code:
          status_codes:
            - ERROR
      - name: slow-critical-traces
        type: and
        and:
          and_sub_policy:
            - name: is-critical-or-high
              type: string_attribute
              string_attribute:
                key: service.criticality
                values:
                  - critical
                  - high
            - name: is-slow
              type: latency
              latency:
                threshold_ms: 5000
exporters:
  otlp:
    endpoint: some-backend:4317
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]