|
1 | 1 | --- |
2 | | -title: OpenTelemetry metrics |
3 | | -subTitle: An introduction to OpenTelemetry's most data-efficient signal |
4 | | -displayTitle: OpenTelemetry Metrics |
5 | | -description: OpenTelemetry Metrics play a critical role in monitoring applications by offering a way to capture and analyze key metrics in a standardized, scalable manner. Whether you're managing a complex microservices architecture or a simpler system, OpenTelemetry helps track essential statistics that reveal the health and performance of your services. |
6 | | -date: 2024-10-18 |
| 2 | +title: OpenTelemetry traces |
| 3 | +subTitle: An introduction to OpenTelemetry's most readable tool |
| 4 | +displayTitle: OpenTelemetry Traces |
| 5 | +description: OpenTelemetry traces capture how individual operations within your system interact over time. A trace follows a request as it flows through a system, recording the relationships between different operations. Traces are particularly useful in distributed systems, where multiple services or components interact. However, they are equally valuable for monolithic applications, providing insights even when everything runs in a single process. |
| 6 | +date: 2024-10-30 |
7 | 7 | author: Nocnica Mellifera |
8 | 8 | githubUser: serverless-mom |
9 | 9 | displayDescription: |
10 | 10 | Learn more about OpenTelemetry & Monitoring with Checkly. Explore traces, one of the three pillars of observability. |
11 | 11 | menu: |
12 | 12 | learn: |
13 | 13 | parent: "OpenTelemetry" |
14 | | -weight: 3 |
| 14 | +weight: 4 |
15 | 15 | --- |
16 | 16 |
|
17 | | -**OpenTelemetry Metrics** play a critical role in monitoring applications by offering a way to capture and analyze key metrics in a standardized, scalable manner. Whether you're managing a complex microservices architecture or a simpler system, OpenTelemetry helps track essential statistics that reveal the health and performance of your services. |
| 17 | +# An Introduction to OpenTelemetry Traces |
18 | 18 |
|
19 | | ---- |
20 | | - |
21 | | -## What are Metrics? |
22 | | - |
23 | | -Metrics represent **quantitative measurements** of your system’s health and behavior. They provide insights into performance trends, such as: |
24 | | - |
25 | | -- **CPU usage** over time |
26 | | -- **Request rates** per endpoint |
27 | | -- **Error counts** or failure rates |
28 | | -- **Latency** in handling requests |
29 | | - |
30 | | -Metrics are lightweight and highly efficient to collect, aggregate, and query. They help identify patterns and anomalies without burdening storage, making them suitable for continuous monitoring at scale. |
31 | | - |
32 | | -### Types of Metrics in OpenTelemetry: |
33 | | - |
34 | | -- **Counter**: Measures occurrences or events, such as the number of requests handled. |
35 | | -- **Gauge**: Captures values that fluctuate, like memory usage. |
36 | | -- **Histogram**: Measures the distribution of values, such as response time percentiles. |
37 | | - |
38 | | -Explore further in the [OpenTelemetry Metrics Documentation](https://opentelemetry.io/docs/concepts/signals/metrics/). |
39 | | - |
40 | | - |
41 | | -## Why Metrics Matter |
42 | | - |
43 | | -In a **microservices** environment, metrics are indispensable for: |
| 19 | +## What Are OpenTelemetry Traces? |
44 | 20 |
|
45 | | -- **Performance monitoring**: Identifying bottlenecks or degraded performance. |
46 | | -- **Capacity planning**: Forecasting when additional resources are required. |
47 | | -- **Incident detection**: Alerting teams about abnormal system behavior. |
| 21 | +OpenTelemetry traces capture how individual operations within your system interact over time. A trace follows a request as it flows through a system, recording the relationships between different operations. Traces are particularly useful in distributed systems, where multiple services or components interact. However, they are equally valuable for monolithic applications, providing insights even when everything runs in a single process. |
48 | 22 |
|
49 | | -Metrics are often **the first step** in identifying that something has gone wrong. If a metric shows unusual values (e.g., a spike in response time), you can investigate further by drilling into traces or logs to find the root cause. |
| 23 | +## Key Concepts in OpenTelemetry Traces |
50 | 24 |
|
51 | | -## Metrics vs. Traces |
| 25 | +1. **Spans:** |
| 26 | + - The core unit in a trace. |
| 27 | + - Represents an individual operation. |
| 28 | + - Each span has a name, a start and end time, and metadata (attributes) as key-value pairs. |
| 29 | + - Spans can be nested to reflect parent-child relationships. |
52 | 30 |
|
53 | | -Metrics have a number of advantages over tracing. Metrics are much more data efficient, generally at the collector level it’s possible to compress hundreds of individual metrics reported to a single packet of data sent on to the metrics backend. Further, metrics show broad trends whereas a trace, no matter how interesting, will always cover only a single request. |
| 31 | +2. **Trace Context:** |
| 32 | + - Propagates trace identifiers across process boundaries. |
| 33 | + - Helps track related spans across multiple services or components. |
54 | 34 |
|
55 | | -Should you use metrics instead of traces to monitor your service? Absolutely not. Metrics will always present average performance, and the specific information needed to really understand root causes will be elusive. Further, even with high resolution timeseries metrics it’s very hard to go from worrying metrics to find matching log data of a problem. Finally, modern traces can effectively show information about asynchronous requests as they contribute to overall request time, something that’s very hard to tease out of bare metrics. |
| 35 | +3. **Automatic Instrumentation:** |
| 36 | + - Some languages and frameworks allow tracing without code changes by using instrumentation agents. |
| 37 | + - This approach quickly provides a basic trace structure, capturing incoming requests and outgoing calls to other services or databases. |
56 | 38 |
|
57 | | -## Setting up OpenTelemetry Metrics |
| 39 | +4. **Manual Instrumentation:** |
| 40 | + - Developers use OpenTelemetry APIs to create spans where deeper insights are needed. |
| 41 | + - Useful for tracking specific application logic or attaching custom attributes. |
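The span anatomy listed above (a name, start and end times, key-value attributes, and parent-child nesting) can be sketched with a toy model. This is a minimal illustration using only the Python standard library, not the OpenTelemetry API: `ToyTracer`, `Span`, and the attribute names are hypothetical and exist only to show how the pieces relate.

```python
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    """Toy span: a name, timing, attributes, and a link to its parent."""
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start_ns: int
    end_ns: Optional[int] = None
    attributes: Dict[str, object] = field(default_factory=dict)


class ToyTracer:
    """Hypothetical stand-in for a tracer; not the OpenTelemetry API."""

    def __init__(self) -> None:
        self.finished: List[Span] = []   # spans in the order they ended
        self._stack: List[Span] = []     # currently open spans

    @contextmanager
    def start_span(self, name: str,
                   attributes: Optional[Dict[str, object]] = None) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        span = Span(
            name=name,
            # A child inherits its parent's trace ID; a root span starts a new trace.
            trace_id=parent.trace_id if parent else secrets.token_hex(16),
            span_id=secrets.token_hex(8),
            parent_id=parent.span_id if parent else None,
            start_ns=time.time_ns(),
            attributes=dict(attributes or {}),
        )
        self._stack.append(span)
        try:
            yield span
        finally:
            span.end_ns = time.time_ns()
            self._stack.pop()
            self.finished.append(span)


tracer = ToyTracer()
with tracer.start_span("checkout", {"cart.items": 3}):
    # Nested span: becomes a child of "checkout" within the same trace.
    with tracer.start_span("charge-card") as child:
        child.attributes["payment.provider"] = "example"
```

In a real service, the OpenTelemetry SDK for your language plays the role of `ToyTracer`, and automatic instrumentation creates many of these spans for you.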
58 | 42 |
|
59 | | -### Auto-Instrumentation vs. Manual Instrumentation |
| 43 | +## Tracing in Monolithic vs. Distributed Systems |
60 | 44 |
|
61 | | -1. **Auto-Instrumentation**: Many popular frameworks and libraries come with automatic OpenTelemetry instrumentation, requiring minimal setup. |
62 | | -2. **Manual Instrumentation**: Developers can manually add metrics within the application code by using SDKs to track specific business metrics (e.g., purchases per hour). |
| 45 | +Though OpenTelemetry is often associated with microservices, its principles apply equally to monoliths. Even when working with a single application, external dependencies like databases, message queues, or third-party services make distributed tracing beneficial. Instrumenting a monolith provides visibility into which operations are slow, how many database calls occur per request, and which API calls contribute to latency. |
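When a request leaves the process for one of those external dependencies, trace context has to travel with it. One common carrier is the W3C `traceparent` HTTP header. The sketch below is a simplified, standard-library-only illustration of writing and reading that header; the `SpanContext`, `inject`, and `extract` names are chosen for this example, and it handles only version `00` of the format. In practice the OpenTelemetry SDK's propagators do this for you.

```python
import re
from typing import Dict, NamedTuple, Optional

# Version 00 layout: 00-<32 hex trace id>-<16 hex parent span id>-<2 hex flags>
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the caller's span context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Read the span context from incoming headers, or None if absent or invalid."""
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are treated as invalid.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


# The calling service injects; the receiving service extracts and
# starts its own spans as children within the same trace.
outgoing: Dict[str, str] = {}
inject(SpanContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True), outgoing)
incoming = extract(outgoing)
```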
63 | 46 |
|
64 | | -Learn more about instrumentation options in the [OpenTelemetry SDK Guide](https://opentelemetry.io/docs/instrumentation/). |
| 47 | +### Example: Intercom’s Tracing Journey |
65 | 48 |
|
66 | | -## Example Metric Pipeline |
| 49 | +Intercom, a company that offers customer communication tools, transitioned from using structured logs to adopting tracing incrementally. They started by instrumenting API and database calls, which provided immediate value. Over time, they instrumented more of their service, improving their understanding of internal workflows and onboarding processes. |
67 | 50 |
|
68 | | -With OpenTelemetry, you can collect, process, and export metrics using **Collectors**. Here’s a high-level example of a typical metric pipeline: |
| 51 | +## Logs and Traces: A Complementary Approach |
69 | 52 |
|
70 | | -1. **Data Collection**: Metrics are generated by instrumented services. |
71 | | -2. **Processing**: The OpenTelemetry Collector aggregates and processes the data (e.g., batching or filtering metrics). |
72 | | -3. **Exporting**: Metrics are sent to observability platforms like **Prometheus** or **Grafana**. |
| 53 | +Organizations often have an existing logging infrastructure when adopting tracing. OpenTelemetry’s logs bridge connects an existing logging library to OpenTelemetry so that log records carry the trace and span IDs of the operation that emitted them. This keeps logs and traces correlated without requiring a complete overhaul of existing logging practices. |
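The idea of correlating logs with traces can be illustrated without any OpenTelemetry packages. The following is a rough sketch using Python's standard `logging` module: a filter stamps each record with the IDs of the active span. The `current_ids` context variable, the `TraceIdFilter` class, and the `trace_id`/`span_id` field names are assumptions made for this example; a real setup would read the IDs from the OpenTelemetry SDK.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active (trace ID, span ID) pair.
# A real tracer would manage this; "-" means no span is active.
current_ids = contextvars.ContextVar("current_ids", default=("-", "-"))


class TraceIdFilter(logging.Filter):
    """Attach the active trace and span IDs to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id, record.span_id = current_ids.get()
        return True  # never drop the record, only annotate it


stream = io.StringIO()  # stands in for a log file or stdout
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# Simulate logging inside an active span, then outside of one.
token = current_ids.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
try:
    logger.info("payment accepted")
finally:
    current_ids.reset(token)
logger.info("idle")
```

Existing log statements stay untouched; only the handler configuration changes, which is what makes this kind of correlation practical for teams with established logging.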
73 | 54 |
|
74 | | -Learn how to configure a collector in the [OpenTelemetry Collector Guide](learn/opentelemetry/otel-collector/). |
| 55 | +### Gradual Migration with Logs Bridge |
75 | 56 |
|
| 57 | +Organizations can slowly convert significant logs into spans, as seen with Loan Market, an Australian financial services company. This approach allows gradual adoption of tracing without interrupting existing workflows, ensuring a smooth transition. |
76 | 58 |
|
| 59 | +## Benefits of OpenTelemetry Tracing |
77 | 60 |
|
78 | | -## Best Practices for Metrics in OpenTelemetry |
| 61 | +- **Visibility:** Quickly identify slow or failing operations. |
| 62 | +- **Efficiency:** Diagnose complex issues by tracking dependencies and relationships. |
| 63 | +- **Onboarding:** Help new developers understand system behavior through visualized traces. |
| 64 | +- **Adaptability:** Works across monoliths, microservices, and hybrid systems. |
79 | 65 |
|
80 | | -- **Optimize cardinality**: Avoid creating too many distinct labels, as this can overwhelm storage and query systems. |
81 | | -- **Set appropriate aggregation intervals**: Batch data intelligently to balance between real-time insights and system load. |
82 | | -- **Use meaningful names**: Clearly describe the purpose of each metric to make dashboards and alerts easier to understand. |
83 | | -- **Standardize naming early**: While OpenTelemetry defines standard language for a number of concepts, actual metric naming is not standardized. As such it's possible to report `total-web-shop-checkout-time` and `webShopCheckoutTime_total` as two totally separate metrics even though they should be aggregated. No standard is perfect, of course, and to normalize data before it's stored, use the [filtering tools in the OpenTelemetry collector](learn/opentelemetry/otel-filtering/). |
| 66 | +## Getting Started |
84 | 67 |
|
| 68 | +To begin, select your language and follow the OpenTelemetry documentation for it to add automatic instrumentation, or use the OpenTelemetry API to create spans manually. Many libraries already support tracing out of the box, making it easier to adopt tracing incrementally. |
85 | 69 |
|
86 | | -OpenTelemetry metrics provide a robust foundation for observability, helping teams proactively monitor performance and detect issues before they escalate. With the right setup and tooling, you can gain comprehensive insights into your applications, enabling faster resolution times and improved reliability. |
| 70 | +Incorporating OpenTelemetry traces helps developers detect problems earlier, understand their systems better, and respond effectively to user issues. Whether your application is a monolith, a microservice, or somewhere in between, traces provide the insight you need to optimize and troubleshoot your software. |