---
title: "Fastrace: A Modern Approach to Distributed Tracing in Rust"
tags:
  - Observability
  - Traces
pubDate: 2025-03-22
---

## TL;DR

Distributed tracing is critical for understanding modern microservice architectures. While `tokio-rs/tracing` is widely used in Rust, it comes with significant challenges: ecosystem fragmentation, complex configuration, and high overhead.

[Fastrace](https://github.com/fast/fastrace) provides a production-ready solution with seamless ecosystem integration, out-of-the-box OpenTelemetry support, and a more straightforward API that works naturally with existing logging infrastructure.

The following example demonstrates how to trace functions with `fastrace`:

```rust
#[fastrace::trace]
pub fn send_request(req: HttpRequest) -> Result<(), Error> {
    // ...
}
```

It's being used in production by products like [ScopeDB](https://www.scopedb.io/blog/manage-observability-data-in-petabytes), where it helps trace and debug petabyte-scale observability data workloads.

## Why Distributed Tracing Matters

Understanding what is happening inside your applications has never been more challenging than it is in today's world of microservices and distributed systems. A user request might touch dozens of services before completion, and traditional logging approaches quickly fall short.

Consider a typical request flow:

```
User → API Gateway → Auth Service → User Service → Database
```

When an error occurs or the app performs poorly, where exactly is the root cause? Individual service logs show only fragments of the whole trace, lacking the crucial context of how the request flows through your entire system.

This is where distributed tracing becomes essential. Tracing creates a connected view of your request's flow across service boundaries, making it possible to:

- Identify performance bottlenecks across services
- Debug complex interactions between components
- Understand dependencies and service relationships
- Analyze latency distributions and outliers
- Correlate logs and metrics with request context

## A Popular Approach: `tokio-rs/tracing`

For many Rust developers, `tokio-rs/tracing` is the go-to solution for application instrumentation. Let's look at how a typical implementation works:

```rust
fn main() {
    // Initialize the tracing subscriber
    // Complex configuration code omitted...

    // Create a span and record some data
    let span = tracing::info_span!("processing_request",
        user_id = 42,
        request_id = "abcd1234"
    );

    // Enter the span (activates it for the current execution context)
    let _guard = span.enter();

    // Log within the span context
    tracing::info!("Starting request processing");

    process_data();

    tracing::info!("Finished processing request");
}
```

For instrumenting functions, `tokio-rs/tracing` provides attribute macros:

```rust
#[tracing::instrument(skip(password), fields(user_id = user.id))]
async fn authenticate(user: &User, password: &str) -> Result<AuthToken, AuthError> {
    tracing::info!("Authenticating user {}", user.id);
    // ...more code...
}
```

## The Challenges with `tokio-rs/tracing`

In our experience, `tokio-rs/tracing` comes with several significant challenges:

### 1. Ecosystem Fragmentation

By introducing its own logging macros, `tokio-rs/tracing` creates a split from code that uses the standard `log` crate:

```rust
// Using log crate
log::info!("Starting operation");

// Using tracing crate (different syntax)
tracing::info!("Starting operation");
```

This fragmentation is particularly problematic for library authors. When creating a library, authors face a difficult choice:

1. Use the `log` crate for compatibility with the broader ecosystem
2. Use `tokio-rs/tracing` for better observability features

Many libraries choose the first option for simplicity, but miss out on the benefits of tracing.

While `tokio-rs/tracing` does provide a `log` feature flag that emits log records through the `log` crate when its macros are used, library authors must remember to enable this flag so that all users receive log records regardless of which logging framework they use. This adds configuration complexity for library maintainers.

Furthermore, applications using `tokio-rs/tracing` must install and configure the `tracing-log` bridge to receive log records from libraries that use the `log` crate. The result is a bidirectional compatibility problem that requires explicit configuration on both sides:

```toml
# Library's Cargo.toml
[dependencies]
tracing = { version = "0.1", features = ["log"] } # Emit log records for log compatibility

# Application's Cargo.toml
[dependencies]
tracing = "0.1"
tracing-log = "0.2" # Listen to log records for log compatibility
```

### 2. Performance Impact for Libraries

Library authors are particularly sensitive to performance overhead, as their code may be called in tight loops or performance-critical paths. `tokio-rs/tracing`'s overhead can be substantial once instrumentation is in place, which creates a dilemma:

1. Always instrument tracing (and impose overhead on all users)
2. Don't instrument at all (and lose observability)
3. Create an additional feature flag system (increasing maintenance burden)

Here is a common pattern in libraries using `tokio-rs/tracing`:

```rust
#[cfg_attr(feature = "tracing", tracing::instrument(skip(password), fields(user_id = user.id)))]
async fn authenticate(user: &User, password: &str) -> Result<AuthToken, AuthError> {
    // ...more code...
}
```

Different libraries may define feature flags with subtly different names, making it hard for the final application to configure all of them.
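
For reference, the `cfg_attr` pattern above also requires each library to wire up an optional dependency and a feature flag in its own Cargo.toml, roughly as in this sketch (the feature name `tracing` here is illustrative):

```toml
[dependencies]
# Only compiled in when the feature is requested.
tracing = { version = "0.1", optional = true }

[features]
# Every library invents its own flag name, which applications must discover and enable.
tracing = ["dep:tracing"]
```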

With `tokio-rs/tracing`, there is no clean way to disable tracing at zero cost. This makes library authors reluctant to add instrumentation to performance-sensitive code paths.

### 3. No Context Propagation

Distributed tracing requires propagating context across service boundaries, but `tokio-rs/tracing` leaves this largely as an exercise for the developer. For example, this is [tonic's official example](https://github.com/hyperium/tonic/blob/master/examples/src/tracing/server.rs) demonstrating how to trace a gRPC service:

```rust
Server::builder()
    .trace_fn(|_| tracing::info_span!("grpc_server"))
    .add_service(MyServiceServer::new(MyService::default()))
    .serve(addr)
    .await?;
```

The above example only creates a basic span but doesn't extract tracing context from the incoming request.
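
To connect the incoming request to its upstream trace, the developer has to extract the context by hand. The following is a rough sketch of what that involves, using the `opentelemetry`, `opentelemetry-http`, and `tracing-opentelemetry` crates; it is not part of tonic's example and assumes a W3C propagator and an OpenTelemetry layer have already been configured on the subscriber:

```rust
use opentelemetry::global;
use opentelemetry_http::HeaderExtractor;
use tracing_opentelemetry::OpenTelemetrySpanExt;

Server::builder()
    .trace_fn(|request| {
        // Pull the W3C `traceparent` header out of the incoming request...
        let parent_cx = global::get_text_map_propagator(|propagator| {
            propagator.extract(&HeaderExtractor(request.headers()))
        });
        let span = tracing::info_span!("grpc_server");
        // ...and attach it as the parent of the newly created span.
        span.set_parent(parent_cx);
        span
    })
    .add_service(MyServiceServer::new(MyService::default()))
    .serve(addr)
    .await?;
```

Every service in the call chain needs this kind of boilerplate, and forgetting it in a single place is enough to break the trace.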

The consequences of missing context propagation are severe in distributed systems. When a trace disconnects due to missing context:

- Instead of seeing a complete flow of a request like:
  ```
  Trace #1: Frontend → API Gateway → User Service → Database → Response
  ```

- You'll see disconnected fragments from a request:
  ```
  Trace #1: Frontend → API Gateway
  Trace #2: User Service → Database
  Trace #3: API Gateway → Response
  ```

- Even worse, when multiple requests are interleaved, the traces become a chaotic mess:
  ```
  Trace #1: Frontend → API Gateway
  Trace #2: Frontend → API Gateway
  Trace #3: Frontend → API Gateway
  Trace #4: User Service → Database
  Trace #6: API Gateway → Response
  Trace #5: User Service → Database
  ```

This fragmentation makes it extremely difficult to follow request flows, isolate performance issues, or understand causal relationships between services.

## Introducing `fastrace`: A Fast and Complete Solution

### 1. Zero-cost Abstraction

`fastrace` is designed as a true zero-cost abstraction. When tracing is disabled, instrumentation is completely omitted from compilation, resulting in no runtime overhead. This makes it ideal for libraries concerned about performance.

### 2. Ecosystem Compatibility

`fastrace` focuses exclusively on distributed tracing. Through its composable design, it integrates seamlessly with the existing Rust ecosystem, including compatibility with the standard `log` crate. This architectural approach allows libraries to implement comprehensive tracing while preserving their users' freedom to use their preferred logging setup.

### 3. Simplicity First

The API is designed to be intuitive and require minimal boilerplate, focusing on the most common use cases while still providing extensibility when needed.
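
As a rough illustration of how little setup is involved, here is a minimal application sketch based on fastrace's quickstart, using its built-in console reporter; it assumes the application enables the 'enable' feature, and the exact API should be checked against the current release:

```rust
use fastrace::collector::Config;
use fastrace::collector::ConsoleReporter;
use fastrace::prelude::*;

fn main() {
    // Report finished spans to the terminal; a real deployment would use a
    // Jaeger or OpenTelemetry reporter instead.
    fastrace::set_reporter(ConsoleReporter, Config::default());

    {
        // Create a root span and make it the local parent, so that functions
        // annotated with #[fastrace::trace] attach to it automatically.
        let root = Span::root("request", SpanContext::random());
        let _guard = root.set_local_parent();

        handle_request();
    }

    // Flush any buffered spans before the process exits.
    fastrace::flush();
}

#[fastrace::trace]
fn handle_request() {
    // ...
}
```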

### 4. Insanely Fast

`fastrace` is designed for high-performance applications. It can handle massive amounts of spans with minimal impact on CPU and memory usage.

### 5. Ergonomic for both Applications and Libraries

Libraries can use `fastrace` without imposing performance penalties when not needed:

```rust
#[fastrace::trace] // Zero-cost when the application doesn't enable the 'enable' feature
pub fn process_data(data: &[u8]) -> Result<Vec<u8>, Error> {
    // Library uses standard log crate
    log::debug!("Processing {} bytes of data", data.len());

    // ...more code...
}
```

The key point here is that libraries should include `fastrace` without enabling any features:

```toml
[dependencies]
fastrace = "0.7" # No 'enable' feature
```

When an application uses this library and doesn't enable the 'enable' feature of `fastrace`:

- All tracing code is completely optimized away at compile time
- Zero runtime overhead is added to the library
- No impact on performance-critical code paths

When the application does enable tracing via the 'enable' feature:

- Instrumentation in the library becomes active
- Spans are collected and reported
- The application gets full visibility into library behavior
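
Concretely, the two sides of this contract look roughly like the following sketch; the version numbers simply reuse the `0.7` shown above:

```toml
# Library's Cargo.toml — no features, so instrumentation compiles away by default
[dependencies]
fastrace = "0.7"

# Application's Cargo.toml — opting in turns on instrumentation for the whole dependency tree
[dependencies]
fastrace = { version = "0.7", features = ["enable"] }
```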

This is a significant advantage over other tracing solutions that either always impose overhead or require libraries to implement complex feature-flag systems.

### 6. Seamless Context Propagation

`fastrace` provides companion crates for popular frameworks that handle context propagation automatically:

```rust
// For HTTP clients with reqwest
let response = client.get(&format!("https://user-service/users/{}", user_id))
    .headers(fastrace_reqwest::traceparent_headers()) // Automatically inject trace context
    .send()
    .await?;

// For gRPC servers with tonic
Server::builder()
    .layer(fastrace_tonic::FastraceServerLayer) // Automatically extracts context from incoming requests
    .add_service(MyServiceServer::new(MyService::default()))
    .serve(addr);

// For gRPC clients
let channel = ServiceBuilder::new()
    .layer(fastrace_tonic::FastraceClientLayer) // Automatically injects context into outgoing requests
    .service(channel);

// For data access with Apache OpenDAL
let op = Operator::new(services::Memory::default())?
    .layer(opendal::layers::FastraceLayer) // Automatically traces all data operations
    .finish();
op.write("test", "0".repeat(16 * 1024 * 1024).into_bytes())
    .await?;
```

This provides out-of-the-box distributed tracing without manual context handling.

## The Complete Solution: `fastrace` + `log` + `logforth`

`fastrace` deliberately focuses on doing one thing well: tracing. Through its composable design and Rust's rich ecosystem, a powerful combination emerges:

- **log**: The standard Rust logging facade
- **logforth**: A flexible logging implementation with production-ready features
- **fastrace**: High-performance tracing with distributed context propagation

This integration automatically associates your logs with trace spans, providing correlation without requiring a different set of logging macros:

```rust
log::info!("Processing started");

// Later, in your logging infrastructure, you can see which trace and span
// each log entry belongs to.
```

To illustrate the simplicity of this approach, here's a streamlined example of building a microservice with complete observability:

```rust
#[poem::handler]
#[fastrace::trace] // Automatically creates and manages spans
async fn get_user(Path(user_id): Path<String>) -> Json<User> {
    // Standard log calls are automatically associated with the current span
    log::info!("Fetching user {}", user_id);

    let user_details = fetch_user_details(&user_id).await;

    Json(User {
        id: user_id,
        name: user_details.name,
        email: user_details.email,
    })
}

#[fastrace::trace]
async fn fetch_user_details(user_id: &str) -> UserDetails {
    let client = reqwest::Client::new();

    let response = client.get(&format!("https://user-details-service/users/{}", user_id))
        .headers(fastrace_reqwest::traceparent_headers()) // Automatic trace context propagation
        .send()
        .await
        .expect("Request failed");

    response.json::<UserDetails>().await.expect("Failed to parse JSON")
}

#[tokio::main]
async fn main() {
    // Configure logging and tracing
    setup_observability("user-service");

    let app = poem::Route::new()
        .at("/users/:id", poem::get(get_user))
        .with(fastrace_poem::FastraceMiddleware); // Automatic trace context extraction

    poem::Server::new(poem::listener::TcpListener::bind("0.0.0.0:3000"))
        .run(app)
        .await
        .unwrap();

    fastrace::flush();
}

fn setup_observability(service_name: &str) {
    // Setup logging with logforth
    logforth::stderr()
        .dispatch(|d| {
            d.filter(log::LevelFilter::Info)
                // Attaches trace id to logs
                .diagnostic(logforth::diagnostic::FastraceDiagnostic::default())
                // Attaches logs to spans
                .append(logforth::append::FastraceEvent::default())
        })
        .apply();

    // Setup tracing with fastrace
    fastrace::set_reporter(
        fastrace_jaeger::JaegerReporter::new("127.0.0.1:6831".parse().unwrap(), service_name).unwrap(),
        fastrace::collector::Config::default()
    );
}
```
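
For completeness, a dependency manifest for a service like this would look roughly as follows. The crate names match the ones used above, but the version numbers and the `fastrace` feature on `logforth` are illustrative assumptions — check crates.io for the current releases and feature names:

```toml
[dependencies]
log = "0.4"
fastrace = { version = "0.7", features = ["enable"] }   # Application opts in to tracing
fastrace-jaeger = "0.7"                                 # Reporter used in setup_observability
fastrace-poem = "0.1"                                   # Middleware for incoming requests
fastrace-reqwest = "0.1"                                # Header injection for outgoing requests
logforth = { version = "0.2", features = ["fastrace"] } # Log/trace correlation
poem = "3"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }
```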

## Conclusion

`fastrace` represents a modern approach to distributed tracing in Rust. The most significant advantages of `fastrace` are:

1. **Zero Runtime Overhead When Disabled**: Libraries can add rich instrumentation without worrying about performance impact when tracing is not enabled by the application.

2. **No Ecosystem Lock-In**: Libraries can use `fastrace` without forcing their users into a specific logging ecosystem.

3. **Simple API Surface**: The minimal API surface makes it easy to add comprehensive tracing with little code.

4. **Predictable Performance**: `fastrace`'s performance characteristics are consistent and predictable, even under high load.

An ecosystem where libraries are comprehensively instrumented with `fastrace` would enable unprecedented visibility into applications, without the performance or compatibility concerns that have historically prevented such instrumentation.

## Resources

- [fastrace](https://github.com/fast/fastrace)
- [fastrace-jaeger](https://crates.io/crates/fastrace-jaeger)
- [fastrace-opentelemetry](https://crates.io/crates/fastrace-opentelemetry)
- [fastrace-reqwest](https://crates.io/crates/fastrace-reqwest)
- [fastrace-poem](https://crates.io/crates/fastrace-poem)
- [fastrace-tonic](https://crates.io/crates/fastrace-tonic)
- [logforth](https://crates.io/crates/logforth)