What is Distributed Tracing?
Distributed tracing is a method for tracking and observing requests as they flow through multiple services in a distributed system. It provides visibility into the entire lifecycle of a request, from the initial entry point through all downstream service calls.
Think of distributed tracing as a GPS for your requests - it shows you the complete path a request takes through your system, how long each step takes, and where problems occur.
The Challenge
Modern applications are rarely monolithic. A single user action might trigger:
An API gateway request
Authentication service verification
Database queries across multiple services
Cache lookups
External API calls
Message queue operations
Background job processing
Without distributed tracing, you can only see fragments of this journey in individual service logs. You lose the connection between cause and effect.
Key Concepts
Traces
A trace represents the entire journey of a request through your system. It has a unique identifier (trace-id) that remains constant across all services involved.
Trace: User checkout flow
trace-id: 4bf92f3577b34da6a3ce929d0e0e4736
Service A (API Gateway) → Service B (Cart) → Service C (Payment) → Service D (Inventory)
All operations related to this checkout flow share the same trace-id, allowing you to correlate logs, metrics, and events across all four services.
Spans
A span represents a single operation within a trace. Each service or operation creates its own span with a unique ID. In the traceparent header this ID travels in the parent-id field, because the receiving service treats it as the parent of the span it creates.
Trace: 4bf92f3577b34da6a3ce929d0e0e4736
└─ Span: 00f067aa0ba902b7 (API Gateway) [100ms]
   ├─ Span: a1b2c3d4e5f6a7b8 (Cart Service) [60ms]
   │  └─ Span: b2c3d4e5f6a7b8c9 (Database) [40ms]
   └─ Span: c3d4e5f6a7b8c9d0 (Payment) [80ms]
      └─ Span: d4e5f6a7b8c9d0e1 (Bank API) [70ms]
Each span tracks:
Duration: How long the operation took
Parent relationship: Which span triggered this one
Metadata: Additional context about the operation
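The parent relationship is what turns a flat list of spans into a tree. The following is a minimal sketch of that idea, not a tctx API: `SpanRecord`, `childrenByParent`, and `renderTree` are hypothetical names, and the sample data assumes the API Gateway span is the root of the checkout trace.

```typescript
// Hypothetical span record; the field names are illustrative, not a tctx type.
interface SpanRecord {
  span_id: string;
  parent_span_id: string | null; // null marks the root span
  name: string;
  duration_ms: number;
}

// Index spans by the ID of the span that triggered them.
function childrenByParent(spans: SpanRecord[]): Record<string, SpanRecord[]> {
  const index: Record<string, SpanRecord[]> = {};
  for (const span of spans) {
    const key = span.parent_span_id === null ? 'root' : span.parent_span_id;
    if (!index[key]) index[key] = [];
    index[key].push(span);
  }
  return index;
}

// Walk the index depth-first and emit one indented line per span.
function renderTree(spans: SpanRecord[]): string[] {
  const index = childrenByParent(spans);
  const lines: string[] = [];
  const walk = (parentKey: string, depth: number): void => {
    for (const span of index[parentKey] || []) {
      const indent = new Array(depth + 1).join('  ');
      lines.push(`${indent}${span.name} [${span.duration_ms}ms]`);
      walk(span.span_id, depth + 1);
    }
  };
  walk('root', 0);
  return lines;
}

const checkoutSpans: SpanRecord[] = [
  { span_id: '00f067aa0ba902b7', parent_span_id: null, name: 'API Gateway', duration_ms: 100 },
  { span_id: 'a1b2c3d4e5f6a7b8', parent_span_id: '00f067aa0ba902b7', name: 'Cart Service', duration_ms: 60 },
  { span_id: 'b2c3d4e5f6a7b8c9', parent_span_id: 'a1b2c3d4e5f6a7b8', name: 'Database', duration_ms: 40 },
  { span_id: 'c3d4e5f6a7b8c9d0', parent_span_id: '00f067aa0ba902b7', name: 'Payment', duration_ms: 80 },
  { span_id: 'd4e5f6a7b8c9d0e1', parent_span_id: 'c3d4e5f6a7b8c9d0', name: 'Bank API', duration_ms: 70 },
];

const checkoutTree = renderTree(checkoutSpans);
```

Storing only the parent ID on each span keeps records independent, so services can report spans in any order and the tree is assembled later.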
Context Propagation
Context propagation is the mechanism of passing trace information from one service to another. This is where the W3C Trace Context specification and tctx come in.
Without standardized propagation:
// Service A
fetch('/service-b', {
  headers: { 'x-custom-trace': 'some-id' }
});
// Service B doesn't know how to interpret 'x-custom-trace'
// The trace is broken
With W3C Trace Context:
// Service A
import * as traceparent from 'tctx/traceparent';
const parent = traceparent.make();
fetch('/service-b', {
  headers: { 'traceparent': parent.child().toString() }
});
// Service B (a separate process)
const parent = traceparent.parse(req.headers.get('traceparent'));
// ✓ Service B understands the standard format and continues the trace
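The traceparent header exchanged here is a fixed-format string: `version-trace_id-parent_id-flags`, written in lowercase hex. To illustrate what a receiver has to check, here is a self-contained sketch that does not use tctx. `parseTraceparent` and `TraceparentFields` are hypothetical names, and the sketch only handles version `00` of the format.

```typescript
// Hypothetical shape for the parsed header; not the tctx Traceparent type.
interface TraceparentFields {
  version: string;
  trace_id: string;  // 16 bytes as 32 hex characters
  parent_id: string; // 8 bytes as 16 hex characters
  flags: number;     // 8-bit flags field
}

// Version 00 layout: 2 hex, 32 hex, 16 hex, 2 hex, separated by dashes.
const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string | null): TraceparentFields | null {
  if (!header) return null;
  const match = TRACEPARENT_V00.exec(header.trim());
  if (!match) return null;
  const trace_id = match[1];
  const parent_id = match[2];
  // The specification treats all-zero IDs as invalid.
  if (/^0+$/.test(trace_id) || /^0+$/.test(parent_id)) return null;
  return { version: '00', trace_id, parent_id, flags: parseInt(match[3], 16) };
}

const parsedExample = parseTraceparent(
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
);
```

Returning `null` for anything malformed lets the caller fall back to starting a new trace instead of propagating a broken one.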
How Trace Context Enables Distributed Tracing
The W3C Trace Context specification provides the foundation for distributed tracing by standardizing how trace information flows between services.
The Flow
Initial Request
A request arrives at your system's entry point (e.g., an API gateway).
// No traceparent header exists yet
const parent = traceparent.make();
// Creates: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-03
First Service Processing
The first service processes the request and needs to call downstream services.
// Create child span for downstream call
fetch('/cart-service', {
  headers: {
    'traceparent': parent.child().toString()
  }
});
// Sends: 00-4bf92f3577b34da6a3ce929d0e0e4736-a1b2c3d4e5f6a7b8-03
//                                            ^^^^^^^^^^^^^^^^ (new parent-id)
Downstream Service
The downstream service receives the traceparent and continues the trace.
// Parse incoming trace context
const parent = traceparent.parse(req.headers.get('traceparent'));
// Make another downstream call
fetch('/payment-service', {
  headers: {
    'traceparent': parent.child().toString()
  }
});
// Sends: 00-4bf92f3577b34da6a3ce929d0e0e4736-b2c3d4e5f6a7b8c9-03
//           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (same trace-id)
//                                            ^^^^^^^^^^^^^^^^ (new parent-id)
Complete Trace
Every service keeps the same trace-id and generates its own parent-id. Together these form a complete trace hierarchy.
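In header terms, creating a child means keeping the trace-id and flags and swapping in a freshly generated 8-byte parent-id. The sketch below shows only that step and is independent of tctx. `childHeader` and `randomHex` are hypothetical helpers, and `Math.random` stands in for whatever random source a real implementation would use.

```typescript
// Produce `bytes` random bytes as lowercase hex (illustrative random source).
function randomHex(bytes: number): string {
  let out = '';
  for (let i = 0; i < bytes; i++) {
    const byte = Math.floor(Math.random() * 256);
    out += ('0' + byte.toString(16)).slice(-2);
  }
  return out;
}

// Keep version, trace-id and flags; replace only the parent-id.
function childHeader(traceparent: string): string {
  const [version, traceId, parentId, flags] = traceparent.split('-');
  let next = randomHex(8);
  // Retry in the unlikely case of an all-zero (invalid) or unchanged ID.
  while (/^0+$/.test(next) || next === parentId) next = randomHex(8);
  return [version, traceId, next, flags].join('-');
}

const incomingHeader = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
const outgoingHeader = childHeader(incomingHeader);
```

Because only the third field changes, any service that sees either header can tell they belong to the same trace.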
Trace Hierarchy Visualization
HTTP Request → API Gateway
  trace-id: 4bf92f3577b34da6a3ce929d0e0e4736
  parent-id: 00f067aa0ba902b7
  |
  ├─→ Cart Service
  │     trace-id: 4bf92f3577b34da6a3ce929d0e0e4736 (same)
  │     parent-id: a1b2c3d4e5f6a7b8 (new)
  │     |
  │     └─→ Database Query
  │           trace-id: 4bf92f3577b34da6a3ce929d0e0e4736 (same)
  │           parent-id: b2c3d4e5f6a7b8c9 (new)
  │
  └─→ Payment Service
        trace-id: 4bf92f3577b34da6a3ce929d0e0e4736 (same)
        parent-id: c3d4e5f6a7b8c9d0 (new)
        |
        └─→ External Bank API
              trace-id: 4bf92f3577b34da6a3ce929d0e0e4736 (same)
              parent-id: d4e5f6a7b8c9d0e1 (new)
Why Trace Context Matters
1. End-to-End Visibility
Without trace context:
[API Gateway]: Request processed in 200ms
[Cart Service]: Query took 60ms
[Payment Service]: Transaction completed in 80ms
With trace context:
Trace 4bf92f3577b34da6a3ce929d0e0e4736:
├─ API Gateway (200ms total)
│ ├─ Cart Service (60ms)
│ │ └─ Database (40ms) ← 67% of cart service time
│ └─ Payment Service (80ms)
│   └─ Bank API (70ms) ← 88% of payment service time!
You can now see that most time is spent on external calls, not your code.
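Each percentage in the annotated tree is a child span's duration divided by its parent's duration. The arithmetic can be written out as follows. The `share` helper is illustrative, and the gateway's own time assumes the two downstream calls run one after the other.

```typescript
// Child duration as a percentage of the parent's duration, to one decimal.
function share(childMs: number, parentMs: number): number {
  return Math.round((childMs / parentMs) * 1000) / 10;
}

const databaseShare = share(40, 60); // Database within Cart Service
const bankShare = share(70, 80);     // Bank API within Payment Service

// Time the gateway spends outside the two downstream calls,
// assuming they are sequential rather than parallel.
const gatewayOwnMs = 200 - (60 + 80);
```

This gives 66.7% for the database, 87.5% for the bank API, and 60ms of gateway time that neither downstream call accounts for.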
2. Cross-Service Debugging
When a user reports an error, search logs across all services for the trace-id:
# Find all log entries related to this request (searches /var/log recursively)
grep -r --include='*.log' "4bf92f3577b34da6a3ce929d0e0e4736" /var/log/
You’ll see:
Which service failed
What upstream events led to the failure
What downstream operations were affected
The complete request timeline
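When logs are structured, the same search can be done in code. The sketch below assumes a hypothetical `LogEntry` shape with an ISO 8601 UTC timestamp (which sorts correctly as a plain string) and rebuilds the timeline for one trace from entries collected across services.

```typescript
// Hypothetical structured log entry; field names are assumptions.
interface LogEntry {
  timestamp: string; // ISO 8601 in UTC, e.g. 2024-01-01T10:00:00.000Z
  service: string;
  trace_id: string;
  message: string;
}

// Keep only the entries for one trace and order them by time.
function timelineFor(traceId: string, entries: LogEntry[]): LogEntry[] {
  return entries
    .filter((entry) => entry.trace_id === traceId)
    .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0));
}

const collectedLogs: LogEntry[] = [
  { timestamp: '2024-01-01T10:00:00.120Z', service: 'payment-service', trace_id: '4bf92f3577b34da6a3ce929d0e0e4736', message: 'Bank API timeout' },
  { timestamp: '2024-01-01T10:00:00.050Z', service: 'cart-service', trace_id: 'ffffffffffffffffffffffffffffffff', message: 'Unrelated request' },
  { timestamp: '2024-01-01T10:00:00.000Z', service: 'api-gateway', trace_id: '4bf92f3577b34da6a3ce929d0e0e4736', message: 'Checkout started' },
];

const checkoutTimeline = timelineFor('4bf92f3577b34da6a3ce929d0e0e4736', collectedLogs);
```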
3. Performance Analysis
Identify bottlenecks by analyzing span durations:
// Log spans with timing information
console.log({
  trace_id: parent.trace_id,
  parent_id: parent.parent_id,
  service: 'payment-service',
  operation: 'process_payment',
  duration_ms: 850 // ← This operation is slow!
});
Aggregating this data reveals patterns:
Which operations are consistently slow
Where time is actually spent
How changes affect performance across services
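One way to aggregate logged spans is to group them by service and operation and compute a few summary statistics. This is a sketch under assumed names (`TimedSpan`, `summarize`), using the nearest-rank method for the 95th percentile; a production system would normally leave this to its observability backend.

```typescript
// Hypothetical logged span shape, matching the fields logged above.
interface TimedSpan {
  service: string;
  operation: string;
  duration_ms: number;
}

interface OperationStats {
  count: number;
  mean_ms: number;
  p95_ms: number;
}

// Nearest-rank percentile over an ascending, non-empty array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function summarize(spans: TimedSpan[]): Record<string, OperationStats> {
  const durations: Record<string, number[]> = {};
  for (const span of spans) {
    const key = `${span.service}/${span.operation}`;
    if (!durations[key]) durations[key] = [];
    durations[key].push(span.duration_ms);
  }
  const stats: Record<string, OperationStats> = {};
  for (const key of Object.keys(durations)) {
    const sorted = durations[key].slice().sort((a, b) => a - b);
    const total = sorted.reduce((sum, d) => sum + d, 0);
    stats[key] = {
      count: sorted.length,
      mean_ms: total / sorted.length,
      p95_ms: percentile(sorted, 95),
    };
  }
  return stats;
}

const paymentStats = summarize([
  { service: 'payment-service', operation: 'process_payment', duration_ms: 100 },
  { service: 'payment-service', operation: 'process_payment', duration_ms: 200 },
  { service: 'payment-service', operation: 'process_payment', duration_ms: 300 },
  { service: 'payment-service', operation: 'process_payment', duration_ms: 850 },
]);
```

The mean hides the 850ms outlier; the percentile surfaces it, which is why both are worth tracking.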
4. Vendor Interoperability
Because W3C Trace Context is a standard, different observability tools can work together:
Service A uses Datadog
Service B uses New Relic
Service C uses OpenTelemetry
All three systems can participate in the same trace because they all understand the traceparent header format.
Sampling
In high-traffic systems, tracing every request is impractical and expensive. Sampling lets you trace a representative subset of requests.
The Sampled Flag
The traceparent header includes a sampled flag (bit 0 of the flags field):
import { make, sample, unsample, is_sampled } from 'tctx/traceparent';
const parent = make();
console.log(is_sampled(parent)); // true (sampled by default)
// Implement sampling logic
if (Math.random() > 0.1) { // Sample only 10% of requests
  unsample(parent);
}
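At the header level, the sampled flag is a single bit in the two-hex-digit flags field. The following sketch shows the bit operations involved. The function names are hypothetical and deliberately differ from the tctx helpers, which operate on a parsed traceparent rather than on a raw number.

```typescript
// Bit 0 of the flags field is the sampled flag.
const FLAG_SAMPLED = 0x01;

function isSampledFlag(flags: number): boolean {
  return (flags & FLAG_SAMPLED) === FLAG_SAMPLED;
}

// Set bit 0 and leave the other bits untouched.
function setSampledFlag(flags: number): number {
  return (flags | FLAG_SAMPLED) & 0xff;
}

// Clear bit 0 and leave the other bits untouched.
function clearSampledFlag(flags: number): number {
  return flags & ~FLAG_SAMPLED & 0xff;
}

// Render the flags the way they appear in the header: two hex digits.
function flagsToHex(flags: number): string {
  return ('0' + (flags & 0xff).toString(16)).slice(-2);
}

const unsampledHex = flagsToHex(clearSampledFlag(0x03));
```

Clearing the bit on `03` yields `02`: only the sampling decision changes, and any other flag bits survive.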
Sampling Strategies
Head-Based Sampling
The decision to sample is made at the trace's origin (the "head").
// Entry point service
const parent = make();
// Sample 10% of requests
if (Math.random() > 0.1) {
  unsample(parent);
}
Pros: Simple, efficient, consistent
Cons: Might miss interesting traces (errors, slow requests)
Tail-Based Sampling
The decision is made after observing the complete trace.
// Collect spans in memory
const spans = collectSpans(trace_id);
// Decide based on trace characteristics
const shouldSample =
  hasError(spans) ||
  isSlowTrace(spans) ||
  matchesRules(spans);
if (shouldSample) {
  exportToStorage(spans);
}
Pros: Captures interesting traces, more intelligent
Cons: Complex, requires buffering, higher memory usage
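The buffering that tail-based sampling requires can be sketched as a small in-memory collector. This is an illustration under stated assumptions, not a production design: `TailSampler` and `BufferedSpan` are hypothetical, the caller decides when a trace is complete, and there is no eviction for traces that never finish.

```typescript
// Hypothetical span shape with just the fields the decision needs.
interface BufferedSpan {
  trace_id: string;
  duration_ms: number;
  status: 'ok' | 'error';
}

class TailSampler {
  private buffers: Record<string, BufferedSpan[]> = {};
  private slowThresholdMs: number;

  constructor(slowThresholdMs: number) {
    this.slowThresholdMs = slowThresholdMs;
  }

  // Hold every span in memory until its trace is finished.
  add(span: BufferedSpan): void {
    if (!this.buffers[span.trace_id]) this.buffers[span.trace_id] = [];
    this.buffers[span.trace_id].push(span);
  }

  // Call once the trace is considered complete.
  // Returns the spans to export, or an empty array if the trace is dropped.
  finish(traceId: string): BufferedSpan[] {
    const spans = this.buffers[traceId] || [];
    delete this.buffers[traceId];
    const interesting = spans.some(
      (span) => span.status === 'error' || span.duration_ms >= this.slowThresholdMs
    );
    return interesting ? spans : [];
  }
}

const tailSampler = new TailSampler(500);
tailSampler.add({ trace_id: 't1', duration_ms: 20, status: 'ok' });
tailSampler.add({ trace_id: 't1', duration_ms: 30, status: 'error' });
tailSampler.add({ trace_id: 't2', duration_ms: 10, status: 'ok' });
const keptSpans = tailSampler.finish('t1');
const droppedSpans = tailSampler.finish('t2');
```

The memory cost listed above comes from `buffers`: every in-flight trace is held in full until `finish` is called.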
Adaptive Sampling
Sampling rate adjusts based on traffic patterns and system load.
let sampleRate = 0.1; // Start at 10%
setInterval(() => {
  if (highTraffic()) {
    sampleRate = Math.max(0.01, sampleRate * 0.5);
  } else {
    sampleRate = Math.min(1.0, sampleRate * 1.5);
  }
}, 60000);
Pros: Balances cost and coverage
Cons: Requires monitoring infrastructure
Respecting Upstream Sampling Decisions
When receiving a traced request, respect the upstream sampling decision:
const parent = traceparent.parse(req.headers.get('traceparent'));
// Record trace data only if the upstream service sampled this trace.
// Unsampled traces skip the expensive recording but are still propagated.
if (parent && is_sampled(parent)) {
  recordSpan({
    trace_id: parent.trace_id,
    parent_id: parent.parent_id,
    duration_ms: elapsed
  });
}
The W3C specification leaves the final recording decision to each service: the sampled flag is a signal from the caller, not a command. Common practice is to honor the upstream decision so that sampling stays consistent across a trace.
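The common practice described here can be reduced to one decision function: follow the upstream flag when there is one, and fall back to a local sampling rate only when this service starts the trace. `shouldRecord` is a hypothetical helper, and the injectable `random` parameter exists only to make the behavior testable.

```typescript
// incomingFlags: the flags byte from a valid incoming traceparent,
// or null when the request carried no (valid) trace context.
function shouldRecord(
  incomingFlags: number | null,
  localRate: number,
  random: () => number = Math.random
): boolean {
  if (incomingFlags !== null) {
    // An upstream service already decided; honor it.
    return (incomingFlags & 0x01) === 0x01;
  }
  // This service is the trace origin, so it makes the head decision.
  return random() < localRate;
}
```

With this shape, a local rate of 100% still does not record a trace that upstream marked as unsampled, which keeps partial traces out of storage.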
Tracestate
While traceparent provides standardized trace correlation, tracestate allows you to attach service-specific metadata.
Use Cases
User Context
state.set('user-id', userId);
state.set('tenant', tenantId);
Track which user triggered the trace
Feature Flags
state.set('feature-x', 'enabled');
state.set('experiment', 'variant-b');
See which features were active
Request Metadata
state.set('api-version', 'v2');
state.set('client', 'mobile-app');
Track request characteristics
Routing Info
state.set('region', 'us-west-2');
state.set('environment', 'production');
Record infrastructure details
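On the wire, tracestate is a comma-separated list of `key=value` members, limited to 32 entries, where the most recently updated entry moves to the front. The sketch below illustrates that header format only; it says nothing about how tctx implements it, and `parseTracestate`, `setEntry`, and `serializeTracestate` are hypothetical names. Key and value character validation is omitted for brevity.

```typescript
type TracestateEntries = Array<[string, string]>;

// Split the header into ordered key/value pairs, skipping malformed members.
function parseTracestate(header: string): TracestateEntries {
  const entries: TracestateEntries = [];
  for (const member of header.split(',')) {
    const trimmed = member.trim();
    const eq = trimmed.indexOf('=');
    if (eq <= 0) continue; // empty member or missing key
    entries.push([trimmed.slice(0, eq), trimmed.slice(eq + 1)]);
  }
  return entries;
}

// Insert or update a key; the touched entry moves to the front
// and the list is capped at 32 members.
function setEntry(entries: TracestateEntries, key: string, value: string): TracestateEntries {
  const rest = entries.filter(([existing]) => existing !== key);
  const updated: TracestateEntries = [[key, value], ...rest];
  return updated.slice(0, 32);
}

function serializeTracestate(entries: TracestateEntries): string {
  return entries.map(([key, value]) => `${key}=${value}`).join(',');
}

const incomingState = parseTracestate('vendor-a=1, vendor-b=2');
const outgoingState = serializeTracestate(setEntry(incomingState, 'vendor-b', '3'));
```

After the update, `vendor-b` leads the list, so the next service can tell which participant touched the state most recently.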
Complete Example
import * as traceparent from 'tctx/traceparent';
import * as tracestate from 'tctx/tracestate';

// getUserId and recordSpan are application-defined helpers, not part of tctx
export async function handleRequest(req: Request) {
  // Parse or create trace context
  let parent = traceparent.parse(req.headers.get('traceparent'));
  let state = parent
    ? tracestate.parse(req.headers.get('tracestate'))
    : null;
  parent ||= traceparent.make();
  state ||= tracestate.make();

  // Add service-specific metadata
  state.set('api-gateway', 'processed');
  state.set('user-id', getUserId(req));
  state.set('api-version', 'v2');

  const startTime = Date.now();
  try {
    // Make downstream call
    const response = await fetch('https://cart-service/checkout', {
      headers: {
        'traceparent': parent.child().toString(),
        'tracestate': state.toString()
      }
    });
    // Record successful span
    recordSpan({
      trace_id: parent.trace_id,
      parent_id: parent.parent_id,
      service: 'api-gateway',
      operation: 'checkout',
      duration_ms: Date.now() - startTime,
      status: 'ok'
    });
    return response;
  } catch (error) {
    // Record error span (a caught value is not guaranteed to be an Error)
    recordSpan({
      trace_id: parent.trace_id,
      parent_id: parent.parent_id,
      service: 'api-gateway',
      operation: 'checkout',
      duration_ms: Date.now() - startTime,
      status: 'error',
      error: error instanceof Error ? error.message : String(error)
    });
    throw error;
  }
}
Integration with Observability Tools
tctx provides the foundation for integration with popular observability platforms:
OpenTelemetry
import { trace } from '@opentelemetry/api';
import * as traceparent from 'tctx/traceparent';

const parent = traceparent.parse(req.headers.get('traceparent'));
if (parent) {
  const tracer = trace.getTracer('my-service');
  // Attach the incoming IDs as attributes so the span can be correlated
  const span = tracer.startSpan('operation', {
    attributes: {
      'trace.id': parent.trace_id,
      'span.id': parent.parent_id
    }
  });
  // ... perform the operation ...
  span.end(); // A span is only complete once it has been ended
}
Custom Logging
import * as traceparent from 'tctx/traceparent';

const parent = traceparent.parse(req.headers.get('traceparent'))
  || traceparent.make();
// Structured logging with trace context
console.log(JSON.stringify({
  level: 'info',
  message: 'Processing request',
  trace_id: parent.trace_id,
  span_id: parent.parent_id,
  timestamp: new Date().toISOString()
}));
APM Tools
Most Application Performance Monitoring tools automatically recognize W3C Trace Context headers:
Datadog: Reads traceparent and tracestate automatically
New Relic: Native support for W3C Trace Context
Elastic APM: Full W3C Trace Context support
Honeycomb: Accepts standard trace headers
Best Practices
Always propagate trace context
Even if your service doesn't record traces, always propagate the headers:
const headers = new Headers();
// Preserve trace context
const tp = req.headers.get('traceparent');
if (tp) headers.set('traceparent', tp);
const ts = req.headers.get('tracestate');
if (ts) headers.set('tracestate', ts);
fetch('/downstream', { headers });
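For services that make many outbound calls, the pass-through can be factored into one helper. This sketch uses plain objects instead of the `Headers` class so it stays independent of any runtime. `traceHeadersFrom` is a hypothetical name, and the incoming header names are assumed to be lower-cased already.

```typescript
type HeaderMap = Record<string, string | undefined>;

// The two headers defined by W3C Trace Context.
const TRACE_HEADERS = ['traceparent', 'tracestate'];

// Copy only the trace context headers that are actually present.
function traceHeadersFrom(incoming: HeaderMap): Record<string, string> {
  const outgoing: Record<string, string> = {};
  for (const name of TRACE_HEADERS) {
    const value = incoming[name];
    if (value) outgoing[name] = value;
  }
  return outgoing;
}

const forwarded = traceHeadersFrom({
  'content-type': 'application/json',
  traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01',
});
```

Unrelated headers are left out on purpose, so the helper can be merged into any outbound request without leaking the original request's headers.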
Create child spans for downstream calls
Always use .child() when making outbound requests:
// ✓ Correct
fetch('/api', {
  headers: { traceparent: parent.child().toString() }
});
// ✗ Incorrect - breaks trace hierarchy
fetch('/api', {
  headers: { traceparent: parent.toString() }
});
Validate before parsing tracestate
Only parse tracestate if traceparent is valid:
const parent = traceparent.parse(req.headers.get('traceparent'));
let state = null;
if (parent) {
  const ts = req.headers.get('tracestate');
  if (ts) state = tracestate.parse(ts);
}
Use meaningful tracestate keys
Choose descriptive, namespaced keys:
// ✓ Good - clear, namespaced
state.set('payment-svc@company', 'processed');
state.set('user-id', '12345');
// ✗ Bad - vague, conflicts likely
state.set('status', 'ok');
state.set('id', '12345');
Sample at the entry point
Make sampling decisions at the entry point:
const parent = make();
// Sample based on your requirements
if (!shouldSample(req)) {
  unsample(parent);
}
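What a `shouldSample` function contains depends on the application. One possible shape, offered only as a sketch, combines fixed rules with a fallback rate. The `SamplingRequest` type, the `/health` path rule, and the `x-debug-trace` header are all hypothetical choices made for illustration.

```typescript
// Hypothetical minimal view of a request; not a real framework type.
interface SamplingRequest {
  path: string;
  headers: Record<string, string | undefined>;
}

function shouldSample(
  req: SamplingRequest,
  rate: number,
  random: () => number = Math.random
): boolean {
  // Rule 1: never trace health checks (assumed path prefix).
  if (req.path.indexOf('/health') === 0) return false;
  // Rule 2: always trace requests that opt in (assumed debug header).
  if (req.headers['x-debug-trace'] === '1') return true;
  // Otherwise sample the configured fraction of requests.
  return random() < rate;
}
```

Putting the rules before the random draw guarantees that noisy endpoints never consume the sampling budget and that explicit debug requests are never lost to it.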
Common Patterns
Middleware Pattern
function traceMiddleware(handler: Handler): Handler {
  return async (req: Request) => {
    const incoming = traceparent.parse(req.headers.get('traceparent'));
    // Only trust tracestate when the incoming traceparent is valid
    const state = incoming
      ? tracestate.parse(req.headers.get('tracestate'))
      : tracestate.make();
    // Continue the incoming trace, or start a new one
    const parent = incoming || traceparent.make();
    // Attach to request context
    req.trace = { parent, state };
    return handler(req);
  };
}
Service Client Pattern
class ServiceClient {
  constructor(private baseURL: string) {}
  async call(path: string, parent: Traceparent) {
    return fetch(`${this.baseURL}${path}`, {
      headers: {
        'traceparent': parent.child().toString()
      }
    });
  }
}
Background Job Pattern
interface Job {
  data: unknown;
  trace?: string; // Serialized traceparent
}
// Producer
queue.push({
  data: { user_id: 123 },
  trace: parent.toString()
});
// Consumer
const job = await queue.pop();
const parent = job.trace
  ? traceparent.parse(job.trace)
  : traceparent.make();
// Continue the trace in background processing
processJob(job.data, parent);
Next Steps
API Reference: Explore the complete tctx API
Guides: See real-world usage examples
Core Concepts: Learn about W3C Trace Context
Performance: Learn about tctx's performance