# Introduction to Distributed Tracing: Following a Request Across Services
A distributed trace is a directed acyclic graph of spans. Each span represents a unit of work — a database query, an HTTP call, a cache lookup. Together, they show you the complete story of a single request traveling through your system.
When a request fails somewhere across 8 microservices, logs alone are not enough. Distributed tracing shows you exactly where time was spent and where errors occurred.
## Spans and traces: the basic vocabulary
- **Trace** — the complete lifecycle of a single request, from entry point to final response.
- **Span** — one unit of work within a trace. Has a start time, duration, and optional error status.
- **Trace ID** — a unique identifier that flows through every service on the same request.
- **Parent span ID** — links child spans to their parent, forming the tree structure.
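These IDs travel between services in request headers, most commonly the W3C `traceparent` header. A minimal sketch of pulling them out (the helper name `parse_traceparent` is illustrative, not a library API):

```python
# W3C trace context format: "version-traceid-parentid-flags", lowercase hex.
# This header is how the trace ID and parent span ID cross service boundaries.

def parse_traceparent(header: str) -> dict:
    version, trace_id, parent_id, flags = header.split("-")
    if len(trace_id) != 32 or len(parent_id) != 16:
        raise ValueError("malformed traceparent header")
    return {
        "version": version,
        "trace_id": trace_id,         # shared by every span in the trace
        "parent_span_id": parent_id,  # the span that issued this request
        "sampled": bool(int(flags, 16) & 0x01),
    }

ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx["trace_id"])  # 4bf92f3577b34da6a3ce929d0e0e4736
```

Each service parses the incoming header, creates its own spans under that trace ID, and forwards a new `traceparent` (with its own span as the parent) on outgoing calls.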
Every log line should include the trace ID. This is how you jump from a log to its full trace context.
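One way to do this with the stdlib `logging` module is a filter that stamps the active trace ID onto every record. This is a sketch: the hard-coded `trace_id` stands in for the one you would read off the current span, and OpenTelemetry's logging instrumentation can inject it automatically.

```python
import logging

class TraceIdFilter(logging.Filter):
    """Attach a trace_id attribute to every log record it sees."""

    def __init__(self, trace_id: str):
        super().__init__()
        self.trace_id = trace_id  # in practice: read from the active span

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = self.trace_id  # now available to the formatter
        return True  # never drop the record, only annotate it

logger = logging.getLogger("payments")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(asctime)s trace_id=%(trace_id)s %(message)s"))
logger.addHandler(handler)
logger.addFilter(TraceIdFilter("4bf92f3577b34da6a3ce929d0e0e4736"))
logger.warning("payment declined")  # the log line now carries the trace ID
```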
## OpenTelemetry: the standard
OpenTelemetry (OTel) is the vendor-neutral standard for distributed tracing. It provides:

- Auto-instrumentation for most frameworks (Express, FastAPI, Spring, etc.)
- A manual instrumentation SDK for custom spans
- Exporters to Jaeger, Zipkin, Tempo, Datadog, etc.
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span('process-payment'):
    result = process_payment(order)
```
## What to instrument first
Prioritize:

1. Service entry points (HTTP handlers, queue consumers)
2. External calls (database queries, HTTP clients, cache operations)
3. Business logic boundaries (order placement, user authentication)
Avoid instrumenting every internal function — it creates noise without value.
## Sampling strategy
Tracing 100% of traffic is expensive. Common strategies:
- **Head-based sampling** — decide at trace start (simple, misses rare errors)
- **Tail-based sampling** — decide after the trace completes (captures errors, more complex)
- **Rate limiting** — N traces per second per service
For most teams, a sensible starting point is 10% head-based sampling combined with 100% sampling of traces that contain errors.
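A sketch of that policy in plain Python (not an OpenTelemetry API): head-based sampling keyed deterministically on the trace ID, so every service that sees the trace makes the same keep/drop decision, plus an error override. Note the error override is really a tail-based decision, since errors are only known once the trace completes.

```python
SAMPLE_RATIO = 0.10  # keep roughly 10% of traces

def should_sample(trace_id: str, is_error: bool = False) -> bool:
    if is_error:
        # 100% error sampling; in practice this part is tail-based,
        # decided after the trace completes.
        return True
    # Deterministic: hashing on the trace ID means every service
    # reaches the same decision for the same trace.
    return int(trace_id, 16) % 100 < SAMPLE_RATIO * 100
```

Real samplers (e.g. OTel's ratio-based samplers) work on the same principle but respect the parent's sampling decision propagated in the `traceparent` flags.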
## Connecting traces to AlertsDock monitors
When an uptime monitor fires, incident response starts with one question: which requests were failing during that window? With distributed tracing enabled, you can filter traces by time range and error status to see exactly which service and which code path caused the failure.
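That filtering step is simple once traces are exported. A hypothetical sketch over trace summaries (the field names `start`, `error`, and `service` are illustrative, not a specific backend's schema):

```python
from datetime import datetime

# Illustrative exported trace summaries.
traces = [
    {"trace_id": "a1", "start": datetime(2024, 1, 5, 10, 2), "error": True,  "service": "payments"},
    {"trace_id": "b2", "start": datetime(2024, 1, 5, 10, 4), "error": False, "service": "payments"},
    {"trace_id": "c3", "start": datetime(2024, 1, 5, 10, 6), "error": True,  "service": "checkout"},
]

# The incident window reported by the uptime monitor.
window_start = datetime(2024, 1, 5, 10, 0)
window_end = datetime(2024, 1, 5, 10, 5)

failing = [
    t for t in traces
    if t["error"] and window_start <= t["start"] <= window_end
]
print([t["trace_id"] for t in failing])  # ['a1']
```

From a failing trace ID you can then pull the full span tree and see which service and code path produced the error.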