OpenTelemetry Tracing: Complete Setup Guide for DevOps & SRE in 2026

Introduction

Distributed tracing answers the hardest question in microservices: "Why is this request slow?" A single user request can touch 12 services. Without tracing, you debug by grepping the logs of 12 different services and guessing which one is at fault.

OpenTelemetry is the CNCF standard for distributed tracing. It gives you end-to-end request visibility across services, languages, and infrastructure — with or without code changes.

This guide covers auto-instrumentation for Java, Python, and Node.js, manual SDK setup for Go, and OTLP export to Jaeger and Grafana Tempo.

Auto-Instrumentation: Traces Without Code Changes

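The OpenTelemetry Operator injects instrumentation into your pods without modifying a single line of application code. Install it with the manifest below; the default manifest expects cert-manager to already be running in the cluster: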

kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

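Create an Instrumentation resource in the namespace (it tells the operator which agent to inject and where to send data), then annotate the namespace or the pod template of your deployment. Use the annotation that matches the workload's language, normally one per workload:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    # Keep only the line that matches the application's runtime.
    instrumentation.opentelemetry.io/inject-java: "true"
    # instrumentation.opentelemetry.io/inject-python: "true"
    # instrumentation.opentelemetry.io/inject-nodejs: "true"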

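The operator injects an init container that copies the OpenTelemetry agent into the pod and sets the environment variables that load it. Injection happens when a pod is created, so restart existing workloads after adding the annotation. When the application starts, the agent attaches to the runtime and automatically instruments supported HTTP, gRPC, database, and message-queue libraries.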

For Go applications (which compile instrumentation into the binary), use the SDK directly:

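import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// initTracer registers a global tracer provider that batches spans and exports
// them over OTLP/gRPC. Call the returned function on shutdown to flush spans.
func initTracer(ctx context.Context) func(context.Context) error {
    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("tempo:4317"),
        otlptracegrpc.WithInsecure(), // plaintext; use TLS outside a trusted network
    )
    if err != nil {
        log.Fatalf("creating OTLP exporter: %v", err)
    }
    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
    otel.SetTracerProvider(tp)
    return tp.Shutdown
}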

Export to Tempo and Jaeger

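OpenTelemetry uses OTLP as the wire protocol. Both Tempo and Jaeger accept OTLP, so the same otlp exporter works for either backend and only the endpoint changes. The pipeline below targets Tempo: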

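# OpenTelemetry Collector pipeline
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  # Groups spans into batches before export; without it each span batch
  # received is forwarded as-is.
  batch: {}
exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]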

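Deploy the Collector as a DaemonSet, one per node, so applications hand spans to a Collector on the same node instead of waiting on a remote backend. Note that localhost:4317 only reaches the Collector when it runs as a sidecar in the same pod or the pod uses the host network. With a DaemonSet, expose the Collector through a hostPort and have applications send traces to the node's IP on port 4317. The Collector then batches the spans and exports them to Tempo.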
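As a minimal sketch of how an application might pick its Collector address under that layout: the example below assumes the pod spec exposes the node's IP through the Kubernetes Downward API (status.hostIP) in an environment variable. The variable name NODE_IP and the helper collectorEndpoint are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// collectorEndpoint builds the host:port for OTLP/gRPC export.
// nodeIP is assumed to come from status.hostIP via the Downward API;
// when it is empty we fall back to localhost (sidecar or local development).
func collectorEndpoint(nodeIP string) string {
	if nodeIP == "" {
		nodeIP = "localhost"
	}
	// JoinHostPort also brackets IPv6 addresses correctly.
	return net.JoinHostPort(nodeIP, "4317")
}

func main() {
	// NODE_IP is a hypothetical variable name set in the pod spec.
	endpoint := collectorEndpoint(os.Getenv("NODE_IP"))
	fmt.Println("exporting traces to", endpoint)
}
```

The resulting string is what you would pass to otlptracegrpc.WithEndpoint in place of a hard-coded address.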

Sampling: Don't Store Every Trace

At scale, tracing 100% of requests is cost-prohibitive. Use intelligent sampling:

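Head sampling (probabilistic): Decide at trace start. 10% sampling means 10% of requests are traced. Simple, but the decision is made before the outcome is known, so most rare slow or failed requests are dropped. With the OpenTelemetry SDKs you can set this through the environment variables OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1.
Tail sampling (Collector): Decide after the trace completes. Keep 100% of traces with errors or with latency above a threshold. The latency policy takes an absolute value in milliseconds, not a percentile, so pick a number close to your P99. This is the SRE-relevant approach: you capture every incident trace. Tail sampling needs all spans of a trace to reach the same Collector instance, so run it on a gateway Collector tier rather than on per-node agents.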

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: latency
        type: latency
        latency: {threshold_ms: 1000}
      - name: default
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
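To make the policy logic concrete, here is a simplified sketch of the decision those three policies express: a trace is kept if any one policy matches. This is not the Collector's implementation. The type traceSummary and the function keepTrace are hypothetical, and the real processor buffers spans for decision_wait and uses its own hashing for the probabilistic policy.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// traceSummary is a hypothetical, simplified view of a completed trace.
type traceSummary struct {
	TraceID    string
	HasError   bool
	DurationMs int64
}

// keepTrace returns true if any one of the three policies matches:
// errors, latency above the threshold, or a probabilistic baseline.
func keepTrace(t traceSummary, thresholdMs int64, samplePercent uint32) bool {
	if t.HasError {
		return true
	}
	if t.DurationMs > thresholdMs {
		return true
	}
	// Hash the trace ID so the same trace always gets the same decision.
	h := fnv.New32a()
	h.Write([]byte(t.TraceID))
	return h.Sum32()%100 < samplePercent
}

func main() {
	traces := []traceSummary{
		{TraceID: "a1", HasError: true, DurationMs: 40},
		{TraceID: "b2", DurationMs: 2300},
		{TraceID: "c3", DurationMs: 35},
	}
	for _, t := range traces {
		fmt.Printf("trace %s keep=%v\n", t.TraceID, keepTrace(t, 1000, 10))
	}
}
```

The first two traces are always kept because they match the error and latency policies. The third is healthy and fast, so it is kept only if it falls into the 10% baseline.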

Tracing in Production: The SRE Checklist

Export to Tempo (cost-effective object storage backend) or Jaeger (Elasticsearch/Cassandra)
Enable tail sampling — capture every error and slow trace, discard healthy fast traffic
Correlate traces with logs via trace ID injection in structured logging
Use span attributes for business context: user.id, order.id, tenant.id
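The log-correlation item can be sketched with the Go standard library alone. The example assumes the incoming request carries a W3C traceparent header in the version 00 layout (version-traceid-spanid-flags). The helper parseTraceparent and the field names trace_id and span_id are illustrative choices, not a fixed convention.

```go
package main

import (
	"errors"
	"log/slog"
	"os"
	"strings"
)

// parseTraceparent extracts the trace ID and span ID from a W3C traceparent
// header value of the form "version-traceid-spanid-flags".
func parseTraceparent(header string) (traceID, spanID string, err error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", errors.New("malformed traceparent header")
	}
	return parts[1], parts[2], nil
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// In a real handler this value would come from the request headers.
	header := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	traceID, spanID, err := parseTraceparent(header)
	if err != nil {
		logger.Error("no trace context", "error", err)
		return
	}

	// Every line written through reqLogger carries the trace and span IDs.
	reqLogger := logger.With("trace_id", traceID, "span_id", spanID)
	reqLogger.Info("order placed", "order.id", "ord-123")
}
```

In a service that already uses the OpenTelemetry Go SDK, you would usually read the IDs from the active span via trace.SpanContextFromContext(ctx) rather than parsing the header yourself. The parsing here just keeps the sketch dependency-free.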

For the broader observability picture — combining traces with metrics and logs — our OpenTelemetry complete setup guide covers the full three-pillar implementation, including metrics export to Prometheus and logs via the OTel filelog receiver.

For teams adopting eBPF-based observability alongside OpenTelemetry, our eBPF observability for SRE guide shows how kernel-level telemetry complements application-level tracing.

OpenTelemetry tracing turns every request into a story — from ingress to database and back. When the next incident hits, you will not be grep-ing logs. You will be following a trace.

