Gopher Holmes: On the Trail with OpenTelemetry

I have to admit, back when I started working on my first serious microservices project, I spent countless nights digging through logs trying to figure out why the system was slowing down or why some requests were vanishing into thin air. At the time, observability seemed like nothing more than logging to me. But as systems grew and became more complex, I realized just how critical true observability really is.
That’s when I discovered OpenTelemetry, one of the most valuable tools in my toolbox today. In this post, I want to share how I use OpenTelemetry in real-world Go applications, especially in production environments, sprinkled with some personal experience and practical tips.
What is OpenTelemetry and Why Does It Matter?
Observability is the ability to understand what’s happening inside a system from the outside. It rests on three pillars: traces, metrics, and logs. Now, instead of using different tools for each, wouldn’t it be nice to have a single, unified system? That’s exactly what OpenTelemetry offers.
In distributed systems, you need tracing to see where requests slow down or fail, metrics to monitor system health, and logs to capture key events. But logs alone are no longer enough—without trace IDs, they lack real context.
Getting Started with OpenTelemetry in Go
Integrating OpenTelemetry into a Go project is straightforward. I usually start by adding these modules:
go get go.opentelemetry.io/otel
go get go.opentelemetry.io/otel/sdk
go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
The first thing I typically do is set up a simple span. For example:
package main

import (
    "context"
    "fmt"

    "go.opentelemetry.io/otel"
)

func main() {
    tracer := otel.Tracer("my-app")

    // Start a root span; ctx would be passed into downstream calls
    // so they can create child spans under it.
    ctx, span := tracer.Start(context.Background(), "main-operation")
    defer span.End()

    _ = ctx
    fmt.Println("Hello, OpenTelemetry!")
}
This code starts a basic span. At this point, though, nothing is exported: until we register a real tracer provider, the global tracer is a no-op. The fun really begins when we start sending this data somewhere meaningful.
Sending Traces with an Exporter
In production, you need to export your traces to a proper observability platform. I usually go with OTLP because it’s flexible and widely supported.
Here’s an example of setting up an OTLP exporter:
package main

import (
    "context"
    "log"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

func initTracer() *sdktrace.TracerProvider {
    ctx := context.Background()

    // OTLP over HTTP; the endpoint and headers are picked up from the
    // standard OTEL_EXPORTER_OTLP_* environment variables by default.
    exporter, err := otlptracehttp.New(ctx)
    if err != nil {
        log.Fatal(err)
    }

    // Resource attributes identify this service in the backend.
    res, err := resource.New(ctx,
        resource.WithAttributes(
            semconv.ServiceNameKey.String("my-service"),
            semconv.ServiceVersionKey.String("1.0.0"),
            semconv.DeploymentEnvironmentKey.String("production"),
        ),
    )
    if err != nil {
        log.Fatal(err)
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter), // batch spans instead of sending them one by one
        sdktrace.WithResource(res),
        sdktrace.WithSampler(sdktrace.TraceIDRatioBased(0.01)), // keep 1% of traces
    )
    otel.SetTracerProvider(tp)

    // W3C trace context headers, so downstream services can join the trace.
    otel.SetTextMapPropagator(propagation.TraceContext{})
    return tp
}
Notice the sampler here. I set it to 1% because exporting every trace in a high-traffic production system is costly. Start low and raise the ratio as needed.
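In main, I call initTracer and shut the provider down on exit so any spans still buffered in the batcher get flushed. A minimal sketch, reusing the imports from the example above:

func main() {
    tp := initTracer()
    defer func() {
        // Flush buffered spans before the process exits.
        if err := tp.Shutdown(context.Background()); err != nil {
            log.Printf("tracer provider shutdown: %v", err)
        }
    }()

    // ... run your application ...
}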
Best Practices for Production
- Sampling Strategy: Use TraceIDRatioBased, and wrap it in ParentBased if downstream services should honor the upstream sampling decision. 1% for high traffic, up to 100% for low traffic.
- Batching: Always use the batch span processor (WithBatcher above). Don’t waste resources by sending spans individually.
- Resource Attributes: Always include service name, version, and environment. It makes analysis way easier.
- Exporter Resilience: Your app should not crash if the exporter fails. Log the error, but move on; see the sketch after this list.
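The batcher exports in the background, so a failing collector never blocks a request, but export errors only surface through OpenTelemetry’s global error handler. I register my own so they land in my application logs. A minimal sketch, which fits nicely in initTracer:

// Route OpenTelemetry's internal errors, including failed exports,
// into the application log instead of the default handler.
otel.SetErrorHandler(otel.ErrorHandlerFunc(func(err error) {
    log.Printf("opentelemetry error: %v", err)
}))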
Context Propagation: Tracing Across Microservices
One of the toughest parts of distributed systems is tracking how a single request flows through multiple services. A user’s request might hit 3-4 microservices, and each only sees a part of the story. This is where context propagation comes in.
OpenTelemetry uses trace context, passed through HTTP headers or gRPC metadata, to track a request’s journey.
In Go, this is easy. If you're using net/http, just wrap your handlers with the otelhttp middleware from the contrib repo (go get go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp):
import (
    "log"
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("Hello with tracing!"))
    })

    // otelhttp starts a server span per request and extracts any
    // incoming trace context from the headers.
    wrappedHandler := otelhttp.NewHandler(handler, "hello-handler")
    http.Handle("/hello", wrappedHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}
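The middleware covers the incoming side; outgoing requests need to carry the trace headers too, or the chain breaks at the first downstream call. For that, I wrap the client’s transport with otelhttp as well. A minimal sketch, where callDownstream and the URL are placeholders of my own:

import (
    "context"
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func callDownstream(ctx context.Context) (*http.Response, error) {
    // The otelhttp transport injects the current trace context into
    // the outgoing request headers and records a client span.
    client := http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}

    req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://downstream:8080/hello", nil)
    if err != nil {
        return nil, err
    }
    return client.Do(req)
}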
Using Gin? Even easier:
import (
    "log"
    "net/http"

    "github.com/gin-gonic/gin"
    "go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin"
)

func main() {
    r := gin.Default()

    // The middleware starts a span for every request and picks up
    // incoming trace context from the headers.
    r.Use(otelgin.Middleware("my-gin-service"))

    r.GET("/user/:id", func(c *gin.Context) {
        c.String(http.StatusOK, "User details")
    })
    if err := r.Run(":8080"); err != nil {
        log.Fatal(err)
    }
}
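One detail worth knowing: otelgin puts the active span on the request context, so inside a handler, c.Request.Context() is what you pass to anything that should join the trace. For example, with the logWithTrace helper from the next section:

r.GET("/user/:id", func(c *gin.Context) {
    // c.Request.Context() carries the span started by the middleware.
    ctx := c.Request.Context()
    logWithTrace(ctx)
    c.String(http.StatusOK, "User details")
})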
Linking Logs to Traces: Shedding Light on Errors
Traces are great, but logs are still essential for understanding what’s happening inside your app. The problem? In distributed systems, logs without trace context are just noise.
Linking logs with trace IDs is a must. OpenTelemetry gives each span a trace ID and span ID, which you can inject into your logs for instant context.
I like to add trace IDs to all my logs. In Go, it’s simple:
import (
    "context"
    "log"

    "go.opentelemetry.io/otel/trace"
)

func logWithTrace(ctx context.Context) {
    // Pull the active span out of the context and read its trace ID.
    span := trace.SpanFromContext(ctx)
    traceID := span.SpanContext().TraceID().String()
    log.Printf("trace_id=%s, an error occurred: %s", traceID, "user not found")
}
Using Logrus or Zap? Here’s how with Logrus:
import (
    "context"

    "github.com/sirupsen/logrus"
    "go.opentelemetry.io/otel/trace"
)

func logWithLogrus(ctx context.Context) {
    span := trace.SpanFromContext(ctx)
    traceID := span.SpanContext().TraceID().String()

    // Attach the trace ID as a structured field so log search can filter on it.
    logrus.WithField("trace_id", traceID).Error("failed to process payment")
}
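And since I mentioned Zap, here is the same idea with it, this time including the span ID as well. A minimal sketch; logWithZap is just my own helper name:

import (
    "context"

    "go.opentelemetry.io/otel/trace"
    "go.uber.org/zap"
)

func logWithZap(ctx context.Context, logger *zap.Logger) {
    sc := trace.SpanFromContext(ctx).SpanContext()
    logger.Error("failed to process payment",
        zap.String("trace_id", sc.TraceID().String()),
        zap.String("span_id", sc.SpanID().String()),
    )
}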
This way, when you see an error log, you can jump straight to the trace and see the full picture.
Observability is what lets you understand what’s happening inside your systems and react quickly when something goes wrong. With OpenTelemetry in Go, it’s not just about collecting traces, but about propagating them across services and linking them to logs.
In my experience, seeing a trace ID in a log has saved me hours. I can instantly open the trace in my dashboard and follow every step that led to the issue. Once set up, nothing in your system is hidden from you.