
distributed-tracing-setup skill

/.claude/skills/distributed-tracing-setup

This skill helps teams configure distributed tracing with Jaeger, Zipkin, or Datadog to improve microservices observability and troubleshooting.

npx playbooks add skill dexploarer/hyper-forge --skill distributed-tracing-setup

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
1.1 KB
---
name: distributed-tracing-setup
description: Configure distributed tracing with Jaeger, Zipkin, or Datadog for microservices observability
allowed-tools: [Read, Write, Edit, Bash, Grep, Glob]
---

# Distributed Tracing Setup

Configure distributed tracing with Jaeger, Zipkin, or Datadog for microservices observability.

## When to Use

This skill activates when you need to configure distributed tracing with Jaeger, Zipkin, or Datadog for microservices observability.

## Quick Example

```yaml
# Minimal OpenTelemetry Collector pipeline exporting traces to Jaeger.
# The endpoint is illustrative; Jaeger accepts OTLP on port 4317.
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls: {insecure: true}
service:
  pipelines:
    traces: {receivers: [otlp], exporters: [otlp/jaeger]}
```

## Best Practices

- ✅ Follow OpenTelemetry semantic conventions for span and attribute names
- ✅ Document all collector and exporter configurations
- ✅ Test trace propagation end to end before production
- ✅ Alert on trace error rates and latency percentiles
- ✅ Keep SDKs, agents, and collectors up to date

## Related Skills

- `microservices-orchestrator`
- `compliance-auditor`
- Use `enterprise-architect` agent for design consultation

## Implementation Guide

[Detailed implementation steps would go here in production]

This skill provides comprehensive guidance for configuring distributed tracing with Jaeger, Zipkin, or Datadog for microservices observability.

Overview

This skill configures distributed tracing for microservices using Jaeger, Zipkin, or Datadog to improve observability and incident response. It provides practical setup steps, configuration patterns, and integration tips targeted at TypeScript-based services. The guidance focuses on actionable outcomes: trace context propagation, sampling policies, and backend export configuration. Use it to get consistent, production-ready tracing across a service mesh or standalone services.

How this skill works

The skill inspects service entry points, middleware, and client libraries to recommend where to instrument code and inject trace context. It outlines configuration for popular collectors and exporters, environment variables, and SDK initialization in TypeScript apps. It also covers sampling strategies, resource tagging (service name, version, environment), and secure transport to backends like Jaeger, Zipkin, or Datadog. Finally, it suggests verification steps and simple end-to-end tests to confirm traces are collected and visible.
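Much of the SDK and exporter configuration described above can be driven by the standard OpenTelemetry environment variables instead of code; a sketch (the service name, endpoint, and sampling ratio below are illustrative values, not recommendations):

```shell
# Standard OpenTelemetry SDK environment variables (values are illustrative).
export OTEL_SERVICE_NAME="checkout-service"
export OTEL_RESOURCE_ATTRIBUTES="service.version=1.4.2,deployment.environment=production"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://otel-collector:4317"
# Parent-based head sampling: keep 10% of root traces, honor upstream decisions.
export OTEL_TRACES_SAMPLER="parentbased_traceidratio"
export OTEL_TRACES_SAMPLER_ARG="0.1"
```

Configuring through the environment keeps instrumentation code identical across services while each deployment tunes its own sampling and endpoints.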

When to use it

  • When you need end-to-end request visibility across multiple microservices.
  • When debugging latency spikes, error hotspots, or cascading failures.
  • When introducing observability during migration to a service mesh or new runtime.
  • When enforcing consistent instrumentation and trace context propagation.
  • When configuring tracing for production with controlled sampling and retention.

Best practices

  • Instrument at service boundaries and key business logic, not every internal function.
  • Propagate trace context through HTTP headers, message brokers, and background jobs.
  • Use stable service names, environment tags, and version metadata for grouping traces.
  • Start with conservative sampling in production and increase for targeted debugging.
  • Secure collector endpoints and restrict access to tracing data; rotate keys regularly.
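The propagation practice above relies on the W3C `traceparent` header, which carries `version-traceId-parentSpanId-flags` across service boundaries. A minimal sketch of the mechanics (the helper names are illustrative, not part of any SDK, which would handle this for you):

```typescript
import { randomBytes } from "crypto";

interface TraceContext { traceId: string; spanId: string; sampled: boolean }

// Outgoing request: serialize the current context into a traceparent header.
function toTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Incoming request: parse the header, preserve the trace ID, and allocate a
// fresh span ID for this hop so the span tree links back to the caller.
function fromTraceparent(header: string): TraceContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m || m[1] === "0".repeat(32)) return null; // all-zero trace ID is invalid
  return {
    traceId: m[1],                          // preserved end to end
    spanId: randomBytes(8).toString("hex"), // new span for this service
    sampled: (parseInt(m[3], 16) & 0x01) === 1,
  };
}
```

The same context must also ride along message-broker metadata and job payloads so background work stays linked to the originating request.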

Example use cases

  • Add automatic HTTP and gRPC tracing to a TypeScript microservice using OpenTelemetry SDK and export to Jaeger.
  • Configure sidecar or agent-based collection for Kubernetes workloads and send traces to Datadog with APM enabled.
  • Instrument background workers and message handlers so asynchronous work links back to originating requests.
  • Migrate Zipkin instrumentation to OpenTelemetry while preserving trace IDs and sampling behavior.
  • Set up alerting rules for increased trace error rates or high p95 latency in the tracing backend.
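Preserving sampling behavior across services (as in the Zipkin-to-OpenTelemetry migration above) is easiest when the head-sampling decision is a pure function of the trace ID, so every hop reaches the same verdict without coordination. A sketch of the idea (not a specific SDK's sampler API):

```typescript
// Deterministic trace-ID ratio sampling: interpret the upper 32 bits of the
// hex trace ID as a uniform value and keep the trace if it falls below the
// ratio threshold. Every service computes the same answer for the same ID.
function shouldSample(traceId: string, ratio: number): boolean {
  const upper = parseInt(traceId.slice(0, 8), 16); // 0 .. 2^32 - 1
  return upper < ratio * 0x100000000;
}
```

Because the verdict depends only on the ID, a migration that keeps trace IDs intact also keeps the sampled population stable.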

FAQ

Which tracer should I pick: Jaeger, Zipkin, or Datadog?

Choose based on backend requirements: Jaeger for open-source self-hosted, Zipkin for lightweight setups, Datadog for managed APM with integrated metrics and logs.

How do I validate trace propagation across services?

Run an end-to-end request, inspect trace IDs in logs, and verify the full span tree appears in the tracing UI; add a deterministic test request if needed.
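One way to script that validation: collect one log line per service for the test request and assert they all carry the same trace ID. A sketch, assuming your services log a `trace_id=<hex>` field (the log format here is an assumption, not a standard):

```typescript
// Extract every 32-hex-char trace ID tagged as trace_id= from a set of logs.
function extractTraceIds(logLines: string[]): Set<string> {
  const ids = new Set<string>();
  for (const line of logLines) {
    const m = /trace_id=([0-9a-f]{32})/.exec(line);
    if (m) ids.add(m[1]);
  }
  return ids;
}

// Propagation is intact when exactly one trace ID spans all services.
function propagationIntact(logLines: string[]): boolean {
  return extractTraceIds(logLines).size === 1;
}
```

A deterministic test request (e.g. a synthetic checkout with a known header) makes this check repeatable in CI.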