This skill guides the implementation of OpenTelemetry in Kubernetes, covering collectors, pipelines, instrumentation, and troubleshooting to improve observability.
npx playbooks add skill julianobarbosa/claude-code-skills --skill opentelemetry-skill
---
name: opentelemetry
description: Implement OpenTelemetry (OTEL) observability - Collector configuration, Kubernetes deployment, traces/metrics/logs pipelines, instrumentation, and troubleshooting. Use when working with OTEL Collector, telemetry pipelines, observability infrastructure, or Kubernetes monitoring.
---
# OpenTelemetry Implementation Guide
## Overview
OpenTelemetry (OTel) is a vendor-neutral observability framework for instrumenting, generating, collecting, and exporting telemetry data (traces, metrics, logs). This skill provides guidance for implementing OTEL in Kubernetes environments.
## Quick Start
### Deploy OTEL Collector on Kubernetes
```bash
# Add Helm repo
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Install with basic config
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring --create-namespace \
  --set mode=daemonset
```
### Send Test Data via OTLP
```bash
# gRPC endpoint: 4317, HTTP endpoint: 4318
curl -X POST http://otel-collector:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'
```
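An empty `resourceSpans` array only verifies that the endpoint responds. To exercise the full pipeline, send a minimal span; the sketch below assumes the same `otel-collector` service name, and the trace/span IDs, timestamps, and `service.name` are illustrative placeholders.
```bash
# Minimal OTLP/HTTP JSON payload with one span (IDs and timestamps are illustrative)
curl -X POST http://otel-collector:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{
    "resourceSpans": [{
      "resource": {
        "attributes": [{"key": "service.name", "value": {"stringValue": "test-service"}}]
      },
      "scopeSpans": [{
        "spans": [{
          "traceId": "5b8efff798038103d269b633813fc60c",
          "spanId": "eee19b7ec3c1b174",
          "name": "test-span",
          "kind": 1,
          "startTimeUnixNano": "1700000000000000000",
          "endTimeUnixNano": "1700000001000000000"
        }]
      }]
    }]
  }'
```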
## Core Concepts
**Signals**: Three types of telemetry data:
- **Traces**: Distributed request flows across services
- **Metrics**: Numerical measurements (counters, gauges, histograms)
- **Logs**: Event records with structured/unstructured data
**Collector Components**:
- **Receivers**: Accept data (OTLP, Prometheus, Jaeger, Zipkin)
- **Processors**: Transform data (batch, memory_limiter, k8sattributes)
- **Exporters**: Send data (prometheusremotewrite, loki, otlp)
- **Extensions**: Add capabilities (health_check, pprof, zpages); see the sketch after this list
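Extensions are enabled under `service.extensions`, not inside a pipeline. A minimal sketch using the components listed above (the ports are the defaults for each extension):
```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133   # liveness/readiness probe target
  pprof:
    endpoint: 0.0.0.0:1777    # Go profiling endpoint
  zpages:
    endpoint: 0.0.0.0:55679   # in-process debug pages (e.g. /debug/tracez)
service:
  extensions: [health_check, pprof, zpages]
```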
## Collector Configuration
### Basic Pipeline Structure
```yaml
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318
  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25
  exporters:
    prometheusremotewrite:
      endpoint: "http://prometheus:9090/api/v1/write"
    loki:
      endpoint: "http://loki:3100/loki/api/v1/push"
    otlp/tempo:
      endpoint: "tempo:4317"  # example Tempo OTLP gRPC endpoint; adjust for your backend
      tls:
        insecure: true
  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [prometheusremotewrite]
      logs:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [loki]
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [otlp/tempo]
```
### Kubernetes Attributes Enrichment
```yaml
processors:
  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    filter:
      # Name of the env var holding the node name (not a ${env:} reference)
      node_from_env_var: K8S_NODE_NAME
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.node.name
```
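The `node_from_env_var` filter above expects a `K8S_NODE_NAME` environment variable in the Collector pod. One way to inject it with the Helm chart is via `extraEnvs` and the Kubernetes downward API, sketched below; the chart's `kubernetesAttributes` preset can also set this up for you.
```yaml
extraEnvs:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName  # downward API: the node this pod runs on
```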
## Deployment Modes
| Mode | Use Case | Pros | Cons |
|------|----------|------|------|
| DaemonSet | Node-level collection | Full coverage, host metrics | Higher resource usage |
| Deployment | Centralized gateway | Scalable, easier management | Single point of failure |
| Sidecar | Per-pod collection | Isolated, fine-grained | Resource overhead per pod |
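A common pattern combines these modes: DaemonSet agents forward to a central Deployment gateway over OTLP. The sketch below shows agent-side Helm values under the assumption that the gateway is exposed as a Service named `otel-gateway` in the `monitoring` namespace:
```yaml
# Agent (DaemonSet) values: forward traces to a central gateway
mode: daemonset
config:
  exporters:
    otlp/gateway:
      endpoint: otel-gateway.monitoring.svc.cluster.local:4317
      tls:
        insecure: true  # enable TLS outside of test clusters
  service:
    pipelines:
      traces:
        exporters: [otlp/gateway]
```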
## Common Patterns
### Development Environment
- Enable the debug exporter for visibility (see the sketch after this list)
- Lower resource limits (250m CPU, 512Mi memory)
- Include spot instance tolerations for cost savings
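A minimal sketch of wiring the debug exporter into a traces pipeline for local inspection (keep it out of production pipelines):
```yaml
exporters:
  debug:
    verbosity: detailed  # prints full span/metric/log details to stdout
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```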
### Production Environment
- Implement trace sampling (10-50%); see the sampler sketch after this list
- Higher batch sizes (2048-4096)
- Enable autoscaling and PodDisruptionBudget
- Use TLS for all endpoints
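For the sampling recommendation above, the `probabilistic_sampler` processor is one option. The sketch below samples 25% of traces; the percentage and the `otlp/tempo` exporter name are examples to adapt to your setup:
```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 25  # keep roughly 1 in 4 traces
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp/tempo]
```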
## Detailed References
For in-depth guidance, see:
- **Collector Configuration**: [COLLECTOR.md](references/COLLECTOR.md)
- **Kubernetes Deployment**: [KUBERNETES.md](references/KUBERNETES.md)
- **Troubleshooting**: [TROUBLESHOOTING.md](references/TROUBLESHOOTING.md)
- **Instrumentation**: [INSTRUMENTATION.md](references/INSTRUMENTATION.md)
## Validation Commands
```bash
# Check collector pods (the instance label matches the Helm release name)
kubectl get pods -n monitoring -l app.kubernetes.io/instance=otel-collector

# View collector logs
kubectl logs -n monitoring -l app.kubernetes.io/instance=otel-collector --tail=100

# Test OTLP endpoint
kubectl run test-otlp --image=curlimages/curl:latest --rm -it -- \
  curl -v http://otel-collector.monitoring:4318/v1/traces

# Validate config syntax
otelcol validate --config=config.yaml
```
## Key Helm Chart Values
```yaml
mode: "daemonset"  # or "deployment"
presets:
  logsCollection:
    enabled: true
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
  kubeletMetrics:
    enabled: true
useGOMEMLIMIT: true
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 256Mi
```
This skill helps implement OpenTelemetry (OTel) observability for Kubernetes-based applications, covering Collector configuration, deployment modes, telemetry pipelines for traces/metrics/logs, instrumentation guidance, and troubleshooting. It focuses on practical, production-ready patterns for running the OTel Collector with Kubernetes, Helm, and OTLP endpoints. The goal is repeatable observability: reliable data collection, enrichment, and export to backends like Prometheus, Loki, or Tempo.
The skill explains core Collector components (receivers, processors, exporters, extensions) and shows how to wire pipelines for metrics, logs, and traces. It provides example Helm deployment steps, sample YAML for receivers/processors/exporters, and Kubernetes attribute enrichment to add pod and node metadata. Validation commands and recommended Helm values aid safe rollout and operational checks.
**Which Collector deployment mode should I choose?**
Choose a DaemonSet for full host coverage, a Deployment for a centralized gateway, and a Sidecar for per-pod isolation. Balance coverage against resource cost and operational complexity.
**How do I test that the Collector is receiving OTLP data?**
POST to the OTLP HTTP endpoint (4318) from a curl container, or send a gRPC request to 4317. Also check the Collector pod logs and confirm pod health with `kubectl get pods`.