
This skill guides implementing OpenTelemetry in Kubernetes, covering collectors, pipelines, instrumentation, and troubleshooting to improve observability.

npx playbooks add skill julianobarbosa/claude-code-skills --skill opentelemetry-skill

---
name: opentelemetry
description: Implement OpenTelemetry (OTEL) observability - Collector configuration, Kubernetes deployment, traces/metrics/logs pipelines, instrumentation, and troubleshooting. Use when working with OTEL Collector, telemetry pipelines, observability infrastructure, or Kubernetes monitoring.
---

# OpenTelemetry Implementation Guide

## Overview

OpenTelemetry (OTel) is a vendor-neutral observability framework for instrumenting, generating, collecting, and exporting telemetry data (traces, metrics, and logs). This skill provides guidance for implementing OTel in Kubernetes environments.

## Quick Start

### Deploy OTEL Collector on Kubernetes

```bash
# Add Helm repo
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Install with basic config
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring --create-namespace \
  --set mode=daemonset
```

### Send Test Data via OTLP

```bash
# gRPC endpoint: 4317, HTTP endpoint: 4318
curl -X POST http://otel-collector:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[]}'
```
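The empty `resourceSpans` array above only verifies connectivity. To exercise the full ingest path, you can POST a minimally populated OTLP/JSON payload. The Python sketch below builds one with only the standard library; the service name and span name are illustrative, not required values:

```python
import json
import os
import time

def make_otlp_trace_payload(service_name: str) -> str:
    """Build a minimal OTLP/JSON trace payload containing one span."""
    now_ns = time.time_ns()
    payload = {
        "resourceSpans": [{
            "resource": {
                "attributes": [{
                    "key": "service.name",
                    "value": {"stringValue": service_name},
                }]
            },
            "scopeSpans": [{
                "spans": [{
                    # OTLP/JSON hex-encodes IDs: 16 bytes for the trace,
                    # 8 bytes for the span.
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": "test-span",
                    "kind": 2,  # SPAN_KIND_SERVER
                    # uint64 nanosecond timestamps are JSON-encoded as strings.
                    "startTimeUnixNano": str(now_ns),
                    "endTimeUnixNano": str(now_ns + 1_000_000),
                }]
            }],
        }]
    }
    return json.dumps(payload)

body = make_otlp_trace_payload("demo-service")
# POST this body to http://otel-collector:4318/v1/traces with
# Content-Type: application/json (e.g. curl -d "$body").
```

A span sent this way should appear in whatever exporter the traces pipeline is wired to (the debug exporter is the quickest way to confirm receipt).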

## Core Concepts

**Signals**: Three types of telemetry data:

- **Traces**: Distributed request flows across services
- **Metrics**: Numerical measurements (counters, gauges, histograms)
- **Logs**: Event records with structured/unstructured data

**Collector Components**:

- **Receivers**: Accept data (OTLP, Prometheus, Jaeger, Zipkin)
- **Processors**: Transform data (batch, memory_limiter, k8sattributes)
- **Exporters**: Send data (prometheusremotewrite, loki, otlp)
- **Extensions**: Add capabilities (health_check, pprof, zpages)
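Extensions are declared at the top level of the config but only take effect when listed under `service.extensions`. A minimal sketch wiring up the health check endpoint (13133 is the extension's default port):

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions: [health_check]
```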

## Collector Configuration

### Basic Pipeline Structure

```yaml
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318

  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 25

  exporters:
    prometheusremotewrite:
      endpoint: "http://prometheus:9090/api/v1/write"
    loki:
      endpoint: "http://loki:3100/loki/api/v1/push"
    otlp/tempo:
      endpoint: "tempo:4317"
      tls:
        insecure: true

  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [prometheusremotewrite]
      logs:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [loki]
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [otlp/tempo]
```

### Kubernetes Attributes Enrichment

```yaml
processors:
  k8sattributes:
    auth_type: "serviceAccount"
    passthrough: false
    filter:
      # node_from_env_var takes the name of an environment variable,
      # not an interpolated value
      node_from_env_var: K8S_NODE_NAME
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.node.name
```
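The node filter above depends on an environment variable being present in the Collector pod. One way to provide it (the variable name `K8S_NODE_NAME` is an assumption matching the filter above) is the Kubernetes Downward API, which the Collector Helm chart exposes through `extraEnvs`:

```yaml
extraEnvs:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```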

## Deployment Modes

| Mode | Use Case | Pros | Cons |
|------|----------|------|------|
| DaemonSet | Node-level collection | Full coverage, host metrics | Higher resource usage |
| Deployment | Centralized gateway | Scalable, easier management | No node-local context; needs replicas for HA |
| Sidecar | Per-pod collection | Isolated, fine-grained | Resource overhead per pod |

## Common Patterns

### Development Environment

- Enable debug exporter for visibility
- Lower resource limits (250m CPU, 512Mi memory)
- Include spot instance tolerations for cost savings
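For the debug exporter mentioned above, a minimal sketch; wire it into whichever pipeline you are inspecting (the traces pipeline is shown as an example):

```yaml
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```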

### Production Environment

- Implement sampling (10-50% for traces)
- Higher batch sizes (2048-4096)
- Enable autoscaling and PodDisruptionBudget
- Use TLS for all endpoints
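Head sampling in the 10-50% range can be implemented with the `probabilistic_sampler` processor; a sketch at 10% (the `otlp/tempo` exporter name is an assumption, and `tail_sampling` is the alternative when decisions must account for latency or errors):

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, probabilistic_sampler, batch]
      exporters: [otlp/tempo]
```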

## Detailed References

For in-depth guidance, see:

- **Collector Configuration**: [COLLECTOR.md](references/COLLECTOR.md)
- **Kubernetes Deployment**: [KUBERNETES.md](references/KUBERNETES.md)
- **Troubleshooting**: [TROUBLESHOOTING.md](references/TROUBLESHOOTING.md)
- **Instrumentation**: [INSTRUMENTATION.md](references/INSTRUMENTATION.md)

## Validation Commands

```bash
# Check collector pods
kubectl get pods -n monitoring -l app.kubernetes.io/name=otel-collector

# View collector logs
kubectl logs -n monitoring -l app.kubernetes.io/name=otel-collector --tail=100

# Test OTLP endpoint reachability (an HTTP 405 response still confirms the
# endpoint is up, since /v1/traces only accepts POST)
kubectl run test-otlp --image=curlimages/curl:latest --restart=Never --rm -it -- \
  curl -v http://otel-collector.monitoring:4318/v1/traces

# Validate config syntax
otelcol validate --config=config.yaml
```

## Key Helm Chart Values

```yaml
mode: "daemonset"  # or "deployment"
presets:
  logsCollection:
    enabled: true
  hostMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
  kubeletMetrics:
    enabled: true
useGOMEMLIMIT: true
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 256Mi
```

Overview

This skill helps implement OpenTelemetry (OTel) observability for Kubernetes-based applications, covering Collector configuration, deployment modes, telemetry pipelines for traces/metrics/logs, instrumentation guidance, and troubleshooting. It focuses on practical, production-ready patterns for running the OTel Collector with Kubernetes, Helm, and OTLP endpoints. The goal is repeatable observability: reliable data collection, enrichment, and export to backends like Prometheus, Loki, or Tempo.

How this skill works

The skill explains core Collector components (receivers, processors, exporters, extensions) and shows how to wire pipelines for metrics, logs, and traces. It provides example Helm deployment steps, sample YAML for receivers/processors/exporters, and Kubernetes attribute enrichment to add pod and node metadata. Validation commands and recommended Helm values aid safe rollout and operational checks.

When to use it

  • Deploying OTEL Collector in Kubernetes to centralize telemetry collection
  • Creating or tuning traces/metrics/logs pipelines and exporter destinations
  • Instrumenting services with OTLP or troubleshooting missing telemetry
  • Choosing a deployment mode (DaemonSet, Deployment, Sidecar) for coverage and resource trade-offs
  • Hardening production observability with sampling, batching, and TLS

Best practices

  • Use k8sattributes processor to enrich telemetry with pod, namespace, deployment, and node metadata
  • Start with a DaemonSet for full host coverage; use Deployment for central gateway patterns and Sidecar for fine-grained isolation
  • Enable batching and memory_limiter processors to protect the Collector under load
  • Apply sampling for traces in high-volume environments (10–50%) and increase batch sizes for production
  • Validate configs with otelcol validate and watch Collector logs after rollout

Example use cases

  • Run a node-level Collector DaemonSet to collect host and container metrics and forward to Prometheus Remote Write
  • Deploy a centralized Collector Deployment that routes traces to Tempo and logs to Loki with k8s metadata enrichment
  • Attach sidecar Collectors for latency-sensitive services to reduce network hops and capture per-pod context
  • Lower resource limits and enable a debug exporter for faster feedback during development
  • Implement autoscaling, PDBs, and TLS for a production-ready Collector fleet

FAQ

Which Collector deployment mode should I choose?

Choose DaemonSet for full host coverage, Deployment for a centralized gateway, and Sidecar for per-pod isolation. Balance coverage against resource cost and operational complexity.

How do I test that the Collector is receiving OTLP data?

Use a curl container to POST to the OTLP HTTP endpoint (4318) or send a gRPC request to 4317. Also check the Collector pod logs and verify pod health with kubectl get pods.