datadog skill

/modules/programs/claude-code/skills/datadog

This skill helps you debug and triage with the Datadog CLI by searching logs, querying metrics, and tracing distributed requests.

npx playbooks add skill michaelvessia/nixos-config --skill datadog

---
name: datadog
description: |
  Datadog CLI for debugging and triaging. Use this skill when you need to:
  search Datadog logs, query metrics, tail logs in real-time, trace distributed requests,
  investigate errors, compare time periods, find log patterns, check service health,
  or export observability data. Trigger phrases include "search logs", "tail logs",
  "query metrics", "check Datadog", "find errors", "trace request", "compare errors",
  "what services exist", "log patterns", "CPU usage", "service health".
---

# Datadog CLI Reference

A CLI tool for AI agents to debug and triage using Datadog logs and metrics.

`datadog-cli` should be on your `PATH`. If it is not, try running it with
`bunx @ctdio/datadog-cli` or `npx @ctdio/datadog-cli`. If none of these work,
report that to the user.
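
The fallback order above can be sketched as a small shell function. This is a minimal sketch, not part of the CLI: `resolve_datadog` is a hypothetical helper name, and it assumes the installed binary is called either `datadog` (as in the command examples in this file) or `datadog-cli`.

```bash
# Hypothetical helper: print the first available way to launch the CLI.
# Assumption: the installed binary is named `datadog` or `datadog-cli`.
resolve_datadog() {
  for bin in datadog datadog-cli; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "$bin"
      return 0
    fi
  done
  # No local binary: fall back to a package runner, bunx first, then npx.
  for runner in bunx npx; do
    if command -v "$runner" >/dev/null 2>&1; then
      echo "$runner @ctdio/datadog-cli"
      return 0
    fi
  done
  echo "Datadog CLI not found; report this to the user" >&2
  return 1
}
```

The printed launcher may contain a space (for example `npx @ctdio/datadog-cli`), so a caller would store it in a variable and expand that variable unquoted when invoking it.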

## Commands

### Log Search

```bash
datadog logs search --query "<query>" [--from <time>] [--to <time>] [--limit <n>] [--sort <order>]
```

**Examples:**

```bash
datadog logs search --query "status:error" --from 1h
datadog logs search --query "service:api status:error @http.status_code:500" --from 1h
```

### Live Tail (Real-time Streaming)

Stream logs as they arrive. Press Ctrl+C to stop.

```bash
datadog logs tail --query "<query>" [--interval <seconds>]
```

**Examples:**

```bash
datadog logs tail --query "status:error"
datadog logs tail --query "service:api" --interval 5
```
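
Because `logs tail` streams until it is interrupted, a non-interactive caller may want to bound it. A minimal sketch, assuming GNU coreutils `timeout` is installed; `tail_for` and the `DATADOG` variable are hypothetical names introduced here, not part of the CLI:

```bash
# DATADOG holds the launcher; override it if the CLI is run via bunx or npx.
DATADOG="${DATADOG:-datadog}"

# Hypothetical helper: stream matching logs for at most N seconds, then stop.
# timeout sends SIGTERM by default and exits with status 124 on expiry.
tail_for() (
  seconds="$1"
  query="$2"
  # $DATADOG is intentionally unquoted so a multi-word launcher splits into words.
  timeout "$seconds" $DATADOG logs tail --query "$query"
)
```

For example, `tail_for 60 "service:api status:error"` would watch errors for one minute while a bug is being reproduced.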

### Trace Correlation

Find all logs for a distributed trace across services.

```bash
datadog logs trace --id "<trace-id>" [--from <time>] [--to <time>]
```

**Example:**

```bash
datadog logs trace --id "abc123def456" --from 24h
```

### Log Context

Get logs before and after a specific timestamp to understand what happened.

```bash
datadog logs context --timestamp "<iso-timestamp>" [--before <time>] [--after <time>] [--service <svc>]
```

**Examples:**

```bash
datadog logs context --timestamp "2024-01-15T10:30:00Z" --before 5m --after 2m
datadog logs context --timestamp "2024-01-15T10:30:00Z" --service api --before 10m
```

### Error Summary

Quick breakdown of errors by service, type, and message.

```bash
datadog errors [--from <time>] [--to <time>] [--service <svc>]
```

**Examples:**

```bash
datadog errors --from 1h
datadog errors --service payment-api --from 24h
```

### Period Comparison

Compare log counts between the current period and the previous period.

```bash
datadog logs compare --query "<query>" --period <time>
```

**Examples:**

```bash
datadog logs compare --query "status:error" --period 1h
datadog logs compare --query "service:api status:error" --period 6h
```
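
To make the comparison concrete: with `--period 1h`, the current window is presumably the last hour and the previous window the hour before it. The change between the two counts can then be read as a percentage. A sketch of that arithmetic using a hypothetical `pct_change` helper; the CLI's own output format is not documented here and may differ:

```bash
# Hypothetical helper: percent change from a previous count to a current count.
pct_change() {
  awk -v prev="$1" -v cur="$2" 'BEGIN {
    if (prev == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%+.1f%%\n", (cur - prev) / prev * 100
  }'
}

# 40 errors in the previous hour, 100 in the current one.
pct_change 40 100   # prints +150.0%
```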

### Log Patterns

Group similar log messages to find patterns (replaces UUIDs, numbers, etc.).

```bash
datadog logs patterns --query "<query>" [--from <time>] [--limit <n>]
```

**Examples:**

```bash
datadog logs patterns --query "status:error" --from 1h
datadog logs patterns --query "service:api" --from 6h --limit 1000
```
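
The grouping idea can be illustrated with standard Unix tools. The following is only a sketch of the general technique (mask variable tokens, then count identical templates). It is not the CLI's actual implementation, and the placeholder names `<uuid>` and `<num>` are assumptions:

```bash
# Mask UUIDs and numbers so messages that differ only in IDs collapse together.
# UUIDs are masked first so their digits are not rewritten as <num>.
normalize() {
  sed -E \
    -e 's/[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}/<uuid>/g' \
    -e 's/[0-9]+/<num>/g'
}

# Sample messages, made up for illustration.
printf '%s\n' \
  'user 42 not found' \
  'user 977 not found' \
  'request 123e4567-e89b-12d3-a456-426614174000 timed out after 30s' \
  | normalize | sort | uniq -c | sort -rn
```

Here the two `user ... not found` lines collapse into a single template with a count of 2, which is the kind of signal the patterns command is meant to surface.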

### Service Discovery

List all services with recent log activity.

```bash
datadog services [--from <time>] [--to <time>]
```

**Example:**

```bash
datadog services --from 24h
```

### Log Aggregation

```bash
datadog logs agg --query "<query>" --facet <facet> [--from <time>]
```

**Common facets:** `status`, `service`, `host`, `@http.status_code`,
`@error.kind`

**Examples:**

```bash
datadog logs agg --query "*" --facet status --from 1h
datadog logs agg --query "status:error" --facet service --from 24h
```

### Multiple Queries

Run multiple queries in parallel.

```bash
datadog logs multi --queries "name1:query1,name2:query2" [--from <time>]
```

**Example:**

```bash
datadog logs multi --queries "errors:status:error,warnings:status:warn" --from 1h
```
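
When the named queries are built up in a script, the `--queries` value can be assembled from separate `name:query` pairs. A small sketch; `join_queries` is a hypothetical helper, and it assumes (based on the example above) that the name ends at the first colon. How the CLI treats a comma inside a query is not documented here.

```bash
# Hypothetical helper: join "name:query" pairs with commas for --queries.
# The function body runs in a subshell, so the IFS change does not leak out.
join_queries() (
  IFS=,
  printf '%s\n' "$*"
)

join_queries "errors:status:error" "warnings:status:warn"
# prints errors:status:error,warnings:status:warn
```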

### Metrics Query

```bash
datadog metrics query --query "<metrics-query>" [--from <time>] [--to <time>]
```

**Query format:** `<aggregation>:<metric>{<tags>}`

**Examples:**

```bash
datadog metrics query --query "avg:system.cpu.user{*}" --from 1h
datadog metrics query --query "avg:system.cpu.user{service:api}" --from 1h
datadog metrics query --query "sum:trace.http.request.errors{service:api}.as_count()" --from 1h
```

## Global Flags

| Flag              | Description                         |
| ----------------- | ----------------------------------- |
| `--pretty`        | Human-readable output with colors   |
| `--output <file>` | Export results to JSON file         |
| `--site <site>`   | Datadog site (e.g., `datadoghq.eu`) |

## Time Formats

- Relative: `30m`, `1h`, `6h`, `24h`, `7d`
- ISO 8601: `2024-01-15T10:30:00Z`
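
Both formats are easy to produce or check from a script. A minimal sketch with two hypothetical helpers: `now_iso` prints the current UTC time in the ISO 8601 shape shown above, and `is_relative_time` accepts only the relative shapes listed here (digits followed by `m`, `h`, or `d`). Whether the CLI accepts other units is not assumed.

```bash
# Current time in UTC as an ISO 8601 timestamp, e.g. for --timestamp or --to.
now_iso() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Succeed only for values like 30m, 6h, 7d: one or more digits plus m, h, or d.
is_relative_time() {
  case "$1" in
    *[!0-9]*[mhd] | [mhd] | "") return 1 ;;  # stray characters, or no digits
    *[mhd]) return 0 ;;                      # digits followed by one unit
    *) return 1 ;;                           # missing or unknown unit
  esac
}
```

A caller could guard a command with `is_relative_time "$window" || window=1h` before passing the value to `--from`.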

## Common Workflows

### Incident Triage

```bash
# 1. Quick error overview
datadog errors --from 1h

# 2. Is this new? Compare to previous period
datadog logs compare --query "status:error" --period 1h

# 3. What patterns are we seeing?
datadog logs patterns --query "status:error" --from 1h

# 4. Narrow down by service
datadog logs search --query "status:error service:payment-api" --from 1h

# 5. Get context around a specific timestamp
datadog logs context --timestamp "2024-01-15T10:30:00Z" --service api --before 5m --after 2m

# 6. Follow the distributed trace
datadog logs trace --id "TRACE_ID"
```
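
The first steps of this workflow can be wrapped in one function when the same triage is repeated for different services. A sketch only: `triage` and the `DATADOG` variable are hypothetical names, and the wrapper uses just the subcommands and flags documented above.

```bash
# DATADOG holds the launcher; override it if the CLI is run via bunx or npx.
DATADOG="${DATADOG:-datadog}"

# Hypothetical wrapper: run the overview steps for one service and time window.
triage() (
  service="$1"
  window="${2:-1h}"                       # default to the last hour
  query="service:$service status:error"

  $DATADOG errors --service "$service" --from "$window"
  $DATADOG logs compare --query "$query" --period "$window"
  $DATADOG logs patterns --query "$query" --from "$window"
  $DATADOG logs search --query "$query" --from "$window" --limit 20
)
```

Calling `triage payment-api 6h` would run the four commands in order. The context and trace steps are left out because they need a timestamp or trace ID taken from the earlier output.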

### Real-time Debugging

```bash
# Stream errors as they happen
datadog logs tail --query "status:error"

# Watch specific service
datadog logs tail --query "service:api status:error"
```

### Service Health Check

```bash
# List services
datadog services --from 24h

# Check error distribution
datadog logs agg --query "service:api" --facet status --from 1h

# Check CPU usage
datadog metrics query --query "avg:system.cpu.user{service:api}" --from 1h
```

### Export for Sharing

```bash
# Save search results
datadog logs search --query "status:error" --from 1h --output errors.json

# Save error summary
datadog errors --from 24h --output error-report.json
```

## Datadog Query Syntax

| Operator  | Example                       | Description                             |
| --------- | ----------------------------- | --------------------------------------- |
| `AND`     | `service:api status:error`    | Both conditions (space is implicit AND) |
| `OR`      | `status:error OR status:warn` | Either condition                        |
| `-`       | `-status:info`                | Exclude                                 |
| `*`       | `service:api-*`               | Wildcard                                |
| `>=` `<=` | `@http.status_code:>=400`     | Numeric comparison                      |
| `[TO]`    | `@duration:[1000 TO 5000]`    | Range                                   |

### Common Attributes

- `service` - Service name
- `status` - Log level (error, warn, info, debug)
- `host` - Hostname
- `@http.status_code` - HTTP status code
- `@error.kind` - Error type
- `@trace_id` / `@dd.trace_id` - Trace ID
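
Queries that contain `>=`, `*`, or spaces must be quoted so the shell does not treat them as redirection, glob patterns, or separate arguments. A small sketch that builds such a query from parts; `error_query` is a hypothetical helper, not part of the CLI:

```bash
# Hypothetical helper: build a query for error logs of one service with an
# HTTP status code at or above a threshold.
error_query() {
  printf 'service:%s status:error @http.status_code:>=%s\n' "$1" "$2"
}

error_query api 500
# prints service:api status:error @http.status_code:>=500
```

The result would be passed as one quoted argument, for example `--query "$(error_query api 500)"`.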

Overview

This skill provides a Datadog CLI interface for debugging and triaging observability data from logs, metrics, and traces. It helps you search and stream logs, correlate traces, summarize errors, compare time periods, and export results for sharing or analysis. Use it when you need fast, command-line access to Datadog insights during incidents or investigations.

How this skill works

The skill runs the datadog-cli binary (or falls back to bunx/npx wrappers) to execute prebuilt commands that query logs, tail live streams, aggregate patterns, query metrics, and fetch trace-correlated logs. Commands accept relative or ISO timestamps, support facets and filters, and can output pretty or JSON-formatted results for downstream use. Global flags allow colored output, file export, and selecting the Datadog site.

When to use it

- Triage an incident and get a quick error breakdown by service or error kind
- Search logs or tail errors in real time to reproduce and observe failures
- Correlate logs with a trace ID to follow distributed requests across services
- Compare current error rates to the previous period to detect regressions
- Find recurring log patterns by normalizing variable parts like UUIDs and numbers
- Query metrics (CPU, request errors) to check service health or resource usage

Best practices

- Start with `datadog errors` for a fast overview, then narrow with `datadog logs search` or `datadog logs patterns`
- Use relative times (`30m`, `1h`, `24h`) during incident work for quick iteration
- Tail logs with an appropriate `--interval` to avoid noisy output for high-traffic services
- Export results to JSON for sharing with teammates or attaching to tickets
- Filter by service, host, or specific attributes (`@http.status_code`, `@error.kind`) to reduce noise
- Run multiple queries in parallel when comparing different error types or services

Example use cases

- Run `datadog errors --from 1h`, then `datadog logs compare --query "status:error" --period 1h` to see if errors are new
- Stream recent errors with `datadog logs tail --query "service:api status:error"` while reproducing a bug
- Fetch all logs for a trace ID with `datadog logs trace --id "TRACE_ID" --from 24h` to inspect cross-service hops
- Aggregate the status distribution with `datadog logs agg --query "service:api" --facet status --from 1h` to check service health
- Find common log templates with `datadog logs patterns --query "status:error" --from 6h --limit 1000` to help identify root causes

FAQ

What if datadog-cli is not in my PATH?

Try running it with `bunx @ctdio/datadog-cli` or `npx @ctdio/datadog-cli`. If those fail, report the missing binary so it can be installed or added to `PATH`.

How do I export a search for sharing?

Add `--output <file>` to most commands (for example `datadog logs search ... --output errors.json`) to save JSON results you can attach or analyze offline.