
docker-diag skill

/skills/mkrdiop/docker-diag

This skill analyzes Docker container logs, extracts errors, and reports root causes with actionable fixes.

npx playbooks add skill openclaw/skills --skill docker-diag

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
---
name: Docker Pro Diagnostic
description: Advanced log analysis for Docker containers using signal extraction.
bins: ["python3", "docker"]
---

# Docker Pro Diagnostic

When a user asks "Why is my container failing?" or "Analyze the logs for [container]", follow these steps:

1.  **Run Extraction:** Call `python3 {{skillDir}}/log_processor.py <container_name>`.
2.  **Analyze:** Feed the output (which contains errors and context) into your reasoning engine.
3.  **Report:** Summarize the root cause. If it looks like a code error, suggest a fix. If it looks like a resource error (OOM), suggest increasing Docker memory limits.

## Example Command
`python3 log_processor.py api_gateway_prod`

Overview

This skill performs advanced diagnostic analysis of Docker container failures by extracting signals from container logs and summarizing root causes. It automates log extraction, highlights errors and contextual traces, and recommends targeted fixes for code issues and resource constraints. The goal is fast, actionable insight that reduces downtime and speeds remediation.

How this skill works

Run the included log extractor against a target container to produce a structured output of errors, stack traces, and contextual lines. Feed that output into the reasoning engine, which correlates patterns (exceptions, OOMs, connection errors) and prioritizes likely root causes. The skill then generates a concise report with probable causes and concrete remediation steps such as code fixes, configuration changes, or resource limit adjustments.
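To make the extraction step concrete, here is a minimal sketch of what a processor along the lines of log_processor.py might do. This is not the script that ships with the skill; the pattern list, context width, and function names are assumptions:

```python
import re
import subprocess

# Illustrative error signals; the shipped extractor's patterns may differ.
ERROR_PATTERNS = re.compile(
    r"ERROR|FATAL|Exception|Traceback|OOMKilled|Out of memory|Connection refused",
    re.IGNORECASE,
)

def filter_signals(lines, context=3):
    """Keep lines matching an error pattern, plus `context` lines around each."""
    keep = set()
    for i, line in enumerate(lines):
        if ERROR_PATTERNS.search(line):
            keep.update(range(max(0, i - context),
                              min(len(lines), i + context + 1)))
    return [lines[i] for i in sorted(keep)]

def extract_signals(container, tail=500):
    """Fetch recent container logs via the Docker CLI and filter them."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), container],
        capture_output=True, text=True, check=True,
    )
    # Containers often write errors to stderr, so keep both streams.
    return filter_signals((result.stdout + result.stderr).splitlines())

# Example (hypothetical container name):
#   extract_signals("api_gateway_prod")
```

Separating the pure filtering step from the docker call keeps the pattern matching easy to test without a running container.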

When to use it

  • When a container repeatedly crashes or restarts and you need a clear cause.
  • After receiving an error notification but before rolling back or redeploying.
  • When logs are noisy and you want the most relevant error signals isolated.
  • To determine if failures are code bugs, dependency issues, or resource limits.
  • During post-incident analysis to produce a short remedial action list.

Best practices

  • Run the extractor against the exact container name or task identifier to capture correct logs.
  • Collect recent logs around the crash window to ensure context is included.
  • Compare before/after resource limit changes when investigating OOMs.
  • Pair the report with the application source or stack trace to validate suggested code fixes.
  • Keep the log extractor updated to recognize new error patterns used by your stack.
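For the crash-window practice above, docker logs accepts a --since duration that limits capture to recent output. A small helper along these lines could narrow the collection window (the container name and window values are placeholders):

```python
import subprocess

def build_logs_cmd(container, window="15m"):
    """Build a docker logs invocation limited to the last `window` of output."""
    # --since accepts relative durations like "15m" or "2h", or a timestamp.
    return ["docker", "logs", "--since", window, container]

def logs_since(container, window="15m"):
    """Fetch only the logs emitted within the crash window."""
    result = subprocess.run(
        build_logs_cmd(container, window),
        capture_output=True, text=True, check=True,
    )
    return result.stdout + result.stderr

# Example (hypothetical container name):
#   text = logs_since("api_gateway_prod", "30m")
```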

Example use cases

  • Diagnosing frequent container restarts for an API service after a deploy.
  • Identifying whether an exception in logs is due to a bug or a missing dependency.
  • Confirming that a pod/container OOM needs higher memory limits rather than a code leak.
  • Triage of multiple failing containers to prioritize the most impactful fixes.
  • Generating a short action list for on-call engineers after an incident.

FAQ

What command runs the extractor?

Run the Python extractor with the container name: python3 {{skillDir}}/log_processor.py <container_name> (or python3 log_processor.py <container_name> from inside the skill directory).

Can it tell the difference between code bugs and resource issues?

Yes. The analysis correlates error types and signals (exceptions vs OOM traces) and recommends fixes like code changes or increasing Docker memory limits.
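As a sketch of how such a distinction can be made (the keyword lists and labels here are illustrative assumptions, not the skill's actual heuristics):

```python
def classify_failure(log_text):
    """Roughly label extracted log text as a resource, dependency, or code issue."""
    text = log_text.lower()
    if "oomkilled" in text or "out of memory" in text:
        return "resource"    # suggest raising the container's memory limit
    if "modulenotfounderror" in text or "connection refused" in text:
        return "dependency"  # missing package or unreachable upstream service
    if "traceback" in text or "exception" in text:
        return "code"        # point the user at the stack trace
    return "unknown"         # fall back to manual log inspection
```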