home / skills / xiaomi / mone / k8s-troubleshoot
This skill helps you diagnose Kubernetes apps by finding pods by label selectors and running diagnostic commands inside containers for quick issue resolution.
npx playbooks add skill xiaomi/mone --skill k8s-troubleshootReview the files below or copy the command above to add this skill to your agents.
---
name: k8s-troubleshoot
description: Kubernetes troubleshooting toolkit - search pods by labels and execute diagnostic commands inside containers. Use when user reports service errors, exceptions, crashes, timeouts, or needs to check logs, processes, network, or resource usage in K8s pods.
---
# Kubernetes Troubleshooting Skill
A complete toolkit for diagnosing Kubernetes applications. Find pods by labels, then execute commands inside containers for deep diagnostics.
## When to Use
- User reports service errors, exceptions, failures, timeouts
- Need to check application logs or process status
- Diagnose network, memory, or disk issues
- Keywords: error, exception, failed, timeout, crash, not working, logs, troubleshoot, diagnose, pod, container
## Workflow
1. **Search pods** - Find target pods by label selector
2. **Execute diagnostics** - Run commands inside containers
## Scripts
### 1. Search Pods
Find pods by label selector:
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/search_pods.py -l "app=nginx" -n default
```
| Parameter | Required | Description |
|-----------|----------|-------------|
| `-l, --label-selector` | Yes | Label selector, e.g., `app=nginx` or `project-id=123,pipeline-id=456` |
| `-n, --namespace` | No | Namespace (default: `default`). Use `all` for all namespaces |
**Output**: JSON with `success`, `podCount`, `pods` (name, namespace, phase, containers)
### 2. Execute Command in Pod
Run diagnostic commands inside a container:
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p "pod-name" -n default -cmd "tail -n 100 /root/logs/app.log"
```
| Parameter | Required | Description |
|-----------|----------|-------------|
| `-p, --pod` | Yes | Pod name |
| `-n, --namespace` | No | Namespace (default: `default`) |
| `-c, --container` | No | Container name (for multi-container pods) |
| `-cmd, --command` | Yes | Command to execute |
**Output**: JSON with `success`, `pod`, `namespace`, `command`, `output`
## Common Diagnostic Patterns
### View application logs
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "tail -n 100 /root/logs/app.log"
```
### Check Nacos config (dubbo3 issues)
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "cat /root/logs/nacos/config.log | grep nacos"
```
### Check processes
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "ps aux | head -20"
```
### Check network
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "netstat -tlnp"
```
### Check disk and memory
```bash
uv run python .claude/skills/k8s-troubleshoot/scripts/exec_pod.py -p my-pod -n default -cmd "df -h && free -m"
```
## Troubleshooting Tips
| Issue | Diagnostic Command |
|-------|-------------------|
| dubbo3 no provider | Check `/root/logs/nacos/config.log` for nacos address |
| Service not responding | Check process status with `ps aux` and logs |
| Connection issues | Check network with `netstat -tlnp` |
| OOM errors | Check memory with `free -m` |This skill is a Kubernetes troubleshooting toolkit that helps you find pods by labels and run diagnostic commands inside container(s). It is designed for rapid diagnosis of service errors, crashes, timeouts, and resource or network problems in Kubernetes clusters. Use it to inspect logs, processes, network sockets, and resource usage without leaving your CLI.
First it searches the cluster for pods matching a label selector and returns structured JSON with pod names, namespaces, phases, and containers. Then it executes user-provided shell commands inside a chosen pod container and returns the command output in JSON. The workflow enables scripted checks (logs, ps, netstat, df/free) and ad-hoc interactive diagnostics.
Can I search across all namespaces?
Yes. Use the namespace parameter set to 'all' to search across every namespace.
What if a pod has multiple containers?
Specify the container name with the container parameter to run commands in the correct container; otherwise the default container will be used.