
linux-service-triage skill


This skill diagnoses Linux service issues using logs, systemd/PM2, permissions, Nginx, and DNS checks, guiding fixes and verification.

npx playbooks add skill openclaw/skills --skill linux-service-triage


---
name: linux-service-triage
description: Diagnoses common Linux service issues using logs, systemd/PM2, file permissions, Nginx reverse proxy checks, and DNS sanity checks. Use when a server app is failing, unreachable, or misconfigured.
---

# Linux & service basics: logs, systemd/PM2, permissions, Nginx reverse proxy, DNS checks

## PURPOSE
Diagnoses common Linux service issues using logs, systemd/PM2, file permissions, Nginx reverse proxy checks, and DNS sanity checks.

## WHEN TO USE
- TRIGGERS:
  - Show me why this service is failing using logs, then give the exact fix commands.
  - Restart this app cleanly and confirm it is listening on the right port.
  - Fix the permissions on this folder so the service can read and write safely.
  - Set up Nginx reverse proxy for this port and verify DNS and TLS are sane.
  - Create a systemd service for this script and make it survive reboots.
- DO NOT USE WHEN…
  - You need kernel debugging or deep performance profiling.
  - You want to exploit systems or bypass access controls.

## INPUTS
- REQUIRED:
  - Service type: systemd unit name or PM2 process name.
  - Observed symptom: error message, status output, or logs (pasted by user).
- OPTIONAL:
  - Nginx config snippet, domain name, expected upstream port.
  - Filesystem paths used by the service.
- EXAMPLES:
  - `systemctl status myapp` output + `journalctl` excerpt
  - Nginx server block + domain + upstream port

## OUTPUTS
- Default: triage report (likely cause, evidence from logs, minimal fix plan).
- If explicitly requested and safe: exact shell commands to apply the fix.
- Success criteria: the service runs, listens on the expected port, and the reverse proxy/DNS path is correct.


## WORKFLOW
1. Confirm scope and safety:
   - identify service name and whether changes are permitted.
2. Gather evidence:
   - status output + recent logs (see `references/triage-commands.md`).
3. Classify failure:
   - config error, dependency missing, permission denied, port conflict, upstream unreachable, DNS mismatch.
4. Propose minimal fix + verification steps.
5. Validate network path (if web service):
   - app listens → Nginx proxies → DNS resolves → (TLS sanity if applicable).
6. Provide restart/reload plan and confirm health checks.
7. STOP AND ASK THE USER if:
   - logs/status output are missing,
   - actions require privileged access not confirmed,
   - TLS/cert management is required but setup is unknown.
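
Step 3 can be approximated with simple pattern matching on the pasted logs. The sketch below is illustrative only: the function name `classify_failure`, the match patterns, and the sample log line are assumptions drawn from common error strings, not a list defined by this skill.

```shell
# Classify a pasted log excerpt (read from stdin) into one of the failure
# categories from step 3. The patterns are illustrative assumptions, not an
# exhaustive list; the first match wins, so order matters.
classify_failure() {
  log=$(cat)
  if   printf '%s\n' "$log" | grep -Eqi 'permission denied|EACCES'; then
    echo "permission denied"
  elif printf '%s\n' "$log" | grep -Eqi 'address already in use|EADDRINUSE'; then
    echo "port conflict"
  elif printf '%s\n' "$log" | grep -Eqi 'connection refused|upstream timed out|no live upstreams'; then
    echo "upstream unreachable"
  elif printf '%s\n' "$log" | grep -Eqi 'command not found|cannot find module|no such file or directory'; then
    echo "dependency missing"
  elif printf '%s\n' "$log" | grep -Eqi 'syntax error|unknown directive|invalid (config|option)'; then
    echo "config error"
  elif printf '%s\n' "$log" | grep -Eqi 'NXDOMAIN|name or service not known|could not resolve'; then
    echo "DNS mismatch"
  else
    echo "unclassified"
  fi
}

# Usage: pipe a log excerpt into the function (made-up sample line).
printf '%s\n' 'myapp[812]: Error: EACCES: permission denied, open /var/app/uploads/a.png' | classify_failure
```

A result of `unclassified` is a cue to fall back to step 7 and ask for more evidence rather than guess.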


## OUTPUT FORMAT
```text
TRIAGE REPORT
- Symptom:
- Evidence (what you provided):
- Most likely cause:
- Fix plan (minimal steps):
- Exact commands (ONLY if user approved changes):
- Verification:
- Rollback:
```


## SAFETY & EDGE CASES
- Read-only by default: diagnose from provided outputs; do not assume you can run commands.
- Avoid destructive changes; require explicit confirmation for anything risky.
- Run `nginx -t` before any Nginx reload, and verify listening ports with `ss` (for example `ss -ltnp`).
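
One way to honor these rules in a command plan is a dry-run wrapper that prints commands unless the user has opted in. This is a minimal sketch; the helper names `run`, `safe_nginx_reload`, `show_listeners` and the `APPLY` variable are hypothetical, not part of the skill.

```shell
# Dry-run by default: print each command instead of running it.
# Set APPLY=1 only after the user has approved the change.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

safe_nginx_reload() {
  # Validate the configuration first; reload only if the test passes.
  run nginx -t && run systemctl reload nginx
}

# List listening TCP sockets with their owning processes (read-only).
show_listeners() {
  run ss -ltnp
}

safe_nginx_reload
show_listeners
```

Chaining with `&&` means a failed `nginx -t` stops the reload, so a broken config never replaces the running one.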


## EXAMPLES
- Input: “journal shows permission denied on /var/app/uploads.”  
  Output: path permission analysis + safe chown/chmod plan + verification.

- Input: “App works locally but domain returns 502.”  
  Output: upstream port checks + nginx error log interpretation + proxy_pass fix plan.
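
For the permission example above, the analysis usually starts by listing mode and ownership along the whole path, since a missing execute bit on any parent directory also produces "permission denied". The sketch below assumes GNU coreutils `stat` (the `-c` format option); the function name `path_perms` is hypothetical, and `namei -l` from util-linux gives a similar view where available.

```shell
# Print mode and ownership for an absolute path and each parent directory,
# deepest first. A service user needs execute (x) on every parent directory
# and write (w) on the target itself to create files there.
# Assumes GNU `stat -c`, as found on most Linux distributions.
path_perms() {
  p=$1
  while :; do
    stat -c '%A %U:%G %n' "$p" || return 1
    if [ "$p" = "/" ] || [ "$p" = "." ]; then
      break
    fi
    p=$(dirname "$p")
  done
}

# Usage with a throwaway directory standing in for the real upload path.
demo=$(mktemp -d)
mkdir -p "$demo/uploads"
chmod 750 "$demo/uploads"
path_perms "$demo/uploads"
rm -rf "$demo"
```

Comparing this listing against the account the service runs as (the unit's `User=` setting, if one is set) shows which single directory needs a change, which keeps the fix minimal.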

Overview

This skill diagnoses common Linux service issues and guides practical fixes for systemd/PM2 services, file permission problems, Nginx reverse proxy misconfigurations, and basic DNS/TLS sanity checks. It produces a concise triage report that points to the likely cause and a minimal, safe fix plan. Use it to restore unreachable or misbehaving server apps with clear verification steps.

How this skill works

I inspect service status outputs and log excerpts to classify failures into categories like configuration errors, missing dependencies, permission denials, port conflicts, or upstream reachability issues. For web services I validate the full path: app listening → Nginx proxy → DNS resolution → TLS sanity, and recommend non-destructive commands to verify and fix each layer. I default to read-only analysis unless you explicitly permit command suggestions.
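
The first layer of that path (is the app listening?) can be checked from pasted `ss` output without touching the server. The sketch below is one possible approach; the function name `port_is_listening` and the sample addresses and ports are made up for illustration.

```shell
# Return success if the `ss` output on stdin shows a TCP listener on the
# given port. The local address is three fields after the LISTEN state,
# which holds with or without a leading Netid column.
port_is_listening() {
  awk -v port="$1" '
    { for (i = 1; i <= NF; i++)
        if ($i == "LISTEN" && $(i + 3) ~ (":" port "$")) found = 1 }
    END { exit !found }'
}

# Sample output (made-up values) standing in for what a user would paste.
sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      511        127.0.0.1:3000      0.0.0.0:*
LISTEN 0      511          0.0.0.0:80        0.0.0.0:*'

if printf '%s\n' "$sample" | port_is_listening 3000; then
  echo "app is listening on 3000"
else
  echo "nothing is listening on 3000"
fi
```

Anchoring the match on `:PORT` at the end of the field avoids treating a listener on 8080 as a match for port 80.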

When to use it

  • A service fails to start or crashes, and journalctl or system logs are available
  • The app runs locally but the domain returns 502/504 or other proxy errors
  • Logs show, or you suspect, file or directory permission-denied errors
  • You need a systemd unit or PM2 restart plan and health verification
  • You want DNS resolution or TLS basic sanity checks for a reverse-proxied app
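
For the systemd case, a unit that restarts on failure and starts at boot is often only a few lines. The sketch below writes one to a scratch directory for review; every name in it (`myapp`, `appuser`, `/opt/myapp/run.sh`, the `write_unit` helper) is hypothetical and would need to match the real service.

```shell
# Write a minimal unit file into the given directory.
# All names (myapp, appuser, /opt/myapp/run.sh) are hypothetical placeholders.
write_unit() {
  dir=$1
  cat > "$dir/myapp.service" <<'EOF'
[Unit]
Description=myapp (example service)
After=network.target

[Service]
User=appuser
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/run.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Write to a scratch directory for review; nothing is installed.
out=$(mktemp -d)
write_unit "$out"
cat "$out/myapp.service"

# After review and approval, installation would typically be:
#   sudo cp "$out/myapp.service" /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now myapp
```

The `[Install]` section with `WantedBy=multi-user.target` is what lets `systemctl enable` start the unit at boot, and `Restart=on-failure` covers crashes between reboots.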

Best practices

  • Provide exact service name, status output, and relevant log excerpts for accurate diagnosis
  • Share Nginx server block, domain, and expected upstream port when web issues occur
  • Prefer non-destructive fixes first; require explicit approval for changes that modify permissions or restart services
  • Run nginx -t and ss -ltnpu to verify configuration and listening ports before reloads
  • Include filesystem paths used by the service to check ownership and writable flags

Example use cases

  • journalctl shows permission denied on /var/www/uploads: produce safe chown/chmod commands and verification steps
  • systemctl status myapp shows failed dependencies: identify the missing package or config error and give a minimal fix plan
  • Nginx returns 502: check the upstream listening port, inspect the nginx error_log, and recommend a proxy_pass correction
  • App runs under PM2 but is not reachable: verify PM2 status, port binding, and system firewall rules
  • Domain resolves to the wrong IP or the certificate does not match: run DNS sanity checks and provide TLS verification guidance
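
The DNS case in the last item reduces to comparing what the domain resolves to with the address the server is expected to have. A minimal sketch, assuming the user pastes `dig +short` output; the function name `dns_points_to` is hypothetical and 203.0.113.10 is a documentation-range placeholder address.

```shell
# Return success if any line of the `dig +short` output on stdin equals the
# expected server IP. Fixed-string, whole-line match, so CNAME lines in the
# output are ignored and 203.0.113.1 does not match 203.0.113.10.
dns_points_to() {
  grep -Fxq "$1"
}

# Placeholder value standing in for pasted `dig +short A <domain>` output.
resolved='203.0.113.10'
if printf '%s\n' "$resolved" | dns_points_to 203.0.113.10; then
  echo "DNS matches expected IP"
else
  echo "DNS mismatch"
fi

# For the TLS side, a read-only check the user could run and paste back
# (replace example.com with the real domain):
#   openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null \
#     | openssl x509 -noout -subject -dates
```

If the comparison fails, the report should show both the resolved and the expected address so the user can tell whether the DNS record or the expectation is wrong.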

FAQ

Can you run fixes on my server directly?

No. I analyze outputs you provide and can suggest exact shell commands, but I will only provide them after you explicitly approve applying changes.

What logs should I paste for a good triage?

Provide systemctl status output, recent journalctl lines for the unit, nginx error_log snippets if web-related, and any app stdout/stderr logs tied to the incident.
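
As a rough guide, the commands that produce those logs can be generated from the service name. In the sketch below the helper `evidence_commands` is hypothetical, and `/var/log/nginx/error.log` is only the common Debian/Ubuntu default, so the real path may differ.

```shell
# Print read-only commands the user can run and paste back.
# $1 = unit or process name, $2 = "systemd" (default) or "pm2".
# The nginx log path is an assumed default, not taken from this skill.
evidence_commands() {
  name=$1
  manager=${2:-systemd}
  if [ "$manager" = "pm2" ]; then
    echo "pm2 describe $name"
    echo "pm2 logs $name --lines 100 --nostream"
  else
    echo "systemctl status $name --no-pager"
    echo "journalctl -u $name -n 100 --no-pager"
  fi
  echo "tail -n 50 /var/log/nginx/error.log"
}

evidence_commands myapp
```

The function only prints commands, which keeps the workflow read-only until the user chooses to run them.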