
devops-infra skill


This skill guides an agent through managing Docker containers and volumes, database and volume backups, and Caddy and Docker network configuration, and it outlines options for scaling beyond a single instance.

npx playbooks add skill shaul1991/shaul-agents-plugin --skill devops-infra


SKILL.md
---
name: devops-infra
description: DevOps Infrastructure Agent. Handles infrastructure management, scaling, backups, and network configuration. Use for requests involving infrastructure, scaling (scale), backups (backup), or networking.
allowed-tools: Bash(docker:*), Bash(systemctl:*), Bash(cat:*), Bash(curl:*), Read, Write, Edit, Grep
---

# DevOps Infrastructure Agent

## Role
Manages infrastructure and operates its resources.

## Responsibilities

### 1. Resource management

#### Docker resources
```bash
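# Show Docker disk usage
docker system df

# Remove unused resources (stopped containers, unused networks, dangling images, build cache)
docker system prune -f

# Remove old images, keeping the 5 newest tags
# (docker images lists newest first; "latest" and untagged images are skipped)
docker images nest-api --format "{{.Tag}}" | \
  grep -v -e latest -e '<none>' | tail -n +6 | \
  xargs -r -I {} docker rmi nest-api:{}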
```
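
The pipeline above combines tag filtering with deletion. As a minimal sketch, the filtering step can be pulled into a small function so it can be checked on sample input before any image is removed. The function name `tags_to_remove` is hypothetical, and the sketch assumes tags arrive newest first, which is how `docker images` orders its output by default.

```bash
# Sketch: print the tags that fall outside the newest KEEP entries.
# Assumption: stdin holds one tag per line, newest first.
# "latest" and untagged ("<none>") entries are never selected.
tags_to_remove() {
  local keep="${1:-5}"
  grep -v -e '^latest$' -e '^<none>$' | tail -n +"$((keep + 1))"
}

# Hypothetical usage with the image name from the commands above:
#   docker images nest-api --format '{{.Tag}}' | tags_to_remove 5 |
#     xargs -r -I {} docker rmi "nest-api:{}"
```

Running the function without the final `xargs` stage gives a preview of what would be deleted.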

#### Volume management
```bash
# List volumes
docker volume ls --filter "name=nest-api"

# Inspect a volume (replace [dev|prod] with the target environment)
docker volume inspect nest-api-[dev|prod]_postgres_data
```

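### 2. Scaling

The deployment currently runs as a single instance. If scaling becomes necessary:
1. Evaluate adopting Docker Swarm or Kubernetes
2. Set up a load balancer
3. Configure database replication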

### 3. Backups

#### Database backup
```bash
# Back up PostgreSQL to a dated file (replace [env] with dev or prod)
docker exec nest-api-postgres-[env] \
  pg_dump -U nest_api nest_api > "backup_$(date +%Y%m%d).sql"

# Restore (backup.sql stands for the dump file to load)
cat backup.sql | docker exec -i nest-api-postgres-[env] \
  psql -U nest_api nest_api
```
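
The dump command above writes one dated file per run, so old dumps accumulate over time. A minimal sketch of a retention step follows, assuming the `backup_YYYYMMDD.sql` naming from the command above. The function name `prune_backups` and the default of keeping 7 files are illustrative assumptions, not part of the original setup.

```bash
# Sketch: delete all but the newest KEEP dumps in a directory.
# Assumption: files are named backup_YYYYMMDD.sql, so a reverse
# lexical sort lists the newest dump first.
prune_backups() {
  local dir="$1" keep="${2:-7}"
  find "$dir" -maxdepth 1 -type f -name 'backup_*.sql' | sort -r |
    tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
      rm -f -- "$f"
    done
}

# Hypothetical usage: keep the 7 newest dumps in the current directory.
#   prune_backups . 7
```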

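#### Volume backup
```bash
# Archive the volume contents into the current directory.
# Stop the database container first: a file-level copy of a running
# PostgreSQL data directory may be inconsistent.
docker run --rm \
  -v nest-api-[env]_postgres_data:/data:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/postgres_backup.tar.gz /data
```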
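
A backup is only useful if the archive can be read back. The sketch below checks that an archive such as the `postgres_backup.tar.gz` produced above is non-empty and that `tar` can list it end to end. `verify_archive` is a hypothetical helper name, and listing an archive is a weaker check than a full test restore.

```bash
# Sketch: succeed only if the archive exists, is non-empty,
# and tar can read every entry in it.
verify_archive() {
  local archive="$1"
  [ -s "$archive" ] && tar tzf "$archive" > /dev/null 2>&1
}

# Hypothetical usage:
#   verify_archive postgres_backup.tar.gz || echo "backup is unreadable" >&2
```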

### 4. Network management

#### Caddy configuration
```bash
# Show the configuration file
cat /etc/caddy/Caddyfile

# Validate the configuration
caddy validate --config /etc/caddy/Caddyfile

# Reload Caddy to apply the changes
systemctl reload caddy
```
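
Reloading with an invalid Caddyfile can be avoided by chaining the validate and reload commands above. A minimal sketch, assuming `caddy` and `systemctl` are on the PATH; `reload_caddy` is a hypothetical wrapper name:

```bash
# Sketch: reload Caddy only when the configuration validates.
# The default path matches the Caddyfile location used above.
reload_caddy() {
  local config="${1:-/etc/caddy/Caddyfile}"
  if caddy validate --config "$config"; then
    systemctl reload caddy
  else
    echo "Caddyfile validation failed; reload skipped" >&2
    return 1
  fi
}

# Hypothetical usage:
#   reload_caddy /etc/caddy/Caddyfile
```

The function returns non-zero when validation fails, so it can be used as a gate in a deployment script.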

#### Docker networks
```bash
# List networks
docker network ls --filter "name=nest-api"

# Inspect a network (replace [dev|prod] with the target environment)
docker network inspect nest-api-[dev|prod]
```

## Infrastructure diagram

```
┌─────────────────────────────────────────┐
│              Host Server                │
├─────────────────────────────────────────┤
│  ┌─────────────┐    ┌─────────────┐     │
│  │   Caddy     │    │   Docker    │     │
│  │  (Proxy)    │    │  (Engine)   │     │
│  └──────┬──────┘    └──────┬──────┘     │
│         │                  │            │
│  ┌──────▼──────────────────▼──────┐     │
│  │        Docker Networks         │     │
│  │  ┌──────────┐  ┌──────────┐    │     │
│  │  │ nest-api │  │ nest-api │    │     │
│  │  │   -dev   │  │  -prod   │    │     │
│  │  └────┬─────┘  └────┬─────┘    │     │
│  │       │             │          │     │
│  │  ┌────▼────┐   ┌────▼────┐     │     │
│  │  │ Volumes │   │ Volumes │     │     │
│  │  │  (Dev)  │   │ (Prod)  │     │     │
│  │  └─────────┘   └─────────┘     │     │
│  └────────────────────────────────┘     │
└─────────────────────────────────────────┘
```

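## Maintenance

### Routine tasks
| Frequency | Task | Command |
|-----------|------|---------|
| Daily | Check logs | `docker logs --since 24h <container>` |
| Weekly | Clean up disk | `docker system prune -f` |
| Monthly | Back up DB | `pg_dump` |

### Emergency procedures
1. Service outage: health check → rollback → root-cause analysis
2. Low disk space: clean up images/logs → expand volume
3. Network failure: restart Caddy → check DNS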
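
For the low-disk procedure, it can help to measure usage before and after cleanup. The sketch below reads the usage percentage from `df -P`. The helper names and the 90% default threshold are assumptions chosen for illustration.

```bash
# Sketch: print the used-space percentage (digits only) for the
# filesystem that holds the given path.
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Succeed when usage is at or above the threshold percentage.
disk_is_full() {
  local path="${1:-/}" threshold="${2:-90}"
  [ "$(disk_used_pct "$path")" -ge "$threshold" ]
}

# Hypothetical usage: prune only when the root filesystem is nearly full.
#   disk_is_full / 90 && docker system prune -f
```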

Overview

This skill is a DevOps Infrastructure Agent that handles infrastructure management, scaling, backups, and network configuration for containerized applications. It provides commands, procedures, and runbook-style guidance for Docker resources, volumes, backups, and proxy/network management. The goal is reliable runtime operations, repeatable maintenance, and clear escalation steps for incidents.

How this skill works

The agent inspects and manages Docker resources (images, containers, volumes, networks) and the host-level Caddy proxy configuration. It handles routine maintenance such as pruning unused resources, backing up databases and volumes, and validating and reloading the proxy configuration. It also outlines options for scaling through orchestration (Docker Swarm or Kubernetes) and load balancers, and it defines a regular maintenance cadence and emergency procedures for outages and resource exhaustion.

When to use it

  • When disk usage or unused Docker resources need cleanup and reclamation
  • When you need reliable database and volume backup or restore procedures
  • When planning to scale a single-instance deployment to multiple nodes
  • When validating, reloading, or troubleshooting the Caddy proxy configuration
  • When responding to service outages, disk full events, or network failures

Best practices

  • Schedule daily log checks, weekly resource pruning, and monthly DB backups
  • Keep a small number of recent container images and prune older tags regularly
  • Store backups off-host and verify restore procedures periodically
  • Use network isolation per environment (dev/prod) and inspect Docker networks regularly
  • Adopt orchestration (Kubernetes/Swarm) and a load balancer before scaling traffic

Example use cases

  • Free up disk space by pruning unused images and removing stale container tags
  • Create a PostgreSQL dump with a dated filename on a regular schedule and rotate older backups
  • Back up a Docker volume by running a temporary container that archives the volume contents
  • Validate and reload Caddy configuration after TLS or routing updates
  • Plan migration from single-host Docker to Kubernetes with load balancer and DB replication

FAQ

How do I back up the PostgreSQL database running in a container?

Run a pg_dump from inside the Postgres container to a dated SQL file on the host, then copy or transfer that file off-host for safekeeping.

What is the quick way to reclaim disk space used by Docker?

Run docker system prune -f, then remove older image tags while keeping the most recent few. That command leaves volumes in place, so list them with docker volume ls and delete one only after confirming that nothing still needs its data.