home / skills / oimiragieo / agent-studio / kubernetes-flux

kubernetes-flux skill

/.claude/skills/kubernetes-flux

This skill helps you manage and troubleshoot Kubernetes clusters with kubectl, enabling pod, deployment, service inspection, logs, events, and rollout control.

npx playbooks add skill oimiragieo/agent-studio --skill kubernetes-flux

Review the files below or copy the command above to add this skill to your agents.

Files (6)
SKILL.md
7.1 KB
---
name: kubernetes-flux
description: Kubernetes cluster management and troubleshooting. Query pods, deployments, services, logs, and events. Supports context switching, scaling, and rollout management. Use for Kubernetes debugging, monitoring, and operations.
version: 1.0
model: sonnet
invoked_by: both
user_invocable: true
tools: [Read, Write, Bash]
best_practices:
  - Verify kubectl is configured before operations
  - Use namespace flags for clarity
  - Check current context before cluster operations
  - Avoid destructive operations without confirmation
  - Mask secrets in output
error_handling: graceful
streaming: supported
safety_level: high
---

# Kubernetes Flux Skill

## Overview

This skill provides comprehensive Kubernetes cluster management through kubectl, enabling AI agents to inspect, troubleshoot, and manage Kubernetes resources.

## When to Use

- Debugging application pods and containers
- Monitoring deployment rollouts and status
- Analyzing service networking and endpoints
- Investigating cluster events and errors
- Troubleshooting performance issues
- Managing application scaling
- Port forwarding for local development

## Requirements

- kubectl installed and configured
- Valid KUBECONFIG file or default context
- Cluster access credentials
- Appropriate RBAC permissions

## Quick Reference

```bash
# Get pods in current namespace
kubectl get pods

# Get pods in specific namespace
kubectl get pods -n production

# Get pods with labels
kubectl get pods -l app=web -n production

# Describe a pod
kubectl describe pod my-app-123 -n default

# Get pod logs
kubectl logs my-app-123 -n default

# Get logs with tail
kubectl logs my-app-123 -n default --tail=100

# Get logs since time
kubectl logs my-app-123 -n default --since=1h

# List recent events
kubectl get events -n default --sort-by='.lastTimestamp' | tail -20

# Watch events in real-time
kubectl get events -n default -w
```

## Resource Discovery

### Pods

```bash
# List all pods
kubectl get pods -n <namespace>

# List pods with wide output
kubectl get pods -n <namespace> -o wide

# List pods across all namespaces
kubectl get pods -A

# Filter by label
kubectl get pods -l app=nginx -n <namespace>
```

### Deployments

```bash
# List deployments
kubectl get deployments -n <namespace>

# Get deployment details
kubectl describe deployment <name> -n <namespace>

# Check rollout status
kubectl rollout status deployment/<name> -n <namespace>
```

### Services

```bash
# List services
kubectl get svc -n <namespace>

# Describe service
kubectl describe svc <name> -n <namespace>

# Get endpoints
kubectl get endpoints <name> -n <namespace>
```

### ConfigMaps and Secrets

```bash
# List ConfigMaps
kubectl get configmaps -n <namespace>

# Describe ConfigMap
kubectl describe configmap <name> -n <namespace>

# Get ConfigMap data
kubectl get configmap <name> -n <namespace> -o yaml

# List Secrets (names only)
kubectl get secrets -n <namespace>

# Describe Secret (values masked)
kubectl describe secret <name> -n <namespace>
```

### Namespaces

```bash
# List namespaces
kubectl get namespaces

# Get namespace details
kubectl describe namespace <name>
```

## Troubleshooting

### Pod Debugging

```bash
# Describe pod for events and conditions
kubectl describe pod <name> -n <namespace>

# Get pod logs
kubectl logs <pod-name> -n <namespace>

# Get logs from specific container
kubectl logs <pod-name> -c <container-name> -n <namespace>

# Get previous container logs (after crash)
kubectl logs <pod-name> -n <namespace> --previous

# Exec into pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Run command in pod
kubectl exec <pod-name> -n <namespace> -- ls -la /app
```

### Events

```bash
# List events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Filter warning events
kubectl get events -n <namespace> --field-selector type=Warning

# Watch events live
kubectl get events -n <namespace> -w
```

## Management Operations

### Scaling

```bash
# Scale deployment
kubectl scale deployment <name> --replicas=5 -n <namespace>

# Autoscale deployment
kubectl autoscale deployment <name> --min=2 --max=10 --cpu-percent=80 -n <namespace>
```

### Rollouts

```bash
# Check rollout status
kubectl rollout status deployment/<name> -n <namespace>

# View rollout history
kubectl rollout history deployment/<name> -n <namespace>

# Rollback to previous version
kubectl rollout undo deployment/<name> -n <namespace>

# Rollback to specific revision
kubectl rollout undo deployment/<name> --to-revision=2 -n <namespace>
```

### Port Forwarding

```bash
# Forward local port to pod
kubectl port-forward <pod-name> 8080:80 -n <namespace>

# Forward to service
kubectl port-forward svc/<service-name> 8080:80 -n <namespace>
```

## Context Management

```bash
# Get current context
kubectl config current-context

# List all contexts
kubectl config get-contexts

# Switch context
kubectl config use-context <context-name>

# Set default namespace
kubectl config set-context --current --namespace=<namespace>
```

## Common Workflows

### Troubleshoot a Failing Pod

```bash
# 1. Find the problematic pod
kubectl get pods -n production

# 2. Describe for events
kubectl describe pod <pod-name> -n production

# 3. Check events
kubectl get events -n production --sort-by='.lastTimestamp' | tail -20

# 4. Get logs
kubectl logs <pod-name> -n production --tail=200
```

### Monitor Deployment Rollout

```bash
# 1. Check deployment status
kubectl get deployments -n production

# 2. Watch rollout
kubectl rollout status deployment/<name> -n production

# 3. Watch pods
kubectl get pods -l app=<app-name> -n production -w
```

### Debug Service Connectivity

```bash
# 1. Check service
kubectl describe svc <name> -n <namespace>

# 2. Check endpoints
kubectl get endpoints <name> -n <namespace>

# 3. Check backing pods
kubectl get pods -l <service-selector> -n <namespace>

# 4. Port forward for testing
kubectl port-forward svc/<name> 8080:80 -n <namespace>
```

## Safety Features

### Blocked Operations

The following are dangerous and require confirmation:

- `kubectl delete` commands
- Destructive exec commands (rm, dd, mkfs)
- Scale to 0 replicas in production

### Masked Output

Secret values are always masked. Only metadata shown.

## Error Handling

| Error                       | Cause               | Fix                   |
| --------------------------- | ------------------- | --------------------- |
| `kubectl not found`         | Not installed       | Install kubectl       |
| `Unable to connect`         | Cluster unreachable | Check network/VPN     |
| `Forbidden`                 | RBAC permissions    | Request permissions   |
| `NotFound`                  | Resource missing    | Verify name/namespace |
| `context deadline exceeded` | Timeout             | Check cluster health  |

## Related

- kubectl docs: https://kubernetes.io/docs/reference/kubectl/
- Kubernetes API: https://kubernetes.io/docs/reference/kubernetes-api/

## Memory Protocol (MANDATORY)

**Before starting:**

```bash
cat .claude/context/memory/learnings.md
```

**After completing:** Record any new patterns or exceptions discovered.

> ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.

Overview

This skill delivers practical Kubernetes cluster management and troubleshooting using kubectl commands. It enables inspection of pods, deployments, services, logs, and events, plus context switching, scaling, and rollout control. Use it to accelerate debugging, monitoring, and routine cluster operations.

How this skill works

The skill runs kubectl commands against the configured KUBECONFIG context to gather resource lists, descriptions, logs, and events. It supports context switching, port-forwarding, scaling, rollout status and rollbacks, and masks secrets or blocks destructive actions by policy. Errors and common fixes are surfaced to guide remediation.

When to use it

  • Debug application pods and container failures
  • Monitor deployment rollouts and health
  • Investigate networking, services, and endpoints
  • Collect logs and recent events for incident analysis
  • Perform scaling, autoscale configuration, and rollbacks

Best practices

  • Ensure kubectl is installed and KUBECONFIG points to the correct cluster before running commands
  • Run describe and events together to correlate conditions and timestamps
  • Mask or avoid printing secrets; use describe for metadata and kubectl get -o yaml for controlled inspection
  • Prefer rollout status and history before performing rollbacks to understand changes
  • Avoid destructive kubectl delete or scale-to-zero in production without explicit confirmation

Example use cases

  • Find a crashing pod: list pods, describe the pod, fetch recent and previous logs, and exec into the container if needed
  • Assess deployment rollout: check deployment status, watch pods by label, and view rollout history before undoing
  • Debug service connectivity: describe the service, check endpoints, list backing pods, and port-forward for local testing
  • Scale an application: apply kubectl scale or autoscale, then verify new pod readiness and watch events for failures
  • Inspect cluster events: list recent events sorted by timestamp and watch live events during a deployment

FAQ

What permissions are required to use the skill?

You need kubectl access with RBAC permissions for the resources you will inspect or modify; read-only for diagnostics, write for scaling and rollouts.

How are secrets handled?

Secret values are always masked; only metadata is shown. For full secret data use kubectl get secret <name> -o yaml with caution and proper permissions.

What safety checks exist for destructive commands?

By default the skill blocks destructive delete commands, destructive exec operations, and warns about scaling to zero in production contexts; explicit confirmation is required to proceed.