
senior-devops skill


This skill helps you design repeatable and safe deployment pipelines, improve observability, and manage runbooks for reliable, low-risk releases.

npx playbooks add skill vadimcomanescu/codex-skills --skill senior-devops

Run the command above to add this skill to your agents.


SKILL.md


---
name: senior-devops
description: "DevOps workflow for CI/CD, infrastructure, observability, reliability, and safe deployments. Use when designing deployment pipelines, reviewing infra changes, improving operational readiness (alerts/runbooks), or auditing a repo’s production-readiness signals."
---

# Senior DevOps

Make deployments repeatable and incidents survivable.

## Quick Start
1) Define the operational goal (latency, availability, cost) and deploy frequency.
2) Pipeline: build → test → package → deploy → verify → rollback.
3) Observability: logs/metrics/traces + alerts tied to user impact.
4) Runbooks: how to debug and how to roll back safely.

## Release readiness checklist
- Rollback path is tested and documented.
- Alerts are tied to user-facing impact and have clear owners.
- Deploys are scoped, with feature flags for risky changes.
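The release readiness checklist can also be encoded as a pre-deploy gate. The sketch below assumes a hypothetical `Release` record (the field names are illustrative, not from the skill) and returns the unmet checklist items; an empty list means the release passes. It only checks that alerts have owners, not whether they reflect user-facing impact, which still needs human review.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Release:
    # Hypothetical fields; map them to whatever your release tooling records.
    rollback_tested: bool
    rollback_doc: Optional[str]             # path to the rollback runbook, if any
    alert_owners: Dict[str, Optional[str]]  # alert name -> owning team
    risky: bool
    feature_flag: Optional[str]


def readiness_gaps(r: Release) -> List[str]:
    """Return unmet release-readiness items; an empty list means ready."""
    gaps: List[str] = []
    if not r.rollback_tested:
        gaps.append("rollback path not tested")
    if not r.rollback_doc:
        gaps.append("rollback path not documented")
    if not r.alert_owners:
        gaps.append("no alerts defined")
    else:
        unowned = sorted(name for name, owner in r.alert_owners.items() if not owner)
        if unowned:
            gaps.append("alerts without owners: " + ", ".join(unowned))
    if r.risky and not r.feature_flag:
        gaps.append("risky change without a feature flag")
    return gaps
```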

## Optional tool: repo ops inventory
```bash
python ~/.codex/skills/senior-devops/scripts/repo_ops_inventory.py . --out /tmp/ops_inventory.md
```
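The `repo_ops_inventory.py` script itself is not shown here, so the following is a hypothetical sketch of what a repo ops inventory could look like, not the script's actual behavior. It globs for a few assumed production-readiness signals (CI workflows, Dockerfiles, Terraform files, runbooks) and renders a Markdown report that marks missing categories.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical signal -> glob patterns; the real script may check different things.
SIGNALS: Dict[str, List[str]] = {
    "CI workflows": [".github/workflows/*.yml", ".github/workflows/*.yaml", ".gitlab-ci.yml"],
    "Container build": ["**/Dockerfile"],
    "Infrastructure as code": ["**/*.tf"],
    "Runbooks": ["runbooks/**/*.md", "docs/runbooks/**/*.md"],
}


def inventory(root: str) -> Dict[str, List[str]]:
    """Map each signal to the sorted repo-relative files that match it."""
    base = Path(root)
    found: Dict[str, List[str]] = {}
    for signal, patterns in SIGNALS.items():
        matches = {
            str(p.relative_to(base))
            for pattern in patterns
            for p in base.glob(pattern)
            if p.is_file()
        }
        found[signal] = sorted(matches)
    return found


def to_markdown(found: Dict[str, List[str]]) -> str:
    """Render the inventory as a Markdown list; empty signals are marked MISSING."""
    lines = ["# Ops inventory", ""]
    for signal, paths in found.items():
        lines.append(f"- {signal}: {'found' if paths else 'MISSING'}")
        lines.extend(f"  - `{p}`" for p in paths)
    return "\n".join(lines) + "\n"
```

Writing `to_markdown(inventory("."))` to a file gives a report in the same spirit as the `--out` option above. A real tool would also skip vendored directories such as `node_modules` when globbing recursively.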

## References
- Deployment checklist: `references/deploy-checklist.md`

Overview

This skill provides a senior DevOps workflow for CI/CD, infrastructure, observability, reliability, and safe deployments. It focuses on making deployments repeatable and incidents survivable by aligning pipelines, monitoring, and operational practices with business goals. Use it to design deployment pipelines, review infrastructure changes, improve operational readiness, or audit production-readiness signals.

How this skill works

The skill inspects and codifies an end-to-end release flow: build → test → package → deploy → verify → rollback. It evaluates observability across logs, metrics, and traces, and maps alerts to user impact with clear ownership. It also checks for tested rollback paths, scoped deploys with feature flags, and the presence of runbooks for common incidents.
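The verify → rollback step can be reduced to an automated gate that compares the new version's error rate against a baseline. In this sketch the function name, the 2× ratio, the 0.1% floor, and the 500-request minimum are all illustrative assumptions; real gates usually also check latency and use your own SLO thresholds.

```python
def verify_deploy(errors: int, requests: int, baseline_error_rate: float,
                  max_ratio: float = 2.0, min_requests: int = 500) -> str:
    """Return 'promote', 'rollback', or 'wait' for a post-deploy check.

    All thresholds are illustrative assumptions, not recommendations.
    """
    if requests < min_requests:
        return "wait"  # not enough traffic on the new version to judge yet
    error_rate = errors / requests
    # An absolute floor keeps a near-zero baseline from triggering rollbacks on noise.
    limit = max(baseline_error_rate * max_ratio, 0.001)
    return "rollback" if error_rate > limit else "promote"
```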

When to use it

Designing or redesigning CI/CD pipelines for repeatable releases
Reviewing infrastructure-as-code changes before production rollout
Improving operational readiness: alerts, runbooks, and on-call ownership
Auditing a repository for production-readiness signals and gaps
Planning safe deployments or launch strategies for risky features

Best practices

Define and document operational goals (latency, availability, cost, deploy frequency) up front
Keep pipeline stages explicit: build, test, package, deploy, verify, rollback
Tie alerts to user-facing impact and assign clear owners for each alert
Test rollback paths regularly and document step-by-step rollback procedures
Scope deploys and use feature flags to reduce blast radius for risky changes
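Scoping deploys with feature flags usually means exposing a change to a small, stable slice of users first. A common technique, shown here as a sketch rather than any particular flag service's API, is to hash the flag name and user ID into a bucket so each user gets a consistent answer and raising the percentage only adds users. `in_rollout` is a hypothetical helper name.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """Deterministically place a user in a bucket in [0, 100) for this flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    # Take 8 bytes of the hash, reduce to 10,000 buckets, scale to 0.00..99.99.
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100
    # Users below the threshold are in; raising percent never removes anyone.
    return bucket < percent
```

Including the flag name in the hash keeps different flags from always targeting the same users.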

Example use cases

Create a CI/CD pipeline template that enforces test, verification, and automated rollback steps
Audit a repo to produce an operations inventory and a prioritized list of missing runbooks or alerts
Review Terraform or CloudFormation changes for safety, drift, and rollbackability
Design an observability plan that maps key metrics and alerts to user journeys
Prepare a deployment checklist for a high-risk release, including feature flags and verification gates
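For reviewing Terraform changes, one concrete starting point is the machine-readable plan produced by `terraform plan -out=tfplan` followed by `terraform show -json tfplan`. The sketch below reads that JSON and flags resources Terraform intends to delete or replace, which are typically the hardest changes to roll back. It relies only on the `resource_changes[].address` and `change.actions` fields of Terraform's JSON plan format; it does not detect drift, and the `risky_changes` helper is hypothetical.

```python
import json
from typing import Dict, List


def risky_changes(plan_json: str) -> Dict[str, List[str]]:
    """List resource addresses that a Terraform JSON plan would delete or replace."""
    plan = json.loads(plan_json)
    risky: Dict[str, List[str]] = {"delete": [], "replace": []}
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["delete"]:
            risky["delete"].append(rc["address"])
        # Replacement appears as ["delete", "create"] or ["create", "delete"].
        elif set(actions) == {"create", "delete"}:
            risky["replace"].append(rc["address"])
    return risky
```

A review step might then require explicit approval whenever either list is non-empty.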

FAQ

What is the minimal pipeline I should adopt?

At minimum: build, automated tests, package/artifact, deploy to an environment with verification, and a documented rollback path.

How do I prioritize alerts to avoid noise?

Prioritize by user impact: map alerts to business metrics, assign owners, and set escalation rules so only high-impact issues interrupt on-call.
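Prioritizing alerts by user impact is often implemented with SLO error-budget burn rates. The burn rate is the observed error rate divided by the error budget (1 − SLO), so a burn rate of 1 spends the budget exactly over the SLO window. The simplified sketch below pages only when both a long and a short window burn fast, and opens a ticket for slower burns. The 14.4 paging threshold mirrors a commonly cited multiwindow example for a 30-day window (about 2% of the budget spent in one hour); treat it, the window choice, and the function names as assumptions to tune.

```python
def burn_rate(error_rate: float, slo: float) -> float:
    """How many times faster than 'exactly on budget' errors are being spent."""
    error_budget = 1.0 - slo  # e.g. 0.001 for a 99.9% SLO
    return error_rate / error_budget


def alert_action(long_window_error_rate: float, short_window_error_rate: float,
                 slo: float = 0.999, page_burn: float = 14.4,
                 ticket_burn: float = 1.0) -> str:
    """Return 'page', 'ticket', or 'none' for one SLO (thresholds are assumptions)."""
    long_burn = burn_rate(long_window_error_rate, slo)
    short_burn = burn_rate(short_window_error_rate, slo)
    # Page only if the burn is both sustained (long window) and ongoing (short window).
    if long_burn >= page_burn and short_burn >= page_burn:
        return "page"
    # Slower but real budget consumption goes to a ticket queue, not to on-call.
    if long_burn >= ticket_burn:
        return "ticket"
    return "none"
```

For scale: a 99.9% SLO over 30 days leaves an error budget of 0.1% × 43,200 minutes = 43.2 minutes of full outage.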