
This skill helps you design repeatable and safe deployment pipelines, improve observability, and manage runbooks for reliable, low-risk releases.

npx playbooks add skill vadimcomanescu/codex-skills --skill senior-devops


---
name: senior-devops
description: "DevOps workflow for CI/CD, infrastructure, observability, reliability, and safe deployments. Use when designing deployment pipelines, reviewing infra changes, improving operational readiness (alerts/runbooks), or auditing a repo’s production-readiness signals."
---

# Senior DevOps

Make deployments repeatable and incidents survivable.

## Quick Start
1) Define the operational goal (latency, availability, cost) and deploy frequency.
2) Pipeline: build → test → package → deploy → verify → rollback.
3) Observability: logs/metrics/traces + alerts tied to user impact.
4) Runbooks: how to debug and how to roll back safely.
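
The pipeline in step 2 can be sketched as an ordered list of stages where a failure at or after deploy triggers a rollback. This is a minimal illustration, not part of the skill: `run_pipeline`, the stage lambdas, and the `rolled_back` list are hypothetical stand-ins for real build and deploy tooling.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage], rollback: Callable[[], None]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Rollback is only called if the failure happens at or after the
    "deploy" stage, since nothing has been released before that point.
    """
    log: List[str] = []
    deploy_started = False
    for name, step in stages:
        if name == "deploy":
            deploy_started = True
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            if deploy_started:
                rollback()
                log.append("rollback: done")
            break
    return log


# Hypothetical stages; real ones would call your build, test, and deploy tools.
stages: List[Stage] = [
    ("build", lambda: True),
    ("test", lambda: True),
    ("package", lambda: True),
    ("deploy", lambda: True),
    ("verify", lambda: False),  # simulate a failed post-deploy check
]

rolled_back: List[str] = []
result = run_pipeline(stages, rollback=lambda: rolled_back.append("previous-release"))
print(result)
```

Keeping rollback as an explicit callable makes it something you can exercise in a test, which is one way to keep the rollback path from going stale.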

## Release readiness checklist
- Rollback path is tested and documented.
- Alerts are tied to user-facing impact and have clear owners.
- Deploys are scoped, with feature flags for risky changes.
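
One way to make this checklist enforceable is to record each item as a boolean and block the release while any item is false. The sketch below assumes that approach; the `ReleaseReadiness` fields and the `blocking_items` helper are hypothetical names chosen to mirror the checklist, not something the skill ships.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReleaseReadiness:
    # Each field mirrors one claim from the checklist above.
    rollback_tested: bool
    rollback_documented: bool
    alerts_tied_to_user_impact: bool
    alerts_have_owners: bool
    risky_changes_flagged: bool


def blocking_items(readiness: ReleaseReadiness) -> List[str]:
    """Return the names of checklist items that are not yet satisfied."""
    return [name for name, passed in vars(readiness).items() if not passed]


candidate = ReleaseReadiness(
    rollback_tested=True,
    rollback_documented=False,
    alerts_tied_to_user_impact=True,
    alerts_have_owners=True,
    risky_changes_flagged=False,
)
blockers = blocking_items(candidate)
print(blockers)
```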

## Optional tool: repo ops inventory
```bash
python ~/.codex/skills/senior-devops/scripts/repo_ops_inventory.py . --out /tmp/ops_inventory.md
```

## References
- Deployment checklist: `references/deploy-checklist.md`

Overview

This skill provides a senior DevOps workflow for CI/CD, infrastructure, observability, reliability, and safe deployments. It focuses on making deployments repeatable and incidents survivable by aligning pipelines, monitoring, and operational practices with business goals. Use it to design deployment pipelines, review infrastructure changes, improve operational readiness, or audit production-readiness signals.

How this skill works

The skill inspects and codifies an end-to-end release flow: build → test → package → deploy → verify → rollback. It evaluates observability across logs, metrics, and traces, and maps alerts to user impact with clear ownership. It also checks for tested rollback paths, scoped deploys with feature flags, and the presence of runbooks for common incidents.
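
The verify step in that flow is where a rollback decision is usually made. As a rough sketch, assuming the signal is an error rate sampled before and after the deploy, the decision could look like the function below. `should_roll_back` and both thresholds are illustrative assumptions, not values the skill prescribes.

```python
def should_roll_back(
    baseline_error_rate: float,
    current_error_rate: float,
    max_relative_increase: float = 0.5,
    min_absolute_rate: float = 0.01,
) -> bool:
    """Decide whether a deploy should be rolled back based on error rates.

    Rates are fractions of failed requests (0.02 means 2%).
    The absolute floor avoids rolling back on noise when traffic is healthy.
    """
    if current_error_rate < min_absolute_rate:
        return False
    return current_error_rate > baseline_error_rate * (1 + max_relative_increase)


# Error rate more than 50% above baseline and above the floor: roll back.
print(should_roll_back(baseline_error_rate=0.02, current_error_rate=0.05))
# Small increase within tolerance: keep the release.
print(should_roll_back(baseline_error_rate=0.02, current_error_rate=0.025))
```

Comparing against a baseline rather than a fixed number keeps the gate meaningful for services whose normal error rate is not zero.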

When to use it

  • Designing or redesigning CI/CD pipelines for repeatable releases
  • Reviewing infrastructure-as-code changes before production rollout
  • Improving operational readiness: alerts, runbooks, and on-call ownership
  • Auditing a repository for production-readiness signals and gaps
  • Planning safe deployments or launch strategies for risky features

Best practices

  • Define and document operational goals (latency, availability, cost, deploy frequency) up front
  • Keep pipeline stages explicit: build, test, package, deploy, verify, rollback
  • Tie alerts to user-facing impact and assign clear owners for each alert
  • Test rollback paths regularly and document step-by-step rollback procedures
  • Scope deploys and use feature flags to reduce blast radius for risky changes
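
To illustrate how a feature flag limits blast radius, here is a minimal sketch of a percentage rollout. It assumes users are assigned to stable buckets by hashing; `is_enabled`, the flag name, and the user IDs are hypothetical, and a real setup would more likely use a feature-flag service.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if the flag is on for this user at the given rollout percent.

    The user is hashed into one of 100 buckets, so the same user always gets
    the same answer, and raising the percentage only ever adds users.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


# Hypothetical flag and users: count how many fall inside a 10% rollout.
users = [f"user-{i}" for i in range(1000)]
exposed = sum(is_enabled("new-checkout", u, 10) for u in users)
print(f"{exposed} of {len(users)} users see the new code path")
```

Setting the percentage back to 0 turns the change off for everyone without a redeploy, which is what makes a flag a cheap first rollback step.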

Example use cases

  • Create a CI/CD pipeline template that enforces testing, verification, and automated rollback steps
  • Audit a repo to produce an operations inventory and a prioritized list of missing runbooks or alerts
  • Review Terraform or CloudFormation changes for safety, drift, and ease of rollback
  • Design an observability plan that maps key metrics and alerts to user journeys
  • Prepare a deployment checklist for a high-risk release, including feature flags and verification gates
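
The repo audit use case can be approximated by checking a working tree for files that usually indicate operational maturity. The sketch below is an independent illustration and does not reproduce the skill's bundled `repo_ops_inventory.py`; the `SIGNALS` table and the paths in it are assumptions about common conventions.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping of readiness signals to paths that suggest them.
SIGNALS: Dict[str, List[str]] = {
    "ci_config": [".github/workflows", ".gitlab-ci.yml"],
    "container_build": ["Dockerfile"],
    "infrastructure_as_code": ["terraform", "main.tf"],
    "runbooks": ["runbooks", "docs/runbooks"],
}


def missing_signals(repo_root: Path) -> List[str]:
    """Return the signals for which none of the candidate paths exist."""
    return [
        signal
        for signal, candidates in SIGNALS.items()
        if not any((repo_root / candidate).exists() for candidate in candidates)
    ]


# Demo against a throwaway directory that has CI and a Dockerfile only.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "Dockerfile").write_text("FROM scratch\n")
    (repo / ".github" / "workflows").mkdir(parents=True)
    gaps = missing_signals(repo)

print(gaps)
```

A file-presence check only shows that something exists, not that it is current or correct, so the output is best treated as a starting list for manual review.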

FAQ

What is the minimal pipeline I should adopt?

At minimum: a build, automated tests, a packaged artifact, a deploy to an environment followed by verification, and a documented rollback path.

How do I prioritize alerts to avoid noise?

Prioritize by user impact: map alerts to business metrics, assign owners, and set escalation rules so only high-impact issues interrupt on-call.
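
As a rough sketch of that prioritization, the routing rule below pages on-call only for owned, user-facing alerts above an impact threshold and sends everything else to a ticket queue. The `Alert` fields, the `route` function, and the 5% threshold are illustrative assumptions, not settings from the skill.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    owner: str  # team or person responsible; empty if unassigned
    user_facing: bool
    affected_users_percent: float


def route(alert: Alert, page_threshold_percent: float = 5.0) -> str:
    """Return "page", "ticket", or "needs-owner" for an alert."""
    if not alert.owner:
        # An alert nobody owns cannot be acted on, so fix ownership first.
        return "needs-owner"
    if alert.user_facing and alert.affected_users_percent >= page_threshold_percent:
        return "page"
    return "ticket"


# Hypothetical alerts for illustration.
checkout_errors = Alert("checkout-5xx", "payments-team", True, 12.0)
disk_usage = Alert("disk-usage-70", "platform-team", False, 0.0)
orphan = Alert("legacy-cron-failed", "", False, 0.0)

print(route(checkout_errors), route(disk_usage), route(orphan))
```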