home / skills / ahmedasmar / devops-claude-skills / ci-cd

ci-cd skill

unsafe

This skill helps you design, optimize, and secure CI/CD pipelines across platforms with DevSecOps scanning and troubleshooting guidance.

npx playbooks add skill ahmedasmar/devops-claude-skills --skill ci-cd

Review the files below or copy the command above to add this skill to your agents.

Files (19)

SKILL.md

16.6 KB

---
name: ci-cd
description: CI/CD pipeline design, optimization, DevSecOps security scanning, and troubleshooting. Use for creating workflows, debugging pipeline failures, implementing SAST/DAST/SCA, optimizing build performance, implementing caching strategies, setting up deployments, securing pipelines with OIDC/secrets management, and troubleshooting common issues across GitHub Actions, GitLab CI, and other platforms.
---

# CI/CD Pipelines

Comprehensive guide for CI/CD pipeline design, optimization, security, and troubleshooting across GitHub Actions, GitLab CI, and other platforms.

## When to Use This Skill

Use this skill when:
- Creating new CI/CD workflows or pipelines
- Debugging pipeline failures or flaky tests
- Optimizing slow builds or test suites
- Implementing caching strategies
- Setting up deployment workflows
- Securing pipelines (secrets, OIDC, supply chain)
- Implementing DevSecOps security scanning (SAST, DAST, SCA)
- Troubleshooting platform-specific issues
- Analyzing pipeline performance
- Implementing matrix builds or test sharding
- Configuring multi-environment deployments

## Core Workflows

### 1. Creating a New Pipeline

**Decision tree:**
```
What are you building?
├── Node.js/Frontend → GitHub: templates/github-actions/node-ci.yml | GitLab: templates/gitlab-ci/node-ci.yml
├── Python → GitHub: templates/github-actions/python-ci.yml | GitLab: templates/gitlab-ci/python-ci.yml
├── Go → GitHub: templates/github-actions/go-ci.yml | GitLab: templates/gitlab-ci/go-ci.yml
├── Docker Image → GitHub: templates/github-actions/docker-build.yml | GitLab: templates/gitlab-ci/docker-build.yml
├── Other → Follow the pipeline design pattern below
```

**Basic pipeline structure:**
```yaml
# 1. Fast feedback (lint, format) - <1 min
# 2. Unit tests - 1-5 min
# 3. Integration tests - 5-15 min
# 4. Build artifacts
# 5. E2E tests (optional, main branch only) - 15-30 min
# 6. Deploy (with approval gates)
```

**Key principles:**
- Fail fast: Run cheap validation first
- Parallelize: Remove unnecessary job dependencies
- Cache dependencies: Use `actions/cache` or GitLab cache
- Use artifacts: Build once, deploy many times

See [best_practices.md](references/best_practices.md) for comprehensive pipeline design patterns.

### 2. Optimizing Pipeline Performance

**Quick wins checklist:**
- [ ] Add dependency caching (50-90% faster builds)
- [ ] Remove unnecessary `needs` dependencies
- [ ] Add path filters to skip unnecessary runs
- [ ] Use `npm ci` instead of `npm install`
- [ ] Add job timeouts to prevent hung builds
- [ ] Enable concurrency cancellation for duplicate runs

**Analyze existing pipeline:**
```bash
# Use the pipeline analyzer script
python3 scripts/pipeline_analyzer.py --platform github --workflow .github/workflows/ci.yml
```

**Common optimizations:**
- **Slow tests:** Shard tests with matrix builds
- **Repeated dependency installs:** Add caching
- **Sequential jobs:** Parallelize with proper `needs`
- **Full test suite on every PR:** Use path filters or test impact analysis

See [optimization.md](references/optimization.md) for detailed caching strategies, parallelization techniques, and performance tuning.

### 3. Securing Your Pipeline

**Essential security checklist:**
- [ ] Use OIDC instead of static credentials
- [ ] Pin actions/includes to commit SHAs
- [ ] Use minimal permissions
- [ ] Enable secret scanning
- [ ] Add vulnerability scanning (dependencies, containers)
- [ ] Implement branch protection
- [ ] Separate test from deploy workflows

**Quick setup - OIDC authentication:**

**GitHub Actions → AWS:**
```yaml
permissions:
  id-token: write
  contents: read

steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789:role/GitHubActionsRole
      aws-region: us-east-1
```

**Secrets management:**
- Store in platform secret stores (GitHub Secrets, GitLab CI/CD Variables)
- Mark as "masked" in GitLab
- Use environment-specific secrets
- Rotate regularly (every 90 days)
- Never log secrets

See [security.md](references/security.md) for comprehensive security patterns, supply chain security, and secrets management.

### 4. Troubleshooting Pipeline Failures

**Systematic approach:**

**Step 1: Check pipeline health**
```bash
python3 scripts/ci_health.py --platform github --repo owner/repo
```

**Step 2: Identify the failure type**

| Error Pattern | Common Cause | Quick Fix |
|---------------|--------------|-----------|
| "Module not found" | Missing dependency or cache issue | Clear cache, run `npm ci` |
| "Timeout" | Job taking too long | Add caching, increase timeout |
| "Permission denied" | Missing permissions | Add to `permissions:` block |
| "Cannot connect to Docker daemon" | Docker not available | Use correct runner or DinD |
| Intermittent failures | Flaky tests or race conditions | Add retries, fix timing issues |

**Step 3: Enable debug logging**

GitHub Actions:
```yaml
# Add repository secrets:
# ACTIONS_RUNNER_DEBUG = true
# ACTIONS_STEP_DEBUG = true
```

GitLab CI:
```yaml
variables:
  CI_DEBUG_TRACE: "true"
```

**Step 4: Reproduce locally**
```bash
# GitHub Actions - use act
act -j build

# Or Docker
docker run -it ubuntu:latest bash
# Then manually run the failing steps
```

See [troubleshooting.md](references/troubleshooting.md) for comprehensive issue diagnosis, platform-specific problems, and solutions.

### 5. Implementing Deployment Workflows

**Deployment pattern selection:**

| Pattern | Use Case | Complexity | Risk |
|---------|----------|------------|------|
| Direct | Simple apps, low traffic | Low | Medium |
| Blue-Green | Zero downtime required | Medium | Low |
| Canary | Gradual rollout, monitoring | High | Very Low |
| Rolling | Kubernetes, containers | Medium | Low |

**Basic deployment structure:**
```yaml
deploy:
  needs: [build, test]
  if: github.ref == 'refs/heads/main'
  environment:
    name: production
    url: https://example.com
  steps:
    - name: Download artifacts
    - name: Deploy
    - name: Health check
    - name: Rollback on failure
```

**Multi-environment setup:**
- **Development:** Auto-deploy on develop branch
- **Staging:** Auto-deploy on main, requires passing tests
- **Production:** Manual approval required, smoke tests mandatory

See [best_practices.md](references/best_practices.md#deployment-strategies) for detailed deployment patterns and environment management.

### 6. Implementing DevSecOps Security Scanning

**Security scanning types:**

| Scan Type | Purpose | When to Run | Speed | Tools |
|-----------|---------|-------------|-------|-------|
| Secret Scanning | Find exposed credentials | Every commit | Fast (<1 min) | TruffleHog, Gitleaks |
| SAST | Find code vulnerabilities | Every commit | Medium (5-15 min) | CodeQL, Semgrep, Bandit, Gosec |
| SCA | Find dependency vulnerabilities | Every commit | Fast (1-5 min) | npm audit, pip-audit, Snyk |
| Container Scanning | Find image vulnerabilities | After build | Medium (5-10 min) | Trivy, Grype |
| DAST | Find runtime vulnerabilities | Scheduled/main only | Slow (15-60 min) | OWASP ZAP |

**Quick setup - Add security to existing pipeline:**

**GitHub Actions:**
```yaml
jobs:
  # Add before build job
  secret-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: trufflesecurity/trufflehog@main
      - uses: gitleaks/gitleaks-action@v2

  sast:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: javascript  # or python, go
      - uses: github/codeql-action/analyze@v3

  build:
    needs: [secret-scan, sast]  # Add dependencies
```

**GitLab CI:**
```yaml
stages:
  - security  # Add before other stages
  - build
  - test

# Secret scanning
secret-scan:
  stage: security
  image: trufflesecurity/trufflehog:latest
  script:
    - trufflehog filesystem . --json --fail

# SAST
sast:semgrep:
  stage: security
  image: returntocorp/semgrep
  script:
    - semgrep scan --config=auto .

# Use GitLab templates
include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml
```

**Comprehensive security pipeline templates:**
- **GitHub Actions:** `templates/github-actions/security-scan.yml` - Complete DevSecOps pipeline with all scanning stages
- **GitLab CI:** `templates/gitlab-ci/security-scan.yml` - Complete DevSecOps pipeline with GitLab security templates

**Security gate pattern:**

Add a security gate job that evaluates all security scan results and fails the pipeline if critical issues are found:

```yaml
security-gate:
  needs: [secret-scan, sast, sca, container-scan]
  script:
    # Check for critical vulnerabilities
    # Parse JSON reports and evaluate thresholds
    # Fail if critical issues found
```

**Language-specific security tools:**

- **Node.js:** CodeQL, Semgrep, npm audit, eslint-plugin-security
- **Python:** CodeQL, Semgrep, Bandit, pip-audit, Safety
- **Go:** CodeQL, Semgrep, Gosec, govulncheck

All language-specific templates now include security scanning stages. See:
- `templates/github-actions/node-ci.yml`
- `templates/github-actions/python-ci.yml`
- `templates/github-actions/go-ci.yml`
- `templates/gitlab-ci/node-ci.yml`
- `templates/gitlab-ci/python-ci.yml`
- `templates/gitlab-ci/go-ci.yml`

See [devsecops.md](references/devsecops.md) for comprehensive DevSecOps guide covering all security scanning types, tool comparisons, and implementation patterns.

## Quick Reference Commands

### GitHub Actions

```bash
# List workflows
gh workflow list

# View recent runs
gh run list --limit 20

# View specific run
gh run view <run-id>

# Re-run failed jobs
gh run rerun <run-id> --failed

# Download logs
gh run view <run-id> --log > logs.txt

# Trigger workflow manually
gh workflow run ci.yml

# Check workflow status
gh run watch
```

### GitLab CI

```bash
# View pipelines
gl project-pipelines list

# Pipeline status
gl project-pipeline get <pipeline-id>

# Retry failed jobs
gl project-pipeline retry <pipeline-id>

# Cancel pipeline
gl project-pipeline cancel <pipeline-id>

# Download artifacts
gl project-job artifacts <job-id>
```

## Platform-Specific Patterns

### GitHub Actions

**Reusable workflows:**
```yaml
# .github/workflows/reusable-test.yml
on:
  workflow_call:
    inputs:
      node-version:
        required: true
        type: string

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
```

**Call from another workflow:**
```yaml
jobs:
  test:
    uses: ./.github/workflows/reusable-test.yml
    with:
      node-version: '20'
```

### GitLab CI

**Templates with extends:**
```yaml
.test_template:
  image: node:20
  before_script:
    - npm ci

unit-test:
  extends: .test_template
  script:
    - npm run test:unit

integration-test:
  extends: .test_template
  script:
    - npm run test:integration
```

**DAG pipelines with needs:**
```yaml
build:
  stage: build

test:unit:
  stage: test
  needs: [build]

test:integration:
  stage: test
  needs: [build]

deploy:
  stage: deploy
  needs: [test:unit, test:integration]
```

## Diagnostic Scripts

### Pipeline Analyzer

Analyzes workflow configuration for optimization opportunities:

```bash
# GitHub Actions
python3 scripts/pipeline_analyzer.py --platform github --workflow .github/workflows/ci.yml

# GitLab CI
python3 scripts/pipeline_analyzer.py --platform gitlab --config .gitlab-ci.yml
```

**Identifies:**
- Missing caching opportunities
- Unnecessary sequential execution
- Outdated action versions
- Unused artifacts
- Overly broad triggers

### CI Health Checker

Checks pipeline status and identifies issues:

```bash
# GitHub Actions
python3 scripts/ci_health.py --platform github --repo owner/repo --limit 20

# GitLab CI
python3 scripts/ci_health.py --platform gitlab --project-id 12345 --token $GITLAB_TOKEN
```

**Provides:**
- Success/failure rates
- Recent failure patterns
- Workflow-specific insights
- Actionable recommendations

## Reference Documentation

For deep-dive information on specific topics:

- **[best_practices.md](references/best_practices.md)** - Pipeline design, testing strategies, deployment patterns, dependency management, artifact handling, platform-specific patterns
- **[security.md](references/security.md)** - Secrets management, OIDC authentication, supply chain security, access control, vulnerability scanning, secure pipeline patterns
- **[devsecops.md](references/devsecops.md)** - Comprehensive DevSecOps guide: SAST (CodeQL, Semgrep, Bandit, Gosec), DAST (OWASP ZAP), SCA (npm audit, pip-audit, Snyk), container security (Trivy, Grype, SBOM), secret scanning (TruffleHog, Gitleaks), security gates, license compliance
- **[optimization.md](references/optimization.md)** - Caching strategies (dependencies, Docker layers, build artifacts), parallelization techniques, test splitting, build optimization, resource management
- **[troubleshooting.md](references/troubleshooting.md)** - Common issues (workflow not triggering, flaky tests, timeouts, dependency errors), Docker problems, authentication issues, platform-specific debugging

## Templates

Starter templates for common use cases:

### GitHub Actions
- **`assets/templates/github-actions/node-ci.yml`** - Complete Node.js CI/CD with security scanning, caching, matrix testing, and multi-environment deployment
- **`assets/templates/github-actions/python-ci.yml`** - Python pipeline with security scanning, pytest, coverage, PyPI deployment
- **`assets/templates/github-actions/go-ci.yml`** - Go pipeline with security scanning, multi-platform builds, benchmarks, integration tests
- **`assets/templates/github-actions/docker-build.yml`** - Docker build with multi-platform support, security scanning, SBOM generation, and signing
- **`assets/templates/github-actions/security-scan.yml`** - Comprehensive DevSecOps pipeline with SAST, DAST, SCA, container scanning, and security gates

### GitLab CI
- **`assets/templates/gitlab-ci/node-ci.yml`** - GitLab CI pipeline with security scanning, parallel execution, services, and deployment stages
- **`assets/templates/gitlab-ci/python-ci.yml`** - Python pipeline with security scanning, parallel testing, Docker builds, PyPI and Cloud Run deployment
- **`assets/templates/gitlab-ci/go-ci.yml`** - Go pipeline with security scanning, multi-platform builds, benchmarks, Kubernetes deployment
- **`assets/templates/gitlab-ci/docker-build.yml`** - Docker build with DinD, multi-arch, Container Registry, security scanning
- **`assets/templates/gitlab-ci/security-scan.yml`** - Comprehensive DevSecOps pipeline with SAST, DAST, SCA, container scanning, GitLab security templates, and security gates

## Common Patterns

### Caching Dependencies

**GitHub Actions:**
```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-
- run: npm ci
```

**GitLab CI:**
```yaml
cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
```

### Matrix Builds

**GitHub Actions:**
```yaml
strategy:
  matrix:
    os: [ubuntu-latest, macos-latest]
    node: [18, 20, 22]
  fail-fast: false
```

**GitLab CI:**
```yaml
test:
  parallel:
    matrix:
      - NODE_VERSION: ['18', '20', '22']
```

### Conditional Execution

**GitHub Actions:**
```yaml
- name: Deploy
  if: github.ref == 'refs/heads/main' && github.event_name == 'push'
```

**GitLab CI:**
```yaml
deploy:
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
```

## Best Practices Summary

**Performance:**
- Enable dependency caching
- Parallelize independent jobs
- Add path filters to reduce unnecessary runs
- Use matrix builds for cross-platform testing

**Security:**
- Use OIDC for cloud authentication
- Pin actions to commit SHAs
- Enable secret scanning and vulnerability checks
- Apply principle of least privilege

**Reliability:**
- Add timeouts to prevent hung jobs
- Implement retry logic for flaky operations
- Use health checks after deployments
- Enable concurrency cancellation

**Maintainability:**
- Use reusable workflows/templates
- Document non-obvious decisions
- Keep workflows DRY with extends/includes
- Regular dependency updates

## Getting Started

1. **New pipeline:** Start with a template from `assets/templates/`
2. **Add security scanning:** Use DevSecOps templates or add security stages to existing pipelines (see workflow 6 above)
3. **Optimize existing:** Run `scripts/pipeline_analyzer.py`
4. **Debug issues:** Check `references/troubleshooting.md`
5. **Improve security:** Review `references/security.md` and `references/devsecops.md` checklists
6. **Speed up builds:** See `references/optimization.md`

Overview

This skill provides hands-on CI/CD pipeline design, optimization, DevSecOps scanning, and troubleshooting across GitHub Actions, GitLab CI, and similar platforms. It helps you create reproducible workflows, speed up builds, secure supply chains, and resolve common pipeline failures. Use it to implement caching, matrix builds, deployments, OIDC-based auth, and multi-environment strategies.

How this skill works

The skill analyzes existing pipeline configurations and generates or improves YAML workflows, adding fast-feedback stages, caching, parallelization, and artifact reuse. It recommends and injects security scanning stages (SAST, SCA, DAST, container scans) and security gates, and provides diagnostic scripts to surface optimization and failure patterns. It also produces concrete fixes for platform-specific errors and reproducible local debugging steps.

When to use it

Creating a new CI/CD workflow for Node, Python, Go, Docker images, or other stacks
Debugging failing or flaky pipelines and reproducing failures locally
Optimizing slow builds and test suites with caching and parallelization
Implementing DevSecOps scans (SAST, SCA, DAST, container scanning)
Setting up deployments (direct, blue-green, canary, rolling) and multi-environment flows
Securing pipelines with OIDC, secrets management, permission hardening, and pinning actions

Best practices

Fail fast: run lint/format checks before expensive jobs
Cache dependencies and Docker layers; build once and reuse artifacts
Parallelize independent jobs and use matrix/sharding for tests
Pin third-party actions to commit SHAs and grant minimal permissions
Add security scans early and a security gate to fail on critical issues
Use path filters and concurrency cancellation to reduce unnecessary runs

Example use cases

Create a GitHub Actions Python CI that runs lint, unit tests, CodeQL, pip-audit, builds artifacts, and deploys to staging
Optimize an existing pipeline by adding actions/cache, removing unnecessary needs, and sharding slow test suites
Add OIDC-based AWS credentials to GitHub Actions to avoid long-lived secrets and enable least-privilege access
Assemble a security stage with Trivy, Semgrep, and SCA tools, then implement a security-gate job to enforce thresholds
Troubleshoot intermittent test failures by enabling debug logs, reproducing the job with act, and applying retries or timing fixes

FAQ

How do I speed up slow dependency installs?

Add dependency caching (platform cache actions or GitLab cache), prefer reproducible installs (npm ci, pip wheel caches), and persist Docker layer caches for image builds.

When should I run DAST versus SAST?

Run SAST on every commit for fast code analysis; run DAST on scheduled runs or main branch deployments because it requires a running app and takes longer.