
cloud skill


This skill helps you manage cloud infrastructure across AWS, Azure, and GCP with security, networking, and cost optimization best practices.

npx playbooks add skill pluginagentmarketplace/custom-plugin-devops --skill cloud

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: cloud-skill
description: Cloud infrastructure with AWS, Azure, GCP - architecture, services, security, and cost optimization.
sasmp_version: "1.3.0"
bonded_agent: 07-cloud-infrastructure
bond_type: PRIMARY_BOND

parameters:
  - name: provider
    type: string
    required: false
    enum: ["aws", "azure", "gcp", "multi-cloud"]
    default: "aws"
  - name: service
    type: string
    required: false
    enum: ["compute", "storage", "database", "networking", "serverless"]
    default: "compute"

retry_config:
  strategy: exponential_backoff
  initial_delay_ms: 1000
  max_retries: 3

observability:
  logging: structured
  metrics: enabled
---

# Cloud Infrastructure Skill

## Overview
Work with the three major cloud platforms (AWS, Azure, and GCP): core services, IAM and security, networking, and cost optimization.

## Parameters
| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
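| provider | string | No | aws | Cloud provider: `aws`, `azure`, `gcp`, or `multi-cloud` |
| service | string | No | compute | Service type: `compute`, `storage`, `database`, `networking`, or `serverless` |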
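
The defaults and allowed values above can be checked before any provider CLI is called. The following is a minimal sketch, assuming the parameters arrive as positional arguments; `resolve_params` is a hypothetical helper name, and how the skill runtime actually passes parameters is not specified in this file.

```bash
# Hypothetical helper: validate parameters against the enums declared in the
# front matter, applying the documented defaults (aws, compute).
# Prints "provider service" on success; returns non-zero on an unknown value.
resolve_params() {
  local provider=${1:-aws} service=${2:-compute}
  case "$provider" in
    aws|azure|gcp|multi-cloud) ;;
    *) echo "invalid provider: $provider" >&2; return 1 ;;
  esac
  case "$service" in
    compute|storage|database|networking|serverless) ;;
    *) echo "invalid service: $service" >&2; return 1 ;;
  esac
  echo "$provider $service"
}
```

Calling `resolve_params gcp storage` echoes the pair back, while an unlisted provider fails with a message on stderr.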

## Core Topics

### MANDATORY
- AWS: EC2, S3, RDS, Lambda, VPC
- Azure: VMs, Storage, AKS
- GCP: Compute Engine, GKE
- IAM and security
- Networking (VPCs, subnets)

### OPTIONAL
- Cost optimization
- Multi-cloud strategies
- Managed Kubernetes
- Serverless patterns

### ADVANCED
- Well-Architected Framework
- Landing zones
- AWS Organizations/Control Tower
- FinOps

## Service Comparison
| Category | AWS | Azure | GCP |
|----------|-----|-------|-----|
| Compute | EC2 | VMs | Compute Engine |
| K8s | EKS | AKS | GKE |
| Serverless | Lambda | Functions | Cloud Functions |
| Storage | S3 | Blob | Cloud Storage |
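
The comparison table can also be expressed as a small lookup, which is handy in scripts that take a provider and a category as input. This is only a sketch: `service_for` is a hypothetical function name, and the lowercase category keys (`compute`, `k8s`, `serverless`, `storage`) are an assumed spelling of the table's row labels.

```bash
# Hypothetical helper: map a provider and category to the service name
# listed in the Service Comparison table.
service_for() {
  local provider=$1 category=$2
  case "$provider:$category" in
    aws:compute)      echo "EC2" ;;
    azure:compute)    echo "VMs" ;;
    gcp:compute)      echo "Compute Engine" ;;
    aws:k8s)          echo "EKS" ;;
    azure:k8s)        echo "AKS" ;;
    gcp:k8s)          echo "GKE" ;;
    aws:serverless)   echo "Lambda" ;;
    azure:serverless) echo "Functions" ;;
    gcp:serverless)   echo "Cloud Functions" ;;
    aws:storage)      echo "S3" ;;
    azure:storage)    echo "Blob" ;;
    gcp:storage)      echo "Cloud Storage" ;;
    *) echo "no mapping for $provider/$category" >&2; return 1 ;;
  esac
}
```

For example, `service_for gcp k8s` prints `GKE`; combinations outside the table return a non-zero status instead of guessing.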

## Quick Reference

```bash
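# bucket-name, cluster, resource-group-name, and region-name are placeholders.

# AWS CLI
aws sts get-caller-identity                  # confirm which identity is in use
aws ec2 describe-instances
aws s3 ls s3://bucket-name
aws eks update-kubeconfig --name cluster     # uses the configured region

# Azure CLI
az login
az account list
az vm list
# --resource-group is required alongside --name
az aks get-credentials --name cluster --resource-group resource-group-name

# GCP CLI
gcloud auth login
gcloud projects list
gcloud compute instances list
# Pass --region (or --zone for a zonal cluster) unless a default is set in gcloud config
gcloud container clusters get-credentials cluster --region region-name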
```

## Troubleshooting

### Common Failures
| Symptom | Root Cause | Solution |
|---------|------------|----------|
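| Access Denied | IAM policy missing or denying the action | Confirm the calling identity, then review its attached policies |
| Quota Exceeded | Service limit reached | Request a quota increase |
| Timeout | Network path or security group blocks traffic | Check VPC routing and security group rules |
| Cost spike | Runaway resources | Find the source in Cost Explorer |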
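
For transient failures such as timeouts, the `retry_config` in the front matter (exponential backoff, `initial_delay_ms: 1000`, `max_retries: 3`) suggests a retry wrapper. The sketch below is one possible reading of that configuration, not the skill's actual implementation: the doubling factor, the helper names `backoff_delay_ms` and `with_retry`, and the `INITIAL_DELAY_MS` / `MAX_RETRIES` environment variables are all assumptions.

```bash
# Delay before retry N (1-based): initial * 2^(N-1).
# The doubling factor is an assumption; the front matter only says
# "exponential_backoff".
backoff_delay_ms() {
  local initial_ms=$1 retry=$2
  echo $(( initial_ms * (1 << (retry - 1)) ))
}

# Run a command; on failure, retry up to MAX_RETRIES times with growing delays.
with_retry() {
  local initial_ms=${INITIAL_DELAY_MS:-1000} max_retries=${MAX_RETRIES:-3}
  local retry=0 delay_ms
  until "$@"; do
    retry=$(( retry + 1 ))
    if (( retry > max_retries )); then
      echo "giving up after ${max_retries} retries: $*" >&2
      return 1
    fi
    delay_ms=$(backoff_delay_ms "$initial_ms" "$retry")
    # Convert milliseconds to fractional seconds (supported by GNU and BSD sleep).
    sleep "$(printf '%d.%03d' $(( delay_ms / 1000 )) $(( delay_ms % 1000 )))"
  done
}
```

With the defaults, `with_retry aws ec2 describe-instances` would wait 1 s, 2 s, and 4 s between attempts before giving up. Retrying only helps with transient errors; an Access Denied response will fail the same way every time.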

### Debug Checklist
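1. Identity: `aws sts get-caller-identity`
2. Region: `echo "$AWS_REGION"` (if empty, check `aws configure get region`)
3. Permissions: review the IAM policies attached to the identity from step 1
4. CloudTrail: review the audit logs for the failing API call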
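
The first two checklist steps can be scripted. This is a hedged sketch for AWS only: `resolve_region` and `debug_checklist` are hypothetical helper names, and the precedence shown (`AWS_REGION`, then `AWS_DEFAULT_REGION`, then the profile's configured region) is assumed to match AWS CLI v2 behaviour.

```bash
# Hypothetical helper: report the region the AWS CLI is likely to use.
# Assumed precedence: AWS_REGION, AWS_DEFAULT_REGION, then the profile setting.
resolve_region() {
  if [ -n "${AWS_REGION:-}" ]; then
    echo "$AWS_REGION"
  elif [ -n "${AWS_DEFAULT_REGION:-}" ]; then
    echo "$AWS_DEFAULT_REGION"
  elif command -v aws >/dev/null 2>&1; then
    aws configure get region
  fi
}

# Run checklist steps 1 and 2; steps 3 and 4 still need human review.
debug_checklist() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "aws CLI not found on PATH" >&2
    return 1
  fi
  echo "Identity:"
  aws sts get-caller-identity || return 1
  echo "Region: $(resolve_region)"
}
```

An empty region in the output usually means neither environment variable nor the active profile sets one, which is a common cause of resources that seem to be missing.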

### Recovery Procedures

#### Compromised Key
1. Disable the key immediately
2. Review CloudTrail for activity made with the key
3. Rotate credentials and update any automation that used the key
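
For an IAM user access key on AWS, the three steps might translate to the commands below. This is a sketch under assumptions: `run` and `respond_to_compromised_key` are hypothetical helpers, the user name and key ID are supplied by the caller, and the script defaults to a dry run that only prints each command.

```bash
# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: usage is respond_to_compromised_key USER_NAME ACCESS_KEY_ID
respond_to_compromised_key() {
  local user=$1 key_id=$2
  # 1. Disable the key immediately.
  run aws iam update-access-key --user-name "$user" \
    --access-key-id "$key_id" --status Inactive
  # 2. Review CloudTrail events recorded for that key.
  run aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=AccessKeyId,AttributeValue=$key_id"
  # 3. Rotate: issue a replacement key, then delete the compromised one
  #    once dependent automation has been updated.
  run aws iam create-access-key --user-name "$user"
  run aws iam delete-access-key --user-name "$user" --access-key-id "$key_id"
}
```

Running it with the default `DRY_RUN=1` prints the plan for review; setting `DRY_RUN=0` executes it. In a real incident the delete step is usually held back until every consumer of the old key has switched to the new one.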

## Resources
- [AWS Docs](https://docs.aws.amazon.com)
- [Azure Docs](https://docs.microsoft.com/azure)
- [GCP Docs](https://cloud.google.com/docs)

Overview

This skill provides practical guidance for designing, operating, and optimizing cloud infrastructure across AWS, Azure, and GCP. It focuses on architecture, core services, security posture, networking, and cost optimization to support robust production workloads. The content is aimed at DevOps engineers and platform teams building CI/CD, deployment, monitoring, and infrastructure automation.

How this skill works

The skill inspects platform-agnostic patterns and maps them to provider-specific services (EC2/VMs/Compute Engine, S3/Blob/Cloud Storage, EKS/AKS/GKE, Lambda/Functions/Cloud Functions). It highlights identity and access management, networking configuration, troubleshooting steps, and recovery procedures for compromised credentials. Quick CLI commands and a debug checklist help validate identity, region, permissions, and audit logs.

When to use it

  • Designing multi-cloud or single-cloud infrastructure blueprints
  • Implementing CI/CD pipelines and automated deployments
  • Securing accounts, roles, and network boundaries before production launch
  • Optimizing cloud spend and performing FinOps reviews
  • Recovering from compromised credentials or access incidents

Best practices

  • Use provider-managed services where appropriate to reduce operational overhead
  • Enforce least-privilege IAM policies and centralize audit logs with CloudTrail or the equivalent audit service on other providers
  • Define network boundaries with VPCs/subnets and explicit security group rules
  • Automate deployments and configuration with CI/CD and infrastructure-as-code
  • Establish cost controls: budgets, alerts, and tagging for chargeback and FinOps

Example use cases

  • Create a landing zone with organization-level controls and centralized logging
  • Migrate workloads to managed Kubernetes (EKS/AKS/GKE) with automated rollout pipelines
  • Build serverless backends for event-driven workloads using Lambda/Functions
  • Perform a security incident response: disable keys, review audit logs, rotate credentials
  • Run a cost optimization audit to identify idle instances and oversized resources

FAQ

Which CLI commands are critical for initial troubleshooting?

Use provider CLIs to validate identity and resources: aws sts get-caller-identity, az login / az account list, gcloud auth login; then list compute and cluster resources.

What steps should I take if an access key is compromised?

Immediately disable the key, review CloudTrail or provider audit logs for scope of use, rotate credentials, and update any automation that depended on the key.