platform skill

This skill helps platform engineers manage reliable, observable, cost-efficient infrastructure across IaC, GitOps, and cloud deployments.

npx playbooks add skill hyperb1iss/hyperskills --skill platform

SKILL.md
---
name: platform
description: Use this skill when working on infrastructure, DevOps, CI/CD, Kubernetes, cloud deployment, observability, or cost optimization. Activates on mentions of Kubernetes, Docker, Terraform, Pulumi, OpenTofu, GitOps, Argo CD, Flux, CI/CD, GitHub Actions, observability, OpenTelemetry, Prometheus, Grafana, AWS, GCP, Azure, infrastructure as code, platform engineering, FinOps, or cloud costs.
---

# Platform Engineering

Build reliable, observable, cost-efficient infrastructure.

## Quick Reference

### The 2026 Platform Stack

| Layer         | Tool                   | Purpose                   |
| ------------- | ---------------------- | ------------------------- |
| IaC           | OpenTofu / Pulumi      | Infrastructure definition |
| GitOps        | Argo CD / Flux         | Continuous deployment     |
| Control Plane | Crossplane             | Kubernetes-native infra   |
| Observability | OpenTelemetry          | Unified telemetry         |
| Service Mesh  | Istio Ambient / Cilium | mTLS, traffic management  |
| Cost          | FinOps Framework       | Cloud optimization        |

### Infrastructure as Code

**OpenTofu** (Terraform-compatible, open-source):

```hcl
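data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  tags = {
    Name        = "web-server"
    Environment = "production"
  }
}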
```

**Pulumi** (Real programming languages):

```typescript
import * as aws from "@pulumi/aws";

const server = new aws.ec2.Instance("web", {
  // Example AMI ID; AMI IDs are region-specific, so substitute one valid in your region
  ami: "ami-0c55b159cbfafe1f0",
  instanceType: "t3.micro",
  tags: { Name: "web-server" },
});

export const publicIp = server.publicIp;
```

### GitOps with Argo CD

```yaml
# Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/repo
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

### Kubernetes Patterns

**Gateway API** (replacing Ingress):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
    - name: main-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-service
          port: 8080
```

**Istio Ambient Mode** (sidecar-less service mesh):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio.io/dataplane-mode: ambient # Enable ambient mesh
```

### OpenTelemetry Setup

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Initialize once at startup; "my-service" is a placeholder service name
provider = TracerProvider(resource=Resource.create({"service.name": "my-service"}))
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://collector:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# Use
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("my-operation"):
    do_work()  # placeholder for your application code
```

### CI/CD Pipeline (GitHub Actions)

```yaml
name: Deploy
on:
  push:
    branches: [main]

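jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write # push the manifest commit
      packages: write # push the image to GHCR
    steps:
      - uses: actions/checkout@v4

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

      - name: Update manifests
        run: |
          cd k8s/overlays/production
          kustomize edit set image app=ghcr.io/${{ github.repository }}:${{ github.sha }}
          # git refuses to commit without an identity on a fresh runner
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git commit -am "Deploy ${{ github.sha }}"
          git push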

### FinOps Framework

**Phase 1: INFORM** (visibility)

- Tag everything: `team`, `environment`, `cost-center`
- Use cloud cost explorers
- Target: 95%+ cost allocation accuracy
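
One way to measure progress toward the allocation target is to compute the share of spend carried by fully tagged resources. The sketch below assumes a hypothetical inventory format (`monthly_cost` and `tags` fields) and invented sample data; a real report would read from your cloud provider's billing export.

```python
# Tags required by the INFORM phase above.
REQUIRED_TAGS = {"team", "environment", "cost-center"}


def allocation_accuracy(resources):
    """Return the fraction of total cost carried by fully tagged resources."""
    total = sum(r["monthly_cost"] for r in resources)
    if total == 0:
        return 1.0  # nothing to allocate
    allocated = sum(
        r["monthly_cost"]
        for r in resources
        if REQUIRED_TAGS.issubset(r["tags"])
    )
    return allocated / total


# Hypothetical inventory; the field names and values are illustrative only.
resources = [
    {"id": "i-web", "monthly_cost": 600.0,
     "tags": {"team": "web", "environment": "production", "cost-center": "cc-101"}},
    {"id": "i-batch", "monthly_cost": 300.0,
     "tags": {"team": "data", "environment": "production", "cost-center": "cc-202"}},
    {"id": "vol-orphan", "monthly_cost": 100.0, "tags": {"team": "data"}},
]

accuracy = allocation_accuracy(resources)
print(f"{accuracy:.0%} of spend is allocated")
```

Weighting by cost rather than by resource count keeps a handful of cheap untagged resources from masking one expensive one.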

**Phase 2: OPTIMIZE** (action)

- Rightsize instances (most are overprovisioned)
- Use spot/preemptible for stateless workloads
- Reserved instances for baseline capacity
- Target: 20-30% cost reduction
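
As a rough illustration of rightsizing, the sketch below flags instances whose peak CPU never reaches a cutoff and lists the most expensive first. The 40% cutoff, the field names, and the sample data are all assumptions made for the example, not values from a FinOps standard; memory and network usage would also matter in practice.

```python
# Hypothetical cutoff: peak CPU below this suggests the instance is oversized.
PEAK_CPU_THRESHOLD = 40.0


def rightsizing_candidates(instances, threshold=PEAK_CPU_THRESHOLD):
    """Return instances whose peak CPU stays under threshold, costliest first."""
    return sorted(
        (i for i in instances if max(i["cpu_samples"]) < threshold),
        key=lambda i: i["monthly_cost"],
        reverse=True,
    )


# Illustrative utilization samples (percent CPU).
instances = [
    {"id": "i-api", "monthly_cost": 140.0, "cpu_samples": [12.0, 18.5, 22.0]},
    {"id": "i-worker", "monthly_cost": 280.0, "cpu_samples": [35.0, 71.0, 64.5]},
    {"id": "i-admin", "monthly_cost": 70.0, "cpu_samples": [3.0, 5.5, 4.0]},
]

candidates = rightsizing_candidates(instances)
for inst in candidates:
    print(inst["id"], "peak CPU:", max(inst["cpu_samples"]))
```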

**Phase 3: OPERATE** (governance)

- Budget alerts at 80% threshold
- Cost metrics in CI/CD gates
- Regular FinOps reviews
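
The budget alert and CI/CD gate can be sketched as a single check. The function name, the three status labels, and the rule that only an exceeded budget fails the gate are assumptions for illustration; the 80% threshold comes from the list above.

```python
ALERT_THRESHOLD = 0.80  # alert once 80% of the budget is spent


def budget_status(spend, budget, threshold=ALERT_THRESHOLD):
    """Classify spend against budget as 'ok', 'alert', or 'over'."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    if ratio >= 1.0:
        return "over"
    if ratio >= threshold:
        return "alert"
    return "ok"


def gate_passes(spend, budget):
    """Hypothetical CI gate: fail the pipeline only when the budget is exceeded."""
    return budget_status(spend, budget) != "over"


print(budget_status(8_500, 10_000))
print(gate_passes(12_000, 10_000))
```

A stricter gate could also fail on `"alert"`; which behavior is appropriate depends on how much friction the team accepts in the deploy path.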

### Security Baseline

```yaml
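# Tetragon policy (eBPF runtime enforcement)
# Namespaced kind: applies only to pods in metadata.namespace.
# Verify field names against the Tetragon version you run.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicyNamespaced
metadata:
  name: block-shell
  namespace: production
spec:
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - index: 0
          type: "string" # path of the binary being executed
      selectors:
        - matchArgs:
            - index: 0
              operator: "Equal"
              values: ["/bin/sh", "/bin/bash"]
          matchActions:
            - action: Sigkill # kill the process attempting the exec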
```

## Agents

- **platform-engineer** - GitOps, IaC, Kubernetes, observability
- **data-engineer** - Pipelines, ETL, data infrastructure
- **finops-engineer** - Cloud cost optimization, FinOps framework

## Deep Dives

- [references/gitops-patterns.md](references/gitops-patterns.md)
- [references/kubernetes-gateway.md](references/kubernetes-gateway.md)
- [references/opentelemetry.md](references/opentelemetry.md)
- [references/finops-framework.md](references/finops-framework.md)

## Examples

- [examples/argo-cd-setup/](examples/argo-cd-setup/)
- [examples/pulumi-aws/](examples/pulumi-aws/)
- [examples/otel-stack/](examples/otel-stack/)

Overview

This skill helps with platform engineering tasks across infrastructure, CI/CD, Kubernetes, observability, and cloud cost optimization. It provides practical advice, patterns, and examples for IaC, GitOps, telemetry, service mesh, and FinOps to build reliable, observable, and cost-efficient platforms.

How this skill works

The skill inspects infrastructure and deployment contexts and offers concrete guidance, manifests, and pipeline snippets for common platform tasks. It recommends tooling and configuration patterns (OpenTofu/Pulumi, Argo CD/Flux, Crossplane, OpenTelemetry, Istio/Cilium) and maps them to outcomes like deploy automation, observability, and cost reduction. Use it to validate design choices, generate configuration examples, and produce checklists for operational readiness.

When to use it

  • Setting up or auditing GitOps pipelines (Argo CD, Flux) for Kubernetes deployments
  • Designing infrastructure as code with OpenTofu (Terraform-compatible) or Pulumi
  • Implementing observability with OpenTelemetry, Prometheus, and tracing pipelines
  • Defining service mesh and network policies (Istio Ambient, Cilium) for mTLS and traffic control
  • Running FinOps initiatives: tagging, rightsizing, spot/preemptible use, and budget gates
  • Building CI/CD workflows (GitHub Actions) to automate builds, image pushes, and manifest updates

Best practices

  • Treat infrastructure as code and store everything in Git; enforce GitOps for continuous deployments
  • Tag resources consistently (team, environment, cost-center) to achieve >95% cost allocation accuracy
  • Enable automated sync with pruning and self-healing in Argo CD (or pruning reconciliation in Flux) so cluster drift is corrected automatically
  • Instrument apps and platform components with OpenTelemetry and export to a collector for unified tracing/metrics
  • Rightsize compute and prefer spot/preemptible for stateless workloads; reserve baseline capacity where appropriate
  • Shift cost and security checks into CI/CD gates and include budget alerts and FinOps reviews in runbooks
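
To make the prune and self-heal settings mentioned above concrete, here is a conceptual sketch of what a sync has to decide. It is a simplified model written for this page, not Argo CD's or Flux's actual reconciliation logic, and the resource names and specs are invented.

```python
def plan_sync(desired, live, prune=True, self_heal=True):
    """Compare Git state (desired) with cluster state (live) and list actions."""
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec and self_heal:
            # Live object drifted from Git; self-heal reverts it.
            actions.append(("update", name))
    if prune:
        for name in live:
            if name not in desired:
                # Object was removed from Git; prune deletes it.
                actions.append(("delete", name))
    return actions


# Hypothetical states keyed by "kind/name".
desired = {"deploy/api": {"replicas": 3}, "svc/api": {"port": 8080}}
live = {"deploy/api": {"replicas": 5}, "cm/old": {}}

for action in plan_sync(desired, live):
    print(action)
```

With both flags off, the same inputs yield only the `create` action, which is why drift and orphaned objects accumulate when automated sync is left at its defaults.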

Example use cases

  • Create an Argo CD Application manifest that syncs production overlays and self-heals after drift
  • Author an OpenTofu module for AWS EC2, tagging instances with environment and cost-center
  • Add OpenTelemetry SDK initialization to a service and export traces to a collector for end-to-end tracing
  • Implement a GitHub Actions deploy job that builds an image, pushes to a registry, and updates kustomize overlays
  • Define a FinOps playbook: inventory, rightsizing plan, spot adoption, and CI budget gates

FAQ

Which IaC should I choose: OpenTofu or Pulumi?

Choose OpenTofu for Terraform-compatible declarative workflows and broad provider support; choose Pulumi if you need real programming languages for complex logic and richer abstractions.

How do I get actionable cost savings fast?

Start with visibility: tag resources, enable cost explorers, and run a rightsizing report. Move stateless workloads to spot instances and reserve baseline capacity where predictable.