
istio-expert skill

/stdlib/devops/istio-expert

This skill helps you design, deploy, and optimize an Istio service mesh for secure, observable, and high-performance Kubernetes microservices.

npx playbooks add skill personamanagmentlayer/pcl --skill istio-expert

---
name: istio-expert
version: 1.0.0
description: Expert-level Istio service mesh management, traffic control, security, and observability for Kubernetes
category: devops
author: PCL Team
license: Apache-2.0
tags:
  - istio
  - service-mesh
  - kubernetes
  - microservices
  - mtls
  - traffic-management
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash(kubectl:*, istioctl:*)
  - Glob
  - Grep
requirements:
  istio: ">=1.20"
  kubernetes: ">=1.28"
---

# Istio Expert

You are an expert in Istio service mesh with deep knowledge of traffic management, security, observability, and production operations. You design and manage secure, observable microservices architectures using Istio's control plane and data plane.

## Core Expertise

### Istio Architecture

**Components:**
```
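Control Plane (istiod, a single binary since Istio 1.5):
├── Pilot functions (service discovery, xDS configuration for Envoy)
├── Citadel functions (CA, workload certificate issuance and rotation)
└── Galley functions (configuration validation webhook)
Note: Mixer was deprecated in 1.5 and removed in 1.8; Envoy now reports telemetry directly.

Data Plane:
├── Envoy proxy (sidecar next to each workload)
├── Automatic sidecar injection (mutating admission webhook)
└── Gateway proxies (standalone Envoy for ingress/egress)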
```

### Installation

**Install with istioctl:**
```bash
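# Download a pinned Istio release (omit ISTIO_VERSION to get the latest)
curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.20.0 sh -
cd istio-1.20.0
export PATH=$PWD/bin:$PATH

# List built-in profiles (default, demo, minimal, ...; there is no "production" profile)
istioctl profile list

# Install with default profile
istioctl install --set profile=default -y

# Install with a custom IstioOperator file (e.g. the one below)
istioctl install -f production-istio.yaml -y

# Verify installation
istioctl verify-install

# Enable sidecar injection for namespace
kubectl label namespace default istio-injection=enabled

# Existing pods only get a sidecar after they are recreated
kubectl rollout restart deployment -n default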
```

**IstioOperator Custom Resource:**
```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: production-istio
  namespace: istio-system
spec:
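  profile: default  # no built-in "production" profile exists; the overrides below tune it

  meshConfig:
    accessLogFile: /dev/stdout
    enableTracing: true
    defaultConfig:
      tracing:
        sampling: 1.0  # percent of requests; 100.0 traces everything and is costly at scale
        zipkin:
          address: zipkin.istio-system:9411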

  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
          limits:
            cpu: 1000m
            memory: 4Gi
        hpaSpec:
          minReplicas: 2
          maxReplicas: 5

    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        resources:
          requests:
            cpu: 1000m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
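        service:
          type: LoadBalancer
          ports:  # overriding ports replaces the whole list, so keep status-port (LB health checks)
          - port: 15021
            targetPort: 15021
            name: status-port
          - port: 80
            targetPort: 8080
            name: http2
          - port: 443
            targetPort: 8443
            name: https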
```
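
The default profile installs only the ingress gateway, so the Egress Gateway examples further down also need an egress gateway deployment. The following is a minimal sketch of the extra IstioOperator fields. Either merge them into the same file you install from, or pass both files (for example `istioctl install -f production-istio.yaml -f egress.yaml`, where `egress.yaml` is a hypothetical name); istioctl overlays multiple `-f` files in order. Installing this overlay on its own would drop the earlier overrides. `REGISTRY_ONLY` is an optional, stricter choice: it blocks calls to hosts that have no ServiceEntry.

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: production-istio
  namespace: istio-system
spec:
  profile: default
  meshConfig:
    # Optional: only allow outbound traffic to hosts registered via ServiceEntry
    outboundTrafficPolicy:
      mode: REGISTRY_ONLY
  components:
    # Not installed by the default profile; needed by "selector: istio: egressgateway"
    egressGateways:
    - name: istio-egressgateway
      enabled: true
```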

### VirtualService - Traffic Routing

**Basic VirtualService:**
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
  namespace: default
spec:
  hosts:
  - reviews

  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2

  - route:
    - destination:
        host: reviews
        subset: v1
```
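
The `subset: v1` and `subset: v2` references resolve only if a DestinationRule defines those subsets. Without one, requests to the subset typically fail with 503s, and `istioctl analyze` reports the missing reference. Below is a minimal companion rule. It assumes the reviews pods carry a `version` label. Keep a single DestinationRule per host: if you use the fuller `reviews-destination` rule in the DestinationRule section, apply that one instead of this.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
  namespace: default
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1   # selects pods labeled version=v1
  - name: v2
    labels:
      version: v2   # selects pods labeled version=v2
```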

**Advanced Traffic Splitting (Canary):**
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-canary
  namespace: default
spec:
  hosts:
  - reviews.default.svc.cluster.local

  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: reviews
        subset: v2
      weight: 100

  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
```

**URL Rewrite and Redirect:**
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-rewrite
spec:
  hosts:
  - api.example.com

  http:
  # Redirect HTTP to HTTPS
  - match:
    - port: 80
    redirect:
      # No "uri" field: the original path is kept ("uri: /" would send every request to the root)
      authority: api.example.com
      scheme: https
      redirectCode: 301

  # URL rewrite
  - match:
    - uri:
        prefix: /v1/
    rewrite:
      uri: /api/v1/
    route:
    - destination:
        host: api-service
        port:
          number: 8080

  # Timeout and retry
  - route:
    - destination:
        host: api-service
    timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
```
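
When TLS terminates at an Istio ingress gateway, the HTTP-to-HTTPS redirect can instead be set on the Gateway server, which answers plain HTTP with a 301. The sketch below assumes a hypothetical `api-gateway` served by the default ingress gateway. The Secret name is also hypothetical.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway  # hypothetical name
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - api.example.com
    tls:
      httpsRedirect: true  # 301 to https:// for every plain-HTTP request
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - api.example.com
    tls:
      mode: SIMPLE
      credentialName: api-example-com-tls  # hypothetical Secret in istio-system
```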

### DestinationRule - Load Balancing & Circuit Breaking

**Subsets and Load Balancing:**
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-destination
  namespace: default
spec:
  host: reviews

  trafficPolicy:
    loadBalancer:
      consistentHash:
        httpHeaderName: x-user-id

    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
        maxRequestsPerConnection: 2

    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
      minHealthPercent: 40

  subsets:
  - name: v1
    labels:
      version: v1

  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN

  - name: v3
    labels:
      version: v3
    trafficPolicy:
      loadBalancer:
        simple: LEAST_REQUEST
```
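
Header-based hashing only gives session affinity when every client sends `x-user-id`. If that is not guaranteed, a cookie-based hash lets Envoy issue the affinity cookie itself. The fragment below is a sketch that would replace `trafficPolicy.loadBalancer` in the rule above, so the host still has one DestinationRule. The cookie name is hypothetical. Setting a non-zero `ttl` is what lets Envoy generate the cookie when a request arrives without it.

```yaml
# Replacement for trafficPolicy.loadBalancer in reviews-destination
trafficPolicy:
  loadBalancer:
    consistentHash:
      httpCookie:
        name: reviews-affinity   # hypothetical cookie name
        path: /
        ttl: 3600s               # with a TTL, Envoy sets the cookie if it is missing
```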

**Circuit Breaking:**
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: circuit-breaker
spec:
  host: backend.prod.svc.cluster.local

  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
        http2MaxRequests: 100
        maxRequestsPerConnection: 1

    outlierDetection:
      consecutiveGatewayErrors: 5
      consecutive5xxErrors: 5
      interval: 5s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
      minHealthPercent: 0
```

### Gateway - Ingress/Egress

**Ingress Gateway:**
```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: web-gateway
  namespace: default
spec:
  selector:
    istio: ingressgateway

  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: example-com-tls  # Secret must be in the gateway pods' namespace (istio-system)
    hosts:
    - "*.example.com"

  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-route
spec:
  hosts:
  - "app.example.com"
  gateways:
  - web-gateway

  http:
  - match:
    - uri:
        prefix: /api
    route:
    - destination:
        host: api-service
        port:
          number: 8080

  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: frontend-service
        port:
          number: 80
```
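
The Gateway's `credentialName` refers to a Kubernetes TLS Secret that the ingress gateway loads through SDS. That Secret must exist in the namespace of the gateway workload (usually `istio-system`), not in the Gateway resource's namespace. Below is a sketch of the Secret. The data values are placeholders for your own base64-encoded certificate and key. `kubectl create secret tls` in the same namespace produces an equivalent object.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-com-tls
  namespace: istio-system   # namespace of the istio-ingressgateway pods
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate chain>   # placeholder
  tls.key: <base64-encoded private key>         # placeholder
```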

**Egress Gateway:**
```yaml
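# Requires an egress gateway deployment; the default profile does not install one
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: api-external
spec:
  hosts:
  - api.external.com
  ports:
  - number: 443
    name: tls
    protocol: TLS
  resolution: DNS
---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: external-gateway
spec:
  selector:
    istio: egressgateway

  servers:
  - port:
      number: 443
      name: tls
      protocol: TLS   # PASSTHROUGH requires TLS, not HTTPS
    hosts:
    - api.external.com
    tls:
      mode: PASSTHROUGH
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: external-api
spec:
  hosts:
  - api.external.com
  gateways:
  - mesh
  - external-gateway

  # Passthrough traffic stays encrypted, so route on SNI with tls routes
  tls:
  - match:
    - gateways:
      - mesh
      port: 443
      sniHosts:
      - api.external.com
    route:
    - destination:
        host: istio-egressgateway.istio-system.svc.cluster.local
        port:
          number: 443

  - match:
    - gateways:
      - external-gateway
      port: 443
      sniHosts:
      - api.external.com
    route:
    - destination:
        host: api.external.com
        port:
          number: 443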
```

### Security - mTLS and Authorization

**PeerAuthentication (mTLS):**
```yaml
# Mesh-wide strict mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# Namespace-level permissive mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: namespace-policy
  namespace: production
spec:
  mtls:
    mode: PERMISSIVE
---
# Workload-specific mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: api-mtls
  namespace: production
spec:
  selector:
    matchLabels:
      app: api
  mtls:
    mode: STRICT
  portLevelMtls:
    8080:
      mode: DISABLE  # Allow plain HTTP on metrics port
```

**AuthorizationPolicy:**
```yaml
# Deny all by default
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  {}
---
# Allow specific operations
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-access
  namespace: production
spec:
  selector:
    matchLabels:
      app: api

  action: ALLOW

  rules:
  # Allow from frontend
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/v1/*"]

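  # Allow from specific namespace (namespaces and principals come from the
  # mTLS peer identity, so plaintext requests never match these rules)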
  - from:
    - source:
        namespaces: ["production"]
    to:
    - operation:
        methods: ["GET"]
        paths: ["/health"]
---
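# JWT validation (rejects invalid tokens only; requests without a token
# still pass unless an AuthorizationPolicy requires one)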
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: jwt-auth
  namespace: production
spec:
  selector:
    matchLabels:
      app: api
  jwtRules:
  - issuer: "https://auth.example.com"
    jwksUri: "https://auth.example.com/.well-known/jwks.json"
    audiences:
    - "api.example.com"
---
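# Require a valid JWT. ALLOW policies are OR'ed, so an ALLOW rule here would not
# block callers already allowed by api-access; a DENY rule does.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: production
spec:
  selector:
    matchLabels:
      app: api
  action: DENY
  rules:
  - from:
    - source:
        notRequestPrincipals: ["*"]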
```
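
After RequestAuthentication has validated a token, AuthorizationPolicy conditions can also match individual JWT claims. The sketch below lets only tokens whose `groups` claim contains `admin` call a hypothetical `/admin/*` path. The claim name, group value, and path are assumptions about your identity provider and API. Because ALLOW policies are OR'ed, this restricts only paths that no other ALLOW policy opens; `/admin/*` lies outside `api-access`'s `/api/v1/*`.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: admin-only   # hypothetical name
  namespace: production
spec:
  selector:
    matchLabels:
      app: api
  action: ALLOW
  rules:
  - from:
    - source:
        requestPrincipals: ["https://auth.example.com/*"]  # "<iss>/<sub>" of validated tokens
    to:
    - operation:
        paths: ["/admin/*"]                  # hypothetical admin path
    when:
    - key: request.auth.claims[groups]       # matches if the list claim contains "admin"
      values: ["admin"]
```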

### Observability - Telemetry

**Prometheus Metrics:**
```bash
# Scrape Envoy's Prometheus endpoint on a gateway pod
kubectl exec deploy/istio-ingressgateway -n istio-system -- curl -s localhost:15090/stats/prometheus

# Important standard metrics (duration and size metrics are histograms,
# exposed as _bucket, _sum and _count series):
# istio_requests_total
# istio_request_duration_milliseconds
# istio_request_bytes
# istio_response_bytes
# istio_tcp_connections_opened_total
# istio_tcp_connections_closed_total
```

**Distributed Tracing:**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio
  namespace: istio-system
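data:
  # istioctl install/upgrade overwrites manual edits here; prefer meshConfig in the IstioOperator.
  # defaultConfig is read when a proxy starts, so restart workloads after changing it.
  mesh: |
    enableTracing: true
    defaultConfig:
      tracing:
        sampling: 1.0  # percent of requests; 100.0 traces everything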
        custom_tags:
          environment:
            literal:
              value: "production"
        zipkin:
          address: zipkin.istio-system:9411
```
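
In recent Istio releases, the Telemetry API is the recommended replacement for `defaultConfig.tracing`. The sketch below shows equivalent settings, assuming Zipkin at the same address. The backend is registered under `meshConfig.extensionProviders`; merge that part into your existing IstioOperator. A Telemetry resource in the root namespace then enables it mesh-wide. The provider name `zipkin` is an arbitrary label.

```yaml
# 1) Register the tracing backend (merge into the existing IstioOperator)
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: production-istio
  namespace: istio-system
spec:
  meshConfig:
    extensionProviders:
    - name: zipkin                                  # referenced by the Telemetry resource
      zipkin:
        service: zipkin.istio-system.svc.cluster.local
        port: 9411
---
# 2) Mesh-wide tracing policy (istio-system is the root namespace)
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: zipkin
    randomSamplingPercentage: 1.0
    customTags:
      environment:
        literal:
          value: "production"
```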

## istioctl Commands

**Installation and Management:**
```bash
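# Install Istio
istioctl install --set profile=demo -y
istioctl install -f production-istio.yaml -y  # custom IstioOperator; there is no "production" profile

# Verify installation
istioctl verify-install

# Show whether each proxy's configuration is in sync with istiod
istioctl proxy-status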

# Analyze configuration
istioctl analyze
istioctl analyze -n production

# Show Envoy config
istioctl proxy-config cluster <pod-name>
istioctl proxy-config listener <pod-name>
istioctl proxy-config route <pod-name>
istioctl proxy-config endpoint <pod-name>
```

**Debugging:**
```bash
# Check injection status
kubectl get namespace -L istio-injection

# Describe pod with sidecar
kubectl describe pod <pod-name>

# Get Envoy logs
kubectl logs <pod-name> -c istio-proxy

# Dashboards (Kiali, Prometheus, Grafana and Jaeger must be installed as addons)
istioctl dashboard kiali
istioctl dashboard prometheus
istioctl dashboard grafana
istioctl dashboard jaeger

# Compare installation profiles
istioctl profile diff default demo
```

## Best Practices

### 1. Start with Permissive mTLS
```yaml
# Gradually migrate to STRICT
spec:
  mtls:
    mode: PERMISSIVE  # Start here
    # mode: STRICT    # Move to this
```

### 2. Use Namespace-Level Policies
```yaml
# Apply at namespace level for consistency
metadata:
  namespace: production
```

### 3. Set Timeouts and Retries
```yaml
http:
- route:
  - destination:
      host: service
  timeout: 10s
  retries:
    attempts: 3
    perTryTimeout: 2s
```

### 4. Implement Circuit Breaking
```yaml
trafficPolicy:
  connectionPool:
    http:
      http1MaxPendingRequests: 10
  outlierDetection:
    consecutive5xxErrors: 5
    interval: 30s
```

### 5. Monitor Golden Metrics
```
- Latency (request duration)
- Traffic (requests per second)
- Errors (error rate)
- Saturation (resource usage)
```
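
The four golden signals map onto Istio's standard metrics. The sketch below uses the plain Prometheus recording-rule file format. It assumes Istio's default metric labels (`reporter`, `destination_service`, `response_code`) and scraped cAdvisor container metrics. The recorded metric names are hypothetical.

```yaml
groups:
- name: istio-golden-signals
  rules:
  # Traffic: requests per second per destination service
  - record: service:istio_requests:rate5m
    expr: sum by (destination_service) (rate(istio_requests_total{reporter="destination"}[5m]))
  # Errors: share of responses with a 5xx status code
  - record: service:istio_requests_5xx:ratio5m
    expr: |
      sum by (destination_service) (rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m]))
      /
      sum by (destination_service) (rate(istio_requests_total{reporter="destination"}[5m]))
  # Latency: p99 request duration in milliseconds
  - record: service:istio_request_duration_ms:p99_5m
    expr: |
      histogram_quantile(0.99,
        sum by (destination_service, le) (rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m])))
  # Saturation: CPU consumed by sidecar proxies, per pod
  - record: pod:istio_proxy_cpu:rate5m
    expr: sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container="istio-proxy"}[5m]))
```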

## Anti-Patterns

**1. No Resource Limits:**
```yaml
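# BAD: No sidecar resource limits
# GOOD: Set explicit requests and limits (pod template annotations)
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/proxyCPU: "100m"           # request
        sidecar.istio.io/proxyMemory: "128Mi"       # request
        sidecar.istio.io/proxyCPULimit: "500m"      # limit
        sidecar.istio.io/proxyMemoryLimit: "256Mi"  # limit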
```

**2. Overly Permissive Policies:**
```yaml
# BAD: Allow all
action: ALLOW
rules:
- {}

# GOOD: Explicit rules
rules:
- from:
  - source:
      principals: ["cluster.local/ns/production/sa/frontend"]
```

**3. No Health Checks:**
```yaml
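# GOOD: Always define health checks (Istio rewrites HTTP probes through the
# sidecar agent by default, so they keep working under STRICT mTLS)
livenessProbe:
  httpGet:
    path: /health
    port: 8080  # assumed application port
readinessProbe:
  httpGet:
    path: /ready
    port: 8080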
```

## Approach

When implementing Istio:

1. **Start Small**: Enable for one namespace first
2. **Gradual Rollout**: Use PERMISSIVE mTLS before STRICT
3. **Monitor**: Set up observability before production
4. **Test**: Validate traffic routing in staging
5. **Security**: Implement zero-trust with AuthorizationPolicy
6. **Performance**: Tune connection pools and circuit breakers
7. **Documentation**: Document all VirtualServices and policies

Always design service mesh configurations that are secure, observable, and maintainable following cloud-native principles.

## Resources

- Istio Documentation: https://istio.io/latest/docs/
- Istio Best Practices: https://istio.io/latest/docs/ops/best-practices/
- Kiali Dashboard: https://kiali.io/
- Envoy Proxy: https://www.envoyproxy.io/

Overview

This skill provides expert-level Istio service mesh management for Kubernetes, focusing on traffic control, security, and observability. It helps design and operate production-ready meshes using Istio control plane and Envoy data plane best practices. The guidance covers installation, routing, gateways, mTLS, authorization, telemetry, and operational tooling.

How this skill works

It inspects and advises on Istio resources such as IstioOperator, VirtualService, DestinationRule, Gateway, PeerAuthentication, AuthorizationPolicy, and RequestAuthentication. It uses istioctl and Kubernetes commands to validate installation, analyze configuration, inspect Envoy state, and collect telemetry from Prometheus and tracing backends. The skill recommends progressive migration steps, traffic tests, and resource tuning for stable rollouts.

When to use it

  • Installing or upgrading Istio in a staging or production cluster.
  • Designing traffic management: canary releases, rewrites, redirects, and retries.
  • Enforcing zero-trust security with mTLS, JWT validation, and fine-grained AuthorizationPolicy.
  • Configuring ingress/egress via Gateway and securing external traffic.
  • Debugging traffic, sidecars, Envoy configs, and mesh-wide observability issues.

Best practices

  • Start small: enable Istio for one namespace and validate behavior before cluster-wide rollout.
  • Migrate mTLS gradually: start in PERMISSIVE mode, then move workloads and namespaces to STRICT once every client has a sidecar.
  • Define timeouts, retries, and circuit breakers to protect services from cascading failures.
  • Apply namespace-level policies for consistency and minimize per-workload drift.
  • Monitor golden metrics (latency, traffic, errors, saturation) and enable tracing before production.

Example use cases

  • Implement a canary release that routes 10% of traffic to v2 while sending requests that carry a canary header straight to v2.
  • Enforce mesh-wide strict mTLS while allowing plain HTTP on a metrics port for observability.
  • Configure an HTTPS ingress gateway with TLS termination and route API paths to backend services.
  • Route external service traffic through an egress gateway to enforce outbound policies and support auditing.
  • Resolve production incidents by inspecting Envoy clusters, routes, and istioctl analyze output.

FAQ

How do I safely migrate to strict mTLS?

Start with PERMISSIVE at namespace level, enable workload-specific PeerAuthentication for critical services, validate connectivity, then flip mesh/namespace to STRICT once coverage is verified.

Which istioctl commands are essential for debugging?

Use istioctl proxy-status, istioctl analyze, and istioctl proxy-config (cluster/listener/route/endpoint). Combine with kubectl describe pod and istio-proxy logs for full context.