service-mesh-implementation skill

This skill guides deploying and configuring a service mesh (Istio or Linkerd) for secure, observable, and reliable service-to-service communication.

npx playbooks add skill aj-geddes/useful-ai-prompts --skill service-mesh-implementation

---
name: service-mesh-implementation
description: Implement service mesh (Istio, Linkerd) for service-to-service communication, traffic management, security, and observability.
---

# Service Mesh Implementation

## Overview

Deploy and configure a service mesh to manage microservice communication, enable advanced traffic management, implement security policies, and provide comprehensive observability across distributed systems.

## When to Use

- Microservice communication management
- Cross-cutting security policies
- Traffic splitting and canary deployments
- Service-to-service authentication
- Request routing and retries
- Distributed tracing integration
- Circuit breaker patterns
- Mutual TLS between services

## Implementation Examples

### 1. **Istio Core Setup**

```yaml
# istio-setup.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: istio-system
  # No istio-injection label here: the control plane namespace should not be injected

---
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-config
  namespace: istio-system
spec:
  profile: default  # "production" is not a built-in profile; default is the one recommended for production
  revision: "1-13"

  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 500m
            memory: 2048Mi
          limits:
            cpu: 2000m
            memory: 4096Mi
        replicaCount: 3

    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
        k8s:
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 2000m
              memory: 1024Mi
          service:
            type: LoadBalancer
            ports:
              - port: 80
                targetPort: 8080
                name: http2
              - port: 443
                targetPort: 8443
                name: https

    egressGateways:
      - name: istio-egressgateway
        enabled: true

  meshConfig:
    enableAutoMTLS: true
    outboundTrafficPolicy:
      mode: ALLOW_ANY

    accessLogFile: /dev/stdout
    accessLogFormat: |
      [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%"
      %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT%
      "%DURATION%" "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%"

---
# Enable sidecar injection for namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled
```

### 2. **Virtual Service and Destination Rule**

```yaml
# virtual-service-config.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-service
  namespace: production
spec:
  hosts:
    - api-service
    - api-service.production.svc.cluster.local
  http:
    # Canary: 10% to v2, 90% to v1
    - match:
        - uri:
            prefix: /api/v1
      route:
        - destination:
            host: api-service
            subset: v1
          weight: 90
        - destination:
            host: api-service
            subset: v2
          weight: 10
      timeout: 30s
      retries:
        attempts: 3
        perTryTimeout: 10s

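    # API v2 for testing. Rules are evaluated in order, so this header match
    # only sees requests that did not match the /api/v1 prefix above.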
    - match:
        - headers:
            user-agent:
              regex: ".*Chrome.*"
      route:
        - destination:
            host: api-service
            subset: v2
      timeout: 30s

    # Default route
    - route:
        - destination:
            host: api-service
            subset: v1
          weight: 100
      timeout: 30s
      retries:
        attempts: 3
        perTryTimeout: 10s

---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-service
  namespace: production
spec:
  host: api-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        maxRequestsPerConnection: 2
        h2UpgradePolicy: UPGRADE

    outlierDetection:
      # Eject a host after 5 consecutive 5xx responses; never eject more than half the pool
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

  subsets:
    - name: v1
      labels:
        version: v1
      trafficPolicy:
        connectionPool:
          http:
            http1MaxPendingRequests: 50

    - name: v2
      labels:
        version: v2
      trafficPolicy:
        connectionPool:
          http:
            http1MaxPendingRequests: 100
```
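The `timeout`, `attempts`, and `perTryTimeout` values interact, so it helps to work out the longest a caller can wait. The sketch below is a rough upper bound only. It assumes `attempts` counts retries on top of the initial try and ignores the short backoff the proxy adds between retries; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Rough upper bound, in seconds, on how long one request can take.
# Assumption: `attempts` is the number of retries after the first try,
# and retry backoff is small enough to ignore.
worst_case_seconds() {
    local timeout=$1 attempts=$2 per_try=$3
    local tries=$(( attempts + 1 ))
    local retry_budget=$(( tries * per_try ))
    # The overall route timeout caps the total, whatever the retry budget is
    if (( retry_budget < timeout )); then
        echo "$retry_budget"
    else
        echo "$timeout"
    fi
}

# Values from the api-service VirtualService above: 30s, 3 attempts, 10s per try
worst_case_seconds 30 3 10
```

With these values the retry budget (4 tries of 10s) exceeds the 30s route timeout, so the last retry may be cut short. Lowering `perTryTimeout` or raising `timeout` would let every try run to completion.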

### 3. **Security Policies**

```yaml
# security-config.yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT  # Enforce mTLS for all workloads

---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-service-authz
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/web-service"]
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/v1/*"]

    # Allow health checks
    - to:
        - operation:
            methods: ["GET"]
            paths: ["/health"]

---
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: api-service-authn
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-service
  jwtRules:
    - issuer: https://auth.mycompany.com
      jwksUri: https://auth.mycompany.com/.well-known/jwks.json
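      audiences:
        - api-service
# RequestAuthentication only rejects invalid tokens. To reject requests that carry
# no token at all, add an AuthorizationPolicy rule that requires requestPrincipals.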
```
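The `principals` entry in the AuthorizationPolicy follows the pattern `<trust-domain>/ns/<namespace>/sa/<service-account>`, and it is only populated when the connection uses mTLS. A small helper, sketched here with a hypothetical function name, can build the string for other callers so the format is not typed by hand.

```bash
#!/usr/bin/env bash
# Builds the principal string used in AuthorizationPolicy `principals`:
#   <trust-domain>/ns/<namespace>/sa/<service-account>
# The trust domain defaults to cluster.local, matching the policy above.
workload_principal() {
    local namespace=$1 service_account=$2 trust_domain=${3:-cluster.local}
    echo "${trust_domain}/ns/${namespace}/sa/${service_account}"
}

# Prints cluster.local/ns/production/sa/web-service
workload_principal production web-service
```

The calling workload must actually run under that service account for the rule to match.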

### 4. **Observability Configuration**

```yaml
# observability-config.yaml
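# A namespace should have only one Telemetry resource without a workload
# selector, so metrics and tracing are configured together here.
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: namespace-telemetry
  namespace: production
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        # response_code and destination_service_name are already standard
        # labels; request_path is added here and can raise metric cardinality.
        - match:
            metric: REQUEST_COUNT
            mode: CLIENT_AND_SERVER
          tagOverrides:
            request_path:
              value: request.url_path
  tracing:
    - providers:
        - name: jaeger  # must match a provider defined in meshConfig.extensionProviders
      randomSamplingPercentage: 1.0  # 100.0 would trace every request
      useRequestIdForTraceSampling: true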

---
# Grafana Dashboard ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-dashboard
  namespace: monitoring
data:
  istio-mesh.json: |
    {
      "dashboard": {
        "title": "Istio Mesh",
        "panels": [
          {
            "title": "Request Rate",
            "targets": [
              {
                "expr": "sum(rate(istio_requests_total[5m]))"
              }
            ]
          },
          {
            "title": "Error Rate",
            "targets": [
              {
                "expr": "sum(rate(istio_requests_total{response_code=~\"5..\"}[5m])) / sum(rate(istio_requests_total[5m]))"
              }
            ]
          },
          {
            "title": "Latency P95",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, sum(rate(istio_request_duration_milliseconds_bucket[5m])) by (le))"
              }
            ]
          }
        ]
      }
    }
```
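The sampling percentage in the tracing configuration directly scales how many traces reach the backend. The sketch below estimates daily trace volume, assuming each sampled request produces one trace. The function name and the 2000 requests/second load are hypothetical, chosen only to illustrate the arithmetic.

```bash
#!/usr/bin/env bash
# Estimates traces recorded per day for a request rate (requests/second)
# and a sampling percentage. awk is used so fractional percentages work.
# Assumption: one trace per sampled request.
traces_per_day() {
    awk -v rps="$1" -v pct="$2" 'BEGIN { printf "%.0f\n", rps * 86400 * pct / 100 }'
}

# 2000 requests/second is a hypothetical load, not a figure from this guide
traces_per_day 2000 100   # prints 172800000
traces_per_day 2000 1     # prints 1728000
```

At the hypothetical load, dropping from 100% to 1% sampling reduces daily volume from roughly 173 million traces to under 2 million.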

### 5. **Service Mesh Deployment Script**

```bash
#!/bin/bash
# deploy-istio.sh - Install and configure Istio

set -euo pipefail

VERSION="1.13.0"
NAMESPACE="istio-system"

echo "Installing Istio $VERSION..."

# Download Istio
if [ ! -d "istio-$VERSION" ]; then
    echo "Downloading Istio..."
    curl -L https://istio.io/downloadIstio | ISTIO_VERSION=$VERSION sh -
fi

cd "istio-$VERSION"

# Add istioctl to PATH
export PATH=$PWD/bin:$PATH

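# Verify cluster
echo "Verifying cluster compatibility..."
# precheck inspects the cluster before install; istioctl analyze is for existing mesh config
istioctl x precheck

# Install Istio ("default" is the built-in profile recommended for production)
echo "Installing Istio on cluster..."
istioctl install --set profile=default -y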

# Verify installation
echo "Verifying installation..."
kubectl get ns $NAMESPACE
kubectl get pods -n $NAMESPACE

# Label namespaces for sidecar injection
echo "Configuring sidecar injection..."
kubectl label namespace production istio-injection=enabled --overwrite

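# Restart workloads so new pods pick up the sidecar, then wait for each rollout
echo "Restarting workloads for sidecar injection..."
kubectl rollout restart deployment -n production
for deploy in $(kubectl get deployment -n production -o name); do
    kubectl rollout status "$deploy" -n production --timeout=180s
done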

echo "Istio installation complete!"

# Show status
istioctl version
```
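The script restarts workloads but does not confirm that the new pods actually carry a proxy. One way to check is sketched below: a filter that reads pod and container names and prints every pod without the proxy container. Both function names are hypothetical. The sketch assumes classic sidecar injection, where `istio-proxy` appears under `spec.containers`; with native sidecars the proxy may be listed as an init container instead.

```bash
#!/usr/bin/env bash
# Reads lines of the form "<pod> <container> <container> ..." on stdin and
# prints the name of every pod that lacks the given proxy container.
pods_missing_proxy() {
    local proxy=${1:-istio-proxy}
    local pod containers
    while read -r pod containers; do
        [[ -z "$pod" ]] && continue
        # Pad with spaces so only whole container names match
        if [[ " $containers " != *" $proxy "* ]]; then
            echo "$pod"
        fi
    done
}

# Lists pod and container names for a namespace in the format above.
# Needs kubectl and cluster access, so it is defined here but not called.
list_pod_containers() {
    kubectl get pods -n "$1" \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].name}{"\n"}{end}'
}
```

Against a live cluster this could be run as `list_pod_containers production | pods_missing_proxy`; empty output would mean every pod has the proxy. Passing `linkerd-proxy` as the argument adapts the check to Linkerd.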

## Service Mesh Patterns

### Traffic Management
- **Canary Deployments**: Gradually shift traffic
- **A/B Testing**: Route based on headers
- **Circuit Breaking**: Fail fast with outlier detection
- **Rate Limiting**: Control request flow
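A gradual canary shift can be planned as a sequence of weight pairs before any VirtualService is edited. The sketch below prints the v1/v2 split for each step; the function name and the step values are hypothetical, and each printed pair would still need to be applied to the route weights by hand or by other tooling.

```bash
#!/usr/bin/env bash
# Prints one "v1=<weight> v2=<weight>" line per canary step.
# Each argument is the percentage of traffic sent to v2 at that step.
canary_schedule() {
    local v2
    for v2 in "$@"; do
        # Accept only whole numbers from 0 to 100 (10# forces base 10)
        if ! [[ "$v2" =~ ^[0-9]+$ ]] || (( 10#$v2 > 100 )); then
            echo "invalid weight: $v2" >&2
            return 1
        fi
        echo "v1=$(( 100 - 10#$v2 )) v2=$(( 10#$v2 ))"
    done
}

# Example steps: 10% -> 25% -> 50% -> 100% of traffic on v2
canary_schedule 10 25 50 100
```

Keeping the two weights derived from a single number avoids splits that do not add up to 100.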

### Security
- **mTLS**: Mutual authentication
- **Authorization Policies**: Fine-grained access control
- **JWT Validation**: Token verification
- **Encryption**: Automatic in-transit encryption

## Best Practices

### ✅ DO
- Enable mTLS for all workloads
- Implement proper authorization policies
- Use virtual services for traffic management
- Enable distributed tracing
- Monitor resource usage (CPU, memory)
- Use appropriate sampling rates for tracing
- Implement circuit breakers
- Use namespace isolation

### ❌ DON'T
- Disable mTLS in production
- Allow permissive traffic policies
- Ignore observability setup
- Deploy without resource requests/limits
- Skip sidecar injection validation
- Use 100% sampling in high-traffic systems
- Mix service versions without proper routing
- Neglect authorization policies

## Resources

- [Istio Official Documentation](https://istio.io/latest/docs/)
- [Linkerd Documentation](https://linkerd.io/2/overview/)
- [Service Mesh Interface (SMI)](https://smi-spec.io/)
- [Istio Security Best Practices](https://istio.io/latest/docs/concepts/security/)

Overview

This skill implements a service mesh (Istio or Linkerd) to manage service-to-service communication, traffic control, security, and observability across Kubernetes clusters. It provides production-ready manifests, traffic routing examples, security policies, observability configuration, and an automated install script. The goal is to enable secure, observable, and controlled microservice interactions with minimal manual steps.

How this skill works

The skill deploys and configures control plane components, enables sidecar injection for workloads, and applies VirtualService/DestinationRule patterns for traffic splitting, retries, timeouts, and outlier detection. It enforces mTLS and authorization rules, integrates tracing and metrics providers (Jaeger/Prometheus), and offers Grafana dashboards and scripts to automate installation and validation.

When to use it

  • You run many microservices that require secure service-to-service communication
  • You need canary releases, A/B testing, or precise traffic shaping
  • You require centralized observability: tracing, metrics, logs, and dashboards
  • You want platform-level authentication/authorization and mTLS for in-transit encryption
  • You need circuit breakers, rate limiting, or outlier detection to improve resilience

Best practices

  • Enable mTLS cluster-wide and validate sidecar injection after deployment
  • Define fine-grained AuthorizationPolicy and RequestAuthentication rules for APIs
  • Use VirtualService and DestinationRule subsets for gradual rollouts and canaries
  • Set resource requests/limits for control plane components and workloads
  • Enable distributed tracing with sensible sampling; avoid 100% sampling in high-traffic systems
  • Monitor control plane and data plane metrics and configure alerting for ejection/latency anomalies

Example use cases

  • Canary deploy api-service: split traffic 90/10 between v1 and v2 using VirtualService weights
  • Enforce strict mTLS and JWT validation for internal APIs to block unauthorized callers
  • Implement outlier detection and circuit breaking to eject failing instances automatically
  • Route traffic by header (user-agent) for A/B experiments without changing application code
  • Collect request metrics in Prometheus and trace requests end-to-end in Jaeger for incident diagnosis

FAQ

Can I use this with Istio and Linkerd interchangeably?

The manifests and script in this skill target Istio. The same patterns apply to Linkerd, but its resources and APIs differ, so adapt the manifests and telemetry integrations to the provider you choose.

Is it safe to enable 100% trace sampling in production?

Not in high-traffic systems: 100% sampling can overwhelm storage and observability backends, so use targeted or adaptive sampling instead.

How do I validate sidecar injection worked?

Check that each pod has the proxy container (istio-proxy for Istio, linkerd-proxy for Linkerd), run istioctl analyze, and review the injection logs and pod annotations.