implementing-service-mesh skill

This skill helps you implement production-grade service mesh deployments with Istio, Linkerd, or Cilium for secure, observable microservices.

npx playbooks add skill ancoleman/ai-design-components --skill implementing-service-mesh

---
name: implementing-service-mesh
description: Implement production-ready service mesh deployments with Istio, Linkerd, or Cilium. Configure mTLS, authorization policies, traffic routing, and progressive delivery patterns for secure, observable microservices. Use when setting up service-to-service communication, implementing zero-trust security, or enabling canary deployments.
---

# Service Mesh Implementation

## Purpose

Configure and deploy service mesh infrastructure for Kubernetes environments. Enable secure service-to-service communication with mutual TLS, implement traffic management policies, configure authorization controls, and set up progressive delivery strategies. The mesh abstracts network complexity while providing observability, security, and resilience for microservices.

## When to Use

Invoke this skill for requests such as:

- "Set up service mesh with mTLS"
- "Configure Istio traffic routing"
- "Implement canary deployments"
- "Secure microservices communication"
- "Add authorization policies to services"
- "Traffic splitting between versions"
- "Multi-cluster service mesh setup"
- "Configure ambient mode vs sidecar"
- "Set up circuit breaker configuration"
- "Enable distributed tracing"

## Service Mesh Selection

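Choose based on requirements and constraints. The latency figures below are the added latency from enabling mTLS, expressed as a percentage; they depend heavily on the benchmark and workload, so treat them as relative indicators and measure in your own environment.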

**Istio Ambient (Recommended for most):**
- 8% latency overhead with mTLS (vs 166% sidecar mode)
- Enterprise features, multi-cloud, advanced L7 routing
- Sidecar-less L4 (ztunnel) + optional L7 (waypoint)

**Linkerd (Simplicity priority):**
- 33% latency overhead (lowest sidecar)
- Rust-based micro-proxy, automatic mTLS
- Best for small-medium teams, easy adoption

**Cilium (eBPF-native):**
- 99% latency overhead, kernel-level enforcement
- Advanced networking, sidecar-less by design
- Best for eBPF infrastructure, future-proof

For detailed comparison matrix and architecture trade-offs, see `references/decision-tree.md`.

## Core Concepts

### Data Plane Architectures

- **Sidecar:** Proxy per pod, fine-grained L7 control, higher overhead
- **Sidecar-less:** Shared node proxies (Istio Ambient) or eBPF (Cilium), lower overhead

**Istio Ambient Components:**
- ztunnel: Per-node L4 proxy for mTLS
- waypoint: Optional per-namespace L7 proxy for HTTP routing
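
As a rough sketch of how the two components fit together, the manifests below declare a waypoint as a Kubernetes Gateway API `Gateway` and enroll a namespace to use it. This assumes the Gateway API CRDs are installed and reuses the `production` namespace from the later examples; the Gateway name `waypoint` is an illustrative choice, not something this skill prescribes.

```yaml
# Waypoint proxy that handles L7 processing for services in the namespace
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: waypoint              # illustrative name
  namespace: production
  labels:
    istio.io/waypoint-for: service
spec:
  gatewayClassName: istio-waypoint
  listeners:
  - name: mesh
    port: 15008
    protocol: HBONE
---
# Enroll the namespace in ambient mode (ztunnel) and point it at the waypoint
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio.io/dataplane-mode: ambient
    istio.io/use-waypoint: waypoint
```

Without the `istio.io/use-waypoint` label, traffic in the namespace is still encrypted by ztunnel at L4 but no L7 policy or routing is applied.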

### Traffic Management

- **Routing:** Path, header, weight-based traffic distribution
- **Resilience:** Retries, timeouts, circuit breakers, fault injection
- **Load Balancing:** Round robin, least connections, consistent hash
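
To make the resilience settings concrete, here is a minimal Istio sketch that combines a request timeout with bounded retries. The resource name `backend-resilience` and the specific values are assumptions for illustration; in practice these fields would usually be merged into whichever VirtualService already owns the `backend` host, since multiple VirtualServices for one host can conflict.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-resilience    # hypothetical name
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
    timeout: 3s               # overall deadline for the request, retries included
    retries:
      attempts: 3             # at most 3 retry attempts
      perTryTimeout: 1s       # deadline for each individual attempt
      retryOn: 5xx,reset,connect-failure
```

Keeping `attempts × perTryTimeout` at or below `timeout` avoids retries that can never complete.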

### Security Model

- **mTLS:** Automatic encryption, certificate rotation, zero app changes
- **Modes:** STRICT (reject plaintext), PERMISSIVE (accept both)
- **Authorization:** Default-deny, identity-based (not IP), L7 policies
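
PERMISSIVE mode is mainly useful while workloads are still being onboarded to the mesh. A minimal Istio sketch, assuming a hypothetical `staging` namespace:

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: staging          # hypothetical namespace
spec:
  mtls:
    mode: PERMISSIVE          # accept both plaintext and mTLS during migration
```

Once every client in the namespace sends mTLS, switching `mode` to `STRICT` rejects any remaining plaintext traffic.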

## Istio Configuration

Istio uses Custom Resource Definitions for traffic management and security.

### VirtualService (Routing)

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-canary
spec:
  hosts:
  - backend
  http:
  - route:
    - destination:
        host: backend
        subset: v1
      weight: 90
    - destination:
        host: backend
        subset: v2
      weight: 10
```
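
The `v1` and `v2` subsets referenced above must be defined in a DestinationRule for the same host, otherwise the routes have no endpoints to resolve to. A minimal sketch, assuming the backend pods carry a `version` label (the label key and the resource name are assumptions, not part of the original example):

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-subsets       # hypothetical name
spec:
  host: backend
  subsets:
  - name: v1
    labels:
      version: v1             # assumed pod label
  - name: v2
    labels:
      version: v2             # assumed pod label
```

These subsets can also live in the same DestinationRule as the traffic policy for the host, which keeps all settings for `backend` in one place.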

### DestinationRule (Traffic Policy)

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: backend-circuit-breaker
spec:
  host: backend
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
    outlierDetection:
      consecutive5xxErrors: 5   # eject a host after 5 consecutive 5xx responses
      interval: 30s
      baseEjectionTime: 30s
```

### PeerAuthentication (mTLS)

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

### AuthorizationPolicy (Access Control)

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
  namespace: production
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/production/sa/frontend
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
```

For advanced patterns (fault injection, mirroring, gateways), see `references/istio-patterns.md`.

## Linkerd Configuration

Linkerd emphasizes simplicity with automatic mTLS.

### HTTPRoute (Traffic Splitting)

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: HTTPRoute
metadata:
  name: backend-canary
spec:
  parentRefs:
  - name: backend
    kind: Service
    group: core               # Service belongs to the core API group
    port: 8080
  rules:
  - backendRefs:
    - name: backend-v1
      port: 8080
      weight: 90
    - name: backend-v2
      port: 8080
      weight: 10
```

### ServiceProfile (Retries/Timeouts)

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: backend.production.svc.cluster.local
spec:
  routes:
  - name: GET /api/data
    condition:
      method: GET
      pathRegex: /api/data
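    timeout: 3s
    isRetryable: true         # opt this route in to retries
  retryBudget:                # budget applies to the whole service, not to one route
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s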
```

### AuthorizationPolicy

```yaml
apiVersion: policy.linkerd.io/v1alpha1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend
spec:
  targetRef:
    group: policy.linkerd.io
    kind: Server
    name: backend-api
  requiredAuthenticationRefs:
  - name: frontend-identity
    kind: MeshTLSAuthentication
    group: policy.linkerd.io
```
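
The policy above refers to a `Server` named `backend-api` and a `MeshTLSAuthentication` named `frontend-identity`, which need to exist in the same namespace. A sketch of what they might look like, assuming everything lives in `production`, the backend pods are labeled `app: backend` and listen on port 8080, and the client runs under a `frontend` ServiceAccount. The `Server` API version served by your Linkerd release may be newer than the one shown.

```yaml
# Server: selects the backend pods and the port the policy protects
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: backend-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend            # assumed pod label
  port: 8080
  proxyProtocol: HTTP/1
---
# MeshTLSAuthentication: the mesh identities allowed to connect
apiVersion: policy.linkerd.io/v1alpha1
kind: MeshTLSAuthentication
metadata:
  name: frontend-identity
  namespace: production
spec:
  identityRefs:
  - kind: ServiceAccount
    name: frontend            # assumed client ServiceAccount
```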

For complete patterns and mTLS verification, see `references/linkerd-patterns.md`.

## Cilium Configuration

Cilium uses eBPF for kernel-level enforcement.

### CiliumNetworkPolicy (L3/L4/L7)

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-access
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "8080"
      rules:
        http:
        - method: GET
          path: "/api/.*"
```
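
Where Cilium is installed with SPIRE-based mutual authentication, a policy can additionally require that peers are authenticated before traffic is allowed. The following is a sketch under that assumption; the policy name is hypothetical.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-mutual-auth   # hypothetical name
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    authentication:
      mode: "required"        # only allow traffic from mutually authenticated peers
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
```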

### DNS-Based Egress

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: external-api-access
spec:
  endpointSelector:
    matchLabels:
      app: backend
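  egress:
  # Allow DNS lookups through kube-dns so Cilium can learn FQDN-to-IP mappings
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  # Allow HTTPS only to the named external host
  - toFQDNs:
    - matchName: "api.github.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP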
```

For mTLS with SPIRE and eBPF patterns, see `references/cilium-patterns.md`.

## Security Implementation

### Zero-Trust Architecture

1. Enable strict mTLS (encrypt all traffic)
2. Default-deny authorization policies
3. Explicit allow rules (least privilege)
4. Identity-based access control
5. Audit logging

**Example (Istio):**

```yaml
# Strict mTLS
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Deny all by default
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: production
spec: {}
```
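
Step 5 (audit logging) has no manifest in the example above. One possible approach in Istio is to turn on Envoy access logs mesh-wide through the Telemetry API, sketched below. The resource name is an assumption, and this covers proxy access logs only, not Kubernetes API audit logs.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-access-logs      # hypothetical name
  namespace: istio-system     # root namespace, so the setting applies mesh-wide
spec:
  accessLogging:
  - providers:
    - name: envoy             # Istio's built-in Envoy access log provider
```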

### Certificate Management

- Automatic rotation (24h TTL default)
- Zero-downtime updates
- External CA integration (cert-manager)
- SPIFFE/SPIRE for workload identity

For JWT authentication and external authorization (OPA), see `references/security-patterns.md`.

## Progressive Delivery

### Canary Deployment

Gradually shift traffic with monitoring.

**Stages:**
1. Deploy v2 with 0% traffic
2. Route 10% to v2, monitor metrics
3. Increase: 25% → 50% → 75% → 100%
4. Cleanup v1 deployment

**Monitor:** Error rate, latency (P95/P99), throughput

### Blue/Green Deployment

Instant cutover with quick rollback.

**Process:**
1. Deploy green alongside blue
2. Test green with header routing
3. Instant cutover to green
4. Rollback to blue if needed
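
Step 2 can be sketched in Istio as a header match that sends test traffic to green while everyone else stays on blue. This assumes `blue` and `green` subsets are defined in a DestinationRule for `backend`; the header name `x-release-track` and the resource name are made up for illustration.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: backend-blue-green    # hypothetical name
spec:
  hosts:
  - backend
  http:
  # Requests carrying the test header go to green
  - match:
    - headers:
        x-release-track:      # hypothetical header
          exact: green
    route:
    - destination:
        host: backend
        subset: green
  # All other traffic stays on blue
  - route:
    - destination:
        host: backend
        subset: blue
```

Cutover then amounts to changing the default route's subset to `green`, and rollback to changing it back to `blue`.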

### Automated Rollback (Flagger)

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: backend
spec:
  targetRef:
    kind: Deployment
    name: backend
  service:
    port: 8080
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
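    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99               # minimum success rate, in percent
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500              # maximum P99 latency, in milliseconds
      interval: 1m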
```

For A/B testing and detailed patterns, see `references/progressive-delivery.md`.

## Multi-Cluster Mesh

Extend mesh across Kubernetes clusters.

**Use Cases:** HA, geo-distribution, compliance, DR

**Istio Multi-Primary:**

```bash
# Install on cluster 1
istioctl install --set values.global.meshID=mesh1 \
  --set values.global.multiCluster.clusterName=cluster1

# Exchange secrets for service discovery
# (older istioctl releases expose this as `istioctl x create-remote-secret`)
istioctl create-remote-secret --context=cluster2 --name=cluster2 | \
  kubectl apply -f - --context=cluster1
```

**Linkerd Multi-Cluster:**

```bash
# Link clusters: generate credentials from cluster2, apply them to cluster1
linkerd --context=cluster2 multicluster link --cluster-name cluster2 | \
  kubectl --context=cluster1 apply -f -

# Export service
kubectl label svc/backend mirror.linkerd.io/exported=true
```

For complete setup and cross-cluster patterns, see `references/multi-cluster.md`.

## Installation

### Istio Ambient Mode

```bash
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=ambient -y
kubectl label namespace production istio.io/dataplane-mode=ambient
```

### Linkerd

```bash
curl -sL https://run.linkerd.io/install-edge | sh
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
kubectl annotate namespace production linkerd.io/inject=enabled
```

### Cilium

```bash
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set authentication.mutual.spire.enabled=true \
  --set authentication.mutual.spire.install.enabled=true
```

## Troubleshooting

### mTLS Issues

```bash
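# Istio: Inspect the mTLS and policy settings that apply to a service
istioctl x describe service frontend -n production

# Linkerd: Check edges (requires the viz extension)
linkerd viz edges deployment -n production

# Cilium: List authenticated connections (runs inside an agent pod)
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf auth list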
```

### Traffic Routing Issues

```bash
# Istio: Analyze config
istioctl analyze -n production

# Linkerd: Tap traffic (requires the viz extension)
linkerd viz tap deployment/backend -n production

# Cilium: Observe flows
hubble observe --namespace production
```

For complete debugging guide and solutions, see `references/troubleshooting.md`.

## Integration with Other Skills

- **kubernetes-operations:** Cluster setup, namespaces, RBAC
- **security-hardening:** Container security, secret management
- **infrastructure-as-code:** Terraform/Helm for mesh deployment
- **building-ci-pipelines:** Automated canary, integration tests
- **performance-engineering:** Latency benchmarking, optimization

## Reference Files

- `references/decision-tree.md` - Service mesh selection and comparison
- `references/istio-patterns.md` - Istio configuration examples
- `references/linkerd-patterns.md` - Linkerd patterns and best practices
- `references/cilium-patterns.md` - Cilium eBPF policies and mTLS
- `references/security-patterns.md` - Zero-trust and authorization
- `references/progressive-delivery.md` - Canary, blue/green, A/B testing
- `references/multi-cluster.md` - Multi-cluster setup and federation
- `references/troubleshooting.md` - Common issues and debugging

Overview

This skill implements production-ready service mesh deployments using Istio, Linkerd, or Cilium. It delivers secure service-to-service communication with mTLS, authorization policies, traffic routing, and progressive delivery patterns for microservices. The goal is zero-trust security, observability, and resilient traffic management for Kubernetes environments.

How this skill works

The skill automates mesh selection, installation, and configuration for chosen platforms (Istio Ambient/sidecar, Linkerd, Cilium eBPF). It applies mTLS, PeerAuthentication/AuthorizationPolicy rules, traffic routing objects (VirtualService, HTTPRoute, CiliumNetworkPolicy), and progressive-delivery resources (canary/Flagger). It also includes multi-cluster federation steps, certificate management, and troubleshooting checks to validate health and policy enforcement.

When to use it

  • Set up service-to-service encryption with automatic mTLS
  • Implement canary, blue/green, or A/B progressive delivery
  • Add identity-based authorization policies and default-deny controls
  • Configure advanced L7 routing, retries, and circuit breakers
  • Deploy sidecar, sidecar-less, or eBPF-native architectures
  • Extend a mesh across multiple Kubernetes clusters

Best practices

  • Choose mesh by trade-offs: Istio for features, Linkerd for simplicity, Cilium for eBPF performance
  • Start with permissive mTLS in staging, then move to STRICT in production
  • Enforce default-deny and explicit allow rules for least-privilege access
  • Use weighted routing + automated analysis (Flagger) for safe rollouts
  • Monitor p95/p99 latency and error-rate thresholds before increasing traffic weight
  • Automate cert rotation and integrate external CAs or SPIRE for workload identity

Example use cases

  • Secure internal APIs across namespaces with Istio Ambient and PeerAuthentication STRICT
  • Run a 10→50→100% canary flow using VirtualService/DestinationRule and Flagger analysis
  • Adopt Linkerd for rapid mTLS enablement and simple traffic splits for small teams
  • Apply CiliumNetworkPolicy to enforce L3/L4/L7 policies with eBPF performance
  • Configure multi-cluster Istio multi-primary for geo-distribution and disaster recovery

FAQ

Which mesh should I pick for a new greenfield platform?

If you need advanced L7 features and multi-cloud scale pick Istio; for quick adoption and low operational overhead pick Linkerd; for eBPF-native performance and kernel-level policies pick Cilium.

How do I minimize latency when enabling mTLS?

Use sidecar-less or ambient modes (Istio Ambient or Cilium eBPF) to reduce per-pod overhead, tune proxy placement, and benchmark p95/p99 after enabling mTLS.