This skill helps you configure and automate DNS records, TTL strategies, and DNS-as-code across providers to ensure reliable domain resolution.
`npx playbooks add skill ancoleman/ai-design-components --skill managing-dns`
---
name: managing-dns
description: Manage DNS records, TTL strategies, and DNS-as-code automation for infrastructure. Use when configuring domain resolution, automating DNS from Kubernetes with external-dns, setting up DNS-based load balancing, or troubleshooting propagation issues across cloud providers (Route53, Cloud DNS, Azure DNS, Cloudflare).
---
# DNS Management
Configure and automate DNS records with proper TTL strategies, DNS-as-code patterns, and troubleshooting techniques.
## Purpose
Guide DNS configuration for applications, infrastructure, and services with focus on:
- Record type selection (A, AAAA, CNAME, MX, TXT, SRV, CAA)
- TTL strategies for propagation and caching
- DNS-as-code automation (external-dns, OctoDNS, DNSControl)
- Cloud DNS services comparison and selection
- DNS-based load balancing patterns
- Troubleshooting tools and techniques
## When to Use This Skill
Apply DNS management patterns when:
- Setting up DNS for new applications or services
- Automating DNS updates from Kubernetes workloads
- Configuring DNS-based failover or load balancing
- Troubleshooting DNS propagation or resolution issues
- Migrating DNS between providers
- Planning DNS changes with minimal downtime
- Implementing GeoDNS for global users
## Record Type Selection
### Quick Reference
**Address Resolution:**
- **A Record**: Map hostname to IPv4 address (example.com → 192.0.2.1)
- **AAAA Record**: Map hostname to IPv6 address (example.com → 2001:db8::1)
- **CNAME Record**: Alias to another domain (www.example.com → example.com)
  - Cannot use at zone apex (@)
  - Cannot coexist with other records at same name
**Email Configuration:**
- **MX Record**: Direct email to mail servers with priority
- **TXT Record**: Email authentication (SPF, DKIM, DMARC) and verification
**Service Discovery:**
- **SRV Record**: Specify service location (protocol, priority, weight, port, target)
**Delegation and Security:**
- **NS Record**: Delegate subdomain to different nameservers
- **CAA Record**: Restrict which Certificate Authorities can issue certificates
**Cloud-Specific:**
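- **ALIAS Record**: Provider-specific, CNAME-like alias that works at the zone apex (Route53 ALIAS, Cloudflare CNAME flattening, ANAME at some other providers); not a standard DNS record type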
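Clients do not pick SRV answers at random. RFC 2782 says to try the lowest priority value first. Within a priority, the client picks targets at random in proportion to their weight. Below is a minimal Python sketch of that selection. It assumes the records have already been fetched and parsed. The `SrvRecord` type and the `order_srv_targets` name are illustrative and do not come from any library:
```python
import random
from dataclasses import dataclass
from typing import List, Optional
@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str
def order_srv_targets(records: List[SrvRecord], rng: Optional[random.Random] = None) -> List[SrvRecord]:
    """Return SRV records in the order a client should try them (RFC 2782)."""
    rng = rng or random.Random()
    ordered = []
    # Lower priority values are always tried first.
    for priority in sorted({r.priority for r in records}):
        remaining = [r for r in records if r.priority == priority]
        # Zero-weight records go first so they keep a small chance of being picked.
        remaining.sort(key=lambda r: r.weight != 0)
        while remaining:
            total = sum(r.weight for r in remaining)
            pick = rng.randint(0, total)  # uniform in [0, total], inclusive
            running = 0
            for r in remaining:
                running += r.weight
                if running >= pick:  # first record whose running sum reaches the pick
                    ordered.append(r)
                    remaining.remove(r)
                    break
    return ordered
```
With two priority-10 targets weighted 60/40 and one priority-20 backup, the backup always comes last. The split between the first two varies from call to call.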
### Decision Tree
```
Need to point domain to:
├─ IPv4 Address? → A record
├─ IPv6 Address? → AAAA record
├─ Another Domain?
│ ├─ Zone apex (@) → ALIAS/ANAME or A record
│ └─ Subdomain → CNAME
├─ Mail Server? → MX record (with priority)
├─ Email Authentication? → TXT record (SPF/DKIM/DMARC)
├─ Service Discovery? → SRV record
├─ Domain Verification? → TXT record
├─ Certificate Control? → CAA record
└─ Subdomain Delegation? → NS record
```
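The decision tree can be encoded as a small lookup, which is handy inside validation scripts. This is a sketch under stated assumptions. The `purpose` keywords are hypothetical labels. `"ALIAS"` stands for whatever apex-alias feature your provider offers, not a standard record type:
```python
import ipaddress
from typing import Optional
# Hypothetical purpose labels for the non-address branches of the tree.
PURPOSE_TO_TYPE = {
    "mail": "MX",
    "email-auth": "TXT",
    "verification": "TXT",
    "service-discovery": "SRV",
    "certificate-control": "CAA",
    "delegation": "NS",
}
def choose_record_type(target: str = "", *, purpose: Optional[str] = None, at_apex: bool = False) -> str:
    """Return the record type suggested by the decision tree."""
    if purpose is not None:
        return PURPOSE_TO_TYPE[purpose]
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal, so treat the target as another domain name.
        return "ALIAS" if at_apex else "CNAME"  # "ALIAS" = provider apex alias
    return "A" if ip.version == 4 else "AAAA"
```
For example, `choose_record_type("example.net", at_apex=True)` returns `"ALIAS"`, while the same target on a subdomain returns `"CNAME"`.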
For detailed record type examples and patterns, see `references/record-types.md`.
## TTL Strategy
### Standard TTL Values
**By Change Frequency:**
- **Stable records**: 3600-86400s (1-24 hours) - NS, stable A/AAAA
- **Normal operation**: 3600s (1 hour) - Standard websites, MX
- **Moderate changes**: 300-1800s (5-30 min) - Development, A/B testing
- **Failover scenarios**: 60-300s (1-5 min) - Critical records needing fast updates
**Key Principle:** Lower TTL = faster propagation but higher DNS query load
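The key principle can be given rough numbers. A caching resolver that keeps receiving queries for a name re-fetches it from the authoritative servers about once per TTL. The model below is deliberately simple: it ignores prefetching, TTL clamping, and idle resolvers, and the resolver count is an assumed input. It shows how authoritative query volume scales as the TTL drops:
```python
def estimated_daily_queries(ttl_seconds: int, busy_resolvers: int) -> int:
    """Rough daily authoritative queries for one record under a simple caching model."""
    # Each busy resolver re-fetches the record roughly once per TTL.
    refreshes_per_resolver = 86400 // ttl_seconds
    return busy_resolvers * refreshes_per_resolver
```
With 1,000 busy resolvers, a 3600s TTL gives about 24,000 queries per day. A 60s TTL gives about 1,440,000, a 60-fold increase.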
### Pre-Change Process
When planning DNS changes:
```
T-48h: Lower TTL to 300s
T-24h: Verify TTL propagated globally
T-0h: Make DNS change
T+1h: Verify new records propagating
T+6h: Confirm global propagation
T+24h: Raise TTL back to normal (3600s)
```
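The timeline rests on one piece of arithmetic. A resolver that fetched the record just before the TTL was lowered still holds the old TTL. The lowered TTL is therefore only in force everywhere after one full old TTL has passed. Below is a minimal sketch that computes the two deadlines. It assumes compliant resolvers and ignores the provider's own update delay; `plan_dns_change` is an illustrative name:
```python
from datetime import datetime, timedelta
def plan_dns_change(change_at: datetime, old_ttl: int, lowered_ttl: int = 300) -> dict:
    """Deadlines for a TTL-lowering change, assuming resolvers honour TTLs."""
    return {
        # Caches filled just before the TTL was lowered still hold the old TTL,
        # so the lowered TTL is only in force everywhere one old TTL later.
        "lower_ttl_by": change_at - timedelta(seconds=old_ttl),
        # After the change, cached copies of the old answer expire within the lowered TTL.
        "old_answer_gone_by": change_at + timedelta(seconds=lowered_ttl),
    }
```
For an 86400s (24h) record, the TTL must be lowered at least 24 hours before the change. The 48-hour lead in the timeline adds a safety margin on top of that minimum.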
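**Propagation Estimate:** `Max Time ≈ Old TTL + provider update delay`
A resolver that cached the record just before the change keeps serving the old answer until that cached copy's TTL runs out. The new TTL only controls how long the *new* answer is cached. Example: changing a record with a 3600s TTL can take up to about 1 hour (plus the provider's own update delay) to reach all compliant resolvers. Resolvers that clamp or ignore TTLs may take longer.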
### TTL by Use Case
| Use Case | TTL | Rationale |
|----------|-----|-----------|
| Production (stable) | 3600s | Balance speed and load |
| Before planned change | 300s | Fast propagation |
| Development/staging | 300-600s | Frequent changes |
| DNS-based failover | 60-300s | Fast recovery |
| Mail servers | 3600s | Rarely change |
| NS records | 86400s | Very stable |
For detailed TTL scenarios and calculations, see `references/ttl-strategies.md`.
## DNS-as-Code Tools
### Tool Selection by Use Case
**Kubernetes DNS Automation → external-dns**
- Annotation-based configuration on Services/Ingresses
- Automatic sync to DNS providers (20+ supported)
- No manual DNS updates required
- See `examples/external-dns/`
**Multi-Provider DNS Management → OctoDNS or DNSControl**
- Version control for DNS records
- Sync configuration across multiple providers
- Preview changes before applying
- OctoDNS (Python/YAML) - See `examples/octodns/`
- DNSControl (JavaScript) - See `examples/dnscontrol/`
**Infrastructure-as-Code → Terraform**
- Manage DNS alongside cloud resources
- Provider-specific resources (aws_route53_record, etc.)
- See `examples/terraform/`
### Tool Comparison
| Tool | Language | Best For | Kubernetes | Multi-Provider |
|------|----------|----------|------------|----------------|
| external-dns | Go | K8s automation | ★★★★★ | ★★★★ |
| OctoDNS | Python/YAML | Version control | ★★★ | ★★★★★ |
| DNSControl | JavaScript | Complex logic | ★★ | ★★★★★ |
| Terraform | HCL | IaC integration | ★★★ | ★★★★ |
### Quick Start: external-dns
```yaml
# Kubernetes Service with DNS annotations
apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  selector:
    app: app  # assumes the backing Pods are labelled app=app
  ports:
    - port: 80
```
Deploy the external-dns controller once per cluster. It then creates DNS records for annotated Services and for Ingress hosts that fall within its domain filter.
For complete examples, see `examples/external-dns/` and `references/dns-as-code-comparison.md`.
## Cloud DNS Provider Selection
### Provider Characteristics
**AWS Route53**
- Best for AWS-heavy infrastructure
- Advanced routing policies (weighted, latency, geolocation, failover)
- Health checks with automatic failover
- ALIAS records for AWS resources (ELB, CloudFront, S3)
- Pricing: $0.50/month per zone + $0.40 per million queries
**Google Cloud DNS**
- Best for GCP-native applications
- Strong DNSSEC support with automatic key rotation
- Private zones for VPC internal DNS
- Split-horizon DNS (different internal/external records)
- Pricing: $0.20/month per zone + $0.40 per million queries
**Azure DNS**
- Best for Azure-native applications
- Integration with Azure Traffic Manager
- Azure Private DNS zones
- Azure RBAC for access control
- Pricing: $0.50/month per zone + $0.40 per million queries
**Cloudflare**
- Best for multi-cloud or cloud-agnostic
- Consistently among the fastest authoritative DNS query times in public benchmarks (e.g., DNSPerf)
- Built-in DDoS protection
- Free tier with unlimited queries
- CDN integration
- Pricing: Free tier, $20/month Pro, $200/month Business
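For a rough monthly bill, the per-zone and per-query prices listed above can be combined directly. This sketch assumes flat rates. Real pricing is tiered, for example by zone count and query volume. Cloudflare is plan-based, so it is left out:
```python
# Flat list prices (USD) taken from the provider bullets; real pricing has volume tiers.
PRICING = {
    "route53": {"per_zone": 0.50, "per_million_queries": 0.40},
    "cloud_dns": {"per_zone": 0.20, "per_million_queries": 0.40},
    "azure_dns": {"per_zone": 0.50, "per_million_queries": 0.40},
}
def monthly_cost(provider: str, zones: int, queries_per_month: int) -> float:
    """Estimated monthly cost in USD, ignoring tiers and free ALIAS queries."""
    p = PRICING[provider]
    zone_cost = zones * p["per_zone"]
    query_cost = queries_per_month / 1_000_000 * p["per_million_queries"]
    return round(zone_cost + query_cost, 2)
```
For 10 zones and 50 million queries a month, this gives $25.00 on Route53 or Azure DNS and $22.00 on Cloud DNS. At that volume the query charge dominates the zone charge.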
### Selection Decision Tree
```
Choose based on:
├─ AWS-heavy? → Route53
├─ GCP-native? → Cloud DNS
├─ Azure-native? → Azure DNS
├─ Multi-cloud? → Cloudflare or OctoDNS/DNSControl
├─ Need fastest global DNS? → Cloudflare
├─ Need DDoS protection? → Cloudflare
└─ Budget-conscious? → Cloudflare (free tier) or Cloud DNS (lowest zone cost)
```
For detailed provider comparisons and examples, see `references/cloud-providers.md`.
## DNS-Based Load Balancing
### GeoDNS (Geographic Routing)
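Return different IP addresses based on client location to achieve the goals below. The location is usually inferred from the recursive resolver's IP address, or from the EDNS Client Subnet option when the resolver sends it.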
- Reduce latency (route to nearest data center)
- Comply with data residency requirements
- Distribute load across regions
**Example Pattern:**
```
Client Location → DNS Response
├─ North America → 192.0.2.1 (US data center)
├─ Europe → 192.0.2.10 (EU data center)
└─ Default → CloudFront edge (global CDN)
```
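A GeoDNS provider effectively performs a table lookup from the client's inferred location to an answer, with a default for unmatched locations. Below is a sketch of that lookup. The country-to-region table and the default hostname `cdn.example.net` are hypothetical stand-ins; real providers use their own GeoIP data:
```python
# Hypothetical country-to-region table; GeoDNS providers ship their own GeoIP data.
COUNTRY_TO_REGION = {"US": "NA", "CA": "NA", "MX": "NA", "DE": "EU", "FR": "EU", "GB": "EU"}
REGION_ANSWERS = {"NA": "192.0.2.1", "EU": "192.0.2.10"}
DEFAULT_ANSWER = "cdn.example.net"  # hypothetical stand-in for the global CDN default
def geo_answer(client_country: str) -> str:
    """Answer for a client's country code, falling back to the default."""
    region = COUNTRY_TO_REGION.get(client_country.upper())
    return REGION_ANSWERS.get(region, DEFAULT_ANSWER)
```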
### Weighted Routing
Distribute traffic by percentage for:
- Blue-green deployments
- Canary releases (10% to new version)
- A/B testing
**Example Pattern:**
```
DNS Responses:
├─ 90% → 192.0.2.1 (stable version)
└─ 10% → 192.0.2.2 (canary version)
```
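Weighted routing picks each DNS response at random, in proportion to the record weights. The sketch below simulates that with Python's `random.choices`. Each resolver caches whichever answer it received for the full TTL, so the split users actually see is only approximately 90/10. The function names are illustrative:
```python
import random
from collections import Counter
def weighted_answer(records, rng: random.Random) -> str:
    """Pick one IP with probability proportional to its weight."""
    answers = [ip for ip, _ in records]
    weights = [w for _, w in records]
    return rng.choices(answers, weights=weights, k=1)[0]
def simulate(records, responses: int = 10_000, seed: int = 0) -> Counter:
    """Count how often each IP is returned over many DNS responses."""
    rng = random.Random(seed)
    return Counter(weighted_answer(records, rng) for _ in range(responses))
```
Running `simulate([("192.0.2.1", 90), ("192.0.2.2", 10)])` returns roughly 9,000 stable and 1,000 canary responses.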
### Health Check-Based Failover
Automatically route traffic away from unhealthy endpoints.
**Pattern:**
```
Primary: 192.0.2.1 (health checked every 30s)
├─ Healthy → Return primary IP
└─ Unhealthy → Return secondary IP (192.0.2.2)
Failover time: ~2-3 minutes
= Health check failures (90s) + TTL expiration (60s)
```
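The failover estimate is simple arithmetic. Detection takes `failure_threshold * request_interval` seconds. After that, resolvers may keep serving the cached primary IP for one more TTL. Below is a sketch of that calculation. It is a simplification that ignores how Route53 aggregates results from its many health checkers:
```python
def worst_case_failover_seconds(failure_threshold: int, request_interval: int, ttl: int) -> int:
    """Rough upper bound on time until clients are sent to the secondary."""
    detection = failure_threshold * request_interval  # consecutive failed checks needed
    return detection + ttl  # cached primary answer may live one more TTL
```
`worst_case_failover_seconds(3, 30, 60)` returns 150, the ~2.5 minutes in the pattern.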
For complete load balancing examples, see `examples/load-balancing/`.
## Troubleshooting
### Essential Commands
**Check DNS Resolution:**
```bash
# Basic query
dig example.com
# Clean output (just IP)
dig example.com +short
# Query specific DNS server
dig @8.8.8.8 example.com
dig @1.1.1.1 example.com
# Trace resolution path
dig +trace example.com
```
**Check TTL:**
```bash
dig example.com | grep -A1 "ANSWER SECTION"
# Look for TTL value (number before IN A)
```
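When checking TTLs across many names, parsing the answer section is less error-prone than reading `grep` output by eye. Below is a minimal sketch that parses the standard `dig` answer-section layout (name, TTL, class, type, data) from captured output. The function name is illustrative:
```python
def parse_answer_ttls(dig_output: str) -> list[tuple[str, int, str, str]]:
    """Return (name, ttl, type, data) tuples from dig's ANSWER SECTION."""
    records = []
    in_answer = False
    for line in dig_output.splitlines():
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
            continue
        if in_answer:
            if not line.strip() or line.startswith(";"):
                break  # a blank line or the next section ends the answer block
            name, ttl, _cls, rtype, *rdata = line.split()
            records.append((name, int(ttl), rtype, " ".join(rdata)))
    return records
```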
**Check Propagation:**
```bash
# Multiple resolvers
dig @8.8.8.8 example.com +short # Google
dig @1.1.1.1 example.com +short # Cloudflare
dig @208.67.222.222 example.com +short # OpenDNS
```
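The same check can be scripted: query each public resolver, then list the ones still returning something other than the new record set. This sketch assumes `dig` is installed and on `PATH`. The resolver list mirrors the commands above, and the function names are illustrative:
```python
import subprocess
RESOLVERS = {"Google": "8.8.8.8", "Cloudflare": "1.1.1.1", "OpenDNS": "208.67.222.222"}
def query(resolver_ip: str, name: str, rtype: str = "A") -> set[str]:
    """Run `dig @resolver name rtype +short` and return the answer lines as a set."""
    out = subprocess.run(
        ["dig", f"@{resolver_ip}", name, rtype, "+short"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}
def stale_resolvers(answers: dict[str, set[str]], expected: set[str]) -> list[str]:
    """Names of resolvers whose answer differs from the expected record set."""
    return sorted(name for name, got in answers.items() if got != expected)
```
Usage: build `answers = {name: query(ip, "example.com") for name, ip in RESOLVERS.items()}`. Then `stale_resolvers(answers, {"192.0.2.2"})` lists the resolvers still serving the old answer.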
**Flush Local DNS Cache:**
```bash
# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Windows
ipconfig /flushdns
# Linux (systemd-resolved)
sudo resolvectl flush-caches  # older releases: sudo systemd-resolve --flush-caches
```
### Common Problems
**Slow Propagation:**
- Check current TTL (old TTL must expire first)
- Lower TTL 24-48 hours before changes
- Use propagation checkers: whatsmydns.net, dnschecker.org
**CNAME at Zone Apex:**
- Error: Cannot use CNAME at @ (zone apex)
- Solution: Use a provider alias at the apex (Route53 ALIAS, Cloudflare CNAME flattening) or plain A/AAAA records
**external-dns Not Creating Records:**
- Verify annotation spelling: `external-dns.alpha.kubernetes.io/hostname`
- Check domain filter matches: `--domain-filter=example.com`
- Review external-dns logs for errors
- Confirm provider credentials configured
For detailed troubleshooting, see `references/troubleshooting.md`.
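Both CNAME rules (no CNAME at the zone apex, no CNAME alongside other records at the same name) can be checked before a change is pushed. Below is a sketch over hypothetical `(name, type)` pairs, for example exported from a provider. It exempts the DNSSEC types RRSIG and NSEC, which are allowed alongside a CNAME:
```python
from collections import defaultdict
# Record types that DNSSEC permits alongside a CNAME.
ALLOWED_WITH_CNAME = {"RRSIG", "NSEC"}
def cname_violations(zone_apex: str, records: list[tuple[str, str]]) -> list[str]:
    """Report CNAMEs at the apex or sharing a name with other record types."""
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.rstrip(".").lower()].add(rtype.upper())
    apex = zone_apex.rstrip(".").lower()
    problems = []
    for name, types in sorted(types_by_name.items()):
        if "CNAME" not in types:
            continue
        others = types - {"CNAME"} - ALLOWED_WITH_CNAME
        if name == apex:
            problems.append(f"{name}: CNAME not allowed at zone apex")
        elif others:
            problems.append(f"{name}: CNAME coexists with {sorted(others)}")
    return problems
```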
## Common Patterns
### Pattern 1: Kubernetes DNS Automation
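```yaml
# Deploy external-dns once per cluster (shell commands, shown as comments):
#   helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/
#   helm install external-dns external-dns/external-dns \
#     --set provider=aws \
#     --set 'domainFilters[0]=example.com' \
#     --set policy=sync
# policy=sync lets external-dns delete records it owns; the default (upsert-only) never deletes.
#
# Then annotate Services:
apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  selector:
    app: api  # assumes the backing Pods are labelled app=api
  ports:
    - port: 80
```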
### Pattern 2: Multi-Provider Sync with OctoDNS
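```yaml
# octodns-config.yaml
# Preview: octodns-sync --config-file=octodns-config.yaml
# Apply:   octodns-sync --config-file=octodns-config.yaml --doit
providers:
  config:
    class: octodns.provider.yaml.YamlProvider
    directory: ./config  # records live in ./config/example.com.yaml
  route53:
    class: octodns_route53.Route53Provider
    access_key_id: env/AWS_ACCESS_KEY_ID
    secret_access_key: env/AWS_SECRET_ACCESS_KEY
  cloudflare:
    class: octodns_cloudflare.CloudflareProvider
    token: env/CLOUDFLARE_TOKEN
zones:
  example.com.:
    sources: [config]
    targets: [route53, cloudflare]
```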
### Pattern 3: DNS-Based Failover
```hcl
# Route53 failover with health checks.
# Assumes the hosted zone is managed here; for an existing zone,
# look it up with a data "aws_route53_zone" block instead.
resource "aws_route53_zone" "main" {
  name = "example.com"
}
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3  # 3 failed checks x 30s = ~90s to detect an outage
  request_interval  = 30
}
resource "aws_route53_record" "primary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "primary"
  failover_routing_policy {
    type = "PRIMARY"
  }
  # Route53 answers with this record only while the health check passes
  health_check_id = aws_route53_health_check.primary.id
  records         = ["192.0.2.1"]
}
resource "aws_route53_record" "secondary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "secondary"
  failover_routing_policy {
    type = "SECONDARY"
  }
  records = ["192.0.2.2"]
}
```
## Integration with Other Skills
**infrastructure-as-code:**
- Manage DNS via Terraform/Pulumi alongside other resources
- Zone configuration in IaC repositories
**kubernetes-operations:**
- external-dns automates DNS for Kubernetes workloads
- Ingress controller integration for automatic DNS
**load-balancing-patterns:**
- DNS-based load balancing (GeoDNS, weighted routing)
- Health checks and failover configurations
**security-hardening:**
- DNSSEC for DNS integrity
- CAA records for certificate authority control
- DNS-based DDoS mitigation
**secret-management:**
- Store DNS provider API credentials in vaults
- Secure DDNS update mechanisms
## Additional Resources
**Reference Documentation:**
- `references/record-types.md` - Detailed record type guide with examples
- `references/ttl-strategies.md` - TTL scenarios and propagation calculations
- `references/cloud-providers.md` - Provider comparison and detailed features
- `references/troubleshooting.md` - Common problems and solutions
- `references/dns-as-code-comparison.md` - Tool comparison matrix
**Examples:**
- `examples/external-dns/` - Kubernetes DNS automation
- `examples/octodns/` - Multi-provider sync with YAML
- `examples/dnscontrol/` - Multi-provider with JavaScript DSL
- `examples/terraform/` - Cloud provider configurations
- `examples/load-balancing/` - GeoDNS and failover patterns
**Scripts:**
- `scripts/check-dns-propagation.sh` - Verify propagation across resolvers
- `scripts/validate-dns-config.py` - Validate DNS configuration
- `scripts/export-dns-records.sh` - Export existing DNS records
- `scripts/calculate-ttl-propagation.py` - Calculate propagation time
## Quick Reference
### Record Types Cheat Sheet
| Record | Purpose | Example |
|--------|---------|---------|
| A | IPv4 address | example.com → 192.0.2.1 |
| AAAA | IPv6 address | example.com → 2001:db8::1 |
| CNAME | Alias to domain | www → example.com |
| MX | Mail server | 10 mail.example.com |
| TXT | Text/verification | "v=spf1 include:_spf.google.com ~all" |
| SRV | Service location | 10 60 5060 sip.example.com |
| NS | Nameserver delegation | ns1.provider.com |
| CAA | CA authorization | 0 issue "letsencrypt.org" |
### TTL Cheat Sheet
| Scenario | TTL | Why |
|----------|-----|-----|
| Stable production | 3600s | Balance speed/load |
| Before change | 300s | Fast propagation |
| Failover | 60-300s | Fast recovery |
| NS records | 86400s | Very stable |
### Provider Cheat Sheet
| Provider | Best For | Key Feature |
|----------|----------|-------------|
| Route53 | AWS | Advanced routing, health checks |
| Cloud DNS | GCP | DNSSEC, private zones |
| Azure DNS | Azure | Traffic Manager integration |
| Cloudflare | Multi-cloud | Fastest, DDoS protection, free tier |
### Tool Cheat Sheet
| Tool | Use When |
|------|----------|
| external-dns | Kubernetes DNS automation |
| OctoDNS | Multi-provider, Python shop |
| DNSControl | Multi-provider, JavaScript preference |
| Terraform | Managing DNS with other infrastructure |
This skill helps engineers manage DNS records, TTL strategies, and DNS-as-code automation across cloud providers. It focuses on selecting correct record types, planning TTLs for minimal downtime, automating updates from Kubernetes, and implementing DNS-based load balancing and failover patterns. Practical guidance covers Route53, Cloud DNS, Azure DNS, and Cloudflare workflows.
The skill inspects DNS requirements and recommends record types (A, AAAA, CNAME, MX, TXT, SRV, NS, CAA, ALIAS) based on the desired outcome. It prescribes TTL strategies and a pre-change process to reduce propagation pain, maps DNS-as-code tools (external-dns, OctoDNS, DNSControl, Terraform) to common workflows, and provides load-balancing patterns (GeoDNS, weighted routing, health-check failover). It also includes troubleshooting commands and checks to validate propagation and diagnose common issues.
**Can I use a CNAME at the zone apex?**
No. Point the apex at A/AAAA records, or use a provider alias feature (Route53 ALIAS, Cloudflare CNAME flattening, ANAME at some other providers).
**How long does DNS propagation take after changing a record?**
Up to the record's previous TTL, plus the provider's own update delay, because resolvers keep serving their cached copy until it expires. Lowering the TTL (e.g., to 300s) 24–48 hours before the change keeps that window short.