home / skills / markus41 / claude / pattern-extraction
/.claude/tools/plugin-cli/infrastructure-template-generator/skills/pattern-extraction
This skill analyzes codebases to detect patterns and extract variables for template generation across IaC and configuration files.
npx playbooks add skill markus41/claude --skill pattern-extractionReview the files below or copy the command above to add this skill to your agents.
---
name: Pattern Extraction
description: Detect and extract patterns from codebases for template generation
version: 1.0.0
trigger_phrases:
- "extract patterns"
- "detect patterns"
- "find variables"
- "identify configuration"
- "analyze structure"
categories: ["infrastructure", "templating", "analysis"]
---
# Pattern Extraction Skill
## When to Use This Skill
Use pattern extraction when you need to:
- **Analyze existing infrastructure** to create reusable templates
- **Convert hardcoded values** into configurable template variables
- **Identify configuration patterns** across multiple similar files
- **Detect technology stacks** and their conventions
- **Extract naming conventions** and structural patterns
- **Generate template metadata** from existing code
- **Create variable schemas** from inferred types and constraints
Perfect for:
- Converting existing IaC to templates
- Building template libraries from production code
- Standardizing infrastructure patterns
- Automating template generation
- Identifying refactoring opportunities
## Core Capabilities
### 1. Technology Stack Detection
Automatically identify:
- **IaC Frameworks**: Terraform, Pulumi, CloudFormation, ARM, Bicep
- **Cloud Providers**: AWS, Azure, GCP, multi-cloud patterns
- **Service Types**: Kubernetes, Docker, serverless, containers
- **Configuration Formats**: YAML, JSON, HCL, TOML
- **Build Systems**: Helm, Kustomize, Jsonnet
- **CI/CD Platforms**: GitHub Actions, GitLab CI, Jenkins, Harness
### 2. Configuration Pattern Extraction
Extract patterns from:
- Resource definitions and relationships
- Environment-specific configurations
- Naming conventions and tagging strategies
- Security policies and compliance rules
- Network topologies and architectures
- Deployment strategies and workflows
### 3. Variable Identification
Intelligent detection of:
- **Hardcoded Values**: Strings, numbers, booleans that should be variables
- **Repeated Values**: Values appearing multiple times across files
- **Environment Indicators**: dev, staging, prod patterns
- **Naming Patterns**: Prefixes, suffixes, delimiters
- **Secret Patterns**: API keys, passwords, tokens (flag for security)
- **Configuration Schemas**: Type inference from usage
### 4. Structure Analysis
Analyze:
- File organization and directory structure
- Module boundaries and dependencies
- Resource hierarchies and relationships
- Configuration inheritance patterns
- Composition and reuse strategies
- Template inclusion patterns
## Pattern Detection Matrix
| Pattern Type | Indicators | Extraction Method | Output Format |
|--------------|-----------|-------------------|---------------|
| **Environment Values** | `dev`, `staging`, `prod` in names | Context-aware regex | `{{ environment }}` |
| **Resource Names** | Repeated prefixes/suffixes | Token analysis | `{{ project_name }}-{{ resource_type }}` |
| **Region/Location** | `us-east-1`, `westeurope` | Cloud provider patterns | `{{ region }}` |
| **Version Numbers** | Semantic versioning patterns | Regex + validation | `{{ version }}` |
| **Port Numbers** | Common service ports | Port range analysis | `{{ port }}` |
| **Size/Scale** | Instance types, node counts | Capacity patterns | `{{ instance_size }}` |
| **CIDR Blocks** | IP address ranges | Network pattern analysis | `{{ cidr_block }}` |
| **Tags/Labels** | Key-value metadata | Metadata extraction | `{{ tags }}` |
| **Secret References** | Vault paths, secret names | Secret pattern detection | `{{ secret_ref }}` (secure) |
| **Feature Flags** | Boolean toggles | Conditional analysis | `{{ enable_feature }}` |
## Variable Inference Rules
### Before/After Transformation Examples
#### Example 1: Resource Names
**Before:**
```hcl
resource "aws_instance" "web_server" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t3.medium"
tags = {
Name = "myapp-web-server-prod"
Environment = "production"
Project = "myapp"
}
}
```
**After:**
```hcl
resource "aws_instance" "web_server" {
ami = "{{ ami_id }}"
instance_type = "{{ instance_type }}"
tags = {
Name = "{{ project_name }}-web-server-{{ environment }}"
Environment = "{{ environment }}"
Project = "{{ project_name }}"
}
}
```
**Extracted Variables:**
```yaml
variables:
ami_id:
type: string
description: "AMI ID for the EC2 instance"
default: "ami-0c55b159cbfafe1f0"
pattern: "^ami-[a-f0-9]{17}$"
instance_type:
type: string
description: "EC2 instance type"
default: "t3.medium"
allowed_values: ["t3.micro", "t3.small", "t3.medium", "t3.large"]
project_name:
type: string
description: "Project identifier used in resource naming"
default: "myapp"
pattern: "^[a-z][a-z0-9-]{2,30}$"
environment:
type: string
description: "Deployment environment"
default: "production"
allowed_values: ["dev", "staging", "production"]
```
#### Example 2: Kubernetes Deployment
**Before:**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
version: "1.21.0"
spec:
containers:
- name: nginx
image: nginx:1.21.0
ports:
- containerPort: 80
resources:
requests:
memory: "128Mi"
cpu: "250m"
limits:
memory: "256Mi"
cpu: "500m"
```
**After:**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ app_name }}-deployment
namespace: {{ namespace }}
spec:
replicas: {{ replica_count }}
selector:
matchLabels:
app: {{ app_name }}
template:
metadata:
labels:
app: {{ app_name }}
version: "{{ app_version }}"
spec:
containers:
- name: {{ app_name }}
image: {{ container_image }}:{{ app_version }}
ports:
- containerPort: {{ container_port }}
resources:
requests:
memory: "{{ memory_request }}"
cpu: "{{ cpu_request }}"
limits:
memory: "{{ memory_limit }}"
cpu: "{{ cpu_limit }}"
```
**Extracted Variables:**
```yaml
variables:
app_name:
type: string
description: "Application name"
default: "nginx"
namespace:
type: string
description: "Kubernetes namespace"
default: "production"
replica_count:
type: integer
description: "Number of pod replicas"
default: 3
min: 1
max: 10
app_version:
type: string
description: "Application version"
default: "1.21.0"
pattern: "^\\d+\\.\\d+\\.\\d+$"
container_image:
type: string
description: "Container image name"
default: "nginx"
container_port:
type: integer
description: "Container port"
default: 80
memory_request:
type: string
description: "Memory request"
default: "128Mi"
memory_limit:
type: string
description: "Memory limit"
default: "256Mi"
cpu_request:
type: string
description: "CPU request"
default: "250m"
cpu_limit:
type: string
description: "CPU limit"
default: "500m"
```
#### Example 3: Azure Bicep
**Before:**
```bicep
resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
name: 'mystorageacct12345'
location: 'westeurope'
sku: {
name: 'Standard_LRS'
}
kind: 'StorageV2'
properties: {
accessTier: 'Hot'
minimumTlsVersion: 'TLS1_2'
supportsHttpsTrafficOnly: true
allowBlobPublicAccess: false
}
tags: {
environment: 'production'
costCenter: 'engineering'
project: 'platform'
}
}
```
**After:**
```bicep
resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
name: '{{ storage_account_name }}'
location: '{{ location }}'
sku: {
name: '{{ sku_name }}'
}
kind: 'StorageV2'
properties: {
accessTier: '{{ access_tier }}'
minimumTlsVersion: '{{ min_tls_version }}'
supportsHttpsTrafficOnly: {{ https_only }}
allowBlobPublicAccess: {{ allow_public_access }}
}
tags: {
environment: '{{ environment }}'
costCenter: '{{ cost_center }}'
project: '{{ project_name }}'
}
}
```
**Extracted Variables:**
```yaml
variables:
storage_account_name:
type: string
description: "Storage account name (globally unique)"
default: "mystorageacct12345"
pattern: "^[a-z0-9]{3,24}$"
location:
type: string
description: "Azure region"
default: "westeurope"
allowed_values: ["westeurope", "northeurope", "eastus", "westus"]
sku_name:
type: string
description: "Storage account SKU"
default: "Standard_LRS"
allowed_values: ["Standard_LRS", "Standard_GRS", "Premium_LRS"]
access_tier:
type: string
description: "Storage access tier"
default: "Hot"
allowed_values: ["Hot", "Cool"]
min_tls_version:
type: string
description: "Minimum TLS version"
default: "TLS1_2"
https_only:
type: boolean
description: "Require HTTPS traffic only"
default: true
allow_public_access:
type: boolean
description: "Allow public blob access"
default: false
environment:
type: string
description: "Environment name"
default: "production"
cost_center:
type: string
description: "Cost center for billing"
default: "engineering"
project_name:
type: string
description: "Project name"
default: "platform"
```
## Decision Tree
Pattern classification flowchart:
```
START: Analyze value/pattern
|
v
┌───────────────────────────────────┐
│ Is it repeated across files? │
└─────┬─────────────────────┬───────┘
│ YES │ NO
v v
┌─────────────┐ ┌──────────────┐
│ Global Var │ │ Check Usage │
└─────────────┘ └──────┬───────┘
│
v
┌──────────────────────┐
│ Used in conditionals? │
└──┬─────────────────┬──┘
│ YES │ NO
v v
┌──────────┐ ┌──────────────┐
│ Feature │ │ Check Type │
│ Flag │ └──────┬───────┘
└──────────┘ │
v
┌──────────────────────┐
│ Contains 'env'? │
└──┬───────────────┬───┘
│ YES │ NO
v v
┌───────────┐ ┌──────────────┐
│ Env Var │ │ Check Format │
└───────────┘ └──────┬───────┘
│
v
┌──────────────────┐
│ Cloud resource? │
└──┬───────────┬───┘
│ YES │ NO
v v
┌──────────┐ ┌─────────┐
│ Resource │ │ Generic │
│ ID │ │ Config │
└──────────┘ └─────────┘
```
## Examples
### Example 1: Extract Terraform AWS Pattern
**Input:**
```bash
extract patterns from ./terraform/aws/ec2-instances.tf
```
**Analysis:**
```
Analyzing: terraform/aws/ec2-instances.tf
├── Technology: Terraform (AWS Provider)
├── Resources Found: 3 (aws_instance, aws_security_group, aws_eip)
├── Variables Identified: 12
└── Patterns Detected:
├── Naming Convention: {project}-{resource}-{env}
├── Tag Strategy: Environment, Project, ManagedBy
└── Configuration Reuse: 85% similarity across instances
```
**Extracted Template:**
```hcl
# Template: aws-ec2-instance.tf.tmpl
variable "project_name" {
type = string
description = "Project identifier"
}
variable "environment" {
type = string
description = "Deployment environment"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be dev, staging, or prod."
}
}
variable "instance_config" {
type = object({
ami = string
instance_type = string
key_name = string
subnet_id = string
})
description = "EC2 instance configuration"
}
resource "aws_instance" "main" {
ami = var.instance_config.ami
instance_type = var.instance_config.instance_type
key_name = var.instance_config.key_name
subnet_id = var.instance_config.subnet_id
vpc_security_group_ids = [aws_security_group.main.id]
tags = {
Name = "${var.project_name}-instance-${var.environment}"
Environment = var.environment
Project = var.project_name
ManagedBy = "Terraform"
}
}
resource "aws_security_group" "main" {
name = "${var.project_name}-sg-${var.environment}"
description = "Security group for ${var.project_name} in ${var.environment}"
tags = {
Name = "${var.project_name}-sg-${var.environment}"
Environment = var.environment
Project = var.project_name
}
}
```
### Example 2: Extract Kubernetes Pattern
**Input:**
```bash
extract patterns from ./k8s/deployments/ --type kubernetes
```
**Analysis:**
```
Analyzing: k8s/deployments/ (15 files)
├── Technology: Kubernetes (v1.24+)
├── Resources Found: 45 total
│ ├── Deployments: 15
│ ├── Services: 12
│ ├── ConfigMaps: 10
│ └── Ingresses: 8
├── Common Patterns:
│ ├── Label Strategy: app, version, environment
│ ├── Resource Requests: 90% use standard sizes
│ ├── Probes: 100% have health checks
│ └── Image Pattern: registry/org/image:tag
└── Variable Candidates: 28 identified
```
**Extracted Template:**
```yaml
# Template: kubernetes-microservice.yaml.tmpl
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ service_name }}-deployment
namespace: {{ namespace }}
labels:
app: {{ service_name }}
version: {{ version }}
environment: {{ environment }}
spec:
replicas: {{ replica_count }}
selector:
matchLabels:
app: {{ service_name }}
template:
metadata:
labels:
app: {{ service_name }}
version: {{ version }}
environment: {{ environment }}
spec:
containers:
- name: {{ service_name }}
image: {{ image_registry }}/{{ organization }}/{{ service_name }}:{{ version }}
ports:
- containerPort: {{ container_port }}
protocol: TCP
env:
{{#each environment_variables}}
- name: {{ name }}
value: "{{ value }}"
{{/each}}
resources:
requests:
memory: {{ memory_request }}
cpu: {{ cpu_request }}
limits:
memory: {{ memory_limit }}
cpu: {{ cpu_limit }}
livenessProbe:
httpGet:
path: {{ health_check_path }}
port: {{ container_port }}
initialDelaySeconds: {{ liveness_initial_delay }}
periodSeconds: {{ liveness_period }}
readinessProbe:
httpGet:
path: {{ readiness_check_path }}
port: {{ container_port }}
initialDelaySeconds: {{ readiness_initial_delay }}
periodSeconds: {{ readiness_period }}
---
apiVersion: v1
kind: Service
metadata:
name: {{ service_name }}-service
namespace: {{ namespace }}
labels:
app: {{ service_name }}
spec:
type: {{ service_type }}
ports:
- port: {{ service_port }}
targetPort: {{ container_port }}
protocol: TCP
selector:
app: {{ service_name }}
```
### Example 3: Multi-File Pattern Extraction
**Input:**
```bash
extract patterns from ./infrastructure/ --recursive --consolidate
```
**Analysis:**
```
Analyzing: infrastructure/ (recursive)
├── Files Scanned: 47
├── Technologies Detected:
│ ├── Terraform (AWS): 23 files
│ ├── Kubernetes: 15 files
│ ├── Helm Charts: 9 files
│ └── Docker Compose: 2 files (excluded - different pattern)
├── Cross-Cutting Patterns:
│ ├── Environment Strategy: 3-tier (dev/staging/prod)
│ ├── Tagging: 100% compliance with org policy
│ ├── Naming: Consistent kebab-case with env suffix
│ └── Secrets: HashiCorp Vault references
└── Template Opportunities:
├── AWS Lambda Function: 8 similar resources
├── RDS Database: 5 similar resources
├── Kubernetes Service: 12 similar resources
└── ALB Configuration: 6 similar resources
```
**Output:**
```
Generated Templates:
├── templates/
│ ├── aws-lambda-function.tf.tmpl (consolidated from 8 files)
│ ├── aws-rds-instance.tf.tmpl (consolidated from 5 files)
│ ├── kubernetes-service.yaml.tmpl (consolidated from 12 files)
│ └── aws-alb.tf.tmpl (consolidated from 6 files)
└── variables/
├── common.yaml (shared across all templates)
├── aws-specific.yaml (AWS provider variables)
└── kubernetes-specific.yaml (K8s variables)
Variable Reuse Analysis:
├── Shared Variables: 15 (45% reuse)
├── Template-Specific: 18 (55% unique)
└── Potential Consolidation: 3 variables can be merged
```
## Best Practices
### 1. Incremental Extraction
Start with a single file or small directory to refine patterns before scaling:
```bash
# Start small
extract patterns from ./terraform/main.tf
# Validate and adjust
review template ./templates/main.tf.tmpl
# Scale up
extract patterns from ./terraform/ --recursive
```
### 2. Variable Naming Conventions
Follow consistent naming:
- Use snake_case for variables
- Prefix with context: `aws_`, `k8s_`, `azure_`
- Suffix with type: `_count`, `_enabled`, `_config`
- Be descriptive: `instance_type` not `type`
### 3. Type Inference Priority
1. **Explicit types** from existing variable definitions
2. **Usage patterns** (e.g., used in math → number)
3. **Value format** (e.g., "true"/"false" → boolean)
4. **Defaults** to string if ambiguous
### 4. Security-Sensitive Patterns
Always flag potential secrets:
```yaml
# Good - flagged for security review
api_key:
type: string
sensitive: true
description: "API key for external service"
default: null # Force explicit value
```
### 5. Validation Rules
Add validation for critical variables:
```yaml
environment:
type: string
allowed_values: ["dev", "staging", "prod"]
cidr_block:
type: string
pattern: "^\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}/\\d{1,2}$"
replica_count:
type: integer
min: 1
max: 100
```
### 6. Documentation Generation
Auto-generate docs from extracted patterns:
```markdown
# Generated Template: aws-ec2-instance
## Description
Extracted from 8 similar EC2 instance definitions.
Pattern confidence: 95%
## Variables
| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| instance_type | string | Yes | t3.medium | EC2 instance type |
| environment | string | Yes | - | Deployment environment |
## Usage
terraform apply -var="instance_type=t3.large" -var="environment=prod"
```
### 7. Pattern Confidence Scoring
Rate extraction confidence:
- **High (90-100%)**: Identical patterns across files
- **Medium (70-89%)**: Similar with minor variations
- **Low (<70%)**: Significant differences, manual review needed
### 8. Iterative Refinement
```bash
# Extract initial patterns
extract patterns from ./infra/ --output ./templates/v1/
# Review and refine
review templates ./templates/v1/ --suggest-improvements
# Re-extract with refinements
extract patterns from ./infra/ --output ./templates/v2/ \
--naming-convention kebab-case \
--tag-strategy company-standard
```
## Related Skills
- **[[template-generation]]** - Generate templates from extracted patterns
- **[[variable-schema-design]]** - Design robust variable schemas
- **[[template-validation]]** - Validate generated templates
- **[[naming-convention-analyzer]]** - Analyze and enforce naming conventions
- **[[configuration-consolidation]]** - Merge similar configurations
- **[[security-pattern-detection]]** - Identify security anti-patterns
- **[[compliance-checking]]** - Ensure extracted patterns meet compliance requirements
## Advanced Techniques
### Multi-Stage Extraction Pipeline
```bash
# Stage 1: Initial scan
extract patterns from ./infra/ --stage scan
# Stage 2: Pattern classification
extract patterns from ./infra/ --stage classify
# Stage 3: Variable inference
extract patterns from ./infra/ --stage variables
# Stage 4: Template generation
extract patterns from ./infra/ --stage generate
# Stage 5: Validation
extract patterns from ./infra/ --stage validate
```
### Machine Learning-Enhanced Extraction
Use ML models to improve pattern detection:
- **Clustering**: Group similar configurations
- **Anomaly Detection**: Identify outliers for manual review
- **Type Inference**: Predict variable types from usage
- **Dependency Analysis**: Extract implicit dependencies
### Cross-Repository Pattern Mining
Extract patterns across multiple repositories:
```bash
extract patterns --repos-file ./repos.txt \
--output ./org-templates/ \
--consolidate-org-wide
```
## Output Formats
### JSON Schema
```json
{
"template": "aws-ec2-instance",
"version": "1.0.0",
"variables": {
"instance_type": {
"type": "string",
"default": "t3.medium",
"description": "EC2 instance type"
}
}
}
```
### YAML Schema
```yaml
template: aws-ec2-instance
version: 1.0.0
variables:
instance_type:
type: string
default: t3.medium
description: EC2 instance type
```
### Terraform Variable Definitions
```hcl
variable "instance_type" {
type = string
default = "t3.medium"
description = "EC2 instance type"
}
```
## Integration Points
- **CI/CD Pipelines**: Auto-extract patterns on code changes
- **Template Registries**: Publish extracted templates
- **Documentation Systems**: Generate docs from patterns
- **Monitoring**: Track pattern usage and evolution
- **Compliance Tools**: Validate against org standards
## Troubleshooting
### Pattern Detection Issues
**Problem**: Too many false positives
**Solution**: Increase confidence threshold, add exclusion patterns
**Problem**: Missing obvious patterns
**Solution**: Check file encoding, adjust regex patterns, enable debug mode
**Problem**: Inconsistent variable names
**Solution**: Enable smart naming normalization, use naming dictionary
### Template Generation Issues
**Problem**: Templates too generic
**Solution**: Use narrower extraction scope, increase specificity
**Problem**: Templates too specific
**Solution**: Widen extraction scope, enable cross-file consolidation
---
**Version:** 1.0.0
**Last Updated:** 2026-01-19
**Skill Type:** Analysis & Extraction
**Complexity:** Advanced
This skill detects and extracts recurring patterns from codebases to generate reusable templates and variable schemas. It targets infrastructure and configuration files, inferring variables, naming conventions, and technology stacks to accelerate template creation and standardization. The output is ready-to-use template fragments and a set of validated variable definitions.
The skill scans project files, detects technology and configuration formats, and identifies repeated values, hardcoded literals, and structural patterns. It classifies findings (environment values, resource names, regions, secrets, ports, etc.), infers types and constraints, and emits template-ready replacements plus a variables manifest. It also flags potential secrets and recommends validation rules or allowed values.
What file types and frameworks does this skill support?
It recognizes common IaC and config formats such as Terraform (HCL), CloudFormation, Bicep, ARM, Kubernetes YAML, Helm, JSON, and common CI/CD files; detection works via file extension and content heuristics.
How does it handle secrets found in code?
Secrets are flagged as high-risk candidates and replaced with secure variable references in templates; the tool recommends storing them in a secret manager and does not automatically persist secrets in generated manifests.