home / skills / dexploarer / claudius-skills / regex-pattern-builder

regex-pattern-builder skill

safe

/examples/intermediate/developer-productivity/regex-pattern-builder

This skill builds regex patterns from natural language, explains existing patterns, tests them, and provides practical examples.

npx playbooks add skill dexploarer/claudius-skills --skill regex-pattern-builder

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

11.1 KB

---
name: regex-pattern-builder
description: Builds and explains regex patterns from natural language, tests patterns, and provides examples. Use when user asks to "create regex", "regex pattern", "match pattern", "validate email/phone", or "regex help".
allowed-tools: [Read, Write]
---

# Regex Pattern Builder

Creates regex patterns from natural language descriptions, explains existing patterns, and helps test and debug regex.

## When to Use

- "Create a regex to match emails"
- "Regex pattern for phone numbers"
- "How do I match URLs"
- "Explain this regex"
- "Test my regex pattern"
- "Validate password regex"

## Instructions

### 1. Understand the Requirement

Ask clarifying questions if needed:
- What format are you trying to match?
- Should it be strict or permissive?
- What language/flavor (JavaScript, Python, etc.)?
- Full match or contains?
- Case sensitive?

### 2. Build Pattern from Description

## Common Patterns

**Email validation:**
```javascript
// Simple (permissive)
/^[^\s@]+@[^\s@]+\.[^\s@]+$/

// More strict (RFC 5322 simplified)
/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/

// Explanation:
// ^              - Start of string
// [a-zA-Z0-9._%+-]+ - Username: letters, numbers, dots, etc.
// @              - Literal @ symbol
// [a-zA-Z0-9.-]+ - Domain name
// \.             - Literal dot
// [a-zA-Z]{2,}   - TLD (2+ letters)
// $              - End of string
```

**Phone numbers:**
```javascript
// US phone (flexible)
/^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$/

// Matches:
// 123-456-7890
// (123) 456-7890
// 123.456.7890
// 1234567890

// International (E.164)
/^\+?[1-9]\d{1,14}$/

// Explanation:
// ^\+?           - Optional + at start
// [1-9]          - First digit 1-9
// \d{1,14}       - 1-14 more digits
// $              - End of string
```

**URLs:**
```javascript
// Simple URL
/^https?:\/\/[\w\-._~:/?#[\]@!$&'()*+,;=]+$/

// With capture groups
/^(https?):\/\/([\w.-]+)(:\d+)?(\/[\w\-._~:/?#[\]@!$&'()*+,;=]*)?$/

// Groups:
// $1 - protocol (http/https)
// $2 - domain
// $3 - port (optional)
// $4 - path (optional)
```

**Passwords:**
```javascript
// At least 8 chars, 1 uppercase, 1 lowercase, 1 number
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d@$!%*?&]{8,}$/

// At least 8 chars, 1 uppercase, 1 lowercase, 1 number, 1 special
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/

// Explanation:
// (?=.*[a-z])    - Lookahead: contains lowercase
// (?=.*[A-Z])    - Lookahead: contains uppercase
// (?=.*\d)       - Lookahead: contains digit
// (?=.*[@$!%*?&])- Lookahead: contains special char
// [A-Za-z\d@$!%*?&]{8,} - 8+ valid characters
```

**Dates:**
```javascript
// YYYY-MM-DD
/^\d{4}-\d{2}-\d{2}$/

// MM/DD/YYYY or M/D/YYYY
/^(0?[1-9]|1[0-2])\/(0?[1-9]|[12]\d|3[01])\/\d{4}$/

// ISO 8601 (with time)
/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?Z?$/
```

**Credit card:**
```javascript
// Any 13-19 digits with optional spaces/dashes
/^[\d\s-]{13,19}$/

// Specific cards:
// Visa: /^4\d{12}(?:\d{3})?$/
// Mastercard: /^5[1-5]\d{14}$/
// Amex: /^3[47]\d{13}$/
```

**IP addresses:**
```javascript
// IPv4
/^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/

// IPv4 (simple)
/^(\d{1,3}\.){3}\d{1,3}$/

// IPv6 (simplified)
/^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$/
```

**Usernames:**
```javascript
// 3-16 chars, alphanumeric + underscore/hyphen
/^[a-zA-Z0-9_-]{3,16}$/

// Must start with letter, 3-16 chars
/^[a-zA-Z][a-zA-Z0-9_-]{2,15}$/
```

**Hex colors:**
```javascript
// #RGB or #RRGGBB
/^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/

// With optional alpha (#RRGGBBAA)
/^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{8}|[A-Fa-f0-9]{3}|[A-Fa-f0-9]{4})$/
```

**HTML tags:**
```javascript
// Match opening tags
/<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)/

// Strip all HTML tags
/<[^>]*>/g

// Match specific tag
/<div\b[^>]*>(.*?)<\/div>/gs
```

**File paths:**
```javascript
// Windows path
/^[a-zA-Z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$/

// Unix path
/^\/(?:[^\/\0]+\/)*[^\/\0]*$/

// File extension
/\.([a-zA-Z0-9]+)$/
```

### 3. Provide Test Cases

For each pattern, show examples:

```javascript
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/

// Valid emails
console.log(emailRegex.test('[email protected]'))        // true
console.log(emailRegex.test('[email protected]')) // true
console.log(emailRegex.test('[email protected]'))     // true

// Invalid emails
console.log(emailRegex.test('@example.com'))            // false
console.log(emailRegex.test('user@'))                   // false
console.log(emailRegex.test('user example.com'))        // false
console.log(emailRegex.test('user@domain'))             // false
```

### 4. Explain Pattern Components

**Basic syntax:**
```
.       - Any character except newline
\d      - Digit [0-9]
\D      - Not digit
\w      - Word character [a-zA-Z0-9_]
\W      - Not word character
\s      - Whitespace [\t\n\r ]
\S      - Not whitespace

^       - Start of string/line
$       - End of string/line
\b      - Word boundary
\B      - Not word boundary

*       - 0 or more (greedy)
+       - 1 or more (greedy)
?       - 0 or 1 (greedy)
{n}     - Exactly n times
{n,}    - n or more times
{n,m}   - Between n and m times

*?      - 0 or more (lazy)
+?      - 1 or more (lazy)
??      - 0 or 1 (lazy)

[abc]   - Any of a, b, or c
[^abc]  - Not a, b, or c
[a-z]   - Any lowercase letter
[0-9]   - Any digit

(...)   - Capture group
(?:...) - Non-capturing group
(?=...) - Positive lookahead
(?!...) - Negative lookahead
(?<=...)- Positive lookbehind
(?<!...)- Negative lookbehind

|       - OR
\       - Escape special character
```

### 5. Language-Specific Variations

**JavaScript:**
```javascript
// Flags
const regex = /pattern/gi
// g - global (find all matches)
// i - case insensitive
// m - multiline (^ and $ match line breaks)
// s - dotall (. matches newlines)
// u - unicode
// y - sticky

// Methods
'text'.match(/pattern/g)           // Array of matches
'text'.matchAll(/pattern/g)        // Iterator of matches
'text'.search(/pattern/)           // Index of first match
'text'.replace(/pattern/g, 'new')  // Replace matches
/pattern/.test('text')             // Boolean
/pattern/.exec('text')             // Match details
```

**Python:**
```python
import re

# Flags
re.IGNORECASE  # Case insensitive
re.MULTILINE   # ^ and $ match line breaks
re.DOTALL      # . matches newlines
re.VERBOSE     # Allow comments in regex

# Methods
re.match(pattern, string)      # Match at start
re.search(pattern, string)     # Find anywhere
re.findall(pattern, string)    # All matches
re.finditer(pattern, string)   # Iterator
re.sub(pattern, repl, string)  # Replace
re.split(pattern, string)      # Split
```

**PHP:**
```php
// Functions
preg_match($pattern, $subject)         // Single match
preg_match_all($pattern, $subject)     // All matches
preg_replace($pattern, $replace, $subject)  // Replace
preg_split($pattern, $subject)         // Split

// Pattern modifiers
/pattern/i  // Case insensitive
/pattern/m  // Multiline
/pattern/s  // Dotall
/pattern/x  // Ignore whitespace
```

### 6. Common Use Cases

**Extract data:**
```javascript
const text = "Contact: [email protected] or [email protected]"
const emails = text.match(/[^\s@]+@[^\s@]+\.[^\s@]+/g)
console.log(emails)  // ['[email protected]', '[email protected]']
```

**Validate input:**
```javascript
function validateEmail(email) {
  const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/
  return regex.test(email)
}
```

**Replace content:**
```javascript
const text = "Call us at 555-1234 or 555-5678"
const censored = text.replace(/\d{3}-\d{4}/g, 'XXX-XXXX')
console.log(censored)  // "Call us at XXX-XXXX or XXX-XXXX"
```

**Parse structured data:**
```javascript
const log = "2024-01-15 ERROR: Failed to connect"
const match = log.match(/^(\d{4}-\d{2}-\d{2}) (\w+): (.+)$/)

if (match) {
  const [, date, level, message] = match
  console.log({ date, level, message })
}
```

### 7. Advanced Patterns

**Lookaheads and lookbehinds:**
```javascript
// Password: 8+ chars, must include uppercase, lowercase, number
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/

// Match word not followed by another word
/\b\w+\b(?!\s+\w+)/

// Match number preceded by $
/(?<=\$)\d+(\.\d{2})?/
```

**Capture groups:**
```javascript
const text = "John Doe ([email protected])"
const regex = /(\w+)\s+(\w+)\s+\(([^)]+)\)/
const [, firstName, lastName, email] = text.match(regex)

console.log({ firstName, lastName, email })
// { firstName: 'John', lastName: 'Doe', email: '[email protected]' }
```

**Named groups (modern):**
```javascript
const regex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const match = '2024-01-15'.match(regex)

console.log(match.groups)
// { year: '2024', month: '01', day: '15' }
```

### 8. Performance Tips

**Avoid catastrophic backtracking:**
```javascript
// ❌ BAD: Catastrophic backtracking
/(a+)+b/

// ✅ GOOD: Atomic grouping or possessive quantifiers
/a+b/
```

**Be specific:**
```javascript
// ❌ BAD: Too greedy
/<.*>/

// ✅ GOOD: Lazy quantifier
/<.*?>/

// ✅ BETTER: Specific negation
/<[^>]*>/
```

**Anchor patterns:**
```javascript
// ❌ BAD: Scans entire string
/\d{3}-\d{4}/

// ✅ GOOD: Anchored
/^\d{3}-\d{4}$/
```

### 9. Testing Tools

Provide testing code:

```javascript
function testRegex(pattern, testCases) {
  console.log(`Testing: ${pattern}\n`)

  testCases.forEach(({ input, expected }) => {
    const result = pattern.test(input)
    const status = result === expected ? '✓' : '✗'

    console.log(`${status} "${input}" -> ${result} (expected ${expected})`)
  })
}

// Usage
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/

testRegex(emailRegex, [
  { input: '[email protected]', expected: true },
  { input: 'invalid@', expected: false },
  { input: '@example.com', expected: false },
  { input: 'user example.com', expected: false }
])
```

### 10. Common Mistakes

**Forgetting to escape:**
```javascript
// ❌ BAD: . matches any character
/user.example.com/

// ✅ GOOD: Escaped dot
/user\.example\.com/
```

**Not anchoring:**
```javascript
// ❌ BAD: Matches anywhere in string
/\d{3}-\d{4}/  // "xxx-555-1234-yyy" passes

// ✅ GOOD: Anchored to start and end
/^\d{3}-\d{4}$/  // Only "555-1234" passes
```

**Greedy vs lazy:**
```javascript
const html = '<div>content</div><span>more</span>'

// ❌ BAD: Greedy, matches too much
html.match(/<.*>/)  // '<div>content</div><span>more</span>'

// ✅ GOOD: Lazy, matches minimally
html.match(/<.*?>/)  // '<div>'
```

**Case sensitivity:**
```javascript
// ❌ BAD: Case sensitive
/^[a-z]+$/  // Only lowercase

// ✅ GOOD: Case insensitive
/^[a-z]+$/i  // Any case
```

### Regex Cheat Sheet

**Character classes:**
- `\d` = `[0-9]`
- `\D` = `[^0-9]`
- `\w` = `[a-zA-Z0-9_]`
- `\W` = `[^a-zA-Z0-9_]`
- `\s` = `[ \t\n\r\f\v]`
- `\S` = `[^ \t\n\r\f\v]`

**Quantifiers:**
- `*` = {0,∞}
- `+` = {1,∞}
- `?` = {0,1}
- `{n}` = exactly n
- `{n,}` = n or more
- `{n,m}` = between n and m

**Anchors:**
- `^` = start of string/line
- `$` = end of string/line
- `\b` = word boundary
- `\A` = start of string (Python)
- `\Z` = end of string (Python)

**Groups:**
- `(...)` = capture
- `(?:...)` = non-capture
- `(?<name>...)` = named capture
- `(?=...)` = lookahead
- `(?!...)` = negative lookahead
- `(?<=...)` = lookbehind
- `(?<!...)` = negative lookbehind

Overview

This skill builds, explains, and tests regular expressions from plain-English requests. It converts requirements into regex tailored to common flavors (JavaScript, Python, PHP), explains each component, and supplies runnable test cases and examples. Use it to validate input, extract data, or debug complex patterns.

How this skill works

You describe the format or paste an existing regex; the skill asks clarifying questions when needed (strict vs permissive, flavor, anchors, case sensitivity). It then returns a clear pattern, a short breakdown of tokens and groups, recommended flags, and example test cases with expected outcomes. It also flags performance risks like catastrophic backtracking and suggests safer alternatives.

When to use it

Create a regex to match or validate emails, phones, URLs, dates, etc.
Ask for a regex tailored to a language/flavor (JavaScript, Python, PHP).
Explain or document an existing regex pattern.
Test and debug a regex with example inputs and expected results.
Optimize a regex to avoid catastrophic backtracking or improve performance.

Best practices

Specify whether you need full-string matches or substring matches (use anchors ^ and $ when appropriate).
State the target language or engine to ensure compatible features (lookbehinds, named groups, flags).
Prefer specific character classes and bounded quantifiers to avoid greedy matches and backtracking.
Provide positive and negative test cases so the pattern is validated against expected inputs.
Use non-capturing groups (?:...) when you don’t need captured results to improve clarity and performance.

Example use cases

Validate email input on a signup form with a permissive or stricter pattern.
Match international phone numbers in E.164 format for stored contacts.
Extract date, level, and message from log lines using capture or named groups.
Create a password rule enforcing length and character classes with lookaheads.
Convert a greedy HTML tag match to a safe, lazy or negated character-class alternative.

FAQ

Which regex flavor should I pick?

Pick the flavor supported by your runtime. Use JavaScript for browser/node, Python for backend scripts, and PCRE/PHP when using preg_ functions. Tell the skill your target and it will produce compatible syntax.

How do I avoid catastrophic backtracking?

Be specific with quantifiers, avoid nested ambiguous groups like (a+)+, use atomic groups or possessive quantifiers where supported, and prefer negated character classes or lazy quantifiers.