This skill generates and hand-writes lexers using DFA-based, table-driven, and recursive approaches to speed up language tooling.

Install with: `npx playbooks add skill a5c-ai/babysitter --skill lexer-generator`
---
name: Lexer Generator
description: Expert skill for generating and hand-writing lexers using DFA-based, table-driven, and recursive approaches
category: Compiler Frontend
allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
---
# Lexer Generator Skill
## Overview
Expert skill for generating and hand-writing lexers using various approaches including DFA-based lexers, table-driven lexers, and hand-written recursive lexers.
## Capabilities
- Generate lexer from regular expression specifications
- Implement maximal munch tokenization
- Handle Unicode character classes and normalization
- Implement efficient keyword recognition (tries, perfect hashing)
- Support incremental/resumable lexing for IDE integration
- Generate lexer tables and state machines
- Handle lexer modes and contexts (e.g., string interpolation)
- Implement error recovery with skip-to-next strategies
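One of the keyword-recognition strategies listed above can be sketched as a trie: after scanning an identifier, walk the trie and check for a terminal node. The keyword set and function names here are illustrative, not part of the skill's actual output.

```javascript
// Sketch: trie-based keyword recognition (illustrative).
// A terminal marker on a node means a complete keyword ends there.
function buildTrie(keywords) {
  const root = {};
  for (const kw of keywords) {
    let node = root;
    for (const ch of kw) node = node[ch] ?? (node[ch] = {});
    node.end = true; // a complete keyword ends at this node
  }
  return root;
}

function classify(trie, lexeme) {
  let node = trie;
  for (const ch of lexeme) {
    node = node[ch];
    if (!node) return "IDENT"; // fell off the trie: plain identifier
  }
  return node.end ? "KEYWORD" : "IDENT";
}

const trie = buildTrie(["if", "in", "int", "while"]);
classify(trie, "int");   // "KEYWORD"
classify(trie, "inter"); // "IDENT"
```

A trie avoids re-scanning the lexeme per keyword; for very large keyword sets, perfect hashing (also listed above) trades build-time complexity for O(1) lookup.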
## Target Processes
- lexer-implementation.js
- language-grammar-design.js
- lsp-server-implementation.js
- repl-development.js
## Dependencies
- Flex-like generators
- RE2/Hyperscan libraries
## Usage Guidelines
1. **Token Definition**: Start by defining the complete set of tokens with their regex patterns
2. **Maximal Munch**: Always implement maximal munch to handle ambiguous token boundaries
3. **Unicode Support**: Consider Unicode normalization forms and character classes from the start
4. **Error Recovery**: Implement skip-to-next-valid strategies for robust error handling
5. **Performance**: Use table-driven approaches for large token sets, hand-written for simple lexers
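The maximal-munch rule from guideline 2 (longest match wins; priority breaks ties) can be sketched with JavaScript sticky regexes. The token names, patterns, and priorities below are hypothetical, not the skill's actual table.

```javascript
// Sketch: maximal-munch tokenization with longest-match plus priority tie-breaking.
// Sticky ("y") regexes anchor each match attempt at the current offset.
const SPEC = [
  { name: "NUMBER", re: /\d+/y,          priority: 2 },
  { name: "IDENT",  re: /[A-Za-z_]\w*/y, priority: 1 },
  { name: "LE",     re: /<=/y,           priority: 2 },
  { name: "LT",     re: /</y,            priority: 1 },
  { name: "WS",     re: /\s+/y,          priority: 0 },
];

function tokenize(src) {
  const tokens = [];
  let pos = 0;
  while (pos < src.length) {
    let best = null;
    for (const t of SPEC) {
      t.re.lastIndex = pos;
      const m = t.re.exec(src);
      if (!m) continue;
      // Longest match wins; on equal length, higher priority wins.
      if (!best || m[0].length > best.text.length ||
          (m[0].length === best.text.length && t.priority > best.priority)) {
        best = { name: t.name, text: m[0], priority: t.priority };
      }
    }
    if (!best) throw new Error(`unexpected character at ${pos}`);
    if (best.name !== "WS") tokens.push({ type: best.name, value: best.text });
    pos += best.text.length;
  }
  return tokens;
}

tokenize("a <= 10"); // IDENT("a"), LE("<="), NUMBER("10")
```

Note how `<=` beats `<` purely by length; priority only matters for equal-length candidates (e.g. a keyword pattern versus the identifier pattern).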
## Output Schema
```json
{
  "type": "object",
  "properties": {
    "tokens": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "pattern": { "type": "string" },
          "priority": { "type": "integer" }
        }
      }
    },
    "lexerType": {
      "type": "string",
      "enum": ["dfa", "table-driven", "hand-written"]
    },
    "generatedFiles": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}
```
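A hypothetical instance conforming to this schema might look as follows; the token names, patterns, and file names are illustrative only.

```json
{
  "tokens": [
    { "name": "NUMBER", "pattern": "[0-9]+", "priority": 2 },
    { "name": "IDENT", "pattern": "[A-Za-z_][A-Za-z0-9_]*", "priority": 1 }
  ],
  "lexerType": "table-driven",
  "generatedFiles": ["lexer.js", "lexer-tables.js"]
}
```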
## How It Works
This skill produces deterministic, high-performance tokenizers with support for maximal munch, Unicode handling, lexer modes, and resumable lexing for IDEs. It focuses on practical, production-ready implementations suitable for compilers, language servers, and REPLs. Outputs include token schemas, lexer tables, and ready-to-use JavaScript source files.

Provide a complete token specification using regular expressions and priorities; the skill converts these into DFA state machines or compact transition tables, or emits a hand-written recursive lexer when simpler logic is preferable. It implements maximal munch by resolving ambiguities with longest-match and priority rules, and supports Unicode normalization and character classes. For performance it can generate compact transition tables, keyword tries, or perfect-hash-based matchers. It also emits integration helpers for incremental/resumable lexing and error-recovery strategies such as skip-to-next.
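A compact transition table of the kind described can be sketched for a one-token DFA recognizing integers; the state numbering, character classes, and helper names are illustrative.

```javascript
// Sketch: a tiny table-driven DFA recognizing integers ([0-9]+).
// States: 0 = start, 1 = in-number; -1 = reject.
// Character classes: 0 = digit, 1 = anything else.
const classOf = (ch) => (ch >= "0" && ch <= "9" ? 0 : 1);
const TRANS = [
  // digit, other
  [1, -1], // state 0 (start)
  [1, -1], // state 1 (in-number)
];
const ACCEPTING = new Set([1]);

// Run the DFA from `pos`, returning the longest accepted lexeme (maximal munch).
function longestMatch(src, pos) {
  let state = 0, lastAccept = -1;
  for (let i = pos; i < src.length; i++) {
    state = TRANS[state][classOf(src[i])];
    if (state === -1) break;                  // dead state: stop scanning
    if (ACCEPTING.has(state)) lastAccept = i; // remember last accepting position
  }
  return lastAccept === -1 ? null : src.slice(pos, lastAccept + 1);
}

longestMatch("123abc", 0); // "123"
```

A real generated table has one row per DFA state and one column per character equivalence class, so the hot loop is two array lookups per input character.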
## FAQ
**Which lexer type should I choose for a new language?**
Use a table-driven/DFA lexer when you have many tokens and strict performance needs; choose a hand-written recursive lexer when the grammar includes complex mode switches or when implementation simplicity matters.

**How is Unicode handled?**
The skill supports Unicode character classes and normalization; pick a normalization form early and include Unicode-aware regexes or character-range handling in the token specs.
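That normalization advice can be sketched with standard JavaScript (`String.prototype.normalize` and Unicode property escapes); the identifier pattern here is an assumption, not the skill's actual spec.

```javascript
// Sketch: Unicode-aware identifier matching. Normalizing to NFC first means
// "e" + combining acute (U+0301) lexes the same as precomposed "é".
const IDENT_RE = /[\p{L}_][\p{L}\p{N}_]*/uy;

function matchIdent(src, pos = 0) {
  const s = src.normalize("NFC");
  IDENT_RE.lastIndex = pos;
  const m = IDENT_RE.exec(s);
  return m ? m[0] : null;
}

matchIdent("café = 1");       // "café"
matchIdent("cafe\u0301 = 1"); // "café" (NFC folds e + U+0301 into é)
```

Whichever form you pick (NFC is a common default), apply it consistently at lexer input so identifier comparison and keyword lookup see one canonical spelling.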