home / skills / a5c-ai / babysitter / suffix-structure-builder

This skill helps you construct and query suffix arrays, trees, and automata efficiently to enable fast pattern matching and substring analysis.

npx playbooks add skill a5c-ai/babysitter --skill suffix-structure-builder

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
1.7 KB
---
name: suffix-structure-builder
description: Build and query suffix arrays and related structures
allowed-tools:
  - Read
  - Write
  - Grep
  - Glob
  - Edit
---

# Suffix Structure Builder Skill

## Purpose

Build suffix arrays, suffix trees, and related structures with efficient construction algorithms and common query implementations.

## Capabilities

- Suffix array construction (SA-IS, DC3)
- LCP array construction
- Suffix tree construction
- Suffix automaton construction
- Query implementations for each structure
- Sparse table for LCP queries

## Target Processes

- trie-suffix-structures
- pattern-matching-algorithms
- string-processing

## Suffix Structures

### Suffix Array
- O(n log n) or O(n) construction
- Combined with LCP for powerful queries
- Pattern matching in O(m log n)

### LCP Array
- Kasai's algorithm O(n)
- Range minimum queries for LCA
- Distinct substring counting

### Suffix Tree
- Ukkonen's algorithm O(n)
- More complex but powerful
- Direct pattern matching O(m)

### Suffix Automaton
- O(n) construction
- Smallest automaton for all substrings
- Powerful for counting problems

## Input Schema

```json
{
  "type": "object",
  "properties": {
    "structure": {
      "type": "string",
      "enum": ["suffixArray", "lcpArray", "suffixTree", "suffixAutomaton"]
    },
    "algorithm": { "type": "string" },
    "queries": { "type": "array" },
    "language": {
      "type": "string",
      "enum": ["cpp", "python", "java"]
    }
  },
  "required": ["structure"]
}
```

## Output Schema

```json
{
  "type": "object",
  "properties": {
    "success": { "type": "boolean" },
    "code": { "type": "string" },
    "complexity": { "type": "object" },
    "queryImplementations": { "type": "array" }
  },
  "required": ["success", "code"]
}
```

Overview

This skill builds and queries suffix arrays, suffix trees, suffix automata, and related auxiliary structures. It provides multiple construction algorithms and common query implementations so you can integrate string-indexing primitives into larger workflows. Use it to generate production-ready code snippets and complexity summaries for C++, Python, or Java.

How this skill works

You specify the target structure (suffixArray, lcpArray, suffixTree, or suffixAutomaton) and optionally the construction algorithm. The skill emits implementation code, a complexity breakdown, and ready-to-run query functions such as pattern matching, LCP queries, distinct substring counting, and substring frequency. It can also include a sparse table for fast range-minimum/LCA lookups and provide multiple algorithmic options (SA-IS, DC3, Ukkonen, Kasai).

When to use it

  • Implementing fast substring search and pattern matching for large texts
  • Counting distinct substrings or computing substring frequencies
  • Building index structures for bioinformatics or search engines
  • Teaching or benchmarking suffix-structure algorithms and runtimes
  • Adding deterministic string-orchestration steps into agentic workflows

Best practices

  • Choose linear algorithms (SA-IS, DC3, Ukkonen) for very large inputs to keep memory and time optimal
  • Combine suffix array + LCP + sparse table for practical fast queries (binary search + RMQ)
  • Prefer suffix automaton for counting distinct substrings or substring occurrence multiplicities
  • Benchmark implementations on representative datasets to validate performance assumptions
  • Expose clear query APIs (find, count, firstOccurrence, distinctCount) to callers

Example use cases

  • Generate a C++ suffix array using SA-IS with a Kasai LCP and RMQ for fast pattern queries
  • Create a Python suffix automaton to count distinct substrings and frequency of each substring
  • Produce a Java implementation of Ukkonen's suffix tree for direct O(m) pattern matching
  • Add a suffix-structure construction step into an agent workflow that needs resumable string indexing
  • Compare DC3 and SA-IS on a dataset to pick the best construction for production

FAQ

Which structure should I pick for pattern matching?

For most pattern queries use suffix array + LCP + RMQ for a compact and fast solution; use suffix tree when you need guaranteed O(m) traversal and richer topology.

When is suffix automaton preferable?

Use a suffix automaton when you need to enumerate or count all distinct substrings and compute multiplicities efficiently with linear construction.