home / skills / a5c-ai / babysitter / suffix-structure-builder

suffix-structure-builder skill

safe

/plugins/babysitter/skills/babysit/process/specializations/algorithms-optimization/skills/suffix-structure-builder

This skill helps you construct and query suffix arrays, trees, and automata efficiently to enable fast pattern matching and substring analysis.

npx playbooks add skill a5c-ai/babysitter --skill suffix-structure-builder

Review the files below or copy the command above to add this skill to your agents.

Files (2)

SKILL.md

1.7 KB

---
name: suffix-structure-builder
description: Build and query suffix arrays and related structures
allowed-tools:
  - Read
  - Write
  - Grep
  - Glob
  - Edit
---

# Suffix Structure Builder Skill

## Purpose

Build suffix arrays, suffix trees, and related structures with efficient construction algorithms and common query implementations.

## Capabilities

- Suffix array construction (SA-IS, DC3)
- LCP array construction
- Suffix tree construction
- Suffix automaton construction
- Query implementations for each structure
- Sparse table for LCP queries

## Target Processes

- trie-suffix-structures
- pattern-matching-algorithms
- string-processing

## Suffix Structures

### Suffix Array
- O(n log n) or O(n) construction
- Combined with LCP for powerful queries
- Pattern matching in O(m log n)

### LCP Array
- Kasai's algorithm O(n)
- Range minimum queries for LCA
- Distinct substring counting

### Suffix Tree
- Ukkonen's algorithm O(n)
- More complex but powerful
- Direct pattern matching O(m)

### Suffix Automaton
- O(n) construction
- Smallest automaton for all substrings
- Powerful for counting problems

## Input Schema

```json
{
  "type": "object",
  "properties": {
    "structure": {
      "type": "string",
      "enum": ["suffixArray", "lcpArray", "suffixTree", "suffixAutomaton"]
    },
    "algorithm": { "type": "string" },
    "queries": { "type": "array" },
    "language": {
      "type": "string",
      "enum": ["cpp", "python", "java"]
    }
  },
  "required": ["structure"]
}
```

## Output Schema

```json
{
  "type": "object",
  "properties": {
    "success": { "type": "boolean" },
    "code": { "type": "string" },
    "complexity": { "type": "object" },
    "queryImplementations": { "type": "array" }
  },
  "required": ["success", "code"]
}
```

Overview

This skill builds and queries suffix arrays, suffix trees, suffix automata, and related auxiliary structures. It provides multiple construction algorithms and common query implementations so you can integrate string-indexing primitives into larger workflows. Use it to generate production-ready code snippets and complexity summaries for C++, Python, or Java.

How this skill works

You specify the target structure (suffixArray, lcpArray, suffixTree, or suffixAutomaton) and optionally the construction algorithm. The skill emits implementation code, a complexity breakdown, and ready-to-run query functions such as pattern matching, LCP queries, distinct substring counting, and substring frequency. It can also include a sparse table for fast range-minimum/LCA lookups and provide multiple algorithmic options (SA-IS, DC3, Ukkonen, Kasai).

When to use it

Implementing fast substring search and pattern matching for large texts
Counting distinct substrings or computing substring frequencies
Building index structures for bioinformatics or search engines
Teaching or benchmarking suffix-structure algorithms and runtimes
Adding deterministic string-orchestration steps into agentic workflows

Best practices

Choose linear algorithms (SA-IS, DC3, Ukkonen) for very large inputs to keep memory and time optimal
Combine suffix array + LCP + sparse table for practical fast queries (binary search + RMQ)
Prefer suffix automaton for counting distinct substrings or substring occurrence multiplicities
Benchmark implementations on representative datasets to validate performance assumptions
Expose clear query APIs (find, count, firstOccurrence, distinctCount) to callers

Example use cases

Generate a C++ suffix array using SA-IS with a Kasai LCP and RMQ for fast pattern queries
Create a Python suffix automaton to count distinct substrings and frequency of each substring
Produce a Java implementation of Ukkonen's suffix tree for direct O(m) pattern matching
Add a suffix-structure construction step into an agent workflow that needs resumable string indexing
Compare DC3 and SA-IS on a dataset to pick the best construction for production

FAQ

Which structure should I pick for pattern matching?

For most pattern queries use suffix array + LCP + RMQ for a compact and fast solution; use suffix tree when you need guaranteed O(m) traversal and richer topology.

When is suffix automaton preferable?

Use a suffix automaton when you need to enumerate or count all distinct substrings and compute multiplicities efficiently with linear construction.