
word-count skill

/src/tools/word-count

This skill counts words in text to assess length and determine if summarization is needed before further processing.

npx playbooks add skill bdambrosio/cognitive_workbench --skill word-count


Skill.md
---
name: word-count
type: python
description: "Count words in text. Use to determine length of a document, e.g. to determine if it needs to be summarized before further use"
---

# word-count

Simple deterministic word counting for text analysis.

## Input

- `target`: Note ID or variable containing text to count

## Output

Success (`status: "success"`):
- `value`: String with word count (e.g., "Word count: 6")
- `extra.count`: Integer count

## Behavior

- Simple whitespace-based counting
- Fast and reliable
- Works on any text content

## Planning Notes

- Use to check document length before processing
- Useful for determining if summarization is needed

## Example

```json
{"type":"word-count","target":"$text","out":"$count"}
```

Overview

This skill provides a simple, deterministic word count for any text input. It returns both a human-readable string and a numeric count, making it easy to integrate into pipelines that need quick length checks. Use it to decide whether a document should be summarized, split, or processed further.

How this skill works

The skill performs a fast whitespace-based tokenization to count words in the provided text. You supply a target (note ID or variable containing text) and it outputs a string like "Word count: 6" plus an integer in extra.count. It is intentionally simple and reliable, without language-specific tokenization or punctuation rules.
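The counting logic described here is small enough to sketch directly. The following is a minimal illustration, not the skill's actual source; the function name is hypothetical, but the output shape follows the Output section above:

```python
def count_words(text: str) -> dict:
    """Whitespace-based word count, returning the skill's documented output shape."""
    count = len(text.split())  # str.split() with no argument collapses runs of whitespace
    return {
        "status": "success",
        "value": f"Word count: {count}",
        "extra": {"count": count},
    }

result = count_words("the quick brown fox jumps over")
print(result["value"])  # Word count: 6
```

Because `str.split()` with no separator treats any run of spaces, tabs, or newlines as a single boundary, leading/trailing whitespace and blank lines do not inflate the count.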

When to use it

  • Check document length before applying expensive NLP tasks (summarization, translation, embeddings).
  • Decide whether to split large texts into smaller chunks for downstream processing.
  • Quickly validate input size constraints for APIs or storage limits.
  • Provide user-facing length information in editors or content creation tools.
  • Automate workflows that depend on text length thresholds (e.g., publish/hold rules).
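A threshold-based pre-filter like the ones above might look as follows. This is a sketch under assumed names; `SUMMARIZE_THRESHOLD` and `needs_summary` are illustrative, and the limit itself should be tuned to your pipeline:

```python
SUMMARIZE_THRESHOLD = 2000  # hypothetical word limit; tune per pipeline

def needs_summary(text: str, threshold: int = SUMMARIZE_THRESHOLD) -> bool:
    """Return True when the whitespace word count exceeds the threshold."""
    return len(text.split()) > threshold

long_doc = "word " * 2500
print(needs_summary(long_doc))        # True  -> summarize before further use
print(needs_summary("a short note"))  # False -> process directly
```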

Best practices

  • Use as a fast pre-filter rather than a linguistic tokenizer when precise token rules matter.
  • Combine with character counts or language-aware tokenizers when downstream models require token-level limits.
  • Trim or normalize input (remove boilerplate or markup) before counting if you need meaningful content length.
  • Run counts on cleaned, final text to avoid inflated results from metadata or debug logs.
  • Treat the numeric count as an approximation when planning splits and summaries; whitespace word counts can diverge from the token counts that downstream models enforce.
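The normalize-before-counting advice can be sketched as a small cleanup pass. This is an assumption-laden example (the `normalize` helper is hypothetical and only strips HTML tags and collapses whitespace; real boilerplate removal depends on your content):

```python
import re

def normalize(text: str) -> str:
    """Strip HTML tags and collapse whitespace before counting."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop tags like <p>, </div>
    return re.sub(r"\s+", " ", text).strip()

raw = "<p>Hello <b>world</b></p>\n<!-- debug: id=42 -->"
clean = normalize(raw)
print(len(clean.split()))  # counts only the visible words
```

Counting the cleaned text avoids inflating the result with markup or metadata that the reader never sees.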

Example use cases

  • Check if a user-submitted article exceeds a publish-length threshold before displaying submission options.
  • Decide to summarize long customer support tickets before sending them to an agent or model.
  • Split a long transcript into chunks when its word count exceeds a safe processing window for downstream models.
  • Enforce input limits for third-party APIs by rejecting or truncating text that exceeds configured word counts.
  • Report document length in dashboards or analytics for content performance monitoring.
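The chunk-splitting case above is a natural companion to a word count. A minimal sketch, assuming a hypothetical `chunk_by_words` helper and a made-up 500-word window:

```python
def chunk_by_words(text: str, max_words: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

transcript = "word " * 1200
chunks = chunk_by_words(transcript, max_words=500)
print(len(chunks))  # 3 chunks: 500 + 500 + 200 words
```

Because the split points fall on whitespace boundaries, no word is ever cut in half, though sentences may be.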

FAQ

Does the skill handle languages with non-space tokenization?

No. It uses simple whitespace-based counting and does not apply language-specific tokenization. Text in scripts written without spaces between words (e.g., Chinese or Japanese) will report a count far below the actual number of words.

What output format does it return?

On success it returns a string like "Word count: X" in value and the integer count in extra.count.