
word-count skill

/src/tools/word-count

This skill counts words in text to assess length and determine if summarization is needed before further processing.

npx playbooks add skill bdambrosio/cognitive_workbench --skill word-count


Skill.md
---
name: word-count
type: python
description: "Count words in text. Use to determine length of a document, e.g. to determine if it needs to be summarized before further use"
---

# word-count

Simple deterministic word counting for text analysis.

## Input

- `target`: Note ID or variable containing text to count

## Output

Success (`status: "success"`):
- `value`: String with word count (e.g., "Word count: 6")
- `extra.count`: Integer count

## Behavior

- Simple whitespace-based counting
- Fast and reliable
- Works on any text content

## Planning Notes

- Use to check document length before processing
- Useful for determining if summarization is needed

## Example

```json
{"type":"word-count","target":"$text","out":"$count"}
```

Overview

This skill provides a simple, deterministic word count for any text input. It returns both a human-readable string and a numeric count, making it easy to integrate into pipelines that need quick length checks. Use it to decide whether a document should be summarized, split, or processed further.

How this skill works

The skill performs a fast whitespace-based tokenization to count words in the provided text. You supply a target (note ID or variable containing text) and it outputs a string like "Word count: 6" plus an integer in extra.count. It is intentionally simple and reliable, without language-specific tokenization or punctuation rules.
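The counting logic described here is small enough to sketch directly. The following is a minimal illustration, not the skill's actual source; the function name is hypothetical, but the output shape follows the Output section above:

```python
def count_words(text: str) -> dict:
    """Whitespace-based word count, returning the skill's documented output shape."""
    count = len(text.split())  # str.split() with no argument collapses runs of whitespace
    return {
        "status": "success",
        "value": f"Word count: {count}",
        "extra": {"count": count},
    }

result = count_words("the quick brown fox jumps over")
print(result["value"])  # Word count: 6
```

Because `str.split()` with no separator treats any run of spaces, tabs, or newlines as a single boundary, leading/trailing whitespace and blank lines do not inflate the count.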

When to use it

  • Check document length before applying expensive NLP tasks (summarization, translation, embeddings).
  • Decide whether to split large texts into smaller chunks for downstream processing.
  • Quickly validate input size constraints for APIs or storage limits.
  • Provide user-facing length information in editors or content creation tools.
  • Automate workflows that depend on text length thresholds (e.g., publish/hold rules).
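A threshold-based pre-filter like the ones above might look as follows. This is a sketch under assumed names; `SUMMARIZE_THRESHOLD` and `needs_summary` are illustrative, and the limit itself should be tuned to your pipeline:

```python
SUMMARIZE_THRESHOLD = 2000  # hypothetical word limit; tune per pipeline

def needs_summary(text: str, threshold: int = SUMMARIZE_THRESHOLD) -> bool:
    """Return True when the whitespace word count exceeds the threshold."""
    return len(text.split()) > threshold

long_doc = "word " * 2500
print(needs_summary(long_doc))        # True  -> summarize before further use
print(needs_summary("a short note"))  # False -> process directly
```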

Best practices

  • Use as a fast pre-filter rather than a linguistic tokenizer when precise token rules matter.
  • Combine with character counts or language-aware tokenizers when downstream models require token-level limits.
  • Trim or normalize input (remove boilerplate or markup) before counting if you need meaningful content length.
  • Run counts on cleaned, final text to avoid inflated results from metadata or debug logs.
  • Treat the numeric count as an approximation when planning splits and summaries; whitespace word counts can diverge from the token counts that downstream models enforce.
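The normalize-before-counting advice can be sketched as a small cleanup pass. This is an assumption-laden example (the `normalize` helper is hypothetical and only strips HTML tags and collapses whitespace; real boilerplate removal depends on your content):

```python
import re

def normalize(text: str) -> str:
    """Strip HTML tags and collapse whitespace before counting."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop tags like <p>, </div>
    return re.sub(r"\s+", " ", text).strip()

raw = "<p>Hello <b>world</b></p>\n<!-- debug: id=42 -->"
clean = normalize(raw)
print(len(clean.split()))  # counts only the visible words
```

Counting the cleaned text avoids inflating the result with markup or metadata that the reader never sees.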

Example use cases

  • Check if a user-submitted article exceeds a publish-length threshold before displaying submission options.
  • Decide to summarize long customer support tickets before sending them to an agent or model.
  • Split a long transcript into chunks when its word count exceeds a safe processing window for downstream models.
  • Enforce input limits for third-party APIs by rejecting or truncating text that exceeds configured word counts.
  • Report document length in dashboards or analytics for content performance monitoring.
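The chunk-splitting case above is a natural companion to a word count. A minimal sketch, assuming a hypothetical `chunk_by_words` helper and a made-up 500-word window:

```python
def chunk_by_words(text: str, max_words: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

transcript = "word " * 1200
chunks = chunk_by_words(transcript, max_words=500)
print(len(chunks))  # 3 chunks: 500 + 500 + 200 words
```

Because the split points fall on whitespace boundaries, no word is ever cut in half, though sentences may be.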

FAQ

Does the skill handle languages with non-space tokenization?

No. It uses simple whitespace-based counting and does not apply language-specific tokenization. Text in scripts written without spaces between words (e.g., Chinese or Japanese) will report a count far below the actual number of words.

What output format does it return?

On success it returns a string like "Word count: X" in value and the integer count in extra.count.