extract-entities skill

This skill helps you extract named entities, topics, and relationships from text to power knowledge graphs, indexing, and cross-document analysis.

npx playbooks add skill bdambrosio/cognitive_workbench --skill extract-entities

---
name: extract-entities
description: Extract named entities, topics, and relationships from text or structured content
type: python
flattens_collections: true
parameters: none
examples:
  - '{"type":"extract-entities","target":"$paper_text","out":"$entities","expect":"should find authors and organizations"}'
---

# Extract Entities

Identify and extract structured information from unstructured text: people, places, organizations, topics, dates, and relationships between entities.

## Purpose

Transform free-form text into structured entity data for:
- Building knowledge graphs
- Indexing and retrieval
- Pattern detection across documents
- Linking related content

## Input Format

Accepts:
- Plain text (paragraphs, documents)
- Structured data with text fields
- Lists of text snippets

## Output Format

Returns JSON structure:
```json
{
  "people": ["Name1", "Name2"],
  "organizations": ["Org1", "Org2"],
  "locations": ["Place1", "Place2"],
  "topics": ["Topic1", "Topic2"],
  "dates": ["2025-01-15", "last week"],
  "key_concepts": ["Concept1", "Concept2"],
  "relationships": [
    {"subject": "Name1", "predicate": "works_at", "object": "Org1"},
    {"subject": "Topic1", "predicate": "relates_to", "object": "Topic2"}
  ]
}
```
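A consumer may want to check that a result has this shape before ingesting it. The following is a minimal sketch, not part of the skill itself: `validate_entities` is a hypothetical helper, and it assumes the default output (no confidence scores) and that a missing category is equivalent to an empty list.

```python
from typing import Any

# Keys that hold flat lists of strings in the documented output.
LIST_KEYS = ("people", "organizations", "locations", "topics", "dates", "key_concepts")
TRIPLE_KEYS = {"subject", "predicate", "object"}


def validate_entities(result: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the shape looks right."""
    problems: list[str] = []
    for key in LIST_KEYS:
        # Assumption: a missing category is treated the same as an empty list.
        values = result.get(key, [])
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            problems.append(f"{key!r} should be a list of strings")
    relationships = result.get("relationships", [])
    if not isinstance(relationships, list):
        problems.append("'relationships' should be a list")
        return problems
    for i, rel in enumerate(relationships):
        # Each relationship must be a dict with exactly the three triple keys.
        if not isinstance(rel, dict) or set(rel) != TRIPLE_KEYS:
            problems.append(f"relationships[{i}] should have exactly subject, predicate, object")
    return problems


sample = {
    "people": ["Name1"],
    "organizations": ["Org1"],
    "relationships": [{"subject": "Name1", "predicate": "works_at", "object": "Org1"}],
}
print(validate_entities(sample))  # prints []
```

Returning a list of problems rather than raising lets a pipeline log every issue in one pass.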

## Extraction Guidelines

### Entity Categories

**People**: Full names, roles, pronouns with clear referents
- Include professional titles if mentioned
- Resolve pronouns when unambiguous

**Organizations**: Companies, institutions, projects, teams
- Include both formal and informal names
- Note parent/subsidiary relationships

**Locations**: Cities, countries, venues, virtual spaces
- Be specific when possible (not just "the office")

**Topics**: Domain areas, technologies, methodologies
- Extract at appropriate granularity (not too broad/narrow)
- Include synonyms if multiple terms used

**Dates/Time**: Absolute and relative temporal references
- Normalize when possible (ISO 8601 format, e.g. `2025-01-15`, for absolute dates)
- Preserve relative references ("next week", "recently")

**Key Concepts**: Abstract ideas, themes, goals
- Focus on novel or emphasized concepts
- Distinguish from general background
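The date guidance above (normalize absolute dates, preserve relative ones) can be illustrated with a small sketch. This is an assumption about how a post-processing step might look, not the skill's own logic: `normalize_date` and the `KNOWN_FORMATS` list are hypothetical, and month-name parsing depends on an English locale.

```python
from datetime import datetime

# Assumption: these input formats are illustrative, not a list the skill documents.
KNOWN_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%d %B %Y")


def normalize_date(mention: str) -> str:
    """Return an ISO 8601 date for absolute mentions; return anything else unchanged."""
    text = mention.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # Not this format; try the next one.
    # Relative references such as "next week" are preserved as written.
    return text


print(normalize_date("January 15, 2025"))  # prints 2025-01-15
print(normalize_date("last week"))         # prints last week
```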

### Relationships

Extract explicit and strongly implied relationships:
- Employment/affiliation
- Collaboration/partnership
- Causation/dependency
- Temporal ordering
- Hierarchical structure

**Format**: `{subject, predicate, object}` triples
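Triples in this format map naturally onto a graph. As a rough sketch of one way to consume them (the `build_graph` helper is hypothetical and not part of the skill), the relationships can be grouped by subject into an adjacency map:

```python
from collections import defaultdict


def build_graph(relationships: list[dict[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Group triples by subject: subject -> [(predicate, object), ...]."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for rel in relationships:
        graph[rel["subject"]].append((rel["predicate"], rel["object"]))
    return dict(graph)


triples = [
    {"subject": "Name1", "predicate": "works_at", "object": "Org1"},
    {"subject": "Topic1", "predicate": "relates_to", "object": "Topic2"},
]
graph = build_graph(triples)
print(graph["Name1"])  # prints [('works_at', 'Org1')]
```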

### Quality Standards

- **Precision over recall**: Only extract clear, confident entities
- **Disambiguation**: Use context to resolve ambiguous references
- **Normalization**: Consistent entity naming across text
- **No hallucination**: Never infer entities not present in source
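The "no hallucination" standard can be spot-checked downstream. The sketch below is a deliberately crude guard under stated assumptions: `unsupported_entities` is a hypothetical helper, it assumes the default output where every entity is a plain string, and it uses a case-insensitive substring test. Resolved pronouns, expanded abbreviations, and normalized dates will not appear verbatim in the source, so flagged items are candidates for review, not proven errors.

```python
def unsupported_entities(source_text: str, result: dict) -> dict[str, list[str]]:
    """Return entities, grouped by category, that do not appear verbatim in the source."""
    haystack = source_text.casefold()
    flagged: dict[str, list[str]] = {}
    for category, values in result.items():
        if category == "relationships":
            continue  # Triples are dicts, not strings; this sketch skips them.
        missing = [v for v in values if v.casefold() not in haystack]
        if missing:
            flagged[category] = missing
    return flagged


text = "Sarah joined Anthropic last quarter."
extracted = {"people": ["Sarah"], "organizations": ["Anthropic", "Acme Corp"]}
print(unsupported_entities(text, extracted))  # prints {'organizations': ['Acme Corp']}
```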

## Special Handling

- **Pronouns**: Resolve only when antecedent is clear and recent
- **Abbreviations**: Expand on first use, preserve thereafter
- **Implicit entities**: Extract only if strongly implied by context
- **Conflicting info**: Note conflicts in relationships field

## Parameters

Optional args dict can specify:
- `entity_types`: List of types to extract (default: all)
- `include_confidence`: Boolean, add confidence scores (default: false)
- `max_entities_per_type`: Limit results (default: unlimited)
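To make the intended effect of `entity_types` and `max_entities_per_type` concrete, here is a hedged sketch that applies them to an already extracted result. The skill presumably applies these options during extraction; `apply_params` is a hypothetical helper, and it assumes that type names match the output keys (for example `"people"`) and that the limit keeps the first entries in order.

```python
from typing import Optional


def apply_params(
    result: dict,
    entity_types: Optional[list[str]] = None,
    max_entities_per_type: Optional[int] = None,
) -> dict:
    """Illustrate the documented options as a post-filter over a result dict."""
    filtered = {}
    for category, values in result.items():
        # entity_types=None means "all", matching the documented default.
        if entity_types is not None and category not in entity_types:
            continue
        # max_entities_per_type=None means "unlimited", matching the documented default.
        if max_entities_per_type is not None:
            values = values[:max_entities_per_type]
        filtered[category] = list(values)
    return filtered


result = {"people": ["A", "B", "C"], "topics": ["T1"]}
print(apply_params(result, entity_types=["people"], max_entities_per_type=2))
# prints {'people': ['A', 'B']}
```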

## Example

**Input:**
```
Sarah joined Anthropic last quarter to work on constitutional AI. 
She previously collaborated with researchers at DeepMind on alignment.
```

**Output:**
```json
{
  "people": ["Sarah"],
  "organizations": ["Anthropic", "DeepMind"],
  "topics": ["constitutional AI", "alignment"],
  "dates": ["last quarter"],
  "key_concepts": ["alignment research"],
  "relationships": [
    {"subject": "Sarah", "predicate": "works_at", "object": "Anthropic"},
    {"subject": "Sarah", "predicate": "collaborated_with", "object": "DeepMind"},
    {"subject": "Sarah", "predicate": "works_on", "object": "constitutional AI"}
  ]
}
```
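Outputs like the one above can be aggregated across documents to surface recurring entities. The following is a minimal sketch of that idea; `count_mentions` is a hypothetical helper and `doc_b` is invented sample data, not output from the skill.

```python
from collections import Counter


def count_mentions(results: list[dict], category: str) -> Counter:
    """Count how many documents mention each entity in the given category."""
    counts: Counter = Counter()
    for result in results:
        # Use a set so an entity counts at most once per document.
        counts.update(set(result.get(category, [])))
    return counts


doc_a = {"organizations": ["Anthropic", "DeepMind"], "topics": ["constitutional AI", "alignment"]}
doc_b = {"organizations": ["DeepMind"], "topics": ["alignment"]}  # invented for illustration
counts = count_mentions([doc_a, doc_b], "organizations")
print(counts.most_common(1))  # prints [('DeepMind', 2)]
```

Counting per document rather than per mention only works well when entity names are normalized consistently, as the quality standards require.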

Overview

This skill extracts named entities, topics, and relationships from free-form or structured text and returns a consistent JSON representation. It converts paragraphs, lists, and text fields into people, organizations, locations, topics, dates, key concepts, and relationship triples. The output is designed for downstream indexing, knowledge graph construction, and content linking.

How this skill works

The skill inspects text to identify clear entity mentions and normalizes them for consistency. It classifies entities into categories (people, organizations, locations, topics, dates, key concepts) and emits relationship triples (subject, predicate, object) for explicit or strongly implied links. It favors precision over recall, resolves pronouns only when antecedents are unambiguous, and avoids inventing entities not present in the source.

When to use it

- Extract structured entities from documents for search indexing or metadata tagging
- Generate relationship triples for knowledge graph building or entity linking
- Normalize dates and entity names across a corpus for analytics
- Detect collaborations, affiliations, or temporal orderings in text
- Preprocess text for summarization, recommendation, or pattern detection

Best practices

- Provide complete or well-scoped text snippets to reduce ambiguity
- Use the `entity_types` parameter to limit extraction to relevant categories
- Enable `include_confidence` to aid downstream filtering when needed
- Prefer explicit mentions over relying on implied references to avoid errors
- Review and reconcile conflicting relationships before automatic ingestion

Example use cases

- Indexing news articles: extract people, organizations, locations, and topics for faceted search
- Knowledge graph population: convert research papers into entities and affiliation triples
- Compliance review: find dates, parties, and contractual relationships in documents
- Content linking: identify related topics and authors across a document collection
- Analytics pipelines: normalize entity names and dates for cross-document trend analysis

FAQ

What input formats are supported?

Plain text, structured records with text fields, and lists of text snippets.

Does it invent entities or infer unstated facts?

No. The skill avoids hallucination and only extracts entities or relationships that are clearly present or strongly implied.

Can it return confidence scores?

Yes. Set `include_confidence` to true to receive per-entity confidence values.