
elasticsearch-expert skill

/stdlib/data/elasticsearch-expert

This skill helps you optimize Elasticsearch search, indexing, and analytics with expert guidance on mappings, queries, and aggregations.

npx playbooks add skill personamanagmentlayer/pcl --skill elasticsearch-expert

SKILL.md
---
name: elasticsearch-expert
version: 1.0.0
description: Expert-level Elasticsearch, search, ELK stack, and full-text search
category: data
tags: [elasticsearch, search, elk, logstash, kibana, full-text-search]
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash(*)
---

# Elasticsearch Expert

Expert guidance for Elasticsearch, search optimization, ELK stack, and distributed search systems.

## Core Concepts

- Full-text search and inverted indexes
- Document-oriented storage
- RESTful API
- Distributed architecture with sharding
- ELK stack (Elasticsearch, Logstash, Kibana)
- Aggregations and analytics

## Index Management

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])

# Create index with mapping
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "content": {"type": "text"},
            "author": {"type": "keyword"},
            "created_at": {"type": "date"},
            "views": {"type": "integer"}
        }
    }
}

es.indices.create(index='articles', body=mapping)

# Index document
doc = {
    "title": "Elasticsearch Guide",
    "content": "Complete guide to Elasticsearch",
    "author": "John Doe",
    "created_at": "2024-01-01",
    "views": 100
}

es.index(index='articles', id=1, body=doc)

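# Bulk indexing
from elasticsearch.helpers import bulk

# Sample documents for illustration; in practice these come from your data source
documents = [
    {"title": "Bulk Indexing", "content": "Indexing many documents in one request",
     "author": "John Doe", "created_at": "2024-01-02", "views": 50},
    {"title": "Search Basics", "content": "Introduction to full-text search",
     "author": "John Doe", "created_at": "2024-01-03", "views": 250},
]

# One action per document; IDs start at 2 so they do not overwrite the document indexed above
actions = [
    {"_index": "articles", "_id": i, "_source": document}
    for i, document in enumerate(documents, start=2)
]

bulk(es, actions)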
```

## Search Queries

```python
# Full-text search
query = {
    "query": {
        "match": {
            "content": "elasticsearch guide"
        }
    }
}

results = es.search(index='articles', body=query)

# Boolean query
bool_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"content": "elasticsearch"}}
            ],
            "filter": [
                {"range": {"views": {"gte": 100}}}
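            ],
            "should": [
                # term queries are not analyzed: the value must match the stored keyword exactly
                {"term": {"author": "John Doe"}}
            ],
            "must_not": [
                # Assumes a keyword "status" field, which the mapping above does not define
                {"term": {"status": "draft"}}
            ]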
        }
    }
}

# Multi-match query
multi_match = {
    "query": {
        "multi_match": {
            "query": "elasticsearch guide",
            "fields": ["title^2", "content"],  # Boost title
            "type": "best_fields"
        }
    }
}

# Fuzzy search
fuzzy = {
    "query": {
        "fuzzy": {
            "title": {
                "value": "elasticseerch",
                "fuzziness": "AUTO"
            }
        }
    }
}
```

## Aggregations

```python
# Aggregation query
agg_query = {
    "size": 0,  # Skip returning hits; only the aggregation results are needed
    "aggs": {
        "authors": {
            "terms": {
                "field": "author",
                "size": 10
            }
        },
        "avg_views": {
            "avg": {
                "field": "views"
            }
        },
        "views_histogram": {
            "histogram": {
                "field": "views",
                "interval": 100
            }
        },
        "date_histogram": {
            "date_histogram": {
                "field": "created_at",
                "calendar_interval": "month"
            }
        }
    }
}

result = es.search(index='articles', body=agg_query)
```
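
Each aggregation result comes back under `aggregations`, keyed by the name chosen in the request. The following is a minimal sketch of reading those results; the `summarize_aggs` helper and the sample response are hypothetical illustrations, not output from a real cluster.

```python
def summarize_aggs(response):
    """Flatten the aggregation results of a search response into plain Python values."""
    aggs = response["aggregations"]
    return {
        # terms aggregation: one bucket per distinct author
        "authors": {b["key"]: b["doc_count"] for b in aggs["authors"]["buckets"]},
        # avg is a single-value metric aggregation
        "avg_views": aggs["avg_views"]["value"],
        # histogram: each bucket key is the lower bound of its interval
        "views_histogram": {
            b["key"]: b["doc_count"] for b in aggs["views_histogram"]["buckets"]
        },
        # date_histogram buckets also carry a formatted date in key_as_string
        "per_month": {
            b["key_as_string"]: b["doc_count"] for b in aggs["date_histogram"]["buckets"]
        },
    }


# Hand-written sample shaped like a search response (illustrative values only)
sample_response = {
    "aggregations": {
        "authors": {"buckets": [{"key": "John Doe", "doc_count": 2}]},
        "avg_views": {"value": 150.0},
        "views_histogram": {
            "buckets": [{"key": 100.0, "doc_count": 1}, {"key": 200.0, "doc_count": 1}]
        },
        "date_histogram": {
            "buckets": [
                {"key": 1704067200000, "key_as_string": "2024-01-01T00:00:00.000Z", "doc_count": 2}
            ]
        },
    }
}

summary = summarize_aggs(sample_response)
```

With a live cluster, the same helper would be called as `summarize_aggs(result)` on the response returned by `es.search`.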

## Best Practices

- Design mappings carefully
- Use appropriate analyzers
- Implement proper sharding strategy
- Monitor cluster health
- Use bulk operations
- Implement pagination with search_after
- Put frequently reused clauses in filter context so Elasticsearch can cache them

## Anti-Patterns

❌ Deep pagination with from/size
❌ Wildcard queries with a leading wildcard (no literal prefix)
❌ No replica shards
❌ Over-sharding
❌ Not using filters for exact matches
❌ Ignoring cluster yellow/red status
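
To illustrate the point about filters, here is a small sketch of a query builder that keeps exact-match conditions in filter context, where they are not scored and can be cached, and leaves only the full-text clause in `must`. The `exact_match_search` helper is hypothetical; the field names come from the `articles` mapping above.

```python
def exact_match_search(text, author, min_views=None):
    """Build a bool query that scores on text and filters on exact values."""
    # Exact-match conditions go in filter context: no scoring, cacheable
    filters = [{"term": {"author": author}}]
    if min_views is not None:
        filters.append({"range": {"views": {"gte": min_views}}})
    return {
        "query": {
            "bool": {
                # Only the full-text clause contributes to the relevance score
                "must": [{"match": {"content": text}}],
                "filter": filters,
            }
        }
    }


query = exact_match_search("elasticsearch", author="John Doe", min_views=100)
# With a client: es.search(index='articles', body=query)
```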

## Resources

- Elasticsearch Guide: https://www.elastic.co/guide/
- ELK Stack: https://www.elastic.co/elk-stack

Overview

This skill provides expert-level guidance on Elasticsearch, full-text search, and the ELK stack for building and maintaining production search systems. It focuses on index design, query patterns, aggregations, and operational best practices to deliver fast, relevant search experiences. Guidance covers mappings, analyzers, sharding, bulk operations, and common anti-patterns.

How this skill works

I inspect typical search workflows: index creation, document indexing (including bulk), query construction (match, bool, multi_match, fuzzy), and aggregation pipelines. I explain trade-offs for analyzers, sharding, and replica settings, and show how to optimize search and analytics performance. I also highlight operational checks such as cluster health, monitoring, and safe pagination strategies.

When to use it

  • Designing or revising index mappings for large text collections
  • Optimizing query relevance and boosting fields for search results
  • Scaling clusters with appropriate shard and replica strategies
  • Implementing analytics and reporting with aggregations
  • Troubleshooting slow queries, cluster health, or indexing bottlenecks

Best practices

  • Design mappings and choose analyzers up front to avoid costly reindexing
  • Use filters for exact matches and queries for relevance scoring
  • Prefer bulk operations for large ingest and search_after for deep pagination
  • Monitor cluster health, shard allocation, and JVM/GC metrics continuously
  • Avoid wildcard and expensive regex queries; use prefix or n-grams when needed
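
As one way to replace wildcard queries with n-grams, the sketch below builds index settings that use an `edge_ngram` token filter at index time and the `standard` analyzer at search time, so a plain `match` query on a partial word behaves like a prefix search. The helper name, the analyzer and filter names, the gram sizes, and the `articles_autocomplete` index name are all assumptions chosen for illustration.

```python
def autocomplete_index_body(min_gram=2, max_gram=15):
    """Build an index body whose title field is indexed as edge n-grams."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Emits the leading fragments of each token, e.g. "el", "ela", "elas"
                    "autocomplete_filter": {
                        "type": "edge_ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    "autocomplete": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "autocomplete_filter"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    # n-grams are produced at index time only
                    "analyzer": "autocomplete",
                    # the user's input is matched as typed, not split into n-grams
                    "search_analyzer": "standard",
                }
            }
        },
    }


index_body = autocomplete_index_body()
prefix_query = {"query": {"match": {"title": "elas"}}}
# With a client:
#   es.indices.create(index='articles_autocomplete', body=index_body)
#   es.search(index='articles_autocomplete', body=prefix_query)
```

The trade-off is a larger index in exchange for cheap prefix lookups at query time.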

Example use cases

  • Create an articles index with language analyzers, date and numeric fields, and optimized title boosting
  • Implement a multi-field search that boosts title over content and falls back to fuzzy matching
  • Build aggregation dashboards for author counts, average views, and monthly trends
  • Bulk ingest logs into the ELK stack and configure Kibana dashboards for observability
  • Diagnose and resolve yellow/red cluster states, rebalance shards, and tune replica counts
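
The multi-field search with fuzzy fallback could be sketched as below. This is one possible approach, not the only one: it runs a strict `multi_match` first and retries with `fuzziness` only when nothing matches. The `search_with_fuzzy_fallback` helper and its `search_fn` parameter (any callable that takes a request body and returns a search response) are hypothetical.

```python
def search_with_fuzzy_fallback(search_fn, text):
    """Run a boosted multi_match; retry with fuzziness only if there are no hits."""
    multi_match = {
        "query": text,
        "fields": ["title^2", "content"],  # Boost title over content
        "type": "best_fields",
    }
    response = search_fn({"query": {"multi_match": multi_match}})
    if response["hits"]["hits"]:
        return response
    # No exact-term matches: allow small edit distances on the second attempt
    fuzzy_match = dict(multi_match, fuzziness="AUTO")
    return search_fn({"query": {"multi_match": fuzzy_match}})


# With a client:
#   results = search_with_fuzzy_fallback(
#       lambda body: es.search(index='articles', body=body),
#       "elasticseerch guide",
#   )
```

Running the strict query first keeps the common case cheap and stops fuzzy matches from outranking exact ones.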

FAQ

How do I avoid expensive deep pagination?

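Use search_after with a sort that ends in a unique tiebreaker for reliable deep paging, and pair it with a point-in-time (PIT) so every page is read from the same snapshot of the index. Avoid large from/size offsets, which are capped by index.max_result_window (10,000 by default).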
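
A search_after loop can be sketched as follows. The `iter_hits` helper and its `search_fn` parameter (any callable that takes a request body and returns a search response) are hypothetical, and the `article_id` tiebreaker field in the usage comment is an assumption: the sort needs to end in some field that is unique per document.

```python
def iter_hits(search_fn, query, sort, page_size=100):
    """Yield every hit for a query, one page at a time, using search_after."""
    search_after = None
    while True:
        body = {"query": query, "sort": sort, "size": page_size}
        if search_after is not None:
            # Resume right after the last hit of the previous page
            body["search_after"] = search_after
        hits = search_fn(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit carries the sort values that position it in the result set
        search_after = hits[-1]["sort"]


# With a client (article_id is an assumed unique keyword field):
#   for hit in iter_hits(
#       lambda body: es.search(index='articles', body=body),
#       query={"match": {"content": "elasticsearch"}},
#       sort=[{"created_at": "desc"}, {"article_id": "asc"}],
#   ):
#       print(hit["_source"]["title"])
```

Because each request carries only the last sort values, the cost per page stays flat however deep the reader pages.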

When should I reindex?

Reindex when mappings or analyzers change in ways that affect tokenization or field types; plan downtime or use zero-downtime reindex patterns with aliases.
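
A zero-downtime reindex with aliases might look like the sketch below. It assumes the application reads and writes through an alias rather than a concrete index name; the alias `articles` and the versioned index names `articles_v1` and `articles_v2` are hypothetical, as are the two helper functions.

```python
def reindex_body(source_index, dest_index):
    """Request body for the reindex API: copy all documents from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def alias_swap_body(alias, old_index, new_index):
    """Request body for the aliases API: repoint the alias in one atomic call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


copy_request = reindex_body("articles_v1", "articles_v2")
swap_request = alias_swap_body("articles", "articles_v1", "articles_v2")

# With a client, after creating articles_v2 with the new mapping:
#   es.reindex(body=copy_request, wait_for_completion=True)
#   es.indices.update_aliases(body=swap_request)
```

Both alias actions are applied together, so searches against the alias never see a moment with no index behind it. Writes that reach the old index during the copy still need to be handled separately.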

How many shards should I create per index?

Choose shard count based on expected index size, not node count; avoid over-sharding and favor fewer larger shards that can be split later if needed.
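
As a rough starting point, the size-based rule can be written as simple arithmetic. The sketch assumes a target of about 30 GB per shard, a placeholder inside the 10 to 50 GB range often suggested for search workloads; the right value depends on your data and hardware, and the `estimate_primary_shards` helper is hypothetical.

```python
import math


def estimate_primary_shards(expected_index_gb, target_shard_gb=30):
    """Estimate a primary shard count from expected index size (a heuristic only)."""
    if expected_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so no shard exceeds the target size; always keep at least one shard
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


small = estimate_primary_shards(5)     # well under one target shard
medium = estimate_primary_shards(90)   # exactly three target shards
large = estimate_primary_shards(200)   # 200 / 30 rounds up
```

Treat the result as a first guess to validate with load tests, not as a fixed rule.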