
language-detector skill

/language-detector

This skill detects text language with confidence scores across 50+ languages, supporting batch analysis and CSV inputs.

npx playbooks add skill dkyazzentwatwa/chatgpt-skills --skill language-detector

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
704 B
---
name: language-detector
description: Detect language of text with confidence scores, support for 50+ languages, and batch text classification.
---

# Language Detector

Identify the language of text with confidence scoring.

## Features

- **50+ Languages**: Wide language support
- **Confidence Scores**: Probability estimates
- **Batch Detection**: Process multiple texts
- **CSV Support**: Analyze text columns
- **Multiple Algorithms**: Character n-gram analysis and probabilistic models

## CLI Usage

```bash
python language_detector.py --text "Hello world" --output result.json
python language_detector.py --file texts.csv --column text --output languages.csv
```

## Dependencies

- langdetect>=1.0.9
- pandas>=2.0.0

Overview

This skill detects the language of input text and returns per-language confidence scores. It supports 50+ languages, batch processing, and CSV column analysis for quick integration into pipelines. The implementation is lightweight, built on character n-gram analysis, and designed for both command-line and programmatic use.

How this skill works

The detector analyzes text using character n-gram patterns and probabilistic models to estimate the most likely language. For each input it returns a ranked list of candidate languages with confidence (probability) scores. It supports single-text queries, batch lists, and CSV files where a specified column contains text to classify. Results can be exported as JSON or CSV for downstream processing.
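For programmatic use, the same ranked-candidate behaviour can be sketched directly with the langdetect library listed under Dependencies; this is an illustration of the approach, not the skill's own language_detector.py.

```python
from langdetect import DetectorFactory, detect_langs

# langdetect samples internally, so fix the seed for deterministic results.
DetectorFactory.seed = 0

def rank_languages(text):
    """Return candidate languages with probability scores, highest first."""
    return [(c.lang, c.prob) for c in detect_langs(text)]

print(rank_languages("Bonjour tout le monde"))
# e.g. [('fr', 0.9999...)]
```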

When to use it

  • Automatically tag incoming user text with language metadata for routing or processing.
  • Preprocess datasets to split data by language before training language-specific models.
  • Audit multilingual content in CSV exports or logs to measure language distribution.
  • Batch-process customer feedback, reviews, or comments to detect language at scale.
  • Filter or redirect messages in chat systems to the appropriate language handlers.

Best practices

  • Provide reasonably sized text samples (short phrases can be ambiguous).
  • Use batch CSV mode for large datasets to avoid repeated process startup overhead.
  • Check top-3 candidate languages when confidence is low to handle ambiguity.
  • Normalize text (remove excessive punctuation or markup) for more reliable detection; a combined sketch follows this list.
  • Log confidence scores alongside labels to enable downstream quality checks.
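The normalization and low-confidence checks above can be combined into one helper. The sketch below assumes the langdetect dependency from SKILL.md; the 0.90 threshold and the cleanup regexes are illustrative choices, not values defined by the skill.

```python
import re

from langdetect import DetectorFactory, detect_langs

DetectorFactory.seed = 0  # make rankings repeatable across runs

MARKUP = re.compile(r"<[^>]+>")    # strip simple HTML-like markup
NOISE = re.compile(r"[^\w\s'\-]")  # drop excessive punctuation

def detect_with_fallback(text, threshold=0.90, top_k=3):
    """Normalize text, then return the top language plus alternates when confidence is low."""
    cleaned = NOISE.sub(" ", MARKUP.sub(" ", text)).strip()
    candidates = detect_langs(cleaned)[:top_k]
    best = candidates[0]
    alternates = candidates[1:] if best.prob < threshold else []
    return {
        "language": best.lang,
        "confidence": best.prob,
        "alternates": [(c.lang, c.prob) for c in alternates],
    }

print(detect_with_fallback("Ciao, come stai?"))
```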

Example use cases

  • Run a CLI check on a CSV of user comments: detect language in the 'text' column and write results to a new CSV (a pandas sketch follows this list).
  • Integrate detection into ingestion pipelines to assign content to language-specific translators or moderation teams.
  • Quickly classify a list of sentences to profile language distribution before analytics.
  • Pre-filter multilingual corpora to split data into separate language folders for model training.
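For the CSV use case, a pandas-based sketch along these lines reproduces the batch workflow; the file names and the 'text' column mirror the CLI example above, but the actual language_detector.py script may structure its output differently.

```python
import pandas as pd
from langdetect import DetectorFactory, LangDetectException, detect_langs

DetectorFactory.seed = 0

def detect_cell(text):
    """Return (language, confidence) for one cell, or ('unknown', 0.0) on failure."""
    try:
        best = detect_langs(str(text))[0]
        return best.lang, best.prob
    except LangDetectException:
        return "unknown", 0.0

df = pd.read_csv("texts.csv")
df[["language", "confidence"]] = df["text"].apply(lambda t: pd.Series(detect_cell(t)))
df.to_csv("languages.csv", index=False)
```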

FAQ

Which languages are supported?

The detector supports 50+ languages, covering the major world languages encountered in typical NLP pipelines.

What formats can I process in batch?

You can process plain text lists or CSV files by specifying the column containing text. Outputs can be JSON or CSV.
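As an illustration of the plain-list path, a batch of strings can be classified and written to JSON as below; the output fields are a sketch under the langdetect dependency, not the skill's documented schema.

```python
import json

from langdetect import DetectorFactory, detect_langs

DetectorFactory.seed = 0

texts = ["Hello world", "Hola mundo", "Bonjour le monde"]
results = []
for t in texts:
    best = detect_langs(t)[0]  # highest-probability candidate
    results.append({"text": t, "language": best.lang, "confidence": best.prob})

with open("result.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```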