
language-detector skill

/language-detector

This skill detects text language with confidence scores across 50+ languages, supporting batch analysis and CSV inputs.

npx playbooks add skill dkyazzentwatwa/chatgpt-skills --skill language-detector

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
704 B
---
name: language-detector
description: Detect language of text with confidence scores, support for 50+ languages, and batch text classification.
---

# Language Detector

Identify the language of text with confidence scoring.

## Features

- **50+ Languages**: Wide language support
- **Confidence Scores**: Probability estimates
- **Batch Detection**: Process multiple texts
- **CSV Support**: Analyze text columns
- **Multiple Algorithms**: Character n-gram analysis and probabilistic models

## CLI Usage

```bash
python language_detector.py --text "Hello world" --output result.json
python language_detector.py --file texts.csv --column text --output languages.csv
```

## Dependencies

- langdetect>=1.0.9
- pandas>=2.0.0

Overview

This skill detects the language of input text and returns per-language confidence scores. It supports 50+ languages, batch processing, and CSV column analysis for quick integration into pipelines. The implementation is lightweight, built on character n-gram analysis, and designed for both command-line and programmatic use.

How this skill works

The detector analyzes text using character n-gram patterns and probabilistic models to estimate the most likely language. For each input it returns a ranked list of candidate languages with confidence (probability) scores. It supports single-text queries, batch lists, and CSV files where a specified column contains text to classify. Results can be exported as JSON or CSV for downstream processing.
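For programmatic use, the same ranked-candidate behaviour can be sketched directly with the langdetect library listed under Dependencies; this is an illustration of the approach, not the skill's own language_detector.py.

```python
from langdetect import DetectorFactory, detect_langs

# langdetect samples internally, so fix the seed for deterministic results.
DetectorFactory.seed = 0

def rank_languages(text):
    """Return candidate languages with probability scores, highest first."""
    return [(c.lang, c.prob) for c in detect_langs(text)]

print(rank_languages("Bonjour tout le monde"))
# e.g. [('fr', 0.9999...)]
```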

When to use it

  • Automatically tag incoming user text with language metadata for routing or processing.
  • Preprocess datasets to split data by language before training language-specific models.
  • Audit multilingual content in CSV exports or logs to measure language distribution.
  • Batch-process customer feedback, reviews, or comments to detect language at scale.
  • Filter or redirect messages in chat systems to the appropriate language handlers.

Best practices

  • Provide reasonably sized text samples (short phrases can be ambiguous).
  • Use batch CSV mode for large datasets to avoid repeated process startup overhead.
  • Check top-3 candidate languages when confidence is low to handle ambiguity.
  • Normalize text (remove excessive punctuation or markup) for more reliable detection; a combined sketch follows this list.
  • Log confidence scores alongside labels to enable downstream quality checks.
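The normalization and low-confidence checks above can be combined into one helper. The sketch below assumes the langdetect dependency from SKILL.md; the 0.90 threshold and the cleanup regexes are illustrative choices, not values defined by the skill.

```python
import re

from langdetect import DetectorFactory, detect_langs

DetectorFactory.seed = 0  # make rankings repeatable across runs

MARKUP = re.compile(r"<[^>]+>")    # strip simple HTML-like markup
NOISE = re.compile(r"[^\w\s'\-]")  # drop excessive punctuation

def detect_with_fallback(text, threshold=0.90, top_k=3):
    """Normalize text, then return the top language plus alternates when confidence is low."""
    cleaned = NOISE.sub(" ", MARKUP.sub(" ", text)).strip()
    candidates = detect_langs(cleaned)[:top_k]
    best = candidates[0]
    alternates = candidates[1:] if best.prob < threshold else []
    return {
        "language": best.lang,
        "confidence": best.prob,
        "alternates": [(c.lang, c.prob) for c in alternates],
    }

print(detect_with_fallback("Ciao, come stai?"))
```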

Example use cases

  • Run a CLI check on a CSV of user comments: detect language in the 'text' column and write results to a new CSV (a pandas sketch follows this list).
  • Integrate detection into ingestion pipelines to assign content to language-specific translators or moderation teams.
  • Quickly classify a list of sentences to profile language distribution before analytics.
  • Pre-filter multilingual corpora to split data into separate language folders for model training.
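For the CSV use case, a pandas-based sketch along these lines reproduces the batch workflow; the file names and the 'text' column mirror the CLI example above, but the actual language_detector.py script may structure its output differently.

```python
import pandas as pd
from langdetect import DetectorFactory, LangDetectException, detect_langs

DetectorFactory.seed = 0

def detect_cell(text):
    """Return (language, confidence) for one cell, or ('unknown', 0.0) on failure."""
    try:
        best = detect_langs(str(text))[0]
        return best.lang, best.prob
    except LangDetectException:
        return "unknown", 0.0

df = pd.read_csv("texts.csv")
df[["language", "confidence"]] = df["text"].apply(lambda t: pd.Series(detect_cell(t)))
df.to_csv("languages.csv", index=False)
```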

FAQ

Which languages are supported?

The detector supports 50+ languages, covering the major world languages encountered in typical NLP pipelines.

What formats can I process in batch?

You can process plain text lists or CSV files by specifying the column containing text. Outputs can be JSON or CSV.
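As an illustration of the plain-list path, a batch of strings can be classified and written to JSON as below; the output fields are a sketch under the langdetect dependency, not the skill's documented schema.

```python
import json

from langdetect import DetectorFactory, detect_langs

DetectorFactory.seed = 0

texts = ["Hello world", "Hola mundo", "Bonjour le monde"]
results = []
for t in texts:
    best = detect_langs(t)[0]  # highest-probability candidate
    results.append({"text": t, "language": best.lang, "confidence": best.prob})

with open("result.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```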