
This skill extracts text from images using OCR with Tesseract, supporting many languages and optional preprocessing for improved accuracy.

npx playbooks add skill trpc-group/trpc-agent-go --skill ocr

---
name: ocr
description: Extract text from images using Tesseract OCR
---

# OCR Image Text Extraction Skill

Extract text from images using the Tesseract OCR engine.

## Capabilities

- Extract text from image files (PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP)
- Support for 100+ languages
- Optional image preprocessing for better accuracy
- Output in plain text or JSON format with confidence scores

## Usage

### Basic OCR

```bash
python3 scripts/ocr.py <image_file> <output_file>
```

### With Options

```bash
# Specify language (default: eng)
python3 scripts/ocr.py image.png text.txt --lang eng

# Chinese text
python3 scripts/ocr.py image.png text.txt --lang chi_sim

# Multiple languages
python3 scripts/ocr.py image.png text.txt --lang eng+chi_sim

# With image preprocessing (improves accuracy)
python3 scripts/ocr.py image.png text.txt --preprocess

# JSON output with confidence scores
python3 scripts/ocr.py image.png output.json --format json
```

### Download and OCR from URL

```bash
# OCR from remote image
python3 scripts/ocr_url.py <image_url> <output_file>

# With options
python3 scripts/ocr_url.py https://example.com/image.jpg text.txt --lang eng --preprocess
```
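
The download step can be pictured roughly as follows. This is a minimal sketch, not the contents of `scripts/ocr_url.py`: the helper names `validate_image_url` and `download_image`, the 30-second timeout, and the restriction to `http`/`https` are all assumptions made for illustration.

```python
import os
import tempfile
import urllib.request
from urllib.parse import urlparse


def validate_image_url(url):
    """Accept only http(s) URLs with a host (an assumed policy, not the skill's)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"unsupported image URL: {url!r}")
    return parsed


def download_image(url, timeout=30):
    """Download the image to a temporary file and return its path.

    The caller is responsible for deleting the file after OCR.
    """
    parsed = validate_image_url(url)
    # Keep the original extension so image libraries can infer the format.
    suffix = os.path.splitext(parsed.path)[1] or ".img"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

The returned path can then be handed to the local OCR step in the same way as any other `image_file`.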

## Parameters

- `image_file` / `image_url` (required): Path to local image or image URL
- `output_file` (required): Path to output text/JSON file
- `--lang`: Language code (e.g., eng, chi_sim, jpn, fra, deu). Default: eng
- `--preprocess`: Apply image preprocessing (grayscale, thresholding) for better accuracy
- `--format`: Output format (text/json, default: text)
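
The parameters above map naturally onto an `argparse` interface. The sketch below shows one way such a parser could look; `build_parser` is a hypothetical name, and the real `scripts/ocr.py` may be organized differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented parameters."""
    parser = argparse.ArgumentParser(
        description="Extract text from an image with Tesseract OCR"
    )
    parser.add_argument("image_file", help="Path to the local image")
    parser.add_argument("output_file", help="Path to the output text/JSON file")
    parser.add_argument("--lang", default="eng",
                        help="Language code, e.g. eng, chi_sim, eng+chi_sim")
    parser.add_argument("--preprocess", action="store_true",
                        help="Apply grayscale and thresholding before OCR")
    parser.add_argument("--format", choices=["text", "json"], default="text",
                        help="Output format")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["image.png", "output.json", "--lang", "eng+chi_sim", "--format", "json"]
)
print(args.lang, args.preprocess, args.format)  # eng+chi_sim False json
```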

## Common Languages

| Language | Code |
|----------|------|
| English | eng |
| Chinese (Simplified) | chi_sim |
| Chinese (Traditional) | chi_tra |
| Japanese | jpn |
| Korean | kor |
| French | fra |
| German | deu |
| Spanish | spa |
| Russian | rus |
| Arabic | ara |

## Supported Image Formats

PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP

## Dependencies

- Python 3.8+
- pytesseract
- Pillow (PIL)
- tesseract-ocr (system package)

## Installation

```bash
# Python packages
pip install pytesseract Pillow

# Tesseract OCR engine
sudo apt-get install tesseract-ocr  # Ubuntu/Debian
sudo yum install tesseract           # CentOS/RHEL (from the EPEL repository)
brew install tesseract               # macOS
```
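
After installing, it can be useful to confirm that the `tesseract` binary is on `PATH` and to see which language data is available. The sketch below uses the `tesseract --list-langs` command for that; the function names are hypothetical and the parsing assumes the usual "List of available languages ..." header followed by one code per line.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        codes.append(line)
    return codes


def installed_languages():
    """Return installed language codes, or None if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Assumption: the list may appear on stdout or stderr depending on the
    # Tesseract version, so both streams are read. Stray warnings on stderr
    # would need extra filtering.
    return parse_list_langs(result.stdout + "\n" + result.stderr)
```

A result of `None` points to a missing engine; a list without the needed code (for example `chi_sim`) points to a missing language pack.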

Overview

This skill extracts text from images using the Tesseract OCR engine. It supports common image formats, 100+ languages, optional preprocessing to improve accuracy, and can output plain text or JSON with confidence scores. It is designed for integration into agent workflows that need reliable image-to-text conversion.

How this skill works

The skill accepts a local image path or a remote image URL, optionally applies preprocessing (grayscale conversion and thresholding) to improve legibility, and runs Tesseract through pytesseract, using the language codes passed with --lang when provided. Results are written as plain text or as structured JSON that includes per-line or per-word confidence scores. The skill depends on a system Tesseract installation and the Python libraries Pillow and pytesseract.
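
The flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the skill's actual code: the function names are hypothetical, and because the source does not say how the threshold is chosen, Otsu's method is used here only as one common option.

```python
def otsu_threshold(hist):
    """Pick a threshold from a 256-bin grayscale histogram (Otsu's method)."""
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_level, best_variance = 0, -1.0
    below_count = 0
    below_sum = 0
    for level, count in enumerate(hist):
        below_count += count
        below_sum += level * count
        above_count = total - below_count
        if below_count == 0:
            continue
        if above_count == 0:
            break
        mean_below = below_sum / below_count
        mean_above = (weighted_total - below_sum) / above_count
        # Between-class variance, up to a constant factor.
        variance = below_count * above_count * (mean_below - mean_above) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def preprocess(image, threshold=None):
    """Convert a Pillow image to grayscale, then to pure black and white."""
    gray = image.convert("L")
    if threshold is None:
        threshold = otsu_threshold(gray.histogram())
    return gray.point(lambda pixel: 255 if pixel > threshold else 0)


def ocr_image(path, lang="eng", use_preprocess=False):
    """Run Tesseract on an image file and return the recognized text."""
    # Imported lazily so the helpers above work without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        prepared = preprocess(image) if use_preprocess else image
        return pytesseract.image_to_string(prepared, lang=lang)
```

Calling `ocr_image("image.png", lang="eng+chi_sim", use_preprocess=True)` would correspond to running the script with `--lang eng+chi_sim --preprocess`.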

When to use it

  • Extract printed or scanned text from PNG/JPG/TIFF/WEBP and other common image files
  • Automate ingestion of receipts, invoices, or forms into downstream processing pipelines
  • Convert screenshots, photos, or scanned documents into searchable text
  • Capture multilingual text with support for language-specific models
  • Produce structured JSON output including confidence scores for validation or QA

Best practices

  • Install and configure the appropriate Tesseract language packs for non-English content
  • Use the preprocessing option for low-contrast, noisy, or skewed images to improve recognition
  • Pass explicit language codes with --lang (e.g., chi_sim), joining several with + (e.g., eng+chi_sim) for mixed-language images
  • Validate high-value outputs against a confidence threshold and fall back to manual review for low-confidence results
  • Prefer lossless image formats or higher-resolution captures for small or dense text
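
The confidence-threshold practice above could be applied along these lines. The sketch assumes each recognized word is available as a dictionary with `text` and `confidence` keys (a hypothetical shape; the skill's JSON schema is not documented here) and that confidence is on Tesseract's 0 to 100 scale. The cutoff of 60 is an arbitrary example value.

```python
def split_by_confidence(words, min_confidence=60.0):
    """Separate words into accepted ones and ones needing manual review."""
    accepted, review = [], []
    for word in words:
        target = accepted if word["confidence"] >= min_confidence else review
        target.append(word)
    return accepted, review


words = [
    {"text": "Total", "confidence": 96.0},
    {"text": "$l2.50", "confidence": 41.5},
]
accepted, review = split_by_confidence(words)
print([word["text"] for word in review])  # ['$l2.50']
```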

Example use cases

  • Batch OCR of scanned contract pages to make them searchable and indexable
  • Extract text from product labels and packaging photos for cataloging or compliance checks
  • Automated capture of invoice data fields followed by numeric parsing and accounting ingestion
  • OCR screenshots from mobile devices to populate knowledge bases or QA datasets
  • Remote OCR of images hosted by URL for lightweight web scraping and extraction tasks

FAQ

Which image formats are supported?

Most common formats are supported via Pillow: PNG, JPG, JPEG, GIF, BMP, TIFF, and WEBP.

How do I improve accuracy for non-English text?

Install the appropriate Tesseract language pack and pass its code with --lang (e.g., --lang chi_sim). Enabling --preprocess often helps with noisy images.

Can I get confidence scores for the recognized text?

Yes. Pass --format json to receive per-line or per-word confidence scores for downstream validation.
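
For readers curious how per-word scores can be obtained, pytesseract exposes them through `pytesseract.image_to_data(image, lang="eng", output_type=pytesseract.Output.DICT)`, which returns parallel lists including `text` and `conf`. The sketch below turns such a dictionary into JSON. The output field names (`text`, `mean_confidence`, `words`) are assumptions for illustration and may not match what the skill's script writes.

```python
import json


def words_from_data(data):
    """Collect recognized words and their confidences from image_to_data output."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        confidence = float(conf)
        # Entries with empty text or a confidence of -1 are layout rows, not words.
        if not str(text).strip() or confidence < 0:
            continue
        words.append({"text": str(text), "confidence": confidence})
    return words


def to_json(data):
    """Serialize words, joined text, and mean confidence as a JSON string."""
    words = words_from_data(data)
    mean = sum(w["confidence"] for w in words) / len(words) if words else 0.0
    result = {
        "text": " ".join(w["text"] for w in words),
        "mean_confidence": round(mean, 2),
        "words": words,
    }
    return json.dumps(result, ensure_ascii=False, indent=2)


# Hand-written sample in the shape returned with Output.DICT.
sample = {"text": ["", "Hello", "world", " "], "conf": [-1, 96, "88.5", -1]}
print(to_json(sample))
```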