image-ocr skill

/skills/xejrax/image-ocr

This skill extracts text from images using Tesseract OCR, supporting multiple languages and common formats to enable quick content digitization.

npx playbooks add skill openclaw/skills --skill image-ocr

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: image-ocr
description: "Extract text from images using Tesseract OCR"
metadata:
  {
    "openclaw":
      {
        "emoji": "👁️",
        "requires": { "bins": ["tesseract"] },
        "install":
          [
            {
              "id": "dnf",
              "kind": "dnf",
              "package": "tesseract",
              "bins": ["tesseract"],
              "label": "Install via dnf",
            },
          ],
      },
  }
---

# Image OCR

Extract text from images using Tesseract OCR. Supports multiple languages and image formats including PNG, JPEG, TIFF, and BMP.

## Commands

```bash
# Extract text from an image (default: English)
image-ocr "screenshot.png"

# Extract text with a specific language (e.g., German)
image-ocr "document.jpg" --lang deu
```

## Install

```bash
sudo dnf install tesseract
```

Overview

This skill extracts text from images using Tesseract OCR. It supports multiple image formats (PNG, JPEG, TIFF, BMP) and can run OCR in different languages. The tool is lightweight and designed for batch or single-image text extraction in automation pipelines.

How this skill works

The skill calls Tesseract OCR on supplied image files and returns the recognized text. You can pass a language code to improve accuracy for non-English content. It accepts common image formats and can be invoked for single files or scripted to process folders in bulk.
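As a rough illustration of the flow described above, the sketch below calls the `tesseract` CLI directly. The function names `ocr_image` and `ocr_dir` are hypothetical and not part of the skill. The sketch assumes `tesseract` is on the PATH and uses its `stdout` output base and `-l` language option; the actual `image-ocr` wrapper may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not the skill's own implementation.

# Run Tesseract on one image and print the recognized text to stdout.
# $1 = image path, $2 = language code (defaults to eng)
ocr_image() {
  local image="$1" lang="${2:-eng}"
  tesseract "$image" stdout -l "$lang"
}

# OCR every PNG in a directory, writing <name>.txt next to each image.
# $1 = directory, $2 = language code (defaults to eng)
ocr_dir() {
  local dir="$1" lang="${2:-eng}" image
  for image in "$dir"/*.png; do
    [ -e "$image" ] || continue   # skip when the glob matches nothing
    ocr_image "$image" "$lang" > "${image%.png}.txt"
  done
}
```

For example, `ocr_dir ./scans deu` would write `./scans/page1.txt` for `./scans/page1.png`, provided German language data is installed.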

When to use it

  • Digitizing screenshots, scanned documents, or photos to searchable text
  • Automating data extraction from receipts, forms, or invoices
  • Preprocessing images for natural language processing or indexing
  • Quickly converting a single image into editable text for revision or translation
  • Running offline OCR where cloud services are not an option

Best practices

  • Use clean, high-resolution images and crop to the region of interest to improve accuracy
  • Specify the correct language code (e.g., deu for German) when the text is not English
  • Preprocess images (deskew, enhance contrast, remove noise) before OCR for better results
  • Batch-process images with consistent naming and output paths for reliable automation
  • Validate and post-process extracted text (spellcheck, regex extraction) for critical workflows
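The post-processing step in the last bullet could look something like the sketch below. Both function names (`clean_text`, `extract_amounts`) are hypothetical, and the amount pattern is a deliberately simple assumption that only matches plain decimals such as 12.50; real receipts will likely need a stricter or locale-aware pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for tidying OCR output; both read from stdin.

# Collapse runs of whitespace to one space, trim each line,
# and drop lines that end up empty.
clean_text() {
  sed -e 's/[[:space:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//' | grep -v '^$'
}

# Print each substring that looks like a two-decimal amount (e.g. 12.50),
# one per line. Thousands separators and currency symbols are not handled.
extract_amounts() {
  grep -Eo '[0-9]+\.[0-9]{2}'
}
```

A pipeline such as `image-ocr "receipt.jpg" | clean_text | extract_amounts` would then list the candidate amounts, which should still be checked by hand for critical workflows.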

Example use cases

  • Extracting text from a scanned contract folder to create a searchable archive
  • Converting screenshots of error messages into text for bug reports
  • Pulling line items from photographed receipts for expense tracking
  • Preparing OCRed text for translation or content analysis
  • Integrating with a pipeline that indexes documents for full-text search

FAQ

What image formats are supported?

Common formats are supported: PNG, JPEG, TIFF, and BMP.

How do I improve OCR accuracy for non-English text?

Install the appropriate Tesseract language data and pass the corresponding language code (for example, --lang deu for German).
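One way to confirm that language data is present before running OCR is to ask Tesseract which languages it can see. The helper name `has_lang` below is hypothetical, and the package name in the comment is an assumption based on Fedora's `tesseract-langpack-<code>` naming; check your distribution's repositories for the exact name.

```shell
#!/usr/bin/env bash
# Assumed Fedora package name for German data; verify before relying on it:
#   sudo dnf install tesseract-langpack-deu

# Succeed if Tesseract lists the given language code as installed.
# $1 = language code, e.g. deu
has_lang() {
  # Older Tesseract releases print the list on stderr, so merge both streams.
  tesseract --list-langs 2>&1 | grep -qx "$1"
}
```

A script could then guard the OCR call with something like `has_lang deu && image-ocr "document.jpg" --lang deu`.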