ocr_service skill

This skill performs high-precision OCR on images in multiple languages and formats, and returns text, coordinates, and confidence scores for document digitization and image content analysis.

npx playbooks add skill lin-a1/skills-agent --skill ocr_service

Review the files below or copy the command above to add this skill to your agents.

Files (4)

SKILL.md

---
name: ocr-service
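description: High-precision optical character recognition (OCR) service. Detects and extracts text from images in multiple languages and formats, and provides text-region coordinates and confidence scores; suited to document digitization and image content analysis.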
---

## Features
Extracts text from images; supports multiple image formats and languages.

## Usage
```python
from services.ocr_service.client import OCRServiceClient

client = OCRServiceClient()

# Health check
status = client.health_check()

# Run OCR on a base64-encoded image
image_base64 = client.image_to_base64("/path/to/image.jpg")
result = client.ocr(image_base64)

# Read the recognition results
texts = result["rec_texts"]    # ["recognized text 1", "recognized text 2", ...]
scores = result["rec_scores"]  # [0.98, 0.95, ...]
```

## Response format
```json
{
  "doc_preprocessor_res": {"angle": 0},
  "dt_polys": [[x1,y1], [x2,y2], ...],
  "rec_texts": ["recognized text 1", "recognized text 2"],
  "rec_scores": [0.98, 0.95]
}
```

## Field descriptions
- `rec_texts`: list of recognized text strings
- `rec_scores`: confidence score for each text block
- `dt_polys`: coordinates of the detected text regions
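
The three lists can be combined into one record per detected text block. The sketch below assumes the lists are index-aligned (item i of `rec_texts` belongs with item i of `rec_scores` and `dt_polys`), which the response format suggests but does not state. `to_records` is a hypothetical helper, not part of the client.

```python
def to_records(result: dict) -> list:
    """Pair each recognized text with its score and polygon (assumed index-aligned)."""
    texts = result.get("rec_texts", [])
    scores = result.get("rec_scores", [])
    polys = result.get("dt_polys", [])
    # zip stops at the shortest list, so trailing items are dropped if lengths differ
    return [
        {"text": text, "score": score, "poly": poly}
        for text, score, poly in zip(texts, scores, polys)
    ]
```

Working with records rather than parallel lists makes it harder to lose track of which score or polygon belongs to which text.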

Overview

This skill provides a high-precision optical character recognition (OCR) service for extracting text from images. It supports multiple languages and image formats, returns detected text with bounding polygon coordinates and confidence scores, and is designed for document digitization and image content analysis. The service includes a health-check endpoint and simple client functions for image conversion and OCR invocation.

How this skill works

The client sends base64-encoded images to the OCR service, which performs text detection and recognition. The service returns a structured result containing preprocessor info (e.g., rotation), detected text polygons, recognized text strings, and per-item confidence scores. Consumers can use the polygon coordinates to map text back onto the original image or to crop regions for downstream processing.
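
The page does not show how `image_to_base64` is implemented. As a rough stand-in, assuming the service expects the raw file bytes as a plain base64 ASCII string with no data-URI prefix, the encoding step might look like this. `encode_image_file` is a hypothetical name used only for illustration.

```python
import base64
from pathlib import Path


def encode_image_file(path: str) -> str:
    """Read a file and return its bytes as a base64 ASCII string."""
    raw = Path(path).read_bytes()
    return base64.b64encode(raw).decode("ascii")
```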

When to use it

- Digitizing scanned documents, receipts, invoices, forms, or printed reports.
- Extracting text from photos for indexing, search, or automated data entry.
- Analyzing images with mixed-language content or multi-format image inputs.
- Preprocessing images for NLP pipelines that require positional context.
- Automating verification tasks by extracting text plus confidence for validation.

Best practices

- Provide well-lit, high-resolution images and avoid heavy compression to improve accuracy.
- Use the doc_preprocessor_res angle to deskew images before downstream layout tasks.
- Filter recognized items by rec_scores to reduce false positives in critical workflows.
- Map dt_polys back to the original image to validate spatial relationships or extract subregions.
- Batch images and reuse a single client instance to reduce connection overhead.
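
Mapping a polygon back onto the image usually starts with its axis-aligned bounding box. The sketch below assumes each entry in dt_polys is a sequence of [x, y] points in the pixel coordinates of the submitted image; the page does not specify the exact shape. `poly_to_bbox` is a hypothetical helper.

```python
def poly_to_bbox(poly):
    """Return (left, top, right, bottom) enclosing a polygon of [x, y] points."""
    xs = [point[0] for point in poly]
    ys = [point[1] for point in poly]
    return min(xs), min(ys), max(xs), max(ys)
```

The tuple order matches the (left, upper, right, lower) box that Pillow's `Image.crop` accepts, so the result can be passed to an image library to cut out the region, if one is available.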

Example use cases

- Convert bulk scanned contracts into searchable text and preserve layout by storing dt_polys.
- Extract line items and totals from photographed receipts for expense automation, accepting only items whose rec_scores are high.
- Detect and transcribe signage or labels in multi-language street or product photos for indexing.
- Preprocess and crop fields from forms using polygon coordinates, then pass crops to specialized parsers.
- Run a health check before large jobs to ensure the OCR endpoint is reachable and functioning.
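
For bulk jobs, one way to reuse a single client is a small loop that records failures instead of aborting. This is only a sketch: `ocr_many` is a hypothetical helper, it relies solely on the `image_to_base64` and `ocr` methods shown in SKILL.md, and it catches a broad `Exception` because the client's error types are not documented here. The format of the `health_check()` return value is also not documented, so that check is left to the caller.

```python
def ocr_many(client, image_paths):
    """Run OCR on several images with one shared client.

    Returns (results, errors), both keyed by image path.
    """
    results, errors = {}, {}
    for path in image_paths:
        try:
            image_base64 = client.image_to_base64(path)
            results[path] = client.ocr(image_base64)
        except Exception as exc:  # assumption: error types are unknown, so catch broadly
            errors[path] = exc
    return results, errors
```

Collecting errors per path lets a long job finish and be retried selectively rather than stopping at the first unreadable file.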

FAQ

What does rec_scores represent and how should I use it?

rec_scores is the per-text confidence value (0–1). Use a threshold to filter low-confidence results or surface them for manual review.
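
A minimal sketch of that thresholding, assuming `rec_texts` and `rec_scores` are index-aligned. `split_by_confidence` is a hypothetical helper, and the default of 0.9 is an arbitrary illustration rather than a recommended value.

```python
def split_by_confidence(result, threshold=0.9):
    """Split recognized texts into accepted and needs-review lists by score."""
    accepted, review = [], []
    for text, score in zip(result["rec_texts"], result["rec_scores"]):
        # Items at or above the threshold are accepted; the rest go to manual review
        (accepted if score >= threshold else review).append((text, score))
    return accepted, review
```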

How do I handle rotated or skewed images?

Check doc_preprocessor_res.angle to detect rotation. Deskew using that angle or reorient the image before downstream layout-sensitive processing.