
This skill automatically recognizes and solves various CAPTCHAs using local OCR and optional APIs to streamline automated form access.

npx playbooks add skill openclaw/skills --skill captcha-solver

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: captcha-solver
description: CAPTCHA Recognition and Solving - Local OCR + Third-party APIs
metadata:
  version: 1.0.0
---

# CAPTCHA Solver

Automatically recognize and solve various CAPTCHAs.

## Supported Types

### Local OCR (Free)
- 🔤 Simple text CAPTCHA
- 🔢 Numeric CAPTCHA
- ➕ Math CAPTCHA
- 🖼️ Slide CAPTCHA (gap detection)

### API Solving (Paid)
Via the 2Captcha or Anti-Captcha services:
- reCAPTCHA v2/v3
- hCaptcha
- Cloudflare Turnstile

## Usage

```bash
# Recognize an image CAPTCHA
python solve.py --image captcha.png

# Solve a reCAPTCHA
python solve.py --recaptcha "site_key" --url "page_url"

# Slide CAPTCHA
python solve.py --slide background.png --template slider.png
```

## Configuration

### Local OCR
```python
# Tesseract is used by default
TESSERACT_CMD = "/usr/bin/tesseract"
LANG = "eng+chi_sim"  # English + Simplified Chinese
```

### API Services (optional)
```python
# 2Captcha
API_2CAPTCHA = "your_api_key"

# Anti-Captcha  
API_ANTI_CAPTCHA = "your_api_key"
```

## Algorithms

### 1. Image Preprocessing
- Grayscale conversion
- Binarization
- Denoising
- Sharpening
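The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the skill's own code: the function names are hypothetical, images are assumed to be lists of pixel rows, Otsu's method is assumed as the binarization rule (the skill only says "binarization"), denoising is a 3x3 median filter, and sharpening is omitted.

```python
# Hypothetical preprocessing helpers (not the skill's API).
# Images are lists of rows; RGB pixels are (r, g, b) tuples of 0-255 ints.

def to_grayscale(rgb):
    """Convert RGB rows to 8-bit gray using ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = sum_bg = 0  # pixel count and intensity sum of the class <= t
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray, threshold):
    """Mark dark pixels (assumed to be text) as 1 and the rest as 0."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray]


def median_filter(binary):
    """3x3 median filter; on a 0/1 image the median is a majority vote."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [binary[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            # Even-sized border windows can tie; ties resolve to background.
            out[r][c] = 1 if 2 * sum(window) > len(window) else 0
    return out
```

Dark text on a light background is assumed; for light-on-dark CAPTCHAs, flip the comparison in `binarize`. A median filter removes isolated specks but also rounds off stroke corners, so the window size is one of the parameters worth tuning per CAPTCHA style.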

### 2. Character Segmentation
- Connected component analysis
- Projection method
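The projection method can be sketched as follows, assuming a 0/1 image from the preprocessing step (1 = ink). Columns that contain no ink separate characters; the helper names and the `min_width` speck filter are illustrative assumptions, not part of the skill.

```python
# Hypothetical segmentation helpers; `binary` is a list of 0/1 rows (1 = ink).

def vertical_projection(binary):
    """Count ink pixels in each column."""
    width = len(binary[0]) if binary else 0
    return [sum(row[c] for row in binary) for c in range(width)]


def segment_columns(binary, min_width=1):
    """Return [start, end) spans of consecutive non-empty columns."""
    profile = vertical_projection(binary)
    spans, start = [], None
    for c, count in enumerate(profile):
        if count > 0 and start is None:
            start = c  # entering a character
        elif count == 0 and start is not None:
            if c - start >= min_width:  # drop specks narrower than min_width
                spans.append((start, c))
            start = None
    if start is not None and len(profile) - start >= min_width:
        spans.append((start, len(profile)))  # character touches right edge
    return spans


def crop(binary, span):
    """Cut one character's columns out of the image."""
    start, end = span
    return [row[start:end] for row in binary]


binary = [
    [1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 1],
]
spans = segment_columns(binary)  # [(0, 2), (4, 7)]
glyphs = [crop(binary, s) for s in spans]
```

Projection fails when characters touch or when their column ranges overlap (for example, slanted text), because no empty column separates them. Connected component analysis can separate glyphs that overlap in columns but do not touch.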

### 3. Character Recognition
- Template matching
- ML-based OCR
### 4. Slide CAPTCHA
- Edge detection
- Gap localization
- Trajectory generation

## Examples

### Simple text recognition
```python
from solver import CaptchaSolver

solver = CaptchaSolver()
result = solver.solve_image("captcha.png")
print(result)  # prints the recognized characters
```

### Slide CAPTCHA
```python
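# Assumes `solver` from the example above and that file paths are accepted.
bg_img, slider_img = "background.png", "slider.png"
result = solver.solve_slide(bg_img, slider_img)
print(result)  # prints the slide distance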
```

### reCAPTCHA
```python
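# Placeholders for the page's reCAPTCHA site key and URL; needs an API key.
site_key, page_url = "site_key", "page_url"
result = solver.solve_recaptcha(site_key, page_url)
print(result)  # prints the token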
```

## Service Comparison

| Service | Price (per 1,000 CAPTCHAs) | Success rate | Speed |
|------|------|--------|------|
| Local OCR | Free | 60-80% | Fast |
| 2Captcha | $2.99 | 95%+ | Slow |
| Anti-Captcha | $2.00 | 95%+ | Medium |

## Notes

1. Prefer local OCR; call an API only when it fails.
2. Comply with the target website's terms of use.
3. Do not use for illegal purposes.

Overview

This skill provides CAPTCHA recognition and solving using local OCR plus optional third-party solving APIs. It supports simple text, numeric, math, and slide-gap CAPTCHAs locally, and integrates with paid services for complex challenges like reCAPTCHA, hCaptcha, and Cloudflare Turnstile. The design prioritizes local processing with API fallback to balance cost and success rate. It’s implemented in Python and configurable for Tesseract or external solver APIs.

How this skill works

The solver first attempts local OCR workflows: image preprocessing (grayscale, binarization, denoising, sharpening), character segmentation, and recognition via template matching or ML-based OCR. For slide CAPTCHAs it detects edges and gap locations, then generates a realistic movement trajectory to simulate human dragging. If local methods fail or the challenge requires it, the skill forwards the task to configured third-party services (2Captcha, Anti-Captcha) to retrieve tokens or answers.
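The local-first, API-fallback flow described above amounts to a short chain. The sketch assumes each solver is a callable that returns an answer or `None`; `solve_with_fallback`, `looks_like_code`, and the plausibility rule are hypothetical, and no real 2Captcha or Anti-Captcha client is shown.

```python
from typing import Callable, Optional, Tuple

# A solver takes raw image bytes and returns an answer, or None on failure.
Solver = Callable[[bytes], Optional[str]]


def looks_like_code(answer: str, length: int = 4) -> bool:
    """Hypothetical plausibility check: fixed-length alphanumeric answer."""
    return len(answer) == length and answer.isalnum()


def solve_with_fallback(image: bytes, local: Solver, api: Solver,
                        is_plausible: Callable[[str], bool] = looks_like_code
                        ) -> Tuple[Optional[str], str]:
    """Try the free local solver first; call the paid API only if it fails."""
    answer = local(image)
    if answer is not None and is_plausible(answer):
        return answer, "local"
    answer = api(image)  # paid call, made only after a local failure
    if answer is not None:
        return answer, "api"
    return None, "none"
```

Rejecting implausible local answers before the paid call is what keeps API spend low; a stricter check sends more traffic to the API, while a looser one submits more wrong answers.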

When to use it

  • Automating recognition of simple image or numeric CAPTCHAs where local OCR is sufficient.
  • Solving slide-gap CAPTCHAs in automated testing or scraping with simulated human-like trajectories.
  • Handling complex site CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile) by using paid API providers.
  • Balancing cost and reliability: try local OCR first and fall back to paid solvers on failure.
  • Integrating CAPTCHA solving into headless browser workflows or testing pipelines.

Best practices

  • Prefer local OCR for privacy and zero API cost; configure Tesseract path and language packs appropriately.
  • Chain strategies: preprocess images, then segment and recognize; only call paid APIs when local attempts fail.
  • Respect target sites’ terms of service and avoid using the tool for unlawful or abusive activities.
  • Tune preprocessing parameters (thresholds, denoising strength) per target CAPTCHA style for best accuracy.
  • Use realistic movement profiles for slide CAPTCHAs to reduce detection by anti-bot heuristics.

Example use cases

  • Automated QA for web forms that include simple image CAPTCHAs during staging.
  • Scraping public data where occasional CAPTCHAs appear and automated fallback is needed.
  • Integrating slide-gap solving into end-to-end tests for interactive UI components.
  • Using third-party solvers when encountering reCAPTCHA or hCaptcha during large-scale runs.
  • Prototyping ML-based OCR improvements by swapping recognition backends in the solver.

FAQ

Which CAPTCHAs can the local OCR solve reliably?

Local OCR handles simple text, numeric, and basic math CAPTCHAs, with the roughly 60-80% success rate listed in the service comparison; slide-gap detection is supported but may need tuning per site.

When should I use a paid API?

Use a paid API when you need a high success rate, for complex challenges such as reCAPTCHA v2/v3 and hCaptcha, or when local OCR fails repeatedly.

How do I configure Tesseract and API keys?

Set TESSERACT_CMD to the path of the Tesseract binary and LANG to the language packs to load (for example, eng+chi_sim), and set the 2Captcha or Anti-Captcha key (API_2CAPTCHA, API_ANTI_CAPTCHA) to enable third-party solving.