This skill extracts vendor, date, items, amounts, and total from receipt images using OCR, delivering structured JSON output for easy integration.
Install: `npx playbooks add skill dkyazzentwatwa/chatgpt-skills --skill receipt-scanner`
---
name: receipt-scanner
description: Extract vendor, date, items, amounts, and total from receipt images using OCR and pattern matching with structured JSON output.
---
# Receipt Scanner
Extract structured data from receipt images using OCR.
## Features
- **OCR Processing**: Extract text from receipt images
- **Data Extraction**: Vendor, date, items, amounts, total, tax
- **Pattern Matching**: Regex patterns and heuristics for vendor, date, line items, tax, and total
- **Multi-Format Support**: JPG, PNG, PDF receipts
- **JSON/CSV Export**: Structured data output
- **Batch Processing**: Process multiple receipts
## CLI Usage
```bash
python receipt_scanner.py --input receipt.jpg --output data.json
python receipt_scanner.py --batch receipts/ --output receipts.csv
```
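The batch invocation above writes one output file for a whole folder. As a rough sketch of how that aggregation might look (not the skill's actual code), the helper below walks a directory and writes one CSV row per receipt. The names `batch_to_csv`, `scan_one`, `SUPPORTED`, and `FIELDS` are hypothetical, and the `.jpeg` extension is an assumption beyond the formats listed.

```python
import csv
from pathlib import Path

# Assumed extension list and CSV columns; the real script may use others.
SUPPORTED = {".jpg", ".jpeg", ".png", ".pdf"}
FIELDS = ["file", "vendor", "date", "tax", "total"]


def batch_to_csv(folder, out_path, scan_one):
    """Scan every supported file in `folder` and write one CSV row per receipt.

    `scan_one` is a caller-supplied function (hypothetical) that takes a Path
    and returns a dict of extracted fields.
    """
    paths = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in SUPPORTED)
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        # Missing fields become empty cells; unknown fields are ignored.
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for path in paths:
            writer.writerow({"file": path.name, **scan_one(path)})
    return len(paths)
```

Passing the scanner in as a function keeps the file walking and CSV writing testable without an OCR engine installed.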
## Dependencies
- pytesseract>=0.3.10
- pillow>=10.0.0
- opencv-python>=4.8.0
- pandas>=2.0.0
This skill extracts structured data from receipt images using OCR and pattern matching, returning vendor, date, line items, amounts, taxes, and totals in JSON or CSV. It supports common image and PDF formats and can run on single files or batches for bulk processing. The implementation focuses on reliable parsing with tested regex patterns and clean, exportable outputs.
The scanner runs OCR on each receipt image to get raw text, then applies configurable pattern matching and heuristics to identify vendor, transaction date, line items (description, quantity, unit price), taxes, and total amounts. It normalizes amounts and dates, validates numeric fields, and outputs a structured JSON object or tabular CSV. Batch mode iterates files and aggregates results into a single output file.
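The parsing stage described above can be illustrated with a minimal sketch that works on OCR text only. It assumes the raw text has already been obtained, for example with `pytesseract.image_to_string(Image.open("receipt.jpg"))`. The patterns, the `parse_receipt_text` name, the first-line-is-vendor heuristic, and the US-style date format are all assumptions made for illustration, not the skill's actual rules.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical patterns for illustration; the skill's own patterns may differ.
SUMMARY_RE = re.compile(r"^(subtotal|tax|total)\b[^\d\n]*(\d[\d,]*\.\d{2})$", re.IGNORECASE)
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")  # assumes US-style MM/DD/YYYY
ITEM_RE = re.compile(r"^(?:(\d+)\s*x\s+)?(.+?)\s+\$?(\d[\d,]*\.\d{2})$", re.IGNORECASE)

SAMPLE = """CORNER CAFE
03/14/2024 10:32
2 x Coffee 7.00
Bagel 5.50
SUBTOTAL 12.50
TAX 1.03
TOTAL $13.53
"""


def to_amount(raw):
    """Normalize an amount string such as '1,234.50' to a Decimal."""
    return Decimal(raw.replace(",", ""))


def parse_receipt_text(text):
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    result = {
        "vendor": lines[0] if lines else None,  # heuristic: first non-empty line
        "date": None, "items": [],
        "subtotal": None, "tax": None, "total": None, "totals_match": None,
    }
    for ln in lines[1:]:
        summary = SUMMARY_RE.match(ln)
        if summary:
            result[summary.group(1).lower()] = to_amount(summary.group(2))
            continue
        date = DATE_RE.search(ln)
        if date and result["date"] is None:
            try:
                parsed = datetime.strptime(date.group(0), "%m/%d/%Y")
                result["date"] = parsed.date().isoformat()
            except ValueError:
                pass  # looked like a date but was not a valid one
            continue
        item = ITEM_RE.match(ln)
        if item:
            result["items"].append({
                "description": item.group(2),
                "quantity": int(item.group(1) or 1),
                "price": to_amount(item.group(3)),
            })
    # Simple numeric validation: subtotal + tax should equal total.
    if None not in (result["subtotal"], result["tax"], result["total"]):
        result["totals_match"] = result["subtotal"] + result["tax"] == result["total"]
    return result
```

Calling `parse_receipt_text(SAMPLE)` returns a dictionary that can be serialized with `json.dumps(result, default=str)`, which writes the `Decimal` amounts as strings. `Decimal` is used instead of `float` so the subtotal-plus-tax check is exact.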
**Which file formats are supported?**
JPG, PNG, and PDF are supported; multi-page PDFs are handled by iterating pages.

**How accurate is the extraction?**
Accuracy depends on image quality and receipt layout. Clear, high-resolution images and vendor-specific rules significantly improve results.

**Can outputs be customized?**
Yes. Field patterns and output fields are configurable; you can add vendor-specific parsing rules or change CSV/JSON schemas.
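
One way vendor-specific rules could be layered over default field patterns is sketched below. This is an assumption about how such configuration might be structured, not the skill's documented format; `DEFAULT_PATTERNS`, `VENDOR_OVERRIDES`, `patterns_for`, `extract_fields`, and the "ACME MART" vendor are all hypothetical.

```python
import re

# Hypothetical defaults: each pattern captures the amount in group 1.
DEFAULT_PATTERNS = {
    "total": r"^total\b[^\d\n]*(\d[\d,]*\.\d{2})$",
    "tax": r"^tax\b[^\d\n]*(\d[\d,]*\.\d{2})$",
}

# Hypothetical override: this vendor prints "AMOUNT DUE" instead of "TOTAL".
VENDOR_OVERRIDES = {
    "ACME MART": {"total": r"^amount due\b[^\d\n]*(\d[\d,]*\.\d{2})$"},
}


def patterns_for(vendor):
    """Merge vendor-specific patterns over the defaults and compile them."""
    merged = dict(DEFAULT_PATTERNS)
    merged.update(VENDOR_OVERRIDES.get(vendor.upper(), {}))
    flags = re.IGNORECASE | re.MULTILINE
    return {name: re.compile(pattern, flags) for name, pattern in merged.items()}


def extract_fields(text, vendor):
    """Return the first match for each field, or None when nothing matches."""
    fields = {}
    for name, regex in patterns_for(vendor).items():
        match = regex.search(text)
        fields[name] = match.group(1) if match else None
    return fields
```

For example, `extract_fields("ACME MART\nTAX 0.80\nAMOUNT DUE $10.79\n", "Acme Mart")` picks up the total from the "AMOUNT DUE" line, while an unknown vendor falls back to the default patterns.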