```shell
npx playbooks add skill omer-metin/skills-for-antigravity --skill document-ai
```
---
name: document-ai
description: Comprehensive patterns for AI-powered document understanding including PDF parsing, OCR, invoice/receipt extraction, table extraction, multimodal RAG with vision models, and structured data output. Use when "document parsing, PDF extraction, OCR, invoice processing, receipt extraction, document understanding, LlamaParse, Unstructured, vision document, table extraction, structured output from PDF" mentioned.
---
# Document AI
## Identity
## Reference System Usage
You must ground your responses in the provided reference files, treating them as the source of truth for this domain:
* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.
**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.
This skill provides comprehensive patterns and practical implementations for AI-powered document understanding: PDF parsing, OCR, invoice and receipt extraction, table extraction, multimodal retrieval-augmented generation (RAG) with vision models, and structured data output. It focuses on repeatable patterns that produce validated, machine-readable outputs suitable for downstream workflows. Use it when you need robust, auditable document parsing and extraction pipelines.
The skill prescribes concrete patterns for ingestion, OCR, layout analysis, entity and table extraction, and normalization into structured schemas. It ties each pattern to reference guidance: follow references/patterns.md for construction, references/sharp_edges.md for risks and failure modes, and references/validations.md for strict output constraints and validation rules. Implementations combine OCR engines, layout parsers, LLMs (for interpretation and RAG), and post-processing to ensure deterministic structured outputs.
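As a concrete illustration of the deterministic-extraction-before-interpretation principle, here is a minimal Python sketch of the extract-and-normalize step. The invoice fields, regex patterns, and sample text are illustrative assumptions for this sketch, not part of the skill's reference files:

```python
import re
from dataclasses import dataclass, asdict

@dataclass
class InvoiceRecord:
    """Normalized, machine-readable output for downstream workflows."""
    invoice_number: str
    total: float
    currency: str

def extract_invoice(ocr_text: str) -> InvoiceRecord:
    # Deterministic extraction: regexes run on the OCR output before any
    # LLM interprets the text, so the structured fields are auditable.
    number = re.search(r"Invoice\s*#?\s*([A-Z0-9-]+)", ocr_text)
    amount = re.search(r"Total[:\s]*([A-Z]{3})?\s*\$?([\d,]+\.\d{2})", ocr_text)
    if not (number and amount):
        # Fail loudly rather than guessing; route the page to manual review.
        raise ValueError("required invoice fields missing; route to manual review")
    return InvoiceRecord(
        invoice_number=number.group(1),
        total=float(amount.group(2).replace(",", "")),
        currency=amount.group(1) or "USD",
    )

record = extract_invoice("Invoice # INV-2024-001\nTotal: USD 1,234.50")
print(asdict(record))
```

In a real pipeline the regex layer would be replaced or supplemented by a layout parser, but the shape is the same: deterministic field capture first, interpretation second.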
**What reference files must I follow when implementing patterns?**
Use `references/patterns.md` for design patterns, `references/sharp_edges.md` to understand common failures and risks, and `references/validations.md` to enforce output rules and schema validation.
**How do I reduce hallucinations from LLM-based interpretation?**
Run deterministic OCR and layout steps first and keep the LLM focused on interpreting that grounded context; use strict schema prompts, and validate every output against `references/validations.md`.
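That validation step can be a hard JSON-and-schema gate between the LLM and downstream systems: anything that does not parse or does not match the schema is rejected rather than repaired by guesswork. A minimal stdlib-only sketch (the field names and types here are hypothetical; the real rules belong in `references/validations.md`):

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED = {"vendor": str, "invoice_date": str, "total": (int, float)}

def validate_llm_output(raw: str) -> dict:
    """Reject any LLM response that is not valid JSON matching the schema."""
    data = json.loads(raw)  # raises on malformed JSON instead of guessing
    for field, expected_type in REQUIRED.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data

ok = validate_llm_output('{"vendor": "Acme", "invoice_date": "2024-05-01", "total": 99.5}')
```

A call like `validate_llm_output('{"vendor": "Acme"}')` raises `ValueError`, which is the point: invalid extractions fail fast and visibly instead of flowing downstream.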