chat-with-pdf skill

/skills/lijie420461340/chat-with-pdf

This skill lets you ask questions about PDF content, get structured summaries, and extract data with citations for fast insights.

npx playbooks add skill openclaw/skills --skill chat-with-pdf

SKILL.md

---
name: Chat with PDF
description: Answer questions about PDF content, summarize, and extract information
author: claude-office-skills
version: "1.0"
tags: [pdf, document-ai, qa, summarization, extraction]
models: [claude-sonnet-4, claude-opus-4]
tools: [computer, file_operations]
---

# Chat with PDF

Have intelligent conversations about PDF documents - ask questions, get summaries, and extract specific information.

## Overview

This skill enables you to:
- Ask questions about PDF content
- Get summaries at various detail levels
- Extract specific data points
- Compare information across multiple PDFs
- Find relevant sections quickly

## How to Use

### Basic Interaction
1. Share or upload the PDF document
2. Ask your question or request
3. Get contextual answers with citations

### Question Types

**Factual Questions**
```
"What is the contract value mentioned in this document?"
"Who are the parties involved in this agreement?"
"What are the key dates mentioned?"
```

**Summarization**
```
"Summarize this document in 3 bullet points"
"Give me an executive summary"
"What are the main topics covered?"
```

**Extraction**
```
"Extract all names and titles mentioned"
"List all financial figures in the document"
"Find all action items or deliverables"
```

**Analysis**
```
"What are the risks mentioned in this contract?"
"Are there any ambiguous terms?"
"What obligations does Party A have?"
```

## Output Formats

### Q&A Format
```markdown
**Question**: [Your question]

**Answer**: [Direct answer to your question]

**Source**: Page [X], Section [Y]
> "[Relevant quote from document]"

**Confidence**: [High/Medium/Low]
```
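
If answers need to be produced or checked programmatically, the Q&A format above can be modeled as a small record. This is a minimal sketch: the class name `CitedAnswer`, its field names, and the sample contract values are illustrative assumptions, not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class CitedAnswer:
    """One answer in the Q&A format (hypothetical helper, not part of the skill)."""
    question: str
    answer: str
    page: int
    section: str
    quote: str
    confidence: str  # expected to be "High", "Medium", or "Low"

    def to_markdown(self) -> str:
        # Reject confidence labels outside the three the format allows.
        if self.confidence not in {"High", "Medium", "Low"}:
            raise ValueError(f"unexpected confidence: {self.confidence!r}")
        return "\n".join([
            f"**Question**: {self.question}",
            "",
            f"**Answer**: {self.answer}",
            "",
            f"**Source**: Page {self.page}, Section {self.section}",
            f'> "{self.quote}"',
            "",
            f"**Confidence**: {self.confidence}",
        ])


# Made-up sample values, for illustration only.
example = CitedAnswer(
    question="What is the termination notice period?",
    answer="30 days written notice.",
    page=4,
    section="7.1",
    quote="Either party may terminate with 30 days written notice.",
    confidence="High",
)
print(example.to_markdown())
```

Keeping the page, section, and quote as separate fields makes it easy to check each citation against the source PDF.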

### Summary Format
```markdown
## Document Summary

**Type**: [Contract/Report/Manual/etc.]
**Pages**: [X]
**Date**: [If mentioned]

### Key Points
1. [Main point 1]
2. [Main point 2]
3. [Main point 3]

### Important Details
- [Detail 1]
- [Detail 2]
```

### Extraction Format
```markdown
## Extracted Information

### [Category 1]
| Item | Value | Location |
|------|-------|----------|
| [Item 1] | [Value] | Page X |
| [Item 2] | [Value] | Page Y |

### [Category 2]
...
```
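
The extraction table above can also be generated from structured rows. The sketch below assumes each extracted item is an `(item, value, page)` tuple; the function name `to_markdown_table` and the sample rows are hypothetical.

```python
def to_markdown_table(rows, headers=("Item", "Value", "Location")):
    """Render (item, value, page) tuples as a Markdown table."""
    def cell(text):
        # Escape pipes so a value cannot break the table structure.
        return str(text).replace("|", "\\|")

    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("------" for _ in headers) + "|",
    ]
    for item, value, page in rows:
        lines.append(f"| {cell(item)} | {cell(value)} | Page {page} |")
    return "\n".join(lines)


# Made-up rows, for illustration only.
rows = [("Contract value", "$50,000", 2), ("Effective date", "2024-01-15", 1)]
table = to_markdown_table(rows)
print(table)
```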

## Best Practices

### For Better Answers
1. **Be specific**: "What is the termination clause?" vs "Tell me about the contract"
2. **Reference sections**: "What does Section 5.2 say about liability?"
3. **Ask follow-ups**: Build on previous answers for deeper understanding

### For Better Extraction
1. **Specify format**: "Extract as a table" or "List as bullet points"
2. **Name the fields**: "Extract: name, date, amount, description"
3. **Set criteria**: "Only extract amounts over $10,000"
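
A criterion such as "only amounts over $10,000" can also be applied as a post-processing check on extracted text. The following is a rough sketch that assumes US-style dollar amounts (`$12,000` or `$12,000.00`); the regular expression and the name `amounts_over` are illustrative and will miss other currency formats.

```python
import re
from decimal import Decimal

# Matches "$2,500.00", "$12,000", or "$900" (US-style formatting assumed).
AMOUNT_RE = re.compile(r"\$(\d{1,3}(?:,\d{3})+|\d+)(\.\d{2})?")


def amounts_over(text, threshold):
    """Return every dollar amount in text strictly greater than threshold."""
    found = []
    for match in AMOUNT_RE.finditer(text):
        digits = match.group(1).replace(",", "") + (match.group(2) or "")
        value = Decimal(digits)
        if value > threshold:
            found.append(value)
    return found


# Made-up sentence, for illustration only.
sample = "Fees: $2,500.00 setup, $12,000 annually, and a $75,000.50 cap."
print(amounts_over(sample, Decimal("10000")))
```

`Decimal` is used instead of `float` so that monetary values are compared exactly.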

### For Better Summaries
1. **Specify length**: "Summarize in 100 words" or "3 bullet points"
2. **Focus area**: "Summarize the financial terms only"
3. **Audience**: "Summarize for a legal team" vs "for executives"

## Example Workflows

### Contract Review
```
1. "What type of contract is this?"
2. "Who are the parties and what are their roles?"
3. "What are the key obligations for each party?"
4. "What is the term and renewal process?"
5. "What are the termination conditions?"
6. "Are there any unusual or concerning clauses?"
```

### Research Paper Analysis
```
1. "What is the main thesis or hypothesis?"
2. "What methodology was used?"
3. "What are the key findings?"
4. "What are the limitations mentioned?"
5. "What future research do they suggest?"
```

### Financial Report
```
1. "What is the total revenue reported?"
2. "How does this compare to last year?"
3. "What are the main expense categories?"
4. "What guidance is given for next quarter?"
5. "Extract all financial metrics into a table"
```

## Multi-Document Support

When working with multiple PDFs:

```
"Compare the terms in Contract A vs Contract B"
"Which document mentions [topic]?"
"Create a summary table comparing key points across all documents"
```

### Comparison Output
```markdown
## Document Comparison

| Aspect | Document A | Document B |
|--------|------------|------------|
| Term Length | 2 years | 3 years |
| Value | $50,000 | $75,000 |
| Termination | 30 days notice | 60 days notice |

### Key Differences
1. [Difference 1]
2. [Difference 2]

### Similarities
1. [Similarity 1]
2. [Similarity 2]
```
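
The "Key Differences" and "Similarities" lists can be derived mechanically once each document's terms are extracted into a mapping. This sketch assumes such mappings already exist; `compare_terms` is a hypothetical helper, and the `Currency` entry is invented so that the example has one matching aspect.

```python
def compare_terms(doc_a, doc_b):
    """Split aspects present in both documents into differences and similarities."""
    differences, similarities = [], []
    for aspect in doc_a:
        if aspect not in doc_b:
            continue  # only compare aspects both documents mention
        if doc_a[aspect] == doc_b[aspect]:
            similarities.append(aspect)
        else:
            differences.append((aspect, doc_a[aspect], doc_b[aspect]))
    return differences, similarities


# Values from the comparison table above, plus one invented "Currency" aspect.
contract_a = {"Term Length": "2 years", "Value": "$50,000",
              "Termination": "30 days notice", "Currency": "USD"}
contract_b = {"Term Length": "3 years", "Value": "$75,000",
              "Termination": "60 days notice", "Currency": "USD"}

differences, similarities = compare_terms(contract_a, contract_b)
for aspect, a_value, b_value in differences:
    print(f"{aspect}: {a_value} vs {b_value}")
print("Same in both:", similarities)
```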

## Handling Challenges

### Scanned PDFs (Image-based)
- OCR is applied automatically
- Text quality depends on scan quality
- Output may contain recognition errors

### Complex Layouts
- Tables may need reformatting
- Multi-column text is processed left-to-right
- Footnotes are processed separately

### Long Documents
- Ask about specific sections for accuracy
- Request page-by-page summaries for overview
- Use targeted questions over broad ones

## Limitations

- Cannot execute code embedded in PDFs
- Password-protected PDFs require the password
- Very large PDFs (500+ pages) may need chunking
- Handwritten content recognition is limited
- Cannot guarantee 100% accuracy on scanned documents
- Charts and images are described, not analyzed numerically
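
For the chunking mentioned above, one simple approach is to split a long document into overlapping page ranges and ask about each range in turn. This is a sketch only: the skill does not specify how chunking is done, and `page_chunks`, the chunk size of 50, and the overlap of 2 pages are assumptions.

```python
def page_chunks(total_pages, chunk_size=50, overlap=2):
    """Return inclusive 1-based (start, end) page ranges covering the document.

    Consecutive ranges share `overlap` pages so that text spanning a
    chunk boundary appears whole in at least one chunk.
    """
    if overlap < 0 or chunk_size <= overlap:
        raise ValueError("need 0 <= overlap < chunk_size")
    chunks = []
    start = 1
    while start <= total_pages:
        end = min(start + chunk_size - 1, total_pages)
        chunks.append((start, end))
        if end == total_pages:
            break
        start = end - overlap + 1
    return chunks


print(page_chunks(120))
```

Each range can then be used in a scoped request such as "Summarize pages 49 to 98".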

## Privacy Note

Document contents are processed according to the AI provider's privacy policy. For sensitive documents, consider:
- Using on-premises solutions
- Redacting sensitive information first
- Checking data retention policies

Overview

This skill lets you have intelligent conversations with PDF documents: ask targeted questions, get concise summaries, and extract structured data. It supports single- and multi-document workflows and provides citations and location references for answers. Use it to speed up review, research, and data extraction from contracts, reports, manuals, and more.

How this skill works

Upload or point to one or more PDF files and ask a question or request a summary. The skill parses text (applies OCR for scanned pages), locates relevant passages, and returns answers with source citations and confidence estimates. It can extract named fields into tables, compare documents side-by-side, and produce summaries at custom lengths and focus areas.
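
The "locates relevant passages" step can be pictured with a deliberately naive stand-in. The skill does not document its actual retrieval method; the sketch below simply scores each page by how often the question's words appear on it and returns the best page number as the citation. The names `tokenize` and `best_page` and the sample pages are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_page(question, pages):
    """Return (page_number, score) for the page sharing the most question terms.

    Pages are numbered from 1; (None, 0) means no page shares any term.
    """
    query_terms = set(tokenize(question))
    best = (None, 0)
    for number, text in enumerate(pages, start=1):
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > best[1]:
            best = (number, score)
    return best


# Made-up page texts, for illustration only.
pages = [
    "This agreement is made between Acme Corp and Beta LLC.",
    "Either party may terminate this agreement with 30 days written notice.",
]
print(best_page("How can a party terminate?", pages))
```

A real system would likely use better ranking than raw word overlap, but the shape is the same: score passages, pick the best, and report where it came from.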

When to use it

- Review contracts to identify obligations, dates, and risks
- Summarize long reports for executives or teams
- Extract names, dates, amounts, and other structured data into tables
- Compare terms across multiple agreements or versions
- Quickly locate clauses, definitions, or references in manuals or policies

Best practices

- Be specific: name the clause, section, or data fields you need
- Request output format: table, bullet list, or short summary for precise results
- Limit scope for long documents: specify pages or sections to improve accuracy
- Provide passwords for protected PDFs and high-quality scans for better OCR
- Follow up with clarifying questions to refine answers and capture context
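
When limiting scope by page, a compact page specification such as `1-3,7` is a common convention. The sketch below expands such a string into explicit page numbers; the syntax and the name `parse_page_spec` are assumptions, since the skill itself accepts scope in plain language.

```python
def parse_page_spec(spec, total_pages):
    """Expand a spec such as "1-3,7" into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_spec("1-3, 7, 2-4", total_pages=10))
```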

Example use cases

- Contract review: identify parties, term, termination, obligations, and risky clauses
- Financial report analysis: extract revenue, expense categories, and key metrics into a table
- Research papers: summarize thesis, methodology, findings, and limitations
- Multi-document comparison: create a table comparing term length, value, and notice periods
- Compliance audit: locate regulatory references and extract relevant passages for evidence

FAQ

What formats and sizes are supported?

PDF is the primary format. Very large PDFs (500+ pages) may be chunked; scanned PDFs are supported but OCR quality depends on scan clarity.

How are answers sourced and cited?

Answers include page and section references plus quoted passages when available, and a confidence indicator (High/Medium/Low).