home / skills / openclaw / skills / pdf-converter

pdf-converter skill

/skills/lijie420461340/pdf-converter

This skill converts PDFs to editable formats like Word or Excel and back, preserving layout and enabling batch processing.

npx playbooks add skill openclaw/skills --skill pdf-converter

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
6.4 KB
---
name: PDF Converter
description: Convert PDF files to and from Word, Excel, Image, and other formats
author: claude-office-skills
version: "1.0"
tags: [pdf, conversion, document, format, export, import]
models: [claude-sonnet-4, claude-opus-4]
tools: [computer, file_operations]
---

# PDF Converter

Convert PDF files to various formats and vice versa while preserving formatting.

## Overview

This skill helps you:
- Convert PDFs to editable formats (Word, Excel)
- Convert documents to PDF
- Extract images from PDFs
- Optimize conversion quality
- Handle batch conversions

## Supported Conversions

### PDF to Other Formats
| Target Format | Best For | Quality |
|---------------|----------|---------|
| **Word (.docx)** | Text-heavy documents | ⭐⭐⭐⭐ |
| **Excel (.xlsx)** | Tables and data | ⭐⭐⭐⭐ |
| **PowerPoint (.pptx)** | Presentations | ⭐⭐⭐ |
| **Images (.png/.jpg)** | Visual snapshots | ⭐⭐⭐⭐⭐ |
| **Text (.txt)** | Plain text extraction | ⭐⭐⭐⭐ |
| **HTML** | Web content | ⭐⭐⭐ |
| **Markdown (.md)** | Structured text | ⭐⭐⭐ |

### Other Formats to PDF
| Source Format | Quality Notes |
|---------------|---------------|
| **Word (.docx)** | Excellent preservation |
| **Excel (.xlsx)** | Good, check page breaks |
| **PowerPoint (.pptx)** | Excellent with animations flat |
| **Images** | Depends on resolution |
| **HTML** | Variable, CSS may differ |
| **Text (.txt)** | Perfect, but basic |

## How to Use

### Basic Conversion
```
"Convert this PDF to Word"
"Save this document as PDF"
"Extract this PDF as images"
```

### With Options
```
"Convert PDF to Word, preserve exact formatting"
"Export PDF pages 1-5 as PNG images at 300 DPI"
"Convert Excel to PDF, fit all columns on one page"
```

### Batch Conversion
```
"Convert all PDFs in this folder to Word documents"
"Create PDFs from these 10 Word files"
```

## Conversion Guidelines

### PDF to Word
```markdown
## PDF to Word Conversion

### Best Practices
1. **Check source PDF type**:
   - Native PDF (from Word/etc): Best results
   - Scanned PDF: Use OCR first
   - Image-based: Limited accuracy

2. **Formatting considerations**:
   - Complex layouts may shift
   - Fonts substitute if not installed
   - Tables may need adjustment
   - Headers/footers require review

### Quality Settings
| Setting | Result |
|---------|--------|
| **Exact** | Matches layout precisely, harder to edit |
| **Editable** | Optimized for editing, may shift layout |
| **Text only** | Plain text, no formatting |

### Common Issues
| Issue | Solution |
|-------|----------|
| Text as image | Run OCR before converting |
| Missing fonts | Embed or substitute fonts |
| Broken tables | Manually adjust in Word |
| Lost colors | Check color profile settings |
```

### PDF to Excel
```markdown
## PDF to Excel Conversion

### Ideal Sources
- PDF with clear table structure
- Financial statements
- Data reports
- Invoices with line items

### Extraction Methods
| Method | Use When |
|--------|----------|
| **Auto-detect tables** | Clear table borders |
| **Select area** | Tables without borders |
| **Full page** | Entire page is data |

### Quality Tips
1. Ensure PDF has selectable text (not scanned)
2. Clean table borders help detection
3. Merged cells may cause issues
4. Multi-page tables need manual merge

### Data Cleanup
After conversion, check:
- [ ] Column alignment
- [ ] Number formatting
- [ ] Date formats
- [ ] Merged cell handling
- [ ] Header row detection
```

### PDF to Images
```markdown
## PDF to Image Conversion

### Resolution Settings
| DPI | Use Case | File Size |
|-----|----------|-----------|
| 72 | Screen viewing | Small |
| 150 | Email/web | Medium |
| 300 | Print quality | Large |
| 600 | High-quality print | Very large |

### Format Selection
| Format | Best For |
|--------|----------|
| **PNG** | Text, graphics, transparency |
| **JPG** | Photos, smaller files |
| **TIFF** | Print production |
| **WebP** | Web optimization |

### Output Options
- All pages → separate images
- Specific pages → selected images
- Page range → batch export
```

### Document to PDF
```markdown
## Converting to PDF

### From Word
**Settings**:
- [ ] Embed fonts
- [ ] Include bookmarks
- [ ] Set PDF/A for archival
- [ ] Compress images (optional)

### From Excel
**Settings**:
- [ ] Define print area
- [ ] Set page breaks
- [ ] Choose orientation
- [ ] Fit to page options

### From PowerPoint
**Settings**:
- [ ] Slide range
- [ ] Include notes (optional)
- [ ] Quality level
- [ ] Handout format (optional)

### Universal Tips
1. Review in print preview first
2. Check page breaks
3. Ensure fonts are embedded
4. Verify hyperlinks work
```

## Batch Processing

### Batch Conversion Template
```markdown
## Batch Conversion Job

**Source**: [Folder path]
**Target Format**: [Format]
**Output Folder**: [Path]

### Files to Convert
| File | Pages | Status |
|------|-------|--------|
| document1.pdf | All | ✅ Complete |
| document2.pdf | All | ✅ Complete |
| document3.pdf | 1-5 | ⏳ Processing |

### Settings Applied
- Resolution: [X] DPI
- Quality: [High/Medium/Low]
- Naming: [Original name]_converted.[ext]

### Summary
- Total files: [X]
- Successful: [Y]
- Failed: [Z]
```

## Troubleshooting

### Common Issues
| Problem | Cause | Solution |
|---------|-------|----------|
| Text not selectable | Scanned PDF | Apply OCR first |
| Missing characters | Font issues | Embed fonts or convert |
| Poor image quality | Low DPI | Use higher resolution |
| Large file size | Uncompressed | Apply compression |
| Lost formatting | Complex layout | Use "exact" mode |

### Quality Checklist
After conversion, verify:
- [ ] All text present and readable
- [ ] Formatting approximately preserved
- [ ] Images included and clear
- [ ] Tables properly structured
- [ ] Links functional (if applicable)
- [ ] Page count matches

## Tool Recommendations

### Online Tools
- Adobe Acrobat (best quality)
- SmallPDF (easy to use)
- ILovePDF (batch friendly)
- PDF24 (free, good quality)

### Desktop Software
- Adobe Acrobat Pro
- Microsoft Office (built-in)
- LibreOffice (free)
- Foxit PDF Editor

### Command Line
- Pandoc (text formats)
- ImageMagick (images)
- pdftk (PDF manipulation)
- Poppler utilities

## Limitations

- Cannot perform actual file conversion (provides guidance)
- Scanned PDFs require OCR preprocessing
- Complex layouts may not convert perfectly
- Password-protected PDFs need password
- Some formatting always lost in conversion
- Quality depends on source PDF type

Overview

This skill converts PDF files to and from Word, Excel, PowerPoint, images, HTML, Markdown, and plain text while aiming to preserve layout and data. It supports single-file and batch jobs, offers resolution and quality options, and includes guidance for OCR and preprocessing scanned documents. Use it to extract text, images, tables, or produce print-ready PDFs from office files.

How this skill works

The skill inspects the PDF source type (native text, scanned image, or mixed) and applies appropriate extraction methods: OCR for image-based PDFs, table detection or area selection for data pages, and layout-preserving or edit-optimized modes for document output. It exposes settings for DPI, quality level, page range, and export format, and provides post-conversion checks and cleanup recommendations.

When to use it

  • You need editable Word or Excel versions of a PDF for editing or data analysis
  • Extract images or high-resolution page snapshots from PDFs
  • Convert multiple documents in a folder into a single target format (batch jobs)
  • Prepare print-ready PDFs from Word, Excel, or PowerPoint with embedded fonts
  • Process scanned documents by running OCR before conversion

Best practices

  • Identify source PDF type: run OCR on scanned or image-based PDFs first
  • Choose 'Exact' mode for layout fidelity or 'Editable' mode for easier edits
  • Set DPI appropriate to use case (300 DPI for print, 150 DPI for web)
  • Check fonts and embed or substitute missing fonts when exporting to PDF
  • Validate tables and numeric formats after PDF→Excel conversions and adjust merged cells

Example use cases

  • Convert financial reports (PDF) to Excel for analysis, using table auto-detection
  • Batch-convert a folder of resumes (PDF) to Word for HR editing
  • Export slides from a presentation PDF to high-resolution PNGs for a poster
  • Create archival PDFs from Word files with PDF/A and embedded fonts for long-term storage
  • Extract product images from a catalog PDF at 300 DPI for e-commerce listings

FAQ

Will conversions perfectly preserve complex layouts?

Complex layouts may shift; use 'Exact' mode to maximize fidelity but expect some manual adjustments for multi-column pages, floating elements, or complex tables.

How do I handle scanned PDFs?

Run OCR first to convert images to selectable text. OCR quality depends on scan resolution and clarity—use 300 DPI or higher for best results.