pdfco skill

This skill integrates PDF.co to convert, extract, merge, split, and optimize PDFs with OCR support for automation.

npx playbooks add skill vm0-ai/vm0-skills --skill pdfco

---
name: pdfco
description: PDF processing API for conversion, extraction, merging, splitting and more
vm0_secrets:
  - PDFCO_API_KEY
---

# PDF.co

All-in-one PDF processing API. Convert, extract, merge, split, compress PDFs and more. Supports OCR for scanned documents.

> Official docs: https://docs.pdf.co/

---

## When to Use

Use this skill when you need to:

- Extract text from PDF files (with OCR support)
- Convert PDF to CSV, JSON, or other formats
- Merge multiple PDFs into one
- Split PDF into multiple files
- Compress PDF to reduce file size
- Convert HTML/URL to PDF
- Parse invoices and documents with AI

---

## Prerequisites

1. Create an account at https://pdf.co/
2. Get your API key from https://app.pdf.co/

Set environment variable:

```bash
export PDFCO_API_KEY="your-api-key"
```

---


> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
> ```

## How to Use

### 1. PDF to Text

Extract text from PDF with OCR support:

Write to `/tmp/request.json`:

```json
{
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
  "inline": true
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/pdf/convert/to/text" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```

**With specific pages (1-indexed):**

Write to `/tmp/request.json`:

```json
{
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
  "pages": "1-3",
  "inline": true
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/pdf/convert/to/text" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```

### 2. PDF to CSV

Convert PDF tables to CSV:

Write to `/tmp/request.json`:

```json
{
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-csv/sample.pdf",
  "inline": true
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/pdf/convert/to/csv" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```

### 3. Merge PDFs

Combine multiple PDFs into one:

Write to `/tmp/request.json`:

```json
{
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-merge/sample1.pdf,https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-merge/sample2.pdf",
  "name": "merged.pdf"
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/pdf/merge" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```
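The `url` field for a merge is one comma-separated string, which gets awkward to assemble by hand for more than a couple of files. The following is a minimal sketch of how the request body could be generated from a list of URLs. `join_urls` and `write_merge_request` are hypothetical helper names, and the sketch assumes the URLs and output name contain no commas, double quotes, or backslashes, since it does no JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of PDF.co): build the merge request body
# from a list of source URLs.

# Join all arguments with commas: a b c -> a,b,c
join_urls() {
  local IFS=,
  printf '%s' "$*"
}

# Write {"url": "<joined urls>", "name": "<output name>"} to a file.
# Usage: write_merge_request OUTPUT_NAME REQUEST_FILE URL...
# Assumption: no argument needs JSON escaping (no quotes or backslashes).
write_merge_request() {
  local name=$1 file=$2
  shift 2
  printf '{\n  "url": "%s",\n  "name": "%s"\n}\n' "$(join_urls "$@")" "$name" > "$file"
}
```

For example, `write_merge_request merged.pdf /tmp/request.json "$url1" "$url2"` writes a body with the same shape as the hand-written one, ready for the same `curl` call.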

### 4. Split PDF

Split PDF by page ranges:

Write to `/tmp/request.json`:

```json
{
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-split/sample.pdf",
  "pages": "1-2,3-"
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/pdf/split" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```

### 5. Compress PDF

Reduce PDF file size:

Write to `/tmp/request.json`:

```json
{
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-optimize/sample.pdf",
  "name": "compressed.pdf"
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/pdf/optimize" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```

### 6. HTML to PDF

Convert HTML or URL to PDF:

Write to `/tmp/request.json`:

```json
{
  "html": "<h1>Hello World</h1><p>This is a test.</p>",
  "name": "output.pdf"
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/pdf/convert/from/html" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```

**From URL:**

Write to `/tmp/request.json`:

```json
{
  "url": "https://example.com",
  "name": "webpage.pdf"
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/pdf/convert/from/url" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```

### 7. AI Invoice Parser

Extract structured data from invoices:

Write to `/tmp/request.json`:

```json
{
  "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/ai-invoice-parser/sample-invoice.pdf",
  "inline": true
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/ai-invoice-parser" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```

### 8. Upload Local File

Upload a local file first, then use the returned URL:

**Step 1: Get presigned upload URL**

```bash
bash -c 'curl -s "https://api.pdf.co/v1/file/upload/get-presigned-url?name=myfile.pdf&contenttype=application/pdf" --header "x-api-key: ${PDFCO_API_KEY}"' | jq -r '.presignedUrl, .url'
```

Copy the presigned URL and file URL from the response.

**Step 2: Upload file**

Replace `<presigned-url>` with the URL from Step 1:

```bash
curl -X PUT "<presigned-url>" --header "Content-Type: application/pdf" --data-binary @/path/to/your/file.pdf
```

**Step 3: Use file URL in subsequent API calls**

Replace `<file-url>` with the file URL from Step 1:

Write to `/tmp/request.json`:

```json
{
  "url": "<file-url>",
  "inline": true
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/pdf/convert/to/text" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```
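The three steps can also be chained so that no URL has to be copied by hand. This is a sketch rather than a definitive implementation: `upload_pdf` is a hypothetical helper name, it relies on the `presignedUrl` and `url` response fields used in Step 1, and it assumes `curl`, `jq`, and an exported `PDFCO_API_KEY`.

```bash
#!/usr/bin/env bash
# Sketch: upload a local PDF and print the file URL to use in later calls.
# Requires curl and jq, and PDFCO_API_KEY exported in the environment.
# upload_pdf is a hypothetical helper name, not part of PDF.co.
upload_pdf() {
  local path=$1 name response presigned file_url
  name=$(basename "$path")

  # Step 1: request a presigned upload URL and the final file URL.
  # -G sends the --data-urlencode pairs as a URL-encoded query string.
  response=$(curl -s -G "https://api.pdf.co/v1/file/upload/get-presigned-url" \
    --data-urlencode "name=${name}" \
    --data-urlencode "contenttype=application/pdf" \
    --header "x-api-key: ${PDFCO_API_KEY}") || return 1
  presigned=$(jq -r '.presignedUrl // empty' <<<"$response")
  file_url=$(jq -r '.url // empty' <<<"$response")
  if [ -z "$presigned" ] || [ -z "$file_url" ]; then
    echo "Could not get a presigned URL: ${response}" >&2
    return 1
  fi

  # Step 2: PUT the file contents to the presigned URL (-f fails on HTTP errors).
  curl -s -f -X PUT "$presigned" \
    --header "Content-Type: application/pdf" \
    --data-binary "@${path}" >/dev/null || return 1

  # Step 3: print the file URL for use as "url" in subsequent requests.
  printf '%s\n' "$file_url"
}
```

Saved in a script file and run with `bash`, a call such as `file_url=$(upload_pdf /path/to/your/file.pdf)` yields the value to place in the `url` field of the next request.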

### 9. Async Mode (Large Files)

For large files, use async mode to avoid timeouts:

**Step 1: Start async job**

Write to `/tmp/request.json`:

```json
{
  "url": "https://example.com/large-file.pdf",
  "async": true
}
```

```bash
bash -c 'curl -s --location --request POST "https://api.pdf.co/v1/pdf/convert/to/text" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json' | jq -r '.jobId'
```

Copy the job ID from the response.

**Step 2: Check job status**

Replace `<job-id>` with the job ID from Step 1:

Write to `/tmp/request.json`:

```json
{
  "jobid": "<job-id>"
}
```

```bash
bash -c 'curl --location --request POST "https://api.pdf.co/v1/job/check" --header "x-api-key: ${PDFCO_API_KEY}" --header "Content-Type: application/json" -d @/tmp/request.json'
```
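Checking the status usually has to be repeated until the job finishes. The sketch below wraps Step 2 in a polling loop. `check_job` and `wait_for_job` are hypothetical helper names, and the status values are an assumption: the loop treats a string `status` of `working` as "still running" and `success` as "done, result at `url`", so verify the exact values against the official docs before relying on it.

```bash
#!/usr/bin/env bash
# Sketch: poll /job/check until an async job finishes, then print the result URL.
# Requires curl and jq. check_job and wait_for_job are hypothetical helper names.
# Assumption: the job/check response has a string "status" field that is
# "working" while the job runs and "success" when it is done.

# Fetch the current job state as JSON.
check_job() {
  curl -s --location --request POST "https://api.pdf.co/v1/job/check" \
    --header "x-api-key: ${PDFCO_API_KEY}" \
    --header "Content-Type: application/json" \
    -d "{\"jobid\": \"$1\"}"
}

# Usage: wait_for_job JOB_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
wait_for_job() {
  local job_id=$1 interval=${2:-3} max_attempts=${3:-100}
  local attempt response status
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    response=$(check_job "$job_id") || return 1
    status=$(jq -r '.status' <<<"$response")
    case "$status" in
      success) jq -r '.url' <<<"$response"; return 0 ;;
      working) sleep "$interval" ;;
      # Any other value (or an unparsable response) is treated as a failure.
      *) echo "Job ${job_id} ended with status: ${status}" >&2; return 1 ;;
    esac
  done
  echo "Job ${job_id} still running after ${max_attempts} checks" >&2
  return 1
}
```

The attempt limit keeps a stuck job from blocking forever. Both the interval and the limit are arbitrary defaults to tune for your workload.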

---

## Common Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `url` | string | URL to source file (required) |
| `inline` | boolean | Return result in response body |
| `async` | boolean | Run as background job |
| `pages` | string | Page range, **1-indexed** (e.g., "1-3", "1,3,5", "2-") |
| `name` | string | Output filename |
| `password` | string | PDF password if protected |
| `expiration` | integer | Output link expiration in minutes (default: 60) |
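When these parameters come from variables rather than fixed values, generating the request body with `jq` avoids hand-escaping JSON. A minimal sketch, assuming `jq` is installed; `build_request` is a hypothetical helper name, and it covers only `url`, `inline`, and an optional `pages` value.

```bash
#!/usr/bin/env bash
# Sketch: build a request body with proper JSON escaping.
# Requires jq. build_request is a hypothetical helper name.
# Usage: build_request URL [PAGES]
build_request() {
  local url=$1 pages=${2:-}
  # Start from the required fields and add "pages" only when one was given.
  jq -n --arg url "$url" --arg pages "$pages" \
    '{url: $url, inline: true}
     + (if $pages != "" then {pages: $pages} else {} end)'
}
```

For instance, `build_request "$file_url" "1-3" > /tmp/request.json` writes a body that can be sent with any of the `curl` commands in this document.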

---

## Response Format

```json
{
  "url": "https://pdf-temp-files.s3.amazonaws.com/.../result.pdf",
  "pageCount": 5,
  "error": false,
  "status": 200,
  "name": "result.pdf",
  "credits": 10,
  "remainingCredits": 9990
}
```

With `inline: true`, the response also includes a `body` field containing the extracted content.
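A script that consumes these responses may want to check `error` before using the result. The sketch below reads a response on stdin, reports credit usage, and prints either the inline `body` or the download `url`. `handle_response` is a hypothetical helper name, and the `message` field on failed calls is an assumption not shown in the sample response above.

```bash
#!/usr/bin/env bash
# Sketch: validate a PDF.co response read on stdin and print the useful part.
# Requires jq. handle_response is a hypothetical helper name.
# Assumption: failed calls set "error": true and carry a "message" field.
handle_response() {
  local response
  response=$(cat)
  # Anything other than an explicit "error": false is treated as a failure.
  if [ "$(jq -r '.error' <<<"$response" 2>/dev/null)" != "false" ]; then
    echo "PDF.co request failed: $(jq -r '.message // "no message"' <<<"$response" 2>/dev/null)" >&2
    return 1
  fi
  # Report credit usage on stderr so stdout stays clean for piping.
  jq -r '"credits used: \(.credits), remaining: \(.remainingCredits)"' <<<"$response" >&2
  # Prefer inline content when present, otherwise the download URL.
  jq -r '.body // .url' <<<"$response"
}
```

It can sit at the end of a pipeline, for example `bash -c 'curl -s ... -d @/tmp/request.json' | handle_response > result.txt`, where the elided part is any of the requests shown earlier.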

---

## API Endpoints

| Endpoint | Description |
|----------|-------------|
| `/pdf/convert/to/text` | PDF to text (OCR supported) |
| `/pdf/convert/to/csv` | PDF to CSV |
| `/pdf/convert/to/json` | PDF to JSON |
| `/pdf/merge` | Merge multiple PDFs |
| `/pdf/split` | Split PDF by pages |
| `/pdf/optimize` | Compress PDF |
| `/pdf/convert/from/html` | HTML to PDF |
| `/pdf/convert/from/url` | URL to PDF |
| `/ai-invoice-parser` | AI-powered invoice parsing |
| `/document-parser` | Template-based document parsing |
| `/file/upload/get-presigned-url` | Get upload URL |
| `/job/check` | Check async job status |

---

## Guidelines

1. **File Sources**: Use direct URLs or upload files first via presigned URL
2. **Large Files**: Use `async: true` for files over 40 pages or 10MB
3. **OCR**: Automatically enabled for scanned PDFs (set `lang` for non-English)
4. **Rate Limits**: Check your plan at https://pdf.co/pricing
5. **Output Expiration**: Download results within expiration time (default 60 min)
6. **Credits**: Each operation costs credits; check `remainingCredits` in response

## Overview

This skill provides an all-in-one PDF processing API for conversion, extraction, merging, splitting, compression and more. It supports OCR for scanned documents and includes AI-powered parsers for invoices and structured documents. Use it to automate PDF workflows from simple text extraction to large-file async jobs.

## How this skill works

Interact with the API by sending JSON requests that reference file URLs or uploaded files (via presigned URLs). Common operations include convert-to-text/CSV/JSON, merge, split, compress, HTML/URL-to-PDF, and AI invoice/document parsing. For large files, enable async mode to receive a job ID and poll the job status until results are ready.

## When to use it

- Extract searchable text from PDFs, including scanned images via OCR.
- Convert PDF tables to CSV or structured JSON for downstream processing.
- Combine multiple PDF files into a single document or split a PDF by page ranges.
- Compress PDFs to reduce size for storage or email.
- Convert HTML or web pages into PDF for archiving or reporting.
- Parse invoices and documents into structured fields with AI or templates.

## Best practices

- Upload local files using presigned URLs, then pass the returned file URL to API endpoints.
- Use `async: true` for files larger than ~10MB or PDFs with many pages to avoid timeouts.
- Specify `pages` (1-indexed) when you only need a subset to save credits and time.
- Set `inline: true` when you want content returned directly in the response body.
- Monitor `remainingCredits` and output expiration; download results before the link expires.

## Example use cases

- Extract text from a scanned contract and feed it into an indexing/search pipeline.
- Convert a batch of invoice PDFs to JSON for accounting automation.
- Merge monthly reports into a single PDF for distribution.
- Split a large scanned book into chapter PDFs by page ranges.
- Generate PDF snapshots of web pages for compliance or audit trails.

## FAQ

### How do I process a local file?

Request a presigned upload URL, PUT your file to that URL, then use the returned file URL in subsequent API calls.

### When should I use async mode?

Enable async for large files (over ~10MB or many pages) or long-running operations to avoid request timeouts; poll the `/job/check` endpoint with the job ID.