home / skills / affaan-m / everything-claude-code / nutrient-document-processing
This skill helps you automate document processing with Nutrient DWS API, converting formats, extracting data, redacting PII, and signing documents.
npx playbooks add skill affaan-m/everything-claude-code --skill nutrient-document-processingReview the files below or copy the command above to add this skill to your agents.
---
name: nutrient-document-processing
description: Process, convert, OCR, extract, redact, sign, and fill documents using the Nutrient DWS API. Works with PDFs, DOCX, XLSX, PPTX, HTML, and images.
---
# Nutrient Document Processing
Process documents with the [Nutrient DWS Processor API](https://www.nutrient.io/api/). Convert formats, extract text and tables, OCR scanned documents, redact PII, add watermarks, digitally sign, and fill PDF forms.
## Setup
Get a free API key at **[nutrient.io](https://dashboard.nutrient.io/sign_up/?product=processor)**
```bash
export NUTRIENT_API_KEY="pdf_live_..."
```
All requests go to `https://api.nutrient.io/build` as multipart POST with an `instructions` JSON field.
## Operations
### Convert Documents
```bash
# DOCX to PDF
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"document.docx"}]}' \
-o output.pdf
# PDF to DOCX
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
-o output.docx
# HTML to PDF
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"html":"index.html"}]}' \
-o output.pdf
```
Supported inputs: PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS.
### Extract Text and Data
```bash
# Extract plain text
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
-o output.txt
# Extract tables as Excel
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
-o tables.xlsx
```
### OCR Scanned Documents
```bash
# OCR to searchable PDF (supports 100+ languages)
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
-o searchable.pdf
```
Languages: Supports 100+ languages via ISO 639-2 codes (e.g., `eng`, `deu`, `fra`, `spa`, `jpn`, `kor`, `chi_sim`, `chi_tra`, `ara`, `hin`, `rus`). Full language names like `english` or `german` also work. See the [complete OCR language table](https://www.nutrient.io/guides/document-engine/ocr/language-support/) for all supported codes.
### Redact Sensitive Information
```bash
# Pattern-based (SSN, email)
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
-o redacted.pdf
# Regex-based
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
-o redacted.pdf
```
Presets: `social-security-number`, `email-address`, `credit-card-number`, `international-phone-number`, `north-american-phone-number`, `date`, `time`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`.
### Add Watermarks
```bash
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
-o watermarked.pdf
```
### Digital Signatures
```bash
# Self-signed CMS signature
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
-o signed.pdf
```
### Fill PDF Forms
```bash
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "[email protected]" \
-F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"[email protected]","date":"2026-02-06"}}]}' \
-o filled.pdf
```
## MCP Server (Alternative)
For native tool integration, use the MCP server instead of curl:
```json
{
"mcpServers": {
"nutrient-dws": {
"command": "npx",
"args": ["-y", "@nutrient-sdk/dws-mcp-server"],
"env": {
"NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
"SANDBOX_PATH": "/path/to/working/directory"
}
}
}
}
```
## When to Use
- Converting documents between formats (PDF, DOCX, XLSX, PPTX, HTML, images)
- Extracting text, tables, or key-value pairs from PDFs
- OCR on scanned documents or images
- Redacting PII before sharing documents
- Adding watermarks to drafts or confidential documents
- Digitally signing contracts or agreements
- Filling PDF forms programmatically
## Links
- [API Playground](https://dashboard.nutrient.io/processor-api/playground/)
- [Full API Docs](https://www.nutrient.io/guides/dws-processor/)
- [Agent Skill Repo](https://github.com/PSPDFKit-labs/nutrient-agent-skill)
- [npm MCP Server](https://www.npmjs.com/package/@nutrient-sdk/dws-mcp-server)
This skill processes and automates document workflows using the Nutrient DWS Processor API. It handles conversions, OCR, extraction, redaction, watermarking, digital signing, and PDF form filling across PDFs, Office files, HTML, and common image formats. The skill is built for integration into agent pipelines or developer tools to speed up repetitive document tasks.
You send multipart POST requests to the Nutrient API endpoint with an instructions JSON payload describing input parts and actions. The API performs conversions, OCR, table/text extraction, redactions, watermarks, signatures, and form fills and returns the requested output format. Optionally, an MCP server can run locally to expose the same functionality for native tool integrations.
What input formats are supported?
PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, and common image types (JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS).
How do I handle OCR for non-English text?
Specify the OCR language using ISO 639-2 codes (e.g., eng, fra, spa) or full language names in the instructions; the API supports 100+ languages.
Can I redact using custom patterns?
Yes. Use the regex redaction strategy to supply custom regular expressions, or choose from many built-in presets for common PII.