
pdf-processing skill

/registry/terminal_bench_2.0/full_batch_reviewed/terminal_bench_2_0_financial-document-processor/environment/skills/pdf-processing

This skill guides you to use LlamaParse for accurate PDF text extraction, reducing errors and speeding up document processing.

npx playbooks add skill benchflow-ai/skillsbench --skill pdf-processing


SKILL.md
---
name: PDF Processing
description: Invoke this skill BEFORE implementing any text extraction/parsing logic to learn how to use LlamaParse to process any document accurately. Requires llama_cloud_services package and LLAMA_CLOUD_API_KEY as an environment variable.
---

# PDF Processing

## Quick start

Extract text:

```python
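from llama_cloud_services import LlamaParse

# Placeholders: replace with your own LlamaCloud project and organization IDs.
project_id = "..."
organization_id = "..."

# The API key is read from the LLAMA_CLOUD_API_KEY environment variable.
parser = LlamaParse(
    parse_mode="parse_page_with_agent",
    model="openai-gpt-4-1-mini",
    high_res_ocr=True,
    adaptive_long_table=True,
    outlined_table_extraction=True,
    output_tables_as_HTML=True,
    result_type="markdown",
    project_id=project_id,
    organization_id=organization_id,
)

# Parse the file, then get one markdown document per page.
result = parser.parse("./my_file.pdf")
documents = result.get_markdown_documents(split_by_page=True)

# Join the pages into one string, with a horizontal rule between pages.
full_text = "\n\n---\n\n".join(document.text for document in documents)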
```

For more detailed code implementations, see [REFERENCE.md](REFERENCE.md).

## Requirements

The `llama_cloud_services` package must be installed in your environment:

```bash
pip install llama_cloud_services
```

And the `LLAMA_CLOUD_API_KEY` must be available as an environment variable:

```bash
export LLAMA_CLOUD_API_KEY="..."
```
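
It can help to check for the key up front and fail with a clear message before any parsing starts. The following is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper name, not part of `llama_cloud_services`.

```python
import os


def require_api_key(env=os.environ, name="LLAMA_CLOUD_API_KEY"):
    """Return the API key from `env`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before parsing.")
    return key
```

Calling `require_api_key()` at the top of a script stops early when the variable is missing. The `env` argument exists only so the check can be exercised with a plain dictionary.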

Overview

This skill shows how to use LlamaParse to preprocess PDF files before you implement any text extraction or parsing logic. The parser handles OCR and table extraction and returns structured output, so downstream parsing starts from cleaner input. It requires the llama_cloud_services package and the LLAMA_CLOUD_API_KEY environment variable.

How this skill works

The skill configures LlamaParse with high-resolution OCR, adaptive handling of long tables, and outlined table extraction, converting PDFs into Markdown with tables optionally rendered as HTML. It runs a parsing agent over each page, returns document objects (split by page if requested), and exposes text and table outputs for aggregation or further NLP processing.

When to use it

  • Before building text extraction logic for PDFs, to validate document structure and content quality.
  • When PDFs contain scanned pages that need high-resolution OCR for improved accuracy.
  • When documents include complex or long tables that require adaptive extraction and HTML output.
  • When you need page-level segmentation or markdown-ready output for downstream parsers.
  • When you must standardize diverse PDFs into a predictable text/table representation.

Best practices

  • Install llama_cloud_services and set LLAMA_CLOUD_API_KEY in your environment before running the parser.
  • Enable high_res_ocr for scanned documents and use adaptive_long_table for multi-page tables.
  • Request output_tables_as_HTML when preserving table layout is important for later parsing.
  • Set result_type='markdown' on the parser and pass split_by_page=True to get_markdown_documents to simplify incremental processing.
  • Aggregate page documents into a single corpus only after verifying page-level extraction quality.
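
One way to act on the last point is to flag pages whose extracted text is empty or suspiciously short before joining them. This is a rough sketch, not a real quality metric: `find_suspect_pages` and the `min_chars` threshold are hypothetical, and it assumes only that each page document has a `.text` attribute and that the list is in page order.

```python
from types import SimpleNamespace


def find_suspect_pages(documents, min_chars=20):
    """Return 1-based page numbers whose text is empty or very short."""
    suspect = []
    for page_number, document in enumerate(documents, start=1):
        text = (document.text or "").strip()
        if len(text) < min_chars:
            suspect.append(page_number)
    return suspect


# Stand-in objects with a `.text` attribute, in place of real parser output.
pages = [
    SimpleNamespace(text="Quarterly revenue increased across all segments."),
    SimpleNamespace(text=""),
    SimpleNamespace(text="n/a"),
]
print(find_suspect_pages(pages))  # prints [2, 3]
```

A short page is not necessarily a bad extraction (it may be a title page or a blank separator), so the result is a list of pages to inspect, not pages to discard.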

Example use cases

  • Preprocessing legal or regulatory PDFs with mixed scanned pages and native text to prepare for clause extraction.
  • Converting financial reports with long tables into HTML tables to feed into a table-parsing pipeline.
  • Splitting multi-page research papers into page-level markdown for citation and section extraction.
  • Validating OCR quality on digitized records before training a downstream information-extraction model.
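
For the table-parsing use case, the sketch below pulls rows out of HTML tables embedded in a page's markdown using only the standard library. It assumes the tables arrive as plain `<table>`, `<tr>`, `<th>` and `<td>` markup; the exact shape of LlamaParse output is not confirmed here, and `TableRows` and `extract_tables` are hypothetical names. Nested tables and `rowspan`/`colspan` are not handled.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(markdown_text):
    """Return every HTML table found in the text as nested lists of strings."""
    parser = TableRows()
    parser.feed(markdown_text)
    parser.close()
    return parser.tables


sample = """## Balance sheet

<table>
<tr><th>Item</th><th>2023</th></tr>
<tr><td>Cash</td><td>1,200</td></tr>
</table>
"""
print(extract_tables(sample))
# prints [[['Item', '2023'], ['Cash', '1,200']]]
```

Text outside table cells is ignored, so the function can be run directly on the markdown of each page.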

FAQ

What packages and environment variables are required?

Install the llama_cloud_services package and set LLAMA_CLOUD_API_KEY as an environment variable before using the skill.

Which output formats can I get from the parser?

The parser can produce markdown documents and HTML tables; you can configure result types and table output options to match your pipeline.