
azure-ai-contentunderstanding-py skill

/skills/azure-ai-contentunderstanding-py

This skill helps you manage Azure AI Content Understanding workflows in Python, enabling multimodal extraction for documents, images, audio, and video.

npx playbooks add skill sickn33/antigravity-awesome-skills --skill azure-ai-contentunderstanding-py

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
---
name: azure-ai-contentunderstanding-py
description: |
  Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video.
  Triggers: "azure-ai-contentunderstanding", "ContentUnderstandingClient", "multimodal analysis", "document extraction", "video analysis", "audio transcription".
package: azure-ai-contentunderstanding
---

# Azure AI Content Understanding SDK for Python

Multimodal AI service that extracts semantic content from documents, video, audio, and image files for RAG and automated workflows.

## Installation

```bash
pip install azure-ai-contentunderstanding
```

## Environment Variables

```bash
CONTENTUNDERSTANDING_ENDPOINT=https://<resource>.cognitiveservices.azure.com/
```

## Authentication

```python
import os
from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.identity import DefaultAzureCredential

endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
credential = DefaultAzureCredential()
client = ContentUnderstandingClient(endpoint=endpoint, credential=credential)
```

## Core Workflow

Content Understanding analyses run as asynchronous, long-running operations:

1. **Begin Analysis** — Start the analysis operation with `begin_analyze()` (returns a poller)
2. **Poll for Results** — Poll until analysis completes (SDK handles this with `.result()`)
3. **Process Results** — Extract structured results from `AnalyzeResult.contents`
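
A compact sketch of the three steps, reusing the `client` from the Authentication section and a placeholder document URL:

```python
from azure.ai.contentunderstanding.models import AnalyzeInput

# 1. Begin analysis: starts a long-running operation and returns a poller
poller = client.begin_analyze(
    analyzer_id="prebuilt-documentSearch",
    inputs=[AnalyzeInput(url="https://example.com/document.pdf")],
)

# 2. Poll for results: blocks until the operation completes
result = poller.result()

# 3. Process results: contents is a list of extracted content objects
for content in result.contents:
    print(content.markdown)
```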

## Prebuilt Analyzers

| Analyzer | Content Type | Purpose |
|----------|--------------|---------|
| `prebuilt-documentSearch` | Documents | Extract markdown for RAG applications |
| `prebuilt-imageSearch` | Images | Extract content from images |
| `prebuilt-audioSearch` | Audio | Transcribe audio with timing |
| `prebuilt-videoSearch` | Video | Extract frames, transcripts, summaries |
| `prebuilt-invoice` | Documents | Extract invoice fields |

## Analyze Document

```python
import os
from azure.ai.contentunderstanding import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import AnalyzeInput
from azure.identity import DefaultAzureCredential

endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
client = ContentUnderstandingClient(
    endpoint=endpoint,
    credential=DefaultAzureCredential()
)

# Analyze document from URL
poller = client.begin_analyze(
    analyzer_id="prebuilt-documentSearch",
    inputs=[AnalyzeInput(url="https://example.com/document.pdf")]
)

result = poller.result()

# Access markdown content (contents is a list)
content = result.contents[0]
print(content.markdown)
```

## Access Document Content Details

```python
from azure.ai.contentunderstanding.models import MediaContentKind, DocumentContent

content = result.contents[0]
if content.kind == MediaContentKind.DOCUMENT:
    document_content: DocumentContent = content  # type: ignore
    print(document_content.start_page_number)
```

## Analyze Image

```python
from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-imageSearch",
    inputs=[AnalyzeInput(url="https://example.com/image.jpg")]
)
result = poller.result()
content = result.contents[0]
print(content.markdown)
```

## Analyze Video

```python
from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-videoSearch",
    inputs=[AnalyzeInput(url="https://example.com/video.mp4")]
)

result = poller.result()

# Access video content (AudioVisualContent)
content = result.contents[0]

# Get transcript phrases with timing
for phrase in content.transcript_phrases:
    print(f"[{phrase.start_time} - {phrase.end_time}]: {phrase.text}")

# Get key frames (for video)
for frame in content.key_frames:
    print(f"Frame at {frame.time}: {frame.description}")
```

## Analyze Audio

```python
from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-audioSearch",
    inputs=[AnalyzeInput(url="https://example.com/audio.mp3")]
)

result = poller.result()

# Access audio transcript
content = result.contents[0]
for phrase in content.transcript_phrases:
    print(f"[{phrase.start_time}] {phrase.text}")
```

## Custom Analyzers

Create custom analyzers with field schemas for specialized extraction:

```python
# Create custom analyzer
analyzer = client.create_analyzer(
    analyzer_id="my-invoice-analyzer",
    analyzer={
        "description": "Custom invoice analyzer",
        "base_analyzer_id": "prebuilt-documentSearch",
        "field_schema": {
            "fields": {
                "vendor_name": {"type": "string"},
                "invoice_total": {"type": "number"},
                "line_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "amount": {"type": "number"}
                        }
                    }
                }
            }
        }
    }
)

# Use custom analyzer
from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="my-invoice-analyzer",
    inputs=[AnalyzeInput(url="https://example.com/invoice.pdf")]
)

result = poller.result()

# Access extracted fields (fields live on the returned content object)
content = result.contents[0]
print(content.fields["vendor_name"])
print(content.fields["invoice_total"])
```

## Analyzer Management

```python
# List all analyzers
analyzers = client.list_analyzers()
for analyzer in analyzers:
    print(f"{analyzer.analyzer_id}: {analyzer.description}")

# Get specific analyzer
analyzer = client.get_analyzer("prebuilt-documentSearch")

# Delete custom analyzer
client.delete_analyzer("my-custom-analyzer")
```

## Async Client

```python
import asyncio
import os
from azure.ai.contentunderstanding.aio import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import AnalyzeInput
from azure.identity.aio import DefaultAzureCredential

async def analyze_document():
    endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
    credential = DefaultAzureCredential()
    
    async with ContentUnderstandingClient(
        endpoint=endpoint,
        credential=credential
    ) as client:
        poller = await client.begin_analyze(
            analyzer_id="prebuilt-documentSearch",
            inputs=[AnalyzeInput(url="https://example.com/doc.pdf")]
        )
        result = await poller.result()
        content = result.contents[0]
        return content.markdown

asyncio.run(analyze_document())
```

## Content Types

| Class | For | Provides |
|-------|-----|----------|
| `DocumentContent` | PDF, images, Office docs | Pages, tables, figures, paragraphs |
| `AudioVisualContent` | Audio, video files | Transcript phrases, timing, key frames |

Both derive from `MediaContent`, which provides basic metadata and a markdown representation.
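
When a call can return mixed content kinds, a minimal sketch of dispatching on the returned objects; it assumes `result` comes from a prior `begin_analyze()` call and that the SDK deserializes each entry into one of these concrete subclasses:

```python
from azure.ai.contentunderstanding.models import (
    AudioVisualContent,
    DocumentContent,
    MediaContentKind,
)

for content in result.contents:
    # Every MediaContent exposes a markdown representation
    print(content.markdown)

    if content.kind == MediaContentKind.DOCUMENT:
        document: DocumentContent = content  # type: ignore
        print(f"First page: {document.start_page_number}")
    elif isinstance(content, AudioVisualContent):
        for phrase in content.transcript_phrases:
            print(f"[{phrase.start_time}] {phrase.text}")
```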

## Model Imports

```python
from azure.ai.contentunderstanding.models import (
    AnalyzeInput,
    AnalyzeResult,
    MediaContentKind,
    DocumentContent,
    AudioVisualContent,
)
```

## Client Types

| Client | Purpose |
|--------|---------|
| `ContentUnderstandingClient` | Sync client for all operations |
| `ContentUnderstandingClient` (aio) | Async client for all operations |

## Best Practices

1. **Use `begin_analyze` with `AnalyzeInput`** — this is the correct method signature
2. **Access results via `result.contents[0]`** — results are returned as a list
3. **Use prebuilt analyzers** for common scenarios (document/image/audio/video search)
4. **Create custom analyzers** only for domain-specific field extraction
5. **Use async client** for high-throughput scenarios with `azure.identity.aio` credentials
6. **Handle long-running operations** — video/audio analysis can take minutes (see the sketch after this list)
7. **Use URL sources** when possible to avoid upload overhead
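
For item 6, a hedged sketch of bounding the wait; it assumes the poller follows the standard `azure-core` long-running-operation pattern, where `result()` accepts a timeout in seconds and service failures surface as `HttpResponseError`:

```python
from azure.core.exceptions import HttpResponseError
from azure.ai.contentunderstanding.models import AnalyzeInput

poller = client.begin_analyze(
    analyzer_id="prebuilt-videoSearch",
    inputs=[AnalyzeInput(url="https://example.com/video.mp4")],
)

try:
    # Video/audio analysis can take minutes; cap the wait instead of blocking forever
    result = poller.result(timeout=600)
except HttpResponseError as error:
    print(f"Analysis failed: {error.message}")
else:
    print(result.contents[0].markdown)
```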

Overview

This skill covers the Azure AI Content Understanding SDK for Python, which enables multimodal content extraction from documents, images, audio, and video. The SDK exposes sync and async clients, prebuilt analyzers for common tasks, and custom analyzers that extract structured fields. Use it to produce transcripts, markdown-ready content, key frames, and extracted metadata for RAG and automated workflows.

How this skill works

The SDK starts long-running analysis operations with begin_analyze(), returning a poller you can wait on with .result() (or await in the async client). Results are returned as AnalyzeResult with a list of MediaContent-derived objects (DocumentContent or AudioVisualContent) that include markdown, transcripts, timing, pages, tables, and fields. Prebuilt analyzers handle common formats; custom analyzers accept a JSON field schema for domain-specific extraction.

When to use it

  • Extract searchable markdown and structured data from PDFs, Office docs, and scanned images for RAG pipelines.
  • Transcribe audio files and produce time-aligned phrases for subtitles, search, or speaker diarization.
  • Analyze videos to get transcripts, key frames, and timestamps for indexing and summarization.
  • Build automated extraction for invoices, receipts, or domain documents using custom analyzers.
  • Integrate async analysis in high-throughput services to avoid blocking on long-running jobs.

Best practices

  • Use begin_analyze with AnalyzeInput objects that reference stable URLs to avoid upload overhead.
  • Prefer prebuilt analyzers (documentSearch, imageSearch, audioSearch, videoSearch) for common tasks to get robust defaults quickly.
  • Use the async client (azure.ai.contentunderstanding.aio) with DefaultAzureCredential for scalable, non-blocking processing.
  • Handle long-running operations gracefully: poll or await poller.result() and implement retries/timeouts for network errors.
  • Create custom analyzers only when you need structured field extraction; keep field schemas minimal and test with sample documents.

Example use cases

  • Convert a corpus of PDFs into markdown for an LLM retrieval-augmented generation (RAG) index.
  • Transcribe podcast episodes with timecodes and extract highlights or chapter timestamps.
  • Scan invoices to auto-populate ERP fields like vendor_name, invoice_total, and line_items.
  • Extract frames and scene-level transcripts from training videos to create searchable learning modules.
  • Run image content extraction on product photos to produce alt-text and descriptive metadata.

FAQ

Do I need different clients for sync and async?

Yes. Use ContentUnderstandingClient for synchronous calls and azure.ai.contentunderstanding.aio.ContentUnderstandingClient for async workflows.

How do I handle long-running analyses?

Use the poller returned by begin_analyze() and call .result() (or await poller.result() in async). Implement timeouts and retries for stability.