home / skills / sickn33 / antigravity-awesome-skills / azure-ai-vision-imageanalysis-py

azure-ai-vision-imageanalysis-py skill

/skills/azure-ai-vision-imageanalysis-py

This skill helps you perform Azure AI Vision image analysis in Python, returning captions, tags, objects, OCR, and smart crops to enhance image understanding.

npx playbooks add skill sickn33/antigravity-awesome-skills --skill azure-ai-vision-imageanalysis-py

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
6.5 KB
---
name: azure-ai-vision-imageanalysis-py
description: |
  Azure AI Vision Image Analysis SDK for captions, tags, objects, OCR, people detection, and smart cropping. Use for computer vision and image understanding tasks.
  Triggers: "image analysis", "computer vision", "OCR", "object detection", "ImageAnalysisClient", "image caption".
package: azure-ai-vision-imageanalysis
---

# Azure AI Vision Image Analysis SDK for Python

Client library for Azure AI Vision 4.0 image analysis including captions, tags, objects, OCR, and more.

## Installation

```bash
pip install azure-ai-vision-imageanalysis
```

## Environment Variables

```bash
VISION_ENDPOINT=https://<resource>.cognitiveservices.azure.com
VISION_KEY=<your-api-key>  # If using API key
```

## Authentication

### API Key

```python
import os
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["VISION_ENDPOINT"]
key = os.environ["VISION_KEY"]

client = ImageAnalysisClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)
```

### Entra ID (Recommended)

```python
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.identity import DefaultAzureCredential

client = ImageAnalysisClient(
    endpoint=os.environ["VISION_ENDPOINT"],
    credential=DefaultAzureCredential()
)
```

## Analyze Image from URL

```python
from azure.ai.vision.imageanalysis.models import VisualFeatures

image_url = "https://example.com/image.jpg"

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[
        VisualFeatures.CAPTION,
        VisualFeatures.TAGS,
        VisualFeatures.OBJECTS,
        VisualFeatures.READ,
        VisualFeatures.PEOPLE,
        VisualFeatures.SMART_CROPS,
        VisualFeatures.DENSE_CAPTIONS
    ],
    gender_neutral_caption=True,
    language="en"
)
```

## Analyze Image from File

```python
with open("image.jpg", "rb") as f:
    image_data = f.read()

result = client.analyze(
    image_data=image_data,
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS]
)
```

## Image Caption

```python
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.CAPTION],
    gender_neutral_caption=True
)

if result.caption:
    print(f"Caption: {result.caption.text}")
    print(f"Confidence: {result.caption.confidence:.2f}")
```

## Dense Captions (Multiple Regions)

```python
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.DENSE_CAPTIONS]
)

if result.dense_captions:
    for caption in result.dense_captions.list:
        print(f"Caption: {caption.text}")
        print(f"  Confidence: {caption.confidence:.2f}")
        print(f"  Bounding box: {caption.bounding_box}")
```

## Tags

```python
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.TAGS]
)

if result.tags:
    for tag in result.tags.list:
        print(f"Tag: {tag.name} (confidence: {tag.confidence:.2f})")
```

## Object Detection

```python
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.OBJECTS]
)

if result.objects:
    for obj in result.objects.list:
        print(f"Object: {obj.tags[0].name}")
        print(f"  Confidence: {obj.tags[0].confidence:.2f}")
        box = obj.bounding_box
        print(f"  Bounding box: x={box.x}, y={box.y}, w={box.width}, h={box.height}")
```

## OCR (Text Extraction)

```python
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.READ]
)

if result.read:
    for block in result.read.blocks:
        for line in block.lines:
            print(f"Line: {line.text}")
            print(f"  Bounding polygon: {line.bounding_polygon}")
            
            # Word-level details
            for word in line.words:
                print(f"  Word: {word.text} (confidence: {word.confidence:.2f})")
```

## People Detection

```python
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.PEOPLE]
)

if result.people:
    for person in result.people.list:
        print(f"Person detected:")
        print(f"  Confidence: {person.confidence:.2f}")
        box = person.bounding_box
        print(f"  Bounding box: x={box.x}, y={box.y}, w={box.width}, h={box.height}")
```

## Smart Cropping

```python
result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.SMART_CROPS],
    smart_crops_aspect_ratios=[0.9, 1.33, 1.78]  # Portrait, 4:3, 16:9
)

if result.smart_crops:
    for crop in result.smart_crops.list:
        print(f"Aspect ratio: {crop.aspect_ratio}")
        box = crop.bounding_box
        print(f"  Crop region: x={box.x}, y={box.y}, w={box.width}, h={box.height}")
```

## Async Client

```python
from azure.ai.vision.imageanalysis.aio import ImageAnalysisClient
from azure.identity.aio import DefaultAzureCredential

async def analyze_image():
    async with ImageAnalysisClient(
        endpoint=endpoint,
        credential=DefaultAzureCredential()
    ) as client:
        result = await client.analyze_from_url(
            image_url=image_url,
            visual_features=[VisualFeatures.CAPTION]
        )
        print(result.caption.text)
```

## Visual Features

| Feature | Description |
|---------|-------------|
| `CAPTION` | Single sentence describing the image |
| `DENSE_CAPTIONS` | Captions for multiple regions |
| `TAGS` | Content tags (objects, scenes, actions) |
| `OBJECTS` | Object detection with bounding boxes |
| `READ` | OCR text extraction |
| `PEOPLE` | People detection with bounding boxes |
| `SMART_CROPS` | Suggested crop regions for thumbnails |

## Error Handling

```python
from azure.core.exceptions import HttpResponseError

try:
    result = client.analyze_from_url(
        image_url=image_url,
        visual_features=[VisualFeatures.CAPTION]
    )
except HttpResponseError as e:
    print(f"Status code: {e.status_code}")
    print(f"Reason: {e.reason}")
    print(f"Message: {e.error.message}")
```

## Image Requirements

- Formats: JPEG, PNG, GIF, BMP, WEBP, ICO, TIFF, MPO
- Max size: 20 MB
- Dimensions: 50x50 to 16000x16000 pixels

## Best Practices

1. **Select only needed features** to optimize latency and cost
2. **Use async client** for high-throughput scenarios
3. **Handle HttpResponseError** for invalid images or auth issues
4. **Enable gender_neutral_caption** for inclusive descriptions
5. **Specify language** for localized captions
6. **Use smart_crops_aspect_ratios** matching your thumbnail requirements
7. **Cache results** when analyzing the same image multiple times

Overview

This skill integrates the Azure AI Vision Image Analysis SDK for Python to generate captions, tags, object detections, OCR, people detection, and smart cropping suggestions. It provides a compact, developer-friendly interface for common computer vision tasks and supports both API key and Entra ID authentication. Use it to automate image understanding, accessibility, search indexing, and thumbnail generation workflows.

How this skill works

The skill creates an ImageAnalysisClient and calls analyze or analyze_from_url with a selected set of VisualFeatures (CAPTION, TAGS, OBJECTS, READ, PEOPLE, SMART_CROPS, DENSE_CAPTIONS). The client returns structured results: a primary caption with confidence, region-level dense captions, lists of tags, detected objects and people with bounding boxes, OCR blocks/lines/words with confidences, and recommended crop regions. Async client variants support high-throughput use cases.

When to use it

  • Generate accessible image captions for web or mobile apps
  • Extract text from images or documents using OCR
  • Detect and localize objects or people for moderation or analytics
  • Produce tags and dense captions for search and content indexing
  • Create smart crop coordinates for thumbnails and responsive design
  • Build pipelines that require both synchronous and async image analysis

Best practices

  • Request only the VisualFeatures you need to minimize latency and cost
  • Use the async ImageAnalysisClient for bulk or high-concurrency workloads
  • Handle HttpResponseError and inspect status codes for robust error handling
  • Enable gender_neutral_caption and set language for inclusive, localized captions
  • Provide smart_crops_aspect_ratios that match target thumbnail sizes
  • Cache analysis results for identical images to avoid repeated API calls

Example use cases

  • E-commerce product pages: auto-generate captions, tags, and thumbnail crops
  • Document processing: OCR text extraction and word-level confidence for indexing
  • Content moderation: detect people and objects, flag suspicious items
  • Media management: create dense captions and tags for search and editorial workflows
  • Mobile apps: on-device upload + server-side analysis to annotate photos automatically

FAQ

What authentication methods are supported?

Supports API key (AzureKeyCredential) and Entra ID via DefaultAzureCredential; Entra ID is recommended for production.

How do I reduce analysis cost and latency?

Select only necessary VisualFeatures, reuse or cache results, and use the async client for parallel processing.