
meta-video-ad-analyzer skill



This skill analyzes video ads by extracting frames, on-screen text (via OCR), audio transcripts, and scene descriptions, using Gemini Vision for the scene analysis.

npx playbooks add skill openclaw/skills --skill meta-video-ad-analyzer


---
name: video-ad-analyzer
version: 1.0.0
description: Extract and analyze content from video ads using Gemini Vision AI. Supports frame extraction, OCR text detection, audio transcription, and AI-powered scene analysis. Use when analyzing video creative content, extracting text overlays, or generating scene-by-scene descriptions.
---

# Video Ad Analyzer

AI-powered video content extraction using Google Gemini Vision.

## What This Skill Does

- **Frame Extraction**: Smart sampling with scene change detection
- **OCR Text Detection**: Extract text overlays using EasyOCR
- **Audio Transcription**: Convert speech to text with Google Cloud Speech
- **AI Scene Analysis**: Describe each scene using Gemini Vision
- **Native Video Analysis**: Direct video understanding for longer content
- **Thumbnail Generation**: Auto-generate thumbnails from the first frame

## Setup

### 1. Environment Variables

```bash
# Required for Gemini Vision (Vertex AI)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Audio transcription uses the same service account;
# it needs the Speech-to-Text API enabled.
```

### 2. Dependencies

```bash
pip install opencv-python pillow easyocr ffmpeg-python google-cloud-speech vertexai google-api-python-client
```

The `ffmpeg` and `ffprobe` binaries must also be installed on the system.

## Usage

### Basic Video Analysis

```python
from scripts.video_extractor import VideoExtractor
from scripts.models import ExtractedVideoContent
import vertexai
from vertexai.generative_models import GenerativeModel

# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")
gemini_model = GenerativeModel("gemini-1.5-flash")

# Create extractor
extractor = VideoExtractor(gemini_model=gemini_model)

# Analyze video
result = extractor.extract_content("/path/to/video.mp4")

print(f"Duration: {result.duration}s")
print(f"Scenes: {len(result.scene_timeline)}")
print(f"Text overlays: {len(result.text_timeline)}")
print(f"Transcript: {result.transcript[:200]}...")
```

### Extract Only Frames

```python
frames, timestamps, text_timeline, scene_timeline, thumbnail = extractor.extract_smart_frames(
    "/path/to/video.mp4",
    scene_interval=2,    # Check for scene changes every 2s
    text_interval=0.5    # Check for text every 0.5s
)
```

### Analyze Images

```python
# Works with images too
result = extractor.extract_content("/path/to/image.jpg")
print(result.scene_timeline[0]['description'])
```

## Output Structure

```python
ExtractedVideoContent(
    video_path="/path/to/video.mp4",
    duration=30.5,
    transcript="Here's what we found...",
    text_timeline=[
        {"at": 0.0, "text": ["Download Now"]},
        {"at": 5.5, "text": ["50% Off Today"]}
    ],
    scene_timeline=[
        {"timestamp": 0.0, "description": "Woman using phone app..."},
        {"timestamp": 2.0, "description": "Product showcase with features..."}
    ],
    thumbnail_url="/static/thumbnails/video_thumb.jpg",
    extraction_complete=True
)
```
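Because `text_timeline` is a list of plain dictionaries, questions such as "Extract the call-to-action from this ad" can be answered with ordinary Python. The sketch below is illustrative only: `find_ctas` and `CTA_PATTERN` are hypothetical names that are not part of the skill, and the keyword list is an assumption you would tune for your own campaigns.

```python
import re

# Hypothetical CTA keywords (an assumption, not shipped with the skill).
CTA_PATTERN = re.compile(
    r"\b(download|install|shop|buy|sign up|subscribe|learn more|get started|try)\b",
    re.IGNORECASE,
)


def find_ctas(text_timeline):
    """Return (timestamp, text) pairs whose on-screen text looks like a CTA."""
    hits = []
    for entry in text_timeline:
        for line in entry["text"]:
            if CTA_PATTERN.search(line):
                hits.append((entry["at"], line))
    return hits


# Sample data copied from the output structure shown above.
text_timeline = [
    {"at": 0.0, "text": ["Download Now"]},
    {"at": 5.5, "text": ["50% Off Today"]},
]

for timestamp, text in find_ctas(text_timeline):
    print(f"{timestamp:>5.1f}s  {text}")
```

A keyword match is a crude filter; for ads with unusual wording, the Gemini-generated scene descriptions may be a better signal.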

## Key Features

| Feature | Description |
|---------|-------------|
| Scene Detection | Histogram-based change detection (threshold=65) |
| OCR Confidence | Tiered thresholds (0.5 high, 0.3 low) |
| AI Proofreading | Gemini cleans up OCR errors |
| Source Reconciliation | Merges OCR + Vision text intelligently |
| Native Video | Direct Gemini analysis for <20MB files |
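To make the tiered OCR thresholds concrete, here is a minimal sketch of one way a two-tier filter could work. It assumes detections arrive as `(bbox, text, confidence)` tuples, which is the default shape returned by EasyOCR's `readtext`. The function name `tier_ocr_results` and the decision to route the lower tier to a later proofreading step are assumptions for illustration, not the skill's confirmed behavior.

```python
HIGH_CONF = 0.5  # from the table above
LOW_CONF = 0.3   # from the table above


def tier_ocr_results(detections, high=HIGH_CONF, low=LOW_CONF):
    """Split OCR detections into accepted and uncertain text.

    detections: iterable of (bbox, text, confidence) tuples.
    Text below `low` is discarded. Text between `low` and `high` is kept
    separately so a later step (for example AI proofreading) can vet it.
    """
    accepted, uncertain = [], []
    for _bbox, text, conf in detections:
        if conf >= high:
            accepted.append(text)
        elif conf >= low:
            uncertain.append(text)
    return accepted, uncertain


# Made-up detections for illustration; bounding boxes are omitted.
detections = [
    (None, "Download Now", 0.92),
    (None, "5O% Off", 0.41),   # plausible OCR slip: letter O instead of zero
    (None, "~~", 0.12),
]
accepted, uncertain = tier_ocr_results(detections)
print("accepted:", accepted)
print("uncertain:", uncertain)
```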

## Prompts

Customize AI behavior by editing prompts in the `prompts/` folder:

- `scene_analysis.md` - Frame analysis prompts
- `scene_reconciliation.md` - Scene enrichment prompts

## Common Questions This Answers

- "What text appears in this video ad?"
- "Describe each scene in this creative"
- "What does the narrator say?"
- "Extract the call-to-action from this ad"

Overview

This skill extracts and analyzes content from video ads using Gemini Vision and supporting tools. It performs smart frame sampling, OCR text detection, audio transcription, and AI-powered scene analysis to produce a structured, searchable representation of creative content. Use it to rapidly review ad assets, pull text overlays, and generate scene-by-scene descriptions.

How this skill works

The extractor samples frames with scene-change detection and optional interval tuning, runs EasyOCR on frames to capture text overlays, and transcribes audio via Google Cloud Speech. Gemini Vision enriches frames with natural language scene descriptions and reconciles OCR with vision outputs. Results are returned as a single ExtractedVideoContent object containing duration, transcript, text timeline, scene timeline, and a generated thumbnail.

When to use it

- Review and audit multiple video ad creatives quickly
- Extract on-screen copy, captions, and CTAs from ads
- Generate scene-by-scene descriptions for tagging or QA
- Create searchable metadata and transcripts for an ad library
- Produce thumbnails and highlights for content previews

Best practices

- Provide a service account with Speech-to-Text and Vertex AI enabled and set GOOGLE_APPLICATION_CREDENTIALS
- Tune scene_interval and text_interval to balance speed and fidelity (shorter intervals = more detail, higher processing cost)
- Pre-convert videos to common codecs/resolutions to avoid ffmpeg issues and speed processing
- Adjust OCR confidence thresholds for noisy visuals; use AI proofreading to correct OCR errors
- Limit native Gemini analysis to smaller files (<20 MB) for cost-effective direct video understanding
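The file-size guideline can be enforced with a small pre-check before choosing an analysis path. This is a sketch under stated assumptions: `use_native_analysis` and `choose_mode` are hypothetical helpers, the mode labels are invented for the example, and "20 MB" is read here as 20 MiB (the skill may use a different cutoff internally).

```python
import os

# Assumption: "20 MB" interpreted as 20 * 1024 * 1024 bytes.
NATIVE_VIDEO_MAX_BYTES = 20 * 1024 * 1024


def use_native_analysis(size_bytes, limit=NATIVE_VIDEO_MAX_BYTES):
    """Return True when a file is small enough for direct video analysis."""
    return size_bytes < limit


def choose_mode(video_path):
    """Pick an analysis mode label from the file size on disk."""
    size = os.path.getsize(video_path)
    return "native" if use_native_analysis(size) else "frames"
```

Larger files would then go through frame sampling, where `scene_interval` and `text_interval` control the cost.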

Example use cases

- Bulk-analyze a folder of ad variations to extract CTAs and top-line messaging
- Generate transcripts and time-indexed text overlays for compliance review
- Create scene-by-scene descriptions for storyboard comparison during creative iteration
- Auto-generate thumbnails and highlight frames for an ad library UI
- Feed structured outputs into A/B testing workflows to correlate creative elements with performance

FAQ

What outputs will I get from a single analysis?

You receive duration, transcript, a timeline of detected on-screen text, a scene timeline with descriptions, and a thumbnail URL packaged in an ExtractedVideoContent object.

How do I reduce processing time on long videos?

Increase scene_interval and text_interval to sample less frequently, preprocess and trim source clips, or analyze representative segments rather than full-length assets.
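As a rough worked example of how interval choice affects workload, the sketch below counts sample points for a given duration. It assumes one sample at time zero and one at every full interval after that. The extractor's actual sampling logic may differ, and `estimate_samples` is a hypothetical helper.

```python
import math


def estimate_samples(duration, interval):
    """Approximate number of sample points: t = 0, interval, 2*interval, ..."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return math.floor(duration / interval) + 1


duration = 30.5  # seconds, as in the example output
for scene_interval, text_interval in [(2, 0.5), (4, 1.0)]:
    scene_checks = estimate_samples(duration, scene_interval)
    text_checks = estimate_samples(duration, text_interval)
    print(
        f"scene_interval={scene_interval}, text_interval={text_interval}: "
        f"{scene_checks} scene checks, {text_checks} text checks"
    )
```

Under these assumptions, doubling both intervals roughly halves the number of frames that need scene comparison and OCR.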