home / skills / vm0-ai / vm0-skills / supadata
This skill helps you extract transcripts and scrape web content using the Supadata API via curl, delivering ready-to-use markdown and metadata.
npx playbooks add skill vm0-ai/vm0-skills --skill supadataReview the files below or copy the command above to add this skill to your agents.
---
name: supadata
description: Supadata API via curl. Use this skill to extract transcripts from YouTube/TikTok/Instagram videos and scrape web content to markdown.
vm0_secrets:
- SUPADATA_API_KEY
---
# Supadata API
Use the Supadata API via direct `curl` calls to **extract video transcripts** and **scrape web content** for AI applications.
> Official docs: `https://docs.supadata.ai/`
---
## When to Use
Use this skill when you need to:
- **Extract transcripts** from YouTube, TikTok, Instagram, X (Twitter), Facebook videos
- **Scrape web pages** to markdown format for AI processing
- **Get video/channel metadata** from social platforms
- **Crawl websites** to extract content from multiple pages
---
## Prerequisites
1. Sign up at [Supadata Dashboard](https://dash.supadata.ai/)
2. API key is automatically generated on signup (no credit card required)
3. Store your API key in environment variable
```bash
export SUPADATA_API_KEY="your-api-key"
```
### Pricing
- Transcript fetch (existing): 1 credit
- Transcript generation (AI): 2 credits/minute
- Free tier available
---
> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"' | jq .
> ```
## How to Use
All examples below assume you have `SUPADATA_API_KEY` set.
The base URL for the API is:
- `https://api.supadata.ai/v1`
Authentication uses the `x-api-key` header.
---
### 1. Get YouTube Video Transcript
Extract transcript from a YouTube video:
Write to `/tmp/supadata_url.txt`:
```
https://www.youtube.com/watch?v=dQw4w9WgXcQ
```
```bash
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"'
```
**Parameters:**
- `url`: Video URL (required)
- `text`: Return plain text (`true`) or timestamped chunks (`false`, default)
- `lang`: Preferred language (ISO 639-1 code, e.g., `en`, `zh`)
- `mode`: `native` (existing only), `generate` (AI), `auto` (default)
---
### 2. Get Transcript with Timestamps
Get transcript with timing information:
```bash
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=false"' | jq '.content[:3]'
```
Response format:
```json
{
"content": [
{"text": "Hello", "offset": 0, "duration": 1500, "lang": "en"}
],
"lang": "en",
"availableLangs": ["en", "es", "zh"]
}
```
---
### 3. Get TikTok/Instagram/X Transcript
Extract transcript from other platforms:
```bash
# TikTok
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"'
# Instagram Reel
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"'
```
Supported platforms: YouTube, TikTok, Instagram, X (Twitter), Facebook
---
### 4. Native Transcript Only (Save Credits)
Fetch only existing transcripts without AI generation:
```bash
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true" -d "mode=native"'
```
Use `mode=native` to avoid AI generation costs (1 credit vs 2 credits/min).
---
### 5. Get YouTube Channel Metadata
Get channel information:
```bash
bash -c 'curl -s "https://api.supadata.ai/v1/youtube/channel" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "id=@mkbhd"' | jq '{name, subscriberCount, videoCount}
```
Accepts channel URL, channel ID, or handle (e.g., `@mkbhd`).
---
### 6. Get YouTube Video Metadata
Get video information:
```bash
bash -c 'curl -s "https://api.supadata.ai/v1/youtube/video" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt"' | jq '{title, viewCount, likeCount, duration}
```
---
### 7. Get Social Media Metadata
Get metadata from any supported platform:
```bash
bash -c 'curl -s "https://api.supadata.ai/v1/metadata" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt"'
```
Works with YouTube, TikTok, Instagram, X, Facebook posts.
---
### 8. Scrape Web Page to Markdown
Extract web page content:
```bash
bash -c 'curl -s "https://api.supadata.ai/v1/web/scrape" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt"'
```
Returns page content in Markdown format, ideal for AI processing.
---
### 9. Map Website Links
Get all links from a website:
```bash
bash -c 'curl -s "https://api.supadata.ai/v1/web/map" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt"' | jq '.urls[:10]'
```
---
### 10. Crawl Website (Async)
Start a crawl job for multiple pages.
Write to `/tmp/supadata_request.json`:
```json
{
"url": "https://example.com",
"maxPages": 10
}
```
Then run:
```bash
# Start crawl
JOB_ID="$(bash -c 'curl -s "https://api.supadata.ai/v1/web/crawl" -X POST -H "x-api-key: ${SUPADATA_API_KEY}" -H "Content-Type: application/json" -d @/tmp/supadata_request.json' | jq -r '.jobId')"
echo "Job ID: ${JOB_ID}"
# Check status
bash -c 'curl -s "https://api.supadata.ai/v1/web/crawl/<your-job-id>" -H "x-api-key: ${SUPADATA_API_KEY}"' | jq '{status, pagesCompleted}'
```
Status values: `queued`, `active`, `completed`, `failed`
---
### 11. Translate Transcript
Translate a YouTube transcript to another language:
```bash
bash -c 'curl -s "https://api.supadata.ai/v1/youtube/transcript/translate" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "lang=zh" -d "text=true"'
```
---
## Response Handling
**Synchronous (HTTP 200):** Direct result returned.
**Asynchronous (HTTP 202):** Returns `jobId` for polling:
```json
{"jobId": "abc123"}
```
Poll the job endpoint until status is `completed`.
---
## Guidelines
1. **Use `mode=native` to save credits**: Only fetches existing transcripts
2. **URL encode parameters**: Use `--data-urlencode` for URLs
3. **Check available languages**: Response includes `availableLangs` array
4. **Handle async responses**: Some requests return job IDs for polling
5. **Max file size**: 1GB for direct file URLs
6. **Supported formats**: MP4, WEBM, MP3, FLAC, MPEG, M4A, OGG, WAV
This skill provides direct curl-based access to the Supadata API to extract video transcripts and scrape web content into markdown. It focuses on practical, scriptable commands for YouTube, TikTok, Instagram, X, and Facebook videos, plus site crawling and metadata extraction. Use it when you need reliable transcript text, timestamped segments, or markdown-ready page content for AI pipelines.
Calls to the Supadata REST endpoints use the x-api-key header and a simple curl pattern. You pass a target URL (or a JSON job payload for crawls) and choose modes such as native (existing transcript only) or generate (AI-generated), plus text=true/false for plain text vs timestamped chunks. Results are returned synchronously (200) or asynchronously (202 with jobId) for long-running crawls, and many endpoints return JSON ready for jq or further scripting.
How do I avoid extra credits when possible?
Use mode=native to fetch only existing transcripts; generate mode triggers AI transcription and costs more credits.
What response indicates an async job?
HTTP 202 with a JSON object containing jobId; poll the /web/crawl/<jobId> endpoint until status is completed.
Which formats and size limits are supported?
Supported file formats include MP4, WEBM, MP3, FLAC, MPEG, M4A, OGG, WAV and direct file URLs must be under 1GB.