home / skills / vm0-ai / vm0-skills / supadata

supadata skill

/supadata

This skill helps you extract transcripts and scrape web content using the Supadata API via curl, delivering ready-to-use markdown and metadata.

npx playbooks add skill vm0-ai/vm0-skills --skill supadata

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
6.4 KB
---
name: supadata
description: Supadata API via curl. Use this skill to extract transcripts from YouTube/TikTok/Instagram videos and scrape web content to markdown.
vm0_secrets:
  - SUPADATA_API_KEY
---

# Supadata API

Use the Supadata API via direct `curl` calls to **extract video transcripts** and **scrape web content** for AI applications.

> Official docs: `https://docs.supadata.ai/`

---

## When to Use

Use this skill when you need to:

- **Extract transcripts** from YouTube, TikTok, Instagram, X (Twitter), Facebook videos
- **Scrape web pages** to markdown format for AI processing
- **Get video/channel metadata** from social platforms
- **Crawl websites** to extract content from multiple pages

---

## Prerequisites

1. Sign up at [Supadata Dashboard](https://dash.supadata.ai/)
2. API key is automatically generated on signup (no credit card required)
3. Store your API key in environment variable

```bash
export SUPADATA_API_KEY="your-api-key"
```

### Pricing

- Transcript fetch (existing): 1 credit
- Transcript generation (AI): 2 credits/minute
- Free tier available

---


> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"' | jq .
> ```

## How to Use

All examples below assume you have `SUPADATA_API_KEY` set.

The base URL for the API is:

- `https://api.supadata.ai/v1`

Authentication uses the `x-api-key` header.

---

### 1. Get YouTube Video Transcript

Extract transcript from a YouTube video:

Write to `/tmp/supadata_url.txt`:

```
https://www.youtube.com/watch?v=dQw4w9WgXcQ
```

```bash
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"'
```

**Parameters:**

- `url`: Video URL (required)
- `text`: Return plain text (`true`) or timestamped chunks (`false`, default)
- `lang`: Preferred language (ISO 639-1 code, e.g., `en`, `zh`)
- `mode`: `native` (existing only), `generate` (AI), `auto` (default)

---

### 2. Get Transcript with Timestamps

Get transcript with timing information:

```bash
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=false"' | jq '.content[:3]'
```

Response format:
```json
{
  "content": [
  {"text": "Hello", "offset": 0, "duration": 1500, "lang": "en"}
  ],
  "lang": "en",
  "availableLangs": ["en", "es", "zh"]
}
```

---

### 3. Get TikTok/Instagram/X Transcript

Extract transcript from other platforms:

```bash
# TikTok
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"'

# Instagram Reel
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true"'
```

Supported platforms: YouTube, TikTok, Instagram, X (Twitter), Facebook

---

### 4. Native Transcript Only (Save Credits)

Fetch only existing transcripts without AI generation:

```bash
bash -c 'curl -s "https://api.supadata.ai/v1/transcript" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "text=true" -d "mode=native"'
```

Use `mode=native` to avoid AI generation costs (1 credit vs 2 credits/min).

---

### 5. Get YouTube Channel Metadata

Get channel information:

```bash
bash -c 'curl -s "https://api.supadata.ai/v1/youtube/channel" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "id=@mkbhd"' | jq '{name, subscriberCount, videoCount}
```

Accepts channel URL, channel ID, or handle (e.g., `@mkbhd`).

---

### 6. Get YouTube Video Metadata

Get video information:

```bash
bash -c 'curl -s "https://api.supadata.ai/v1/youtube/video" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt"' | jq '{title, viewCount, likeCount, duration}
```

---

### 7. Get Social Media Metadata

Get metadata from any supported platform:

```bash
bash -c 'curl -s "https://api.supadata.ai/v1/metadata" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt"'
```

Works with YouTube, TikTok, Instagram, X, Facebook posts.

---

### 8. Scrape Web Page to Markdown

Extract web page content:

```bash
bash -c 'curl -s "https://api.supadata.ai/v1/web/scrape" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt"'
```

Returns page content in Markdown format, ideal for AI processing.

---

### 9. Map Website Links

Get all links from a website:

```bash
bash -c 'curl -s "https://api.supadata.ai/v1/web/map" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt"' | jq '.urls[:10]'
```

---

### 10. Crawl Website (Async)

Start a crawl job for multiple pages.

Write to `/tmp/supadata_request.json`:

```json
{
  "url": "https://example.com",
  "maxPages": 10
}
```

Then run:

```bash
# Start crawl
JOB_ID="$(bash -c 'curl -s "https://api.supadata.ai/v1/web/crawl" -X POST -H "x-api-key: ${SUPADATA_API_KEY}" -H "Content-Type: application/json" -d @/tmp/supadata_request.json' | jq -r '.jobId')"

echo "Job ID: ${JOB_ID}"

# Check status
bash -c 'curl -s "https://api.supadata.ai/v1/web/crawl/<your-job-id>" -H "x-api-key: ${SUPADATA_API_KEY}"' | jq '{status, pagesCompleted}'
```

Status values: `queued`, `active`, `completed`, `failed`

---

### 11. Translate Transcript

Translate a YouTube transcript to another language:

```bash
bash -c 'curl -s "https://api.supadata.ai/v1/youtube/transcript/translate" -H "x-api-key: ${SUPADATA_API_KEY}" -G --data-urlencode "url@/tmp/supadata_url.txt" -d "lang=zh" -d "text=true"'
```

---

## Response Handling

**Synchronous (HTTP 200):** Direct result returned.

**Asynchronous (HTTP 202):** Returns `jobId` for polling:
```json
{"jobId": "abc123"}
```

Poll the job endpoint until status is `completed`.

---

## Guidelines

1. **Use `mode=native` to save credits**: Only fetches existing transcripts
2. **URL encode parameters**: Use `--data-urlencode` for URLs
3. **Check available languages**: Response includes `availableLangs` array
4. **Handle async responses**: Some requests return job IDs for polling
5. **Max file size**: 1GB for direct file URLs
6. **Supported formats**: MP4, WEBM, MP3, FLAC, MPEG, M4A, OGG, WAV

Overview

This skill provides direct curl-based access to the Supadata API to extract video transcripts and scrape web content into markdown. It focuses on practical, scriptable commands for YouTube, TikTok, Instagram, X, and Facebook videos, plus site crawling and metadata extraction. Use it when you need reliable transcript text, timestamped segments, or markdown-ready page content for AI pipelines.

How this skill works

Calls to the Supadata REST endpoints use the x-api-key header and a simple curl pattern. You pass a target URL (or a JSON job payload for crawls) and choose modes such as native (existing transcript only) or generate (AI-generated), plus text=true/false for plain text vs timestamped chunks. Results are returned synchronously (200) or asynchronously (202 with jobId) for long-running crawls, and many endpoints return JSON ready for jq or further scripting.

When to use it

  • You need a plain-text or timestamped transcript from a YouTube, TikTok, Instagram, X, or Facebook video.
  • You want to scrape a web page and receive a markdown-formatted version for downstream AI processing.
  • You require video or channel metadata (views, likes, duration, subscriber count) for analytics.
  • You need to crawl multiple pages and aggregate content asynchronously for indexing or training data.
  • You want to translate existing transcripts into another language for localization.

Best practices

  • Set SUPADATA_API_KEY as an environment variable and use bash -c '...' when piping to avoid environment clearing bugs.
  • Use mode=native to save credits when existing transcripts are available; use generate only when necessary.
  • Always URL-encode URLs with --data-urlencode to avoid malformed requests.
  • Handle async responses by polling the job endpoint until status is completed.
  • Check availableLangs in the response before requesting translations or a specific language.

Example use cases

  • Fetch a YouTube video's plain transcript, save as text, and feed into an LLM summarization pipeline.
  • Scrape a product documentation page to markdown for ingestion into a vector database.
  • Start a site crawl to collect blog posts, then poll the job until completed and export results.
  • Extract timestamped segments from a tutorial video to build chapter markers or highlight reels.
  • Translate a popular interview transcript into Chinese for localized publishing.

FAQ

How do I avoid extra credits when possible?

Use mode=native to fetch only existing transcripts; generate mode triggers AI transcription and costs more credits.

What response indicates an async job?

HTTP 202 with a JSON object containing jobId; poll the /web/crawl/<jobId> endpoint until status is completed.

Which formats and size limits are supported?

Supported file formats include MP4, WEBM, MP3, FLAC, MPEG, M4A, OGG, WAV and direct file URLs must be under 1GB.