assets skill

This skill scrapes a web page using a headless browser and returns the main content as Markdown, enabling quick analysis or summarization.

npx playbooks add skill hokupod/sitepanda --skill assets

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

1.3 KB

---
name: sitepanda
description: >
  Scrape websites with a headless browser and extract main readable content as Markdown.
  Use this skill when the user asks to retrieve, analyze, or summarize content from a URL or website.
---

# Sitepanda (Web Scraping Tool)

## Instructions

1. When the user provides a URL or asks for website content, use Sitepanda to scrape the page.
2. By default, use the following command to scrape a single page:

   sitepanda scrape <URL> --silent --limit 1

3. If you need to perform recursive scraping (following links), you **must** ask the user for confirmation before starting, as it may take a long time.
4. Capture the output, which is returned in Markdown format.
5. Read and analyze the extracted content.
6. Respond to the user using only the relevant information from the page.
7. If the content is long, summarize or extract only the necessary sections.
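The single-page workflow above can be sketched in Python as follows. This is a minimal illustration, not part of Sitepanda itself: it assumes the `sitepanda` binary is on `PATH` and that the Markdown arrives on standard output. The function names and the 120-second timeout are hypothetical choices made for the example.

```python
import subprocess


def build_scrape_command(url: str) -> list[str]:
    """Return the documented single-page command as an argument list."""
    # Reject anything that is not an absolute http(s) URL before invoking the tool.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got: {url!r}")
    # Only the flags shown in the instructions are used here.
    return ["sitepanda", "scrape", url, "--silent", "--limit", "1"]


def scrape_single_page(url: str, timeout_s: float = 120.0) -> str:
    """Run Sitepanda once and return whatever it wrote to standard output.

    Assumption: the extracted Markdown is printed to stdout.
    """
    result = subprocess.run(
        build_scrape_command(url),
        capture_output=True,  # capture stdout and stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout_s,    # hypothetical safety limit for slow pages
    )
    return result.stdout
```

Passing the command as a list rather than a shell string means the URL is never interpreted by a shell, so query strings containing `&` or `;` are handled safely.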

## Examples

### Example 1

**User request:**
"Please summarize the article at https://example.com/blog/post-123"

**Agent behavior:**
- Use Sitepanda to scrape the page
- Read the extracted Markdown
- Summarize the main points in the response

### Example 2

**User request:**
"What does this documentation page say? https://example.com/docs"

**Agent behavior:**
- Fetch the page using Sitepanda
- Extract key sections
- Explain the content concisely

Overview

This skill uses a headless browser to scrape web pages and extract the main readable content, returning it as Markdown. It is designed for retrieving, analyzing, and summarizing page content, so a URL can be turned into concise, usable text quickly.

How this skill works

When given a URL, the skill loads the page in a headless browser, runs readability extraction to find the main article or content block, and outputs the result as Markdown. By default it scrapes a single page; recursive link-following happens only after explicit user confirmation. The extracted Markdown is then analyzed and condensed into a direct response.
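The confirmation rule described above can be expressed as a small decision function. This is a sketch only: `ScrapeRequest`, `next_action`, and the action names are hypothetical and do not correspond to anything in Sitepanda. It shows one way an agent might decide what to do before running any command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeRequest:
    """Hypothetical description of what the user asked for."""
    url: str
    recursive: bool = False       # True if the user wants links followed
    user_confirmed: bool = False  # True once the user has explicitly agreed


def next_action(request: ScrapeRequest) -> str:
    """Decide the next step, never starting a recursive crawl unconfirmed."""
    if not request.recursive:
        return "scrape_single_page"
    if request.user_confirmed:
        return "scrape_recursively"
    # Recursive scraping was requested but not yet confirmed: ask first.
    return "ask_for_confirmation"
```

Keeping the decision separate from the code that runs the scraper makes the "ask first" rule easy to check in isolation.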

When to use it

- Summarize an article, blog post, or news item from a URL
- Extract readable content for offline review or note-taking
- Analyze documentation pages or tutorials hosted on a website
- Convert web content to Markdown for publishing or editing
- Gather a page’s main content before running further NLP tasks

Best practices

- Provide the exact URL you want scraped to avoid ambiguity
- Request recursive scraping only when you want links followed, and expect to be asked for confirmation first
- Ask for specific sections if you don’t need the full page
- Expect the output in Markdown and request further summarization if needed
- Avoid scraping pages behind logins or paywalls unless you have proper access
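When only one section of a page is needed, the returned Markdown can be narrowed down after scraping. The helper below is a rough sketch with a hypothetical name, `extract_section`. It assumes the Markdown uses `#`-style headings, and it does not skip fenced code blocks, so a `#` comment inside a code block would be mistaken for a heading.

```python
import re

# Matches ATX headings such as "## Install" (one to six leading '#').
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def extract_section(markdown: str, title: str) -> str:
    """Return the section headed by `title` (case-insensitive), or "" if absent.

    The section runs from its heading line up to, but not including, the next
    heading of the same or a higher level, so nested subsections are kept.
    """
    lines = markdown.splitlines()
    wanted = title.strip().lower()
    start = None
    level = 0
    for i, line in enumerate(lines):
        match = HEADING.match(line)
        if not match:
            continue
        if start is None:
            if match.group(2).strip().lower() == wanted:
                start, level = i, len(match.group(1))
        elif len(match.group(1)) <= level:
            # A sibling or parent heading ends the section.
            return "\n".join(lines[start:i]).strip()
    if start is None:
        return ""
    return "\n".join(lines[start:]).strip()
```

For example, calling `extract_section(page_markdown, "Installation")` on a scraped docs page would keep the "Installation" heading and its subsections and drop the rest.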

Example use cases

- User asks: “Summarize the article at https://example.com/blog/post-123”. The skill scrapes the page and returns a concise summary of the main points.
- User asks: “Extract the main docs page at https://example.com/docs”. The skill retrieves the key sections and explains them plainly.
- User needs a Markdown copy of a news story for editing. The skill provides cleaned, readable Markdown.
- User requests key takeaways from a long feature article. The skill extracts and lists the core arguments and facts.

FAQ

Can the skill follow links and scrape an entire site?

Yes, but the skill asks for your explicit confirmation before scraping recursively, because following links across many pages can take a long time.

What format is the extracted content returned in?

Content is returned in Markdown by default. You can ask for summaries, plain text, or specific sections instead.