home / skills / yonatangross / orchestkit / web-research-workflow

web-research-workflow skill

safe

/plugins/ork/skills/web-research-workflow

This skill automates web research by selecting WebFetch, Tavily, or agent-browser based on site traits, enabling efficient extraction and competitive

npx playbooks add skill yonatangross/orchestkit --skill web-research-workflow

Review the files below or copy the command above to add this skill to your agents.

Files (8)

SKILL.md

4.4 KB

---
name: web-research-workflow
license: MIT
compatibility: "Claude Code 2.1.34+. Requires network access."
description: Unified decision tree for web research and competitive monitoring. Auto-selects WebFetch, Tavily, or agent-browser based on target site characteristics and available API keys. Includes competitor page tracking, snapshot diffing, and change alerting. Use when researching web content, scraping, extracting raw markdown, capturing documentation, or monitoring competitor changes.
context: fork
agent: web-research-analyst
version: 1.3.0
author: OrchestKit AI Agent Hub
tags: [research, browser, webfetch, tavily, automation, scraping, content-extraction, competitive-intelligence, monitoring]
user-invocable: false
allowed-tools: [Bash, Read, Write, WebFetch]
complexity: low
metadata:
  category: mcp-enhancement
---

# Web Research Workflow

Unified approach for web content research that automatically selects the right tool for each situation.

## Quick Decision Tree

```
URL to research
     │
     ▼
┌─────────────────┐
│ 1. Try WebFetch │ ← Fast, free, no overhead
│    (always try) │
└─────────────────┘
     │
Content OK? ──Yes──► Parse and return
     │
     No (empty/partial/<500 chars)
     │
     ▼
┌───────────────────────┐
│ 2. TAVILY_API_KEY set?│
└───────────────────────┘
     │          │
    Yes         No ──► Skip to step 3
     │
     ▼
┌───────────────────────────┐
│ Tavily search/extract/    │ ← Raw markdown, batch URLs
│ crawl/research            │
└───────────────────────────┘
     │
Content OK? ──Yes──► Parse and return
     │
     No (JS-rendered/auth-required)
     │
     ▼
┌─────────────────────┐
│ 3. Use agent-browser │ ← Full browser, last resort
└─────────────────────┘
     │
├─ SPA (react/vue/angular) ──► wait --load networkidle
├─ Login required ──► auth flow + state save
├─ Dynamic content ──► wait --text "Expected"
└─ Multi-page ──► crawl pattern
```

## Tavily Enhanced Research

When `TAVILY_API_KEY` is set, Tavily provides a powerful middle tier between WebFetch and agent-browser. It returns raw markdown content, supports batch URL extraction, and offers semantic search with relevance scoring. If `TAVILY_API_KEY` is not set, the 3-tier tree collapses to 2-tier (WebFetch → agent-browser) automatically.

See [Tool Selection](rules/tool-selection.md) for when-to-use-what tables, escalation heuristics, SPA detection patterns, and cost awareness.

See [Tavily API Reference](references/tavily-api.md) for Search, Extract, Map, Crawl, and Research endpoint examples and options.

## Browser Patterns

For content requiring JavaScript rendering, authentication, or multi-page crawling, fall back to agent-browser.

See [Browser Patterns](rules/browser-patterns.md) for auto-fallback, authentication flow, multi-page research patterns, best practices, and troubleshooting.

## Competitive Monitoring

Track competitor websites for changes in pricing, features, positioning, and content.

See [Competitor Page Monitoring](rules/monitoring-competitor.md) for snapshot capture, structured data extraction, and change classification.

See [Change Detection & Discovery](rules/monitoring-change-detection.md) for diff detection, structured comparison, Tavily site discovery, and CI automation.

### Change Classification

| Severity | Examples | Action |
|----------|----------|--------|
| Critical | Price increase/decrease, major feature change | Immediate alert |
| High | New feature added, feature removed | Review required |
| Medium | Copy changes, positioning shift | Note for analysis |
| Low | Typos, minor styling | Log only |

## Integration with Agents

This skill is used by:
- `web-research-analyst` - Primary user
- `market-intelligence` - Competitor research
- `product-strategist` - Deep competitive analysis
- `ux-researcher` - Design system capture
- `documentation-specialist` - API doc extraction

## Related Skills

- `browser-content-capture` - Detailed browser patterns
- `agent-browser` - CLI reference

---

**Version:** 1.3.0 (February 2026)

Overview

This skill implements a unified decision tree for web research and competitive monitoring that automatically picks the right tool for each target URL. It prioritizes a fast, low-cost fetch, escalates to a middle-tier extraction service when an API key is available, and falls back to a full browser agent when JavaScript, authentication, or multi-page crawling is required. It also includes competitor page tracking, snapshot diffing, and change alerting for production monitoring.

How this skill works

Given a URL, the workflow first attempts a lightweight WebFetch to retrieve raw content. If the fetch is incomplete or under a content threshold and a Tavily API key is present, it routes to Tavily for raw markdown extraction, batch processing, and semantic search. When content requires JS rendering, login, or complex crawling, it uses a full agent-browser session with patterns for SPAs, auth flows, network idle waits, and multi-page crawls. Change detection captures snapshots, computes diffs, classifies severity, and emits alerts based on configured rules.

When to use it

Quickly research web pages and extract raw markdown or structured text.
Batch extract content or run semantic search when Tavily API key is available.
Monitor competitor pages for pricing, feature, or content changes over time.
Capture documentation, changelogs, or product pages that may require JS rendering.
Escalate automatically from fast fetch to browser-only when content is dynamic or behind auth.

Best practices

Always attempt WebFetch first to save time and cost; use Tavily for richer extraction when available.
Keep expected-content heuristics (min length, selectors, text tokens) tuned to reduce unnecessary browser sessions.
Configure authentication state saving for repeatable, reliable agent-browser runs on gated sites.
Define change severity rules (critical/high/medium/low) aligned with business impact for actionable alerts.
Use batch extraction and semantic relevance scoring for large-scale site discovery and competitor snapshots.

Example use cases

Extract API docs or developer guides into raw markdown for RAG pipelines.
Continuously monitor competitor pricing pages and trigger immediate alerts for critical changes.
Crawl single-page applications to capture rendered content and UX copy for design audits.
Run scheduled batch extractions of industry sites using Tavily for relevance-ranked results.
Detect and classify content diffs across product pages as part of CI or market intelligence workflows.

FAQ

What happens if Tavily API key is not set?

The workflow collapses to WebFetch first and agent-browser as the fallback; batch markdown and Tavily-specific features are skipped.

How are change severities determined?

Diffs are classified by predefined rules (e.g., price changes = critical, new features = high, copy tweaks = medium) that can be tuned for your monitoring needs.