---
name: firecrawl-research
description: This skill should be used when the user asks to research topics using FireCrawl, enrich notes with web sources, search and scrape information, or write scientific/academic papers. It extracts research topics from markdown files, creates research documents with scraped sources, generates BibTeX bibliographies from research results, and provides Pandoc/MyST templates for academic writing with citation management.
---

# FireCrawl Research

## Overview

Enrich research documents by automatically searching and scraping web sources using the FireCrawl API. Extract research topics from markdown files and generate comprehensive research documents with source material.

## When to Use This Skill

Use this skill when the user:
- Says "Research this topic using FireCrawl"
- Requests to enrich notes or documents with web sources
- Wants to gather information about topics listed in a markdown file
- Needs to search and scrape multiple topics systematically

## How It Works

### 1. Topic Extraction

The script automatically extracts research topics from markdown files using two methods:

**Method 1: Headers**
```markdown
## Spatial Reasoning in AI
### Computer Vision Applications
```
Both `Spatial Reasoning in AI` and `Computer Vision Applications` become research topics.

**Method 2: Research Tags**
```markdown
- [research] Large Language Models for robotics
- [search] Theory of Mind in autonomous driving
```
Both tagged items become research topics.
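
The two extraction methods can be sketched with a pair of regular expressions. This is a minimal illustration, not the script's actual code: the function name `extract_topics` and both patterns are assumptions, and restricting headers to `##` and `###` is inferred from the examples in this document.

```python
import re

# Assumed patterns: level-2/3 headers and "- [research]" / "- [search]" list items.
HEADER_RE = re.compile(r"^#{2,3}\s+(.+?)\s*$")
TAG_RE = re.compile(r"^\s*[-*]\s+\[(?:research|search)\]\s+(.+?)\s*$", re.IGNORECASE)


def extract_topics(markdown_text: str) -> list[str]:
    """Return topics in document order, skipping exact duplicates."""
    topics: list[str] = []
    for line in markdown_text.splitlines():
        match = HEADER_RE.match(line) or TAG_RE.match(line)
        if match and match.group(1) not in topics:
            topics.append(match.group(1))
    return topics


sample = """# AI Research Topics

## Spatial Reasoning in Vision-Language Models

- [research] Embodied AI for robotics
- [research] Computer Use Agents
"""
print(extract_topics(sample))
```

The sketch does not skip headers that appear inside fenced code blocks, which a real implementation may need to handle.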

### 2. Search and Scrape

For each topic:
1. Searches FireCrawl with the topic as query
2. Retrieves up to N results (default: 5)
3. Automatically scrapes full content from each result
4. Extracts markdown-formatted content (main content only)

### 3. Output Generation

Creates new markdown files in the specified output directory:
- One file per topic
- Filename: `{topic}_{timestamp}.md`
- Contains: title, date, source count, and scraped content (truncated to 3000 characters per source; see Tips)
- Each source includes: title, URL, markdown content
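
As a rough sketch of the naming scheme and file layout described above: the helper names, the character-sanitizing rule, and the exact heading layout are assumptions, while the `%Y%m%d_%H%M%S` timestamp format is inferred from the filenames shown in the workflow example.

```python
import re
from datetime import datetime


def output_filename(topic: str, now: datetime) -> str:
    """Build `{topic}_{timestamp}.md` from a topic string."""
    # Assumption: keep letters, digits, and hyphens; collapse everything else to "_".
    safe_topic = re.sub(r"[^A-Za-z0-9-]+", "_", topic).strip("_")
    return f"{safe_topic}_{now.strftime('%Y%m%d_%H%M%S')}.md"


def render_document(topic: str, sources: list[dict], now: datetime) -> str:
    """Render title, date, source count, then one section per source."""
    lines = [
        f"# {topic}",
        "",
        f"**Date:** {now.strftime('%Y-%m-%d %H:%M:%S')}",
        f"**Sources:** {len(sources)}",
        "",
    ]
    for index, source in enumerate(sources, start=1):
        lines += [
            f"## {index}. {source['title']}",
            f"**URL:** {source['url']}",
            "",
            source["markdown"],
            "",
        ]
    return "\n".join(lines)
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the helpers, keeps the filename and the date in the file body consistent with each other.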

## Usage

### Basic Usage

```bash
python scripts/firecrawl_research.py research.md
```

Writes output files to the current directory.

### Specify Output Directory

```bash
python scripts/firecrawl_research.py research.md ./output
```

Creates files in `./output/` folder.

### Limit Results Per Topic

```bash
python scripts/firecrawl_research.py research.md ./output 3
```

Retrieves at most 3 results per topic.

## Configuration

### API Key Setup

1. Copy `.env.example` to `.env`:
   ```bash
   cp .env.example .env
   ```

2. Add your FireCrawl API key to `.env`:
   ```
   FIRECRAWL_API_KEY=fc-your-actual-api-key
   ```

The script automatically loads the API key from the skill's `.env` file.

### Rate Limiting

The script includes automatic rate limiting for FireCrawl's free tier:
- **Free tier limit:** 5 requests/minute
- **Built-in delay:** 12 seconds between topics
- Helps avoid rate-limit errors and unexpectedly fast credit use

When processing multiple topics, expect roughly one 12-second delay per topic, not counting request time:
- 5 topics: ~1 minute
- 10 topics: ~2 minutes
- 20 topics: ~4 minutes
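
The numbers above follow from simple arithmetic: 60 seconds divided by 5 requests gives a 12-second spacing, and one delay per topic gives the run-time estimates. A small sketch, with hypothetical helper names and an injectable `sleep` so the loop can be exercised without waiting:

```python
import time
from typing import Callable, Iterable

REQUESTS_PER_MINUTE = 5  # free-tier limit stated above


def delay_seconds(requests_per_minute: int = REQUESTS_PER_MINUTE) -> float:
    """Spacing that keeps one request per topic under the per-minute limit."""
    return 60 / requests_per_minute


def estimated_minutes(topic_count: int) -> float:
    """Rough run time: one delay per topic, ignoring request latency."""
    return topic_count * delay_seconds() / 60


def process_with_delay(
    topics: Iterable[str],
    handle: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call `handle` for each topic, pausing between consecutive topics."""
    for index, topic in enumerate(topics):
        if index > 0:
            sleep(delay_seconds())
        handle(topic)
```

Here `estimated_minutes(5)`, `estimated_minutes(10)`, and `estimated_minutes(20)` evaluate to 1.0, 2.0, and 4.0, matching the list above.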

## Workflow Example

**User request:** "Research these AI topics using FireCrawl"

**Input file (`ai-research.md`):**
```markdown
# AI Research Topics

## Spatial Reasoning in Vision-Language Models

- [research] Embodied AI for robotics
- [research] Computer Use Agents
```

**Command:**
```bash
python scripts/firecrawl_research.py ai-research.md ./research_output 5
```

**Output:**
```
research_output/
├── Spatial_Reasoning_in_Vision-Language_Models_20251122_140530.md
├── Embodied_AI_for_robotics_20251122_140542.md
└── Computer_Use_Agents_20251122_140554.md
```

Each file contains:
- Topic title
- Timestamp
- Source count
- Scraped content from up to 5 sources
- Source URLs

## Common Patterns

### Pattern 1: Quick Research
Extract topics from existing notes, research them, and save the results to the current folder:
```bash
python scripts/firecrawl_research.py my-notes.md
```

### Pattern 2: Organized Research
Create dedicated output folder for research results:
```bash
python scripts/firecrawl_research.py topics.md ./research_results
```

### Pattern 3: Deep Dive
Increase results per topic for comprehensive coverage:
```bash
python scripts/firecrawl_research.py topics.md ./deep_research 10
```

### Pattern 4: Obsidian Vault Integration
Send output directly to the vault's research folder:
```bash
python scripts/firecrawl_research.py topics.md ~/Brains/brain/Research
```

## Error Handling

### "API key not found"
Create a `.env` file in the skill folder containing `FIRECRAWL_API_KEY=...`

### "Rate limit exceeded"
- Free tier: 5 req/min
- Script has 12s delay built-in
- If you still hit the limit, reduce the number of topics or wait between runs

### "Insufficient credits"
- Check FireCrawl account credits
- Upgrade plan or wait for credit reset

### "No topics found"
Add topics to the markdown file using one of these formats:
- `## Header format`
- `- [research] Topic format`
- `- [search] Topic format`

## Script Details

**Location:** `scripts/firecrawl_research.py`

**Dependencies:**
- `python-dotenv` - Environment variable management
- `requests` - HTTP requests to FireCrawl API

**Install dependencies:**
```bash
pip install python-dotenv requests
```

**FireCrawl Features Used:**
- `/v1/search` endpoint - Search with automatic scraping
- `scrapeOptions.formats: ['markdown']` - Markdown output
- `scrapeOptions.onlyMainContent: true` - Filter noise
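
For orientation, a request using the features listed above might be assembled as follows. This is a sketch under assumptions: it uses the standard library's `urllib` instead of `requests` to stay self-contained, the function name is hypothetical, and the base URL `https://api.firecrawl.dev` and the `query`/`limit` field names are my understanding of the public FireCrawl v1 API rather than something this document states.

```python
import json
import urllib.request

SEARCH_URL = "https://api.firecrawl.dev/v1/search"  # assumed base URL


def build_search_request(topic: str, api_key: str, limit: int = 5) -> urllib.request.Request:
    """Prepare (but do not send) a search request that also scrapes each result."""
    payload = {
        "query": topic,
        "limit": limit,
        "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the prepared request with `urllib.request.urlopen(request)` performs the actual call and consumes API credits, so the sketch stops at building it.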

## Academic Writing Templates

This skill includes templates for writing scientific papers in markdown format.

### Available Templates

**1. Pandoc Scholarly Paper** (`assets/templates/pandoc-scholarly-paper.md`)
- Standard academic paper format
- Compatible with Pandoc converter
- Supports citations via BibTeX
- Exports to PDF, DOCX, HTML

**2. MyST Scientific Paper** (`assets/templates/myst-scientific-paper.md`)
- MyST (Markedly Structured Text) format
- Advanced cross-referencing
- Professional scientific publishing
- Multi-format export (PDF, LaTeX, DOCX)

### Using Templates

**Copy template to your project:**
```bash
cp assets/templates/pandoc-scholarly-paper.md my-paper.md
# or
cp assets/templates/myst-scientific-paper.md my-paper.md
```

**Edit content:**
- Update YAML frontmatter (title, authors, affiliations)
- Write your content in sections
- Add citations using `[@AuthorYear]` (Pandoc) or ``{cite}`AuthorYear` `` (MyST)

**Convert to PDF/DOCX:**
```bash
python scripts/convert_academic.py my-paper.md pdf
python scripts/convert_academic.py my-paper.md docx
python scripts/convert_academic.py my-paper.md pdf --myst  # For MyST
```
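
The internals of `convert_academic.py` are not shown here. As a rough equivalent for the Pandoc template only, assuming Pandoc 2.11 or later is installed (and a LaTeX engine for PDF output), a direct invocation with citation processing could be built like this. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(source: str, output_format: str, bibliography: str) -> list[str]:
    """Return a Pandoc command that resolves citations and writes next to the source."""
    output = Path(source).with_suffix(f".{output_format}")
    return [
        "pandoc", source,
        "--citeproc",                    # resolve [@key] citations
        "--bibliography", bibliography,  # BibTeX file with the cited entries
        "-o", str(output),               # Pandoc infers the format from the extension
    ]


def convert(source: str, output_format: str, bibliography: str) -> None:
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(source, output_format, bibliography), check=True)
```

Keeping command construction separate from execution makes it easy to inspect or log the exact command before running it.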

### Bibliography Generation

Convert FireCrawl research results into BibTeX bibliography entries:

```bash
python scripts/generate_bibliography.py research_output/*.md -o references.bib
```

**What it does:**
- Extracts URLs and titles from FireCrawl markdown files
- Generates BibTeX `@misc` entries
- Creates citation keys automatically
- Adds access dates
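
To make the shape of a generated entry concrete, here is a minimal sketch of turning one source into a BibTeX `@misc` entry. The key scheme (first word of the title plus the access year) and the choice of `howpublished` and `note` fields are assumptions, not necessarily what `generate_bibliography.py` emits.

```python
import re
from datetime import date


def citation_key(title: str, year: int) -> str:
    """Hypothetical key scheme: first word of the title plus the access year."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    return f"{words[0].lower() if words else 'source'}{year}"


def bibtex_misc(title: str, url: str, accessed: date) -> str:
    """Format a web source as a BibTeX @misc entry with an access date."""
    key = citation_key(title, accessed.year)
    return (
        f"@misc{{{key},\n"
        f"  title = {{{title}}},\n"
        f"  howpublished = {{\\url{{{url}}}}},\n"
        f"  note = {{Accessed: {accessed.isoformat()}}}\n"
        f"}}"
    )


print(bibtex_misc("Embodied AI: A Survey", "https://example.com/a", date(2025, 11, 22)))
```

The sketch does not escape BibTeX special characters such as `&` or `%` in titles, and two sources whose titles share a first word would collide on the same key, so a real generator needs a disambiguation step.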

**Example workflow:**
```bash
# 1. Research topics
python scripts/firecrawl_research.py topics.md ./research

# 2. Generate bibliography
python scripts/generate_bibliography.py research/*.md -o refs.bib

# 3. Copy template
cp assets/templates/pandoc-scholarly-paper.md paper.md

# 4. Edit paper.md (add content, cite sources)

# 5. Convert to PDF
python scripts/convert_academic.py paper.md pdf
```

### Citation Examples

**Pandoc syntax:**
```markdown
Recent research [@Smith2024] shows...
Multiple studies [@Jones2023; @Brown2024] indicate...
```

**MyST syntax:**
```markdown
Recent research {cite}`Smith2024` shows...
Multiple studies {cite}`Jones2023,Brown2024` indicate...
```

### Example Bibliography File

An example bibliography is provided in `assets/references.bib` with common entry types:
- Journal articles (`@article`)
- Conference papers (`@inproceedings`)
- Books (`@book`)
- PhD theses (`@phdthesis`)
- Web resources (`@misc`)
- Preprints (`@article` with arXiv)

## Tips

1. **Organize topics hierarchically** - Use `##` for main topics, `###` for subtopics
2. **Use descriptive names** - The topic text becomes the filename, so make it clear
3. **Batch processing** - Group related topics in one file for efficiency
4. **Output organization** - Create separate folders for different research projects
5. **Content review** - Each source is truncated to 3000 characters for readability, so open the source URL when you need the full text
6. **Academic workflow** - Use bibliography generator to cite research sources in papers
7. **Template customization** - Modify templates for your field's citation style

## Limitations

- **No summarization** - Returns raw scraped content, not summaries
- **No deduplication** - Duplicate sources may appear across topics
- **No quality ranking** - All results treated equally
- **New files only** - Does not append to existing files
- **Free tier constraints** - Rate limiting affects processing speed