article-extractor skill

/plugins/mql5/skills/article-extractor

This skill extracts MQL5 articles and documentation from mql5.com to build a high-quality training corpus.

npx playbooks add skill terrylica/cc-skills --skill article-extractor

SKILL.md

---
name: article-extractor
description: Extract MQL5 articles and documentation. TRIGGERS - MQL5 articles, MetaTrader docs, mql5.com resources.
allowed-tools: Read, Bash, Grep, Glob
---

# MQL5 Article Extractor

Extract technical trading articles from mql5.com for training data collection. **Scope is limited to the mql5.com domain.**

## When to Use This Skill

Use this skill when:

- Extracting articles from mql5.com for reference or training data
- Downloading MQL5 documentation and tutorials
- Collecting trading articles from specific MQL5 users
- Building a corpus of MQL5 programming examples

## Scope Boundaries

**VALID requests:**

- "Extract this mql5.com article: <https://www.mql5.com/en/articles/19625>"
- "Get all articles from MQL5 user 29210372"
- "Download trading articles from mql5.com"
- "Extract 5 MQL5 articles for testing"

**OUT OF SCOPE:**

- "Extract from yahoo.com" - NOT SUPPORTED (mql5.com only)
- "Scrape news from reuters" - NOT SUPPORTED (mql5.com only)
- "Get stock data from Bloomberg" - NOT SUPPORTED (mql5.com only)

If the user requests extraction from a site other than mql5.com, respond: "This skill extracts articles from mql5.com ONLY. For other sites, use different tools."
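The scope rule above can be checked before any request is made. The following is a minimal sketch, not the skill's own code: `is_mql5_url` and `require_mql5_url` are hypothetical helper names, and treating subdomains of mql5.com as in scope is an assumption.

```bash
# Hypothetical helpers (illustration only, not part of the skill's scripts).

# Succeed only for http(s) URLs whose host is mql5.com or a subdomain of it.
# Assumption: subdomains such as www.mql5.com count as in scope.
is_mql5_url() {
  case "$1" in
    http://*|https://*) ;;
    *) return 1 ;;
  esac
  _host=${1#*://}          # drop the scheme
  _host=${_host%%[/?#]*}   # drop path, query, and fragment
  _host=${_host%%:*}       # drop an optional port
  case "$_host" in
    mql5.com|*.mql5.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the refusal message from this document for out-of-scope URLs.
require_mql5_url() {
  if is_mql5_url "$1"; then
    return 0
  fi
  echo "This skill extracts articles from mql5.com ONLY. For other sites, use different tools." >&2
  return 1
}
```

Comparing the parsed host, rather than searching the whole URL for the string `mql5.com`, avoids accepting URLs that only mention the domain in a path or query string.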

## Repository Location

Working directory: `$HOME/eon/mql5` (adjust path for your environment)

Always execute commands from this directory:

```bash
cd "$HOME/eon/mql5"
```

## Valid Input Types

### 1. Article URL (Most Specific)

**Format**: `https://www.mql5.com/en/articles/[ID]`
**Example**: `https://www.mql5.com/en/articles/19625`
**Action**: Extract single article
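
The URL format above can be validated mechanically. This is a sketch under two assumptions: article IDs are purely numeric (true of the example, but not stated as a rule here), and `article_id` is a hypothetical helper name rather than part of the skill.

```bash
# Hypothetical helper: print the article ID for a URL of the form
# https://www.mql5.com/en/articles/[ID], or fail if the URL does not match.
# Assumption: IDs consist of digits only.
article_id() {
  case "$1" in
    https://www.mql5.com/en/articles/*)
      _id=${1#https://www.mql5.com/en/articles/} ;;
    *)
      return 1 ;;
  esac
  case "$_id" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
  esac
  printf '%s\n' "$_id"
}
```

For example, `article_id "https://www.mql5.com/en/articles/19625"` prints `19625`, while a URL with a trailing path or non-numeric ID returns a non-zero status.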

### 2. User ID (Numeric or Username)

**Format**: Numeric (e.g., `29210372`) or username (e.g., `jslopes`)
**Source**: From mql5.com profile URL
**Action**: Auto-discover and extract all of the user's articles
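
Telling the two user ID forms apart might look like the sketch below. `user_id_kind` is a hypothetical helper, and the set of characters allowed in a username (letters, digits, `_`, `.`, `-`) is an assumption, not something this document or mql5.com specifies here.

```bash
# Hypothetical helper: classify a user ID as "numeric" or "username".
# Assumption: usernames contain only letters, digits, "_", "." and "-".
user_id_kind() {
  case "$1" in
    '')
      return 1 ;;                       # nothing supplied
    *[!0-9]*)
      case "$1" in
        *[!A-Za-z0-9_.-]*) return 1 ;;  # unexpected character
      esac
      echo username ;;
    *)
      echo numeric ;;                   # digits only
  esac
}
```

With these assumptions, `user_id_kind 29210372` prints `numeric` and `user_id_kind jslopes` prints `username`.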

### 3. URL List File

**Format**: Text file with one URL per line
**Action**: Batch process multiple articles
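
A list file of this shape can be read with a short loop. The sketch below is illustrative only: `read_url_list` is a hypothetical helper, and skipping `#` comment lines and tolerating Windows line endings are assumptions about what such a file may contain.

```bash
# Hypothetical helper: print the mql5.com URLs found in a list file ($1),
# one per line. Other lines are reported on stderr and skipped.
read_url_list() {
  # The "|| [ -n ... ]" clause keeps a final line that lacks a newline.
  while IFS= read -r _line || [ -n "$_line" ]; do
    _line=$(printf '%s' "$_line" | tr -d '\r')   # tolerate CRLF endings
    case "$_line" in
      ''|'#'*)
        continue ;;                              # blank line or comment (assumed syntax)
      https://www.mql5.com/*)
        printf '%s\n' "$_line" ;;
      *)
        printf 'skipped (not mql5.com): %s\n' "$_line" >&2 ;;
    esac
  done < "$1"
}
```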

### 4. Vague Request

If the user says "extract mql5 articles" without specifics, prompt for:

1. Article URL or user ID
1. Quantity limit (for testing)
1. Output location preference

---

## Reference Documentation

For detailed information, see:

- [Extraction Modes](./references/extraction-modes.md) - Single, batch, auto-discovery, official docs modes
- [Data Sources](./references/data-sources.md) - User collections and official documentation
- [Troubleshooting](./references/troubleshooting.md) - Common issues and solutions
- [Examples](./references/examples.md) - Usage examples and patterns

---

## Troubleshooting

| Issue                  | Cause                         | Solution                                          |
| ---------------------- | ----------------------------- | ------------------------------------------------- |
| Non-mql5.com URL       | Skill only supports mql5.com  | Use other tools for non-mql5.com sites            |
| Article not found      | Invalid article ID or removed | Verify URL exists by visiting in browser          |
| User ID not recognized | Wrong user ID format          | Use numeric ID from profile URL or exact username |
| Empty extraction       | Rate limiting or site change  | Wait and retry, check for site structure changes  |
| Permission denied      | Working directory mismatch    | Run from $HOME/eon/mql5 directory                 |
| Batch too large        | Too many articles requested   | Limit batch size, use URL list file               |
| Missing dependencies   | Required tools not installed  | Install curl, jq for extraction                   |
| Output encoding issues | Unicode in article content    | Ensure UTF-8 output handling                      |
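
The "Missing dependencies" row can be checked up front. A minimal sketch, assuming a hypothetical helper named `check_deps`; it relies only on the standard `command -v` lookup.

```bash
# Hypothetical helper: report every listed tool that is not on PATH.
# Returns 0 when all tools are present, 1 otherwise.
check_deps() {
  _missing=0
  for _tool in "$@"; do
    if ! command -v "$_tool" >/dev/null 2>&1; then
      printf 'missing dependency: %s\n' "$_tool" >&2
      _missing=1
    fi
  done
  return "$_missing"
}
```

Calling `check_deps curl jq || exit 1` at the start of a job reports all missing tools at once instead of failing on the first one partway through a batch.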

## Overview

This skill extracts technical articles, documentation, and tutorials from mql5.com to build reference corpora or training datasets. It focuses exclusively on MQL5 domain content and supports single-article extraction, user-based harvesting, and batch URL lists. The tool is designed for reproducible data collection and local export of article text and metadata.

## How this skill works

The extractor accepts an article URL, a user ID (numeric or username), or a text file of article URLs. It discovers article pages on mql5.com, fetches the HTML, parses the article body and metadata (title, author, date, tags), and saves structured output for downstream use. Requests outside mql5.com are rejected. Jobs are run from the project working directory using helper scripts and CLI utilities.
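
The fetch, parse, and save steps could be sketched as follows. This is not the skill's actual implementation, which is not shown here: `extract_title` and `extract_article` are hypothetical names, the record holds only the URL and page title, and the author, date, tags, and body are left out because the page markup needed to locate them is not documented in this file.

```bash
# Hypothetical helpers (illustration only).

# Read HTML on stdin and print the text of a <title> element.
# Newlines are flattened first so a title split across lines still matches.
# HTML entities are not decoded.
extract_title() {
  tr '\n' ' ' | sed -n 's/.*<title[^>]*>\([^<]*\)<\/title>.*/\1/p'
}

# Fetch one article page and print a JSON record (needs curl and jq).
extract_article() {
  _url=$1
  # --fail turns HTTP errors (e.g. 404 for a removed article) into a non-zero exit.
  _html=$(curl --fail --silent --show-error --location "$_url") || return 1
  _title=$(printf '%s' "$_html" | extract_title)
  # jq builds the JSON so quotes and Unicode in the title are escaped correctly.
  jq -n --arg url "$_url" --arg title "$_title" '{url: $url, title: $title}'
}
```

Building the record with `jq --arg` rather than string concatenation keeps the output valid JSON even when a title contains quotes or non-ASCII characters.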

## When to use it

- You need a clean corpus of MQL5 articles or tutorials for training or analysis.
- You want to download all articles published by a specific MQL5 user.
- You need to batch-extract a curated list of MQL5 article URLs.
- You require structured article metadata alongside the article text.
- You are preparing test data or examples for a trading/EA development tool.

## Best practices

- Always run extraction from the project working directory (`cd "$HOME/eon/mql5"`) to avoid path issues.
- Prefer a numeric user ID from the profile URL for reliable user discovery.
- Limit batch sizes to avoid rate limiting; test with a small sample first.
- If an article returns not found, verify the URL in a browser before retrying.
- Ensure the required CLI tools (curl, jq) are installed and that output is handled as UTF-8 to prevent encoding problems.
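
Limiting batch size and pacing requests might be sketched like this. `run_batch` and `process_one` are hypothetical names, `process_one` is a stub standing in for the real extraction step, and the defaults of 5 URLs and a 2-second pause are arbitrary assumptions, not limits documented for mql5.com.

```bash
# Stub standing in for the real per-article extraction step (hypothetical).
process_one() {
  printf 'would extract: %s\n' "$1"
}

# Process at most $2 URLs (default 5) from list file $1,
# sleeping $3 seconds (default 2) after each one.
run_batch() {
  _file=$1
  _limit=${2:-5}
  _pause=${3:-2}
  head -n "$_limit" "$_file" | while IFS= read -r _url || [ -n "$_url" ]; do
    process_one "$_url"
    sleep "$_pause"
  done
}
```

Starting with a small limit, for example `run_batch urls.txt 5`, gives a quick sample to inspect before committing to a full run.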

## Example use cases

- Extract a single article: provide `https://www.mql5.com/en/articles/19625` to get the article text and metadata.
- Harvest all articles from user 29210372 to build a personalized example set.
- Process a text file with 50 mql5.com article URLs to create a local dataset for model fine-tuning.
- Download official MQL5 documentation pages for offline reference and code examples.
- Collect tutorial series from a specific author to analyze coding patterns and snippets.

## FAQ

### Can this skill extract from websites other than mql5.com?

This skill extracts articles from mql5.com ONLY. For other sites, use different tools.

### What input formats are supported?

Supported inputs are a single MQL5 article URL, a numeric or username user ID for auto-discovery, or a text file with one mql5.com URL per line.