11_llmstxt_doc_parsing skill

This skill rapidly ingests llms.txt documentation to provide fast, accurate context for libraries and APIs.

npx playbooks add skill sounder25/google-antigravity-skills-library --skill 11_llmstxt_doc_parsing

Review the files below or copy the command above to add this skill to your agents.

---
name: llms.txt & Doc Parsing
description: Rapidly ingest documentation via the /llms.txt standard to gain "fast-track" understanding of libraries without scraping entire sites.
version: 1.0.0
author: Antigravity Skills Library
created: 2026-01-16
leverage_score: 5/5
---

# SKILL-011: llms.txt & Doc Parsing

## Overview

Locates and consumes the `llms.txt` file from a documentation site. This file is a curated map of markdown files optimized for LLM consumption, so the agent can load the key documentation for a framework or API in one pass instead of crawling the whole site.

## Trigger Phrases

- `read docs for <url>`
- `ingest llms.txt`
- `learn <library> fast`
- `parse documentation`

## Inputs

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--url` | string | Yes | - | Base URL of the project documentation (e.g., `https://docs.example.com`) |
| `--output-dir` | string | No | `.docs` | Directory to save ingested documentation |
| `--max-files` | int | No | 10 | Limit number of referenced files to fetch |

## Outputs

### 1. DOCS_INDEX.json

Metadata about what was ingested:
```json
{
  "source_url": "https://docs.example.com/llms.txt",
  "project_name": "Example Lib",
  "ingested_files": [
    { "path": "overview.md", "tokens": 1200 },
    { "path": "api-reference.md", "tokens": 4500 }
  ],
  "total_tokens": 5700
}
```
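
As a rough illustration of how an index with this shape could be assembled, here is a minimal Python sketch. It is not the skill's PowerShell script. The function names are hypothetical, and the token count uses a 4-characters-per-token heuristic, which is an assumption: the skill does not document how it counts tokens.

```python
import json


def estimate_tokens(text):
    # Assumption: roughly 4 characters per token. The skill's real
    # counting method is not documented, so this is only a heuristic.
    return len(text) // 4


def build_docs_index(source_url, project_name, files):
    """Build a DOCS_INDEX.json-style dict.

    `files` maps a relative path to that file's markdown text.
    """
    ingested = [
        {"path": path, "tokens": estimate_tokens(text)}
        for path, text in files.items()
    ]
    return {
        "source_url": source_url,
        "project_name": project_name,
        "ingested_files": ingested,
        "total_tokens": sum(entry["tokens"] for entry in ingested),
    }


if __name__ == "__main__":
    index = build_docs_index(
        "https://docs.example.com/llms.txt",
        "Example Lib",
        {"overview.md": "x" * 4800, "api-reference.md": "y" * 18000},
    )
    print(json.dumps(index, indent=2))
```

Keeping `total_tokens` as a derived sum means the per-file entries stay the single source of truth.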

### 2. CONSOLIDATED_KNOWLEDGE.md

A single, optimized markdown file containing the "fast-track" documentation content defined in `llms.txt`.

## Preconditions

1. Target site must have an `/llms.txt` file (or user must provide direct link).
2. Internet access required.

## Implementation

### Script: fetch_docs.ps1

1. Checks `<url>/llms.txt` and `<url>/llms-full.txt`.
2. Parses the typical `llms.txt` format:
   ```text
   # Project Name
   > Project Description
   
   - [Title](link) - Description
   - [API](link) - Main API docs
   ```
3. Fetches the linked markdown files.
4. Concatenates them into `CONSOLIDATED_KNOWLEDGE.md`.
5. Prompts the agent to read this single file.
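
The parsing in step 2 can be sketched in a few lines. The following Python is an illustrative approximation, not the contents of `fetch_docs.ps1`. The function name, the regular expression, and the decision to resolve relative links against the manifest URL are all assumptions.

```python
import re
from urllib.parse import urljoin

# Matches "- [Title](link) - Description". The description is optional,
# and ":" is also accepted as the separator.
LINK_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<link>[^)\s]+)\)"
    r"\s*(?:[-:]\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text, manifest_url):
    """Extract the project name, description, and links from llms.txt text."""
    name = None
    description = None
    links = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# ") and name is None:
            name = stripped[2:].strip()
        elif stripped.startswith(">") and description is None:
            description = stripped[1:].strip()
        else:
            match = LINK_RE.match(line)
            if match:
                links.append({
                    "title": match.group("title"),
                    # Relative links are resolved against the manifest URL.
                    "url": urljoin(manifest_url, match.group("link")),
                    "description": (match.group("desc") or "").strip(),
                })
    return {"project_name": name, "description": description, "links": links}


if __name__ == "__main__":
    sample = (
        "# Project Name\n"
        "> Project Description\n"
        "\n"
        "- [Title](guide.md) - Description\n"
        "- [API](https://docs.example.com/api.md) - Main API docs\n"
    )
    print(parse_llms_txt(sample, "https://docs.example.com/llms.txt"))
```

Only the first `# ` heading and the first `>` line are treated as project metadata, so later section headings in the manifest are ignored.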

## Integration

**Agent Workflow:**
1. User asks: "How do I use this new generic web3 library?"
2. Agent runs: `.\skills\11_llmstxt_doc_parsing\fetch_docs.ps1 -Url "https://generic-web3.io"`
3. Script outputs: `CONSOLIDATED_KNOWLEDGE.md`
4. Agent reads file and answers user instantly.

Overview

This skill rapidly ingests project documentation via the /llms.txt convention to give an agent a fast-track understanding of libraries and APIs. It fetches a curated list of markdown files, consolidates them into a single knowledge artifact, and enables immediate, accurate responses without scraping entire sites. Implemented as a PowerShell utility, it outputs a machine-readable index and an optimized consolidated doc.

How this skill works

The script checks the provided base URL for llms.txt or llms-full.txt and parses the file for project metadata and markdown links. It then fetches the referenced markdown files (up to the configured limit), counts tokens for each file, and writes DOCS_INDEX.json plus CONSOLIDATED_KNOWLEDGE.md. After consolidation, the agent reads the single file for instant familiarity with the library or API.
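
The fetch-and-consolidate step described above might look roughly like the Python sketch below. It is not the skill's PowerShell code. The `consolidate` function and the source-comment separator are hypothetical, and the fetcher is passed in as a parameter so the example runs without network access.

```python
def consolidate(links, fetch, max_files=10):
    """Fetch up to `max_files` links and join them into one markdown string.

    `links` is a list of dicts with a "url" key, and `fetch` is any callable
    that takes a URL and returns the page text.
    """
    parts = []
    for link in links[:max_files]:
        body = fetch(link["url"]).strip()
        # An HTML comment before each section records where it came from.
        parts.append(f"<!-- source: {link['url']} -->\n{body}")
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    # Stand-in for real HTTP requests: a dict lookup keyed by URL.
    pages = {
        "https://docs.example.com/overview.md": "# Overview\nIntro text.",
        "https://docs.example.com/api.md": "# API\nReference text.",
    }
    links = [{"url": url} for url in pages]
    print(consolidate(links, pages.__getitem__, max_files=10))
```

In real use, `fetch` could wrap an HTTP client. Slicing the link list before fetching means files beyond the limit are never requested.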

When to use it

  • You need rapid familiarity with a new framework or API without crawling the entire site
  • Documentation providers publish an llms.txt manifest to speed LLM consumption
  • Preparing a code assistant to answer detailed questions about a library before coding
  • Onboarding agents to third-party SDKs during runtime or in CI pipelines

Best practices

  • Provide the exact project docs base URL or the direct llms.txt link to avoid redirects
  • Set a sensible --max-files to balance coverage vs. fetch time (default 10)
  • Use a dedicated output directory (default .docs) to keep artifacts isolated
  • Validate that the target site intentionally exposes llms.txt to respect owner intent
  • Review DOCS_INDEX.json after fetch to confirm token counts and included files
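
The last check can be partly automated. The Python sketch below assumes the index has the shape shown in the `DOCS_INDEX.json` example. The `check_index` helper is hypothetical and is not part of the skill.

```python
def check_index(index):
    """Return a list of problems found in a DOCS_INDEX.json-style dict."""
    problems = []
    files = index.get("ingested_files", [])
    if not files:
        problems.append("no files were ingested")
    expected = sum(entry.get("tokens", 0) for entry in files)
    if index.get("total_tokens") != expected:
        problems.append(
            f"total_tokens is {index.get('total_tokens')}, expected {expected}"
        )
    for entry in files:
        if entry.get("tokens", 0) <= 0:
            problems.append(f"{entry.get('path')} has no tokens")
    return problems


if __name__ == "__main__":
    index = {
        "source_url": "https://docs.example.com/llms.txt",
        "project_name": "Example Lib",
        "ingested_files": [
            {"path": "overview.md", "tokens": 1200},
            {"path": "api-reference.md", "tokens": 4500},
        ],
        "total_tokens": 5700,
    }
    print(check_index(index) or "index looks consistent")
```

To check a real run, load the file with `json.load` first. An empty or zero-token entry usually means a linked file failed to download or was not markdown.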

Example use cases

  • Agent prepares to answer API usage questions by ingesting a library’s llms.txt and producing a consolidated cheat sheet
  • CI job runs the fetch before integration tests so agents have authoritative API docs available
  • Support engineers rapidly build a knowledge snapshot for troubleshooting third-party SDK behavior
  • Security review: fetch manifest and scan consolidated content for deprecated or risky API usage

FAQ

What if the site has no llms.txt?

Provide the direct link to a curated manifest or a list of markdown files; without an llms.txt the script cannot auto-discover files.
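
One way to accept either a base URL or a direct manifest link is sketched below in Python. How the actual script tells the two apart is not documented, so the `.txt` suffix check and the `candidate_urls` name are assumptions.

```python
def candidate_urls(url):
    """Return the manifest URLs to try, in order."""
    trimmed = url.rstrip("/")
    if trimmed.endswith(".txt"):
        # Assumption: a URL ending in .txt is a direct link to a manifest.
        return [trimmed]
    return [f"{trimmed}/llms.txt", f"{trimmed}/llms-full.txt"]


if __name__ == "__main__":
    print(candidate_urls("https://docs.example.com/"))
    print(candidate_urls("https://docs.example.com/custom/llms.txt"))
```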

Can I change how many files are fetched?

Yes. Use the --max-files parameter to limit the number of referenced files fetched and consolidated (default 10).

Does this respect robots or site permissions?

The script performs standard HTTP fetches; ensure you have permission to retrieve and store the documentation before running it.