
This skill converts Git repositories into structured plain-text digests suited to analysis by large language models.

npx playbooks add skill git-fg/thecattoolkit --skill ingesting-git

---
name: ingesting-git
description: "Transforms repositories into structured plain-text digests optimized for LLM consumption. Use when analyzing GitHub repositories, digesting codebases, or ingesting git repos for AI analysis."
allowed-tools: [Read, Write, Edit, Bash, Grep]
---

# GitIngest Protocol



## Quick Start

**Execute via Script:**
```bash
uv run --with gitingest scripts/ingest.py <url_or_path> [options]
```

**Examples:**
```bash
# Ingest remote repo
uv run --with gitingest scripts/ingest.py https://github.com/user/repo

# Ingest with filtering
uv run --with gitingest scripts/ingest.py . -i "*.py" -e "tests/*"
```

## Output Format

GitIngest returns **structured plain text** optimized for LLM consumption, organized into three sections:

**Section 1: Repository Summary**
```
Repository: owner/repo-name
Files analyzed: 42
Estimated tokens: 15.2k
```

**Section 2: Directory Structure**
```
Directory structure:
└── project-name/
    ├── src/
    │   ├── main.py
    │   └── utils.py
    ├── tests/
    │   └── test_main.py
    └── README.md
```

**Section 3: File Contents**
```
================================================
FILE: src/main.py
================================================
def hello_world():
    print("Hello, World!")
```
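
The three sections above can be separated again with a small parser. The following is a minimal sketch, not part of the skill: it assumes the delimiters are exactly as shown in the samples (a line of `=` characters, a `FILE: <path>` line, and another line of `=` characters), and `parse_digest` is a hypothetical helper name. A file whose own content contains such a header would be split incorrectly.

```python
import re

# Assumption: a delimiter is a line made only of '=' characters, as in the sample.
SEPARATOR = re.compile(r"^={10,}$")


def parse_digest(text):
    """Split a digest into (summary, tree, files).

    summary: dict of "Key: value" pairs found before the directory tree
    tree:    the directory-structure block as a single string
    files:   dict mapping each file path to its content
    """
    lines = text.splitlines()

    # Locate every "=== / FILE: path / ===" header triple.
    headers = [
        i for i in range(len(lines) - 2)
        if SEPARATOR.match(lines[i])
        and lines[i + 1].startswith("FILE: ")
        and SEPARATOR.match(lines[i + 2])
    ]

    # Everything before the first file header is summary plus tree.
    preamble = lines[:headers[0]] if headers else lines
    tree_start = next(
        (i for i, line in enumerate(preamble)
         if line.startswith("Directory structure:")),
        len(preamble),
    )

    summary = {}
    for line in preamble[:tree_start]:
        key, sep, value = line.partition(":")
        if sep and value.strip():
            summary[key.strip()] = value.strip()
    tree = "\n".join(preamble[tree_start:]).strip()

    # Each file body runs from its header to the next header (or end of text).
    files = {}
    for n, start in enumerate(headers):
        end = headers[n + 1] if n + 1 < len(headers) else len(lines)
        path = lines[start + 1][len("FILE: "):].strip()
        files[path] = "\n".join(lines[start + 3:end]).strip("\n")
    return summary, tree, files
```

Summary values are kept as strings (for example `"15.2k"`), since the sample does not specify how token counts are rounded.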

## Configuration Options

| Option | Purpose | Example |
|:-------|:--------|:--------|
| `-i` / `--include-pattern` | Include files matching patterns | `-i "*.py" -i "*.js"` |
| `-e` / `--exclude-pattern` | Exclude files matching patterns | `-e "node_modules/*"` |
| `-s` / `--max-size` | Maximum file size in bytes | `-s 102400` |
| `-b` / `--branch` | Specify branch | `-b main` |
| `-t` / `--token` | GitHub access token | `-t $GITHUB_TOKEN` |
| `-o` | Output file (or `-` for stdout) | `-o digest.txt` |
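
When the command is assembled programmatically, the flags in the table map naturally onto an argument list. This is a rough sketch under the assumption that `scripts/ingest.py` accepts the short flags exactly as listed above; `build_ingest_command` is a hypothetical helper, not something the skill ships.

```python
def build_ingest_command(source, include=(), exclude=(), max_size=None,
                         branch=None, token=None, output=None):
    """Return an argv list for the Quick Start command using the table's flags."""
    argv = ["uv", "run", "--with", "gitingest", "scripts/ingest.py", source]
    for pattern in include:
        argv += ["-i", pattern]        # one -i per include pattern
    for pattern in exclude:
        argv += ["-e", pattern]        # one -e per exclude pattern
    if max_size is not None:
        argv += ["-s", str(max_size)]  # maximum file size in bytes
    if branch:
        argv += ["-b", branch]
    if token:
        argv += ["-t", token]
    if output:
        argv += ["-o", output]         # "-" means stdout, per the table
    return argv
```

Passing the result to `subprocess.run` as a list (rather than a shell string) keeps the shell from expanding patterns such as `*.py` before the script sees them.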

## Common Exclude Patterns

```
node_modules/*          # Dependencies
*.log                   # Log files
dist/*                  # Build outputs
build/*                 # Build directories
*.min.js                # Minified files
*.lock                  # Lock files
```
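
To preview what a pattern list would drop before running an ingestion, the patterns can be tried against candidate paths. The sketch below uses Python's `fnmatch` module as a stand-in; whether gitingest applies exactly the same matching rules is an assumption, and `is_excluded` and `keep_paths` are hypothetical helper names.

```python
from fnmatch import fnmatchcase

COMMON_EXCLUDES = [
    "node_modules/*", "*.log", "dist/*", "build/*", "*.min.js", "*.lock",
]


def is_excluded(path, patterns=COMMON_EXCLUDES):
    """Return True if a repo-relative path matches any exclude pattern."""
    # fnmatchcase is case-sensitive on every platform; '*' also matches '/'.
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def keep_paths(paths, patterns=COMMON_EXCLUDES):
    """Return the paths that survive the exclude patterns, in order."""
    return [p for p in paths if not is_excluded(p, patterns)]
```

Under these matching rules, `*.lock` covers `yarn.lock` but not `package-lock.json`, so lock files with other extensions need their own pattern.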

## Implementation Protocol

When executing the gitingest skill:

1. **Assess Requirements**
   - Determine if CLI or Python integration is needed
   - Identify repository size and scope
   - Plan filtering strategy (include/exclude patterns)

2. **Setup Environment**
   - Verify gitingest installation
   - Check authentication for private repositories
   - Configure output destination

3. **Execute Ingestion**
   - Run gitingest with appropriate parameters
   - Monitor for errors and timeouts
   - Apply filtering and size limits

4. **Process Results**
   - Parse the three-section output format
   - Analyze summary, tree, and content
   - Generate insights and reports
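
Step 3 (run, then watch for errors and timeouts) could be sketched as a thin wrapper around `subprocess.run`. This is an illustration only: `run_ingest` is a hypothetical name, the 300-second default is arbitrary, and the sketch assumes the command writes the digest to stdout (for example via `-o -`).

```python
import subprocess


def run_ingest(argv, timeout=300):
    """Run an ingestion command and return its stdout, raising on failure."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout, check=True
        )
    except FileNotFoundError as exc:
        # The executable (e.g. uv) is not installed or not on PATH.
        raise RuntimeError(f"command not found: {argv[0]}") from exc
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"ingestion timed out after {timeout}s") from exc
    except subprocess.CalledProcessError as exc:
        detail = (exc.stderr or "").strip()
        raise RuntimeError(
            f"ingestion failed (exit {exc.returncode}): {detail}"
        ) from exc
    return result.stdout
```

Raising a single exception type keeps the caller's handling simple; the original error stays attached as `__cause__` for debugging.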

## Extended Documentation

For detailed integration examples, error handling patterns, and best practices:
- **Integration Examples:** `references/integration-examples.md`



## Integration with CatToolkit

**Usage Examples:**
```bash
# "Ingest this repository for AI analysis"
# → Uses gitingest to create structured digest

# "Analyze the codebase without dependencies"
# → Uses gitingest with exclude-patterns for node_modules, dist, etc.

# "Generate documentation from this repo"
# → Uses gitingest + filtering to extract docs and code structure
```

The gitingest skill's output can feed other CatToolkit skills:
- **deep-analysis**: Process gitingest output for comprehensive insights
- **software-engineering**: Analyze ingested code for quality and security
- **prompt-engineering**: Use repository context to generate better prompts

Overview

This skill transforms Git repositories into structured plain-text digests optimized for large language model consumption. It produces a three-part output: a concise repository summary, a readable directory tree, and full file-content blocks. Use it to convert a codebase into LLM-friendly context for analysis, documentation, or downstream tools.

How this skill works

The skill clones a remote repository or reads a local path, applies include/exclude patterns and a per-file size limit, and then emits a three-section plain-text digest (summary, directory structure, file contents). Configuration flags let you target a branch, authenticate to private repositories, and choose the output destination. The output is intentionally simple so that other tools or LLMs can parse it reliably.

When to use it

  • Preparing a codebase as context for LLM-based code review or analysis
  • Generating an ingestible snapshot of a repo for automated documentation or summarization
  • Extracting repository structure and file contents while excluding dependencies or large assets
  • Feeding downstream analysis pipelines that expect plain-text inputs
  • Quickly auditing repository surface area (file counts, estimated tokens, tree)

Best practices

  • Use include patterns to limit files to relevant source and docs (e.g., *.py, *.md)
  • Exclude node_modules, build/dist, and large binary files to reduce token usage
  • Set a sensible max-size per file to avoid very large blobs in the digest
  • Provide a GitHub token for private repositories and to avoid rate limits
  • Run ingestion on the specific branch you want analyzed to get accurate context

Example use cases

  • Ingest a public GitHub repo to generate a digest for an LLM-driven architecture summary
  • Filter and ingest only documentation and source files to produce docs-ready content
  • Create a reproducible plain-text snapshot for automated security or quality scanning
  • Combine with deeper analysis skills to produce prioritized code issues and remediation steps
  • Feed the digest into prompt-engineering workflows to craft precise, repo-aware prompts

FAQ

What does the output look like?

Three plain-text sections: a short repository summary, an indented directory tree, and labeled file-content blocks separated by delimiters.

How do I avoid ingesting dependencies or large files?

Use exclude patterns (e.g., node_modules/*, dist/*) to drop dependencies and build output, use include patterns to restrict the digest to specific file types, and set a per-file limit with the max-size option.