This skill analyzes large codebases to extract structured context, dependencies, and execution flows for debugging and modification.

npx playbooks add skill lofcz/llmtornado --skill codebase-context-extractor

---
name: codebase-context-extractor
description: This skill provides a comprehensive context extraction system for large codebases. It intelligently analyzes code structure, dependencies, and relationships to extract relevant context for understanding, debugging, or modifying code.
---

# Codebase Context Extractor Skill

## Overview
This skill extracts context from large codebases. It analyzes code structure, dependencies, and relationships between components to surface the context needed to understand, debug, or modify code.

## Trigger Words
- "extract context"
- "codebase context"
- "code context"
- "analyze codebase"
- "codebase analysis"
- "code structure"
- "dependency analysis"
- "code relationships"
- "understand codebase"
- "map codebase"

## When to Use This Skill
Use this skill when you need to:
- Understand the structure and organization of a large codebase
- Extract relevant context for a specific function, class, or module
- Analyze dependencies and relationships between code components
- Generate documentation or summaries of code sections
- Prepare context for code modifications or debugging
- Identify entry points and execution flows
- Map out API surfaces and public interfaces
- Understand data flow and state management

## Instructions

When this skill is triggered, execute the `context_extractor.py` script with appropriate parameters.

### Basic Usage
```bash
python /projects/workspace/codebase-context-extractor/context_extractor.py \
  --target-path <path_to_codebase> \
  --mode <extraction_mode> \
  --output <output_file>
```

### Extraction Modes

1. **full** - Complete codebase analysis with all components
2. **targeted** - Focus on specific files, functions, or classes
3. **dependency** - Map dependencies and imports
4. **flow** - Trace execution flows and call chains
5. **api** - Extract public interfaces and API surfaces
6. **data** - Analyze data structures and models
7. **hierarchy** - Show class hierarchies and inheritance
8. **summary** - Generate high-level overview

### Parameters

- `--target-path` (required): Path to the codebase to analyze
- `--mode` (required): Extraction mode (see above)
- `--output` (optional): Output file path (default: stdout)
- `--focus` (optional): Specific file, class, or function to focus on
- `--depth` (optional): Maximum depth for traversal (default: unlimited)
- `--include-tests` (optional): Include test files in analysis (default: false)
- `--language` (optional): Programming language (auto-detected if not specified)
- `--format` (optional): Output format (markdown, json, yaml, text) (default: markdown)
- `--exclude` (optional): Patterns to exclude (comma-separated)
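
The parameter list above maps naturally onto an `argparse` parser. The following is a minimal sketch of what such a parser could look like. It is not taken from `context_extractor.py`: the helper names `build_parser` and `split_patterns`, the integer type for `--depth`, and the use of `None` to mean "stdout", "unlimited", or "auto-detect" are all assumptions made for illustration.

```python
import argparse

MODES = ["full", "targeted", "dependency", "flow", "api", "data", "hierarchy", "summary"]
FORMATS = ["markdown", "json", "yaml", "text"]


def split_patterns(value: str) -> list:
    """Split a comma-separated pattern string, dropping empty entries."""
    return [p.strip() for p in value.split(",") if p.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; the real script may differ.
    parser = argparse.ArgumentParser(prog="context_extractor.py")
    parser.add_argument("--target-path", required=True)
    parser.add_argument("--mode", required=True, choices=MODES)
    parser.add_argument("--output", default=None)           # None: write to stdout
    parser.add_argument("--focus", default=None)
    parser.add_argument("--depth", type=int, default=None)  # None: unlimited
    parser.add_argument("--include-tests", action="store_true")
    parser.add_argument("--language", default=None)         # None: auto-detect
    parser.add_argument("--format", choices=FORMATS, default="markdown")
    parser.add_argument("--exclude", type=split_patterns, default=[])
    return parser
```

Under these assumptions, `build_parser().parse_args(["--target-path", ".", "--mode", "summary"])` yields a namespace with `format` set to `"markdown"` and `include_tests` set to `False`, matching the documented defaults.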

### Examples

1. Full codebase analysis:
```bash
python context_extractor.py --target-path ./my-project --mode full --output context.md
```

2. Targeted analysis of a specific class:
```bash
python context_extractor.py --target-path ./my-project --mode targeted --focus "UserService" --output user_service_context.md
```

3. Dependency mapping:
```bash
python context_extractor.py --target-path ./my-project --mode dependency --format json --output dependencies.json
```

4. Execution flow analysis:
```bash
python context_extractor.py --target-path ./my-project --mode flow --focus "main" --depth 5
```
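
When the extractor is driven from another Python program rather than a shell, the same invocations can be assembled with `subprocess`. This is a sketch only: `build_command` and `run_extractor` are hypothetical helper names, and the script location is assumed to be `context_extractor.py` in the working directory.

```python
import subprocess
import sys


def build_command(target_path, mode, script="context_extractor.py", **options):
    """Build the argv list for one extractor run without executing it."""
    cmd = [sys.executable, script, "--target-path", target_path, "--mode", mode]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is None or value is False:
            continue                        # option not set
        if value is True:
            cmd.append(flag)                # boolean switch such as --include-tests
        else:
            cmd.extend([flag, str(value)])  # option with a value
    return cmd


def run_extractor(target_path, mode, **options):
    """Run the extractor and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_command(target_path, mode, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For instance, `build_command("./my-project", "flow", focus="main", depth=5)` produces the argument list for example 4 above. Passing a list instead of a shell string avoids quoting problems when `--focus` contains spaces.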

## Output Structure

The extractor generates structured output including:

### For Full/Targeted Mode
- **Project Overview**: Language, structure, entry points
- **File Organization**: Directory structure and file purposes
- **Key Components**: Important classes, functions, modules
- **Dependencies**: External and internal dependencies
- **Code Metrics**: Lines of code, complexity estimates
- **Context Summary**: High-level understanding

### For Dependency Mode
- **Dependency Graph**: Visual representation of dependencies
- **Import Analysis**: All imports and their usage
- **Circular Dependencies**: Detection and reporting
- **Unused Dependencies**: Potential cleanup targets
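
Circular dependency detection is commonly done with a depth-first search over the import graph. The sketch below illustrates that general technique; it is not the extractor's actual implementation, and the `find_cycles` name and the dict-of-lists graph shape are assumptions.

```python
def find_cycles(graph):
    """Return cycles found by depth-first search.

    graph maps a module name to the list of modules it imports.
    Each cycle is reported as a path that starts and ends on the same module.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack, cycles = [], []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: a back edge closes a cycle
                cycles.append(stack[stack.index(dep):] + [dep])
            elif state == WHITE:
                visit(dep)
        stack.pop()
        color[node] = BLACK

    for node in graph:
        if color[node] == WHITE:
            visit(node)
    return cycles
```

This reports one cycle per back edge rather than every elementary cycle, which is usually enough to locate the offending imports. The recursion depth grows with the longest import chain, so a very deep graph would need an iterative version.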

### For Flow Mode
- **Call Chains**: Function call sequences
- **Entry Points**: Main execution paths
- **Exit Points**: Return and error handling
- **Branch Analysis**: Conditional execution paths

### For API Mode
- **Public Interfaces**: Exported functions and classes
- **API Documentation**: Signatures and docstrings
- **Usage Examples**: How to use the API
- **Versioning Info**: API version and compatibility

## Advanced Features

### Smart Context Window Management
The extractor automatically manages context size to fit within LLM token limits:
- Prioritizes most relevant code sections
- Provides summaries for less critical parts
- Includes breadcrumb navigation for context
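
One simple way to realize the prioritization described above is a greedy pass over sections sorted by relevance, falling back to a summary when the full text no longer fits. The sketch below shows that idea under stated assumptions: the `fit_to_budget` name, the section dictionary keys, the relevance scores, and the whitespace-based token count are all illustrative stand-ins, not the extractor's actual algorithm or tokenizer.

```python
def fit_to_budget(sections, budget, count_tokens=lambda text: len(text.split())):
    """Greedily keep full text for the most relevant sections, summaries for the rest.

    sections: list of dicts with 'name', 'relevance', 'text', and 'summary' keys.
    Returns (list of (name, field_used), tokens_used).
    """
    chosen, used = [], 0
    for section in sorted(sections, key=lambda s: s["relevance"], reverse=True):
        # Prefer the full text; fall back to the summary; otherwise drop the section.
        for field in ("text", "summary"):
            cost = count_tokens(section[field])
            if used + cost <= budget:
                chosen.append((section["name"], field))
                used += cost
                break
    return chosen, used
```

In practice `count_tokens` would be replaced with the tokenizer of the target model, since whitespace splitting tends to undercount tokens in source code.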

### Multi-Language Support
Supports analysis of:
- Python
- JavaScript/TypeScript
- Java
- C#
- Go
- Rust
- C/C++
- Ruby
- PHP
- And more (extensible)
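
Language auto-detection is often based on file extensions. A minimal sketch of that approach follows; the extension table and the `detect_language` helper are assumptions for illustration, and the real tool may use different or additional signals.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical extension table covering the languages listed above.
EXTENSIONS = {
    ".py": "Python", ".js": "JavaScript", ".ts": "TypeScript", ".java": "Java",
    ".cs": "C#", ".go": "Go", ".rs": "Rust", ".c": "C", ".cpp": "C++",
    ".rb": "Ruby", ".php": "PHP",
}


def detect_language(paths):
    """Return the most common known language among the paths, or None."""
    counts = Counter()
    for path in paths:
        language = EXTENSIONS.get(PurePath(path).suffix.lower())
        if language is not None:
            counts[language] += 1
    return counts.most_common(1)[0][0] if counts else None
```

Adding a language in this scheme is a one-line change to the table, which is one way the "extensible" part could be realized.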

### Intelligent Filtering
- Excludes generated code, build artifacts, and vendor directories
- Focuses on business logic and core functionality
- Configurable exclusion patterns
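
Exclusion patterns like these can be matched against each path component with glob rules from the standard `fnmatch` module. This is a sketch, not the tool's filter: the default pattern list and the `is_excluded` helper are hypothetical, and paths are assumed to use forward slashes.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical defaults for vendor directories, build artifacts, and generated files.
DEFAULT_EXCLUDES = ["node_modules", "vendor", "dist", "build", "__pycache__", "*.min.js"]


def is_excluded(path, patterns=DEFAULT_EXCLUDES):
    """Return True if any component of the path matches any glob pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)
```

Matching per component means a pattern such as `build` excludes `build/output.js` but not `src/builder.py`. User-supplied patterns from `--exclude` could simply be appended to the default list.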

## Integration with Other Tools

The context extractor output can be used with:
- Documentation generators
- Code review tools
- Refactoring assistants
- Bug tracking systems
- Development environments

## Best Practices

1. **Start with Summary Mode**: Get a high-level overview before diving deep
2. **Use Targeted Mode for Specific Tasks**: Focus on relevant code sections
3. **Combine with Dependency Analysis**: Understand impact of changes
4. **Leverage Flow Analysis for Debugging**: Trace execution paths
5. **Regular Updates**: Re-run analysis as codebase evolves

## Notes

- Large codebases may take time to analyze
- Consider using depth limits for very large projects
- JSON output is best for programmatic processing
- Markdown output is best for human reading
- The tool respects .gitignore patterns by default

Overview

This skill provides a comprehensive context extraction system for large codebases. It analyzes code structure, dependencies, execution flows, and API surfaces to surface the most relevant context for understanding, debugging, or modifying code. It is designed to scale to large repositories and to prepare concise, prioritized outputs for human or programmatic use.

How this skill works

The extractor scans the target path, parses source files, and builds structural maps: file organization, call graphs, dependency graphs, class hierarchies, and data models. It supports multiple extraction modes (full, targeted, dependency, flow, api, data, hierarchy, summary) and manages context windows to prioritize relevant code sections and summaries. Outputs can be emitted as markdown, JSON, YAML, or text and include metrics, visual graphs, and focused summaries for specified targets.

When to use it

  • Onboard to or explore a large unfamiliar codebase
  • Prepare focused context for a specific class, function, or module
  • Map dependencies before making changes or upgrades
  • Trace execution flows for debugging or performance analysis
  • Generate API surface docs and usage summaries
  • Create inputs for code review, refactoring, or RAG-enabled agents

Best practices

  • Start with summary mode to get a high-level orientation before deeper analysis
  • Use targeted mode with the --focus parameter for precise investigations
  • Combine dependency mode with flow mode to understand impact and runtime behavior
  • Set a reasonable --depth for very large projects to limit analysis time
  • Include tests selectively to understand integration points; exclude vendor/generated code with patterns

Example use cases

  • Produce a developer-friendly context.md for a new team member using full mode
  • Generate a dependencies.json to find circular or unused dependencies before upgrades
  • Extract a focused context for UserService to prepare a safe refactor using targeted mode
  • Trace call chains from main or an API endpoint to diagnose a bug using flow mode
  • Export public API signatures and docstrings for API documentation using api mode

FAQ

What languages does it support?

It supports many languages including C#, Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, Ruby, and PHP, with auto-detection available.

How do I limit analysis time on very large repositories?

Use the --depth option, exclude vendor/generated directories via --exclude, and start with summary mode to narrow focus before full runs.

What formats can outputs be produced in?

Outputs can be produced in markdown, json, yaml, or plain text; use JSON for programmatic pipelines and markdown for human-readable reports.