
This skill scans public repositories to identify technologies from dependencies, CI configs, and language stats across GitHub and GitLab.

npx playbooks add skill transilienceai/communitytools --skill code_repository_intel

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (7.2 KB)
---
name: code-repository-intel
description: Scans GitHub/GitLab for public repos, dependencies, and CI configurations
tools: Bash, WebFetch
model: inherit
hooks:
  PreToolUse:
    - matcher: "WebFetch"
      hooks:
        - type: command
          command: "../../../hooks/skills/pre_rate_limit_hook.sh"
  PostToolUse:
    - matcher: "WebFetch"
      hooks:
        - type: command
          command: "../../../hooks/skills/post_skill_logging_hook.sh"
---

# Code Repository Intelligence Skill

## Purpose

Scan public code repositories (GitHub, GitLab) to discover technologies through dependency files, CI configurations, and language statistics.

## Operations

### 1. find_github_org

Search for the company's GitHub organization.

**Search Strategies:**
```
1. Direct org URL: github.com/{company_name}
2. GitHub search: org:{company_name}
3. Google dork: site:github.com "{company_name}"
4. Check company website for GitHub links
```

**GitHub API (if available):**
```
GET https://api.github.com/orgs/{org_name}
GET https://api.github.com/users/{username}
```

**Validation:**
- Organization description matches company
- Website URL in profile matches domain
- Recent activity (not abandoned)
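The validation checks above can be sketched as a small heuristic. This is a minimal sketch, assuming a profile dict shaped like the GitHub `/orgs/{org}` response (`blog` and `public_repos` are documented fields); the matching rules and the `validate_org` helper are illustrative, not part of the skill's fixed API.

```python
from urllib.parse import urlparse

def validate_org(profile: dict, company_domain: str) -> bool:
    """Heuristic check that a GitHub org profile matches the target company.

    Compares the profile's website URL against the company domain and
    requires at least one public repo as a weak liveness signal.
    """
    blog = profile.get("blog") or ""
    # The `blog` field may omit the scheme, so normalize before parsing.
    host = urlparse(blog if "://" in blog else f"https://{blog}").hostname or ""
    domain_match = host == company_domain or host.endswith("." + company_domain)
    has_repos = (profile.get("public_repos") or 0) > 0
    return domain_match and has_repos

# Example profile shaped like the /orgs/{org} API response (values are made up)
profile = {"login": "acme", "blog": "https://www.acme.com", "public_repos": 42}
print(validate_org(profile, "acme.com"))  # True
```

A fuller implementation would also compare the org description against the company name and check recent push activity before declaring a match.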

### 2. analyze_repo_languages

Extract primary languages from repositories.

**GitHub API:**
```
GET https://api.github.com/repos/{owner}/{repo}/languages
```

**Response:**
```json
{
  "TypeScript": 245000,
  "JavaScript": 45000,
  "CSS": 12000,
  "HTML": 8000
}
```
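The API returns byte counts per language; the per-repo percentages used later in the output schema can be derived with a small helper. A minimal sketch (the `language_percentages` name is illustrative):

```python
def language_percentages(byte_counts: dict) -> dict:
    """Convert byte counts from /repos/{owner}/{repo}/languages
    into rounded percentages of the repository total."""
    total = sum(byte_counts.values()) or 1  # guard against empty repos
    return {lang: round(100 * n / total) for lang, n in byte_counts.items()}

resp = {"TypeScript": 245000, "JavaScript": 45000, "CSS": 12000, "HTML": 8000}
print(language_percentages(resp))
# {'TypeScript': 79, 'JavaScript': 15, 'CSS': 4, 'HTML': 3}
```

Note that independent rounding means the percentages may not sum to exactly 100.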

**Language Implications:**
```json
{
  "TypeScript": {"implies": ["Node.js ecosystem"], "modern": true},
  "Python": {"implies": ["Backend/ML"], "frameworks": ["Django", "Flask", "FastAPI"]},
  "Ruby": {"implies": ["Rails ecosystem"]},
  "Go": {"implies": ["Cloud native", "Microservices"]},
  "Rust": {"implies": ["Performance critical", "Systems"]},
  "Java": {"implies": ["Enterprise"]},
  "Kotlin": {"implies": ["Android", "JVM modern"]},
  "Swift": {"implies": ["iOS/macOS"]},
  "PHP": {"implies": ["Web backend"], "frameworks": ["Laravel", "Symfony"]},
  "C#": {"implies": [".NET ecosystem"]}
}
```

### 3. scan_dependency_files

Parse dependency files for technology stack.

**Dependency File Mapping:**

**package.json (Node.js):**
```json
{
  "react": {"tech": "React", "confidence": 95},
  "next": {"tech": "Next.js", "implies": ["React"], "confidence": 95},
  "vue": {"tech": "Vue.js", "confidence": 95},
  "nuxt": {"tech": "Nuxt.js", "implies": ["Vue.js"], "confidence": 95},
  "express": {"tech": "Express.js", "confidence": 95},
  "fastify": {"tech": "Fastify", "confidence": 95},
  "nest": {"tech": "NestJS", "confidence": 95},
  "prisma": {"tech": "Prisma ORM", "confidence": 95},
  "mongoose": {"tech": "MongoDB", "confidence": 90},
  "pg": {"tech": "PostgreSQL", "confidence": 90},
  "mysql2": {"tech": "MySQL", "confidence": 90},
  "redis": {"tech": "Redis", "confidence": 90},
  "graphql": {"tech": "GraphQL", "confidence": 95},
  "apollo-server": {"tech": "Apollo GraphQL", "confidence": 95}
}
```

**requirements.txt / pyproject.toml (Python):**
```json
{
  "django": {"tech": "Django", "confidence": 95},
  "flask": {"tech": "Flask", "confidence": 95},
  "fastapi": {"tech": "FastAPI", "confidence": 95},
  "sqlalchemy": {"tech": "SQLAlchemy", "confidence": 90},
  "celery": {"tech": "Celery", "confidence": 90},
  "redis": {"tech": "Redis", "confidence": 85},
  "boto3": {"tech": "AWS SDK", "confidence": 90},
  "pandas": {"tech": "Data Science stack", "confidence": 70},
  "tensorflow": {"tech": "TensorFlow", "confidence": 95},
  "pytorch": {"tech": "PyTorch", "confidence": 95}
}
```

**Gemfile (Ruby):**
```json
{
  "rails": {"tech": "Ruby on Rails", "confidence": 95},
  "sinatra": {"tech": "Sinatra", "confidence": 95},
  "sidekiq": {"tech": "Sidekiq", "confidence": 90},
  "pg": {"tech": "PostgreSQL", "confidence": 85}
}
```

**go.mod (Go):**
```json
{
  "gin-gonic/gin": {"tech": "Gin", "confidence": 95},
  "gorilla/mux": {"tech": "Gorilla Mux", "confidence": 90},
  "gorm.io/gorm": {"tech": "GORM", "confidence": 90}
}
```
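Applying these mappings is a matter of intersecting a manifest's dependency names with the lookup table. A minimal sketch for the package.json case, using a small subset of the mapping above (`NODE_MAP` and `scan_package_json` are illustrative names):

```python
import json

# Hypothetical subset of the package.json mapping table above
NODE_MAP = {
    "react": {"tech": "React", "confidence": 95},
    "next": {"tech": "Next.js", "confidence": 95},
    "pg": {"tech": "PostgreSQL", "confidence": 90},
}

def scan_package_json(raw: str) -> list:
    """Return mapped technologies found in a package.json string."""
    pkg = json.loads(raw)
    # Runtime and dev dependencies are both informative for stack detection.
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    return [
        {"name": name, "version": version, **NODE_MAP[name]}
        for name, version in deps.items()
        if name in NODE_MAP
    ]

sample = '{"dependencies": {"react": "^18.2.0", "pg": "^8.11.0", "lodash": "^4.17.21"}}'
print(scan_package_json(sample))
```

The same pattern applies to requirements.txt, Gemfile, and go.mod, with a format-specific parser in front of the lookup.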

### 4. detect_ci_configs

Find and analyze CI/CD configuration files.

**CI Config Locations:**
```
.github/workflows/*.yml → GitHub Actions
.gitlab-ci.yml → GitLab CI
Jenkinsfile → Jenkins
.circleci/config.yml → CircleCI
.travis.yml → Travis CI
azure-pipelines.yml → Azure Pipelines
bitbucket-pipelines.yml → Bitbucket Pipelines
.drone.yml → Drone CI
cloudbuild.yaml → Google Cloud Build
buildspec.yml → AWS CodeBuild
```
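Detection reduces to matching repository file paths against this table. A minimal sketch using glob-style matching (the pattern subset and `detect_ci` helper are illustrative):

```python
import fnmatch

# Pattern → platform pairs mirroring part of the location table above
CI_PATTERNS = [
    (".github/workflows/*.yml", "GitHub Actions"),
    (".gitlab-ci.yml", "GitLab CI"),
    ("Jenkinsfile", "Jenkins"),
    (".circleci/config.yml", "CircleCI"),
    (".travis.yml", "Travis CI"),
]

def detect_ci(paths: list) -> set:
    """Return the CI platforms implied by a repository's file paths."""
    return {
        platform
        for path in paths
        for pattern, platform in CI_PATTERNS
        if fnmatch.fnmatch(path, pattern)
    }

print(sorted(detect_ci([".github/workflows/ci.yml", "Jenkinsfile", "src/main.go"])))
# ['GitHub Actions', 'Jenkins']
```

A repo can legitimately match multiple platforms (e.g. during a CI migration), which is why the result is a set rather than a single value.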

**CI Config Analysis:**
```json
{
  "github_actions": {
    "indicates": "GitHub Actions CI/CD",
    "signals": ["Likely GitHub-centric workflow"]
  },
  "gitlab_ci": {
    "indicates": "GitLab CI/CD",
    "signals": ["Self-hosted or GitLab.com"]
  },
  "jenkins": {
    "indicates": "Jenkins",
    "signals": ["Enterprise CI", "Self-hosted"]
  }
}
```

### 5. search_dockerfile

Identify container base images and configuration.

**Dockerfile Analysis:**
```
FROM node:18-alpine → Node.js 18
FROM python:3.11-slim → Python 3.11
FROM golang:1.21 → Go 1.21
FROM ruby:3.2 → Ruby 3.2
FROM openjdk:17 → Java 17
FROM nginx:latest → nginx
FROM postgres:15 → PostgreSQL 15
FROM redis:7 → Redis 7
```
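Extracting base images comes down to parsing `FROM` lines. A minimal sketch (the regex handles plain `image:tag` references; registry hosts with ports, digests, and `--platform` flags would need extra handling):

```python
import re

def base_images(dockerfile: str) -> list:
    """Extract (image, tag) pairs from FROM lines, ignoring build-stage aliases."""
    images = []
    for line in dockerfile.splitlines():
        # Group 1: image name up to the first colon; group 2: optional tag.
        m = re.match(r"\s*FROM\s+([^\s:]+)(?::(\S+))?", line, re.IGNORECASE)
        if m:
            images.append((m.group(1), m.group(2) or "latest"))
    return images

sample = """FROM node:18-alpine AS build
FROM nginx:alpine
"""
print(base_images(sample))  # [('node', '18-alpine'), ('nginx', 'alpine')]
```

Multi-stage builds yield multiple `FROM` lines; the last one usually determines the runtime image, while earlier stages still reveal build-time tooling.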

**Docker Compose Analysis:**
- Service names
- Image references
- Environment variables
- Port mappings

## Output

```json
{
  "skill": "code_repository_intel",
  "domain": "string",
  "results": {
    "organization": {
      "platform": "GitHub|GitLab",
      "name": "string",
      "url": "string",
      "verified": "boolean",
      "public_repos": "number"
    },
    "repositories": [
      {
        "name": "string",
        "url": "string",
        "description": "string",
        "languages": {
          "TypeScript": 65,
          "JavaScript": 25,
          "CSS": 10
        },
        "primary_language": "TypeScript",
        "last_updated": "date",
        "stars": "number"
      }
    ],
    "dependencies_found": {
      "node": [
        {"name": "react", "version": "^18.2.0", "tech": "React"},
        {"name": "next", "version": "13.4.0", "tech": "Next.js"}
      ],
      "python": [],
      "ruby": [],
      "go": []
    },
    "ci_cd": {
      "platform": "GitHub Actions",
      "config_file": ".github/workflows/ci.yml",
      "jobs_detected": ["build", "test", "deploy"]
    },
    "containerization": {
      "uses_docker": true,
      "base_images": ["node:18-alpine", "nginx:alpine"],
      "orchestration": "Kubernetes (k8s manifests found)"
    },
    "technologies_summary": [
      {
        "name": "string",
        "category": "Language|Framework|Database|Tool",
        "confidence": "number",
        "source": "dependency_file|ci_config|dockerfile"
      }
    ]
  },
  "evidence": [
    {
      "type": "repository",
      "url": "string",
      "file": "string",
      "content_sample": "string",
      "timestamp": "ISO-8601"
    }
  ]
}
```

## Rate Limiting

- GitHub API (unauthenticated): 60 requests/hour
- GitHub API (authenticated): 5000 requests/hour
- GitLab API: 300 requests/minute
- Web scraping fallback: 10 requests/minute
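GitHub's REST API reports quota state in the documented `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers, so a client can back off before hitting 403s. A minimal sketch of that check (the `wait_if_throttled` helper is illustrative):

```python
import time

def wait_if_throttled(headers: dict) -> float:
    """Return seconds to sleep based on GitHub rate-limit response headers.

    X-RateLimit-Remaining and X-RateLimit-Reset (a Unix timestamp) are
    documented GitHub REST API headers.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0  # quota left; no need to wait
    reset = int(headers.get("X-RateLimit-Reset", 0))
    return max(0.0, reset - time.time())

print(wait_if_throttled({"X-RateLimit-Remaining": "12"}))  # 0.0
```

A caller would sleep for the returned duration before the next request, which keeps the skill within both authenticated and unauthenticated limits.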

## Error Handling

- 404: Organization/repo doesn't exist or is private
- 403: Rate limited or access forbidden; back off and retry
- Continue with partial results when some requests fail
- Fall back to web scraping if the API is unavailable
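The error-handling rules above can be centralized in a small dispatcher. A minimal sketch (the action names and `classify_response` helper are illustrative, not a fixed interface):

```python
def classify_response(status: int) -> str:
    """Map an HTTP status code to the error-handling action listed above."""
    if 200 <= status < 300:
        return "ok"
    if status == 404:
        return "skip"    # org/repo is private or doesn't exist
    if status == 403:
        return "retry"   # likely rate limited; back off first
    return "partial"     # record what we have and continue

print(classify_response(403))  # retry
```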

## Security Considerations

- Only access public repositories
- Do not clone repositories
- Respect rate limits
- Do not store code content
- Log all API calls for audit

Overview

This skill scans public GitHub and GitLab repositories to map technologies, dependencies, CI configurations, and container usage. It produces a consolidated profile of an organization or repo set to accelerate security research, bug bounty triage, or attack surface discovery. Results emphasize detectable tech with confidence scores and evidence links.

How this skill works

The skill queries public APIs and falls back to web scraping when needed, harvesting repository lists, language statistics, dependency files, CI config files, Dockerfiles, and compose manifests. It parses package manifests (package.json, requirements.txt, go.mod, Gemfile), CI locations (.github/workflows, .gitlab-ci.yml, Jenkinsfile), and Docker base images to infer stacks and runtime versions. Outputs include per-repo language breakdowns, discovered dependencies with mapped technologies, CI platform detection, container base images, and timestamped evidence references.

When to use it

  • During reconnaissance for bug bounty programs or penetration tests to build an inventory of likely targets
  • Before authoring targeted exploits or proofs-of-concept to pick relevant frameworks and versions
  • To prioritize security reviews by identifying repositories with risky or out-of-date dependencies
  • When assessing a vendor or third-party provider’s public footprint and CI posture
  • To prepare scoped, evidence-backed reports for disclosure or audit teams

Best practices

  • Limit scans to public repositories and respect API/web scraping rate limits
  • Use authenticated API tokens for higher GitHub rate limits when authorized
  • Combine dependency detections with CI and Docker findings for more reliable tech inference
  • Treat inferred versions and techs as leads—verify in a controlled environment before exploiting
  • Log and timestamp all findings and store only metadata and evidence links, not raw source code

Example use cases

  • Find a company’s GitHub org, validate it via profile metadata, and list active repos for triage
  • Detect primary languages and frameworks across a repo set to choose tooling and fuzzing targets
  • Parse package.json and requirements.txt to surface popular vulnerable libraries or outdated runtimes
  • Locate CI pipelines (GitHub Actions, GitLab CI, Jenkins) to assess secrets exposure risk in workflows
  • Scan Dockerfiles and compose files to identify base images and potential container-level misconfigurations

FAQ

Does this skill clone repositories or store source code?

No. It inspects only metadata, dependency files, and config files via APIs or limited scraping, and does not clone or persist raw source code.

How does it handle API rate limits?

It uses authenticated API access when available, respects documented limits, falls back to web scraping at a lower rate, and continues with partial results if throttled.

Are private repositories scanned?

No. The skill only accesses public repositories and will report 404/forbidden conditions for private or nonexistent resources.

code_repository_intel skill by transilienceai/communitytools