This skill detects duplicate files across a workspace by content hashes and reports deduplication opportunities to save space.
```
npx playbooks add skill sounder25/google-antigravity-skills-library --skill 14_detect_duplicate_files
```
---
name: Detect Duplicate Files
description: Identify duplicate files across the workspace using SHA256 hashing to reduce redundancy and confusion.
version: 1.0.0
author: Antigravity Skills Library
created: 2026-01-16
leverage_score: 2/5
---
# SKILL-014: Detect Duplicate Files
## Overview
Scans the workspace for identical files (by content, not name) to detect redundancy, copy-paste errors, or accidental forks. Generates a report suggesting deduplication actions.
## Trigger Phrases
- `find duplicates`
- `check for duplicate files`
- `scan redundancy`
## Inputs
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `--workspace-path` | string | No | Current directory | Root to scan |
| `--min-size` | int | No | 0 | Minimum file size in bytes to check |
| `--exclude` | string[] | No | `node_modules`, `.git`, `bin`, `obj` | Directories to ignore |
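The `--exclude` and `--min-size` inputs are naturally applied while walking the directory tree, before any hashing happens. The skill's actual script is PowerShell and is not shown here; the following is an equivalent sketch in Python, with illustrative names:

```python
import os

# Defaults mirroring the inputs table above.
EXCLUDED_DIRS = {"node_modules", ".git", "bin", "obj"}
MIN_SIZE = 0  # bytes; corresponds to --min-size

def eligible_files(root, excluded=EXCLUDED_DIRS, min_size=MIN_SIZE):
    """Yield file paths under root, skipping excluded directories and small files."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) >= min_size:
                yield path
```

Pruning `dirnames` in place is what makes the exclusion cheap: excluded trees such as `node_modules` are never traversed at all, rather than being walked and then filtered out.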
## Outputs
### 1. DUPLICATE_REPORT.md
Summary of found duplicates:
```markdown
# Duplicate File Report
**Total Duplicates:** 12
**Wasted Space:** 4.5 MB
## Group 1 (Hash: a1b2...)
- `src/utils/math.ts` (Original?)
- `src/legacy/math_copy.ts`
## Group 2 (Hash: c3d4...)
- `config/settings.json`
- `deploy/settings.prod.json`
```
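The **Wasted Space** figure in the report follows from the grouping: every file beyond the first in a group is redundant, so each group wastes `size × (count − 1)` bytes. A minimal sketch of that arithmetic in Python (the helper name and data shape are illustrative, not part of the skill):

```python
def wasted_bytes(groups):
    """groups: mapping of content hash -> list of (path, size_in_bytes) tuples.

    Files with identical content have identical size, so each duplicate
    group wastes size * (count - 1) bytes.
    """
    total = 0
    for files in groups.values():
        if len(files) < 2:
            continue  # unique content wastes nothing
        size = files[0][1]
        total += size * (len(files) - 1)
    return total

# Example: one pair of 1024-byte duplicates wastes 1024 bytes.
wasted_bytes({"a1b2": [("src/utils/math.ts", 1024),
                       ("src/legacy/math_copy.ts", 1024)]})
```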
## Implementation
### Script: find_duplicates.ps1
1. Recurses through the workspace directory (respecting excludes).
2. Computes the SHA256 hash of each file at or above `--min-size`.
3. Groups files by hash.
4. Discards groups with fewer than two files (unique content).
5. Generates the Markdown report.
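These steps form a standard hash-and-group pipeline. Since the PowerShell script itself is not included on this page, here is an equivalent sketch in Python; the function names are assumptions for illustration:

```python
import hashlib
import os
from collections import defaultdict

def sha256_of(path, chunk_size=65536):
    """Hash the file in chunks so large assets aren't loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root, excluded=("node_modules", ".git", "bin", "obj"), min_size=0):
    """Return {hash: [paths]} for content that appears in two or more files."""
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Respect excludes by pruning the walk.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) < min_size:
                continue
            groups[sha256_of(path)].append(path)  # hash, then group by digest
    # Keep only groups with at least two files.
    return {h: paths for h, paths in groups.items() if len(paths) >= 2}
```

The final reporting step would then render each surviving group under a `## Group N (Hash: …)` heading, as in the sample report above.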
## Use Cases
1. **Cleanup:** Reducing repo size by removing accidental copies of large assets.
2. **Refactoring:** Finding code that was copy-pasted instead of shared.
## FAQ
**How does it decide which file is the original?**
The script only groups identical content by hash; it does not mark originals. The report lists the files in each group so you can manually choose the canonical copy.

**Can I exclude directories or change the minimum size?**
Yes. Pass `--exclude` to add directories and `--min-size` to ignore small files. The defaults already exclude `node_modules`, `.git`, `bin`, and `obj`.