home / skills / transilienceai / communitytools / ai-threat-testing

ai-threat-testing skill

needs review

/projects/pentest/.claude/skills/ai-threat-testing

This skill helps you assess AI systems for OWASP LLM vulnerabilities by orchestrating targeted tests and generating professional PoC reports.

npx playbooks add skill transilienceai/communitytools --skill ai-threat-testing

Review the files below or copy the command above to add this skill to your agents.

Files (13)

SKILL.md

3.3 KB

---
name: ai-threat-testing
description: Offensive AI security testing and exploitation framework. Systematically tests LLM applications for OWASP Top 10 vulnerabilities including prompt injection, model extraction, data poisoning, and supply chain attacks. Integrates with pentest workflows to discover and exploit AI-specific threats.
---

# AI Threat Testing

Test LLM applications for OWASP LLM Top 10 vulnerabilities using 10 specialized agents. Use for authorized AI security assessments.

## Quick Start

```
1. Specify target (LLM app URL, API endpoint, or local model)
2. Select scope: Full OWASP Top 10 | Specific vulnerability | Supply chain
3. Agents deploy, test, capture evidence
4. Professional report with PoCs generated
```

## Primary Agents

Each agent targets one OWASP LLM vulnerability:

1. **Prompt Injection** (LLM01): Direct/indirect injection, system prompt extraction
2. **Output Handling** (LLM02): Code/XSS injection, unsafe deserialization
3. **Training Poisoning** (LLM03): Membership inference, backdoors, data extraction
4. **Resource Exhaustion** (LLM04): Token flooding, DoS, cost impact
5. **Supply Chain** (LLM05): Dependency scanning, plugin security
6. **Excessive Agency** (LLM06): Privilege escalation, unauthorized actions
7. **Model Extraction** (LLM07): Query-based theft, data reconstruction
8. **Vector Poisoning** (LLM08): RAG injection, retrieval manipulation
9. **Overreliance** (LLM09): Hallucination testing, confidence manipulation
10. **Logging Bypass** (LLM10): Monitoring evasion, forensic gaps

See `agents/llm0X-*.md` for agent details.

## Workflows

**Full Assessment** (4-8 hours):
```
- [ ] Reconnaissance
- [ ] Deploy all 10 agents
- [ ] Execute exploits
- [ ] Capture evidence
- [ ] Generate report
```

**Focused Testing** (1-3 hours):
```
- [ ] Select vulnerability (LLM01-10)
- [ ] Deploy agent
- [ ] Execute techniques
- [ ] Document findings
```

**Supply Chain Audit** (2-4 hours):
```
- [ ] Inventory dependencies
- [ ] Scan CVEs
- [ ] Test plugins/APIs
- [ ] Verify model provenance
```

## Integration

Enhances `/pentest` with AI-specific testing:
- Traditional pentesting + AI threat testing = complete security assessment
- Chain vulnerabilities across traditional and AI vectors
- Unified reporting with CVSS scores

## Key Techniques

**Prompt Injection**: Instruction override, system prompt extraction, filter evasion
**Model Extraction**: Query sampling, token analysis, membership inference
**Data Poisoning**: Behavioral anomalies, backdoor triggers, bias analysis
**DoS**: Token flooding, recursive expansion, context exhaustion
**Supply Chain**: CVE scanning, plugin audit, model verification

## Evidence Capture

All agents collect: screenshots, network logs, API responses, errors, console output, execution metrics.

## Reporting

Automated reports include: executive summary, detailed findings (CVSS scores), PoC scripts, evidence, remediation guidance.

## Critical Rules

- Written authorization REQUIRED before testing
- Never exceed defined scope
- Test in isolated environments when possible
- Document all findings with reproducible PoCs
- Follow responsible disclosure practices

## Integration

- Integrates with `/pentest` skill for comprehensive security testing
- AI-specific vulnerability knowledge in `/AGENTS.md`
- Agent definitions in `agents/llm0X-*.md`

Overview

This skill provides an offensive AI security testing and exploitation framework for systematically assessing LLM applications against the OWASP LLM Top 10. It deploys specialized agents to discover prompt injection, model extraction, data poisoning, supply chain flaws, and other AI-specific threats. Use it to generate evidence-backed findings and actionable remediation for authorized assessments.

How this skill works

You specify a target (LLM app URL, API endpoint, or local model) and a scope (full OWASP Top 10, a specific vulnerability, or supply chain audit). Ten purpose-built agents execute targeted techniques, capture evidence (screenshots, logs, API responses) and produce reproducible proofs-of-concept. Results are combined into a professional report with CVSS-style severity, remediation recommendations, and PoC scripts. Integrations allow inclusion in existing pentest workflows and unified reporting.

When to use it

Authorized penetration tests of LLM applications and chatbots
Bug bounty investigations focused on AI-specific vulnerabilities
Pre-release security assessments for models, plugins, and integrations
Supply chain and dependency audits for model components
Incident validation when AI components may have been compromised

Best practices

Obtain written authorization before any testing and strictly adhere to defined scope
Run tests in isolated or staging environments when possible to avoid impacting production
Prioritize high-risk vectors (prompt injection, model extraction, RAG/vector poisoning) early in assessments
Document all steps and collect reproducible PoCs, logs, and network captures for disclosure
Combine AI-specific tests with traditional pentest activities to trace chained vulnerabilities

Example use cases

Full assessment: deploy all 10 agents to generate a comprehensive OWASP LLM Top 10 report
Focused test: run the Prompt Injection agent against a conversational UI to verify system prompt leaks
Model extraction check: evaluate query-based data leakage and estimate model theft risk
Supply chain audit: inventory dependencies, scan for CVEs, and test plugin/plugin API security
RAG/Vector test: simulate malicious retrieval inputs to assess retrieval-augmented-generation poisoning

FAQ

Is written authorization required to use this skill?

Yes. Written authorization is required before testing and you must not exceed the agreed scope.

Can this be integrated into existing pentest workflows?

Yes. It integrates with professional pentest workflows and can augment traditional testing with AI-specific agents and unified reporting.

What evidence does the skill collect?

Agents collect screenshots, network logs, API responses, console output, and execution metrics to support reproducible PoCs and remediation guidance.