This skill analyzes AI security testing methodologies to help you identify vulnerabilities, prioritize threats, and create actionable remediation plans.
npx playbooks add skill pluginagentmarketplace/custom-plugin-ai-red-teaming --skill testing-methodologies
---
name: testing-methodologies
version: "2.0.0"
description: Structured approaches for AI security testing including threat modeling, penetration testing, and red team operations
sasmp_version: "1.3.0"
bonded_agent: 04-llm-vulnerability-analyst
bond_type: PRIMARY_BOND
# Schema Definitions
input_schema:
  type: object
  required: [methodology]
  properties:
    methodology:
      type: string
      enum: [stride, pasta, attack_tree, kill_chain, mitre_atlas]
    scope:
      type: object
      properties:
        target_type:
          type: string
          enum: [llm, ml_model, pipeline, api, infrastructure]
        depth:
          type: string
          enum: [reconnaissance, assessment, exploitation, full]
output_schema:
  type: object
  properties:
    threat_model:
      type: object
    attack_paths:
      type: array
    test_plan:
      type: object
    findings:
      type: array
# Framework Mappings
owasp_llm_2025: [LLM01, LLM02, LLM03, LLM04, LLM05, LLM06, LLM07, LLM08, LLM09, LLM10]
nist_ai_rmf: [Map, Measure]
mitre_atlas: [AML.T0000, AML.T0001, AML.T0002]
---
# AI Security Testing Methodologies
Apply **systematic testing approaches** to identify and validate vulnerabilities in AI/ML systems.
## Quick Reference
```yaml
Skill: testing-methodologies
Agent: 04-llm-vulnerability-analyst
OWASP: Full LLM Top 10 Coverage
NIST: Map, Measure
MITRE: ATLAS Techniques
Use Case: Structured security assessment
```
## Testing Lifecycle
```
┌────────────────────────────────────────────────────────────────┐
│                 AI SECURITY TESTING LIFECYCLE                  │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  [1. Scope] → [2. Threat Model] → [3. Test Plan]               │
│       ↓                                ↓                       │
│  [6. Report] ← [5. Analysis] ← [4. Execution]                  │
│       ↓                                                        │
│  [7. Remediation Tracking]                                     │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```
## Threat Modeling
### STRIDE for AI Systems
```python
class AIThreatModel:
    """STRIDE threat modeling for AI systems."""

    # Each STRIDE category lists AI-specific threats, the related
    # OWASP LLM Top 10 entries, and candidate mitigations.
    STRIDE_AI = {
        "Spoofing": {
            "threats": [
                "Adversarial examples mimicking valid inputs",
                "Impersonation via prompt injection",
                "Fake identity in multi-turn conversations"
            ],
            "owasp": ["LLM01"],
            "mitigations": ["Input validation", "Anomaly detection"]
        },
        "Tampering": {
            "threats": [
                "Training data poisoning",
                "Model weight modification",
                "RAG knowledge base manipulation"
            ],
            "owasp": ["LLM03", "LLM04"],
            "mitigations": ["Data integrity checks", "Access control"]
        },
        "Repudiation": {
            "threats": [
                "Untraceable model decisions",
                "Missing audit logs",
                "Prompt manipulation without logging"
            ],
            "owasp": ["LLM06"],
            "mitigations": ["Comprehensive logging", "Decision audit trail"]
        },
        "Information Disclosure": {
            "threats": [
                "Training data extraction",
                "System prompt leakage",
                "Membership inference attacks"
            ],
            "owasp": ["LLM02", "LLM07"],
            "mitigations": ["Output filtering", "Differential privacy"]
        },
        "Denial of Service": {
            "threats": [
                "Resource exhaustion attacks",
                "Token flooding",
                "Model degradation"
            ],
            "owasp": ["LLM10"],
            "mitigations": ["Rate limiting", "Resource quotas"]
        },
        "Elevation of Privilege": {
            "threats": [
                "Jailbreaking safety guardrails",
                "Prompt injection for unauthorized actions",
                "Agent tool abuse"
            ],
            "owasp": ["LLM01", "LLM05", "LLM06"],
            "mitigations": ["Guardrails", "Permission boundaries"]
        }
    }

    def analyze(self, system_description):
        """Generate threat model for AI system."""
        # _check_applicability, _calculate_risk, and ThreatModel are
        # assumed to be defined elsewhere; they are not part of this skill.
        threats = []
        for category, details in self.STRIDE_AI.items():
            for threat in details["threats"]:
                applicable = self._check_applicability(
                    threat, system_description
                )
                if applicable:
                    threats.append({
                        "category": category,
                        "threat": threat,
                        "owasp_mapping": details["owasp"],
                        "mitigations": details["mitigations"],
                        "risk_score": self._calculate_risk(threat)
                    })
        return ThreatModel(threats=threats)
```
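The `analyze` method relies on an applicability check and a `ThreatModel` container that the skill does not define. The sketch below is one possible shape for them, not the skill's actual implementation: the `COMPONENT_CATEGORIES` vocabulary, the `applicable_categories` helper, and the `by_category` method are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary: which STRIDE categories each system component exposes.
COMPONENT_CATEGORIES = {
    "rag": {"Tampering", "Information Disclosure"},
    "agent_tools": {"Elevation of Privilege", "Repudiation"},
    "public_api": {"Spoofing", "Denial of Service"},
    "fine_tuning": {"Tampering"},
}


@dataclass
class ThreatModel:
    """Minimal container for the threat dicts built by analyze()."""
    threats: list = field(default_factory=list)

    def by_category(self):
        """Group threat descriptions under their STRIDE category."""
        grouped = {}
        for t in self.threats:
            grouped.setdefault(t["category"], []).append(t["threat"])
        return grouped


def applicable_categories(components):
    """Return the STRIDE categories exposed by the listed components."""
    categories = set()
    for component in components:
        # Unknown components contribute nothing rather than raising.
        categories |= COMPONENT_CATEGORIES.get(component, set())
    return categories
```

Under this assumption, a threat is applicable when its STRIDE category appears in `applicable_categories(system_description["components"])`; a real assessment would likely need a finer-grained rule per threat.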
### Attack Tree Construction
```yaml
Attack Tree: Compromise AI System
├── 1. Manipulate Model Behavior
│   ├── 1.1 Prompt Injection
│   │   ├── 1.1.1 Direct Injection (via user input)
│   │   ├── 1.1.2 Indirect Injection (via external data)
│   │   └── 1.1.3 Multi-turn Manipulation
│   ├── 1.2 Jailbreaking
│   │   ├── 1.2.1 Authority Exploits
│   │   ├── 1.2.2 Roleplay/Hypothetical
│   │   └── 1.2.3 Encoding Bypass
│   └── 1.3 Adversarial Inputs
│       ├── 1.3.1 Edge Cases
│       └── 1.3.2 Out-of-Distribution
│
├── 2. Extract Information
│   ├── 2.1 Training Data Extraction
│   │   ├── 2.1.1 Membership Inference
│   │   └── 2.1.2 Model Inversion
│   ├── 2.2 System Prompt Disclosure
│   │   ├── 2.2.1 Direct Query
│   │   └── 2.2.2 Inference via Behavior
│   └── 2.3 Model Theft
│       ├── 2.3.1 Query-based Extraction
│       └── 2.3.2 Distillation Attack
│
└── 3. Disrupt Operations
    ├── 3.1 Resource Exhaustion
    │   ├── 3.1.1 Token Flooding
    │   └── 3.1.2 Complex Query Spam
    └── 3.2 Supply Chain Attack
        ├── 3.2.1 Poisoned Dependencies
        └── 3.2.2 Compromised Plugins
```
```python
class AttackTreeBuilder:
    """Build and analyze attack trees for AI systems."""

    def build_tree(self, root_goal, system_context):
        """Construct attack tree for given goal."""
        root = AttackNode(goal=root_goal, type="OR")
        # Add child attack vectors
        attack_vectors = self._identify_vectors(root_goal, system_context)
        for vector in attack_vectors:
            child = AttackNode(
                goal=vector.goal,
                type=vector.combination_type,
                difficulty=vector.difficulty,
                detectability=vector.detectability
            )
            root.add_child(child)
            # Recursively add sub-attacks
            if vector.has_sub_attacks:
                self._expand_node(child, system_context)
        return AttackTree(root=root)

    def calculate_path_risk(self, tree):
        """Calculate risk score for each attack path."""
        paths = self._enumerate_paths(tree.root)
        scored_paths = []
        for path in paths:
            # Risk = Likelihood × Impact
            likelihood = self._calculate_likelihood(path)
            impact = self._calculate_impact(path)
            risk = likelihood * impact
            scored_paths.append({
                "path": [n.goal for n in path],
                "likelihood": likelihood,
                "impact": impact,
                "risk": risk
            })
        # Highest-risk paths first
        return sorted(scored_paths, key=lambda x: x["risk"], reverse=True)
```
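The builder refers to `AttackNode` and to path enumeration without defining either. One way to fill that gap is sketched here, assuming each leaf carries an estimated success probability and that sibling attacks are statistically independent; both are simplifying assumptions, and the `likelihood` field, `node_likelihood`, and `enumerate_paths` are hypothetical names rather than part of the skill.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class AttackNode:
    goal: str
    type: str = "OR"         # "OR": any child suffices; "AND": all children required
    likelihood: float = 0.0  # estimated success probability; read only on leaves
    children: list = field(default_factory=list)

    def add_child(self, child):
        self.children.append(child)
        return child


def node_likelihood(node):
    """Propagate leaf probabilities up the tree, assuming independence."""
    if not node.children:
        return node.likelihood
    child_probs = [node_likelihood(c) for c in node.children]
    if node.type == "AND":
        # Every sub-attack must succeed.
        return prod(child_probs)
    # OR: the goal fails only if every sub-attack fails.
    return 1 - prod(1 - p for p in child_probs)


def enumerate_paths(node, prefix=()):
    """Return every root-to-leaf path as a tuple of nodes."""
    path = prefix + (node,)
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(enumerate_paths(child, path))
    return paths
```

Multiplying the propagated likelihood by an impact estimate gives the `Likelihood × Impact` score used in `calculate_path_risk`.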
## Testing Phases
### Phase 1: Reconnaissance
```python
class AIReconnaissance:
    """Gather information about target AI system."""

    def enumerate(self, target):
        results = {
            "model_info": self._fingerprint_model(target),
            "api_endpoints": self._discover_endpoints(target),
            "input_constraints": self._probe_constraints(target),
            "response_patterns": self._analyze_responses(target),
            "error_behaviors": self._trigger_errors(target)
        }
        return ReconReport(results)

    def _fingerprint_model(self, target):
        """Identify model type and characteristics."""
        probes = [
            "What model are you?",
            "What is your knowledge cutoff?",
            "Who created you?",
            "Complete: The quick brown fox",
        ]
        responses = [target.query(p) for p in probes]
        return self._analyze_fingerprint(responses)

    def _probe_constraints(self, target):
        """Discover input validation rules."""
        constraints = {"max_length": 0}
        # Length limits: record the largest accepted size and the first rejected one.
        for length in [100, 1000, 5000, 10000, 50000]:
            try:
                target.query("a" * length)
                constraints["max_length"] = length
            except Exception:
                # The true limit lies between the last accepted size and this one.
                constraints["first_rejected_length"] = length
                break
        # Token limits
        constraints["token_behavior"] = self._test_token_limits(target)
        # Rate limits
        constraints["rate_limits"] = self._test_rate_limits(target)
        return constraints
```
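The length probe tests only five fixed sizes, so it brackets the limit rather than locating it. If acceptance is monotonic (every input shorter than an accepted one is also accepted, which is an assumption and not guaranteed for every target), a bisection narrows the bracket to the exact boundary in a handful of queries. In this sketch `accepts` is a hypothetical callable, for example a wrapper that returns `False` when `target.query("a" * n)` raises.

```python
def find_max_length(accepts, low=1, high=50_000):
    """Return the largest n in [low, high] with accepts(n) True, or 0 if none.

    Assumes acceptance is monotonic: once a length is rejected,
    every longer length is rejected too.
    """
    if not accepts(low):
        return 0
    if accepts(high):
        return high
    # Invariant: accepts(low) is True and accepts(high) is False.
    while high - low > 1:
        mid = (low + high) // 2
        if accepts(mid):
            low = mid
        else:
            high = mid
    return low
```

With the default bounds this needs at most about 18 queries, which matters when the target enforces rate limits.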
### Phase 2: Vulnerability Assessment
```yaml
Assessment Matrix:
  Input Handling:
    tests:
      - Prompt injection variants
      - Encoding bypass
      - Boundary testing
      - Format fuzzing
    owasp: [LLM01]
  Output Safety:
    tests:
      - Harmful content generation
      - Toxicity evaluation
      - Bias testing
      - PII in responses
    owasp: [LLM05, LLM07]
  Model Robustness:
    tests:
      - Adversarial examples
      - Out-of-distribution inputs
      - Edge case handling
    owasp: [LLM04, LLM09]
  Access Control:
    tests:
      - Authentication bypass
      - Authorization escalation
      - Rate limit bypass
    owasp: [LLM06, LLM10]
  Data Security:
    tests:
      - Training data extraction
      - System prompt disclosure
      - Configuration leakage
    owasp: [LLM02, LLM07]
```
### Phase 3: Exploitation
```python
from datetime import datetime, timezone


class ExploitationPhase:
    """Develop and execute proof-of-concept exploits."""

    def develop_poc(self, vulnerability):
        """Create proof-of-concept for vulnerability."""
        poc = ProofOfConcept(
            vulnerability=vulnerability,
            payload=self._craft_payload(vulnerability),
            success_criteria=self._define_success(vulnerability),
            impact_assessment=self._assess_impact(vulnerability)
        )
        return poc

    def execute(self, poc, target):
        """Execute proof-of-concept against target."""
        # Pre-execution logging
        execution_id = self._log_execution_start(poc)
        try:
            # Execute with safety controls
            response = target.query(poc.payload)
            # Verify success
            success = poc.verify_success(response)
            # Document evidence with a timezone-aware UTC timestamp
            evidence = Evidence(
                payload=poc.payload,
                response=response,
                success=success,
                timestamp=datetime.now(timezone.utc)
            )
            return ExploitResult(
                execution_id=execution_id,
                success=success,
                evidence=evidence,
                impact=poc.impact_assessment if success else None
            )
        finally:
            # Always close the log entry, even if the query raised
            self._log_execution_end(execution_id)
```
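`poc.verify_success(response)` is left undefined, and the quality of a finding depends on how objectively success is judged. One common approach, sketched here as an assumption rather than the skill's own method, is to plant a unique canary string in the material under test (for example the system prompt of a staging deployment) and treat its appearance in the response as proof of disclosure. `make_canary_check` is a hypothetical helper.

```python
import re


def make_canary_check(canary):
    """Build a success predicate that fires when the canary appears in a response."""
    # Escape the canary so it is matched literally; ignore case because
    # models often re-case text they reproduce.
    pattern = re.compile(re.escape(canary), re.IGNORECASE)

    def verify_success(response_text):
        return bool(pattern.search(response_text))

    return verify_success
```

A literal match keeps false positives low but misses paraphrased or encoded leaks, so it is best treated as a lower bound on disclosure.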
### Phase 4: Reporting
```python
class SecurityReportGenerator:
    """Generate comprehensive security assessment reports."""

    # Markdown skeleton; placeholders are filled via str.format().
    REPORT_TEMPLATE = """
## Executive Summary
{executive_summary}
## Scope
{scope_description}
## Methodology
{methodology_used}
## Findings Summary
| Severity | Count |
|----------|-------|
| Critical | {critical_count} |
| High | {high_count} |
| Medium | {medium_count} |
| Low | {low_count} |
## Detailed Findings
{detailed_findings}
## Remediation Roadmap
{remediation_plan}
## Appendix
{appendix}
"""

    def generate(self, assessment_results):
        """Generate full security report."""
        findings = self._format_findings(assessment_results.findings)
        return Report(
            executive_summary=self._write_executive_summary(assessment_results),
            scope=assessment_results.scope,
            methodology=assessment_results.methodology,
            findings=findings,
            remediation=self._create_remediation_plan(findings),
            appendix=self._compile_appendix(assessment_results)
        )
```
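The template's `{critical_count}` through `{low_count}` placeholders need per-severity tallies, which `generate` does not compute. A minimal sketch of that step follows, assuming each finding is a dict with a `"severity"` key (a hypothetical shape) and reproducing only the summary table from the template.

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")

# Mirrors the "Findings Summary" table of REPORT_TEMPLATE.
SUMMARY_TEMPLATE = (
    "| Severity | Count |\n"
    "|----------|-------|\n"
    "| Critical | {critical_count} |\n"
    "| High | {high_count} |\n"
    "| Medium | {medium_count} |\n"
    "| Low | {low_count} |\n"
)


def severity_counts(findings):
    """Tally findings per severity; severities with no findings count as 0."""
    counts = Counter(f["severity"].lower() for f in findings)
    return {f"{s}_count": counts.get(s, 0) for s in SEVERITIES}


def render_summary(findings):
    """Fill the summary table with the tallied counts."""
    return SUMMARY_TEMPLATE.format(**severity_counts(findings))
```

Returning zero for absent severities avoids a `KeyError` from `str.format` when an assessment produces, say, no critical findings.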
## MITRE ATLAS Integration
```yaml
# IDs and names follow the MITRE ATLAS matrix; recent releases rename
# several "ML" techniques to "AI", so confirm against the current version.
ATLAS Technique Mapping:
  Reconnaissance:
    - AML.T0000: Search for Victim's Publicly Available Research Materials
    - AML.T0006: Active Scanning
  Resource Development:
    - AML.T0008: Acquire Infrastructure
    - AML.T0017: Develop Capabilities
  Initial Access:
    - AML.T0010: ML Supply Chain Compromise
    - AML.T0051: LLM Prompt Injection
  ML Model Access:
    - AML.T0040: ML Model Inference API Access
  Impact:
    - AML.T0029: Denial of ML Service
    - AML.T0015: Evade ML Model
```
## Test Metrics
```yaml
Coverage Metrics:
  attack_vector_coverage: "% of OWASP LLM Top 10 tested"
  technique_coverage: "% of MITRE ATLAS techniques tested"
  code_coverage: "% of AI pipeline tested"
Effectiveness Metrics:
  vulnerability_density: "Issues per 1000 queries"
  attack_success_rate: "% of successful attacks"
  false_positive_rate: "% incorrect vulnerability flags"
Efficiency Metrics:
  time_to_detect: "Average detection time"
  remediation_velocity: "Days from discovery to fix"
  test_throughput: "Tests per hour"
```
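Three of these metrics reduce to simple ratios. The sketch below shows one way to compute them; the function names match the metric keys above, but the signatures and input shapes are assumptions made for illustration.

```python
def attack_vector_coverage(tested_ids, all_ids):
    """Percentage of known categories (e.g. OWASP LLM Top 10 IDs) that were tested."""
    universe = set(all_ids)
    if not universe:
        raise ValueError("all_ids must not be empty")
    # IDs outside the universe are ignored rather than inflating coverage.
    return 100.0 * len(set(tested_ids) & universe) / len(universe)


def attack_success_rate(outcomes):
    """Percentage of attack attempts that succeeded; outcomes is a list of bools."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)


def vulnerability_density(issue_count, query_count):
    """Confirmed issues per 1000 queries sent to the target."""
    if query_count <= 0:
        raise ValueError("query_count must be positive")
    return 1000.0 * issue_count / query_count
```

For example, testing `LLM01` and `LLM02` out of the ten OWASP LLM categories gives a coverage of 20%, and 3 issues found across 1500 queries gives a density of 2 per 1000.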
## Severity Classification
```yaml
CRITICAL (CVSS 9.0-10.0):
  - Remote code execution via prompt
  - Complete training data extraction
  - Full model theft
  - Authentication bypass
HIGH (CVSS 7.0-8.9):
  - Successful jailbreak
  - Significant data leakage
  - Harmful content generation
  - Privilege escalation
MEDIUM (CVSS 4.0-6.9):
  - Partial information disclosure
  - Rate limit bypass
  - Bias in specific scenarios
  - Minor guardrail bypass
LOW (CVSS 0.1-3.9):
  - Information leakage (non-sensitive)
  - Minor configuration issues
  - Edge case failures
```
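The bands above can be applied mechanically once a finding has a CVSS base score. A small sketch, where `severity_band` is a hypothetical helper and the `NONE` label for a score of 0.0 follows the CVSS qualitative scale rather than anything stated in this skill:

```python
def severity_band(score):
    """Map a CVSS base score (0.0-10.0) to the severity labels used above."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "NONE"
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"
```

CVSS scores are reported to one decimal place, so the thresholds line up with the band edges without gaps.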
## Troubleshooting
```yaml
- Issue: Incomplete threat model
  Solution: Use multiple frameworks (STRIDE + PASTA + Attack Trees)
- Issue: Missing attack vectors
  Solution: Cross-reference with OWASP LLM Top 10, MITRE ATLAS
- Issue: Inconsistent test results
  Solution: Standardize test environment, increase sample size
- Issue: Unclear risk prioritization
  Solution: Use CVSS scoring, consider business context
```
## Integration Points
| Component | Purpose |
|-----------|---------|
| Agent 04 | Methodology execution |
| Agent 01 | Threat intelligence |
| /analyze | Threat analysis |
| JIRA/GitHub | Issue tracking |
---
**Apply systematic methodologies for thorough AI security testing.**
This skill provides structured approaches for AI security testing, covering threat modeling, vulnerability assessment, exploitation proofs, and reporting. It codifies lifecycle phases and mappings to OWASP LLM Top 10 and MITRE ATLAS to support repeatable, measurable assessments.
The methodology follows a lifecycle: scope definition, threat modeling (STRIDE + attack trees), test planning, execution (recon, vulnerability assessment, exploitation), analysis, and reporting with remediation tracking. It includes techniques for fingerprinting models, probing input constraints, building attack trees, developing proof-of-concept exploits, and generating prioritized security reports.
How do I prioritize findings from AI security tests?
Prioritize by combining severity (CVSS-style bands), exploitability (likelihood), and business impact; use attack path risk scores and remediation velocity to sequence fixes.
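As a rough illustration of combining those three factors, the sketch below multiplies normalized severity, likelihood, and business impact into a single ranking score. The scales (CVSS 0 to 10, likelihood 0 to 1, business impact 1 to 5), the multiplicative form, and the dict keys are all assumptions chosen for the example; teams often weight the factors differently.

```python
def priority_score(cvss, likelihood, business_impact):
    """Combine severity, exploitability, and business impact into a 0-1 score.

    Assumed scales: cvss 0-10, likelihood 0-1, business_impact 1-5.
    """
    if not (0 <= cvss <= 10 and 0 <= likelihood <= 1 and 1 <= business_impact <= 5):
        raise ValueError("input outside its assumed scale")
    return (cvss / 10) * likelihood * (business_impact / 5)


def rank_findings(findings):
    """Sort findings so the highest-priority one comes first."""
    return sorted(
        findings,
        key=lambda f: priority_score(f["cvss"], f["likelihood"], f["business_impact"]),
        reverse=True,
    )
```

A multiplicative score lets a likely, high-impact finding with a moderate CVSS outrank a critical-severity issue that is hard to exploit and touches a low-value asset.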
Which frameworks does this methodology map to?
It maps to STRIDE for threat modeling, OWASP LLM Top 10 for vulnerability categories, and MITRE ATLAS for adversarial ML technique alignment.