This skill helps you design and optimize prompts for large language models, covering reasoning techniques, few-shot learning, and production prompt management.
Add it to your agents with `npx playbooks add skill 404kidwiz/claude-supercode-skills --skill prompt-engineer-skill`.
---
name: prompt-engineer
description: Expert in designing, optimizing, and evaluating prompts for Large Language Models. Specializes in Chain-of-Thought, ReAct, few-shot learning, and production prompt management. Use when crafting prompts, optimizing LLM outputs, or building prompt systems. Triggers include "prompt engineering", "prompt optimization", "chain of thought", "few-shot", "prompt template", "LLM prompting".
---
# Prompt Engineer
## Purpose
Provides expertise in designing, optimizing, and evaluating prompts for Large Language Models. Specializes in prompting techniques like Chain-of-Thought, ReAct, and few-shot learning, as well as production prompt management and evaluation.
## When to Use
- Designing prompts for LLM applications
- Optimizing prompt performance
- Implementing Chain-of-Thought reasoning
- Creating few-shot examples
- Building prompt templates
- Evaluating prompt effectiveness
- Managing prompts in production
- Reducing hallucinations through prompting
## Quick Start
**Invoke this skill when:**
- Crafting prompts for LLM applications
- Optimizing existing prompts
- Implementing advanced prompting techniques
- Building prompt management systems
- Evaluating prompt quality
**Do NOT invoke when:**
- LLM system architecture → use `/llm-architect`
- RAG implementation → use `/ai-engineer`
- NLP model training → use `/nlp-engineer`
- Agent performance monitoring → use `/performance-monitor`
## Decision Framework
```
Prompting Technique?
├── Reasoning Tasks
│   ├── Step-by-step → Chain-of-Thought
│   └── Tool use → ReAct
├── Classification/Extraction
│   ├── Clear categories → Zero-shot + examples
│   └── Complex → Few-shot with edge cases
├── Generation
│   └── Structured output → JSON mode + schema
└── Consistency
    └── System prompt + temperature tuning
```
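As a rough sketch, the tree above can be expressed as a small selection helper; the attribute names and return values are assumptions made for illustration, not a fixed API.
```python
# Minimal sketch: map task attributes to a prompting technique,
# mirroring the decision tree above. All names are illustrative.
def recommend_technique(task_type: str, needs_tools: bool = False,
                        complex_labels: bool = False,
                        structured_output: bool = False) -> str:
    if task_type == "reasoning":
        return "ReAct" if needs_tools else "Chain-of-Thought"
    if task_type == "classification":
        return "Few-shot with edge cases" if complex_labels else "Zero-shot + examples"
    if task_type == "generation":
        return "JSON mode + schema" if structured_output else "Plain instruction prompt"
    if task_type == "consistency":
        return "System prompt + temperature tuning"
    return "Start with a clear zero-shot instruction and iterate"

print(recommend_technique("reasoning", needs_tools=True))  # -> ReAct
```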
## Core Workflows
### 1. Prompt Design
1. Define task clearly
2. Choose prompting technique
3. Write system prompt with context
4. Add examples if few-shot
5. Specify output format
6. Test with diverse inputs
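A minimal sketch of steps 3 to 5, assuming a generic chat-style message list; the classification task, schema, and few-shot pairs are invented for illustration.
```python
# Sketch: system prompt with context, few-shot examples, and an explicit
# output schema. The task and examples are illustrative only.
import json

SYSTEM_PROMPT = (
    "You are a support-ticket classifier. Return only valid JSON matching "
    '{"category": "billing|bug|feature_request", "confidence": <0.0-1.0>}.'
)

FEW_SHOT = [
    ("I was charged twice this month.", '{"category": "billing", "confidence": 0.95}'),
    ("The export button crashes the app.", '{"category": "bug", "confidence": 0.9}'),
]

def build_messages(ticket_text: str) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for user_text, assistant_json in FEW_SHOT:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_json})
    messages.append({"role": "user", "content": ticket_text})
    return messages

print(json.dumps(build_messages("Please add dark mode."), indent=2))
```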
### 2. Chain-of-Thought Implementation
1. Identify reasoning requirements
2. Add "Let's think step by step" or equivalent
3. Provide reasoning examples
4. Structure expected reasoning steps
5. Test reasoning quality
6. Iterate on step guidance
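A minimal sketch of a Chain-of-Thought prompt with one worked reasoning example; the question is invented and `call_llm` is a placeholder for whichever client you actually use.
```python
# Sketch: Chain-of-Thought prompt with a worked example and a structured
# "Reasoning / Answer" layout. The LLM call is a stub.
COT_PROMPT = """Answer the question. Think step by step, then give the final answer.

Example:
Question: A store sells pens in packs of 12. How many packs are needed so 30 students get one pen each?
Reasoning: 30 pens are needed. 30 / 12 = 2.5, and packs cannot be split, so round up to 3.
Answer: 3

Question: {question}
Reasoning:"""

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your model client here.
    return "(model output)"

print(call_llm(COT_PROMPT.format(question="A bus holds 40 people. How many buses for 130 people?")))
```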
### 3. Prompt Optimization
1. Establish baseline metrics
2. Identify failure patterns
3. Adjust instructions for clarity
4. Add/modify examples
5. Tune output constraints
6. Measure improvement
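One way to make steps 1 and 6 concrete is to score each prompt variant on a small labelled test set; the exact-match metric and `call_llm` stub below are illustrative only.
```python
# Sketch: compare a candidate prompt against the baseline on a labelled
# set using exact-match accuracy. The LLM call is a stub.
TEST_SET = [
    {"input": "I was charged twice.", "expected": "billing"},
    {"input": "App crashes on launch.", "expected": "bug"},
]

def call_llm(prompt: str) -> str:
    return "billing"  # placeholder response

def accuracy(prompt_template: str) -> float:
    hits = 0
    for case in TEST_SET:
        output = call_llm(prompt_template.format(text=case["input"])).strip().lower()
        hits += int(output == case["expected"])
    return hits / len(TEST_SET)

baseline = accuracy("Classify this ticket: {text}")
candidate = accuracy(
    "Classify this ticket as billing, bug, or feature_request. "
    "Reply with the label only.\nTicket: {text}"
)
print(f"baseline={baseline:.2f} candidate={candidate:.2f}")
```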
## Best Practices
- Be specific and explicit in instructions
- Use structured output formats (JSON, XML)
- Include examples for complex tasks
- Test with edge cases and adversarial inputs
- Version control prompts
- Measure and track prompt performance
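As an example of the structured-output and testing practices above, a quick format-compliance check can catch responses that are not valid JSON before they reach downstream code; the field names are illustrative.
```python
# Sketch: verify a model response parses as JSON and contains the
# expected keys. Field names are illustrative.
import json

REQUIRED_KEYS = {"category", "confidence"}

def is_format_compliant(raw_response: str) -> bool:
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and REQUIRED_KEYS.issubset(data)

print(is_format_compliant('{"category": "bug", "confidence": 0.9}'))  # True
print(is_format_compliant("It looks like a bug."))                    # False
```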
## Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|--------------|---------|------------------|
| Vague instructions | Inconsistent output | Be specific and explicit |
| No examples | Poor performance on complex tasks | Add few-shot examples |
| Unstructured output | Hard to parse | Specify format clearly |
| No testing | Unknown failure modes | Test diverse inputs |
| Prompt in code | Hard to iterate | Separate prompt management |
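To illustrate the last row above: keeping prompts in version-controlled data files rather than inline strings makes them easy to diff and iterate on. The file path and keys below are assumptions for this sketch.
```python
# Sketch: load a versioned prompt template from a JSON file instead of
# hard-coding it. The path and structure are illustrative.
import json
from pathlib import Path

PROMPTS_FILE = Path("prompts/classifier.json")  # e.g. {"version": "1.2", "template": "..."}

def load_prompt(path: Path = PROMPTS_FILE) -> dict:
    with path.open() as f:
        prompt = json.load(f)
    # Record the version with each request so outputs can be traced back
    # to the exact prompt that produced them.
    print(f"using prompt version {prompt['version']}")
    return prompt

# Usage (assuming the file exists):
# template = load_prompt()["template"].format(text="I was charged twice.")
```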
This skill provides expert guidance for designing, optimizing, and evaluating prompts for Large Language Models. It focuses on advanced techniques like Chain-of-Thought, ReAct, and few-shot learning, plus practical production prompt management. Use it to improve reliability, reduce hallucinations, and scale prompt systems in real applications.
The skill inspects a task and recommends an appropriate prompting technique, whether step-by-step reasoning prompts, tool-invocation patterns, or few-shot example selection. It produces concrete prompt templates, example sets, output schemas, and an iterative testing plan, and it suggests evaluation metrics along with adjustments to temperature, format constraints, and prompt versioning for production use.
**When should I use Chain-of-Thought versus few-shot?**
Use Chain-of-Thought for tasks that require multi-step reasoning and transparency; use few-shot when demonstrating desired output formats or handling complex classification with representative examples.
**How do I reduce hallucinations without losing creativity?**
Constrain outputs with explicit instructions and schemas, provide evidence or source-checking steps, and tune temperature or sampling while preserving guiding examples that show safe creativity.
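One hedged way to apply this: instruct the model to answer only from supplied context and to say when the context is insufficient, then sample at a low temperature. The wording below is just one possible phrasing.
```python
# Sketch: grounding prompt that requires quoting the supplied context and
# admitting when the answer is not supported by it.
GROUNDED_PROMPT = """Use ONLY the context below to answer.
If the context does not contain the answer, reply exactly: "Not found in context."
Quote the sentence from the context that supports your answer.

Context:
{context}

Question: {question}
"""

print(GROUNDED_PROMPT.format(
    context="The warranty covers hardware defects for 24 months.",
    question="How long is the warranty?",
))
# Pair this with a low temperature (roughly 0-0.3) for factual steps, keeping
# higher temperatures for the genuinely creative parts of a pipeline.
```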
**What metrics should I track for prompt optimization?**
Track task-specific accuracy, format compliance rate (e.g., valid JSON), hallucination frequency, response length distribution, and latency or token cost for production monitoring.
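A minimal sketch of computing a couple of these metrics over a batch of responses; format compliance here is just JSON-parse success, and response length stands in for token cost.
```python
# Sketch: batch metrics for production monitoring: JSON-parse rate and
# response-length distribution. Sample responses are illustrative.
import json
import statistics

def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

responses = [
    '{"category": "bug", "confidence": 0.9}',
    "Sorry, I cannot classify this.",
    '{"category": "billing", "confidence": 0.8}',
]

format_compliance = sum(is_valid_json(r) for r in responses) / len(responses)
lengths = [len(r) for r in responses]

print(f"format compliance: {format_compliance:.0%}")
print(f"mean length: {statistics.mean(lengths):.1f} chars (min {min(lengths)}, max {max(lengths)})")
```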