This skill helps you design, implement, and optimize end-to-end AI systems, including LLM integration, RAG pipelines, and agent-based applications.
```
npx playbooks add skill 404kidwiz/claude-supercode-skills --skill ai-engineer-skill
```
Run the command above to add this skill to your agents.
---
name: ai-engineer
description: Expert in building comprehensive AI systems, integrating LLMs, RAG architectures, and autonomous agents into production applications. Use when building AI-powered features, implementing LLM integrations, designing RAG pipelines, or deploying AI systems.
---
# AI Engineer
## Purpose
Provides expertise in end-to-end AI system development, from LLM integration to production deployment. Covers RAG architectures, embedding strategies, vector databases, prompt engineering, and AI application patterns.
## When to Use
- Building LLM-powered applications or features
- Implementing RAG (Retrieval-Augmented Generation) systems
- Integrating AI APIs (OpenAI, Anthropic, etc.)
- Designing embedding and vector search pipelines
- Building chatbots or conversational AI
- Implementing AI agents with tool use
- Optimizing AI system latency and cost
## Quick Start
**Invoke this skill when:**
- Building LLM-powered applications or features
- Implementing RAG systems with vector databases
- Integrating AI APIs into applications
- Designing embedding and retrieval pipelines
- Building conversational AI or agents
**Do NOT invoke when:**
- Training custom ML models from scratch (use ml-engineer)
- Deploying ML models to production infrastructure (use mlops-engineer)
- Managing multi-agent coordination (use agent-organizer)
- Optimizing LLM serving infrastructure (use llm-architect)
## Decision Framework
```
AI Feature Type:
├── Simple Q&A → Direct LLM API call
├── Knowledge-based answers → RAG pipeline
├── Multi-step reasoning → Chain-of-thought or agents
├── External actions needed → Tool-use agents
├── Real-time data → Streaming + function calling
└── Complex workflows → Multi-agent orchestration
```
## Core Workflows
### 1. RAG Pipeline Implementation
1. Chunk documents with appropriate strategy
2. Generate embeddings using suitable model
3. Store in vector database with metadata
4. Implement semantic search with reranking
5. Construct prompts with retrieved context
6. Add evaluation and monitoring
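A minimal sketch of steps 1–5, assuming the OpenAI Python SDK, an in-memory NumPy index in place of a real vector database, and fixed-size chunking; the model names and chunk sizes are illustrative placeholders, not requirements:
```python
# Minimal RAG sketch: chunk -> embed -> store -> retrieve -> prompt.
# Assumes the OpenAI Python SDK; model names below are illustrative examples.
import numpy as np
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size character chunking with overlap (step 1)."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def embed(texts: list[str]) -> np.ndarray:
    """Generate embeddings for a batch of texts (step 2)."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def build_index(docs: list[str]) -> tuple[list[str], np.ndarray]:
    """In-memory 'vector store' keeping the chunk text alongside vectors (step 3)."""
    chunks = [c for d in docs for c in chunk(d)]
    return chunks, embed(chunks)

def retrieve(query: str, chunks: list[str], vectors: np.ndarray, k: int = 4) -> list[str]:
    """Cosine-similarity search over the index (step 4, before reranking)."""
    q = embed([query])[0]
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str, chunks: list[str], vectors: np.ndarray) -> str:
    """Construct a retrieval-aware prompt and call the LLM (step 5)."""
    context = "\n\n".join(retrieve(query, chunks, vectors))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content
```
A production pipeline would swap the array for a managed vector store, insert a reranking step after retrieval, and log retrieval quality for evaluation (step 6).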
### 2. LLM Integration
1. Select appropriate model for use case
2. Design prompt templates with versioning
3. Implement structured output parsing
4. Add retry logic and fallbacks
5. Monitor token usage and costs
6. Cache responses where appropriate
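A hedged sketch of steps 2–4: a versioned prompt template, JSON-mode structured output, and retries with backoff plus a fallback model. The template, model names, and retry policy are illustrative assumptions:
```python
# Structured output with retries and a fallback model.
# Assumes the OpenAI Python SDK; models and the template are placeholder examples.
import json
import time
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATE_V2 = (  # versioned alongside application code (step 2)
    "Extract the product name and sentiment from this review as JSON "
    'with keys "product" and "sentiment": {review}'
)

def extract(review: str, models: tuple[str, ...] = ("gpt-4o-mini", "gpt-4o"),
            retries: int = 3) -> dict:
    last_error: Exception | None = None
    for model in models:                      # fall back to the next model (step 4)
        for attempt in range(retries):        # retry with exponential backoff (step 4)
            try:
                resp = client.chat.completions.create(
                    model=model,
                    response_format={"type": "json_object"},  # JSON mode (step 3)
                    messages=[{"role": "user",
                               "content": PROMPT_TEMPLATE_V2.format(review=review)}],
                )
                return json.loads(resp.choices[0].message.content)
            except Exception as exc:          # narrow the exception types in real code
                last_error = exc
                time.sleep(2 ** attempt)
    raise RuntimeError("All models and retries exhausted") from last_error
```
Token usage and cache lookups (steps 5–6) would wrap this call site rather than live inside it.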
### 3. AI Agent Development
1. Define agent capabilities and tools
2. Implement tool interfaces with validation
3. Design agent loop with termination conditions
4. Add guardrails and safety checks
5. Implement logging and tracing
6. Test edge cases and failure modes
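One way to wire steps 1–4 together, assuming OpenAI-style function calling; the `get_weather` tool, its stubbed result, and the five-step cap are hypothetical choices for illustration:
```python
# Tool-use agent loop with a hard iteration cap as a termination guard.
# Assumes OpenAI-style function calling; the tool and model are illustrative.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",                 # hypothetical tool (step 1)
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:             # tool interface with validation (step 2)
    if not city.strip():
        raise ValueError("city must be non-empty")
    return json.dumps({"city": city, "temp_c": 21})   # stubbed result for the sketch

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                 # termination condition (step 3)
        resp = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        if not msg.tool_calls:                 # model is done; return its answer
            return msg.content
        messages.append(msg)
        for call in msg.tool_calls:            # execute each requested tool
            args = json.loads(call.function.arguments)
            result = get_weather(**args)
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": result})
    return "Agent stopped: step limit reached"  # guardrail against runaway loops (step 4)
```
A real agent would add structured logging of every model and tool call (step 5) and tests for malformed arguments and tool failures (step 6).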
## Best Practices
- Version prompts alongside application code
- Use structured outputs (JSON mode) for reliability
- Implement semantic caching for common queries
- Add human-in-the-loop for critical decisions
- Monitor hallucination rates and retrieval quality
- Design for graceful degradation when AI fails
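To make the semantic-caching bullet concrete, a small sketch that reuses an earlier answer when a new query embeds close to a cached one; it assumes an `embed()` helper like the one in the RAG sketch above, and the 0.92 threshold is a starting point to tune, not a recommendation:
```python
# Semantic cache: reuse an earlier answer when a new query is close enough.
# Assumes an embed() helper as in the RAG sketch; the 0.92 threshold is an
# illustrative starting point to tune against your own traffic.
import numpy as np

class SemanticCache:
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []   # (query vector, answer)

    def get(self, query_vec: np.ndarray) -> str | None:
        for vec, answer in self.entries:
            sim = float(vec @ query_vec /
                        (np.linalg.norm(vec) * np.linalg.norm(query_vec)))
            if sim >= self.threshold:
                return answer                  # cache hit: skip the LLM call
        return None

    def put(self, query_vec: np.ndarray, answer: str) -> None:
        self.entries.append((query_vec, answer))
```
On each request, embed the query, call `get`, and only hit the model on a miss; a production cache would also bound its size and expire stale entries.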
## Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|--------------|---------|------------------|
| Prompt in code | Hard to iterate and test | Use prompt templates with versioning |
| No evaluation | Unknown quality in production | Implement eval pipelines |
| Synchronous LLM calls | Slow user experience | Use streaming responses |
| Unbounded context | Token limits and cost | Implement context windowing |
| No fallbacks | System fails on API errors | Add retry logic and alternatives |
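As one concrete fix for the "Synchronous LLM calls" row, a streaming sketch using the OpenAI Python SDK; the model name is an example:
```python
# Stream tokens to the user instead of blocking on the full completion.
# Assumes the OpenAI Python SDK; the model name is an example.
from openai import OpenAI

client = OpenAI()

def stream_answer(prompt: str) -> str:
    chunks: list[str] = []
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True,                      # emit deltas as they are generated
    )
    for event in stream:
        delta = event.choices[0].delta.content or ""
        print(delta, end="", flush=True)  # forward to the UI incrementally
        chunks.append(delta)
    return "".join(chunks)
```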
This skill is an expert guide for designing and shipping end-to-end AI systems that integrate LLMs, RAG pipelines, and autonomous agents into production applications. It focuses on practical implementation: embeddings, vector databases, prompt engineering, tool-enabled agents, and operational concerns like latency, cost, and monitoring. Use it to choose architectures and apply tested patterns for reliable AI features.
The skill inspects your feature goals and recommends an appropriate architecture: direct LLM calls, RAG retrieval, agent-based tooling, or multi-step orchestration. It provides concrete workflows for chunking and embedding documents, storing and searching vectors, constructing retrieval-aware prompts, and building agent loops with safety checks. It also covers integration patterns: model selection, structured outputs, retries, caching, and observability.
**When should I choose RAG over a direct LLM call?**
Use RAG when answers must be grounded in a changing or large corpus (documents, manuals, product data). Direct calls work for generic Q&A or creative generation without external knowledge needs.
**How do I reduce latency and cost for user-facing AI features?**
Use smaller or distilled models for latency-sensitive paths, implement semantic and response caching, stream outputs when possible, and limit retrieved context to the top-K semantically relevant items with reranking.
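For the reranking mentioned above, a small cross-encoder sketch that narrows a wide candidate set down to the final top-K; it assumes the sentence-transformers library, and the checkpoint name is an illustrative choice:
```python
# Rerank a wider candidate set down to a small top-K before prompting.
# Assumes the sentence-transformers library; the checkpoint is illustrative.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], k: int = 4) -> list[str]:
    scores = reranker.predict([(query, c) for c in candidates])   # relevance per pair
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:k]]
```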