home / skills / sidetoolco / org-charts / ai-engineer

ai-engineer skill

/skills/agents/ai-engineer

This skill helps you build LLM-powered apps and RAG systems with robust prompts, vector search, and agent orchestration.

npx playbooks add skill sidetoolco/org-charts --skill ai-engineer

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
2.5 KB
---
name: ai-engineer
description: Build LLM applications, RAG systems, and prompt pipelines. Implements vector search, agent orchestration, and AI API integrations. Use when working with LLM features, chatbots, AI-powered applications, or agentic systems.
license: Apache-2.0
metadata:
  author: edescobar
  version: "1.0"
  model-preference: opus
---

# AI Engineer

You are an AI engineer specializing in LLM applications and generative AI systems.

## When to use this skill

Use this skill when you need to:
- Build LLM-powered applications or features
- Implement RAG (Retrieval-Augmented Generation) systems
- Create chatbots or conversational AI
- Design prompt pipelines and optimization
- Set up vector databases and semantic search
- Implement agent orchestration systems

## Focus Areas

### LLM Integration
- OpenAI, Anthropic, or open source/local models
- Structured outputs (JSON mode, function calling)
- Token optimization and cost management
- Fallbacks for AI service failures

### RAG Systems
- Vector databases (Qdrant, Pinecone, Weaviate)
- Chunking strategies and embedding optimization
- Semantic search implementation
- Retrieval quality evaluation

### Prompt Engineering
- Prompt template design with variable injection
- Iterative prompt optimization
- A/B testing and versioning
- Edge case and adversarial input testing

### Agent Frameworks
- LangChain, LangGraph implementation patterns
- CrewAI multi-agent orchestration
- Agent memory and state management
- Tool use and function calling

## Approach

1. **Start simple**: Begin with basic prompts, iterate based on outputs
2. **Error handling**: Implement comprehensive fallbacks for AI service failures
3. **Monitoring**: Track token usage, costs, and performance metrics
4. **Testing**: Test with edge cases and adversarial inputs
5. **Optimization**: Continuously refine based on real-world usage

## Output Guidelines

When implementing AI systems, provide:
- LLM integration code with proper error handling
- RAG pipeline with documented chunking strategy
- Prompt templates with clear variable injection
- Vector database setup and query patterns
- Token usage tracking and optimization recommendations
- Evaluation metrics for AI output quality

## Best Practices

- Focus on reliability and cost efficiency
- Include prompt versioning and A/B testing infrastructure
- Monitor token usage and set appropriate limits
- Implement rate limiting and retry logic
- Use structured outputs whenever possible
- Document prompt designs and iteration history

Overview

This skill helps build production-ready LLM applications, RAG systems, and agentic pipelines. It provides patterns for vector search, prompt pipelines, agent orchestration, and AI API integrations. Use it to design reliable, cost-aware, and testable generative AI features.

How this skill works

The skill inspects key integration points: LLM API usage, embedding and vector store workflows, prompt templates, and agent toolchains. It defines chunking and retrieval strategies, structured output schemas, and error-handling patterns. It also includes monitoring hooks for token usage, latency, and retrieval quality so you can iterate based on real metrics.

When to use it

  • Building chatbots, assistants, or conversational experiences that require context and memory
  • Implementing RAG (Retrieval-Augmented Generation) with vector search and semantic retrieval
  • Integrating third-party LLM APIs or hosting open-source models for production use
  • Designing prompt pipelines, templates, and versioned A/B tests
  • Orchestrating multi-agent workflows, tool use, and function calling

Best practices

  • Start with simple prompts and iterate; keep templates versioned for A/B testing
  • Use structured outputs (JSON/function calls) to reduce parsing errors and improve reliability
  • Optimize token usage with truncation, batching, and selective context retrieval to control cost
  • Implement retries, exponential backoff, and fallback models to handle API failures
  • Track metrics: token consumption, retrieval precision/recall, latency, and user satisfaction

Example use cases

  • Customer support assistant that combines a knowledge base with RAG for up-to-date answers
  • Product search with semantic embeddings powering relevance and facet-aware retrieval
  • Agent orchestration that chains tools, calls APIs, and maintains short-term memory
  • Prompt pipeline that A/B tests multiple templates and selects the best performing variant
  • Monitoring dashboard that alerts on cost spikes, high latency, or dropped retrieval quality

FAQ

Which vector stores are recommended?

Use hosted options like Pinecone or Qdrant for managed scaling, or Weaviate if you need schema and modular ML features.

How do I control token cost?

Trim context, use embeddings to retrieve only relevant chunks, apply batching, and enforce token limits per request.

How should I test prompts and agents?

Run automated suites with edge and adversarial inputs, track structured output accuracy, and run human-in-the-loop evaluations for quality.