This skill guides building production-grade GenAI and agentic systems with robust evaluation, advanced RAG, and scalable MLOps.
Add this skill to your agents with: `npx playbooks add skill kienhaminh/anti-chaotic --skill ai-engineer`
---
name: ai-engineer
description: Use when building production-grade GenAI, Agentic Systems, Advanced RAG, or setting up rigorous Evaluation pipelines.
license: MIT
metadata:
  version: "2.0"
---
# AI Engineering Standards
This skill provides guidelines for building production-grade GenAI, Agentic Systems, Advanced RAG, and rigorous Evaluation pipelines. Focus on robustness and scalability, and on engineering reliability into inherently stochastic systems.
## Core Responsibilities
1. **Agentic Systems & Architecture**: Designing multi-agent workflows, planning capabilities, and reliable tool-use patterns.
2. **Advanced RAG & Retrieval**: Implementing hybrid search, query expansion, re-ranking, and knowledge graphs.
3. **Evaluation & Reliability (Evals)**: Setting up rigorous evaluation pipelines (LLM-as-a-judge), regression testing, and guardrails.
4. **Model Integration & Optimization**: Function calling, structured outputs, prompt engineering, and choosing the right model for the task (latency vs. intelligence trade-offs); see the structured-output sketch after this list.
5. **MLOps & Serving**: Observability, tracing, caching, and cost management.
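As an illustration of the structured-output discipline in item 4, here is a minimal sketch of schema-validated generation with a retry loop. It assumes pydantic v2; `call_llm`, the `TicketTriage` schema, and the retry budget are illustrative placeholders, not part of this skill.

```python
import json

from pydantic import BaseModel, ValidationError


class TicketTriage(BaseModel):
    """Illustrative output schema for a support-ticket triage task."""
    category: str
    priority: int
    summary: str


def call_llm(prompt: str) -> str:
    """Placeholder for your model client (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError


def triage_with_retry(ticket_text: str, max_attempts: int = 3) -> TicketTriage:
    prompt = (
        "Classify the support ticket below. Respond with only JSON matching "
        f"this schema: {json.dumps(TicketTriage.model_json_schema())}\n\n"
        f"Ticket: {ticket_text}"
    )
    last_error: ValidationError | None = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            # Parse and validate in one step; rejects malformed or off-schema output.
            return TicketTriage.model_validate_json(raw)
        except ValidationError as err:
            last_error = err
            # Feed the validation error back so the model can self-correct.
            prompt += (
                f"\n\nYour previous output failed validation: {err}. "
                "Return only valid JSON."
            )
    raise RuntimeError(f"No valid output after {max_attempts} attempts: {last_error}")
```

Validating at the boundary and retrying with the error message is one of the simplest ways to turn a stochastic generator into a dependable component; native structured-output or function-calling modes, where your provider offers them, tighten this loop further.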
## Dynamic Stack Loading
- **Agentic Patterns**: [Principles for reliable agents](references/agentic-patterns.md)
- **Advanced RAG**: [Techniques for high-recall retrieval](references/rag-advanced.md)
- **Evaluation Frameworks**: [Testing & Metrics](references/evaluation.md)
- **Serving & Optimization**: [Performance & MLOps](references/serving-optimization.md)
- **LLM Fundamentals**: [Prompting & SDKs](references/llm.md)
This skill codifies engineering standards for building production-grade generative AI, agentic systems, advanced retrieval-augmented generation (RAG), and rigorous evaluation pipelines. It focuses on robustness, scalability, and operational reliability for inherently stochastic systems. The guidance helps teams move from prototypes to repeatable, observable production services.
It covers system design across five core areas: multi-agent orchestration, advanced retrieval and knowledge integration, evaluation and regression testing, model integration and optimization, and production serving with MLOps practices. The skill maps common failure modes to practical mitigations such as re-ranking, hybrid search, structured outputs, traceable prompts, and observability hooks, and it provides modular patterns so teams can load only the capabilities they need for a given project.
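As one concrete example of those mitigations, below is a minimal sketch of hybrid search via reciprocal rank fusion (RRF). The document IDs and the `k = 60` constant are illustrative, and the two input rankings stand in for whatever lexical and vector retrievers you run.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs (best first) into one ranking.

    RRF combines retrievers without comparing their raw scores, which makes it
    a common choice for merging BM25 and dense (vector) results in hybrid RAG.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: a keyword (BM25) ranking fused with a vector-search ranking.
bm25_hits = ["doc_7", "doc_2", "doc_9"]
dense_hits = ["doc_2", "doc_4", "doc_7"]
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # doc_2 and doc_7 rank highest
```

A re-ranker (e.g., a cross-encoder) would then rescore the fused top-k before the context is handed to the generator.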
**How do I choose between latency-optimized and accuracy-optimized models?**
Base the choice on task requirements: use smaller, faster models under interactive latency constraints, and larger models or cascaded rerankers when higher accuracy justifies the added cost and delay.
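A minimal sketch of the cascade idea, assuming a caller-supplied model client and confidence check; the model names and the 0.8 threshold are placeholders, not a specific vendor API.

```python
from typing import Callable

FAST_MODEL = "small-fast-model"      # placeholder: low latency, lower accuracy
STRONG_MODEL = "large-strong-model"  # placeholder: higher accuracy, higher cost


def cascade(
    prompt: str,
    call_model: Callable[[str, str], str],
    confidence: Callable[[str], float],
) -> str:
    """Answer with the fast model; escalate to the strong model only when a
    cheap confidence check (a heuristic or an LLM judge) rejects the draft."""
    draft = call_model(FAST_MODEL, prompt)
    if confidence(draft) >= 0.8:  # illustrative threshold, tune per task
        return draft
    return call_model(STRONG_MODEL, prompt)
```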
**What are the first observability metrics to add?**
Start with request latency, token consumption, success/failure rates, and semantic regression checks against baseline outputs.
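A minimal sketch of collecting those first-pass metrics around a model call; the in-memory `metrics` dict stands in for a real tracing or metrics backend, and the `total_tokens` attribute is an assumption to adapt to your SDK's response shape.

```python
import time
from typing import Any, Callable


def observe(call_model: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a model-calling function to record latency, token usage, and
    success/failure counts. Swap the dict for your metrics backend."""
    metrics = {"calls": 0, "failures": 0, "total_latency_s": 0.0, "total_tokens": 0}

    def wrapped(prompt: str, **kwargs: Any) -> Any:
        metrics["calls"] += 1
        start = time.perf_counter()
        try:
            response = call_model(prompt, **kwargs)
        except Exception:
            metrics["failures"] += 1
            raise
        finally:
            metrics["total_latency_s"] += time.perf_counter() - start
        # Assumption: the response exposes a total_tokens count; adapt as needed.
        metrics["total_tokens"] += getattr(response, "total_tokens", 0)
        return response

    wrapped.metrics = metrics  # expose counters for dashboards or tests
    return wrapped
```

Semantic regression checks build on this: store baseline outputs for a fixed prompt set and alert when new responses drift past an embedding-similarity or judge-score threshold.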