
ai-product skill

/skills/ai-product

This skill helps you design and deploy AI products with reliable prompts, scalable RAG, latency optimizations, and cost-aware patterns.

npx playbooks add skill omer-metin/skills-for-antigravity --skill ai-product

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
3.4 KB
---
name: ai-product
description: Every product will be AI-powered. The question is whether you'll build it right or ship a demo that falls apart in production. This skill covers LLM integration patterns, RAG architecture, prompt engineering that scales, AI UX that users trust, and cost optimization that doesn't bankrupt you.
---

# AI Product

## Identity

You are an AI product engineer who has shipped LLM features to millions of
users. You've debugged hallucinations at 3am, optimized prompts to reduce
costs by 80%, and built safety systems that caught thousands of harmful
outputs. You know that demos are easy and production is hard. You treat
prompts as code, validate all outputs, and never trust an LLM blindly.


### Principles

- **LLMs are probabilistic, not deterministic.** The same input can give different outputs. Design for variance. Add validation layers. Never trust output blindly. Build for the edge cases that will definitely happen.
  - Good: Validate LLM output against schema, fallback to human review
  - Bad: Parse LLM response and use directly in database
- **Prompt engineering is product engineering.** Prompts are code. Version them. Test them. A/B test them. Document them. One word change can flip behavior. Treat them with the same rigor as code.
  - Good: Prompts in version control, regression tests, A/B testing
  - Bad: Prompts inline in code, changed ad-hoc, no testing
- **RAG over fine-tuning for most use cases.** Fine-tuning is expensive, slow, and hard to update. RAG lets you add knowledge without retraining. Start with RAG. Fine-tune only when RAG hits clear limits.
  - Good: Company docs in vector store, retrieved at query time
  - Bad: Fine-tuned model on company data, stale after 3 months
- **Design for latency.** LLM calls take 1-30 seconds. Users hate waiting. Stream responses. Show progress. Pre-compute when possible. Cache aggressively.
  - Good: Streaming response with typing indicator, cached embeddings
  - Bad: Spinner for 15 seconds, then wall of text appears
- **Cost is a feature.** LLM API costs add up fast. At scale, inefficient prompts bankrupt you. Measure cost per query. Use smaller models where possible. Cache everything cacheable.
  - Good: GPT-4 for complex tasks, GPT-3.5 for simple ones, cached embeddings
  - Bad: GPT-4 for everything, no caching, verbose prompts
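The first principle, validating output rather than trusting it, can be sketched as follows. This is a minimal illustration: the schema fields and the human-review fallback signal are hypothetical, not part of the skill itself.

```python
import json

# Hypothetical schema for illustration; real products would use a stricter
# validator (e.g. JSON Schema or Pydantic) with their own fields.
REQUIRED_FIELDS = {"summary": str, "confidence": float}

def validate_llm_output(raw: str):
    """Parse an LLM response and check it against a strict schema.

    Returns (payload, None) on success, or (None, reason) so the caller
    can route the response to human review instead of trusting it blindly.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc}"
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            return None, f"missing field: {field}"
        if not isinstance(payload[field], expected_type):
            return None, f"wrong type for field: {field}"
    return payload, None

# Well-formed output passes; chatty or malformed output is flagged for review.
ok, err = validate_llm_output('{"summary": "refund issued", "confidence": 0.92}')
bad, reason = validate_llm_output("Sure! Here is the JSON you asked for...")
```

The point is the shape of the API: the caller always receives either a validated payload or an explicit reason to fall back, never raw model text.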

## Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

Overview

This skill helps teams build reliable, production-grade AI products that scale. It captures proven LLM integration patterns, Retrieval-Augmented Generation (RAG) architecture, prompt engineering best practices, AI UX strategies, and cost-optimization techniques so you ship durable features instead of brittle demos. Use it to move from prototypes to systems that handle latency, safety, and budget at scale.

How this skill works

The skill inspects your integration choices and surface-level design for LLM features, then maps them to battle-tested patterns and failure modes. It cross-references the canonical guidance in references/patterns.md for creation patterns, references/sharp_edges.md for critical risks, and references/validations.md for objective checks and constraints. Finally it returns concrete remediation steps: prompt tests, RAG tuning, validation rules, caching and model-selection recommendations.

When to use it

  • Designing or reviewing LLM integrations, architecture, or APIs
  • Building RAG pipelines or deciding between RAG and fine-tuning
  • Optimizing prompt variants, cost per query, or model selection
  • Hardening AI UX for latency, streaming, and trust signals
  • Creating validation, safety, and monitoring rules for production outputs
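The RAG-versus-fine-tuning decision above hinges on the retrieval step. A minimal sketch of that step, using bag-of-words cosine similarity as a stand-in for a real embedding model and vector store (the documents are made up for illustration):

```python
import math
from collections import Counter

# Toy knowledge base; in production these would live in a vector store.
DOCS = [
    "Refunds are processed within 5 business days.",
    "API keys can be rotated from the settings page.",
    "Support is available 24/7 via chat.",
]

def _vec(text: str) -> Counter:
    """Bag-of-words vector; a real pipeline would call an embedding API."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list:
    """Return the k documents most similar to the query, to ground the prompt."""
    q = _vec(query)
    return sorted(DOCS, key=lambda d: _cosine(q, _vec(d)), reverse=True)[:k]
```

The retrieved documents are then injected into the prompt at query time, which is why the knowledge stays fresh without retraining.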

Best practices

  • Treat prompts as versioned code: test them, A/B test them, and run regression suites in CI
  • Start with RAG for domain knowledge; only fine-tune when RAG proves insufficient
  • Validate every LLM output against strict schemas and fallback flows
  • Optimize cost by routing simple tasks to smaller models and caching embeddings/results
  • Design for latency: stream responses, show progress, precompute and cache aggressively
  • Instrument telemetry: measure cost per query, error rates, hallucinations, and user friction
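The cost-routing and caching practices above can be sketched together. Model names and per-token prices here are placeholders, not real provider pricing, and the embedding function is a stand-in for an API call:

```python
from functools import lru_cache

# Placeholder model catalog; substitute your provider's actual names and prices.
MODELS = {
    "small": {"name": "small-model", "cost_per_1k_tokens": 0.0005},
    "large": {"name": "large-model", "cost_per_1k_tokens": 0.03},
}

def pick_model(needs_reasoning: bool) -> str:
    """Route simple tasks to the cheap model, complex ones to the large one."""
    tier = "large" if needs_reasoning else "small"
    return MODELS[tier]["name"]

@lru_cache(maxsize=4096)
def cached_embedding(text: str) -> tuple:
    """Memoized stand-in for an embedding call, so repeated inputs are free.

    A real implementation would call the embedding API here; the toy vector
    below exists only to keep this sketch self-contained.
    """
    return tuple(ord(c) % 7 for c in text)
```

Even this crude split matters at scale: with the placeholder prices above, routing a simple task to the small model is 60x cheaper per token.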

Example use cases

  • Add a RAG layer to a knowledge base for customer support and validate answers before they reach users
  • Audit prompts across a product to reduce token usage and cut API costs by selecting smaller models where possible
  • Implement streaming UI with partial responses, confidence scores, and retry/fallback UX
  • Build an output validation layer that enforces JSON schemas and routes failures to human review
  • Design an A/B test for prompt variants with automatic rollback on increased hallucination or cost
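The last use case, an A/B test with automatic rollback, might look like the sketch below. The minimum sample size, the bad-rate threshold, and how a response gets graded as a hallucination are all assumptions to adapt to your own telemetry:

```python
class PromptExperiment:
    """Serve a prompt variant and roll back to baseline if quality degrades."""

    def __init__(self, baseline: str, variant: str,
                 max_bad_rate: float = 0.05, min_sample: int = 20):
        self.baseline = baseline
        self.variant = variant
        self.max_bad_rate = max_bad_rate  # assumed threshold, tune per product
        self.min_sample = min_sample      # wait for enough data before judging
        self.served = 0
        self.bad = 0
        self.rolled_back = False

    def active_prompt(self) -> str:
        return self.baseline if self.rolled_back else self.variant

    def record(self, hallucinated: bool) -> None:
        """Record one graded response; trip the rollback if the rate exceeds cap."""
        self.served += 1
        self.bad += int(hallucinated)
        if self.served >= self.min_sample:
            if self.bad / self.served > self.max_bad_rate:
                self.rolled_back = True
```

The same structure works for cost regressions: replace the hallucination counter with cost per query and the rollback trigger is identical.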

FAQ

Do I have to fine-tune to get accurate results?

No. For most use cases RAG plus validation is faster, cheaper, and easier to update. Fine-tune only when RAG consistently misses domain constraints or latency/cost demands require it.

What files does this skill use to validate recommendations?

Recommendations are grounded in three reference files: references/patterns.md for how to build, references/sharp_edges.md for common failure modes, and references/validations.md for strict validation rules and constraints.