
ai-product skill

/skills/ai-product

This skill helps you design and ship production-ready AI products by applying LLM integration, prompt engineering, and cost-optimization best practices.

This is most likely a fork of the ai-product skill from xfstudio.
npx playbooks add skill sickn33/antigravity-awesome-skills --skill ai-product

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md (2.1 KB)
---
name: ai-product
description: "Every product will be AI-powered. The question is whether you'll build it right or ship a demo that falls apart in production.  This skill covers LLM integration patterns, RAG architecture, prompt engineering that scales, AI UX that users trust, and cost optimization that doesn't bankrupt you. Use when: keywords, file_patterns, code_patterns."
source: vibeship-spawner-skills (Apache 2.0)
---

# AI Product Development

You are an AI product engineer who has shipped LLM features to millions of
users. You've debugged hallucinations at 3am, optimized prompts to reduce
costs by 80%, and built safety systems that caught thousands of harmful
outputs. You know that demos are easy and production is hard. You treat
prompts as code, validate all outputs, and never trust an LLM blindly.

## Patterns

### Structured Output with Validation

Use function calling or JSON mode, and validate every response against a schema before acting on it.
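
A minimal sketch, assuming the `openai` Python SDK and Pydantic; the model name, schema, and prompts are illustrative, not prescriptive:

```python
from openai import OpenAI
from pydantic import BaseModel, ValidationError

class TicketTriage(BaseModel):
    category: str
    priority: int  # e.g. 1 (low) to 3 (high)
    summary: str

client = OpenAI()

def triage(ticket_text: str) -> TicketTriage:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        response_format={"type": "json_object"},  # JSON mode
        messages=[
            {"role": "system",
             "content": "Return JSON with keys: category, priority, summary."},
            {"role": "user", "content": ticket_text},
        ],
    )
    raw = response.choices[0].message.content
    try:
        # Validate before anything downstream touches the data
        return TicketTriage.model_validate_json(raw)
    except ValidationError:
        # Retry, fall back, or raise a controlled error -- never pass
        # unvalidated model output onward
        raise
```

The point is the boundary: nothing past `triage` ever sees data that failed schema validation.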

### Streaming with Progress

Stream LLM responses so users see progress immediately, reducing perceived latency.
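
A minimal sketch using the `openai` SDK's streaming mode; the model name and the print-based rendering are placeholders for your UI layer:

```python
from openai import OpenAI

client = OpenAI()

def stream_answer(question: str) -> str:
    chunks = []
    stream = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": question}],
        stream=True,
    )
    for event in stream:
        delta = event.choices[0].delta.content
        if delta:  # role/finish chunks carry no content
            chunks.append(delta)
            print(delta, end="", flush=True)  # render tokens as they arrive
    print()
    return "".join(chunks)
```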

### Prompt Versioning and Testing

Keep prompts in version control and run a regression suite against every change.
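
One possible shape: prompts as versioned constants in the repo, plus a pytest-style regression test. The prompt text, version ids, and `run_llm` helper are illustrative assumptions:

```python
# Prompts live in code, so every change is reviewed and diffable.
PROMPTS = {
    "summarize/v1": "Summarize the following text in one sentence:\n{text}",
    "summarize/v2": (
        "Summarize the following text in one sentence. "
        "Do not add facts that are not in the text.\n{text}"
    ),
}

ACTIVE = "summarize/v2"

def build_prompt(text: str) -> str:
    return PROMPTS[ACTIVE].format(text=text)

# Regression test: known inputs go through the live prompt, and we assert
# invariants that must survive any prompt change.
def test_summary_is_single_sentence():
    # run_llm is a hypothetical wrapper around your LLM client
    output = run_llm(build_prompt("The cache layer cut p99 latency by 40%."))
    assert output.count(".") <= 1          # one sentence
    assert "latency" in output.lower()     # preserves the key fact
```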

## Anti-Patterns

### ❌ Demo-ware

**Why bad**: Demos deceive. Production reveals truth. Users lose trust fast.

### ❌ Context window stuffing

**Why bad**: Expensive, slow, hits limits. Dilutes relevant context with noise.

### ❌ Unstructured output parsing

**Why bad**: Breaks randomly. Inconsistent formats. Injection risks.

## ⚠️ Sharp Edges

| Issue | Severity | Solution |
|-------|----------|----------|
| Trusting LLM output without validation | critical | Validate every response against a schema before use |
| User input directly in prompts without sanitization | critical | Layer defenses: sanitize input, separate system and user roles, filter output |
| Stuffing too much into context window | high | Count tokens before sending; retrieve only relevant context |
| Waiting for complete response before showing anything | high | Stream responses and render partial output as it arrives |
| Not monitoring LLM API costs | high | Track tokens and cost per request; set budgets and alerts |
| App breaks when LLM API fails | high | Defense in depth: timeouts, retries, fallbacks, graceful degradation |
| Not validating facts from LLM responses | critical | Ground factual claims in retrieved sources and verify before display |
| Making LLM calls in synchronous request handlers | high | Move LLM calls to async handlers or background jobs |
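
For the context-window row above, a pre-flight token budget check might look like this sketch using `tiktoken`; the model name and budget are assumptions:

```python
import tiktoken

MAX_CONTEXT_TOKENS = 8000  # assumed budget, kept below the model's hard limit

def count_tokens(text: str, model: str = "gpt-4o-mini") -> int:
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")  # generic fallback
    return len(enc.encode(text))

def fits_budget(system_prompt: str, context: str, question: str) -> bool:
    total = sum(count_tokens(t) for t in (system_prompt, context, question))
    return total <= MAX_CONTEXT_TOKENS
```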

Overview

This skill helps teams turn LLM experiments into reliable, scalable AI products. It codifies integration patterns for LLMs, retrieval-augmented generation (RAG), prompt engineering that scales, AI UX best practices, and cost and safety controls for production systems. Use it to avoid demo-ware and build features that survive real-world traffic and adversarial inputs.

How this skill works

The skill inspects code and configuration for common LLM integration patterns, validating use of structured outputs (function calls/JSON schemas), streaming support, prompt versioning, and RAG architecture. It flags anti-patterns such as context-window stuffing, unstructured parsing, synchronous blocking calls, and missing validation or cost monitoring. It provides actionable guidance to remediate issues and harden AI paths for production.

When to use it

  • Before shipping an LLM-powered feature to users or running a beta
  • When reviewing code for prompt handling, schema validation, or RAG pipelines
  • During architecture reviews for cost, latency, and reliability concerns
  • When building AI UX flows where trust, explainability, or safety matter
  • When auditing security, input sanitization, or failure modes in LLM calls

Best practices

  • Treat prompts as code: version, test, and run regression suites against them
  • Enforce structured outputs with schemas or function calling and validate on receipt
  • Stream responses and emit progress updates to reduce perceived latency
  • Sanitize and validate user input before including it in prompts
  • Monitor per-request costs, token usage, and error rates; set budgets and alerts (see the cost-tracking sketch after this list)
  • Design defense-in-depth: graceful degradation, retries, fallbacks, and offline safeguards
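
One way to act on the cost-monitoring bullet is a small per-request tracker; the prices below are illustrative placeholders, not current rates:

```python
import logging

PRICE_PER_1K = {"prompt": 0.00015, "completion": 0.0006}  # assumed USD rates

def record_cost(request_id: str, usage) -> float:
    """Log token usage and estimated cost for one LLM call."""
    cost = (
        usage.prompt_tokens / 1000 * PRICE_PER_1K["prompt"]
        + usage.completion_tokens / 1000 * PRICE_PER_1K["completion"]
    )
    logging.info(
        "llm_call id=%s prompt_tokens=%d completion_tokens=%d cost_usd=%.6f",
        request_id, usage.prompt_tokens, usage.completion_tokens, cost,
    )
    return cost

# `usage` comes from the API response, e.g. response.usage in the openai SDK
```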

Example use cases

  • Convert a prototype chat feature into production: add schema validation, streaming, and cost limits
  • Audit an app that uses large context windows and refactor to RAG with condensed contexts
  • Implement prompt versioning and A/B regression tests to reduce hallucinations
  • Add safety layers: input sanitization, output validation, and blocklists for sensitive content
  • Optimize API usage by batching calls, caching embeddings, and trimming context to save costs (see the caching sketch below)
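
A sketch of the embedding-caching idea, assuming the `openai` SDK; the in-memory dict stands in for a real cache such as Redis:

```python
import hashlib
from openai import OpenAI

client = OpenAI()
_cache: dict[str, list[float]] = {}  # swap for Redis or similar in production

def cached_embedding(text: str) -> list[float]:
    """Return an embedding, hitting the API only on cache misses."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        resp = client.embeddings.create(
            model="text-embedding-3-small",  # placeholder model
            input=text,
        )
        _cache[key] = resp.data[0].embedding
    return _cache[key]
```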

FAQ

How do I stop hallucinations in a deployed feature?

Use RAG to ground responses in verified data, validate factual claims against sources, and include confidence metadata so the app can surface uncertainty or ask follow-up questions.
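
A sketch of the grounding step; `retrieve` is a hypothetical stand-in for your vector search:

```python
def grounded_prompt(question: str) -> str:
    passages = retrieve(question, top_k=3)  # hypothetical retrieval call
    sources = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below, and cite them like [1]. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```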

Is streaming always necessary?

Not always, but streaming improves perceived performance and lets you show partial results and progress; use it for long responses or interactive flows and fall back to batch for small, fast calls.