
ai skill

This skill helps you design and deploy AI-enabled features by guiding LLM integration, RAG patterns, embeddings, and model pipelines.

npx playbooks add skill hyperb1iss/hyperskills --skill ai

Review the files below or copy the command above to add this skill to your agents.

---
name: ai
description: Use this skill when building AI features, integrating LLMs, implementing RAG, working with embeddings, deploying ML models, or doing data science. Activates on mentions of OpenAI, Anthropic, Claude, GPT, LLM, RAG, embeddings, vector database, Pinecone, Qdrant, LangChain, LlamaIndex, DSPy, MLflow, fine-tuning, LoRA, QLoRA, model deployment, ML pipeline, feature engineering, or machine learning.
---

# AI/ML Engineering

Build production AI systems with modern patterns and tools.

## Quick Reference

### The 2026 AI Stack

| Layer               | Tool              | Purpose                          |
| ------------------- | ----------------- | -------------------------------- |
| Prompting           | DSPy              | Programmatic prompt optimization |
| Orchestration       | LangGraph         | Stateful multi-agent workflows   |
| RAG                 | LlamaIndex        | Document ingestion and retrieval |
| Vectors             | Qdrant / Pinecone | Embedding storage and search     |
| Evaluation          | RAGAS             | RAG quality metrics              |
| Experiment Tracking | MLflow / W&B      | Logging, versioning, comparison  |
| Serving             | BentoML / vLLM    | Model deployment                 |
| Protocol            | MCP               | Tool and context integration     |

### DSPy: Programmatic Prompting

**Manual prompts are dead.** DSPy treats prompts as optimizable code:

```python
import dspy

# Configure any supported LM before calling a module
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class QA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="1-5 words")

# Create module
qa = dspy.Predict(QA)

# Use it
result = qa(question="What is the capital of France?")
print(result.answer)  # "Paris"
```

**Optimize with real data:**

```python
from dspy.teleprompt import BootstrapFewShot

# Metric: does the prediction match the gold answer?
def exact_match(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

optimizer = BootstrapFewShot(metric=exact_match)
optimized_qa = optimizer.compile(qa, trainset=train_data)
```

### RAG Architecture (Production)

```
Query → Rewrite → Hybrid Retrieval → Rerank → Generate → Cite
         │              │                │
         v              v                v
    Query expansion  Dense + BM25   Cross-encoder
```
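The Dense + BM25 merge step is commonly implemented with reciprocal rank fusion. A minimal sketch, assuming each retriever returns a ranked list of document IDs (the IDs below are toy data):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; k dampens the influence of top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # dense-retriever ranking
bm25 = ["d1", "d4", "d3"]    # lexical ranking
print(reciprocal_rank_fusion([dense, bm25]))  # ['d1', 'd3', 'd4', 'd2']
```

Documents that appear high in both lists (here `d1`) win; the fused list then goes to the cross-encoder reranker.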

**LlamaIndex + LangGraph Pattern:**

```python
from typing import TypedDict

from llama_index.core import VectorStoreIndex
from langgraph.graph import StateGraph

class State(TypedDict):
    question: str
    context: str
    sources: list

# Data layer (LlamaIndex)
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()

# Control layer (LangGraph)
def retrieve(state: State):
    response = query_engine.query(state["question"])
    return {"context": response.response, "sources": response.source_nodes}

graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate_answer)  # generate_answer: your LLM call
graph.add_edge("retrieve", "generate")
graph.set_entry_point("retrieve")
app = graph.compile()
```

### MCP Integration

Model Context Protocol is the standard for tool integration:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-tools")

@mcp.tool()
async def search_docs(query: str) -> str:
    """Search the knowledge base."""
    results = await vector_store.search(query)  # vector_store: your retrieval client
    return format_results(results)
```

### Embeddings (2026)

| Model                  | Dimensions | Best For         |
| ---------------------- | ---------- | ---------------- |
| text-embedding-3-large | 3072       | General purpose  |
| BGE-M3                 | 1024       | Multilingual RAG |
| Qwen3-Embedding        | Flexible   | Custom domains   |
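
Whatever the model, semantic search reduces to vector similarity. A stdlib sketch of the cosine scoring step (toy 3-d vectors stand in for 1024- or 3072-dim embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = [0.2, 0.8, 0.1]
doc_vec = [0.1, 0.9, 0.0]
print(cosine_similarity(query_vec, doc_vec))
```

Production vector DBs run this (or dot product) over indexed embeddings; many embedding APIs return pre-normalized vectors, in which case dot product alone suffices.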

### Fine-Tuning with LoRA/QLoRA

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

model = get_peft_model(base_model, config)
# With QLoRA (4-bit quantized base), a 7B model fine-tunes in ~24GB VRAM (e.g., RTX 4090)
```
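
To see why LoRA is cheap, count what it trains: each adapted weight matrix gets two low-rank factors instead of a full update. A back-of-envelope sketch (4096 is an assumed, typical hidden size):

```python
def lora_params(d_in, d_out, r):
    """Trainable params LoRA adds: factors A (d_in x r) and B (r x d_out)."""
    return r * (d_in + d_out)

full = 4096 * 4096                    # frozen base projection
added = lora_params(4096, 4096, 16)   # r=16 as in the config above
print(added, added / full)            # 131072 params, ~0.8% of the frozen matrix
```

That sub-1% trainable footprint is what makes consumer-GPU fine-tuning feasible; QLoRA then shrinks the frozen base itself via 4-bit quantization.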

### MLOps Pipeline

```python
# MLflow tracking
import mlflow

mlflow.set_experiment("rag-v2")

with mlflow.start_run():
    mlflow.log_params({"chunk_size": 512, "model": "gpt-4"})
    mlflow.log_metrics({"faithfulness": 0.92, "relevance": 0.88})
    mlflow.log_artifact("prompts/qa.txt")
```

### Evaluation with RAGAS

```python
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# dataset: a datasets.Dataset with question/answer/contexts columns
results = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(results)  # {'faithfulness': 0.92, 'answer_relevancy': 0.88, ...}
```
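
The evaluation data is columnar. A hypothetical single-row example of the shape RAGAS conventionally consumes (the row content is invented for illustration):

```python
# Hypothetical evaluation row; column names follow RAGAS conventions
rows = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris"],
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truth": ["Paris"],
}
# dataset = datasets.Dataset.from_dict(rows)  # then pass to evaluate()
print(sorted(rows))
```

Each context entry is a *list* of retrieved chunks for that question, which is what faithfulness and context precision are scored against.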

### Vector Database Selection

| DB       | Best For               | Pricing             |
| -------- | ---------------------- | ------------------- |
| Qdrant   | Self-hosted, filtering | 1GB free forever    |
| Pinecone | Managed, zero-ops      | Free tier available |
| Weaviate | Knowledge graphs       | 14-day trial        |
| Milvus   | Billion-scale          | Self-hosted         |

## Agents

- **ai-engineer** - LLM integration, RAG, MCP, production AI
- **mlops-engineer** - Model deployment, monitoring, pipelines
- **data-scientist** - Analysis, modeling, experimentation
- **ml-researcher** - Cutting-edge architectures, paper implementation
- **cv-engineer** - Computer vision, VLMs, image processing

## Deep Dives

- [references/dspy-guide.md](references/dspy-guide.md)
- [references/rag-patterns.md](references/rag-patterns.md)
- [references/mcp-integration.md](references/mcp-integration.md)
- [references/fine-tuning.md](references/fine-tuning.md)
- [references/evaluation.md](references/evaluation.md)

## Examples

- [examples/rag-pipeline/](examples/rag-pipeline/)
- [examples/mcp-server/](examples/mcp-server/)
- [examples/dspy-optimization/](examples/dspy-optimization/)

Overview

This skill packages practical patterns and tools for building production AI systems, covering LLM integration, retrieval-augmented generation (RAG), embeddings, fine-tuning, and MLOps. It consolidates modern stack choices, code examples, and architecture patterns to accelerate shipping reliable, auditable AI features. Use it as a hands-on reference when designing pipelines, deploying models, or implementing evaluation and monitoring.

How this skill works

The skill inspects common AI engineering tasks and recommends focused solutions: prompt engineering with DSPy, document ingestion and retrieval with LlamaIndex, vector storage choices (Qdrant, Pinecone), and orchestration with LangGraph. It provides code snippets for RAG pipelines, MCP tool integration, LoRA/QLoRA fine-tuning recipes, and MLOps tracking examples using MLflow. The content emphasizes reproducible patterns: query rewrite → hybrid retrieval → rerank → generate → cite, and integrates evaluation with RAGAS.

When to use it

  • Designing a retrieval-augmented generation (RAG) pipeline for product search or knowledge assistants
  • Integrating LLMs and toolchains (OpenAI, Anthropic, Claude, custom models) into applications
  • Selecting and configuring vector databases and embedding models for semantic search
  • Fine-tuning or parameter-efficient tuning (LoRA/QLoRA) for domain-specific tasks
  • Setting up experiment tracking, evaluation metrics, and production model serving

Best practices

  • Treat prompts as code: use programmatic prompt tooling (DSPy) and optimize with data-driven bootstrapping
  • Combine dense and lexical retrieval (dense + BM25) and rerank with cross-encoders for precision
  • Choose embedding models by task: higher dims for general purpose, multilingual models for cross-language RAG
  • Track experiments, params, and artifacts with MLflow or W&B to enable reproducibility
  • Use MCP or a tooling protocol for secure, testable integration of external tools and stateful agents
  • Evaluate RAG outputs with faithfulness and relevance metrics (RAGAS) before deploying

Example use cases

  • Customer support assistant: ingest docs with LlamaIndex, store vectors in Qdrant, serve via LangGraph-driven agents
  • Domain-adapted model: apply LoRA/QLoRA on a base model and track runs with MLflow for A/B comparisons
  • Hybrid search system: combine BM25 and dense embeddings, rerank answers, and include citations in generation
  • Tool-enabled agents: expose internal search and analytics via MCP tools and orchestrate flows with LangGraph
  • Evaluation pipeline: run RAGAS metrics on benchmark queries to validate faithfulness and context precision

FAQ

Which vector DB should I pick for production?

Pick Qdrant for self-hosted control and filtering, Pinecone for managed zero-ops, and Milvus for very large scale; evaluate on latency, cost, and filtering needs.

When should I use LoRA vs QLoRA?

Use LoRA for lightweight adaptation when GPU memory is sufficient; use QLoRA to enable fine-tuning on limited VRAM (e.g., consumer GPUs) with quantization.