home / skills / faionfaion / faion-network / faion-ml-ops
npx playbooks add skill faionfaion/faion-network --skill faion-ml-opsReview the files below or copy the command above to add this skill to your agents.
---
name: faion-ml-ops
description: "ML operations: fine-tuning (LoRA, QLoRA), model evaluation, cost optimization, observability."
user-invocable: false
allowed-tools: Read, Write, Edit, Glob, Grep, Bash, Task, AskUserQuestion, TodoWrite
---
> **Entry point:** `/faion-net` — invoke this skill for automatic routing to the appropriate domain.
# ML Ops Skill
**Communication: User's language. Code: English.**
## Purpose
Handles ML model operations. Covers fine-tuning, evaluation, cost management, and observability.
## Scope
| Area | Coverage |
|------|----------|
| **Fine-tuning** | LoRA, QLoRA, OpenAI fine-tuning, datasets |
| **Evaluation** | Metrics, benchmarks, frameworks |
| **Cost Optimization** | Token management, caching, batch APIs |
| **Observability** | LLM monitoring, tracing, logging |
## Quick Start
| Task | Files |
|------|-------|
| Fine-tune OpenAI | fine-tuning-openai-basics.md → fine-tuning-openai-production.md |
| Fine-tune LoRA | lora-qlora.md → finetuning-basics.md |
| Cost optimization | llm-cost-basics.md → cost-reduction-strategies.md |
| Evaluation | evaluation-metrics.md → evaluation-framework.md |
| Observability | llm-observability.md → llm-observability-stack-2026.md |
## Methodologies (15)
**Fine-tuning (5):**
- finetuning-basics: Fundamentals, when to fine-tune
- finetuning-datasets: Data preparation, quality
- fine-tuning-openai-basics: OpenAI API fine-tuning
- fine-tuning-openai-production: Production deployment
- lora-qlora: Efficient fine-tuning, parameter selection
**Evaluation (3):**
- evaluation-metrics: Accuracy, F1, perplexity, task metrics
- evaluation-framework: LLM-as-judge, human eval
- evaluation-benchmarks: MMLU, HumanEval, industry benchmarks
**Cost Optimization (2):**
- llm-cost-basics: Token counting, pricing models
- cost-reduction-strategies: Caching, compression, batching
**Observability (5):**
- llm-observability: Fundamentals, why monitor
- llm-observability-stack: Tools selection
- llm-observability-stack-2026: Latest tools (LangSmith, Langfuse)
- llm-management-observability: End-to-end management
## Code Examples
### OpenAI Fine-tuning
```python
from openai import OpenAI
client = OpenAI()
# Upload training data
file = client.files.create(
file=open("training_data.jsonl", "rb"),
purpose="fine-tune"
)
# Create fine-tuning job
job = client.fine_tuning.jobs.create(
training_file=file.id,
model="gpt-4o-mini-2024-07-18",
hyperparameters={"n_epochs": 3}
)
# Monitor
while True:
job = client.fine_tuning.jobs.retrieve(job.id)
if job.status == "succeeded":
break
```
### LoRA Fine-tuning
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3-8b")
lora_config = LoraConfig(
r=16,
lora_alpha=32,
target_modules=["q_proj", "v_proj"],
lora_dropout=0.1,
bias="none"
)
model = get_peft_model(model, lora_config)
```
### Cost Tracking
```python
import tiktoken
def count_tokens(text, model="gpt-4o"):
encoding = tiktoken.encoding_for_model(model)
return len(encoding.encode(text))
def estimate_cost(prompt, completion, model="gpt-4o"):
prompt_tokens = count_tokens(prompt, model)
completion_tokens = count_tokens(completion, model)
# GPT-4o pricing
prompt_cost = prompt_tokens * 0.000005
completion_cost = completion_tokens * 0.000015
return prompt_cost + completion_cost
```
### LLM Observability with LangSmith
```python
from langsmith import traceable
@traceable
def rag_pipeline(query: str) -> str:
# Retrieval
docs = retrieve(query)
# Generation
response = generate(query, docs)
return response
```
## Fine-tuning Decision Matrix
| Scenario | Approach |
|----------|----------|
| Small dataset (<100 examples) | Few-shot prompting |
| Medium dataset (100-1000) | OpenAI fine-tuning |
| Large dataset (>1000) | LoRA/QLoRA |
| Custom behavior | Fine-tuning |
| New knowledge | RAG (not fine-tuning) |
## Cost Reduction Strategies
| Strategy | Savings | Trade-off |
|----------|---------|-----------|
| Prompt caching | 90% on cached | Cold start cost |
| Batch API | 50% | 24h latency |
| Smaller models | 80%+ | Lower quality |
| Context pruning | Variable | May lose context |
| Output limits | Variable | Truncated responses |
## Evaluation Frameworks
| Framework | Use Case |
|-----------|----------|
| **LangSmith** | Production monitoring, traces |
| **Langfuse** | Open-source observability |
| **PromptLayer** | Prompt versioning |
| **Weights & Biases** | Experiment tracking |
## Related Skills
| Skill | Relationship |
|-------|-------------|
| faion-llm-integration | Provides APIs to optimize |
| faion-rag-engineer | RAG evaluation |
| faion-devops-engineer | Model deployment |
---
*ML Ops v1.0 | 15 methodologies*