
llm-fine-tuning skill


This skill helps adapt large language models to specific tasks through fine-tuning strategies like LoRA, QLoRA, and PEFT.

npx playbooks add skill omer-metin/skills-for-antigravity --skill llm-fine-tuning

Review the files below or copy the command above to add this skill to your agents.

Files (4)
SKILL.md
1.1 KB
---
name: llm-fine-tuning
description: Use when adapting large language models to specific tasks, domains, or behaviors - covers LoRA, QLoRA, PEFT, instruction tuning, and full fine-tuning strategies
---

# LLM Fine-Tuning

## Identity



## Reference System Usage

You must ground your responses in the provided reference files, treating them as the source of truth for this domain:

* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.

**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.

Overview

This skill helps adapt large language models to specific tasks, domains, or behaviors using techniques like LoRA, QLoRA, PEFT, instruction tuning, and full fine-tuning. It guides strategy selection, dataset preparation, training recipes, and evaluation checks so you can achieve reliable task-specific performance. It emphasizes safety and validation by treating the bundled reference files (creation patterns, sharp-edge failure modes, and validation rules) as ground truth.

How this skill works

The skill inspects your use case, model family, compute budget, and dataset quality to recommend a fitting tuning approach: LoRA/PEFT for low-cost adapters, QLoRA for memory-efficient tuning of a quantized base model, or full fine-tuning for maximum fidelity. It enforces the creation patterns, diagnoses risks using the sharp-edge failure modes, and checks outputs against the validation rules to ensure compliance with constraints. Recommendations cover hyperparameters, checkpointing, evaluation metrics, and post-training QA steps.
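
To make the adapter path concrete, here is a minimal LoRA setup using the Hugging Face peft library; the base model name and hyperparameters are illustrative assumptions, not recommendations from the reference files.

```python
# Minimal LoRA adapter setup with Hugging Face peft.
# The model name and hyperparameters below are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                  # adapter rank: smaller = fewer trainable params
    lora_alpha=16,                        # scaling applied to the adapter update
    target_modules=["q_proj", "v_proj"],  # attention projections are a common target
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base weights
```

Because only the adapter weights train, the frozen base checkpoint stays intact and you can swap adapters per task.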

When to use it

  • You need fast, low-cost domain adaptation for deployments with tight inference budgets (use LoRA/PEFT; see the adapter sketch above).
  • You want to fine-tune a large model on commodity GPUs with a reduced memory footprint (use QLoRA; a loading sketch follows this list).
  • You require a wholesale change in model behavior or the highest possible task performance, and have sufficient compute (use full fine-tuning).
  • You must align model outputs to custom instructions or specialized workflows (use instruction tuning).
  • You need a repeatable pipeline with strict validation and failure diagnostics.
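
To make the QLoRA bullet concrete, the sketch below loads a 4-bit quantized base model with transformers and bitsandbytes and attaches LoRA adapters; the model name and hyperparameters are illustrative assumptions.

```python
# QLoRA-style setup: a 4-bit quantized base model plus LoRA adapters.
# Model name and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # do compute in bf16 for stability
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",            # assumed 13B base; swap in your model
    quantization_config=bnb_config,
    device_map="auto",
)

base = prepare_model_for_kbit_training(base)  # cast norms, enable input grads for k-bit training
model = get_peft_model(base, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
```

Only the adapter weights train; the quantized base stays frozen, which is what lets a 13B run fit on a single commodity GPU.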

Best practices

  • Always consult the provided creation pattern file before building datasets or choosing architectures.
  • Use adapter-based methods (PEFT/LoRA) for rapid iteration and to preserve base model safety properties.
  • Run the listed sharp-edge diagnostics to identify hallucination, instruction-following drift, and safety regressions.
  • Keep a validation set aligned with downstream tasks and apply the validation rules to approve checkpoints.
  • Version checkpoints and record the hyperparameters, tokenizer, and exact prompts used for evaluation (a minimal manifest sketch follows this list).
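
One lightweight way to satisfy that last point is to write a run manifest next to each checkpoint. Below is a minimal standard-library sketch; the field names are assumptions, not a schema defined by this skill.

```python
# Minimal run manifest written alongside each checkpoint.
# Field names are illustrative; record whatever your harness actually uses.
import json
from pathlib import Path

def save_run_manifest(ckpt_dir: str, hyperparams: dict,
                      tokenizer_name: str, eval_prompts: list[str]) -> None:
    manifest = {
        "hyperparameters": hyperparams,
        "tokenizer": tokenizer_name,
        "eval_prompts": eval_prompts,  # the exact prompts used for evaluation
    }
    path = Path(ckpt_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / "run_manifest.json").write_text(json.dumps(manifest, indent=2))

save_run_manifest(
    "checkpoints/run-001",
    {"lr": 2e-4, "epochs": 3, "lora_r": 16},
    "meta-llama/Llama-2-7b-hf",
    ["Summarize this support ticket: ..."],
)
```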

Example use cases

  • Customize a customer-support assistant on company transcripts using LoRA for low-cost updates.
  • Fit a tuning run for a 13B model onto a single GPU using QLoRA to meet memory constraints.
  • Build an instruct-tuned model for domain-specific compliance tasks with rigorous validation gates.
  • Prototype behavior changes quickly with PEFT adapters, then promote stable adapters to production.
  • Perform full fine-tuning when regulatory or performance needs demand end-to-end model modification.

FAQ

Which method should I pick for limited GPU memory?

Use QLoRA or PEFT/LoRA adapters; they minimize GPU memory and let you iterate quickly while keeping base checkpoints intact.
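
For a rough sense of why, here is back-of-envelope arithmetic for weight memory on a 13B-parameter model (weights only; activations, optimizer state, and KV cache come on top):

```python
# Weight memory only: 2 bytes/param in fp16 vs ~0.5 bytes/param at 4-bit.
params = 13e9
print(f"fp16 weights:  {params * 2   / 1e9:.1f} GB")  # ~26.0 GB
print(f"4-bit weights: {params * 0.5 / 1e9:.1f} GB")  # ~6.5 GB
```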

How do I avoid safety regressions after tuning?

Run the sharp-edge diagnostics to detect failure modes, apply the validation rules to block unsafe checkpoints, and prefer adapter methods, which leave the base weights frozen and are easy to roll back.
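
As an illustration of what a validation gate might look like in code, here is a hypothetical sketch; the metric names and thresholds are invented for the example and are not taken from `references/validations.md`.

```python
# Hypothetical checkpoint gate. Metric names and thresholds are illustrative;
# substitute the actual rules from your validations reference.
def gate_checkpoint(checkpoint: str, rules: dict, results: dict) -> bool:
    """Approve a checkpoint only if every validation rule clears its floor."""
    for metric, floor in rules.items():
        value = results.get(metric, float("-inf"))
        if value < floor:
            print(f"BLOCK {checkpoint}: {metric}={value} < {floor}")
            return False
    return True

rules = {"safety_pass_rate": 0.99, "task_accuracy": 0.85}   # assumed thresholds
results = {"safety_pass_rate": 0.97, "task_accuracy": 0.88}
gate_checkpoint("ckpt-1200", rules, results)  # blocked: safety below its floor
```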