
unsloth skill

/03-fine-tuning/unsloth

This skill provides expert guidance for fast fine-tuning with Unsloth, enabling 2-5x faster training and reduced memory usage.

npx playbooks add skill orchestra-research/ai-research-skills --skill unsloth

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: unsloth
description: Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Fine-Tuning, Unsloth, Fast Training, LoRA, QLoRA, Memory-Efficient, Optimization, Llama, Mistral, Gemma, Qwen]
dependencies: [unsloth, torch, transformers, trl, datasets, peft]
---

# Unsloth Skill

Comprehensive assistance with unsloth development, generated from official documentation.

## When to Use This Skill

This skill should be triggered when:
- Working with unsloth
- Asking about unsloth features or APIs
- Implementing unsloth solutions
- Debugging unsloth code
- Learning unsloth best practices

## Quick Reference

### Common Patterns

*Quick reference patterns will be added as you use the skill.*

## Reference Files

This skill includes comprehensive documentation in `references/`:

- **llms-txt.md** - Llms-Txt documentation

Use `view` to read specific reference files when detailed information is needed.

## Working with This Skill

### For Beginners
Start with the reference files in `references/` for foundational concepts.

### For Specific Features
Use the relevant reference file for detailed information on a specific feature.

### For Code Examples
The quick reference section above contains common patterns extracted from the official docs.

## Resources

### references/
Organized documentation extracted from official sources. These files contain:
- Detailed explanations
- Code examples with language annotations
- Links to original documentation
- Table of contents for quick navigation

### scripts/
Add helper scripts here for common automation tasks.

### assets/
Add templates, boilerplate, or example projects here.

## Notes

- This skill was automatically generated from official documentation
- Reference files preserve the structure and examples from source docs
- Code examples include language detection for better syntax highlighting
- Quick reference patterns are extracted from common usage examples in the docs

## Updating

To refresh this skill with updated documentation:
1. Re-run the scraper with the same configuration
2. The skill will be rebuilt with the latest information


Overview

This skill provides expert guidance for fast fine-tuning with Unsloth, focusing on 2–5x faster training and 50–80% memory savings using LoRA and QLoRA optimizations. It distills practical workflows, configuration tips, and debugging techniques so you can get efficient, reproducible results quickly. The content emphasizes integration with common stacks like Hugging Face, vLLM, and PyTorch-based training loops.

How this skill works

The skill inspects your fine-tuning pipeline and recommends performance optimizations: low-rank adapters (LoRA), quantized low-rank adapters (QLoRA), mixed precision, and memory-saving schedulers. It explains which components to modify (model loading, optimizer, checkpointing, tokenization) and provides concrete configuration patterns to reduce GPU and CPU memory footprint. It also outlines debugging steps for OOMs, convergence issues, and evaluation mismatches.
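The adapter math these recommendations build on can be sketched in a few lines of plain Python. This is a toy illustration of the LoRA update, not Unsloth's fused GPU kernels: the frozen weight W is augmented with a trainable low-rank product B·A scaled by alpha/r.

```python
# Toy LoRA forward pass: y = (W + (alpha / r) * B @ A) @ x
# Matrices are lists of rows; real implementations use batched GPU tensor ops.

def matmul(A, B):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_forward(W, A, B, x, alpha=16, r=8):
    """Apply frozen W plus the low-rank adapter update (alpha/r) * B @ A to input x.

    W: d_out x d_in (frozen), B: d_out x r, A: r x d_in (trainable), x: d_in x 1.
    """
    scale = alpha / r
    delta = matmul(B, A)  # low-rank update, materialized only for clarity
    W_eff = [[w + scale * d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]
    return matmul(W_eff, x)
```

Only A and B receive gradients, which is why adapter training keeps optimizer state and gradient memory small relative to the frozen backbone.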

When to use it

  • Fine-tuning large language models where GPU memory or cost is a bottleneck.
  • Prototyping LoRA/QLoRA adapters to minimize training time and resources.
  • Migrating a research training loop to production with lower latency and memory use.
  • Debugging training crashes, unexpected accuracy drops, or slow iteration cycles.
  • Benchmarking speed and memory trade-offs across quantization and precision choices.
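To judge whether memory is the bottleneck before reaching for these tools, a back-of-the-envelope estimate helps. The sketch below uses common rule-of-thumb byte counts (FP16 weights/grads at 2 bytes/param, 32-bit AdamW moments at 8 bytes/param, ~0.55 bytes/param for 4-bit quantized weights including overhead); these are assumptions for illustration, and activations and framework overhead are deliberately excluded.

```python
def estimate_train_memory_gb(n_params_b, weight_bytes=2.0, trainable_frac=1.0,
                             optimizer_bytes=8.0, grad_bytes=2.0):
    """Rough GPU-memory estimate in GB: weights + gradients + optimizer states.

    n_params_b is the parameter count in billions. Activations, KV caches, and
    CUDA allocator overhead are excluded, so treat the result as a lower bound.
    """
    n = n_params_b * 1e9
    trainable = n * trainable_frac
    total_bytes = n * weight_bytes + trainable * (grad_bytes + optimizer_bytes)
    return total_bytes / 1e9

# Full FP16 fine-tune of a 7B model: ~84 GB before activations.
full = estimate_train_memory_gb(7)

# QLoRA-style setup: 4-bit weights, ~2% of parameters trainable as adapters.
qlora = estimate_train_memory_gb(7, weight_bytes=0.55, trainable_frac=0.02)
```

The gap between the two numbers is why a 7B full fine-tune needs multiple large GPUs while an adapter run fits on one.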

Best practices

  • Start with LoRA on a frozen backbone before switching to heavier QLoRA quantization.
  • Use mixed precision (FP16/BF16) plus gradient accumulation to balance batch size and memory.
  • Enable gradient checkpointing for deep models to trade compute for memory when necessary.
  • Profile a single step with a representative batch to find hot spots and GPU memory peaks.
  • Keep optimizer states lightweight (AdamW fused or 8-bit optimizers) and save periodic, small checkpoints.
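The batch-size/memory trade-off in the second bullet reduces to simple arithmetic: gradient accumulation lets a small per-device batch stand in for a large effective one. A minimal sketch:

```python
def effective_batch_size(per_device, accum_steps, num_gpus=1):
    """Effective (per optimizer step) batch size under gradient accumulation."""
    return per_device * accum_steps * num_gpus

def accum_steps_for(target, per_device, num_gpus=1):
    """Smallest accumulation count reaching at least `target` effective batch size."""
    return -(-target // (per_device * num_gpus))  # ceiling division
```

For example, a target effective batch of 128 with per-device batch 2 on 4 GPUs needs 16 accumulation steps; halving the per-device batch to fit memory simply doubles the accumulation count.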

Example use cases

  • Domain-adapt a 7B model with LoRA on proprietary customer-support logs, using a single A100 40GB GPU.
  • Experiment with QLoRA to train a 13B model on commodity hardware while cutting memory use by ~60%.
  • Speed up hyperparameter sweeps by using low-cost adapters and smaller warm-start checkpoints.
  • Integrate Unsloth optimizations into a CI pipeline for reproducible training and faster iteration.
  • Recover from OOMs by applying mixed precision, checkpointing, and reducing optimizer state size.
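The OOM-recovery levers in the last bullet can be collected into one config fragment. Key names below follow Hugging Face `TrainingArguments` conventions (`per_device_train_batch_size`, `gradient_accumulation_steps`, `bf16`, `gradient_checkpointing`, `optim`); verify them against the library version you actually use.

```python
# Memory-saving knobs as a plain dict of TrainingArguments-style options.
# Names follow Hugging Face conventions; confirm against your installed version.
oom_recovery_config = {
    "per_device_train_batch_size": 1,   # smallest possible step batch
    "gradient_accumulation_steps": 16,  # recover the effective batch size
    "bf16": True,                       # mixed precision (use fp16 on older GPUs)
    "gradient_checkpointing": True,     # trade recompute for activation memory
    "optim": "adamw_8bit",              # 8-bit optimizer states via bitsandbytes
}

def effective_batch(cfg):
    """Effective batch size implied by the config (single GPU assumed)."""
    return cfg["per_device_train_batch_size"] * cfg["gradient_accumulation_steps"]
```

Applying all five knobs at once is usually overkill; enable them one at a time and re-profile, since each has a throughput cost.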

FAQ

Is Unsloth compatible with Hugging Face Transformers and vLLM?

Yes. The guidance covers common integration patterns with Hugging Face model loading and tokenizers, and outlines how to use vLLM for faster serving and evaluation loops.

What speedup and memory savings can I expect?

Typical outcomes are 2–5x faster training and 50–80% less memory depending on model size, batch size, and chosen optimizations (LoRA vs QLoRA, mixed precision, checkpointing).