
This skill provides expert guidance for fine-tuning LLMs with LLaMA-Factory, covering APIs, setup, and best practices for multimodal and quantized (QLoRA) training.

npx playbooks add skill orchestra-research/ai-research-skills --skill llama-factory

Review the files below or copy the command above to add this skill to your agents.

Files (6)
SKILL.md
---
name: llama-factory
description: Expert guidance for fine-tuning LLMs with LLaMA-Factory - WebUI no-code, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Fine-Tuning, LLaMA Factory, LLM, WebUI, No-Code, QLoRA, LoRA, Multimodal, HuggingFace, Llama, Qwen, Gemma]
dependencies: [llmtuner, torch, transformers, datasets, peft, accelerate, gradio]
---

# LLaMA-Factory Skill

Comprehensive assistance with LLaMA-Factory development, generated from the official documentation.

## When to Use This Skill

This skill should be triggered when:
- Working with LLaMA-Factory
- Asking about LLaMA-Factory features or APIs
- Implementing LLaMA-Factory solutions
- Debugging LLaMA-Factory code
- Learning LLaMA-Factory best practices

## Quick Reference

### Common Patterns

*Quick reference patterns will be added as you use the skill.*
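
In the meantime, the sketch below shows the shape of a typical run: a 4-bit QLoRA supervised fine-tuning job driven from Python through the `llmtuner` entry point listed in this skill's dependencies. This is a hedged sketch, not the canonical recipe: the model, dataset, and hyperparameter values are illustrative, and argument names should be verified against the LLaMA-Factory version you have installed.

```python
from llmtuner import run_exp  # newer releases expose the same entry point from the `llamafactory` package

# Hedged sketch: a minimal 4-bit QLoRA SFT run. Argument names mirror
# LLaMA-Factory's CLI/YAML training options; all values are illustrative.
run_exp({
    "stage": "sft",                                    # supervised fine-tuning
    "do_train": True,
    "model_name_or_path": "meta-llama/Llama-2-7b-hf",  # illustrative base model
    "dataset": "alpaca_en",                            # must be registered in data/dataset_info.json
    "template": "llama2",
    "finetuning_type": "lora",
    "lora_target": "q_proj,v_proj",
    "quantization_bit": 4,                             # QLoRA bit-width; 8 is the other common choice
    "output_dir": "saves/llama2-7b-qlora",
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "learning_rate": 2e-4,
    "num_train_epochs": 3.0,
    "fp16": True,
})
```

The same arguments can be placed in a YAML or JSON file and selected in the WebUI or passed to the LLaMA-Factory CLI; the Python form is shown here only to keep the example self-contained.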

## Reference Files

This skill includes comprehensive documentation in `references/`:

- **_images.md** - Images documentation
- **advanced.md** - Advanced documentation
- **getting_started.md** - Getting Started documentation
- **other.md** - Other documentation

Use `view` to read specific reference files when detailed information is needed.

## Working with This Skill

### For Beginners
Start with the getting_started reference file for foundational concepts.

### For Specific Features
Use the appropriate category reference file (advanced, other, etc.) for detailed information.

### For Code Examples
The Quick Reference section above shows a minimal example invocation; the reference files under `references/` contain further patterns taken from the official docs.

## Resources

### references/
Organized documentation extracted from official sources. These files contain:
- Detailed explanations
- Code examples with language annotations
- Links to original documentation
- Table of contents for quick navigation

### scripts/
Add helper scripts here for common automation tasks.

### assets/
Add templates, boilerplate, or example projects here.

## Notes

- This skill was automatically generated from official documentation
- Reference files preserve the structure and examples from source docs
- Code examples include language detection for better syntax highlighting
- Quick reference patterns are extracted from common usage examples in the docs

## Updating

To refresh this skill with updated documentation:
1. Re-run the scraper with the same configuration
2. The skill will be rebuilt with the latest information


Overview

This skill provides expert guidance for fine-tuning large language models using LLaMA-Factory WebUI. It covers no-code workflows, support for 100+ models, quantized training (2/3/4/5/6/8-bit QLoRA), and multimodal configurations. The goal is to help engineers and researchers deploy efficient, reproducible fine-tuning pipelines.

How this skill works

The skill draws on LLaMA-Factory's features, documentation references, and common usage patterns to give actionable steps and troubleshooting advice. It explains how to configure WebUI sessions, select quantization and bit-width for QLoRA, and prepare datasets for multimodal training. It also highlights scripts, templates, and automation tips for end-to-end workflows.
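
For dataset preparation specifically, LLaMA-Factory reads custom datasets from a registry file, data/dataset_info.json. The hedged sketch below registers a small instruction-tuning dataset programmatically; the dataset name, file name, and column mapping are illustrative, and multimodal datasets additionally map an images column, so the exact schema should be checked against the data README of the version in use.

```python
import json
from pathlib import Path

# Hedged sketch: register a custom alpaca-style dataset with LLaMA-Factory.
# The dataset name, file name, and column mapping below are illustrative;
# check data/README.md in your LLaMA-Factory checkout for the exact schema
# (multimodal datasets additionally map an "images" column).
data_dir = Path("LLaMA-Factory/data")              # assumed repository layout
registry_path = data_dir / "dataset_info.json"

registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
registry["my_domain_sft"] = {
    "file_name": "my_domain_sft.json",             # list of {"instruction", "input", "output"} records
    "columns": {
        "prompt": "instruction",
        "query": "input",
        "response": "output",
    },
}
registry_path.write_text(json.dumps(registry, indent=2, ensure_ascii=False))
```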

When to use it

  • You are preparing to fine-tune a model with LLaMA-Factory WebUI.
  • You need guidance selecting QLoRA bit-width or performance vs. resource trade-offs.
  • You are integrating multimodal inputs (text, images) into a fine-tuning job.
  • You want no-code or low-code options to run reproducible experiments.
  • You are debugging training, convergence, or quantization-related issues.

Best practices

  • Start with the getting_started reference (and the other files under references/) to understand core flows.
  • Choose QLoRA bit-width based on GPU memory and target performance; test 4-bit and 8-bit first for balance.
  • Use well-structured, cleaned datasets and consistent preprocessing for multimodal inputs.
  • Automate experiment logging and checkpointing; keep reproducible config files for each run (see the sketch after this list).
  • Validate models on held-out sets and run lightweight inference checks before full deployment.
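
One lightweight way to keep runs reproducible, as suggested above, is to drive LLaMA-Factory from a checked-in config file and archive a copy next to the checkpoints. A hedged sketch, assuming the llmtuner run_exp entry point and illustrative paths:

```python
import json
from pathlib import Path

from llmtuner import run_exp  # exposed from the `llamafactory` package in newer releases

# Hedged sketch: drive every run from a version-controlled JSON config so it can be replayed exactly.
config_path = Path("configs/llama2_7b_qlora_run1.json")  # illustrative path; same keys as the CLI/YAML options
args = json.loads(config_path.read_text())

out_dir = Path(args["output_dir"])
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "run_config.json").write_text(json.dumps(args, indent=2))  # archive the exact config with the run

run_exp(args)
```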

Example use cases

  • Low-cost fine-tuning of a 7B model using 4-bit QLoRA via the WebUI for domain adaptation.
  • Rapid prototyping of multimodal assistants by combining text and image datasets in LLaMA-Factory.
  • Comparing inference latency and accuracy across models using 2/3/4/8-bit quantized checkpoints.
  • Automating batch runs with provided helper scripts to sweep hyperparameters and collect metrics (see the sketch after this list).
  • Debugging training instability by inspecting config, optimizer settings, and dataset examples.
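
A helper script for such a batch sweep might look like the following hedged sketch; the base config path, output layout, and llmtuner entry point are assumptions, and in practice each run is often launched as a separate process so GPU memory is fully released between runs.

```python
import copy
import json
from pathlib import Path

from llmtuner import run_exp  # exposed from the `llamafactory` package in newer releases

# Hedged sketch: reuse one base config and vary a single hyperparameter per run.
base = json.loads(Path("configs/llama2_7b_qlora_base.json").read_text())  # illustrative base config

for lr in (5e-5, 1e-4, 2e-4):
    args = copy.deepcopy(base)
    args["learning_rate"] = lr
    args["output_dir"] = f"saves/sweep/lr_{lr:g}"
    run_exp(args)  # logs and metrics are written under each output_dir by the underlying HF Trainer
```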

FAQ

Which QLoRA bit-width should I try first?

Start with 4-bit for a strong balance of memory savings and performance. If you have more memory, test 8-bit; for extreme memory constraints, try 2/3-bit but expect potential quality loss.
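
As a rough rule of thumb, the quantized base weights alone take about params × bits / 8 bytes: a 7B model is roughly 3.5 GB at 4-bit and roughly 7 GB at 8-bit, before LoRA parameters, optimizer state, and activations are added on top. That headroom difference is why 4-bit is usually the first configuration to try on consumer GPUs.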

Can I use LLaMA-Factory without coding?

Yes. The WebUI offers no-code workflows for common fine-tuning tasks, while advanced users can access scripts and config files for programmatic control.