
This skill provides expert guidance for fine-tuning LLMs with LLaMA-Factory, covering APIs, setup, and best practices for multimodal and quantized (QLoRA) training.

npx playbooks add skill orchestra-research/ai-research-skills --skill llama-factory

Review the files below or copy the command above to add this skill to your agents.

Files (6)
SKILL.md
---
name: llama-factory
description: Expert guidance for fine-tuning LLMs with LLaMA-Factory - WebUI no-code, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Fine-Tuning, LLaMA Factory, LLM, WebUI, No-Code, QLoRA, LoRA, Multimodal, HuggingFace, Llama, Qwen, Gemma]
dependencies: [llmtuner, torch, transformers, datasets, peft, accelerate, gradio]
---

# LLaMA-Factory Skill

Comprehensive assistance with LLaMA-Factory development, generated from the official documentation.

## When to Use This Skill

This skill should be triggered when:
- Working with LLaMA-Factory
- Asking about LLaMA-Factory features or APIs
- Implementing LLaMA-Factory solutions
- Debugging LLaMA-Factory code
- Learning LLaMA-Factory best practices

## Quick Reference

### Common Patterns

*Quick reference patterns will be added as you use the skill.*
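
In the meantime, the sketch below shows the shape of a typical run: a 4-bit QLoRA supervised fine-tuning job driven from Python through the `llmtuner` entry point listed in this skill's dependencies. This is a hedged sketch, not the canonical recipe: the model, dataset, and hyperparameter values are illustrative, and argument names should be verified against the LLaMA-Factory version you have installed.

```python
from llmtuner import run_exp  # newer releases expose the same entry point from the `llamafactory` package

# Hedged sketch: a minimal 4-bit QLoRA SFT run. Argument names mirror
# LLaMA-Factory's CLI/YAML training options; all values are illustrative.
run_exp({
    "stage": "sft",                                    # supervised fine-tuning
    "do_train": True,
    "model_name_or_path": "meta-llama/Llama-2-7b-hf",  # illustrative base model
    "dataset": "alpaca_en",                            # must be registered in data/dataset_info.json
    "template": "llama2",
    "finetuning_type": "lora",
    "lora_target": "q_proj,v_proj",
    "quantization_bit": 4,                             # QLoRA bit-width; 8 is the other common choice
    "output_dir": "saves/llama2-7b-qlora",
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "learning_rate": 2e-4,
    "num_train_epochs": 3.0,
    "fp16": True,
})
```

The same arguments can be placed in a YAML or JSON file and selected in the WebUI or passed to the LLaMA-Factory CLI; the Python form is shown here only to keep the example self-contained.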

## Reference Files

This skill includes comprehensive documentation in `references/`:

- **_images.md** - Images documentation
- **advanced.md** - Advanced documentation
- **getting_started.md** - Getting Started documentation
- **other.md** - Other documentation

Use `view` to read specific reference files when detailed information is needed.

## Working with This Skill

### For Beginners
Start with the getting_started reference file for foundational concepts.

### For Specific Features
Use the appropriate category reference file (advanced, other, etc.) for detailed information.

### For Code Examples
The Quick Reference section above shows a minimal example invocation; the reference files under `references/` contain further patterns taken from the official docs.

## Resources

### references/
Organized documentation extracted from official sources. These files contain:
- Detailed explanations
- Code examples with language annotations
- Links to original documentation
- Table of contents for quick navigation

### scripts/
Add helper scripts here for common automation tasks.

### assets/
Add templates, boilerplate, or example projects here.

## Notes

- This skill was automatically generated from official documentation
- Reference files preserve the structure and examples from source docs
- Code examples include language detection for better syntax highlighting
- Quick reference patterns are extracted from common usage examples in the docs

## Updating

To refresh this skill with updated documentation:
1. Re-run the scraper with the same configuration
2. The skill will be rebuilt with the latest information


Overview

This skill provides expert guidance for fine-tuning large language models using LLaMA-Factory WebUI. It covers no-code workflows, support for 100+ models, quantized training (2/3/4/5/6/8-bit QLoRA), and multimodal configurations. The goal is to help engineers and researchers deploy efficient, reproducible fine-tuning pipelines.

How this skill works

The skill draws on LLaMA-Factory's features, documentation references, and common usage patterns to give actionable steps and troubleshooting advice. It explains how to configure WebUI sessions, select quantization and bit-width for QLoRA, and prepare datasets for multimodal training. It also highlights scripts, templates, and automation tips for end-to-end workflows.
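
For dataset preparation specifically, LLaMA-Factory reads custom datasets from a registry file, data/dataset_info.json. The hedged sketch below registers a small instruction-tuning dataset programmatically; the dataset name, file name, and column mapping are illustrative, and multimodal datasets additionally map an images column, so the exact schema should be checked against the data README of the version in use.

```python
import json
from pathlib import Path

# Hedged sketch: register a custom alpaca-style dataset with LLaMA-Factory.
# The dataset name, file name, and column mapping below are illustrative;
# check data/README.md in your LLaMA-Factory checkout for the exact schema
# (multimodal datasets additionally map an "images" column).
data_dir = Path("LLaMA-Factory/data")              # assumed repository layout
registry_path = data_dir / "dataset_info.json"

registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
registry["my_domain_sft"] = {
    "file_name": "my_domain_sft.json",             # list of {"instruction", "input", "output"} records
    "columns": {
        "prompt": "instruction",
        "query": "input",
        "response": "output",
    },
}
registry_path.write_text(json.dumps(registry, indent=2, ensure_ascii=False))
```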

When to use it

  • You are preparing to fine-tune a model with LLaMA-Factory WebUI.
  • You need guidance selecting QLoRA bit-width or performance vs. resource trade-offs.
  • You are integrating multimodal inputs (text, images) into a fine-tuning job.
  • You want no-code or low-code options to run reproducible experiments.
  • You are debugging training, convergence, or quantization-related issues.

Best practices

  • Start with the getting_started reference (and the other files under references/) to understand core flows.
  • Choose QLoRA bit-width based on GPU memory and target performance; test 4-bit and 8-bit first for balance.
  • Use well-structured, cleaned datasets and consistent preprocessing for multimodal inputs.
  • Automate experiment logging and checkpointing; keep reproducible config files for each run (see the sketch after this list).
  • Validate models on held-out sets and run lightweight inference checks before full deployment.
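
One lightweight way to keep runs reproducible, as suggested above, is to drive LLaMA-Factory from a checked-in config file and archive a copy next to the checkpoints. A hedged sketch, assuming the llmtuner run_exp entry point and illustrative paths:

```python
import json
from pathlib import Path

from llmtuner import run_exp  # exposed from the `llamafactory` package in newer releases

# Hedged sketch: drive every run from a version-controlled JSON config so it can be replayed exactly.
config_path = Path("configs/llama2_7b_qlora_run1.json")  # illustrative path; same keys as the CLI/YAML options
args = json.loads(config_path.read_text())

out_dir = Path(args["output_dir"])
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / "run_config.json").write_text(json.dumps(args, indent=2))  # archive the exact config with the run

run_exp(args)
```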

Example use cases

  • Low-cost fine-tuning of a 7B model using 4-bit QLoRA via the WebUI for domain adaptation.
  • Rapid prototyping of multimodal assistants by combining text and image datasets in LLaMA-Factory.
  • Comparing inference latency and accuracy across models using 2/3/4/8-bit quantized checkpoints.
  • Automating batch runs with provided helper scripts to sweep hyperparameters and collect metrics (see the sketch after this list).
  • Debugging training instability by inspecting config, optimizer settings, and dataset examples.
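
A helper script for such a batch sweep might look like the following hedged sketch; the base config path, output layout, and llmtuner entry point are assumptions, and in practice each run is often launched as a separate process so GPU memory is fully released between runs.

```python
import copy
import json
from pathlib import Path

from llmtuner import run_exp  # exposed from the `llamafactory` package in newer releases

# Hedged sketch: reuse one base config and vary a single hyperparameter per run.
base = json.loads(Path("configs/llama2_7b_qlora_base.json").read_text())  # illustrative base config

for lr in (5e-5, 1e-4, 2e-4):
    args = copy.deepcopy(base)
    args["learning_rate"] = lr
    args["output_dir"] = f"saves/sweep/lr_{lr:g}"
    run_exp(args)  # logs and metrics are written under each output_dir by the underlying HF Trainer
```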

FAQ

Which QLoRA bit-width should I try first?

Start with 4-bit for a strong balance of memory savings and performance. If you have more memory, test 8-bit; for extreme memory constraints, try 2/3-bit but expect potential quality loss.
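
As a rough rule of thumb, the quantized base weights alone take about params × bits / 8 bytes: a 7B model is roughly 3.5 GB at 4-bit and roughly 7 GB at 8-bit, before LoRA parameters, optimizer state, and activations are added on top. That headroom difference is why 4-bit is usually the first configuration to try on consumer GPUs.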

Can I use LLaMA-Factory without coding?

Yes. The WebUI offers no-code workflows for common fine-tuning tasks, while advanced users can access scripts and config files for programmatic control.