
axolotl skill


This skill provides expert guidance for fine-tuning LLMs with Axolotl, including YAML configs, 100+ models, and multimodal support.

npx playbooks add skill orchestra-research/ai-research-skills --skill axolotl

Review the files below or copy the command above to add this skill to your agents.

Files (5)
SKILL.md
---
name: axolotl
description: Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Fine-Tuning, Axolotl, LLM, LoRA, QLoRA, DPO, KTO, ORPO, GRPO, YAML, HuggingFace, DeepSpeed, Multimodal]
dependencies: [axolotl, torch, transformers, datasets, peft, accelerate, deepspeed]
---

# Axolotl Skill

Comprehensive assistance with axolotl development, generated from official documentation.

## When to Use This Skill

This skill should be triggered when:
- Working with axolotl
- Asking about axolotl features or APIs
- Implementing axolotl solutions
- Debugging axolotl code
- Learning axolotl best practices

## Quick Reference

### Common Patterns

**Pattern 1:** To verify that your cluster achieves acceptable data-transfer speeds for a training job, run the NCCL tests to pinpoint bottlenecks. For example:

```bash
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 3
```

**Pattern 2:** Configure FSDP in your Axolotl YAML. For example:

```yaml
fsdp_version: 2
fsdp_config:
  offload_params: true
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: LlamaDecoderLayer
  reshard_after_forward: true
```
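
To show where this block fits, here is a minimal end-to-end config sketch; the `base_model`, dataset path, and batch settings are illustrative placeholders, not values from the docs:

```yaml
# Hypothetical minimal Axolotl config (illustrative values)
base_model: meta-llama/Llama-3.1-8B
datasets:
  - path: ./data/train.jsonl   # placeholder dataset path
    type: alpaca
micro_batch_size: 2
sequence_len: 2048

fsdp_version: 2
fsdp_config:
  offload_params: true
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: LlamaDecoderLayer
  reshard_after_forward: true
```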

**Pattern 3:** `context_parallel_size` must be a divisor of the total number of GPUs. For example, with 8 GPUs you could set:

```yaml
context_parallel_size: 4
```

**Pattern 4:** Context parallelism reduces the number of distinct batches processed per step. For example:

- With 8 GPUs and no sequence parallelism: 8 different batches processed per step
- With 8 GPUs and `context_parallel_size=4`: only 2 different batches processed per step (each split across 4 GPUs)
- If your per-GPU `micro_batch_size` is 2, the global batch size decreases from 16 to 4
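
The arithmetic above can be checked directly. This sketch assumes, per the pattern, that each group of `context_parallel_size` GPUs shares one batch:

```python
def global_batch_size(num_gpus: int, context_parallel_size: int, micro_batch_size: int) -> int:
    """Effective global batch size when each group of context_parallel_size GPUs shares one batch."""
    assert num_gpus % context_parallel_size == 0, "context_parallel_size must divide num_gpus"
    distinct_batches = num_gpus // context_parallel_size  # distinct batches per step
    return distinct_batches * micro_batch_size

print(global_batch_size(8, 1, 2))  # no sequence parallelism -> 16
print(global_batch_size(8, 4, 2))  # context_parallel_size=4 -> 4
```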

**Pattern 5:** Setting `save_compressed: true` in your configuration saves models in a compressed format, which:

- Reduces disk-space usage by approximately 40%
- Maintains compatibility with vLLM for accelerated inference
- Maintains compatibility with llmcompressor for further optimization (e.g., quantization)

```yaml
save_compressed: true
```

**Pattern 6:** Note: it is not necessary to place your integration in the `integrations` folder. It can be in any location, as long as it is installed as a package in your Python environment. See this repo for an example: https://github.com/axolotl-ai-cloud/diff-transformer
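
Once such a package is installed, it is typically referenced from the config by import path; the `plugins` key usage and the class path below are illustrative, so check the integration's README for the exact name:

```yaml
# Hypothetical: load an externally installed integration by its import path
plugins:
  - diff_transformer.plugin.DiffTransformerPlugin  # placeholder class path
```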

**Pattern 7:** Handle both single-example and batched data:

- Single example: `sample['input_ids']` is a `list[int]`
- Batched data: `sample['input_ids']` is a `list[list[int]]`

```python
utils.trainer.drop_long_seq(sample, sequence_len=2048, min_sequence_len=2)
```
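
An illustrative sketch of that contract (not the actual axolotl implementation): detect the batched case by inspecting the first element, then filter on length in both shapes:

```python
def drop_long_seq_sketch(sample, sequence_len=2048, min_sequence_len=2):
    """Return keep-flags for sequences whose length is within [min_sequence_len, sequence_len]."""
    ids = sample["input_ids"]
    if ids and isinstance(ids[0], list):  # batched data: list[list[int]]
        return [min_sequence_len <= len(seq) <= sequence_len for seq in ids]
    # single example: list[int]
    return min_sequence_len <= len(ids) <= sequence_len

print(drop_long_seq_sketch({"input_ids": [1, 2, 3]}))            # True
print(drop_long_seq_sketch({"input_ids": [[1, 2], [0] * 4096]})) # [True, False]
```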

### Example Code Patterns

**Example 1** (python):
```python
cli.cloud.modal_.ModalCloud(config, app=None)
```

**Example 2** (python):
```python
cli.cloud.modal_.run_cmd(cmd, run_folder, volumes=None)
```

**Example 3** (python):
```python
core.trainers.base.AxolotlTrainer(
    *_args,
    bench_data_collator=None,
    eval_data_collator=None,
    dataset_tags=None,
    **kwargs,
)
```

**Example 4** (python):
```python
core.trainers.base.AxolotlTrainer.log(logs, start_time=None)
```

**Example 5** (python):
```python
prompt_strategies.input_output.RawInputOutputPrompter()
```

## Reference Files

This skill includes comprehensive documentation in `references/`:

- **api.md** - API documentation
- **dataset-formats.md** - Dataset format documentation
- **other.md** - Other documentation

Use `view` to read specific reference files when detailed information is needed.

## Working with This Skill

### For Beginners
Start with the reference files in `references/` (such as `dataset-formats.md`) for foundational concepts.

### For Specific Features
Use the appropriate category reference file (api, guides, etc.) for detailed information.

### For Code Examples
The quick reference section above contains common patterns extracted from the official docs.

## Resources

### references/
Organized documentation extracted from official sources. These files contain:
- Detailed explanations
- Code examples with language annotations
- Links to original documentation
- Table of contents for quick navigation

### scripts/
Add helper scripts here for common automation tasks.

### assets/
Add templates, boilerplate, or example projects here.

## Notes

- This skill was automatically generated from official documentation
- Reference files preserve the structure and examples from source docs
- Code examples include language detection for better syntax highlighting
- Quick reference patterns are extracted from common usage examples in the docs

## Updating

To refresh this skill with updated documentation:
1. Re-run the scraper with the same configuration
2. The skill will be rebuilt with the latest information


Overview

This skill provides expert guidance for fine-tuning large language models using Axolotl. It covers YAML configuration patterns, model and trainer APIs, LoRA/QLoRA workflows, advanced optimization methods (DPO, KTO, ORPO, GRPO), and multimodal support across 100+ models. The content is practical, example-driven, and focused on reproducible training and inference setups.

How this skill works

The skill inspects Axolotl configuration patterns, training utilities, and code-level APIs to surface recommended settings, common pitfalls, and runnable examples. It maps YAML options to runtime behavior (FSDP, context parallelism, compression) and explains how to integrate LoRA/QLoRA and reward-style optimizers into Axolotl trainers. It also provides troubleshooting steps for performance bottlenecks, data handling, and multi-GPU setups.

When to use it

  • When preparing YAML configs for training or distributed setups with Axolotl
  • When implementing LoRA/QLoRA or integrating quantized training pipelines
  • When using DPO/KTO/ORPO/GRPO-style optimization for instruction tuning or reward learning
  • When debugging multi-GPU scaling, FSDP, or context-parallel issues
  • When adapting Axolotl for multimodal models or custom integrations

Best practices

  • Validate inter-node bandwidth using NCCL tests before large runs to identify transfer bottlenecks
  • Make context_parallel_size a divisor of total GPUs to ensure predictable global batch sizes
  • Use FSDP settings (offload, state_dict type, auto-wrap) in YAML for memory-efficient training on many layers
  • Enable `save_compressed: true` to reduce disk usage and keep compatibility with vLLM and post-quantization tools
  • Support both single-example and batched inputs in data pipelines; clip or drop overly long sequences before batching
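
Several of these practices can be combined in a single config; the values below are illustrative placeholders, not recommended defaults:

```yaml
# Illustrative fragment combining the practices above
context_parallel_size: 4        # must divide the total GPU count
save_compressed: true           # smaller checkpoints, vLLM-compatible
sequence_len: 2048              # trim or drop longer sequences before batching
fsdp_version: 2
fsdp_config:
  offload_params: true
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
```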

Example use cases

  • Create a YAML that enables FSDP with auto-wrap for a Llama-based decoder and run a LoRA fine-tune of a ~100B-parameter model across 8+ GPUs
  • Swap training to QLoRA by adjusting config and enabling compressed saves for downstream fast inference with vLLM
  • Implement DPO reward optimization to align model responses using AxolotlTrainer hooks and custom evaluators
  • Integrate a custom dataset format by following dataset-format patterns and handling both single and batched input_ids
  • Troubleshoot degraded throughput by running NCCL all-reduce perf tests and tuning context_parallel_size

FAQ

Can I place integrations outside a specific folder?

Yes. Integrations can live anywhere as long as they are installed in the Python environment as a package.

How do I handle very long sequences in datasets?

Trim or drop sequences that exceed your model's max length using the provided utilities, e.g., `drop_long_seq` with sensible min/max thresholds.
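
In config terms, the maximum is set via `sequence_len`; the value below is illustrative:

```yaml
sequence_len: 2048   # sequences longer than this are trimmed or dropped
```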