
axolotl skill


This skill provides expert guidance for fine-tuning LLMs with Axolotl, including YAML configs, 100+ models, and multimodal support.

npx playbooks add skill orchestra-research/ai-research-skills --skill axolotl

Review the files below or copy the command above to add this skill to your agents.

Files (5)
SKILL.md
---
name: axolotl
description: Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Fine-Tuning, Axolotl, LLM, LoRA, QLoRA, DPO, KTO, ORPO, GRPO, YAML, HuggingFace, DeepSpeed, Multimodal]
dependencies: [axolotl, torch, transformers, datasets, peft, accelerate, deepspeed]
---

# Axolotl Skill

Comprehensive assistance with axolotl development, generated from official documentation.

## When to Use This Skill

This skill should be triggered when:
- Working with axolotl
- Asking about axolotl features or APIs
- Implementing axolotl solutions
- Debugging axolotl code
- Learning axolotl best practices

## Quick Reference

### Common Patterns

**Pattern 1:** To verify that your cluster achieves acceptable data-transfer speeds for a training job, run the NCCL tests to pinpoint bottlenecks. For example:

```bash
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 3
```

**Pattern 2:** Configure FSDP in your Axolotl YAML. For example:

```yaml
fsdp_version: 2
fsdp_config:
  offload_params: true
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: LlamaDecoderLayer
  reshard_after_forward: true
```
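
To show where this block fits, here is a minimal end-to-end config sketch; the `base_model`, dataset path, and batch settings are illustrative placeholders, not values from the docs:

```yaml
# Hypothetical minimal Axolotl config (illustrative values)
base_model: meta-llama/Llama-3.1-8B
datasets:
  - path: ./data/train.jsonl   # placeholder dataset path
    type: alpaca
micro_batch_size: 2
sequence_len: 2048

fsdp_version: 2
fsdp_config:
  offload_params: true
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: LlamaDecoderLayer
  reshard_after_forward: true
```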

**Pattern 3:** `context_parallel_size` must be a divisor of the total number of GPUs. For example, with 8 GPUs you could set:

```yaml
context_parallel_size: 4
```

**Pattern 4:** Context parallelism reduces the number of distinct batches processed per step. For example:

- With 8 GPUs and no sequence parallelism: 8 different batches processed per step
- With 8 GPUs and `context_parallel_size=4`: only 2 different batches processed per step (each split across 4 GPUs)
- If your per-GPU `micro_batch_size` is 2, the global batch size decreases from 16 to 4
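
The arithmetic above can be checked directly. This sketch assumes, per the pattern, that each group of `context_parallel_size` GPUs shares one batch:

```python
def global_batch_size(num_gpus: int, context_parallel_size: int, micro_batch_size: int) -> int:
    """Effective global batch size when each group of context_parallel_size GPUs shares one batch."""
    assert num_gpus % context_parallel_size == 0, "context_parallel_size must divide num_gpus"
    distinct_batches = num_gpus // context_parallel_size  # distinct batches per step
    return distinct_batches * micro_batch_size

print(global_batch_size(8, 1, 2))  # no sequence parallelism -> 16
print(global_batch_size(8, 4, 2))  # context_parallel_size=4 -> 4
```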

**Pattern 5:** Setting `save_compressed: true` in your configuration saves models in a compressed format, which:

- Reduces disk-space usage by approximately 40%
- Maintains compatibility with vLLM for accelerated inference
- Maintains compatibility with llmcompressor for further optimization (e.g., quantization)

```yaml
save_compressed: true
```

**Pattern 6:** Note: it is not necessary to place your integration in the `integrations` folder. It can be in any location, as long as it is installed as a package in your Python environment. See this repo for an example: https://github.com/axolotl-ai-cloud/diff-transformer
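
Once such a package is installed, it is typically referenced from the config by import path; the `plugins` key usage and the class path below are illustrative, so check the integration's README for the exact name:

```yaml
# Hypothetical: load an externally installed integration by its import path
plugins:
  - diff_transformer.plugin.DiffTransformerPlugin  # placeholder class path
```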

**Pattern 7:** Handle both single-example and batched data:

- Single example: `sample['input_ids']` is a `list[int]`
- Batched data: `sample['input_ids']` is a `list[list[int]]`

```python
utils.trainer.drop_long_seq(sample, sequence_len=2048, min_sequence_len=2)
```
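
An illustrative sketch of that contract (not the actual axolotl implementation): detect the batched case by inspecting the first element, then filter on length in both shapes:

```python
def drop_long_seq_sketch(sample, sequence_len=2048, min_sequence_len=2):
    """Return keep-flags for sequences whose length is within [min_sequence_len, sequence_len]."""
    ids = sample["input_ids"]
    if ids and isinstance(ids[0], list):  # batched data: list[list[int]]
        return [min_sequence_len <= len(seq) <= sequence_len for seq in ids]
    # single example: list[int]
    return min_sequence_len <= len(ids) <= sequence_len

print(drop_long_seq_sketch({"input_ids": [1, 2, 3]}))            # True
print(drop_long_seq_sketch({"input_ids": [[1, 2], [0] * 4096]})) # [True, False]
```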

### Example Code Patterns

**Example 1** (python):
```python
cli.cloud.modal_.ModalCloud(config, app=None)
```

**Example 2** (python):
```python
cli.cloud.modal_.run_cmd(cmd, run_folder, volumes=None)
```

**Example 3** (python):
```python
core.trainers.base.AxolotlTrainer(
    *_args,
    bench_data_collator=None,
    eval_data_collator=None,
    dataset_tags=None,
    **kwargs,
)
```

**Example 4** (python):
```python
core.trainers.base.AxolotlTrainer.log(logs, start_time=None)
```

**Example 5** (python):
```python
prompt_strategies.input_output.RawInputOutputPrompter()
```

## Reference Files

This skill includes comprehensive documentation in `references/`:

- **api.md** - API documentation
- **dataset-formats.md** - Dataset format documentation
- **other.md** - Other documentation

Use `view` to read specific reference files when detailed information is needed.

## Working with This Skill

### For Beginners
Start with the reference files in `references/` (such as `dataset-formats.md`) for foundational concepts.

### For Specific Features
Use the appropriate category reference file (api, guides, etc.) for detailed information.

### For Code Examples
The quick reference section above contains common patterns extracted from the official docs.

## Resources

### references/
Organized documentation extracted from official sources. These files contain:
- Detailed explanations
- Code examples with language annotations
- Links to original documentation
- Table of contents for quick navigation

### scripts/
Add helper scripts here for common automation tasks.

### assets/
Add templates, boilerplate, or example projects here.

## Notes

- This skill was automatically generated from official documentation
- Reference files preserve the structure and examples from source docs
- Code examples include language detection for better syntax highlighting
- Quick reference patterns are extracted from common usage examples in the docs

## Updating

To refresh this skill with updated documentation:
1. Re-run the scraper with the same configuration
2. The skill will be rebuilt with the latest information


Overview

This skill provides expert guidance for fine-tuning large language models using Axolotl. It covers YAML configuration patterns, model and trainer APIs, LoRA/QLoRA workflows, advanced optimization methods (DPO, KTO, ORPO, GRPO), and multimodal support across 100+ models. The content is practical, example-driven, and focused on reproducible training and inference setups.

How this skill works

The skill inspects Axolotl configuration patterns, training utilities, and code-level APIs to surface recommended settings, common pitfalls, and runnable examples. It maps YAML options to runtime behavior (FSDP, context parallelism, compression) and explains how to integrate LoRA/QLoRA and reward-style optimizers into Axolotl trainers. It also provides troubleshooting steps for performance bottlenecks, data handling, and multi-GPU setups.

When to use it

  • When preparing YAML configs for training or distributed setups with Axolotl
  • When implementing LoRA/QLoRA or integrating quantized training pipelines
  • When using DPO/KTO/ORPO/GRPO-style optimization for instruction tuning or reward learning
  • When debugging multi-GPU scaling, FSDP, or context-parallel issues
  • When adapting Axolotl for multimodal models or custom integrations

Best practices

  • Validate inter-node bandwidth using NCCL tests before large runs to identify transfer bottlenecks
  • Make context_parallel_size a divisor of total GPUs to ensure predictable global batch sizes
  • Use FSDP settings (offload, state_dict type, auto-wrap) in YAML for memory-efficient training on many layers
  • Enable `save_compressed: true` to reduce disk usage and keep compatibility with vLLM and post-quantization tools
  • Support both single-example and batched inputs in data pipelines; clip or drop overly long sequences before batching
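
Several of these practices can be combined in a single config; the values below are illustrative placeholders, not recommended defaults:

```yaml
# Illustrative fragment combining the practices above
context_parallel_size: 4        # must divide the total GPU count
save_compressed: true           # smaller checkpoints, vLLM-compatible
sequence_len: 2048              # trim or drop longer sequences before batching
fsdp_version: 2
fsdp_config:
  offload_params: true
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
```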

Example use cases

  • Create a YAML that enables FSDP with auto-wrap for a Llama-based decoder and run a LoRA fine-tune of a ~100B-parameter model across 8+ GPUs
  • Swap training to QLoRA by adjusting config and enabling compressed saves for downstream fast inference with vLLM
  • Implement DPO reward optimization to align model responses using AxolotlTrainer hooks and custom evaluators
  • Integrate a custom dataset format by following dataset-format patterns and handling both single and batched input_ids
  • Troubleshoot degraded throughput by running NCCL all-reduce perf tests and tuning context_parallel_size

FAQ

Can I place integrations outside a specific folder?

Yes. Integrations can live anywhere as long as they are installed in the Python environment as a package.

How do I handle very long sequences in datasets?

Trim or drop sequences that exceed your model's max length using the provided utilities, e.g., `drop_long_seq` with sensible min/max thresholds.
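
In config terms, the maximum is set via `sequence_len`; the value below is illustrative:

```yaml
sequence_len: 2048   # sequences longer than this are trimmed or dropped
```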