
This skill enables running Python ML training on cloud GPUs via Modal, handling GPU selection, data download, and remote result management.

```bash
npx playbooks add skill benchflow-ai/skillsbench --skill modal-gpu
```

---
name: modal-gpu
description: Run Python code on cloud GPUs using Modal serverless platform. Use when you need A100/T4/A10G GPU access for training ML models. Covers Modal app setup, GPU selection, data downloading inside functions, and result handling.
---

# Modal GPU Training

## Overview

Modal is a serverless platform for running Python code on cloud GPUs. It provides:

- **Serverless GPUs**: On-demand access to T4, A10G, A100 GPUs
- **Container Images**: Define dependencies declaratively with pip
- **Remote Execution**: Run functions on cloud infrastructure
- **Result Handling**: Return Python objects from remote functions

Two patterns:
- **Single Function**: Simple script with `@app.function` decorator
- **Multi-Function**: Complex workflows with multiple remote calls, sketched below
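
A minimal sketch of the multi-function pattern, here as an illustrative learning-rate sweep (the function bodies and the metric are placeholders, not part of Modal's API):

```python
import modal

app = modal.App("multi-step-training")

image = modal.Image.debian_slim(python_version="3.11").pip_install("numpy")

@app.function(image=image)
def preprocess():
    # CPU-only step: omit the gpu argument to avoid paying for a GPU
    return list(range(100))  # stand-in for a prepared dataset

@app.function(gpu="T4", image=image, timeout=1800)
def train_one(lr: float, data: list):
    # Each call runs in its own container on its own GPU
    return {"lr": lr, "loss": 1.0 / (1.0 + lr * len(data))}  # placeholder metric

@app.local_entrypoint()
def main():
    data = preprocess.remote()
    # starmap fans the calls out across containers in parallel
    results = list(train_one.starmap([(lr, data) for lr in (1e-4, 3e-4, 1e-3)]))
    print(min(results, key=lambda r: r["loss"]))
```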

## Quick Reference

| Topic | Reference |
|-------|-----------|
| Basic Structure | [Getting Started](references/getting-started.md) |
| GPU Options | [GPU Selection](references/gpu-selection.md) |
| Data Handling | [Data Download](references/data-download.md) |
| Results & Outputs | [Results](references/results.md) |
| Troubleshooting | [Common Issues](references/common-issues.md) |

## Installation

```bash
pip install modal
modal token set --token-id <id> --token-secret <secret>
```

## Minimal Example

```python
import modal

app = modal.App("my-training-app")

image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch",
    "einops",
    "numpy",
)

@app.function(gpu="A100", image=image, timeout=3600)
def train():
    import torch
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")

    # Training code here
    return {"loss": 0.5}

@app.local_entrypoint()
def main():
    results = train.remote()
    print(results)
```

## Common Imports

```python
import modal
from modal import Image, App

# Inside remote function
import torch
import torch.nn as nn
from huggingface_hub import hf_hub_download
```

## When to Use What

| Scenario | Approach |
|----------|----------|
| Quick GPU experiments | `gpu="T4"` (16GB, cheapest) |
| Medium training jobs | `gpu="A10G"` (24GB) |
| Large-scale training | `gpu="A100"` (40/80GB, fastest) |
| Long-running jobs | Set `timeout=3600` or higher |
| Data from HuggingFace | Download inside function with `hf_hub_download` (sketch below) |
| Return metrics | Return dict from function |
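
For the HuggingFace row above, a hedged sketch of downloading a checkpoint inside the remote function (the `repo_id` and `filename` are placeholders; `huggingface_hub` must be installed in the image):

```python
import modal

app = modal.App("hf-download-demo")

image = modal.Image.debian_slim(python_version="3.11").pip_install("huggingface_hub")

@app.function(gpu="T4", image=image, timeout=600)
def fetch_checkpoint():
    import os
    from huggingface_hub import hf_hub_download

    # Downloading inside the function keeps the image small; the file
    # lands on the container's local disk.
    path = hf_hub_download(
        repo_id="some-org/some-model",  # placeholder repo
        filename="pytorch_model.bin",   # placeholder file
    )
    return {"path": path, "size_bytes": os.path.getsize(path)}

@app.local_entrypoint()
def main():
    print(fetch_checkpoint.remote())
```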

## Running

```bash
# Run script
modal run train_modal.py

# Run in background
modal run --detach train_modal.py
```
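
To check on a detached run afterwards, the Modal CLI provides app inspection commands; treat the exact flags as a sketch and confirm against `modal --help` for your installed version:

```bash
# List apps, including detached runs
modal app list

# Stream logs for a specific app
modal app logs <app-id>
```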

## External Resources

- Modal Documentation: https://modal.com/docs
- Modal Examples: https://github.com/modal-labs/modal-examples

## Overview

This skill shows how to run Python code on cloud GPUs using the Modal serverless platform. It guides you through setting up a Modal app, choosing A100/T4/A10G GPUs, packaging dependencies, and returning results from remote functions. The content covers both simple single-function workflows and multi-function orchestration for more complex training jobs.

## How this skill works

You declare a Modal `App` and an execution image that includes your Python dependencies. Annotate functions with `@app.function`, specifying `gpu`, `image`, and `timeout`; those functions execute remotely on the chosen GPU and can download data inside the function (for example via `hf_hub_download`). Remote functions run in Modal-managed containers and return Python objects or dictionaries you can inspect locally after completion.

## When to use it

- Quick GPU experiments where you need temporary access to a T4 for fast iteration.
- Medium-sized training runs that require an A10G for increased memory and performance.
- Large-scale or high-throughput training that benefits from A100 GPUs.
- Workflows that must download datasets or model artifacts at runtime inside the remote function.
- Jobs where you want serverless execution and result handling without managing VM lifecycles.

## Best practices

- Install and pin required Python packages in the Modal Image so remote functions are reproducible (see the sketch after this list).
- Select the smallest GPU that meets memory needs to control cost (T4 → A10G → A100).
- Download data and model checkpoints inside the remote function to avoid packaging large files in the image.
- Return compact metrics (e.g., loss, accuracy, checkpoint paths) instead of huge objects to minimize data transfer.
- Set an appropriate timeout for long-running training jobs, and consider running detached for background runs.
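
For the pinning advice above, a sketch of a reproducible image; the version numbers are illustrative, so pin to whatever versions your code was tested against:

```python
import modal

# Pinned versions make remote runs reproducible across rebuilds.
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch==2.3.1",    # example versions only,
    "einops==0.8.0",   # not recommendations
    "numpy==1.26.4",
)
```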

## Example use cases

- Run a quick fine-tuning loop on a small model using a T4 for rapid iteration.
- Train a medium-sized model on an A10G when 16GB is insufficient but an A100 is overkill.
- Perform large-batch training or distributed experiments on A100s for maximum throughput.
- Download a checkpoint from Hugging Face inside the function and resume training remotely.
- Execute long-running experiments detached and retrieve final metrics and artifacts after completion.

## FAQ

**How do I authenticate Modal and set credentials?**

Install the `modal` package and run `modal token set --token-id <id> --token-secret <secret>`; the credentials are stored locally, so subsequent `modal run` invocations authenticate automatically.

**Where should I download datasets or model files?**

Download datasets and checkpoints inside the remote function (for example with `hf_hub_download`) to keep images small and avoid shipping large files with the container.

**How do I retrieve results from a remote run?**

Have the remote function return a Python dict or object; call `.remote()` from the local entrypoint and inspect the returned value or logs. For artifacts too large to return directly, persist them and return only a path, as in the sketch below.
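
When artifacts are too large to return directly, one option is persisting them to a `modal.Volume` and returning just the path; a hedged sketch (the volume name and mount path are placeholders):

```python
import modal

app = modal.App("checkpoint-demo")
image = modal.Image.debian_slim(python_version="3.11").pip_install("torch")

# A named Volume persists files across runs.
checkpoints = modal.Volume.from_name("training-checkpoints", create_if_missing=True)

@app.function(gpu="T4", image=image, timeout=3600, volumes={"/ckpt": checkpoints})
def train():
    import torch
    model = torch.nn.Linear(10, 1)
    # ... training loop would go here ...
    path = "/ckpt/model.pt"
    torch.save(model.state_dict(), path)
    checkpoints.commit()  # flush writes so they persist in the Volume
    return {"loss": 0.42, "checkpoint": path}  # compact metrics plus a path

@app.local_entrypoint()
def main():
    print(train.remote())
```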