This skill helps you deploy and run serverless GPU-powered ML workloads with autoscaling, scheduling, and pay-per-use compute.
Add this skill to your agents with `npx playbooks add skill eyadsibai/ltk --skill modal`.
---
name: modal
description: Use when "Modal", "serverless GPU", "cloud GPU", "deploy ML model", or asking about "serverless containers", "GPU compute", "batch processing", "scheduled jobs", "autoscaling ML"
version: 1.0.0
---
<!-- Adapted from: claude-scientific-skills/scientific-skills/modal -->
# Modal Serverless Cloud Platform
Serverless Python execution with GPUs, autoscaling, and pay-per-use compute.
## When to Use
- Deploy and serve ML models (LLMs, image generation)
- Run GPU-accelerated computation
- Batch process large datasets in parallel
- Schedule compute-intensive jobs
- Build serverless APIs with autoscaling
## Quick Start
```bash
# Install
pip install modal
# Authenticate
modal token new
```
```python
import modal
app = modal.App("my-app")
@app.function()
def hello():
return "Hello from Modal!"
# Run with: modal run script.py
```
## Container Images
```python
# Build image with dependencies
image = (
    modal.Image.debian_slim(python_version="3.12")
    .pip_install("torch", "transformers", "numpy")
)
app = modal.App("ml-app", image=image)
```
## GPU Functions
```python
@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU code here
# Available GPUs: T4, L4, A10, A100, L40S, H100, H200, B200
# Multi-GPU: gpu="H100:8"
```
## Web Endpoints
```python
@app.function()
@modal.web_endpoint(method="POST")
def predict(data: dict):
    # `model` is assumed to be loaded elsewhere (e.g. when the container starts)
    result = model.predict(data["input"])
    return {"prediction": result}
# Deploy: modal deploy script.py
```
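Once deployed, Modal prints a public URL for the endpoint. A minimal client-side sketch, assuming a placeholder URL (the real one is shown by `modal deploy`):

```python
import requests

# Placeholder URL -- substitute the endpoint URL that `modal deploy` prints
url = "https://your-workspace--my-app-predict.modal.run"

response = requests.post(url, json={"input": "some text"})
print(response.json())  # e.g. {"prediction": ...}
```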
## Scheduled Jobs
```python
@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
def daily_backup():
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    pass
```
## Autoscaling
```python
@app.function()
def process_item(item_id: int):
    # `analyze` is a placeholder for your per-item processing
    return analyze(item_id)

@app.local_entrypoint()
def main():
    items = range(1000)
    # Automatically parallelized across containers
    results = list(process_item.map(items))
```
## Persistent Storage
```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)
@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes
```
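Other functions that mount the same Volume can read committed data. A short sketch reusing the `volume` object defined above (`volume.reload()` pulls in changes committed after the container started):

```python
@app.function(volumes={"/data": volume})
def load_results():
    volume.reload()  # fetch the latest committed state of the Volume
    with open("/data/results.txt") as f:
        return f.read()
```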
## Secrets Management
```python
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]  # injected from the "huggingface" secret
```
## ML Model Serving
```python
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")
## Resource Configuration
```python
@app.function(
    cpu=8.0,               # 8 CPU cores
    memory=32768,          # 32 GiB RAM
    ephemeral_disk=10240,  # 10 GiB disk
    timeout=3600,          # 1 hour timeout
)
def memory_intensive_task():
    pass
```
## Best Practices
1. **Pin dependencies** for reproducible builds (see the sketch after this list)
2. **Use appropriate GPU types** - L40S for inference, H100 for training
3. **Leverage caching** via Volumes for model weights
4. **Use `.map()` for parallel processing**
5. **Import packages inside functions** if not available locally
6. **Store secrets securely** - never hardcode API keys
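A minimal sketch combining practices 1 and 3: exact versions pinned in the image and a Volume caching downloaded model weights between runs (package versions, names, and paths here are illustrative, not requirements):

```python
import modal

# Pin exact versions so image builds are reproducible
image = modal.Image.debian_slim(python_version="3.12").pip_install(
    "torch==2.4.0",
    "transformers==4.44.0",
)

weights = modal.Volume.from_name("model-weights", create_if_missing=True)
app = modal.App("cached-inference", image=image)

@app.function(volumes={"/weights": weights})
def warm_cache():
    from transformers import AutoModel

    # Downloads land on the mounted Volume, so later runs reuse the cache
    AutoModel.from_pretrained("bert-base-uncased", cache_dir="/weights")
    weights.commit()  # persist the newly downloaded files
```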
## vs Alternatives
| Platform | Best For |
|----------|----------|
| **Modal** | Serverless GPUs, autoscaling, Python-native |
| RunPod | GPU rental, long-running jobs |
| AWS Lambda | CPU workloads, AWS ecosystem |
| Replicate | Model hosting, simple deployments |
## Resources
- Docs: <https://modal.com/docs>
- Examples: <https://github.com/modal-labs/modal-examples>
This skill encapsulates deploying and running Python workloads on Modal’s serverless cloud with GPU support, autoscaling, and pay-per-use billing. It focuses on making GPU-accelerated model training, inference, batch processing, and scheduled jobs simple to deploy and manage. The skill shows how to build container images, configure resources, and expose web endpoints or scheduled functions for production use.
You define an app and functions in Python using Modal primitives (App, function, Image, Volume, Secret). Functions can request GPUs, CPUs, memory, ephemeral disk, or schedules; Modal then provisions serverless containers with the requested resources and autoscaling. Use .map() for parallel work, web_endpoint for HTTP APIs, and Volumes/Secrets for persistent storage and credentials.
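A compact sketch tying those primitives together (the per-item work and dataset are placeholders; run it with `modal run script.py`):

```python
import modal

image = modal.Image.debian_slim(python_version="3.12").pip_install("numpy")
app = modal.App("batch-scoring", image=image)

@app.function(gpu="L4", timeout=600)
def score(item: int) -> float:
    import numpy as np

    # Placeholder work -- a real job would load a model and run inference on the GPU
    return float(np.sqrt(item))

@app.local_entrypoint()
def main():
    # Fans out across autoscaled containers
    results = list(score.map(range(100)))
    print(sum(results))
```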
**How do I choose the right GPU?**
Choose inference-optimized GPUs like L40S or L4 for lower latency and cost; pick H100/H200 or A100 for large-scale training and multi-GPU parallelism.

**Can I persist files between runs?**
Yes. Use Modal Volume objects to mount persistent storage into containers and call `volume.commit()` to save changes.