---
name: modal-knowledge
description: Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices
---
# Modal Knowledge Skill
Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices. Activate this skill when users need detailed information about Modal's serverless cloud platform.
## Activation Triggers
Activate this skill when users ask about:
- Modal.com platform features and capabilities
- GPU-accelerated Python functions
- Serverless container configuration
- Modal pricing and billing
- Modal CLI commands
- Web endpoints and APIs on Modal
- Scheduled/cron jobs on Modal
- Modal volumes, secrets, and storage
- Parallel processing with Modal
- Modal deployment and CI/CD
---
## Platform Overview
Modal is a serverless cloud platform for running Python code, optimized for AI/ML workloads with:
- **Zero Configuration**: Everything defined in Python code
- **Fast GPU Startup**: ~1 second container spin-up
- **Automatic Scaling**: Scale to zero, scale to thousands
- **Per-Second Billing**: Only pay for active compute
- **Multi-Cloud**: AWS, GCP, Oracle Cloud Infrastructure
---
## Core Components Reference
### Apps and Functions
```python
import modal

app = modal.App("app-name")

@app.function()
def basic_function(arg: str) -> str:
    return f"Result: {arg}"

@app.local_entrypoint()
def main():
    result = basic_function.remote("test")
    print(result)
```
### Function Decorator Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| `image` | Image | Container image configuration |
| `gpu` | str/list | GPU type(s): "T4", "A100", ["H100", "A100"] |
| `cpu` | float | CPU cores (0.125 to 64) |
| `memory` | int | Memory in MB (128 to 262144) |
| `timeout` | int | Max execution seconds |
| `retries` | int | Retry attempts on failure |
| `secrets` | list | Secrets to inject |
| `volumes` | dict | Volume mount points |
| `schedule` | Cron/Period | Scheduled execution |
| `concurrency_limit` | int | Max concurrent containers (newer SDKs: `max_containers`) |
| `container_idle_timeout` | int | Seconds to keep warm (newer SDKs: `scaledown_window`) |
| `include_source` | bool | Auto-sync source code |
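These parameters combine freely on a single decorator. A minimal sketch, assuming hypothetical resources (`img`, `vol`, and the `"api-keys"` secret are placeholders, not real resources in your workspace):

```python
import modal

app = modal.App("example")
img = modal.Image.debian_slim(python_version="3.11").pip_install("requests")
vol = modal.Volume.from_name("scratch", create_if_missing=True)

@app.function(
    image=img,
    gpu="A100",            # or a fallback list like ["H100", "A100"]
    cpu=2.0,               # cores
    memory=8192,           # MB
    timeout=600,           # seconds
    retries=2,
    secrets=[modal.Secret.from_name("api-keys")],
    volumes={"/scratch": vol},
)
def job(x: int) -> int:
    return x * 2
```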
---
## GPU Reference
### Available GPUs
| GPU | Memory | Use Case | ~Cost/hr |
|-----|--------|----------|----------|
| T4 | 16 GB | Small inference | $0.59 |
| L4 | 24 GB | Medium inference | $0.80 |
| A10G | 24 GB | Inference/fine-tuning | $1.10 |
| L40S | 48 GB | Heavy inference | $1.50 |
| A100-40GB | 40 GB | Training | $2.00 |
| A100-80GB | 80 GB | Large models | $3.00 |
| H100 | 80 GB | Cutting-edge | $5.00 |
| H200 | 141 GB | Largest models | $5.00 |
| B200 | 180+ GB | Latest gen | $6.25 |
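The hourly figures above are approximate list prices. With per-second billing, a job's GPU cost can be estimated directly from them; a quick sketch (rates copied from the table):

```python
# Approximate hourly GPU rates from the table above (USD)
GPU_HOURLY = {
    "T4": 0.59, "L4": 0.80, "A10G": 1.10, "L40S": 1.50,
    "A100-40GB": 2.00, "A100-80GB": 3.00,
    "H100": 5.00, "H200": 5.00, "B200": 6.25,
}

def gpu_cost(gpu: str, seconds: float, count: int = 1) -> float:
    """Estimated cost of running `count` GPUs for `seconds` (per-second billing)."""
    return GPU_HOURLY[gpu] / 3600 * seconds * count

# e.g. 90 seconds of inference on one A100-80GB
print(round(gpu_cost("A100-80GB", 90), 4))  # → 0.075
```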
### GPU Configuration
```python
# Single GPU
@app.function(gpu="A100")
# Specific memory variant
@app.function(gpu="A100-80GB")
# Multi-GPU
@app.function(gpu="H100:4")
# Fallbacks (tries in order)
@app.function(gpu=["H100", "A100", "any"])
# "any" = L4, A10G, or T4
@app.function(gpu="any")
```
---
## Image Building
### Base Images
```python
# Debian slim (recommended)
modal.Image.debian_slim(python_version="3.11")
# From Dockerfile
modal.Image.from_dockerfile("./Dockerfile")
# From Docker registry
modal.Image.from_registry("nvidia/cuda:12.1.0-base-ubuntu22.04")
```
### Package Installation
```python
# pip (standard)
image.pip_install("torch", "transformers")
# uv (typically 10-100x faster)
image.uv_pip_install("torch", "transformers")
# System packages
image.apt_install("ffmpeg", "libsm6")
# Shell commands
image.run_commands("apt-get update", "make install")
```
### Adding Files
```python
# Single file
image.add_local_file("./config.json", "/app/config.json")
# Directory
image.add_local_dir("./models", "/app/models")
# Python source
image.add_local_python_source("my_module")
# Environment variables
image.env({"VAR": "value"})
```
### Build-Time Function
```python
def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("model-name")

image.run_function(download_model, secrets=[...])
```
---
## Storage
### Volumes
```python
# Create/reference volume
vol = modal.Volume.from_name("my-vol", create_if_missing=True)
# Mount in function
@app.function(volumes={"/data": vol})
def func():
    # Read/write to /data
    vol.commit()  # Persist changes
```
### Secrets
```python
# From dashboard (recommended)
modal.Secret.from_name("secret-name")
# From dictionary
modal.Secret.from_dict({"KEY": "value"})
# From local env
modal.Secret.from_local_environ(["KEY1", "KEY2"])
# From .env file
modal.Secret.from_dotenv()
# Usage
@app.function(secrets=[modal.Secret.from_name("api-keys")])
def func():
    import os
    key = os.environ["API_KEY"]
```
### Dict and Queue
```python
# Distributed dict
d = modal.Dict.from_name("cache", create_if_missing=True)
d["key"] = "value"
d.put("key", "value", ttl=3600)
# Distributed queue
q = modal.Queue.from_name("jobs", create_if_missing=True)
q.put("task")
item = q.get()
```
---
## Web Endpoints
### FastAPI Endpoint (Simple)
```python
@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
    return {"message": f"Hello, {name}!"}
```
### ASGI App (Full FastAPI)
```python
from fastapi import FastAPI

web_app = FastAPI()

@web_app.post("/predict")
def predict(text: str):
    return {"result": process(text)}

@app.function()
@modal.asgi_app()
def fastapi_app():
    return web_app
```
### WSGI App (Flask)
```python
from flask import Flask

flask_app = Flask(__name__)

@app.function()
@modal.wsgi_app()
def flask_endpoint():
    return flask_app
```
### Custom Web Server
```python
import subprocess

@app.function()
@modal.web_server(port=8000)
def custom_server():
    # Launch in the background; the decorated function must return
    subprocess.Popen(["python", "-m", "http.server", "8000"])
```
### Custom Domains
```python
@modal.asgi_app(custom_domains=["api.example.com"])
```
---
## Scheduling
### Cron
```python
# Daily at 8 AM UTC
@app.function(schedule=modal.Cron("0 8 * * *"))
# With timezone
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
```
### Period
```python
@app.function(schedule=modal.Period(hours=5))
@app.function(schedule=modal.Period(days=1))
```
**Note:** Scheduled functions only run with `modal deploy`, not `modal run`.
---
## Parallel Processing
### Map
```python
# Parallel execution (up to 1000 concurrent)
results = list(func.map(items))
# Unordered (faster)
results = list(func.map(items, order_outputs=False))
```
### Starmap
```python
# Spread args
pairs = [(1, 2), (3, 4)]
results = list(add.starmap(pairs))
```
### Spawn
```python
# Async job (returns immediately)
call = func.spawn(data)
result = call.get() # Get result later
# Spawn many
calls = [func.spawn(item) for item in items]
results = [call.get() for call in calls]
```
---
## Container Lifecycle (Classes)
```python
@app.cls(gpu="A100", container_idle_timeout=300)
class Server:
    @modal.enter()
    def load(self):
        self.model = load_model()

    @modal.method()
    def predict(self, text):
        return self.model(text)

    @modal.exit()
    def cleanup(self):
        del self.model
```
### Concurrency
```python
@app.cls()
@modal.concurrent(max_inputs=100, target_inputs=80)  # applied at class level
class Batcher:
    @modal.method()
    def batched(self, item):
        ...
```
---
## CLI Commands
### Development
```bash
modal run app.py # Run function
modal serve app.py # Hot-reload dev server
modal shell app.py # Interactive shell
modal shell app.py --gpu A100 # Shell with GPU
```
### Deployment
```bash
modal deploy app.py # Deploy
modal app list # List apps
modal app logs app-name # View logs
modal app stop app-name # Stop app
```
### Resources
```bash
# Volumes
modal volume create name
modal volume list
modal volume put name local remote
modal volume get name remote local
# Secrets
modal secret create name KEY=value
modal secret list
# Environments
modal environment create staging
```
---
## Pricing (2025)
### Plans
| Plan | Price | Containers | GPU Concurrency |
|------|-------|------------|-----------------|
| Starter | Free ($30 credits) | 100 | 10 |
| Team | $250/month | 1000 | 50 |
| Enterprise | Custom | Unlimited | Custom |
### Compute
- **CPU**: $0.0000131/core/sec
- **Memory**: $0.00000222/GiB/sec
- **GPUs**: See GPU table above
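Per-second billing makes CPU-only costs easy to estimate from these rates. A small sketch using the CPU and memory rates listed above:

```python
CPU_PER_CORE_SEC = 0.0000131   # USD per core-second (rate above)
MEM_PER_GIB_SEC = 0.00000222   # USD per GiB-second (rate above)

def cpu_job_cost(cores: float, gib: float, seconds: float) -> float:
    """Estimated CPU + memory cost for one container run."""
    return (CPU_PER_CORE_SEC * cores + MEM_PER_GIB_SEC * gib) * seconds

# 2 cores, 4 GiB, 10 minutes
print(f"${cpu_job_cost(2, 4, 600):.4f}")  # → $0.0210
```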
### Special Programs
- Startups: Up to $25k credits
- Researchers: Up to $10k credits
---
## Best Practices
1. **Use `@modal.enter()`** for model loading
2. **Use `uv_pip_install`** for faster builds
3. **Use GPU fallbacks** for availability
4. **Set appropriate timeouts** and retries
5. **Use environments** (dev/staging/prod)
6. **Download models during build**, not runtime
7. **Use `order_outputs=False`** when order doesn't matter
8. **Set `container_idle_timeout`** to balance cost/latency
9. **Monitor costs** in Modal dashboard
10. **Test with `modal run`** before `modal deploy`
---
## Common Patterns
### LLM Inference
```python
@app.cls(gpu="A100", container_idle_timeout=300)
class LLMServer:  # avoid naming the class LLM, which would shadow vllm.LLM
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="...")

    @modal.method()
    def generate(self, prompt):
        return self.llm.generate([prompt])
```
### Batch Processing
```python
@app.function(volumes={"/data": vol})
def process(file):
    # Process file
    vol.commit()

# Parallel
results = list(process.map(files))
```
### Scheduled ETL
```python
@app.function(
    schedule=modal.Cron("0 6 * * *"),
    secrets=[modal.Secret.from_name("db")],
)
def daily_etl():
    extract()
    transform()
    load()
```
---
## Quick Reference
| Task | Code |
|------|------|
| Create app | `app = modal.App("name")` |
| Basic function | `@app.function()` |
| With GPU | `@app.function(gpu="A100")` |
| With image | `@app.function(image=img)` |
| Web endpoint | `@modal.asgi_app()` |
| Scheduled | `schedule=modal.Cron("...")` |
| Mount volume | `volumes={"/path": vol}` |
| Use secret | `secrets=[modal.Secret.from_name("x")]` |
| Parallel map | `func.map(items)` |
| Async spawn | `func.spawn(arg)` |
| Class pattern | `@app.cls()` with `@modal.enter()` |