---
name: ksim-rl
description: RL training library for humanoid locomotion and manipulation built on MuJoCo and JAX. Provides PPO, AMP, and custom task abstractions for sim-to-real robotics policy training.
version: 1.0.0
category: robotics-rl
author: K-Scale Labs
source: kscalelabs/ksim
license: MIT
trit: -1
trit_label: MINUS
color: "#3A2F9E"
verified: false
featured: true
---
# KSIM-RL Skill
**Trit**: -1 (MINUS - analysis/verification)
**Color**: #3A2F9E (Deep Purple)
**URI**: skill://ksim-rl#3A2F9E
## Overview
KSIM is K-Scale Labs' reinforcement learning library for humanoid robot locomotion and manipulation. It is built on MuJoCo for physics simulation and JAX for hardware-accelerated training.
## Core Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ KSIM ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ RLTask │ │ PPOTask │ │ AMPTask │ │
│ │ (abstract) │──│ (PPO impl) │──│ (Adversarial Motion) │ │
│ └─────────────┘ └─────────────┘ └─────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ PhysicsEngine │ │
│ │ ┌───────────────┐ ┌───────────────────────────────┐ │ │
│ │ │ MujocoEngine │ │ MjxEngine (JAX-accelerated) │ │ │
│ │ └───────────────┘ └───────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Environment Components │ │
│ │ • Actuators: Position, Velocity, Torque control │ │
│ │ • Observations: Joint states, IMU, local view │ │
│ │ • Rewards: Velocity tracking, gait, energy, stability │ │
│ │ • Terminations: Fall detection, boundary violations │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
## Key Features
- **JAX-Accelerated**: Uses MJX for parallel environment simulation on GPU/TPU (see the sketch after this list)
- **PPO Training**: Proximal Policy Optimization with configurable hyperparameters
- **AMP Support**: Adversarial Motion Priors for realistic humanoid locomotion
- **Modular Rewards**: Composable reward functions for gait, velocity, energy
- **Domain Randomization**: Built-in randomizers for sim-to-real transfer
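A minimal sketch of the MJX pattern the first feature builds on. This follows the standard MuJoCo MJX workflow rather than ksim internals, and the `kbot.mjcf` path is borrowed from the API example below.

```python
# Sketch of MJX-style parallel simulation (standard MuJoCo MJX usage,
# not ksim internals): vmap one physics step across a batch of environments.
import jax
import mujoco
from mujoco import mjx

mj_model = mujoco.MjModel.from_xml_path("kbot.mjcf")  # model path as in the example below
mjx_model = mjx.put_model(mj_model)                   # copy the model to device
mjx_data = mjx.make_data(mjx_model)                   # reference state

def randomize(rng):
    # Jitter initial joint positions so each environment starts differently.
    noise = 0.01 * jax.random.normal(rng, mjx_data.qpos.shape)
    return mjx_data.replace(qpos=mjx_data.qpos + noise)

rngs = jax.random.split(jax.random.PRNGKey(0), 4096)  # 4096 envs, as in run_training
batch = jax.vmap(randomize)(rngs)                     # batched mjx.Data

step_fn = jax.jit(jax.vmap(mjx.step, in_axes=(None, 0)))
batch = step_fn(mjx_model, batch)                     # one step for all envs in parallel
```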
## API Usage
```python
import ksim
from ksim import PPOTask

# Define a custom walking task for the K-Bot.
class KBotWalkingTask(PPOTask):
    model_path = "kbot.mjcf"

    # Observations fed to the policy.
    observations = [
        ksim.JointPosition(),
        ksim.JointVelocity(),
        ksim.IMUAngularVelocity(),
        ksim.BaseOrientation(),
    ]

    # Reward terms, combined according to their scales.
    rewards = [
        ksim.LinearVelocityReward(scale=1.0),
        ksim.GaitPhaseReward(scale=0.5),
        ksim.EnergyPenalty(scale=-0.01),
    ]

    # PD position actuators; a typical convention (not confirmed here) maps
    # scaled actions to position targets tracked with gains kp and kd.
    actuators = [
        ksim.PositionActuator(
            joint_name=".*",  # regex matching all joints
            kp=100.0,
            kd=10.0,
            action_scale=0.5,
        )
    ]

# Train the policy.
task = KBotWalkingTask()
task.run_training(
    num_envs=4096,
    num_steps=1_000_000,
    learning_rate=3e-4,
)
```
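The feature list mentions built-in domain randomizers, which the example above does not exercise. The sketch below is hypothetical: the `randomizers` attribute is assumed by analogy with the `rewards` list, and the randomizer class names are illustrative, not confirmed ksim API; consult the ksim source for the actual names.

```python
# HYPOTHETICAL sketch: the attribute and class names below are assumed by
# analogy with the rewards list and are not confirmed ksim API.
class KBotWalkingRandomizedTask(KBotWalkingTask):
    randomizers = [
        ksim.FloorFrictionRandomizer(low=0.4, high=1.2),     # hypothetical
        ksim.LinkMassRandomizer(scale_range=(0.9, 1.1)),     # hypothetical
        ksim.PushDisturbance(max_force=50.0, interval=2.0),  # hypothetical
    ]
```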
## GF(3) Triads
This skill participates in balanced triads:
```
ksim-rl (-1) ⊗ kos-firmware (+1) ⊗ mujoco-scenes (0) = 0 ✓
ksim-rl (-1) ⊗ kos-firmware (+1) ⊗ urdf2mjcf (-1) = -1 (needs balancing)
```
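Reading ⊗ as composition of trits, and assuming balance means the trit sum vanishes modulo 3, the two lines above check out as follows:

```python
# Triad balance over GF(3): balanced iff the trits sum to 0 mod 3.
def balanced(*trits: int) -> bool:
    return sum(trits) % 3 == 0

print(balanced(-1, +1, 0))   # ksim-rl, kos-firmware, mujoco-scenes -> True
print(balanced(-1, +1, -1))  # ksim-rl, kos-firmware, urdf2mjcf -> False (needs balancing)
```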
## Key Contributors
- **codekansas** (Ben Bolte): Core architecture, PPO, rewards
- **b-vm**: Randomizers, disturbances, policy training
- **carlosdp**: Adaptive KL, action scaling
- **WT-MM**: Visualization, markers
## Related Skills
- `kos-firmware` (+1): Robot firmware and gRPC services
- `mujoco-scenes` (0): Scene composition for MuJoCo
- `evla-vla` (-1): Vision-language-action models
- `urdf2mjcf` (-1): URDF to MJCF conversion
- `ktune-sim2real` (-1): Servo tuning for sim2real
## References
```bibtex
@misc{ksim2024,
  title={K-Sim: RL Training for Humanoid Locomotion},
  author={K-Scale Labs},
  year={2024},
  url={https://github.com/kscalelabs/ksim}
}
```
## SDF Interleaving
This skill connects to **Software Design for Flexibility** (Hanson & Sussman, 2021):
### Primary Chapter: 5. Evaluation
**Concepts**: eval, apply, interpreter, environment
### GF(3) Balanced Triad
```
ksim-rl (−) + SDF.Ch5 (0) + [balancer] (+) = 0
```
**Skill Trit**: -1 (MINUS - analysis/verification)
### Secondary Chapters
- Ch2: Domain-Specific Languages
### Connection Pattern
Evaluation interprets expressions. This skill processes or generates evaluable forms.
## Summary
ksim is an RL training library for humanoid locomotion and manipulation built on MuJoCo and JAX. It provides PPO, Adversarial Motion Priors (AMP), modular task abstractions, and GPU/TPU-accelerated parallel simulation for sim-to-real robotics policy training. The design centers on composable observations, reward modules, and domain randomization for robust policies.

Abstract `RLTask` classes and concrete `PPOTask`/`AMPTask` implementations connect to a physics engine backend (plain MuJoCo or the JAX-accelerated MJX engine). Tasks assemble observations, actuators, reward terms, and termination conditions into vectorized environments, then train policies with PPO and optional AMP regularizers. Domain randomizers, energy and gait rewards, and actuator models are pluggable to support transfer to hardware.
## FAQ
**Does this support GPU-accelerated simulation?**
Yes. The MJX engine leverages JAX to run many parallel environments on GPU/TPU for fast training.

**Can I combine PPO and AMP in the same workflow?**
Yes. Tasks can use PPO for policy optimization and AMP modules to impose adversarial motion priors for realistic behaviors.
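A hypothetical shape for such a combined task. `AMPTask`'s attributes are not documented on this page, so `reference_motions` and `amp_reward_scale` below are assumptions, not confirmed API; the import assumes `AMPTask` is exposed alongside `PPOTask`.

```python
# HYPOTHETICAL: PPO-style task definition extended with AMP.
# `reference_motions` and `amp_reward_scale` are assumed names.
import ksim
from ksim import AMPTask

class KBotStylizedWalkingTask(AMPTask):
    model_path = "kbot.mjcf"
    reference_motions = "walk_clips.npz"  # hypothetical motion-capture clip file
    amp_reward_scale = 0.5                # hypothetical weight on the discriminator reward
    rewards = [
        ksim.LinearVelocityReward(scale=1.0),
        ksim.EnergyPenalty(scale=-0.01),
    ]
```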