
ksim-rl skill

/skills/ksim-rl

This skill helps you train humanoid locomotion policies with PPO and AMP, and build domain-randomized sim-to-real workflows on MuJoCo and JAX.

npx playbooks add skill plurigrid/asi --skill ksim-rl

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
6.5 KB
---
name: ksim-rl
description: RL training library for humanoid locomotion and manipulation built on MuJoCo and JAX. Provides PPO, AMP, and custom task abstractions for sim-to-real robotics policy training.
version: 1.0.0
category: robotics-rl
author: K-Scale Labs
source: kscalelabs/ksim
license: MIT
trit: -1
trit_label: MINUS
color: "#3A2F9E"
verified: false
featured: true
---

# KSIM-RL Skill

**Trit**: -1 (MINUS - analysis/verification)
**Color**: #3A2F9E (Deep Purple)
**URI**: skill://ksim-rl#3A2F9E

## Overview

KSIM is K-Scale Labs' reinforcement learning library for humanoid robot locomotion and manipulation. It is built on MuJoCo for physics simulation and JAX for hardware-accelerated training.

## Core Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        KSIM ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │  RLTask     │  │  PPOTask    │  │  AMPTask                │  │
│  │  (abstract) │──│  (PPO impl) │──│  (Adversarial Motion)   │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
│         │                                                       │
│         ▼                                                       │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                       PhysicsEngine                        │ │
│  │  ┌───────────────┐  ┌───────────────────────────────┐      │ │
│  │  │ MujocoEngine  │  │ MjxEngine (JAX-accelerated)   │      │ │
│  │  └───────────────┘  └───────────────────────────────┘      │ │
│  └────────────────────────────────────────────────────────────┘ │
│         │                                                       │
│         ▼                                                       │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │  Environment Components                                    │ │
│  │  • Actuators: Position, Velocity, Torque control           │ │
│  │  • Observations: Joint states, IMU, local view             │ │
│  │  • Rewards: Velocity tracking, gait, energy, stability     │ │
│  │  • Terminations: Fall detection, boundary violations       │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

## Key Features

- **JAX-Accelerated**: Uses MJX for parallel environment simulation on GPU/TPU (see the sketch after this list)
- **PPO Training**: Proximal Policy Optimization with configurable hyperparameters
- **AMP Support**: Adversarial Motion Priors for realistic humanoid locomotion
- **Modular Rewards**: Composable reward functions for gait, velocity, energy
- **Domain Randomization**: Built-in randomizers for sim-to-real transfer
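
To make the MJX point concrete, here is a minimal sketch of batched, JIT-compiled physics stepping with plain MuJoCo MJX and `jax.vmap`. It illustrates the mechanism ksim's MJX engine builds on rather than ksim's own API, and the toy model XML is invented for the example.

```python
# Minimal sketch of parallel simulation with MuJoCo MJX + JAX (not ksim's API).
# The tiny XML model below is a stand-in invented for this example.
import jax
import jax.numpy as jnp
import mujoco
from mujoco import mjx

XML = """
<mujoco>
  <worldbody>
    <body>
      <joint type="free"/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)
mjx_model = mjx.put_model(model)        # model on the accelerator
mjx_data = mjx.put_data(model, data)    # one simulation state on the accelerator

# Replicate the state across many environments with randomized initial heights.
num_envs = 1024
heights = jnp.linspace(0.5, 1.5, num_envs)
batch = jax.vmap(lambda h: mjx_data.replace(qpos=mjx_data.qpos.at[2].set(h)))(heights)

# One physics step for every environment at once, JIT-compiled for GPU/TPU.
step_fn = jax.jit(jax.vmap(mjx.step, in_axes=(None, 0)))
batch = step_fn(mjx_model, batch)
print(batch.qpos.shape)  # (num_envs, model.nq)
```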

## API Usage

```python
import ksim
from ksim import PPOTask, MjxEngine
from ksim.tasks.humanoid import HumanoidWalkingTask

# Define custom task
class KBotWalkingTask(PPOTask):
    model_path = "kbot.mjcf"
    
    # Observations
    observations = [
        ksim.JointPosition(),
        ksim.JointVelocity(),
        ksim.IMUAngularVelocity(),
        ksim.BaseOrientation(),
    ]
    
    # Rewards
    rewards = [
        ksim.LinearVelocityReward(scale=1.0),
        ksim.GaitPhaseReward(scale=0.5),
        ksim.EnergyPenalty(scale=-0.01),
    ]
    
    # Actuators
    actuators = [
        ksim.PositionActuator(
            joint_name=".*",
            kp=100.0,
            kd=10.0,
            action_scale=0.5,
        )
    ]

# Train
task = KBotWalkingTask()
task.run_training(
    num_envs=4096,
    num_steps=1000000,
    learning_rate=3e-4,
)
```
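
For reference, the quantity PPO optimizes in a training run like the one above is the clipped surrogate objective. The JAX sketch below is the textbook formulation, not a transcription of ksim's internal training code.

```python
# Clipped PPO surrogate loss (generic formulation, not ksim's internals).
import jax.numpy as jnp

def ppo_policy_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the data-collecting policy.
    ratio = jnp.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: take the smaller objective, negate for gradient descent.
    return -jnp.mean(jnp.minimum(unclipped, clipped))
```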

## GF(3) Triads

This skill participates in balanced triads:

```
ksim-rl (-1) ⊗ kos-firmware (+1) ⊗ mujoco-scenes (0) = 0 ✓
ksim-rl (-1) ⊗ kos-firmware (+1) ⊗ urdf2mjcf (-1) = needs balancing
```
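
A triad is balanced when its trits sum to zero modulo 3, which is easy to check directly (a small sketch, with trit values taken from the listings in this document):

```python
# Check the triad balance claims above: balanced means the trit sum is 0 mod 3.
def balanced(*trits: int) -> bool:
    return sum(trits) % 3 == 0

print(balanced(-1, +1, 0))    # ksim-rl, kos-firmware, mujoco-scenes -> True
print(balanced(-1, +1, -1))   # ksim-rl, kos-firmware, urdf2mjcf -> False (needs balancing)
```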

## Key Contributors

- **codekansas** (Ben Bolte): Core architecture, PPO, rewards
- **b-vm**: Randomizers, disturbances, policy training
- **carlosdp**: Adaptive KL, action scaling
- **WT-MM**: Visualization, markers

## Related Skills

- `kos-firmware` (+1): Robot firmware and gRPC services
- `mujoco-scenes` (0): Scene composition for MuJoCo
- `evla-vla` (-1): Vision-language-action models
- `urdf2mjcf` (-1): URDF to MJCF conversion
- `ktune-sim2real` (-1): Servo tuning for sim2real

## References

```bibtex
@misc{ksim2024,
  title={K-Sim: RL Training for Humanoid Locomotion},
  author={K-Scale Labs},
  year={2024},
  url={https://github.com/kscalelabs/ksim}
}
```


## SDF Interleaving

This skill connects to **Software Design for Flexibility** (Hanson & Sussman, 2021):

### Primary Chapter: 5. Evaluation

**Concepts**: eval, apply, interpreter, environment

### GF(3) Balanced Triad

```
ksim-rl (−) + SDF.Ch5 (○) + [balancer] (+) = 0
```

**Skill Trit**: -1 (MINUS - analysis/verification)

### Secondary Chapters

- Ch2: Domain-Specific Languages

### Connection Pattern

Evaluation interprets expressions. This skill processes or generates evaluable forms.

Overview

This skill covers ksim, an RL training library for humanoid locomotion and manipulation built on MuJoCo and JAX. It provides PPO, Adversarial Motion Priors (AMP), modular task abstractions, and GPU/TPU-accelerated parallel simulation for sim-to-real robotics policy training. The design focuses on composable observations, reward modules, and domain randomization for robust policies.

How this skill works

ksim-rl defines an abstract RLTask class with concrete PPOTask and AMPTask implementations that connect to a physics engine backend (standard MuJoCo or the JAX-accelerated MJX engine). Tasks assemble observations, actuators, reward terms, and termination conditions into vectorized environments that train policies using PPO and optional AMP regularizers. Domain randomizers, energy and gait rewards, and actuator models are pluggable to support transfer to hardware.
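
As a rough illustration of that composition model (generic Python with invented names, not ksim's actual classes), a reward term can be pictured as a scaled callable whose outputs are summed each step:

```python
# Generic sketch of composable reward terms; names are illustrative only.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class RewardTerm:
    fn: Callable[[dict], float]   # maps an observation/state dict to a scalar
    scale: float                  # weight applied to the term

def total_reward(terms: Sequence[RewardTerm], state: dict) -> float:
    # Weighted sum of all active reward terms for one environment step.
    return sum(t.scale * t.fn(state) for t in terms)

terms = [
    RewardTerm(fn=lambda s: -abs(s["vel_x"] - s["target_vel_x"]), scale=1.0),
    RewardTerm(fn=lambda s: -sum(tau * tau for tau in s["torques"]), scale=0.01),
]
print(total_reward(terms, {"vel_x": 0.9, "target_vel_x": 1.0, "torques": [0.2, 0.1]}))
```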

When to use it

  • Training humanoid locomotion policies that require high-throughput simulation on GPU/TPU
  • Developing manipulation policies with realistic physics and modular reward shaping
  • Prototyping sim-to-real workflows using domain randomization and actuator models
  • Comparing PPO variants or integrating adversarial motion priors for naturalistic motion
  • Scaling parallel environment count for sample-efficient policy optimization

Best practices

  • Start with a simple task and modular rewards; add complexity (AMP, randomization) iteratively
  • Use the JAX-accelerated MJX engine for large-scale parallel training to reduce wall-clock time
  • Tune actuator models and energy penalties to match target hardware dynamics before deployment
  • Enable domain randomization and disturbances when preparing policies for real robots (see the sketch after this list)
  • Log phase-wise metrics (gait, energy, stability, falls) to diagnose learning failures
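
One way to approach the randomization point is at the MJX model level rather than through ksim's built-in randomizer classes; the sketch below perturbs friction and mass before a rollout, reusing the model path from the API example above.

```python
# Sketch of parameter-level domain randomization with plain MuJoCo MJX
# (not ksim's randomizer API). Perturb friction and body masses per rollout.
import jax
import mujoco
from mujoco import mjx

model = mujoco.MjModel.from_xml_path("kbot.mjcf")   # robot model from the API example
mjx_model = mjx.put_model(model)

key = jax.random.PRNGKey(0)
k_friction, k_mass = jax.random.split(key)
randomized_model = mjx_model.replace(
    geom_friction=mjx_model.geom_friction
    * jax.random.uniform(k_friction, mjx_model.geom_friction.shape, minval=0.8, maxval=1.2),
    body_mass=mjx_model.body_mass
    * jax.random.uniform(k_mass, mjx_model.body_mass.shape, minval=0.9, maxval=1.1),
)
# Roll out the episode with `randomized_model` in place of `mjx_model`.
```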

Example use cases

  • Train a humanoid walking controller with gait-phase and velocity tracking rewards, then refine with AMP for natural motion
  • Create a torque-controlled arm task to learn robust manipulation primitives under sensor noise and randomized dynamics
  • Scale training across thousands of vectorized envs on GPU to produce policies for real-time onboard inference
  • Benchmark different PPO hyperparameter settings or reward weightings for stable locomotion
  • Integrate with firmware or middleware to validate policy execution on hardware in a sim-to-real pipeline

FAQ

Does this support GPU-accelerated simulation?

Yes. The MJX engine leverages JAX to run many parallel environments on GPU/TPU for fast training.

Can I combine PPO and AMP in the same workflow?

Yes. Tasks can incorporate PPO for policy optimization and AMP modules to impose adversarial motion priors for realistic behaviors.
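
As background on what the AMP term adds, the least-squares style reward from Peng et al. (2021) rewards transitions that the discriminator scores like reference motion; the sketch below is that generic formulation, not ksim's internals.

```python
# Generic AMP style reward and its combination with the task reward.
import jax.numpy as jnp

def amp_style_reward(disc_logit: jnp.ndarray) -> jnp.ndarray:
    # High when the discriminator scores the policy transition like reference
    # motion (logit near +1), clipped at zero from below.
    return jnp.maximum(0.0, 1.0 - 0.25 * (disc_logit - 1.0) ** 2)

def combined_reward(task_reward, disc_logit, w_task=0.5, w_style=0.5):
    # PPO then optimizes the usual task reward plus this adversarial style term.
    return w_task * task_reward + w_style * amp_style_reward(disc_logit)
```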