Robot Lab: A Hands-On Guide to Robot Reinforcement Learning with Isaac Lab | 极客日志

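AI-generated summary: The Robot Lab extension library provides standardized reinforcement learning training environments for a range of robots without requiring changes to the core Isaac Lab repository. This tutorial covers the core concepts of Isaac Sim and Isaac Lab, environment installation and configuration, common keyboard shortcuts, and the list of supported robot types. Using Unitree Go2 and G1 as examples, it walks through the full workflow of training, monitoring, and policy testing, then explains how to define custom robot assets, write task configurations, and register environments. It also covers advanced features such as multi-GPU training and knowledge distillation, plus troubleshooting for common problems, to help developers move efficiently from simulation to real-robot deployment.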

王初壹 · Published 2026/4/12 · Updated 2026/6/1232 views

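1. Project Overview

Robot Lab is a robot reinforcement learning extension library built on NVIDIA Isaac Lab. It focuses on providing standardized RL training environments for many kinds of robots. Developers work in a standalone project and never need to modify the core Isaac Lab repository.

To understand this toolchain, first get the architecture of NVIDIA's robotics platform straight. It consists of two core components: Isaac Sim, the underlying simulation platform, and Isaac Lab, the robot learning framework built on top of it.

What is Isaac Sim? It is a general-purpose robot simulator that provides a high-fidelity physics engine (PhysX) and photorealistic rendering (RTX). Its core job is to build accurate, realistic virtual environments, including robot models, sensor data, and physical interactions.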

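Core Concepts

Application (App): The top-level manager. It owns the lifecycle of all resources, including starting and shutting down the simulation process. Even in headless mode, the App remains the overall controller.

Simulation (Sim): Defines the "rules" of the virtual world, such as physical laws (gravity direction), the time step (dt), and the rendering frequency. It divides each step into sub-steps (physics_step, render_step) and owns the World object.

World: Provides the spatial context for the simulation by defining the origin and units of the Cartesian coordinate system. Every question about size and distance is answered within this reference frame.

USD Prim: The basic building block of a USD scene; think of it as a container. Each Prim has a unique path (such as /World/MyRobot/Gripper) and holds attributes (color, size) and relationships (material bindings). Child prims can inherit properties from their parents, which makes complex scene composition possible.

Scene: In Isaac Lab, the Scene manages all prims on the Stage that take part in vectorization. When the user asks for multiple environment copies, the Scene automatically clones those entities (robots, tables, and so on) on the Stage and places them at different coordinates, so large-scale parallel training runs inside a single world.

Stage: The "composition structure" of the world. Built on Universal Scene Description (USD), it represents the simulation elements as a hierarchical tree in which every node is a Prim.

Isaac Lab itself is not a simulator; it uses the environments provided by Isaac Sim to train AI models at scale.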
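To make the Scene's cloning behavior concrete, here is a minimal sketch in plain Python (no Isaac Sim required). It assumes environment copies live under /World/envs/env_<i> and are laid out on a square grid; the `EnvPlacement` class, the `clone_environments` helper, and the 2.5 m spacing are hypothetical names and values chosen for illustration, and the layout Isaac Lab actually computes may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvPlacement:
    prim_path: str                      # where this copy lives in the Stage tree
    origin: tuple[float, float, float]  # world-frame offset of this copy


def clone_environments(num_envs: int, spacing: float) -> list[EnvPlacement]:
    """Lay out num_envs copies on a square grid, one prim subtree per copy."""
    cols = math.ceil(math.sqrt(num_envs))
    placements = []
    for i in range(num_envs):
        row, col = divmod(i, cols)
        placements.append(
            EnvPlacement(
                prim_path=f"/World/envs/env_{i}",
                origin=(row * spacing, col * spacing, 0.0),
            )
        )
    return placements


placements = clone_environments(num_envs=4, spacing=2.5)
for placement in placements:
    print(placement.prim_path, placement.origin)
```

Every copy shares the same World frame and differs only in its prim path and origin, which is what lets thousands of robots be simulated in one Stage.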

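1.1 Isaac Sim Keyboard Shortcut Reference

Knowing the shortcuts speeds up day-to-day work considerably. The most common ones are summarized below.

Basic Operations

Type	Keys	Effect
Basic	Left mouse button	Select an object
Basic	ESC	Clear the selection
Basic	Ctrl + Z / Y	Undo / Redo
Basic	Ctrl + S / O / N	Save / Open / New scene
Basic	Delete	Delete the selected object
Basic	Ctrl + D / C / V / X	Duplicate / Copy / Paste / Cut

View Operations

Type	Keys	Effect
View	Select object + F	Focus on the selected object
View	No selection + F	Focus on the whole scene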

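Object Manipulation and Transforms

Type	Keys	Effect
Transform	W / E / R	Move / Rotate / Scale mode
Transform	Shift + drag	Precise movement along a specific axis
Transform	Ctrl + drag	Enable snapping
Transform	Ctrl + Shift + drag	Apply a force to the object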

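Simulation Control

Type	Keys	Effect
Simulation	Space	Play / pause the simulation
Simulation	Ctrl + Space	Step the simulation once
Simulation	Ctrl + Shift + Space	Stop and reset
Simulation	. / ,	Step forward / backward one frame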

2. Supported Robots

2.1 Quadruped Robots

Robot	Environment ID	Manufacturer
ANYmal D	`RobotLab-Isaac-Velocity-Rough-Anymal-D-v0`	ANYbotics
Unitree Go2	`RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0`	Unitree
Unitree B2	`RobotLab-Isaac-Velocity-Rough-Unitree-B2-v0`	Unitree
Unitree A1	`RobotLab-Isaac-Velocity-Rough-Unitree-A1-v0`	Unitree
Deeprobotics Lite3	`RobotLab-Isaac-Velocity-Rough-Deeprobotics-Lite3-v0`	Deeprobotics

2.2 Wheeled Robots

Robot	Environment ID	Manufacturer
Unitree Go2W	`RobotLab-Isaac-Velocity-Rough-Unitree-Go2W-v0`	Unitree
Unitree B2W	`RobotLab-Isaac-Velocity-Rough-Unitree-B2W-v0`	Unitree
Deeprobotics M20	`RobotLab-Isaac-Velocity-Rough-Deeprobotics-M20-v0`	Deeprobotics

2.3 Humanoid Robots

Robot	Environment ID	Manufacturer
Unitree G1	`RobotLab-Isaac-Velocity-Rough-Unitree-G1-v0`	Unitree
Unitree H1	`RobotLab-Isaac-Velocity-Rough-Unitree-H1-v0`	Unitree
FFTAI GR1T1	`RobotLab-Isaac-Velocity-Rough-FFTAI-GR1T1-v0`	FFTAI
Booster T1	`RobotLab-Isaac-Velocity-Rough-Booster-T1-v0`	Booster
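The environment IDs in these tables follow a visible naming pattern. The sketch below rebuilds an ID from its parts; `make_env_id` is a hypothetical helper, and the pattern is only inferred from the IDs listed in this guide, so not every task fits it. Treat the output of scripts/tools/list_envs.py as the source of truth.

```python
def make_env_id(robot: str, terrain: str = "Rough", task: str = "Velocity") -> str:
    """Compose an ID of the form RobotLab-Isaac-<task>-<terrain>-<robot>-v0."""
    if terrain not in ("Rough", "Flat"):
        raise ValueError(f"unexpected terrain: {terrain!r}")
    return f"RobotLab-Isaac-{task}-{terrain}-{robot}-v0"


go2_rough = make_env_id("Unitree-Go2")
anymal_flat = make_env_id("Anymal-D", terrain="Flat")
print(go2_rough)
print(anymal_flat)
```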

3. Environment Setup

3.2 Installing Robot Lab

# 1. Clone Robot Lab (outside the IsaacLab directory)
cd ~
git clone https://github.com/fan-ziqi/robot_lab.git
cd robot_lab

# 2. Activate the Isaac Lab environment
conda activate isaaclab

# 3. Install the Robot Lab extension
python -m pip install -e source/robot_lab

# 4. Verify the installation
python scripts/tools/list_envs.py

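In Depth: Installation Configuration

# Excerpt from setup.py
INSTALL_REQUIRES = [
    "psutil",      # system resource monitoring (CPU, memory, GPU)
    "colorama",    # cross-platform colored terminal output
    "xacrodoc",    # URDF/Xacro robot description file processing
    "numpy",       # core numerical computing library
    "pandas",      # data processing and analysis
    "pinocchio",   # high-performance robot dynamics library
    "cusrl[all]",  # customizable reinforcement learning framework
]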

4. Quick Start

4.1 Example 1: Training a Quadruped (Unitree Go2)

Training

# Activate the environment
conda activate isaaclab
cd ~/robot_lab

# Start training (headless mode, suited to servers)
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --headless \
    --num_envs 4096

# Or use GUI mode (suited to local development)
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --num_envs 512

Monitoring Training

tensorboard --logdir=logs

Testing the Policy

# Test the trained policy
python scripts/reinforcement_learning/rsl_rl/play.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --checkpoint /path/to/model_2400.pt \
    --num_envs 64

# Control a single robot with the keyboard
python scripts/reinforcement_learning/rsl_rl/play.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --checkpoint /path/to/model_2400.pt \
    --num_envs 1 \
    --keyboard

Recording a Video

# Record a 200-frame video (requires ffmpeg)
python scripts/reinforcement_learning/rsl_rl/play.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --num_envs 4 \
    --video \
    --video_length 200

4.2 Example 2: Training a Humanoid (Unitree G1)

Basic Velocity Control

# Train
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-G1-v0 \
    --headless

# Test
python scripts/reinforcement_learning/rsl_rl/play.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-G1-v0

Imitation Policy Learning

# Download the LAFAN1 dataset (already retargeted to Unitree G1),
# or use your own .csv motion data

# Convert CSV to NPZ format
python scripts/tools/beyondmimic/csv_to_npz.py \
    -f path/to/motion.csv \
    --input_fps 60 \
    --headless

# Preview the motion
python scripts/tools/beyondmimic/replay_npz.py \
    -f path/to/motion.npz

# Train the imitation policy
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-BeyondMimic-Flat-Unitree-G1-v0 \
    --headless

# Test (plays 2 different motions at the same time)
python scripts/reinforcement_learning/rsl_rl/play.py \
    --task=RobotLab-Isaac-BeyondMimic-Flat-Unitree-G1-v0 \
    --num_envs 2

4.3 Example 3: Learning Dance Motions with AMP

# Train (using the skrl framework)
python scripts/reinforcement_learning/skrl/train.py \
    --task=RobotLab-Isaac-G1-AMP-Dance-Direct-v0 \
    --algorithm AMP \
    --headless

# Test (32 robots dancing at the same time)
python scripts/reinforcement_learning/skrl/play.py \
    --task=RobotLab-Isaac-G1-AMP-Dance-Direct-v0 \
    --algorithm AMP \
    --num_envs 32

5. Advanced Features

5.1 Multi-GPU Training

Single Node, Multiple GPUs

# Train on 2 GPUs
python -m torch.distributed.run \
    --nnodes=1 \
    --nproc_per_node=2 \
    scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --headless \
    --distributed

Multiple Nodes, Multiple GPUs

# On the master node (node_rank=0)
python -m torch.distributed.run \
    --nproc_per_node=2 \
    --nnodes=2 \
    --node_rank=0 \
    --rdzv_id=123 \
    --rdzv_backend=c10d \
    --rdzv_endpoint=localhost:5555 \
    scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --headless \
    --distributed

# On the second node (node_rank=1); the rendezvous endpoint is the master node's address (example IP shown)
python -m torch.distributed.run \
    --nproc_per_node=2 \
    --nnodes=2 \
    --node_rank=1 \
    --rdzv_id=123 \
    --rdzv_backend=c10d \
    --rdzv_endpoint=192.168.1.100:5555 \
    scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --headless \
    --distributed
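The launcher flags above determine how many training processes exist and how they are numbered. The following sketch shows the conventional arithmetic for a homogeneous cluster; `world_size` and `global_rank` are hypothetical helper names, and the rank layout is the usual convention rather than something this guide's scripts are confirmed to rely on.

```python
def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of training processes across all nodes."""
    return nnodes * nproc_per_node


def global_rank(node_rank: int, nproc_per_node: int, local_rank: int) -> int:
    """Conventional global rank when every node runs the same number of processes."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


total = world_size(nnodes=2, nproc_per_node=2)
second_node_ranks = [global_rank(1, 2, local) for local in range(2)]
print(total, second_node_ranks)
```

With two nodes and two processes each, four processes take part in training, and the second node hosts the two highest ranks.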

5.2 Symmetry Data Augmentation

# Train ANYmal D (with symmetry augmentation)
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Rough-Anymal-D-v0 \
    --headless \
    --agent=rsl_rl_with_symmetry_cfg_entry_point \
    --run_name=ppo_with_symmetry \
    agent.algorithm.symmetry_cfg.use_data_augmentation=true

# Test
python scripts/reinforcement_learning/rsl_rl/play.py \
    --task=RobotLab-Isaac-Velocity-Rough-Anymal-D-v0 \
    --agent=rsl_rl_with_symmetry_cfg_entry_point \
    --run_name=ppo_with_symmetry
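The idea behind symmetry augmentation is that a left-right mirrored copy of a transition is also a valid sample, so each rollout can be reused in mirrored form. The sketch below mirrors a joint-position dictionary for a quadruped that uses FL/FR/RL/RR joint prefixes (the naming used in this guide's custom-robot example, not ANYmal D's). The `mirror_joint_positions` helper is hypothetical, and negating the hip joints is an assumption about joint axis conventions that must be checked per robot.

```python
# Left legs map to right legs and vice versa.
LEG_SWAP = {"FL": "FR", "FR": "FL", "RL": "RR", "RR": "RL"}


def mirror_joint_positions(joint_pos: dict[str, float]) -> dict[str, float]:
    """Return the left-right mirrored joint configuration."""
    mirrored = {}
    for name, value in joint_pos.items():
        leg, _, joint = name.partition("_")
        # Assumption: hip (abduction) angles change sign under mirroring,
        # while thigh and calf angles keep their sign.
        sign = -1.0 if joint == "hip_joint" else 1.0
        mirrored[f"{LEG_SWAP[leg]}_{joint}"] = sign * value
    return mirrored


pose = {
    "FL_hip_joint": 0.1, "FL_thigh_joint": 0.7,
    "FR_hip_joint": -0.2, "FR_thigh_joint": 0.8,
}
mirrored_pose = mirror_joint_positions(pose)
print(mirrored_pose)
```

Mirroring twice returns the original pose, which is a quick sanity check for any symmetry function written for a new robot.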

5.3 Teacher-Student Distillation

# Step 1: Train the teacher network
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Flat-Anymal-D-v0 \
    --headless \
    --run_name=teacher

# Step 2: Distill into the student network
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Flat-Anymal-D-v0 \
    --headless \
    --agent=rsl_rl_distillation_cfg_entry_point \
    --load_run teacher \
    --run_name=student

# Step 3: Test the student network
python scripts/reinforcement_learning/rsl_rl/play.py \
    --task=RobotLab-Isaac-Velocity-Flat-Anymal-D-v0 \
    --agent=rsl_rl_distillation_cfg_entry_point \
    --load_run student

5.4 Resuming Training

# Resume from the latest checkpoint
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --headless \
    --resume \
    --load_run <run_folder_name>

# Resume from a specific checkpoint
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Rough-Unitree-Go2-v0 \
    --headless \
    --resume \
    --load_run <run_folder_name> \
    --checkpoint /path/to/model_5000.pt
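When picking a checkpoint by hand, note that plain alphabetical sorting puts model_900.pt after model_5000.pt. The sketch below finds the newest checkpoint by comparing iteration numbers instead. It assumes files are named model_<iteration>.pt, as in the paths shown in this guide; `latest_checkpoint` is a hypothetical helper and not part of Robot Lab.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

CHECKPOINT_PATTERN = re.compile(r"model_(\d+)\.pt")


def latest_checkpoint(run_dir: Path) -> Optional[Path]:
    """Return the checkpoint with the highest iteration number, or None."""
    best_path, best_iteration = None, -1
    for path in run_dir.iterdir():
        match = CHECKPOINT_PATTERN.fullmatch(path.name)
        if match and int(match.group(1)) > best_iteration:
            best_path, best_iteration = path, int(match.group(1))
    return best_path


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    for name in ("model_900.pt", "model_2400.pt", "model_5000.pt", "events.log"):
        (run_dir / name).touch()
    newest_name = latest_checkpoint(run_dir).name

print(newest_name)
```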

5.5 Stunt Motion Training

# Train
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Flat-HandStand-Unitree-A1-v0 \
    --headless

# Test
python scripts/reinforcement_learning/rsl_rl/play.py \
    --task=RobotLab-Isaac-Velocity-Flat-HandStand-Unitree-A1-v0

6. Custom Robots

6.1 Project Structure

robot_lab/
├── source/
│   └── robot_lab/
│       ├── assets/          # robot asset definitions
│       │   └── unitree.py
│       ├── tasks/           # task environments
│       │   └── manager_based/
│       │       └── locomotion/
│       │           └── velocity/
│       │               ├── velocity_env_cfg.py
│       │               └── config/
│       │                   └── unitree_a1/
│       │                       ├── flat_env_cfg.py
│       │                       └── agents/
│       │                           └── rsl_rl_ppo_cfg.py
│       └── ui_extension_example.py
└── scripts/
    ├── reinforcement_learning/
    │   └── rsl_rl/
    │       ├── train.py
    │       └── play.py
    └── tools/

6.2 Steps to Add a New Robot

Step 1: Define the Robot Asset

import isaaclab.sim as sim_utils
from isaaclab.actuators import DCMotorCfg
from isaaclab.assets.articulation import ArticulationCfg

MY_ROBOT_CFG = ArticulationCfg(
    spawn=sim_utils.UsdFileCfg(
        usd_path="/path/to/my_robot.usd",
        activate_contact_sensors=True,
        # Physics properties are part of the spawn configuration.
        rigid_props=sim_utils.RigidBodyPropertiesCfg(
            disable_gravity=False,
            retain_accelerations=False,
            linear_damping=0.0,
            angular_damping=0.0,
            max_linear_velocity=1000.0,
            max_angular_velocity=1000.0,
            max_depenetration_velocity=1.0,
        ),
        articulation_props=sim_utils.ArticulationRootPropertiesCfg(
            enabled_self_collisions=False,
            solver_position_iteration_count=4,
            solver_velocity_iteration_count=0,
        ),
    ),
    # Initial base position and joint angles; keys are regular expressions over joint names.
    init_state=ArticulationCfg.InitialStateCfg(
        pos=(0.0, 0.0, 0.6),
        joint_pos={".*_hip_joint": 0.0, ".*_thigh_joint": 0.7, ".*_calf_joint": -1.4},
        joint_vel={".*": 0.0},
    ),
    # One DC motor model shared by all twelve leg joints.
    actuators={
        "legs": DCMotorCfg(
            joint_names_expr=[".*_hip_joint", ".*_thigh_joint", ".*_calf_joint"],
            effort_limit=33.5,
            saturation_effort=33.5,
            velocity_limit=21.0,
            stiffness=25.0,
            damping=0.5,
            friction=0.0,
        ),
    },
)

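# The configuration you write
prim_path="{ENV_REGEX_NS}/Robot"

# Isaac Lab expands the placeholder so that it covers every environment copy
# (assuming 4 environments are created)
# Environment 0: prim_path="/World/envs/env_0/Robot"
# Environment 1: prim_path="/World/envs/env_1/Robot"
# ...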
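The placeholder mechanism can be sketched with ordinary string formatting and regular expressions. This example assumes the placeholder is filled with the pattern /World/envs/env_.* and then matched against prim paths on the Stage; that pattern, along with the `expand_prim_path` and `matching_prims` helpers, is an illustrative assumption and not Isaac Lab's actual code.

```python
import re

# Assumption: the regex namespace that covers all environment copies.
ENV_REGEX_NS = "/World/envs/env_.*"


def expand_prim_path(template: str) -> str:
    """Fill the {ENV_REGEX_NS} placeholder in a prim path template."""
    return template.format(ENV_REGEX_NS=ENV_REGEX_NS)


def matching_prims(pattern: str, stage_paths: list[str]) -> list[str]:
    """Return the prim paths that the expanded pattern matches completely."""
    regex = re.compile(pattern)
    return [path for path in stage_paths if regex.fullmatch(path)]


stage_paths = [f"/World/envs/env_{i}/Robot" for i in range(4)] + ["/World/ground"]
pattern = expand_prim_path("{ENV_REGEX_NS}/Robot")
robots = matching_prims(pattern, stage_paths)
print(pattern)
print(robots)
```

A single template therefore addresses the robot in every environment at once, which is why asset configurations never hard-code an environment index.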

Step 2: Create the Task Configuration

# rough_env_cfg.py
from isaaclab.utils import configclass
from robot_lab.assets import MY_ROBOT_CFG
from robot_lab.tasks.manager_based.locomotion.velocity.velocity_env_cfg import (
    LocomotionVelocityRoughEnvCfg,
)

@configclass
class MyRobotRoughEnvCfg(LocomotionVelocityRoughEnvCfg):
    # Link and joint names must match the names in the robot's USD file.
    base_link_name = "base"
    foot_link_name = ".*_foot"
    joint_names = [
        "FR_hip_joint", "FR_thigh_joint", "FR_calf_joint",
        "FL_hip_joint", "FL_thigh_joint", "FL_calf_joint",
        "RR_hip_joint", "RR_thigh_joint", "RR_calf_joint",
        "RL_hip_joint", "RL_thigh_joint", "RL_calf_joint",
    ]

    def __post_init__(self):
        super().__post_init__()

        # Scene: place the robot in every environment and attach the height scanner to its base.
        self.scene.robot = MY_ROBOT_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
        self.scene.height_scanner.prim_path = "{ENV_REGEX_NS}/Robot/" + self.base_link_name

        # Observations: scaling factors and joint ordering.
        self.observations.policy.base_lin_vel.scale = 2.0
        self.observations.policy.base_ang_vel.scale = 0.25
        self.observations.policy.joint_pos.scale = 1.0
        self.observations.policy.joint_vel.scale = 0.05
        self.observations.policy.joint_pos.params["asset_cfg"].joint_names = self.joint_names
        self.observations.policy.joint_vel.params["asset_cfg"].joint_names = self.joint_names

        # Actions: hip joints get a smaller scale than all other joints.
        self.actions.joint_pos.scale = {
            ".*_hip_joint": 0.125,
            "^(?!.*_hip_joint).*": 0.25
        }
        self.actions.joint_pos.clip = {".*": (-100.0, 100.0)}
        self.actions.joint_pos.joint_names = self.joint_names

        # Rewards: negative weights are penalties, zero disables a term.
        self.rewards.lin_vel_z_l2.weight = -2.0
        self.rewards.ang_vel_xy_l2.weight = -0.05
        self.rewards.flat_orientation_l2.weight = 0
        self.rewards.base_height_l2.weight = 0
        self.rewards.joint_torques_l2.weight = -2.5e-5
        self.rewards.joint_acc_l2.weight = -2.5e-7

        # Commands: sampling ranges for the velocity command.
        self.commands.base_velocity.ranges.lin_vel_x = (-1.0, 1.5)
        self.commands.base_velocity.ranges.lin_vel_y = (-0.5, 0.5)
        self.commands.base_velocity.ranges.ang_vel_z = (-1.0, 1.0)
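The action scale above is a dictionary keyed by regular expressions, including a negative lookahead that matches every joint except the hips. The sketch below shows how such a dictionary could resolve to one value per joint; `resolve_per_joint` is a hypothetical helper written for illustration and is not the resolver Isaac Lab uses internally.

```python
import re


def resolve_per_joint(values: dict[str, float], joint_names: list[str]) -> dict[str, float]:
    """Map each joint name to the value of the single pattern that fully matches it."""
    resolved = {}
    for name in joint_names:
        matches = [value for pattern, value in values.items() if re.fullmatch(pattern, name)]
        if len(matches) != 1:
            raise ValueError(f"{name!r} matched {len(matches)} patterns, expected exactly 1")
        resolved[name] = matches[0]
    return resolved


scale = {
    ".*_hip_joint": 0.125,          # hip joints only
    "^(?!.*_hip_joint).*": 0.25,    # negative lookahead: anything that is not a hip joint
}
joints = ["FR_hip_joint", "FR_thigh_joint", "FR_calf_joint"]
per_joint_scale = resolve_per_joint(scale, joints)
print(per_joint_scale)
```

Requiring exactly one match per joint catches two common mistakes early: a joint that no pattern covers, and overlapping patterns that would make the result depend on dictionary order.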

# flat_env_cfg.py
from isaaclab.utils import configclass
from .rough_env_cfg import MyRobotRoughEnvCfg

@configclass
class MyRobotFlatEnvCfg(MyRobotRoughEnvCfg):
    def __post_init__(self):
        super().__post_init__()
        # Replace the generated terrain with a flat plane.
        self.scene.terrain.terrain_type = "plane"
        self.scene.terrain.terrain_generator = None
        # Remove the height scanner and everything that depends on it.
        self.scene.height_scanner = None
        self.observations.policy.height_scan = None
        self.observations.critic.height_scan = None
        self.curriculum.terrain_levels = None
        self.rewards.base_height_l2.params["sensor_cfg"] = None
        # Prune zero-weight rewards only when this is the final class, not a parent of another config.
        if self.__class__.__name__ == "MyRobotFlatEnvCfg":
            self.disable_zero_weight_rewards()

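LocomotionVelocityRoughEnvCfg (base class from velocity_env_cfg.py)
↑
MyRobotRoughEnvCfg (full configuration: terrain generation, height scanning, all sensors)
↑
MyRobotFlatEnvCfg (simplified configuration: flat ground, no height scanning)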
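The inheritance chain works because each level calls its parent's `__post_init__` before applying its own overrides. The sketch below reproduces that pattern with standard-library dataclasses standing in for `@configclass`; the class and field names are hypothetical and much simpler than the real configuration classes.

```python
from dataclasses import dataclass


@dataclass
class BaseRoughCfg:
    terrain_type: str = "generator"
    use_height_scan: bool = True
    lin_vel_x_range: tuple = (-1.0, 1.0)

    def __post_init__(self):
        pass  # the base class defines defaults only


@dataclass
class RobotRoughCfg(BaseRoughCfg):
    def __post_init__(self):
        super().__post_init__()
        # Robot-specific override applied on top of the base defaults.
        self.lin_vel_x_range = (-1.0, 1.5)


@dataclass
class RobotFlatCfg(RobotRoughCfg):
    def __post_init__(self):
        super().__post_init__()
        # Flat-terrain simplifications applied on top of the rough configuration.
        self.terrain_type = "plane"
        self.use_height_scan = False


rough = RobotRoughCfg()
flat = RobotFlatCfg()
print(rough)
print(flat)
```

The flat configuration keeps the robot-specific command range from the rough configuration and changes only what differs, so a robot's tuning lives in one place.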

Step 3: Configure the Training Parameters

# rsl_rl_ppo_cfg.py
from isaaclab.utils import configclass
from isaaclab_rl.rsl_rl import (
    RslRlOnPolicyRunnerCfg,
    RslRlPpoActorCriticCfg,
    RslRlPpoAlgorithmCfg,
)

@configclass
class MyRobotRoughPPORunnerCfg(RslRlOnPolicyRunnerCfg):
    num_steps_per_env = 24      # rollout length per environment per iteration
    max_iterations = 20000      # total number of training iterations
    save_interval = 100         # save a checkpoint every 100 iterations
    experiment_name = "my_robot_rough"

    # Actor and critic network architecture.
    policy = RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        actor_obs_normalization=False,
        critic_obs_normalization=False,
        actor_hidden_dims=[512, 256, 128],
        critic_hidden_dims=[512, 256, 128],
        activation="elu",
    )

    # PPO hyperparameters.
    algorithm = RslRlPpoAlgorithmCfg(
        value_loss_coef=1.0,
        use_clipped_value_loss=True,
        clip_param=0.2,
        entropy_coef=0.01,
        num_learning_epochs=5,
        num_mini_batches=4,
        learning_rate=1.0e-3,
        schedule="adaptive",
        gamma=0.99,
        lam=0.95,
        desired_kl=0.01,
        max_grad_norm=1.0,
    )

@configclass
class MyRobotFlatPPORunnerCfg(MyRobotRoughPPORunnerCfg):
    def __post_init__(self):
        super().__post_init__()
        # Flat terrain is easier, so train for fewer iterations.
        self.max_iterations = 5000
        self.experiment_name = "my_robot_flat"
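These runner settings fix how much data each PPO update sees. The arithmetic below combines them with the 4096 environments used in the earlier headless training command; `rollout_budget` is a hypothetical helper, and the formulas reflect the usual on-policy PPO bookkeeping, assumed here to match what the runner does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutBudget:
    transitions_per_iteration: int     # samples collected before each update
    mini_batch_size: int               # samples per gradient step
    gradient_steps_per_iteration: int  # epochs times mini-batches
    total_transitions: int             # samples over the whole run


def rollout_budget(num_envs: int, num_steps_per_env: int, num_mini_batches: int,
                   num_learning_epochs: int, max_iterations: int) -> RolloutBudget:
    transitions = num_envs * num_steps_per_env
    return RolloutBudget(
        transitions_per_iteration=transitions,
        mini_batch_size=transitions // num_mini_batches,
        gradient_steps_per_iteration=num_learning_epochs * num_mini_batches,
        total_transitions=transitions * max_iterations,
    )


budget = rollout_budget(
    num_envs=4096,
    num_steps_per_env=24,
    num_mini_batches=4,
    num_learning_epochs=5,
    max_iterations=20000,
)
print(budget)
```

Lowering --num_envs to save GPU memory shrinks every mini-batch proportionally, which is worth keeping in mind when comparing runs.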

Step 4: Register the Environments

# __init__.py of the robot's config package
import gymnasium as gym
from . import agents

# Flat-terrain variant
gym.register(
    id="RobotLab-Isaac-Velocity-Flat-My-Robot-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.flat_env_cfg:MyRobotFlatEnvCfg",
        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:MyRobotFlatPPORunnerCfg",
    },
)

# Rough-terrain variant
gym.register(
    id="RobotLab-Isaac-Velocity-Rough-My-Robot-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.rough_env_cfg:MyRobotRoughEnvCfg",
        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:MyRobotRoughPPORunnerCfg",
    },
)
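The entry-point strings in the registration use the "module.path:ClassName" convention, which defers the import until the environment is created. A minimal resolver looks like the sketch below; `resolve_entry_point` is a hypothetical helper, demonstrated on a standard-library class so that it runs without Isaac Lab installed.

```python
import importlib


def resolve_entry_point(entry_point: str):
    """Import 'package.module:Attribute' lazily and return the attribute."""
    module_name, separator, attr_name = entry_point.partition(":")
    if not separator or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {entry_point!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in target from the standard library.
resolved_class = resolve_entry_point("collections:OrderedDict")
print(resolved_class)
```

Because the string is resolved only on demand, a typo in a module or class name surfaces when training starts, not when the package is imported, so listing the registered IDs alone does not prove the entry points are valid.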

Step 5: Verify and Train

# Verify that the environments are registered
python scripts/tools/list_envs.py | grep "My-Robot"

# Start training
python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=RobotLab-Isaac-Velocity-Flat-My-Robot-v0 \
    --headless

7. FAQ

7.1 Installation Issues

export ISAAC_SIM_PATH="/path/to/isaac-sim"

python -m pip install --upgrade pip setuptools wheel
python -m pip install -e source/robot_lab

7.2 Training Issues

python scripts/reinforcement_learning/rsl_rl/train.py \
    --task=<ENV_NAME> \
    --num_envs 1024

7.3 Simulation Issues

rigid_props=sim_utils.RigidBodyPropertiesCfg(
    max_depenetration_velocity=1.0,
),
articulation_props=sim_utils.ArticulationRootPropertiesCfg(
    solver_position_iteration_count=8,
    solver_velocity_iteration_count=2,
),

self.sim.dt = 0.005
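Changing the physics time step also changes how often the policy acts, because the policy runs once every few physics steps. The sketch below works through that arithmetic for dt = 0.005; the decimation value of 4 is an assumption for illustration, so check the `decimation` field of your environment configuration, and `control_rates` is a hypothetical helper.

```python
import math


def control_rates(sim_dt: float, decimation: int) -> dict[str, float]:
    """Relate the physics step to the policy step for a given decimation."""
    policy_dt = sim_dt * decimation  # the policy acts once every `decimation` physics steps
    return {
        "physics_hz": 1.0 / sim_dt,
        "policy_dt": policy_dt,
        "policy_hz": 1.0 / policy_dt,
    }


rates = control_rates(sim_dt=0.005, decimation=4)  # decimation=4 is an assumed value
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

If the step is halved for stability, doubling the decimation keeps the policy frequency that a trained controller expects.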

7.5 Pylance Cannot Find Modules

{
    "python.analysis.extraPaths": [
        "${workspaceFolder}/source/robot_lab",
        "/path/to/IsaacLab/source/isaaclab"
    ]
}

7.6 Clearing the USD Cache

rm -rf /tmp/IsaacLab/usd_*
