跳到主要内容
基于 Isaac Lab 训练自定义机器人行走 | 极客日志
Python AI 算法
基于 Isaac Lab 训练自定义机器人行走 在 Ubuntu 22.04 系统下完成 Isaac Lab 环境搭建,包括显卡驱动、CUDA、PyTorch 及 Isaac Sim 资产配置。通过 URDF 转换导入机器人模型,编写 Isaac Lab 配置文件定义关节、传感器与奖励项。结合 RSL-RL 框架调整 PPO 算法参数,实现双足机器人在平坦或崎岖地形上的自主行走训练与验证。
菩提 发布于 2026/4/5 更新于 2026/4/24 1 浏览Isaac Lab 机器人行走训练指南
环境配置
推荐配置
操作系统 : Ubuntu 22.04 LTS
显卡 : NVIDIA RTX 4080 或以上
Ubuntu 22.04 LTS 安装
参考官方文档进行安装,建议 /home 与 /usr 的硬盘容量均不少于 200GB。
安装 NVIDIA 驱动
根据自身显卡型号与操作系统,选择对应的显卡驱动,建议选择 550.xxx 版本的显卡驱动。安装完成后在终端输入 nvidia-smi,若出现 GPU 信息则表示驱动安装成功。
安装 CUDA 和 cuDNN
安装 CUDA
根据 nvidia-smi 显示的 CUDA Version 选择对应的版本进行安装。可以在 NVIDIA 官网下载对应操作系统的安装包。
具体安装步骤:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.0-550.54.14-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.0-550.54.14-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-4-local /cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
安装后设置环境变量:
echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda-12.4' >> ~/.bashrc
source ~/.bashrc
验证 CUDA 安装是否成功,可以在终端输入以下命令:
nvcc --version
如果安装成功,会显示 CUDA 的版本信息。
安装 cuDNN
前往 NVIDIA 官网下载对应版本的 cuDNN。选择与 CUDA 版本相匹配的 cuDNN 版本。找到下载的 cudnn 的 deb 文件所在目录,打开终端,执行下述命令:
sudo dpkg -i cudnn-local-repo-ubuntu2204-xxxxxxxxxxxxxx.deb
/var/cudnn-local-repo-ubuntu2204-9.10.1/cudnn-*-keyring.gpg /usr/share/keyrings/
apt-get update
apt-get -y install cudnn-cuda-12
sudo
cp
sudo
sudo
安装 PyTorch 根据前面下载的 CUDA 版本,在 PyTorch 官网找到对应的安装指令,在自己的 Python 环境中执行以下命令安装 PyTorch(可以等后续创建了 Python 环境后进行安装),以下命令适用于 CUDA 12.4 版本的 PyTorch 安装:
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4-c pytorch -c nvidia
安装 Anaconda
安装 Isaac Sim 可以参考 Isaac Sim 官方安装教程,以下是简要步骤:
conda create -n env_isaaclab python=3.10
conda activate env_isaaclab
pip install --upgrade pip
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4-c pytorch -c nvidia
pip install 'isaacsim[all,extscache]==4.5.0' --extra-index-url https://pypi.nvidia.com
首次启动 isaacsim 需要较长时间,第一次运行将提示用户接受 NVIDIA 软件许可协议。要接受 EULA,请在出现以下消息提示时回复:
By installing or using Isaac Sim, I agree to the terms of NVIDIA SOFTWARE LICENSE AGREEMENT (EULA) in https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement
Do you accept the EULA? (Yes/No): Yes
资产配置 在 Isaac Sim 的官方网站下载 4.5 版本的 Isaac Sim 及 Isaac Sim 的资产包,资产包为 92G。按照官方文档进行资产配置。此外,使用 VSCode 进入 Isaac Sim 的库文件目录,在全局搜索中搜索默认的资产目录,将其替换成自己的资产目录。
安装 Isaac Lab 可以通过 Git 拉取 Isaac Lab 的代码库,或者从官方 GitHub 页面下载 zip 包,以下是通过 Git 拉取的命令:
安装后进入 Isaac Lab 目录,执行以下命令安装相关依赖:
./isaaclab.sh --install
./isaaclab.sh -i
验证安装是否成功,在 isaaclab.sh 所在目录执行:
./isaaclab.sh -p scripts/tutorials/00_sim/create_empty.py
机器人注册
机器人 USD 文件获取 可以通过 SolidWorks 导出机器人的 URDF 文件,然后打开 Isaac Sim,点击左上角的 File -> Import -> Import URDF,选择导出的 URDF 文件,导入后会生成一个 USD 文件,保存到指定目录。若是移动机器人,记得选择可动基座。若导入时卡住或者报错,检查 URDF 的完整性,检查 STL 路径,若 package://... 的路径格式不可行,建议替换成相对或绝对路径。
机器人注册 cd ~/isaaclab/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config
复制已有的 H1 文件夹,重命名为你的机器人名称,比如这里使用 OP3,创建一个 op3.py 作为 OP3 机器人的参数文件。当前的目录结构应如下,按需求补充缺失脚本:
isaaclab_tasks
└── manager_based
└── locomotion
└── velocity
├── config
│ └── op3
│ ├── agents
│ │ ├── __init__.py
│ │ ├── rsl_rl_ppo_cfg.py
│ │ ├── skrl_flat_ppo_cfg.yaml
│ │ ├── skrl_rough_ppo_cfg.yaml
│ │ └── __init__.py
│ ├── op3. py
│ ├── flat_env_cfg.py
│ ├── rough_env_cfg.py
│ ├── check_op3. py
│ ├── velocity_env_cfg.py
│ └── op3_velocity_env_cfg.py
机器人参数文件编写 import isaaclab.sim as sim_utils
from isaaclab.actuators import ImplicitActuatorCfg
from isaaclab.assets.articulation import ArticulationCfg
from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR
OP3_CFG = ArticulationCfg(
spawn=sim_utils.UsdFileCfg(
usd_path=f"xxx/op3.usd" ,
activate_contact_sensors=True ,
rigid_props=sim_utils.RigidBodyPropertiesCfg(
disable_gravity=False ,
retain_accelerations=False ,
linear_damping=0.0 ,
angular_damping=0.0 ,
max_linear_velocity=20.0 ,
max_angular_velocity=20.0 ,
max_depenetration_velocity=10.0 ,
),
articulation_props=sim_utils.ArticulationRootPropertiesCfg(
enabled_self_collisions=True ,
solver_position_iteration_count=4 ,
solver_velocity_iteration_count=4 ,
),
),
init_state=ArticulationCfg.InitialStateCfg(
pos=(0.0 , 0.0 , 0.28 ),
joint_pos={
"head_pan" : 0.0 ,
"head_tilt" : 0.0 ,
".*_hip_yaw" : 0.0 ,
".*_hip_roll" : 0.0 ,
".*_hip_pitch" : 0.0 ,
".*_knee" : 0.0 ,
".*_ank_pitch" : 0.0 ,
".*_ank_roll" : 0.0 ,
".*_sho_pitch" : 0.0 ,
".*_sho_roll" : 0.0 ,
".*_el" : 0.0 ,
},
joint_vel={".*" : 0.0 },
),
soft_joint_pos_limit_factor=0.7 ,
actuators={
"legs" : ImplicitActuatorCfg(
joint_names_expr=[".*_hip_yaw" , ".*_hip_roll" , ".*_hip_pitch" , ".*_knee" , ".*_ank_pitch" , ".*_ank_roll" ],
effort_limit_sim=20 ,
velocity_limit_sim=20.0 ,
stiffness={
".*_hip_yaw" : 100.0 ,
".*_hip_roll" : 100.0 ,
".*_hip_pitch" : 100.0 ,
".*_knee" : 100.0 ,
".*_ank_pitch" : 100.0 ,
".*_ank_roll" : 100.0 ,
},
damping={
".*_hip_yaw" : 20.0 ,
".*_hip_roll" : 20.0 ,
".*_hip_pitch" : 20.0 ,
".*_knee" : 20.0 ,
".*_ank_pitch" : 20.0 ,
".*_ank_roll" : 20.0 ,
},
),
"arms" : ImplicitActuatorCfg(
joint_names_expr=[".*_sho_pitch" , ".*_sho_roll" , ".*_el" ],
effort_limit_sim=20 ,
velocity_limit_sim=20.0 ,
stiffness={
".*_sho_pitch" : 100.0 ,
".*_sho_roll" : 100.0 ,
".*_el" : 100.0 ,
},
damping={
".*_sho_pitch" : 20.0 ,
".*_sho_roll" : 20.0 ,
".*_el" : 20.0 ,
},
),
"head" : ImplicitActuatorCfg(
joint_names_expr=["head_pan" , "head_tilt" ],
effort_limit_sim=20 ,
velocity_limit_sim=20.0 ,
stiffness={"head_pan" : 100.0 , "head_tilt" : 100.0 },
damping={"head_pan" : 20.0 , "head_tilt" : 20.0 },
),
},
)
机器人基座高度检查脚本:
检查机器人基座高度 check_op3.py 的内容如下,可以先在 op3.py 中给机器人一个较高的初始基座高度,然后在这个脚本的输出中查看实际落地时机器人的基座高度,以调整机器人初始基座高度:
import argparse
import torch
from isaaclab.app import AppLauncher
parser = argparse.ArgumentParser(description="This script demonstrates how to simulate bipedal robots." )
AppLauncher.add_app_launcher_args(parser)
args_cli = parser.parse_args()
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app
import isaaclab.sim as sim_utils
from isaaclab.assets import Articulation
from isaaclab.sim import SimulationContext
from op3 import OP3_CFG
def design_scene (sim: sim_utils.SimulationContext ) -> tuple [list , torch.Tensor]:
"""Designs the scene."""
cfg = sim_utils.GroundPlaneCfg()
cfg.func("/World/defaultGroundPlane" , cfg)
cfg = sim_utils.DomeLightCfg(intensity=2000.0 , color=(0.75 , 0.75 , 0.75 ))
cfg.func("/World/Light" , cfg)
origins = torch.tensor([[0.0 , -1.0 , 0.0 ], [0.0 , 0.0 , 0.0 ], [0.0 , 1.0 , 0.0 ],]).to(device=sim.device)
hr2 = Articulation(OP3_CFG.replace(prim_path="/World/G1" ))
robots = [hr2]
return robots, origins
def run_simulator (sim: sim_utils.SimulationContext, robots: list [Articulation], origins: torch.Tensor ):
"""Runs the simulation loop."""
sim_dt = sim.get_physics_dt()
sim_time = 0.0
count = 0
while simulation_app.is_running():
if count % 1000 == 0 :
sim_time = 0.0
count = 0
for index, robot in enumerate (robots):
joint_pos, joint_vel = robot.data.default_joint_pos, robot.data.default_joint_vel
robot.write_joint_state_to_sim(joint_pos, joint_vel)
root_state = robot.data.default_root_state.clone()
root_state[:, :3 ] += origins[index]
robot.write_root_pose_to_sim(root_state[:, :7 ])
robot.write_root_velocity_to_sim(root_state[:, 7 :])
robot.reset()
print (">>>>>>> Reset!" )
for robot in robots:
robot.set_joint_position_target(robot.data.default_joint_pos.clone())
robot.write_data_to_sim()
sim.step()
sim_time += sim_dt
count += 1
for robot in robots:
robot.update(sim_dt)
root_pos = robot.data.root_pos_w
print (f"Robot height (z): {root_pos[:, 2 ].item():.3 f} " )
def main ():
sim_cfg = sim_utils.SimulationCfg(dt=0.005 , device=args_cli.device, gravity=[0.0 , 0.0 , -9.81 ],)
sim = SimulationContext(sim_cfg)
sim.set_camera_view(eye=[3.0 , 0.0 , 2.25 ], target=[0.0 , 0.0 , 1.0 ])
robots, origins = design_scene(sim)
sim.reset()
print ("[INFO]: Setup complete..." )
run_simulator(sim, robots, origins)
if __name__ == "__main__" :
main()
simulation_app.close()
速度环境配置文件编写 速度环境配置文件 op3_velocity_env_cfg.py 的内容如下,可以先复制原有的 velocity_env_cfg.py,然后对照着修改,需要修改的地方有注释说明:
import math
from dataclasses import MISSING
import isaaclab.sim as sim_utils
from isaaclab.assets import ArticulationCfg, AssetBaseCfg
from isaaclab.envs import ManagerBasedRLEnvCfg
from isaaclab.managers import CurriculumTermCfg as CurrTerm
from isaaclab.managers import EventTermCfg as EventTerm
from isaaclab.managers import ObservationGroupCfg as ObsGroup
from isaaclab.managers import ObservationTermCfg as ObsTerm
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.managers import TerminationTermCfg as DoneTerm
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.sensors import ContactSensorCfg, RayCasterCfg, patterns
from isaaclab.terrains import TerrainImporterCfg
from isaaclab.utils import configclass
from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR, ISAACLAB_NUCLEUS_DIR
from isaaclab.utils.noise import AdditiveUniformNoiseCfg as Unoise
import isaaclab_tasks.manager_based.locomotion.velocity.mdp as mdp
from isaaclab.terrains.config.rough import ROUGH_TERRAINS_CFG
from isaaclab.managers import ActionTermCfg
@configclass
class MySceneCfg (InteractiveSceneCfg ):
"""Configuration for the terrain scene with a legged robot."""
terrain = TerrainImporterCfg(
prim_path="/World/ground" ,
terrain_type="generator" ,
terrain_generator=ROUGH_TERRAINS_CFG,
max_init_terrain_level=5 ,
collision_group=-1 ,
physics_material=sim_utils.RigidBodyMaterialCfg(
friction_combine_mode="multiply" ,
restitution_combine_mode="multiply" ,
static_friction=1.0 ,
dynamic_friction=1.0 ,
),
visual_material=sim_utils.MdlFileCfg(
mdl_path=f"{ISAACLAB_NUCLEUS_DIR} /Materials/TilesMarbleSpiderWhiteBrickBondHoned/TilesMarbleSpiderWhiteBrickBondHoned.mdl" ,
project_uvw=True ,
texture_scale=(0.25 , 0.25 ),
),
debug_vis=False ,
)
robot: ArticulationCfg = MISSING
height_scanner = RayCasterCfg(
prim_path="{ENV_REGEX_NS}/Robot/robotis_op3/base" ,
offset=RayCasterCfg.OffsetCfg(pos=(0.0 , 0.0 , 20.0 )),
attach_yaw_only=True ,
pattern_cfg=patterns.GridPatternCfg(resolution=0.1 , size=[1.6 , 1.0 ]),
debug_vis=False ,
mesh_prim_paths=["/World/ground" ],
)
contact_forces = ContactSensorCfg(prim_path="{ENV_REGEX_NS}/Robot/robotis_op3/.*" , history_length=3 , track_air_time=True )
sky_light = AssetBaseCfg(
prim_path="/World/skyLight" ,
spawn=sim_utils.DomeLightCfg(
intensity=750.0 ,
texture_file=f"{ISAAC_NUCLEUS_DIR} /Materials/Textures/Skies/PolyHaven/kloofendal_43d_clear_puresky_4k.hdr" ,
),
)
@configclass
class CommandsCfg :
"""Command specifications for the MDP."""
"""设定机器人在世界坐标系下跟踪期望的线速度和角速度"""
base_velocity = mdp.UniformVelocityCommandCfg(
asset_name="robot" ,
resampling_time_range=(10.0 , 10.0 ),
rel_standing_envs=0.02 ,
rel_heading_envs=1.0 ,
heading_command=True ,
heading_control_stiffness=0.5 ,
debug_vis=True ,
ranges=mdp.UniformVelocityCommandCfg.Ranges(
lin_vel_x=(-1.0 , 1.0 ),
lin_vel_y=(-1.0 , 1.0 ),
ang_vel_z=(-1.0 , 1.0 ),
heading=(-math.pi, math.pi),
),
)
@configclass
class ActionsCfg :
"""Action specifications for the MDP."""
joint_pos = mdp.JointPositionActionCfg(asset_name="robot" , joint_names=[".*" ], scale=0.5 , use_default_offset=True )
@configclass
class ObservationsCfg :
"""Observation specifications for the MDP."""
@configclass
class PolicyCfg (ObsGroup ):
"""Observations for policy group."""
base_lin_vel = ObsTerm(func=mdp.base_lin_vel, noise=Unoise(n_min=-0.1 , n_max=0.1 ))
base_ang_vel = ObsTerm(func=mdp.base_ang_vel, noise=Unoise(n_min=-0.2 , n_max=0.2 ))
projected_gravity = ObsTerm(func=mdp.projected_gravity, noise=Unoise(n_min=-0.05 , n_max=0.05 ),)
velocity_commands = ObsTerm(func=mdp.generated_commands, params={"command_name" : "base_velocity" })
joint_pos = ObsTerm(func=mdp.joint_pos_rel, noise=Unoise(n_min=-0.01 , n_max=0.01 ))
joint_vel = ObsTerm(func=mdp.joint_vel_rel, noise=Unoise(n_min=-1.5 , n_max=1.5 ))
actions = ObsTerm(func=mdp.last_action)
height_scan = ObsTerm(
func=mdp.height_scan, params={"sensor_cfg" : SceneEntityCfg("height_scanner" )},
noise=Unoise(n_min=-0.1 , n_max=0.1 ), clip=(-1.0 , 1.0 ),
)
def __post_init__ (self ):
self .enable_corruption = True
self .concatenate_terms = True
policy: PolicyCfg = PolicyCfg()
@configclass
class EventCfg :
"""Configuration for events."""
physics_material = EventTerm(
func=mdp.randomize_rigid_body_material,
mode="startup" ,
params={"asset_cfg" : SceneEntityCfg("robot" , body_names=".*" ), "static_friction_range" : (0.8 , 0.8 ), "dynamic_friction_range" : (0.6 , 0.6 ), "restitution_range" : (0.0 , 0.0 ), "num_buckets" : 64 ,},
)
add_base_mass = EventTerm(
func=mdp.randomize_rigid_body_mass,
mode="startup" ,
params={"asset_cfg" : SceneEntityCfg("robot" , body_names="base" ), "mass_distribution_params" : (-5.0 , 5.0 ), "operation" : "add" ,},
)
base_external_force_torque = EventTerm(
func=mdp.apply_external_force_torque,
mode="reset" ,
params={"asset_cfg" : SceneEntityCfg("robot" , body_names="base" ), "force_range" : (0.0 , 0.0 ), "torque_range" : (-0.0 , 0.0 ),},
)
reset_base = EventTerm(
func=mdp.reset_root_state_uniform,
mode="reset" ,
params={"pose_range" : {"x" : (-0.5 , 0.5 ), "y" : (-0.5 , 0.5 ), "yaw" : (-3.14 , 3.14 )}, "velocity_range" : {"x" : (-0.5 , 0.5 ), "y" : (-0.5 , 0.5 ), "z" : (-0.5 , 0.5 ), "roll" : (-0.5 , 0.5 ), "pitch" : (-0.5 , 0.5 ), "yaw" : (-0.5 , 0.5 ),},},
)
reset_robot_joints = EventTerm(
func=mdp.reset_joints_by_scale,
mode="reset" ,
params={"position_range" : (0.5 , 1.5 ), "velocity_range" : (0.0 , 0.0 ),},
)
push_robot = EventTerm(
func=mdp.push_by_setting_velocity,
mode="interval" ,
interval_range_s=(10.0 , 15.0 ),
params={"velocity_range" : {"x" : (-0.5 , 0.5 ), "y" : (-0.5 , 0.5 )}},
)
@configclass
class RewardsCfg :
"""Reward terms for the MDP."""
track_lin_vel_xy_exp = RewTerm(func=mdp.track_lin_vel_xy_exp, weight=1.0 , params={"command_name" : "base_velocity" , "std" : math.sqrt(0.25 )})
track_ang_vel_z_exp = RewTerm(func=mdp.track_ang_vel_z_exp, weight=0.5 , params={"command_name" : "base_velocity" , "std" : math.sqrt(0.25 )})
lin_vel_z_l2 = RewTerm(func=mdp.lin_vel_z_l2, weight=-2.0 )
ang_vel_xy_l2 = RewTerm(func=mdp.ang_vel_xy_l2, weight=-0.05 )
dof_torques_l2 = RewTerm(func=mdp.joint_torques_l2, weight=-1.0e-5 )
dof_acc_l2 = RewTerm(func=mdp.joint_acc_l2, weight=-2.5e-7 )
action_rate_l2 = RewTerm(func=mdp.action_rate_l2, weight=-0.01 )
feet_air_time = RewTerm(func=mdp.feet_air_time, weight=0.125 , params={"sensor_cfg" : SceneEntityCfg("contact_forces" , body_names=".*ank_roll_link" ), "command_name" : "base_velocity" , "threshold" : 0.5 ,},)
flat_orientation_l2 = RewTerm(func=mdp.flat_orientation_l2, weight=0.0 )
dof_pos_limits = RewTerm(func=mdp.joint_pos_limits, weight=0.0 )
@configclass
class TerminationsCfg :
"""Termination terms for the MDP."""
time_out = DoneTerm(func=mdp.time_out, time_out=True )
base_contact = DoneTerm(func=mdp.illegal_contact, params={"sensor_cfg" : SceneEntityCfg("contact_forces" , body_names="base" ), "threshold" : 1.0 },)
@configclass
class CurriculumCfg :
"""Curriculum terms for the MDP."""
terrain_levels = CurrTerm(func=mdp.terrain_levels_vel)
@configclass
class LocomotionVelocityRoughEnvCfg (ManagerBasedRLEnvCfg ):
"""Configuration for the locomotion velocity-tracking environment."""
scene: MySceneCfg = MySceneCfg(num_envs=4096 , env_spacing=2.5 )
observations: ObservationsCfg = ObservationsCfg()
actions: ActionsCfg = ActionsCfg()
commands: CommandsCfg = CommandsCfg()
rewards: RewardsCfg = RewardsCfg()
terminations: TerminationsCfg = TerminationsCfg()
events: EventCfg = EventCfg()
curriculum: CurriculumCfg = CurriculumCfg()
def __post_init__ (self ):
"""Post initialization."""
self .decimation = 4
self .episode_length_s = 20.0
self .sim.dt = 0.005
self .sim.render_interval = self .decimation
self .sim.physics_material = self .scene.terrain.physics_material
self .sim.physx.gpu_max_rigid_patch_count = 10 * 2 ** 15
if self .scene.height_scanner is not None :
self .scene.height_scanner.update_period = self .decimation * self .sim.dt
if self .scene.contact_forces is not None :
self .scene.contact_forces.update_period = self .sim.dt
if getattr (self .curriculum, "terrain_levels" , None ) is not None :
if self .scene.terrain.terrain_generator is not None :
self .scene.terrain.terrain_generator.curriculum = True
else :
if self .scene.terrain.terrain_generator is not None :
self .scene.terrain.terrain_generator.curriculum = False
复杂地面环境参数配置脚本 复杂地面环境配置文件 rough_env_cfg.py 的内容如下:
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.utils import configclass
import isaaclab_tasks.manager_based.locomotion.velocity.mdp as mdp
from isaaclab_tasks.manager_based.locomotion.velocity.op3_velocity_env_cfg import LocomotionVelocityRoughEnvCfg, RewardsCfg
from op3 import OP3_CFG
import random
@configclass
class OP3Rewards (RewardsCfg ):
"""Reward terms for the MDP."""
"""机器人死亡时的惩罚"""
termination_penalty = RewTerm(func=mdp.is_terminated, weight=-200.0 )
lin_vel_z_l2 = None
"""奖励机器人在机器人自身的朝向坐标系下跟踪期望的 xy 线速度"""
track_lin_vel_xy_exp = RewTerm(func=mdp.track_lin_vel_xy_yaw_frame_exp, weight=1.0 , params={"command_name" : "base_velocity" , "std" : 0.5 },)
"""奖励机器人在世界坐标系下跟踪期望的 z 轴角速度"""
track_ang_vel_z_exp = RewTerm(func=mdp.track_ang_vel_z_world_exp, weight=1.0 , params={"command_name" : "base_velocity" , "std" : 0.5 })
"""奖励双足交替抬起 (步态),鼓励有步态的行走,在飞行测试时,令 weight=0 即可"""
feet_air_time = RewTerm(func=mdp.feet_air_time_positive_biped, weight=0.25 , params={"command_name" : "base_velocity" , "sensor_cfg" : SceneEntityCfg("contact_forces" , body_names=".*ank_roll_link" ), "threshold" : 0.4 ,},)
"""惩罚脚在地面滑动 (非理想步态)"""
feet_slide = RewTerm(func=mdp.feet_slide, weight=-0.25 , params={"sensor_cfg" : SceneEntityCfg("contact_forces" , body_names=".*ank_roll_link" ), "asset_cfg" : SceneEntityCfg("robot" , body_names=".*ank_roll_link" ),},)
"""惩罚踝关节超出关节极限"""
dof_pos_limits = RewTerm(func=mdp.joint_pos_limits, weight=-1.0 , params={"asset_cfg" : SceneEntityCfg("robot" , joint_names=".*_ank_roll" )})
"""惩罚髋关节 (hip_yaw, hip_roll,hip_yaw) 偏离默认值"""
joint_deviation_hip = RewTerm(func=mdp.joint_deviation_l1, weight=-0.2 , params={"asset_cfg" : SceneEntityCfg("robot" , joint_names=[".*_hip_roll" , ".*_hip_pitch" , ".*_hip_yaw" ])},)
"""惩罚膝关节 (knee) 偏离默认值"""
@configclass
class OP3RoughEnvCfg (LocomotionVelocityRoughEnvCfg ):
rewards: OP3Rewards = OP3Rewards()
def __post_init__ (self ):
super ().__post_init__()
self .scene.robot = OP3_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot" )
if self .scene.height_scanner:
self .scene.height_scanner.prim_path = "{ENV_REGEX_NS}/Robot/robotis_op3/base"
self .events.push_robot = None
self .events.add_base_mass = None
self .events.reset_robot_joints.params["position_range" ] = (1.0 , 1.0 )
self .events.base_external_force_torque.params["asset_cfg" ].body_names = [".*base" ]
self .events.reset_base.params = {"pose_range" : {"x" : (-0.5 , 0.5 ), "y" : (-0.5 , 0.5 ), "yaw" : (-3.14 , 3.14 )}, "velocity_range" : {"x" : (0.0 , 0.0 ), "y" : (0.0 , 0.0 ), "z" : (0.0 , 0.0 ), "roll" : (0.0 , 0.0 ), "pitch" : (0.0 , 0.0 ), "yaw" : (0.0 , 0.0 ),},}
self .terminations.base_contact.params["sensor_cfg" ].body_names = [".*base" ]
self .rewards.undesired_contacts = None
self .rewards.flat_orientation_l2.weight = -1.0
self .rewards.dof_torques_l2.weight = 0.0
self .rewards.action_rate_l2.weight = -0.005
self .rewards.dof_acc_l2.weight = -1.25e-7
self .commands.base_velocity.ranges.lin_vel_x = (0.5 , 1.0 )
self .commands.base_velocity.ranges.lin_vel_y = (0.0 , 0.0 )
self .commands.base_velocity.ranges.ang_vel_z = (-1.0 , 1.0 )
self .terminations.base_contact.params["sensor_cfg" ].body_names = ".*base"
@configclass
class OP3RoughEnvCfg_PLAY (OP3RoughEnvCfg ):
def __post_init__ (self ):
super ().__post_init__()
self .scene.num_envs = 1
self .scene.env_spacing = 2.5
self .episode_length_s = 40.0
self .scene.terrain.max_init_terrain_level = None
if self .scene.terrain.terrain_generator is not None :
self .scene.terrain.terrain_generator.num_rows = 5
self .scene.terrain.terrain_generator.num_cols = 5
self .scene.terrain.terrain_generator.curriculum = False
self .commands.base_velocity.ranges.lin_vel_x = (1.0 , 1.0 )
self .commands.base_velocity.ranges.lin_vel_y = (0.0 , 0.0 )
self .commands.base_velocity.ranges.ang_vel_z = (-1.0 , 1.0 )
self .commands.base_velocity.ranges.heading = (0.0 , 0.0 )
self .observations.policy.enable_corruption = False
self .events.base_external_force_torque = None
self .events.push_robot = None
平整地面环境参数配置脚本 平地环境配置文件 flat_env_cfg.py 的内容如下:
from isaaclab.utils import configclass
from .rough_env_cfg import OP3RoughEnvCfg
@configclass
class OP3FlatEnvCfg (OP3RoughEnvCfg ):
def __post_init__ (self ):
super ().__post_init__()
self .scene.terrain.terrain_type = "plane"
self .scene.terrain.terrain_generator = None
self .scene.height_scanner = None
self .observations.policy.height_scan = None
self .curriculum.terrain_levels = None
self .rewards.feet_air_time.weight = 1.0
self .rewards.feet_air_time.params["threshold" ] = 0.6
class OP3FlatEnvCfg_PLAY (OP3FlatEnvCfg ):
def __post_init__ (self ) -> None :
super ().__post_init__()
self .scene.num_envs = 1
self .scene.env_spacing = 2.5
self .observations.policy.enable_corruption = False
self .events.base_external_force_torque = None
self .events.push_robot = None
机器人训练环境注册 在 __init__.py 中注册机器人训练环境,内容如下,只需要把 OP3 替换为自己的机器人名称即可:
import gymnasium as gym
from . import agents
""" 地形:崎岖地形 (rough terrain), 有地形生成器和难度课程。
用途:标准训练环境,适合训练机器人在复杂地形上行走。
奖励、终止、观测等:完整,适合正式训练。
"""
gym.register(id ="Rough-OP3-train" , entry_point="isaaclab.envs:ManagerBasedRLEnv" , disable_env_checker=True ,
kwargs={"env_cfg_entry_point" : f"{__name__} .rough_env_cfg:OP3RoughEnvCfg" , "rsl_rl_cfg_entry_point" : f"{agents.__name__} .rsl_rl_ppo_cfg:OP3RoughPPORunnerCfg" , "skrl_cfg_entry_point" : f"{agents.__name__} :skrl_rough_ppo_cfg.yaml" ,},)
""" 地形:崎岖地形,但用于'Play'模式。
区别:
环境数量:1,更适合测试和可视化。
地形课程关闭,地形数量减少,内存占用低。
随机扰动/推搡等事件关闭,更稳定。
观测扰动关闭,便于观察真实表现。
用途:用于演示、可视化、调试和模型评估。
"""
gym.register(id ="Rough-OP3-Play" , entry_point="isaaclab.envs:ManagerBasedRLEnv" , disable_env_checker=True ,
kwargs={"env_cfg_entry_point" : f"{__name__} .rough_env_cfg:OP3RoughEnvCfg_PLAY" , "rsl_rl_cfg_entry_point" : f"{agents.__name__} .rsl_rl_ppo_cfg:OP3RoughPPORunnerCfg" , "skrl_cfg_entry_point" : f"{agents.__name__} :skrl_rough_ppo_cfg.yaml" ,},)
""" 地形:平地 (flat terrain), 无地形生成器。
区别:
无地形难度课程,地形始终为平面。
无高度扫描观测,观测量减少。
奖励参数适配平地。
用途:适合在平地上训练,便于对比和基础能力训练。
"""
gym.register(id ="Flat-OP3-train" , entry_point="isaaclab.envs:ManagerBasedRLEnv" , disable_env_checker=True ,
kwargs={"env_cfg_entry_point" : f"{__name__} .flat_env_cfg:OP3FlatEnvCfg" , "rsl_rl_cfg_entry_point" : f"{agents.__name__} .rsl_rl_ppo_cfg:OP3FlatPPORunnerCfg" , "skrl_cfg_entry_point" : f"{agents.__name__} :skrl_flat_ppo_cfg.yaml" ,},)
""" 地形:平地,Play 模式。
区别:
环境数量少,适合测试。
无扰动、无观测噪声,便于可视化和调试。
用途:平地上的演示、可视化、调试和模型评估。
"""
gym.register(id ="Flat-OP3-Play" , entry_point="isaaclab.envs:ManagerBasedRLEnv" , disable_env_checker=True ,
kwargs={"env_cfg_entry_point" : f"{__name__} .flat_env_cfg:OP3FlatEnvCfg_PLAY" , "rsl_rl_cfg_entry_point" : f"{agents.__name__} .rsl_rl_ppo_cfg:OP3FlatPPORunnerCfg" , "skrl_cfg_entry_point" : f"{agents.__name__} :skrl_flat_ppo_cfg.yaml" ,},)
强化学习训练参数脚本修改
RSL-RL PPO 配置
from isaaclab.utils import configclass
from isaaclab_rl.rsl_rl import RslRlOnPolicyRunnerCfg, RslRlPpoActorCriticCfg, RslRlPpoAlgorithmCfg
@configclass
class OP3RoughPPORunnerCfg (RslRlOnPolicyRunnerCfg ):
num_steps_per_env = 24
max_iterations = 3000
save_interval = 50
experiment_name = "OP3_rough"
empirical_normalization = False
policy = RslRlPpoActorCriticCfg(
init_noise_std=1.0 ,
actor_hidden_dims=[512 , 256 , 128 ],
critic_hidden_dims=[512 , 256 , 128 ],
activation="elu" ,
)
algorithm = RslRlPpoAlgorithmCfg(
value_loss_coef=1.0 ,
use_clipped_value_loss=True ,
clip_param=0.2 ,
entropy_coef=0.01 ,
num_learning_epochs=5 ,
num_mini_batches=4 ,
learning_rate=1.0e-3 ,
schedule="adaptive" ,
gamma=0.99 ,
lam=0.95 ,
desired_kl=0.01 ,
max_grad_norm=1.0 ,
)
@configclass
class OP3FlatPPORunnerCfg (OP3RoughPPORunnerCfg ):
def __post_init__ (self ):
super ().__post_init__()
self .max_iterations = 1000
self .experiment_name = "OP3_flat"
self .policy.actor_hidden_dims = [128 , 128 , 128 ]
self .policy.critic_hidden_dims = [128 , 128 , 128 ]
SKRL PPO 配置 skrl_flat_ppo_cfg.yaml 和 skrl_rough_ppo_cfg.yaml 的内容中,只需要把 OP3 替换为自己的机器人名称即可,其他参数可不改。
skrl_flat_ppo_cfg.yaml 的内容如下:
seed: 42
models:
separate: False
policy:
class: GaussianMixin
clip_actions: False
clip_log_std: True
min_log_std: -20.0
max_log_std: 2.0
initial_log_std: 0.0
network:
- name: net
input: STATES
layers: [128 , 128 , 128 ]
activations: elu
output: ACTIONS
value:
class: DeterministicMixin
clip_actions: False
network:
- name: net
input: STATES
layers: [128 , 128 , 128 ]
activations: elu
output: ONE
memory:
class: RandomMemory
memory_size: -1
agent:
class: PPO
rollouts: 24
learning_epochs: 5
mini_batches: 4
discount_factor: 0.99
lambda: 0.95
learning_rate: 1.0e-03
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.01
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: null
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.01
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
experiment:
directory: "OP3_flat"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
trainer:
class: SequentialTrainer
timesteps: 24000
environment_info: log
skrl_rough_ppo_cfg.yaml 的内容如下:
seed: 42
models:
separate: False
policy:
class: GaussianMixin
clip_actions: False
clip_log_std: True
min_log_std: -20.0
max_log_std: 2.0
initial_log_std: 0.0
network:
- name: net
input: STATES
layers: [512 , 256 , 128 ]
activations: elu
output: ACTIONS
value:
class: DeterministicMixin
clip_actions: False
network:
- name: net
input: STATES
layers: [512 , 256 , 128 ]
activations: elu
output: ONE
memory:
class: RandomMemory
memory_size: -1
agent:
class: PPO
rollouts: 24
learning_epochs: 5
mini_batches: 4
discount_factor: 0.995
lambda: 0.95
learning_rate: 1.0e-03
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.01
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: null
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.01
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
experiment:
directory: "OP3_rough"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
trainer:
class: SequentialTrainer
timesteps: 72000
environment_info: log
机器人训练与测试
机器人训练 在完成前面的注册机器人训练环境的所有操作后,进入 Isaac Lab 目录,在终端执行训练语句:
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Flat-OP3-train --max_iterations 3000 --seed 42 --headless
这里的训练框架为 rsl-rl,你也可以使用 isaaclab/scripts/reinforcement_learning/... 路径下的其他训练框架。显然这里的训练环境就是前面注册的 Flat-OP3-train,你也可以使用复杂地面环境 Rough-OP3-train。
此外,关于训练时填入的参数说明可以通过下面语句获取:
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task Flat-OP3-train -h
机器人测试 在完成训练后,你会发现在 ~/isaaclab/logs/rsl_rl/OP3_flat 路径下有很多 .pt 文件,这些文件就是训练过程中保存的强化学习网络模型参数文件,你可以使用这些参数文件来测试机器人的行走能力。
比如,我前面训练了 3000 轮,那我就选取最后一次获取的 .pt 文件 ~/model_2999.pt,在终端执行下面的测试语句:
./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/play.py --task Flat-OP3-Play --checkpoint /home/suiyi/isaacsim/isaaclab/logs/rsl_rl/OP3_flat/2025-06-05_19-31-40/model_2999.pt
相关免费在线工具 加密/解密文本 使用加密算法(如AES、TripleDES、Rabbit或RC4)加密和解密文本明文。 在线工具,加密/解密文本在线工具,online
RSA密钥对生成器 生成新的随机RSA私钥和公钥pem证书。 在线工具,RSA密钥对生成器在线工具,online
Mermaid 预览与可视化编辑 基于 Mermaid.js 实时预览流程图、时序图等图表,支持源码编辑与即时渲染。 在线工具,Mermaid 预览与可视化编辑在线工具,online
随机西班牙地址生成器 随机生成西班牙地址(支持马德里、加泰罗尼亚、安达卢西亚、瓦伦西亚筛选),支持数量快捷选择、显示全部与下载。 在线工具,随机西班牙地址生成器在线工具,online
Gemini 图片去水印 基于开源反向 Alpha 混合算法去除 Gemini/Nano Banana 图片水印,支持批量处理与下载。 在线工具,Gemini 图片去水印在线工具,online
curl 转代码 解析常见 curl 参数并生成 fetch、axios、PHP curl 或 Python requests 示例代码。 在线工具,curl 转代码在线工具,online