https://docs.isaacsim.omniverse.nvidia.com/latest/robot_setup_tutorials/index.html
Robot Setup Tutorials Series — Isaac Sim Documentation
Robot Setup Tutorials Series The GUI tutorials walk you through setting up your virtual world and building robot digital twins with various NVIDIA Isaac Sim features. In the process, you will learn where to find frequently used properties, settings, and to
docs.isaacsim.omniverse.nvidia.com
해당 문서를 참고하여 강화학습 제어용 이륜 로봇을 제작하였다.
https://isaac-sim.github.io/IsaacLab/main/source/setup/walkthrough/training_jetbot_gt.html
Training the Jetbot: Ground Truth — Isaac Lab Documentation
Training the Jetbot: Ground Truth With the environment defined, we can now start modifying our observations and rewards in order to train a policy to act as a controller for the Jetbot. As a user, we would like to be able to specify the desired direction f
isaac-sim.github.io
또한 해당 예제를 통해 같은 종류의 이륜 로봇의 방향을 제어하는 예제 코드를 기반으로 시작하였다.
주로 수정한 파일은 Task의 env.py 파일이다.

1. RL 에이전트가 수행할 작업(Task) 변경

원래 코드
- _reset_idx 함수에서 self.commands라는 임의의 방향 벡터 생성
- 에이전트가 self.commands 벡터 방향으로 최대한 정렬, 해당 방향으로 빠르게 나아가는 것이 목표
- 목표 시각화를 위한 waypoint 마커가 존재

수정된 코드
- _reset_idx 함수에서 self.waypoint_locations 목표 좌표 생성
- self.commands 벡터 생성은 삭제
- 에이전트는 현재 위치에서 self.waypoint_locations까지 스스로 경로를 찾아 도달하는 것이 목표
- 조금 더 복잡하고 일반적인 내비게이션(Navigation) 문제
2. 시각화(Visualization) 마커 로직 변경
학습 환경 시각화를 도와주는 마커(_visualize_markers) 작동 방식 변경

원래 코드
- forward 마커(파란색 화살표)는 로봇의 전진 방향 벡터 표시
- command 마커(빨간색 화살표)는 임의로 생성된 self.commands 벡터 방향 표시



수정된 코드
- waypoint 마커 추가해 목표 지점 (녹색 화살표) 시각화
- command 마커가 현재 로봇 위치에서 목표 지점(waypoint)을 가리키는 방향을 실시간으로 계산
- 현재 로봇이 목표를 향해 주행중인지 직관적으로 파악 가능
3. 관측 (_get_observations) 재구성

원래 코드
- 로봇의 foward.commands 와 self.commands 벡터 간의 관계 관측
- dot: 두 벡터의 내적 (방향 일치도).
- cross: 두 벡터의 외적 (방향 차이).
- forward_speed: 로봇의 전진 속도.

수정된 코드
- 목표 지점 도달에 필수적인 정보들로 관측 데이터 구성
- vec_to_target_local: 로봇의 현재 위치와 방향 기준 목표 지점까지의 상대 위치 벡터
- robot_lin_vel_local: 로봇의 로컬 좌표계 기준 선속도
- robot_ang_vel_local: 로봇의 로컬 좌표계 기준 각속도
4. 보상 (Reward) 시스템 설계 변경 (_get_rewards)

원래 코드
- 전진 속도 보상 (forward_reward)
- 명령 방향 정렬 보상 (alignment_reward)

수정된 코드
- 목표 지점 도달을 위한 복합적이고 정교한 보상 구조 도입
- progress_reward: 목표 지점과의 거리에 따른 보상 (포텐셜 기반)
- velocity_reward: 목표 지점을 향해 움직이는 속도에 대한 보상
- heading_reward: 로봇이 목표 지점을 바라보도록 유도하는 보상
- success_reward: 목표 근접 시 주어지는 큰 성공 보상
- action_penalty, time_penalty: 불필요한 움직임, 시간 단축을 위한 페널티
- 이론 지식이 부족해서 대부분의 시간이 Reward function 수정에 소요
5. episode 관리 로직 변경 (_get_dones, _reset_idx)

원래 코드
- 종료 조건: 최대 스텝 도달 시
- 리셋: 무작위 '방향' 벡터(commands) 생성

수정된 코드
- 종료 조건: 목표 지점 도달 조건 추가
- 리셋: 무작위 '목표 지점'(waypoint_locations) 반경 내 생성
- 리셋 시, 거리 기반 보상(potentials) 값 초기화
6. env_cfg 설정 변경

- 환경 이름 외부 프로젝트(Myfirstcar)에 맞춰 수정
- robot_cfg를 jetbot에서 직접 설계한 이륜 로봇으로 교체 (ground plane 간섭 방지용 pos 옵션 추가)
- episode 세션 시간 2배로 증가(waypoint 도달 전에 episode 종료 막음)
- action scale 값 0.5로 최적화 episode 세션 시간 2배로 증가
4800 epoch 학습 결과

수정된 env.py 코드
# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from __future__ import annotations
import math
import torch
from collections.abc import Sequence
import isaaclab.sim as sim_utils
from isaaclab.assets import Articulation
from isaaclab.envs import DirectRLEnv
from isaaclab.sim.spawners.from_files import GroundPlaneCfg, spawn_ground_plane
from isaaclab.utils.math import sample_uniform
from isaaclab.assets import RigidObject, RigidObjectCfg
from isaaclab.sensors.ray_caster import RayCaster, RayCasterCfg, patterns
from .myfirstcar_env_cfg import MyfirstcarEnvCfg
from isaaclab.markers import VisualizationMarkers, VisualizationMarkersCfg
from isaaclab.utils.timer import Timer
from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR
import isaaclab.utils.math as math_utils
def define_markers() -> VisualizationMarkers:
"""Define markers with various different shapes."""
marker_cfg = VisualizationMarkersCfg(
prim_path="/Visuals/myMarkers",
markers={
"forward": sim_utils.UsdFileCfg(
usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/UIElements/arrow_x.usd",
scale=(0.25, 0.25, 0.5),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.0, 1.0, 1.0)),
),
"command": sim_utils.UsdFileCfg(
usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/UIElements/arrow_x.usd",
scale=(0.25, 0.25, 0.5),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(1.0, 0.0, 0.0)),
),
"waypoint": sim_utils.UsdFileCfg(
usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/UIElements/arrow_x.usd",
scale=(0.3, 0.3, 0.6),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.0, 1.0, 0.0)),
),
"lidar": sim_utils.UsdFileCfg(
usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/UIElements/arrow_x.usd",
scale=(0.3, 0.3, 0.6),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.5, 1.0, 0.0)),
)
},
)
return VisualizationMarkers(cfg=marker_cfg)
class MyfirstcarEnv(DirectRLEnv):
cfg: MyfirstcarEnvCfg
def __init__(self, cfg: MyfirstcarEnvCfg, render_mode: str | None = None, **kwargs):
super().__init__(cfg, render_mode, **kwargs)
self.dof_idx, _ = self.robot.find_joints(self.cfg.dof_names)
self.action_scale = torch.tensor(self.cfg.action_scale, device=self.device)
def _define_sensor(self) -> RayCaster:
"""Defines the ray-caster sensor to add to the scene."""
# Lidar sensor configuration
self.ray_caster_cfg = RayCasterCfg(
prim_path="/World/envs/env_.*/Robot/",
mesh_prim_paths=["/World/ground"],
update_period=0.02,
pattern_cfg=patterns.GridPatternCfg(resolution=0.1, size=(1.6, 1.0)),
offset=RayCasterCfg.OffsetCfg(pos=(0.2, 0.0, 0.05)),
ray_alignment="yaw",
debug_vis=True,
)
return RayCaster(cfg=self.ray_caster_cfg)
def _setup_scene(self):
self.robot = Articulation(self.cfg.robot_cfg)
# add ground plane
spawn_ground_plane(prim_path="/World/ground", cfg=GroundPlaneCfg())
# clone and replicate
self.scene.clone_environments(copy_from_source=False)
# add articulation to scene
self.scene.articulations["robot"] = self.robot
# add lights
light_cfg = sim_utils.DomeLightCfg(intensity=2000.0, color=(0.75, 0.75, 0.75))
light_cfg.func("/World/Light", light_cfg)
self.visualization_markers = define_markers()
# Initialize sensors and add them to the scene
self.ray_caster = self._define_sensor()
self.scene.sensors["lidar"] = self.lidar
# setting aside useful variables for later
self.up_dir = torch.tensor([0.0, 0.0, 1.0], device=self.device)
self.waypoint_locations = torch.zeros((self.cfg.scene.num_envs, 3), device=self.device)
# initialize buffers
self.dist_to_target = torch.zeros(self.num_envs, device=self.device)
self.potentials = torch.zeros(self.num_envs, device=self.device)
self.marker_offset = torch.zeros((self.cfg.scene.num_envs, 3), device=self.device)
self.marker_offset[:,-1] = 0.5
def _visualize_markers(self):
# get marker locations and orientations
robot_locations = self.robot.data.root_pos_w + self.marker_offset
forward_orientations = self.robot.data.root_quat_w
# Command marker points towards the waypoint
vec_to_target = self.waypoint_locations - robot_locations
vec_to_target[:, 2] = 0
# Normalize the target vector
vec_to_target_norm = torch.linalg.norm(vec_to_target, dim=1, keepdim=True)
vec_to_target_safe = vec_to_target / (vec_to_target_norm + 1e-6)
# The command marker (arrow_x.usd) points along the x-axis by default.
# This is our starting vector for the rotation calculation.
start_vec = torch.tensor([1.0, 0.0, 0.0], device=self.device).expand_as(vec_to_target_safe)
# Calculate angle and axis for rotation
dot_product = torch.sum(start_vec * vec_to_target_safe, dim=-1).clamp(-1.0, 1.0)
angle = torch.acos(dot_product)
axis = torch.cross(start_vec, vec_to_target_safe, dim=-1)
# Normalize the rotation axis
axis_norm = torch.linalg.norm(axis, dim=-1, keepdim=True)
# Handle cases where vectors are collinear (axis norm is near zero)
up_vec = torch.tensor([0.0, 0.0, 1.0], device=self.device).expand_as(axis)
axis = torch.where(axis_norm > 1e-6, axis / (axis_norm + 1e-8), up_vec)
# Create quaternion from angle and axis
command_orientations = math_utils.quat_from_angle_axis(angle, axis)
# -- waypoint marker is at the target location
waypoint_locations = self.waypoint_locations
# waypoint orientation is fixed (identity)
waypoint_orientations = torch.tensor([-0.7071, 0.0, -0.7071, 0.0], device=self.device).repeat(self.num_envs, 1)
# stack all marker information
locs = torch.vstack((robot_locations, robot_locations, waypoint_locations))
rots = torch.vstack((forward_orientations, command_orientations, waypoint_orientations))
# render the markers
all_envs = torch.arange(self.num_envs, device=self.device)
indices = torch.hstack((torch.zeros_like(all_envs), torch.ones_like(all_envs), torch.full_like(all_envs, 2)))
self.visualization_markers.visualize(locs, rots, marker_indices=indices)
def _pre_physics_step(self, actions: torch.Tensor) -> None:
self.actions = actions.clone()
self._visualize_markers()
def _apply_action(self) -> None:
# apply action scaling
scaled_actions = self.actions * self.action_scale
self.robot.set_joint_velocity_target(scaled_actions, joint_ids=self.dof_idx)
def _get_observations(self) -> dict:
# robot position and orientation
robot_pos = self.robot.data.root_pos_w
robot_quat = self.robot.data.root_quat_w
# vector to target in world frame
self.vec_to_target_world = self.waypoint_locations - robot_pos
self.dist_to_target = torch.norm(self.vec_to_target_world[:, :2], p=2, dim=-1)
# vector to target in robot's local frame
vec_to_target_local = math_utils.quat_apply_inverse(robot_quat, self.vec_to_target_world)
# robot velocities
robot_lin_vel_local = math_utils.quat_apply_inverse(robot_quat, self.robot.data.root_lin_vel_w)
robot_ang_vel_local = math_utils.quat_apply_inverse(robot_quat, self.robot.data.root_ang_vel_w)
# lidar data - 거리 값만 사용
#lidar_scan = self.lidar.data.distance.squeeze(1) # (num_envs, num_rays)
# observations
obs = torch.cat(
(
vec_to_target_local,
robot_lin_vel_local,
robot_ang_vel_local,
#lidar_scan,
),
dim=-1,
)
observations = {"policy": obs}
return observations
def _get_rewards(self) -> torch.Tensor:
# -- distance-based reward (potential-based)
progress_reward = 2.0 * (self.potentials - self.dist_to_target)
self.potentials = self.dist_to_target.clone()
# -- velocity reward (towards target)
# get the unit vector to the target
vec_to_target_unit = self.vec_to_target_world / (torch.norm(self.vec_to_target_world, dim=1, keepdim=True) + 1e-6)
# project the robot's velocity onto the vector to the target
velocity_to_target = torch.sum(self.robot.data.root_lin_vel_w * vec_to_target_unit, dim=1)
# reward for moving towards the target
velocity_reward = 1.0 * torch.clamp(velocity_to_target, min=0)
# -- orientation reward (facing target)
# get the robot's forward direction vector
forward_vec = torch.tensor([1.0, 0.0, 0.0], device=self.device).repeat(self.num_envs, 1)
robot_fwd_vec = math_utils.quat_apply(self.robot.data.root_quat_w, forward_vec)
# compute the cosine similarity to reward facing the target
heading_reward = 0.1 * torch.sum(robot_fwd_vec * vec_to_target_unit, dim=1)
# -- success reward
success_reward = torch.zeros_like(progress_reward)
success_reward[self.dist_to_target < 0.5] = 50.0
# reward for reaching the goal
# -- action penalty
action_penalty = -0.001 * torch.sum(torch.square(self.actions), dim=-1)
# -- time penalty (to encourage speed)
time_penalty = -0.01
# -- total reward
total_reward = (
progress_reward
+ velocity_reward
+ heading_reward
+ success_reward
+ action_penalty
+ time_penalty
)
return total_reward.unsqueeze(1)
def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
time_out = self.episode_length_buf >= self.max_episode_length - 1
# reset if goal is reached
reset_goal = self.dist_to_target < 0.5
return reset_goal, time_out
def _reset_idx(self, env_ids: Sequence[int] | None):
if env_ids is None:
env_ids = self.robot._ALL_INDICES
super()._reset_idx(env_ids)
# -- Waypoint goal
# sample new goal locations
radius_minimum = 0.5
radius = 4.5
rand_dist = torch.rand(len(env_ids), device=self.device) * radius + radius_minimum
rand_angle = torch.rand(len(env_ids), device=self.device) * 2 * math.pi
# set the new goals
self.waypoint_locations[env_ids, 0] = self.scene.env_origins[env_ids, 0] + rand_dist * torch.cos(rand_angle)
self.waypoint_locations[env_ids, 1] = self.scene.env_origins[env_ids, 1] + rand_dist * torch.sin(rand_angle)
self.waypoint_locations[env_ids, 2] = 0.3
# fixed height
# set the root state for the reset envs
default_root_state = self.robot.data.default_root_state[env_ids]
default_root_state[:, :3] += self.scene.env_origins[env_ids]
self.robot.write_root_state_to_sim(default_root_state, env_ids)
# reset potentials
robot_pos = self.robot.data.root_pos_w[env_ids]
vec_to_target_world = self.waypoint_locations[env_ids] - robot_pos
self.dist_to_target[env_ids] = torch.norm(vec_to_target_world[:, :2], p=2, dim=-1)
self.potentials[env_ids] = self.dist_to_target[env_ids].clone()
# self.lidar.reset(env_ids)
self._visualize_markers()
수정된 env_cfg.py 코드
# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from MyFirstCar.robots.jetbot import JETBOT_CONFIG
from isaaclab.assets import ArticulationCfg
from isaaclab.envs import DirectRLEnvCfg
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.sim import SimulationCfg
from isaaclab.utils import configclass
@configclass
class MyfirstcarEnvCfg(DirectRLEnvCfg):
# env
decimation = 2
episode_length_s = 10.0
# - spaces definition
action_space = 2
observation_space = 3 #Lidar scan data
state_space = 0
# simulation
sim: SimulationCfg = SimulationCfg(dt=1 / 120, render_interval=decimation)
# robot(s)
robot_cfg: ArticulationCfg = JETBOT_CONFIG.replace(prim_path="/World/envs/env_.*/Robot",init_state=ArticulationCfg.InitialStateCfg(
pos=(0.0, 0.0, 0.8)))
# scene
scene: InteractiveSceneCfg = InteractiveSceneCfg(num_envs=16, env_spacing=20, replicate_physics=True)
dof_names = ["wheel_joint_left", "wheel_joint_right"]
action_scale = (0.5, 0.5)