AWS DeepRacer Parameter Tuning with Amazon SageMaker and Amazon RoboMaker

1. The roles Amazon SageMaker and Amazon RoboMaker play in supporting AWS DeepRacer


2. How to tune the reward function to improve performance


3. Optimizing the model through hyperparameter tuning. The video covers each hyperparameter available in AWS DeepRacer and how it affects on-track performance.


4. Parameters

Default hyperparameters

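The training configuration in the AWS DeepRacer console exposes the following hyperparameters. The defaults listed here are the usual console defaults for the PPO algorithm; they may differ between console versions, so verify them in your own console before relying on them:

Gradient descent batch size: 64
Number of epochs: 10
Learning rate: 0.0003
Entropy: 0.01
Discount factor: 0.999
Loss type: Huber
Number of experience episodes between each policy-updating iteration: 20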

5. Reward function examples

5.1 Time trial - follow the center line (Default)

This example measures how far the agent is from the center line and gives a higher reward the closer it is to the center of the track. It incentivizes the agent to follow the center line closely.

def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    
    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width
    
    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track
    
    return float(reward)
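
Before submitting a training job, the reward function can be sanity-checked locally by calling it with a hand-built params dictionary. The sketch below is such a check; the track_width and distance_from_center values are made-up test inputs, not values produced by the simulator:

# Minimal local sanity check for the center-line reward function above.
# The params dictionaries are hand-crafted test inputs; at training time the
# DeepRacer simulator supplies these fields automatically.
if __name__ == '__main__':
    test_cases = [
        {'track_width': 0.76, 'distance_from_center': 0.00},  # on the center line -> 1.0
        {'track_width': 0.76, 'distance_from_center': 0.15},  # within marker_2    -> 0.5
        {'track_width': 0.76, 'distance_from_center': 0.30},  # within marker_3    -> 0.1
        {'track_width': 0.76, 'distance_from_center': 0.45},  # likely off track   -> 1e-3
    ]
    for params in test_cases:
        print(params['distance_from_center'], '->', reward_function(params))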

5.2 Time trial - stay inside the two borders

This example simply gives a high reward if the agent stays inside the borders and lets the agent figure out the best path to finish a lap. It is easy to program and understand, but will likely take longer to converge.

def reward_function(params):
    '''
    Example of rewarding the agent to stay inside the two borders of the track
    '''
    
    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    
    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and
    # the agent is somewhere in between the track borders
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return float(reward)

5.3 Time trial - prevent zig-zag

This example incentivizes the agent to follow the center line, but penalizes it with a lower reward if it steers too much, which helps prevent zig-zag behavior. The agent will learn to drive smoothly in the simulator and will likely display the same behavior when deployed in the physical vehicle.

def reward_function(params):
    '''
    Example of penalizing steering, which helps mitigate zig-zag behaviors
    '''

    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle']) # Only need the absolute steering angle

    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    # Steering penalty threshold; change the number based on your action space settings
    ABS_STEERING_THRESHOLD = 15 
    
    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8
    return float(reward)
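
To see how the steering penalty interacts with the distance-based reward, the short sketch below evaluates the function above for a car on the center line while sweeping the steering angle. All input values are made-up test inputs:

# Sweep the steering angle for a car on the center line to see where the
# 0.8 penalty kicks in (made-up test inputs, not simulator values).
for angle in [0, 5, 10, 15, 20, 25, 30]:
    params = {
        'track_width': 0.76,
        'distance_from_center': 0.0,
        'steering_angle': angle,
    }
    print(angle, '->', reward_function(params))
# Expected output: 1.0 up to 15 degrees, then 0.8 once the threshold is exceeded.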

5.4 Object avoidance and head-to-head - stay in one lane and avoid crashing (default for OA and h2h)

This reward function considers two factors. First, it rewards the agent for staying inside the two borders. Second, it penalizes the agent for getting too close to the next object, to avoid crashes. The total reward is calculated as a weighted sum of the two factors. This example puts more weight on avoiding crashes, but you can experiment with different weights.

def reward_function(params):
    '''
    Example of rewarding the agent to stay inside two borders
    and penalizing getting too close to the objects in front
    '''

    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    objects_distance = params['objects_distance']
    _, next_object_index = params['closest_objects']
    objects_left_of_center = params['objects_left_of_center']
    is_left_of_center = params['is_left_of_center']

    # Initialize reward with a small number but not zero
    # because zero means off-track or crashed
    reward = 1e-3

    # Reward if the agent stays inside the two borders of the track
    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
        reward_lane = 1.0
    else:
        reward_lane = 1e-3

    # Penalize if the agent is too close to the next object
    reward_avoid = 1.0

    # Distance to the next object
    distance_closest_object = objects_distance[next_object_index]
    # Decide if the agent and the next object are in the same lane
    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center

    if is_same_lane:
        if 0.5 <= distance_closest_object < 0.8: 
            reward_avoid *= 0.5
        elif 0.3 <= distance_closest_object < 0.5:
            reward_avoid *= 0.2
        elif distance_closest_object < 0.3:
            reward_avoid = 1e-3 # Likely crashed

    # Calculate reward by putting different weights on 
    # the two aspects above
    reward += 1.0 * reward_lane + 4.0 * reward_avoid

    # Always return a float value
    return float(reward)
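
As with the earlier examples, the weighted sum can be checked locally with a hand-built params dictionary. In the sketch below all field values are made-up test inputs, arranged so that the next object (index 1) sits in the same lane as the agent; the reward drops as that object gets closer:

# Local check of the object-avoidance reward with made-up inputs.
# The next object (index 1) is in the same lane as the agent; we vary
# how close it is and watch the weighted reward fall.
for dist in [1.0, 0.6, 0.4, 0.2]:
    params = {
        'all_wheels_on_track': True,
        'distance_from_center': 0.0,
        'track_width': 0.76,
        'objects_distance': [2.5, dist],          # per-object distances as read above
        'closest_objects': [0, 1],                # unpacked as (_, next_object_index)
        'objects_left_of_center': [False, True],  # lane of each object
        'is_left_of_center': True,                # the agent is in the left lane
    }
    print(dist, '->', reward_function(params))
# With the 1.0 / 4.0 weights above, the reward falls from about 5.0 when the
# object is far away to roughly 1.0 when a same-lane object is very close.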


By Ne0inhk