AWS DeepRacer Parameter Tuning with Amazon SageMaker and Amazon RoboMaker

1. The roles Amazon SageMaker and Amazon RoboMaker play in supporting AWS DeepRacer

2. How to tune the reward function to improve performance

3. Optimizing the model through hyperparameter tuning. The video covers each hyperparameter available in AWS DeepRacer and its effect on on-track performance.

4. Parameters

Default hyperparameters
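
As a quick reference, the snippet below collects the hyperparameters exposed in the DeepRacer console together with the default values it typically ships with. The key names and values here are assumptions based on the console at the time of writing, not an authoritative list; confirm them in your own console before relying on them.

# Sketch of the PPO hyperparameters exposed in the AWS DeepRacer console.
# Key names are shorthand for the console labels, and the values are the
# usual defaults (assumptions); verify against your own console.
DEFAULT_HYPERPARAMETERS = {
    'batch_size': 64,                     # gradient descent batch size
    'num_epochs': 10,                     # passes over the experience buffer per update
    'learning_rate': 0.0003,              # gradient descent step size
    'entropy': 0.01,                      # exploration bonus added to the loss
    'discount_factor': 0.999,             # weight given to future rewards
    'loss_type': 'huber',                 # 'huber' or 'mean squared error'
    'num_episodes_between_training': 20,  # experience episodes per policy update
}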


5. Reward function

5.1 Time trial - follow the center line (Default)

This example determines how far away the agent is from the center line and gives higher reward if it is closer to the center of the track. It will incentivize the agent to closely follow the center line.

def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    
    # Read input parameters
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    
    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width
    
    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track
    
    return float(reward)
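
Before training, it can help to sanity-check the reward logic locally by calling the function with a hand-built params dictionary. The keys below are the two parameters the example actually reads; the values are made up purely for illustration.

# Minimal local check of the center-line reward (values are illustrative only)
if __name__ == '__main__':
    sample_params = {
        'track_width': 0.76,           # metres; a made-up but plausible width
        'distance_from_center': 0.05,  # inside marker_1 (0.1 * 0.76 = 0.076)
    }
    print(reward_function(sample_params))  # expected output: 1.0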

5.2 Time trial - stay inside the two borders

This example simply gives high rewards if the agent stays inside the borders and lets the agent figure out the best path to finish a lap. It is easy to program and understand, but is likely to take longer to converge.

def reward_function(params):
    '''
    Example of rewarding the agent to stay inside the two borders of the track
    '''
    
    # Read input parameters
    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    
    # Give a very low reward by default
    reward = 1e-3

    # Give a high reward if no wheels go off the track and
    # the agent is somewhere in between the track borders
    if all_wheels_on_track and (0.5*track_width - distance_from_center) >= 0.05:
        reward = 1.0

    # Always return a float value
    return float(reward)

5.3 Time trial - prevent zig-zag

This example incentivizes the agent to follow the center line but penalizes with lower reward if it steers too much, which will help prevent zig-zag behavior. The agent will learn to drive smoothly in the simulator and likely display the same behavior when deployed in the physical vehicle.

def reward_function(params):
    '''
    Example of penalize steering, which helps mitigate zig-zag behaviors
    '''

    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    abs_steering = abs(params['steering_angle']) # Only need the absolute steering angle

    # Calculate 3 markers that are farther and farther away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Give higher reward if the car is closer to center line and vice versa
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely crashed/ close to off track

    # Steering penalty threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15 
    
    # Penalize reward if the car is steering too much
    if abs_steering > ABS_STEERING_THRESHOLD:
        reward *= 0.8
    return float(reward)
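
The hard threshold above works, but if your action space allows a different maximum steering angle you may prefer a penalty that scales smoothly with the angle. A minimal sketch follows; the 30-degree maximum is an assumption and should be matched to your own action space.

def steering_penalty(abs_steering, max_steering=30.0):
    '''
    Scale the reward multiplier smoothly from 1.0 (driving straight)
    down to 0.8 (full lock) instead of using a single hard threshold.
    max_steering is an assumed value; set it to the largest steering
    angle in your action space.
    '''
    return 1.0 - 0.2 * min(abs_steering / max_steering, 1.0)

# Inside the reward function you would then use, for example:
#     reward *= steering_penalty(abs_steering)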

5.4 Object avoidance and head-to-head - stay in one lane and avoid crashing (default for OA and h2h)

We consider two factors in this reward function. First, reward the agent for staying inside the two borders. Second, penalize the agent for getting too close to the next object, to avoid crashes. The total reward is a weighted sum of the two factors. This example puts more weight on avoiding crashes, but you can experiment with different weights.

def reward_function(params):
    '''
    Example of rewarding the agent to stay inside two borders
    and penalizing getting too close to the objects in front
    '''

    all_wheels_on_track = params['all_wheels_on_track']
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    objects_distance = params['objects_distance']
    _, next_object_index = params['closest_objects']
    objects_left_of_center = params['objects_left_of_center']
    is_left_of_center = params['is_left_of_center']

    # Initialize reward with a small number but not zero
    # because zero means off-track or crashed
    reward = 1e-3

    # Reward if the agent stays inside the two borders of the track
    if all_wheels_on_track and (0.5 * track_width - distance_from_center) >= 0.05:
        reward_lane = 1.0
    else:
        reward_lane = 1e-3

    # Penalize if the agent is too close to the next object
    reward_avoid = 1.0

    # Distance to the next object
    distance_closest_object = objects_distance[next_object_index]
    # Decide if the agent and the next object is on the same lane
    is_same_lane = objects_left_of_center[next_object_index] == is_left_of_center

    if is_same_lane:
        if 0.5 <= distance_closest_object < 0.8: 
            reward_avoid *= 0.5
        elif 0.3 <= distance_closest_object < 0.5:
            reward_avoid *= 0.2
        elif distance_closest_object < 0.3:
            reward_avoid = 1e-3 # Likely crashed

    # Calculate reward by putting different weights on 
    # the two aspects above
    reward += 1.0 * reward_lane + 4.0 * reward_avoid

    return float(reward)
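
As with the simpler examples, this reward function can be exercised locally with a hand-built params dictionary. The structure below mirrors the fields the example reads; every value is made up for illustration only.

# Illustrative local check of the object-avoidance reward (made-up values)
if __name__ == '__main__':
    sample_params = {
        'all_wheels_on_track': True,
        'distance_from_center': 0.10,
        'track_width': 0.76,
        'objects_distance': [2.5, 0.4],       # per-object distances, indexed as the example reads them
        'closest_objects': [0, 1],            # the example uses the second index as the next object
        'objects_left_of_center': [True, False],
        'is_left_of_center': False,           # same lane as object 1, which is 0.4 away
    }
    # lane reward = 1.0; same lane and 0.3 <= 0.4 < 0.5, so avoid reward = 0.2
    # total = 1e-3 + 1.0 * 1.0 + 4.0 * 0.2 = 1.801
    print(reward_function(sample_params))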
