Machine Learning (10): Reinforcement Learning

Reinforcement learning

1 key concepts

  1. states
  2. actions
  3. rewards
  4. discount factor $\gamma$
  5. return
  6. policy $\pi$

2 return

  1. definition: the sum of the rewards that the system gets, weighted by the discount factor
  2. compute:
    • $R_i$: reward of state $i$
    • $\gamma$: discount factor (usually close to 1); discounting makes the agent "impatient", since earlier rewards count for more than later ones

$$return = R_1 + \gamma R_2 + \gamma^2 R_3 + \cdots + \gamma^{n-1} R_n$$
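As a quick illustration, here is a minimal sketch (with made-up reward values and $\gamma = 0.9$, not from the notes) of computing the return by summing the discounted rewards:

```python
# A minimal sketch (made-up reward values): computing the discounted return
# for a 4-step reward sequence with gamma = 0.9.
gamma = 0.9
rewards = [0, 0, 0, 100]   # R_1, R_2, R_3, R_4

# return = R_1 + gamma*R_2 + gamma^2*R_3 + gamma^3*R_4
ret = sum(gamma ** i * r for i, r in enumerate(rewards))
print(ret)                 # 0.9**3 * 100 = 72.9 (approximately)
```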

3 policy

A policy $\pi$ maps a state $s$ to some action $a$:

$$\pi(s) = a$$

The goal of reinforcement learning is to find a policy $\pi$ that maps every state $s$ to an action $a$ so as to maximize the return.
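A minimal sketch of what a deterministic policy looks like, for a tiny made-up problem with four states (the states and action names are illustrative only):

```python
# A deterministic policy stored as a lookup table mapping each state s to an action a.
policy = {0: "left", 1: "left", 2: "right", 3: "right"}

def pi(s):
    """pi(s) = a"""
    return policy[s]

print(pi(2))   # "right"
```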


4 state-action value function

1. definition

$Q(s, a)$ = the return if you

  • start in state $s$
  • take action $a$ once
  • behave optimally after that

2. usage

  1. the best possible return from state $s$ is $\max_a Q(s, a)$
  2. the best possible action in state $s$ is the action $a$ that gives $\max_a Q(s, a)$
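A minimal sketch of both uses, with a made-up $Q$ table (rows = states, columns = actions):

```python
import numpy as np

# Best return from state s is max_a Q(s, a); best action is the argmax over actions.
Q = np.array([
    [12.5, 10.0],   # Q(s=0, a=0), Q(s=0, a=1)
    [ 8.0, 20.0],   # Q(s=1, a=0), Q(s=1, a=1)
])

s = 1
best_return = Q[s].max()      # max_a Q(s, a) = 20.0
best_action = Q[s].argmax()   # a = 1
```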

5 Bellman equation

$s$: current state

$a$: current action

$s'$: state you get to after taking action $a$

$a'$: action that you take in state $s'$

$$Q(s, a) = R(s) + \gamma \max_{a'} Q(s', a')$$
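A minimal sketch of the Bellman equation in action, on a tiny made-up deterministic problem (four states in a row, states 0 and 3 terminal, assumed rewards): repeatedly applying the update converges to the optimal $Q$ values.

```python
import numpy as np

# Q(s, a) = R(s) + gamma * max_a' Q(s', a'), swept repeatedly until it stops changing.
gamma = 0.5
R = np.array([100.0, 0.0, 0.0, 40.0])   # reward of each state (assumed values)
terminal = {0, 3}
moves = {0: -1, 1: +1}                   # action 0 = step left, action 1 = step right

Q = np.zeros((4, 2))
for _ in range(50):                      # enough sweeps to converge on this toy problem
    for s in range(4):
        for a, step in moves.items():
            if s in terminal:
                Q[s, a] = R[s]           # no future reward once a terminal state is reached
            else:
                s_next = s + step
                Q[s, a] = R[s] + gamma * Q[s_next].max()

print(Q.round(2))
```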

6 Deep Q-Network

1. definition

use a neural network to learn $Q(s, a)$:

$$x = (s, a)$$
$$y = R(s) + \gamma \max_{a'} Q(s', a')$$
$$f_{w, b}(x) \approx y$$
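A minimal sketch of such a network in Keras (the layer sizes, state dimension, and number of actions are assumptions, not from the notes); the action is one-hot encoded and concatenated onto the state to form $x = (s, a)$:

```python
import tensorflow as tf

state_dim, num_actions = 8, 4

# f_{w,b}(x): maps x = (s, a) to a scalar estimate of Q(s, a)
q_network = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(state_dim + num_actions,)),   # x = (s, a)
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                                   # scalar Q(s, a)
])
q_network.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
```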


2. steps

  1. initialize the neural network randomly as a guess of $Q(s, a)$
  2. repeat:
    • take actions, getting tuples $(s, a, R(s), s')$
    • store the $N$ most recent $(s, a, R(s), s')$ tuples
  3. train the neural network (see the sketch after this list):
    • create a training set of $N$ examples using $x = (s, a)$ and $y = R(s) + \gamma \max_{a'} Q(s', a')$
    • train $Q_{new}$ such that $Q_{new}(x) \approx y$
    • set $Q = Q_{new}$
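A minimal sketch of this loop, assuming the `q_network` from the previous sketch, a stand-in random environment, one-hot encoded actions, and made-up hyperparameters (buffer size, batch size, $\gamma$); everything here is illustrative rather than a definitive implementation:

```python
import random
from collections import deque

import numpy as np

state_dim, num_actions, gamma, N = 8, 4, 0.95, 10_000
buffer = deque(maxlen=N)                        # keeps only the N most recent tuples

def env_step(s, a):
    """Stand-in environment: returns (R(s), s'). Replace with a real simulator."""
    return float(np.random.rand()), np.random.rand(state_dim).astype(np.float32)

def one_hot(a):
    return np.eye(num_actions, dtype=np.float32)[a]

s = np.random.rand(state_dim).astype(np.float32)
for step in range(200):
    a = np.random.randint(num_actions)          # in practice: epsilon-greedy, see below
    r, s_next = env_step(s, a)
    buffer.append((s, a, r, s_next))            # (s, a, R(s), s')
    s = s_next

    if len(buffer) >= 32:
        batch = random.sample(list(buffer), 32)
        x = np.array([np.concatenate([sb, one_hot(ab)]) for sb, ab, _, _ in batch])
        s_next_batch = np.array([sn for _, _, _, sn in batch])
        # Q(s', a') for every candidate action a' of every s' in the batch
        q_next = np.stack([
            q_network.predict(
                np.hstack([s_next_batch, np.tile(one_hot(ap), (len(batch), 1))]),
                verbose=0)[:, 0]
            for ap in range(num_actions)
        ], axis=1)
        y = np.array([rb for _, _, rb, _ in batch]) + gamma * q_next.max(axis=1)
        q_network.fit(x, y, epochs=1, verbose=0)   # train Q_new; here it overwrites Q directly
```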

3. optimizations

The refinements below ($\epsilon$-greedy exploration, mini-batches, and soft updates) make the basic algorithm work better in practice.


4. $\epsilon$-greedy policy

  1. with probability $1 - \epsilon$, pick the action $a$ that maximizes $Q(s, a)$ (exploitation)
  2. with probability $\epsilon$, pick an action $a$ at random (exploration)
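A minimal sketch of this selection rule (the Q values and $\epsilon$ below are made up):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.05):
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))   # explore: random action
    return int(np.argmax(q_values))               # exploit: action with the largest Q(s, a)

print(epsilon_greedy(np.array([1.2, 3.4, 0.5])))  # 1 most of the time
```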

5. mini-batch

use a randomly sampled subset of the dataset on each gradient-descent step
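A minimal sketch (the batch size of 32 and the dummy tuples are assumptions):

```python
import random

# Sample a small mini-batch from the stored tuples instead of using all N of them
# on every gradient-descent step.
replay_buffer = [(i, i % 4, 0.0, i + 1) for i in range(10_000)]   # dummy (s, a, R(s), s') tuples
mini_batch = random.sample(replay_buffer, 32)
```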

6. soft update

instead of setting $Q = Q_{new}$ directly, blend the new parameters into the old ones (with a small $\alpha$, e.g. 0.01):

$$w = \alpha w_{new} + (1 - \alpha) w$$
$$b = \alpha b_{new} + (1 - \alpha) b$$
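A minimal sketch with dummy parameter values; $\alpha$ is an assumed small constant, so $Q$ changes only gradually on each update:

```python
import numpy as np

alpha = 0.01
w, b = np.ones(5), 0.5               # current parameters of Q
w_new, b_new = np.zeros(5), 0.0      # parameters just produced by training Q_new

w = alpha * w_new + (1 - alpha) * w
b = alpha * b_new + (1 - alpha) * b
```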
