Type: Mathematical Sciences and Chemistry
Douban rating: 9.9
Text-to-speech: supported
Word count: 341,000 characters
Science & Technology ranking: No. 72
Release date: 2024-06-01
Editor's Recommendation
Start from scratch and reach a thorough understanding: know not only what works, but why it works.
About the Book
This book begins with the most basic concepts of reinforcement learning and introduces the fundamental analysis tools, including the Bellman equation and the Bellman optimality equation. It then extends to model-based and model-free reinforcement learning algorithms, and finally to reinforcement learning methods based on function approximation. The book emphasizes introducing concepts, analyzing problems, and analyzing algorithms from a mathematical perspective; it does not emphasize the programming implementation of the algorithms.
The book does not require any prior background in reinforcement learning; it only assumes some knowledge of probability theory and linear algebra. For readers who already have a foundation in reinforcement learning, the book can help them understand certain problems more deeply and offer new perspectives.
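For orientation, a standard textbook form of the two equations named above is given below; this is only a sketch of the general concepts, and the book's own notation and derivations may differ. The Bellman equation expresses the state value of a policy $\pi$, and the Bellman optimality equation replaces the average over the policy with a maximization over actions:

$$v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]$$

$$v_*(s) = \max_{a} \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_*(s')\bigr]$$

Here $\gamma \in [0, 1)$ is the discount factor and $p(s', r \mid s, a)$ is the environment's transition model.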
Table of Contents
- Copyright Information
- About the Author
- About the Book
- Preface
- Overview of this Book
- Chapter 1 Basic Concepts
- 1.1 A grid world example
- 1.2 State and action
- 1.3 State transition
- 1.4 Policy
- 1.5 Reward
- 1.6 Trajectories, returns, and episodes
- 1.7 Markov decision processes
- 1.8 Summary
- 1.9 Q&A
- Chapter 2 State Values and the Bellman Equation
- 2.1 Motivating example 1: Why are returns important?
- 2.2 Motivating example 2: How to calculate returns?
- 2.3 State values
- 2.4 The Bellman equation
- 2.5 Examples for illustrating the Bellman equation
- 2.6 Matrix-vector form of the Bellman equation
- 2.7 Solving state values from the Bellman equation
- 2.8 From state value to action value
- 2.9 Summary
- 2.10 Q&A
- Chapter 3 Optimal State Values and the Bellman Optimality Equation
- 3.1 Motivating example: How to improve policies?
- 3.2 Optimal state values and optimal policies
- 3.3 The Bellman optimality equation
- 3.4 Solving an optimal policy from the BOE
- 3.5 Factors that influence optimal policies
- 3.6 Summary
- 3.7 Q&A
- Chapter 4 Value Iteration and Policy Iteration
- 4.1 Value iteration
- 4.2 Policy iteration
- 4.2.1 Algorithm analysis
- 4.3 Truncated policy iteration
- 4.4 Summary
- 4.5 Q&A
- Chapter 5 Monte Carlo Methods
- 5.1 Motivating example: Mean estimation
- 5.2 MC Basic: The simplest MC-based algorithm
- 5.3 MC Exploring Starts
- 5.4 MC ϵ-Greedy: Learning without exploring starts
- 5.5 Exploration and exploitation of ϵ-greedy policies
- 5.6 Summary
- 5.7 Q&A
- Chapter 6 Stochastic Approximation
- 6.1 Motivating example: Mean estimation
- 6.2 Robbins-Monro algorithm
- 6.3 Dvoretzky’s convergence theorem
- 6.4 Stochastic gradient descent
- 6.5 Summary
- 6.6 Q&A
- Chapter 7 Temporal-Difference Methods
- 7.1 TD learning of state values
- 7.1.3 Convergence analysis
- 7.2 TD learning of action values: Sarsa
- 7.3 TD learning of action values: n-step Sarsa
- 7.4 TD learning of optimal action values: Q-learning
- 7.5 A unified viewpoint
- 7.6 Summary
- 7.7 Q&A
- Chapter 8 Value Function Approximation
- 8.1 Value representation: From table to function
- 8.2 TD learning of state values with function approximation
- 8.3 TD learning of action values with function approximation
- 8.4 Deep Q-learning
- 8.5 Summary
- 8.6 Q&A
- Chapter 9 Policy Gradient Methods
- 9.1 Policy representation: From table to function
- 9.2 Metrics for defining optimal policies
- 9.3 Gradients of the metrics
- 9.4 Monte Carlo policy gradient (REINFORCE)
- 9.5 Summary
- 9.6 Q&A
- Chapter 10 Actor-Critic Methods
- 10.1 The simplest actor-critic algorithm (QAC)
- 10.2 Advantage actor-critic (A2C)
- 10.3 Off-policy actor-critic
- 10.4 Deterministic actor-critic
- 10.5 Summary
- 10.6 Q&A
- Appendix A Preliminaries for Probability Theory
- Appendix B Measure-Theoretic Probability Theory
- Appendix C Convergence of Sequences
- C.1 Convergence of deterministic sequences
- C.2 Convergence of stochastic sequences
- Appendix D Preliminaries for Gradient Descent
- Bibliography
- Symbols
- Index
Publisher
Tsinghua University Press
Tsinghua University Press was founded in June 1980 as a comprehensive publishing house supervised by the Ministry of Education and sponsored by Tsinghua University. Rooted in the prestigious Tsinghua campus and carrying on the Tsinghua spirit of "Self-Discipline and Social Commitment," the press has grown rapidly in little more than two decades. It has consistently pursued the mission of promoting science, technology, and culture and serving the national strategy of revitalizing the country through science and education, with the publication of university textbooks and science and technology books as its main task. It has also established a number of publishing funds to promote academic exchange and the development of publishing, gradually forming a distinctive focus on high-quality textbooks and academic monographs and building a strong brand in educational publishing.