实用强化学习

Practical Reinforcement Learning

1186 次查看
国立高等经济大学
Coursera
  • 完成时间大约为 38 个小时
  • 高级
  • 英语, 韩语
注:本课程由Coursera和Linkshare共同提供,因开课平台的各种因素变化,以上开课日期仅供参考

课程概况

Welcome to the Reinforcement Learning course.

Here you will find out about:

– foundations of RL methods: value/policy iteration, q-learning, policy gradient, etc.
— with math & batteries included

– using deep neural networks for RL tasks
— also known as “the hype train”

– state of the art RL algorithms
— and how to apply duct tape to them for practical problems.

– and, of course, teaching your neural network to play games
— because that’s what everyone thinks RL is about. We’ll also use it for seq2seq and contextual bandits.

Jump in. It’s gonna be fun!

Do you have technical problems? Write to us: coursera@hse.ru

课程大纲

Intro: why should I care?

In this module we gonna define and "taste" what reinforcement learning is about. We'll also learn one simple algorithm that can solve reinforcement learning problems with embarrassing efficiency.

At the heart of RL: Dynamic Programming

This week we'll consider the reinforcement learning formalisms in a more rigorous, mathematical way. You'll learn how to effectively compute the return your agent gets for a particular action - and how to pick best actions based on that return.

Model-free methods

This week we'll find out how to apply last week's ideas to the real world problems: ones where you don't have a perfect model of your environment.

Approximate Value Based Methods

This week we'll learn to scale things even farther up by training agents based on neural networks.

Policy-based methods

We spent 3 previous modules working on the value-based methods: learning state values, action values and whatnot. Now's the time to see an alternative approach that doesn't require you to predict all future rewards to learn something.

Exploration

In this final week you'll learn how to build better exploration strategies with a focus on contextual bandit setup. In honor track, you'll also learn how to apply reinforcement learning to train structured deep learning models.

千万首歌曲。全无广告干扰。
此外,您还能在所有设备上欣赏您的整个音乐资料库。免费畅听 3 个月,之后每月只需 ¥10.00。
Apple 广告
声明:MOOC中国十分重视知识产权问题,我们发布之课程均源自下列机构,版权均归其所有,本站仅作报道收录并尊重其著作权益。感谢他们对MOOC事业做出的贡献!
  • Coursera
  • edX
  • OpenLearning
  • FutureLearn
  • iversity
  • Udacity
  • NovoEd
  • Canvas
  • Open2Study
  • Google
  • ewant
  • FUN
  • IOC-Athlete-MOOC
  • World-Science-U
  • Codecademy
  • CourseSites
  • opencourseworld
  • ShareCourse
  • gacco
  • MiriadaX
  • JANUX
  • openhpi
  • Stanford-Open-Edx
  • 网易云课堂
  • 中国大学MOOC
  • 学堂在线
  • 顶你学堂
  • 华文慕课
  • 好大学在线CnMooc
  • (部分课程由Coursera、Udemy、Linkshare共同提供)

© 2008-2020 MOOC.CN 慕课改变你,你改变世界