Course Description

The problem of decision-making is ubiquitous and has therefore been studied in different domains, often with the intent of devising formal procedures that yield the best decision. Reinforcement Learning (RL) is one such technique, which can be seen both as a machine learning approach and as an optimal control approach. RL is a sampling-based method for solving Markov Decision Processes (MDPs), i.e., problems in which an agent receives a reward as a consequence of the current situation (state) and the selected decision (action). The problem is non-trivial because the current action influences the future states and, in turn, the future rewards. RL has proven to be a very successful technique, managing, e.g., to beat Chess and Go masters (both humans and algorithms).
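To make the state, action and reward terminology concrete, the following is a minimal sketch of tabular Q-learning, one common sampling-based RL method, on a toy problem. The two-state MDP, its rewards, and the names TRANSITIONS, GAMMA, ALPHA, q_learning and greedy_policy are all assumptions made for illustration and are not taken from the course material.

```python
import random

# Hypothetical toy MDP (illustrative assumption, not from the course):
# TRANSITIONS[state][action] = (next_state, reward), all deterministic.
TRANSITIONS = {
    0: {0: (0, 1.0), 1: (1, 0.0)},  # state 0: stay for reward 1, or move for reward 0
    1: {0: (0, 0.0), 1: (1, 2.0)},  # state 1: move back for reward 0, or stay for reward 2
}
GAMMA = 0.9  # discount factor (assumed)
ALPHA = 0.1  # learning rate (assumed)


def q_learning(num_steps=20000, seed=0):
    """Estimate action values from sampled transitions only."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
    state = 0
    for _ in range(num_steps):
        # Behave uniformly at random; Q-learning is off-policy, so it
        # still learns the values of the greedy policy.
        action = rng.choice((0, 1))
        next_state, reward = TRANSITIONS[state][action]
        # Bootstrapped target: immediate reward plus discounted best next value.
        target = reward + GAMMA * max(q[next_state].values())
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(q[s], key=q[s].get) for s in q}


q_values = q_learning()
policy = greedy_policy(q_values)
```

In this toy problem the greedy policy gives up the immediate reward of 1 in state 0 in order to reach state 1, where a reward of 2 can be collected at every step. This is exactly the effect described above: the current action changes the future states and therefore the future rewards.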

The aim of this course is to discuss the fundamentals of RL, thereby giving the student a sound understanding of how the problem is formulated and solved using state-of-the-art algorithms. Some more advanced topics will also be discussed in order to give a more complete picture to interested students. Finally, the last part of the course will discuss how safety and stability guarantees can be introduced in RL by means of well-established techniques from control.

Course Schedule

The course is open to IMT students only and takes place online; the link will be communicated soon. The schedule is as follows:

  • October 10, 10:00 – 13:00
  • October 11, 10:00 – 13:00
  • October 12, 10:00 – 13:00
  • October 13, 10:00 – 13:00
  • October 14, 10:00 – 13:00
  • October 17, 10:00 – 13:00
  • October 18, 09:00 – 11:00
  • October 19, 09:00 – 11:00
  • October 25, 09:00 – 11:00
  • October 26, 09:00 – 11:00
  • October 27, 09:00 – 11:00
  • October 28, 09:00 – 11:00

Course Material

Please do not print the slides! They are designed with various animations and the paper waste would be quite significant.

Slides (currently only for IMT students, might be updated):
Introduction
MDP
Prediction
Control
Function Approximation
Policy Gradient
Exploration vs. Exploitation
Model-Based RL
Explainability, Safety and Stability

Assignments:
Assignment 1: MDPs, Value Functions and DP
Assignment 2: Value-Based RL Methods
Assignment 3: Function Approximation. Here are some useful files that will help you with this assignment (to be renamed as a .zip file and extracted)
Assignment 4: Policy Gradient
