Course Description

The problem of decision-making is ubiquitous and has therefore been studied in different domains, often with the intent to devise formal procedures that yield the best decision. Reinforcement Learning (RL) is one such technique, which can be seen both as a machine learning approach and as an optimal control approach. RL is a sampling-based method to solve Markov Decision Processes (MDPs), i.e., problems in which an agent receives a reward as a consequence of the current situation (state) and the selected decision (action). The problem is non-trivial because the current action influences the future states and, in turn, the future rewards. RL has proven to be a very successful technique, managing, e.g., to beat Chess and Go masters (both humans and algorithms).
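As a rough illustration of the setting described above, the sketch below runs tabular Q-learning, one common sampling-based RL method, on a made-up five-state chain. The environment, the function names (step, q_learning) and the parameter values are assumptions chosen for this example only; they are not taken from the course material.

```python
import random

# Hypothetical toy MDP: a chain of states 0..4, where state 4 is terminal.
N_STATES = 5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GAMMA = 0.9        # discount factor (assumed value)
ALPHA = 0.5        # learning rate (assumed value)


def step(state, action):
    """Deterministic toy dynamics: return (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only when the goal is reached
    return next_state, reward, done


def q_learning(episodes=500, seed=0):
    """Estimate action values from sampled transitions only."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Uniformly random behaviour policy, so every action gets explored.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # The target looks one step ahead: the chosen action matters
            # through the state it leads to, not only the immediate reward.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action in each non-terminal state, read off the learned values.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

In this toy chain only the final transition is rewarded, so moving right from state 0 is preferable purely because of the future reward it leads to, which is the coupling between actions, future states and future rewards mentioned above.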

The aim of this course is to discuss the fundamentals of RL, thereby giving students a sound understanding of how the problem is formulated and solved using state-of-the-art algorithms. Some more advanced topics will also be discussed in order to give interested students a more complete picture. Finally, the last part of the course will discuss how safety and stability guarantees can be introduced in RL by means of well-established techniques in control.

Course Schedule

The course is open to both IMT and external students and is held in person only. It runs over 5 days, 20-24 March 2023, with lectures and supervised assignments during the following hours:

  • 11:00 – 13:00
  • 14:00 – 16:00
  • 16:00 – 18:00

Course Material

Please do not print the slides! They are designed with various animations and the paper waste would be quite significant.

Slides (might be subject to small changes):
Introduction (handout)
MDP (handout)
Prediction (handout)
Control (handout)
Function Approximation
Policy Gradient
Exploration vs. Exploitation
Model-Based RL
Explainability, Safety and Stability

Assignment 1: MDPs, Value Functions and DP
Assignment 2: Value-Based RL Methods
Assignment 3: Function Approximation. Here are some useful files that will help you with this assignment (to be renamed as a .zip file and extracted)
Assignment 4: Policy Gradient