On the Statistical Complexity of Reinforcement Learning
Mengdi Wang, Princeton University
Recent years have witnessed increasing empirical successes in reinforcement learning (RL). However, many theoretical questions about RL were not well understood. For example, how many observations are needed and sufficient for learning a good policy? What is the regret of online learning with function approximation in a Markov decision process (MDP)? From logged history generated by unknown behavior policies, how do we optimally estimate the value of a new policy? In this talk, I will review some recent results addressing these questions, such as the sample complexities for solving MDP from a generative model, minimax-optimal off-policy evaluation by regression, and regret analysis of model-based RL.
Mengdi Wang is an associate professor at the Department of Electrical Engineering and Center for Statistics and Machine Learning at Princeton University. She is also affiliated with the Department of Operations Research and Financial Engineering and Department of Computer Science. Her research focuses on data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF Career Award in 2017, the Google Faculty Award in 2017, and the MIT Tech Review 35-Under-35 Innovation Award (China region) in 2018. She serves as an associate editor for Operations Research and Mathematics of Operations Research.