Dr. Abhinav Verma
Assistant Professor, Departments of Electrical Engineering and Computer Science, Penn State University
Time: 12:30 pm – 1:30 pm
Date: Friday, January 31st
Location: BC 220
Abstract: Specifications in linear temporal logic (LTL) offer a simple way of specifying tasks for policy optimization that may otherwise be difficult to describe with scalar reward functions. However, standard Reinforcement Learning (RL) frameworks can be too myopic to find maximally satisfying policies. In this talk, we will discuss eventual discounting, a value-function-based proxy under which one can find policies that satisfy a specification with the highest achievable probability. To improve the efficiency of learning from specifications, we combine eventual discounting with LTL-guided Counterfactual Experience Replay, a method for generating off-policy data from on-policy rollouts via counterfactual reasoning. Finally, we will discuss a mechanism for exploiting the compositionality of an LTL specification to provide formal guarantees on the behavior of learned policies for reach-avoid tasks.
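To make the proxy concrete, below is a minimal sketch of an eventual-discounting return as suggested by the abstract: assuming the LTL formula has been compiled into an automaton with known accepting states, the agent earns reward 1 on each visit to an accepting state, and the discount advances only on those visits. The names here (automaton_states, is_accepting) are illustrative, not the speaker's implementation.

def eventual_discounted_return(automaton_states, is_accepting, gamma=0.99):
    """Return of one rollout under eventual discounting (sketch).

    Reward is 1 whenever the automaton component of the product state
    is accepting, 0 otherwise; gamma is applied only on those accepting
    visits (discount 1 elsewhere), so long but necessary stretches
    between accepting visits are not penalized.
    """
    g, discount = 0.0, 1.0
    for q in automaton_states:
        if is_accepting(q):
            g += discount       # reward 1 on an accepting visit
            discount *= gamma   # discount advances only here
        # non-accepting steps neither reward nor shrink the discount
    return g

Under this return, a rollout that reaches accepting states at, say, steps 3 and 50 earns 1 + gamma regardless of the gap between visits, whereas a standard per-step discount would heavily penalize the intermediate steps; this is one sense in which standard RL objectives can be too myopic for LTL tasks.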
Bio: Dr. Verma is an Assistant Professor in the Department of Computer Science and Engineering at The Pennsylvania State University. Previously, he was a postdoc at the Institute of Science and Technology (IST) Austria in the Henzinger Group. Before joining IST, he completed his PhD at the University of Texas at Austin, advised by Prof. Swarat Chaudhuri. His research lies at the intersection of machine learning and formal methods, with a focus on building intelligent systems that are reliable, transparent, and secure. This work builds connections between the symbolic reasoning and inductive learning paradigms of artificial intelligence.