Ongoing Projects

Multi-Agent Model-Based Planning with Diffusion Policy

Towards Scalable and Reliable One-shot Skeleton-based Action Recognition

Publications and preprints

Model-Based Planning with Stochastic Trajectory Prediction Models for Urban Driving

In submission.
Adam Villaflor, Brian Yang, Huangyuan Su, John Dolan, Jeff Schneider (Supervisor)
Significant progress has been made in training multimodal trajectory forecasting models for autonomous driving. However, effectively integrating these models with downstream planners and model-based control approaches remains an open problem. Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop models without retraining. We consider recent trajectory prediction approaches that leverage learned anchor embeddings to predict multiple trajectories, and find that these anchor embeddings can parameterize discrete, locally consistent modes representing high-level driving behaviors. We propose to perform closed-loop planning over these discrete latent modes, allowing us to tractably model the causal interactions between agents at each step.
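The sketch below is a rough illustration of the general idea of planning over discrete latent modes with an autoregressive closed-loop rollout; it is not the paper's implementation. The PredictionModel stub, its predict_step interface, the distance-to-goal cost, and all parameter values are assumptions made for the demo.

```python
# Illustrative sketch: enumerate the ego agent's discrete latent modes and score
# each by an autoregressive closed-loop rollout of a one-step prediction model.
import numpy as np

class PredictionModel:
    """Stand-in for a learned multimodal predictor whose anchor embeddings
    index a small number of discrete, locally consistent driving modes."""
    def __init__(self, num_modes: int = 3, seed: int = 0):
        self.num_modes = num_modes
        self.rng = np.random.default_rng(seed)

    def predict_step(self, states: np.ndarray, modes: np.ndarray) -> np.ndarray:
        # One autoregressive step: next (x, y) for every agent, conditioned on
        # each agent's current position and discrete mode. Purely synthetic here.
        headings = 2 * np.pi * modes / self.num_modes
        step = np.stack([np.cos(headings), np.sin(headings)], axis=-1)
        return states + step + 0.05 * self.rng.normal(size=states.shape)

def plan_over_modes(model, init_states, horizon=10, goal=np.array([10.0, 0.0])):
    """Score every ego mode by closed-loop rollout; other agents keep fixed modes."""
    n_agents = init_states.shape[0]
    other_modes = np.zeros(n_agents, dtype=int)      # kept fixed for brevity
    best_mode, best_cost = None, np.inf
    for ego_mode in range(model.num_modes):
        modes = other_modes.copy()
        modes[0] = ego_mode                          # agent 0 is the ego vehicle
        states = init_states.copy()
        for _ in range(horizon):                     # autoregressive closed-loop rollout
            states = model.predict_step(states, modes)
        cost = np.linalg.norm(states[0] - goal)      # ego distance-to-goal cost
        if cost < best_cost:
            best_mode, best_cost = ego_mode, cost
    return best_mode, best_cost

if __name__ == "__main__":
    model = PredictionModel(num_modes=3)
    init = np.zeros((4, 2))                          # 4 agents starting at the origin
    print(plan_over_modes(model, init))
```

In the actual setting the mode enumeration would cover joint agent interactions rather than the ego agent alone; the single-agent loop above only conveys the rollout-and-score structure.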

Representation Overlap in Deep Reinforcement Learning

CVPR 2023 Paper: link

Qiang He, Huangyuan Su, Jieyu Zhang, Yu Liu, Xinwen Hou

Talk-to-Diffusion: A Language-Controllable Diffusion Policy for Autonomous Driving

Brian Yang, Nikolaos Gkanatsios, Huangyuan Su, Ayush Jain, Jeff Schneider, Katerina Fragkiadaki

Reinventing Policy Iteration under Time Inconsistency

TMLR 2022 Paper: link Code: link
Huangyuan Su, Nixie Lesmana, Pun Chi Seng (Supervisor)


Policy iteration (PI) is a fundamental policy search algorithm in the standard RL setting, which can be shown to converge to an optimal policy by policy improvement theorems. However, the standard PI relies on Bellman’s Principle of Optimality, which might be violated by time-inconsistent (TIC) objectives, such as non-exponentially discounted reward functions. The use of standard PI under TIC objectives has thus been marked with questions regarding the convergence of its policy improvement scheme and the optimality of its termination policy, often leading to its avoidance. In this paper, we consider an infinite-horizon TIC RL setting and formally present an alternative type of optimality drawn from game theory, i.e., subgame perfect equilibrium (SPE), that attempts to resolve the aforementioned questions. We first analyze standard PI under the SPE type of optimality, revealing its merits and insufficiencies. Drawing on these observations, we propose backward Q-learning (bwdQ), a new algorithm in the approximate PI family that targets SPE policy under non-exponentially discounted reward functions. Finally, with two TIC gridworld environments, we demonstrate the implications of our theoretical findings on the behavior of bwdQ and other approximate PI variants.
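As a rough illustration of the backward-induction flavor of SPE policy search (not the paper's bwdQ, which operates in the infinite-horizon approximate-PI setting), the sketch below computes per-step policies for a small finite-horizon tabular MDP under a hyperbolic (non-exponential) discount, where each time-t self best responds to the already-fixed policies of its future selves. The random MDP, the hyperbolic discount, and all parameter values are assumptions for the demo.

```python
# Illustrative sketch: backward-induction SPE policies in a finite-horizon
# tabular MDP with a non-exponential discount d(k) = 1 / (1 + beta * k).
import numpy as np

def hyperbolic_discount(k, beta=1.0):
    return 1.0 / (1.0 + beta * k)

def backward_spe_policies(P, R, horizon, beta=1.0):
    """P: (S, A, S) transition probabilities, R: (S, A) rewards.
    Returns one deterministic policy per time step (an SPE by backward induction)."""
    S, A, _ = P.shape
    policies = [None] * horizon
    for t in reversed(range(horizon)):
        Q = R.copy()                                 # reward collected at step t (weight d(0) = 1)
        dist = P.copy()                              # Pr(s_{t+k} | s_t, a_t), currently k = 1
        for k in range(1, horizon - t):
            pi = policies[t + k]                     # fixed policy of the future self at t + k
            r_next = R[np.arange(S), pi]             # reward that future self will collect
            Q += hyperbolic_discount(k, beta) * dist @ r_next
            P_pi = P[np.arange(S), pi]               # (S, S) transition matrix under pi
            dist = dist @ P_pi                       # propagate state distribution one step
        policies[t] = np.argmax(Q, axis=1)           # time-t self best responds
    return policies

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, T = 4, 2, 5
    P = rng.dirichlet(np.ones(S), size=(S, A))       # random transition kernel
    R = rng.uniform(size=(S, A))                     # random rewards
    for t, pi in enumerate(backward_spe_policies(P, R, T)):
        print(f"step {t}: policy {pi}")
```

Because the discount is non-exponential, the per-step policies generally differ across time even for the same state, which is exactly the time inconsistency that makes a single stationary "optimal" policy ill-defined and motivates the SPE notion used in the paper.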