Title: "Learning a visuomotor controller by planning trajectories in simulation"
Short abstract:
The idea of my thesis is to combine Reinforcement Learning (RL) with planning. I use the robotics simulator V-REP, OMPL for motion planning, and PyTorch for deep learning. A common problem is that deep learning approaches typically require a large amount of data to train a model; in particular, collecting training experience on a real robot is time consuming and requires human intervention. Although there have been advances in learning policies in simulation using RL, fundamental challenges remain, including sample inefficiency, slow convergence, and hyperparameter tuning. Instead of relying on RL alone, I propose learning the value function or policy directly by planning trajectories in simulation and generating training experience along these trajectories. A planner with full state feedback finds collision-free trajectories for the robotic hand from a random initial configuration to a target configuration. I then use the observations and actions along these trajectories to train a policy. Alternatively, I learn a Q-function, where the planner computes the value of a state (e.g., the number of steps needed to reach the target). Finally, I use RL to fine-tune the control policy.
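To make the supervised stage of this pipeline concrete, below is a minimal behavior-cloning sketch in PyTorch. It assumes a hypothetical `sample_planned_trajectory()` wrapper (not part of the thesis code) that stands in for V-REP plus OMPL and returns (observation, action) pairs along one planned trajectory; the network sizes, dimensions, and training loop are illustrative only.

```python
# Minimal sketch: behavior cloning from planner-generated trajectories.
# sample_planned_trajectory() is a hypothetical stand-in for a V-REP/OMPL
# rollout; here it fabricates random tensors purely to keep the sketch runnable.
import torch
import torch.nn as nn


def sample_planned_trajectory():
    """Return (observation, action) pairs along one collision-free trajectory.

    A real implementation would plan with OMPL from a random initial
    configuration to the target and record observations in V-REP.
    """
    T, obs_dim, act_dim = 20, 32, 7  # illustrative dimensions
    return [(torch.randn(obs_dim), torch.randn(act_dim)) for _ in range(T)]


# Small feed-forward policy mapping observations to actions.
policy = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 7))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for episode in range(100):
    pairs = sample_planned_trajectory()
    obs = torch.stack([o for o, _ in pairs])
    act = torch.stack([a for _, a in pairs])
    loss = loss_fn(policy(obs), act)  # imitate the planner's actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Q-function alternative (also a sketch): instead of imitating actions,
# regress a value network onto planner-derived targets, e.g.
# value(s) = -(number of remaining steps to the target along the plan),
# and fine-tune the resulting policy or Q-function with RL afterwards.
```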
Thesis proposal document:
Thesis proposal defense presentation:
Thesis proposal presentation (12-12-18)
Thesis committee members:
Robert Platt (advisor), homepage
Christopher Amato (CCIS, Northeastern), homepage
Hanumant Singh (ECE, Northeastern), homepage
Kate Saenko (Boston University), homepage
Justification of the choice of the committee by advisor Robert Platt:
Christopher Amato is an expert on POMDPs, ML, and applications to robotics. As such, he can ensure that our algorithmic contributions are strong.
Hanumant Singh is an expert on field robotics. He can ensure that our applications are strong.
Kate Saenko is an expert in computer vision. As such, she can ensure that the perceptual aspects of Uli's work are strong.