Project

End-to-end Training of Deep Neural Networks for Monocular Autonomous Navigation - A Feasibility Study

This PhD investigates the feasibility of applying Deep Learning methods to monocular autonomous navigation. Over the last seven years, Deep Learning has shown impressive results in computer vision by replacing hand-crafted algorithms with end-to-end learned models. Whether the same shift is also applicable to monocular navigation is still an open question.

This dissertation answers this question by first building an integrated software framework for imitation learning in simulation, where extra sensors can be deployed for supervision. In a second step, the design decisions involved in applying Deep Learning to monocular autonomous navigation are disentangled into three sets: neural network architecture, training algorithm and domain generalization solutions. The impact of each design decision on performance, training stability and sample efficiency is evaluated on a corridor-following task in simulation.

The first set covers the neural network architecture, where networks of varying depths, widths, and input and output domains are compared. To train recurrent neural networks, a more robust training procedure is proposed, called Windowwise-Backpropagation Through Time.

The second set of design decisions discusses the imitation learning strategy by:

- changing the data collection procedure with different policies,

- introducing a new online continual learning method for training without requiring stored datasets,

- introducing a new sample-efficient reinforcement learning method for self-supervised collision avoidance without requiring actual collisions.
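
To illustrate what training without a stored dataset can look like, the following is a minimal sketch of a streaming imitation update. It is not the method proposed in the thesis. The one-parameter linear policy, the `expert_action` function and all hyperparameters are assumptions made purely for illustration: each observation is used for a single gradient step and then discarded.

```python
import random


def expert_action(observation):
    # Hypothetical expert (assumption): steer proportionally against the offset.
    return -0.5 * observation


def online_imitation(num_steps=2000, learning_rate=0.05, seed=0):
    """Fit a one-parameter linear policy from a stream of expert-labelled samples."""
    rng = random.Random(seed)
    weight = 0.0  # policy: action = weight * observation
    for _ in range(num_steps):
        # A fresh observation arrives from the (simulated) environment.
        observation = rng.uniform(-1.0, 1.0)
        target = expert_action(observation)
        error = weight * observation - target
        # One SGD step on the squared error 0.5 * error**2.
        weight -= learning_rate * error * observation
        # The sample is not stored: nothing is appended to a dataset.
    return weight


learned_weight = online_imitation()
```

In this toy setting the learned weight approaches the expert gain of -0.5. A practical online learner would also have to deal with correlated samples and forgetting, which this sketch ignores.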

The third set of design decisions summarizes and evaluates four solutions for dealing with domain shift, which occurs when a neural network trained in one environment has to perform in a different one. Here, the shift of interest is between the simulated environment and the real world. The first solution uses domain-invariant intermediate representations, in this case depth estimates. The second solution is domain randomization: by training on a large variety of simulated environments, the model successfully generalizes to a new and very different environment such as the real world. The third solution adds auxiliary tasks, such as depth prediction, to counter environment-specific overfitting. The fourth solution transfers the style of the simulated data so that it looks similar to real-world data.
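
The second solution, domain randomization, can be sketched as sampling a new simulated environment configuration for every training episode. The snippet below is only an illustration under stated assumptions: the `CorridorConfig` fields, the texture names and the value ranges are hypothetical and are not taken from the thesis or its simulator.

```python
import random
from dataclasses import dataclass


@dataclass
class CorridorConfig:
    # All fields and ranges are hypothetical, chosen for illustration only.
    width_m: float
    light_intensity: float
    wall_texture: str
    num_obstacles: int


TEXTURES = ("brick", "plaster", "wood", "concrete")


def sample_corridor(rng):
    """Draw one randomized corridor configuration."""
    return CorridorConfig(
        width_m=rng.uniform(1.5, 3.0),
        light_intensity=rng.uniform(0.3, 1.0),
        wall_texture=rng.choice(TEXTURES),
        num_obstacles=rng.randint(0, 5),  # inclusive on both ends
    )


def sample_training_set(num_envs, seed=0):
    """Draw a reproducible set of randomized training environments."""
    rng = random.Random(seed)
    return [sample_corridor(rng) for _ in range(num_envs)]


training_envs = sample_training_set(50)
```

Each sampled configuration would then be handed to the simulator before an episode starts. The wider the sampled ranges, the less the policy can rely on any single appearance of the corridor.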

Comparing learning-based methods in robotics is difficult due to a lack of benchmarks. This thesis provides such a comparison by investigating the impact of many design decisions on performance, training stability and sample efficiency. Despite the preliminary nature of the experimental setup, namely corridor following, the opinions and conclusions regarding architecture, data sampling and generalization are crucial for taking the step to more complex autonomous navigation tasks.

 

Date: 1 Oct 2015 → 25 Oct 2019
Keywords: Computer Vision, Artificial Intelligence
Disciplines: Artificial intelligence, Cognitive science and intelligent systems
Project type: PhD project