1. Deep Learning for Time Constrained Visual Classification & Object Detection

In many systems, both classification rate and time-to-decision determine the quality of a given solution. However, most published work largely ignores computational complexity and run time. (Where it is considered, the concern is usually limited to keeping the training and evaluation time of a deep network within the bounds of the researcher's available resources.) This overlooks the many possibilities for improving the quality of a learned classifier through novel data structures and through modified learning objectives or design choices (e.g. preferring those which promote sparsity, such as the use of Rectified Linear Units).
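As a rough illustration of why sparsity matters for run time, the sketch below pushes a random input through one fully connected layer with Rectified Linear Units and then evaluates a following dot product while skipping the zeroed activations. The layer sizes, the random weights, and the helper names `dense_relu` and `sparse_dot` are assumptions made for this example only; it is not a benchmark and not tied to any particular package.

```python
import random

def relu(x):
    """Rectified Linear Unit: keeps positive values, zeroes everything else."""
    return x if x > 0.0 else 0.0

def dense_relu(inputs, weights, biases):
    """One fully connected layer followed by ReLU; returns the activations."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def sparse_dot(activations, weights):
    """Dot product that skips zero activations and counts the multiplies done."""
    total, multiplies = 0.0, 0
    for a, w in zip(activations, weights):
        if a != 0.0:
            total += a * w
            multiplies += 1
    return total, multiplies

# Hypothetical sizes and random weights, chosen only for illustration.
rng = random.Random(0)
n_in, n_hidden = 16, 64
inputs = [rng.gauss(0, 1) for _ in range(n_in)]
weights = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
biases = [0.0] * n_hidden

hidden = dense_relu(inputs, weights, biases)
next_weights = [rng.gauss(0, 1) for _ in range(n_hidden)]

sparse_value, multiplies = sparse_dot(hidden, next_weights)
dense_value = sum(a * w for a, w in zip(hidden, next_weights))
print(f"{multiplies} of {n_hidden} multiplies needed after ReLU")
```

With zero-mean random weights, roughly half of the activations are expected to be zero, so the second stage does correspondingly less work while producing the same value as the dense computation.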

1.a Iterative Search and Learn

We are developing a method for “mining” large pools of video data for objects of interest and reusing the mined objects in the training of improved classifiers. The resulting method greatly reduces the amount of labelling work involved in training a classifier for basic object detection tasks.
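The general shape of such a mine-and-retrain loop can be sketched as below. This is a generic self-training loop under simplifying assumptions, not the method under development: features are single numbers, the classifier is a toy nearest-centroid model, and the names `train_centroids`, `predict`, `search_and_learn`, and the `min_margin` threshold are all hypothetical.

```python
def train_centroids(samples):
    """Fit a toy nearest-centroid classifier: mean feature value per label."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Return (label, margin); margin is the distance gap between the
    runner-up centroid and the nearest centroid."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin

def search_and_learn(labelled, pool, min_margin=1.0, rounds=3):
    """Alternate between mining confident detections from the unlabelled
    pool and retraining on the enlarged training set."""
    labelled, pool = list(labelled), list(pool)
    for _ in range(rounds):
        centroids = train_centroids(labelled)
        mined, remaining = [], []
        for x in pool:
            label, margin = predict(centroids, x)
            (mined if margin >= min_margin else remaining).append((x, label))
        if not mined:
            break  # nothing confident left to mine
        labelled.extend(mined)
        pool = [x for x, _ in remaining]
    return train_centroids(labelled), labelled, pool

# Two hand-labelled seeds and a small unlabelled pool (made-up numbers).
seed = [(0.0, "background"), (10.0, "object")]
unlabelled = [0.5, 1.0, 9.0, 9.5, 5.2, 4.0, 6.5]
classifier, training_set, leftover = search_and_learn(seed, unlabelled)
print(len(training_set), "training samples;", leftover, "left for manual labelling")
```

In this toy run only the ambiguous sample near the class boundary remains for a human to label, which is the labelling saving the approach aims for.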


1.b Static Landmark Detection Using NNs

Visual localization depends on strong visual landmarks in the road scene. These must be static (not moving objects like vehicles), uniquely identifiable (to avoid data association issues with identical landmarks within a small neighbourhood), stable (in terms of achieving invariance to viewpoint, seasonal, weather, and illumination changes), infrastructure-independent (they should not depend heavily on lane markings or other unreliable infrastructure assumptions), and ideally non-aliasing (since localization depends on achieving angular precision between the vehicle and the landmark).

Recent results have shown that deep neural networks may be able to identify landmarks that satisfy such constraints, leading to fast and robust visual localization for autonomous driving applications.

1.c Features in Deep Learning

We also observe that a commonly promoted advantage of deep learning methods is the ability to learn features rather than rely on hand-crafted ones. However, this ignores a major reason why many feature types were adopted in computer vision: a given feature can provide a computational optimization, often through cleverly engineered precomputed data structures (Haar wavelets, HOG, LiteHOG). Additionally, recent work by Ciresan et al. shows that classification performance can be improved by applying various hand-crafted transforms to the training data and the query data. This mimics the function of features in providing ‘X-invariant’ (e.g. illumination-invariant, noise-invariant) views of the data, and somewhat undermines the claim that deep learning will replace hand-crafted features.
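A well-known instance of a feature doubling as a computational optimization is the integral image (summed-area table) commonly paired with Haar-like features: once the table is built, the sum over any rectangle costs four lookups regardless of its size. The sketch below is a minimal pure-Python version; the function names and the tiny test image are assumptions for illustration.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column.
    table[r][c] holds the sum of all pixels in rows < r and columns < c."""
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table

def box_sum(table, top, left, height, width):
    """Sum of a rectangle using four table lookups, independent of its area."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])

def haar_two_rect_horizontal(table, top, left, height, width):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return (box_sum(table, top, left, height, half)
            - box_sum(table, top, left + half, height, half))

# A dark-to-bright vertical edge (made-up pixel values).
image = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [1, 1, 5, 5],
]
table = integral_image(image)
edge_response = haar_two_rect_horizontal(table, 0, 0, 3, 4)
print(edge_response)
```

The strongly negative response marks the edge between the dark left half and the bright right half, and evaluating it at every window position reuses the same precomputed table.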

Students working under these headings will:

  • Work heavily with open source deep learning packages such as Caffe or Theano as well as significant GPU and HPC resources for training neural networks.
  • Examine the limitations of so-called ‘deep networks’ in learning features with a particular reference to computational efficiency.
  • Target real world problems in road scene understanding and include the time constraints that are inherent in working within this problem domain.

2. Road Scene Visual Localization

Lane-level accurate vehicle localization is vital for autonomous driving applications. Vehicle routing depends on an accurate awareness of the vehicle’s position within the world; however, deficiencies in absolute positioning can often be made up for by local positional awareness. For example, a human driver doesn’t need to know their position relative to Times Square, NYC, to navigate a road in Pittsburgh. Rather, the driver needs only to know their relative position within the planned route. Recent work on Visual Teach and Repeat methods has shown the effectiveness of this approach in various challenging mobile robotics problems. We intend to investigate such approaches for vehicle navigation in conjunction with more traditional absolute localization techniques.
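To make "relative position within the planned route" concrete, the sketch below expresses a 2-D position as distance travelled along a polyline route plus a signed lateral offset from it. It is only a geometric illustration under simplifying assumptions (a planar route with no zero-length segments); the function name `locate_on_route` and the example coordinates are hypothetical, and it is not a Visual Teach and Repeat implementation.

```python
import math

def locate_on_route(route, position):
    """Express a 2-D position relative to a polyline route.

    Returns (distance_along_route, lateral_offset). The offset is the signed
    perpendicular distance to the line through the nearest segment, positive
    to the left of the direction of travel. Assumes no zero-length segments.
    """
    px, py = position
    best = None
    travelled = 0.0
    for (ax, ay), (bx, by) in zip(route, route[1:]):
        dx, dy = bx - ax, by - ay
        length = math.hypot(dx, dy)
        # Fraction along the segment of the closest point, clamped to [0, 1].
        t = ((px - ax) * dx + (py - ay) * dy) / (length * length)
        t = max(0.0, min(1.0, t))
        cx, cy = ax + t * dx, ay + t * dy
        distance = math.hypot(px - cx, py - cy)
        # 2-D cross product gives the side of the segment the position is on.
        lateral = (dx * (py - ay) - dy * (px - ax)) / length
        if best is None or distance < best[0]:
            best = (distance, travelled + t * length, lateral)
        travelled += length
    return best[1], best[2]

# An L-shaped route: 10 m east, then 10 m north (made-up coordinates).
route = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
along_a, offset_a = locate_on_route(route, (4.0, 1.5))
along_b, offset_b = locate_on_route(route, (11.0, 5.0))
print(along_a, offset_a, along_b, offset_b)
```

Neither result depends on where the route sits in a global frame, which is the sense in which local awareness can compensate for poor absolute positioning.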

Students working on this project will:

  • Work with open source localization and mapping tools such as GTSAM.
  • Produce novel approaches to combine absolute and relative localization and planning algorithms.
  • Work on real-world problems in mobile robot navigation and work to build these platforms.

3. Fast Object Detection in LIDAR Point Cloud Data

Past experience in object detection for images provides several valuable starting points for fast object search in point cloud data. While data structures such as octrees provide some basis for efficient object search in point cloud data, there is significant room for improvement.
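For reference, the octree baseline mentioned above can be sketched in a few lines: space is recursively split into eight sub-cubes, and a box query skips any sub-cube that does not overlap the search region. This is a minimal illustrative version, not PCL's implementation; the class layout, the `capacity` and `max_depth` parameters, and the sample points are assumptions, and points are assumed to lie inside the root cube.

```python
class Octree:
    """Minimal point octree over an axis-aligned cube (illustrative only)."""

    def __init__(self, center, half_size, capacity=4, max_depth=8):
        self.center, self.half_size = center, half_size
        self.capacity, self.max_depth = capacity, max_depth
        self.points = []
        self.children = None  # becomes a list of 8 sub-cubes after a split

    def _child_index(self, p):
        # One bit per axis: set when the point is on the upper side of the center.
        return sum((p[i] >= self.center[i]) << i for i in range(3))

    def insert(self, p):
        if self.children is None:
            self.points.append(p)
            if len(self.points) > self.capacity and self.max_depth > 0:
                self._split()
            return
        self.children[self._child_index(p)].insert(p)

    def _split(self):
        h = self.half_size / 2.0
        self.children = [
            Octree(tuple(self.center[i] + (h if (k >> i) & 1 else -h)
                         for i in range(3)),
                   h, self.capacity, self.max_depth - 1)
            for k in range(8)
        ]
        pending, self.points = self.points, []
        for p in pending:
            self.children[self._child_index(p)].insert(p)

    def query_box(self, lo, hi):
        """Return stored points p with lo[i] <= p[i] <= hi[i] on every axis."""
        # Prune: skip this cube entirely if it cannot overlap the query box.
        if any(self.center[i] + self.half_size < lo[i] or
               self.center[i] - self.half_size > hi[i] for i in range(3)):
            return []
        if self.children is None:
            return [p for p in self.points
                    if all(lo[i] <= p[i] <= hi[i] for i in range(3))]
        found = []
        for child in self.children:
            found.extend(child.query_box(lo, hi))
        return found

# Made-up points inside a 20 m cube centred on the origin.
cloud = [(x, y, z) for x in (-5, 5) for y in (-5, 5) for z in (-5, 5)]
cloud += [(1, 1, 1), (2, 2, 2)]
tree = Octree(center=(0.0, 0.0, 0.0), half_size=10.0)
for point in cloud:
    tree.insert(point)
hits = tree.query_box((0, 0, 0), (6, 6, 6))
print(sorted(hits))
```

The pruning step is where the speed-up comes from, and it is also where alternative representations would need to do better to improve on this baseline.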

Students working on this project will:

  • Work with open source point cloud processing libraries such as PCL.
  • Find new representations for point cloud data which facilitate fast object detection and localization.
  • Work on real-world applications in mobile robot scene understanding, especially autonomous vehicles.

Senior Expert – Autonomous Driving – Computer Vision – Machine Learning @ Xiaopeng Motors