
Project

Spatial Representation in Models of Images and Text with Applications to Medical Document Indexing and Autonomous Driving

The project addresses the interaction between autonomous systems and their users in a way that lets visual and language information complement and reinforce one another, by learning representations that jointly capture the meaning of language and the visual world, and that allow visual situations to be translated into language and language into visuals. The approach relies on neural networks, specifically multimodal auto-encoders and generative adversarial networks trained on paired visual and textual datasets. The focus lies on learning and applying multimodal embeddings that generalize across tasks and account for both the objects and actions in visual scenes and the lexical content and grammatical structure of the corresponding language descriptions.
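As an illustration only (not the project's actual model), the sketch below shows one common way such a multimodal auto-encoder can be set up in PyTorch: an image branch and a text branch are encoded into a shared latent space, each modality is reconstructed from the other's code, and paired codes are pulled together. All layer sizes, the vocabulary size, and the loss terms are assumptions made for the example.

import torch
import torch.nn as nn


class MultimodalAutoEncoder(nn.Module):
    """Illustrative sketch of a joint image-text auto-encoder (hypothetical sizes)."""

    def __init__(self, vocab_size=1000, embed_dim=128, latent_dim=64, image_dim=2048):
        super().__init__()
        # Image branch: map a precomputed CNN feature vector into the shared latent space.
        self.image_encoder = nn.Sequential(
            nn.Linear(image_dim, 512), nn.ReLU(), nn.Linear(512, latent_dim)
        )
        self.image_decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(), nn.Linear(512, image_dim)
        )
        # Text branch: embed tokens, summarize the caption with a GRU, project to the same space.
        self.token_embedding = nn.Embedding(vocab_size, embed_dim)
        self.text_encoder = nn.GRU(embed_dim, latent_dim, batch_first=True)
        self.text_decoder = nn.Linear(latent_dim, vocab_size)

    def encode_image(self, image_features):
        return self.image_encoder(image_features)

    def encode_text(self, token_ids):
        embedded = self.token_embedding(token_ids)
        _, hidden = self.text_encoder(embedded)      # hidden: (1, batch, latent_dim)
        return hidden.squeeze(0)

    def forward(self, image_features, token_ids):
        z_image = self.encode_image(image_features)
        z_text = self.encode_text(token_ids)
        # Decode each modality from the other modality's code to enforce a shared space.
        image_recon = self.image_decoder(z_text)
        text_logits = self.text_decoder(z_image)     # bag-of-words style word prediction per image
        return z_image, z_text, image_recon, text_logits


if __name__ == "__main__":
    model = MultimodalAutoEncoder()
    images = torch.randn(4, 2048)                    # stand-in CNN features for 4 images
    captions = torch.randint(0, 1000, (4, 12))       # 4 captions of 12 token ids each
    z_img, z_txt, img_recon, txt_logits = model(images, captions)
    # Alignment loss: paired image/text codes should lie close together in the shared space.
    alignment = (z_img - z_txt).pow(2).mean()
    reconstruction = (img_recon - images).pow(2).mean()
    print(alignment.item(), reconstruction.item())

Such a shared embedding can then serve both directions described above: retrieving or generating text for a visual scene, and grounding language in visual content.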

Date: 1 Oct 2018 → 20 Dec 2022
Keywords: Deep Learning, Multimodal Learning, Computer Vision
Disciplines: Nanotechnology, Design theories and methods
Project type: PhD project