Modelling neural correlates of visual attention using deep learning
From the moment we open our eyes, we receive an overwhelming amount of information about the world in front of us. Despite the complexity of this input, our brain makes sense of it effortlessly through the coordinated action of a large number of cortical areas and processes. Developing artificial systems that perceive the visual world as efficiently as humans do has always been one of the central quests of artificial intelligence. In this regard, significant progress has been made by applying convolutional neural networks (CNNs), a class of deep learning architectures inspired by the hierarchical organization of the human visual system, to a variety of visual tasks (e.g. object classification).

The human ventral visual stream is responsible for creating visual object representations for the purpose of recognition. However, given the vast amount of visual information arriving from the retina, the ventral stream cannot process all aspects of the visual input in the same manner. Selective attention is a key process for selecting the relevant aspects of the incoming input for preferential neural processing, thereby facilitating behavior (i.e., faster and more accurate recognition).

Attention has recently attracted the interest of the AI community, as it offers the possibility of making deep learning models more computationally and energy efficient. Nonetheless, the attentional mechanisms studied in the AI literature remain limited with respect to the rich array of attention processes operating in the human visual system. In this multidisciplinary project, we aim to improve computational models of attention by combining the models available in the AI literature with single-unit data obtained through invasive cortical recordings in neurosurgical patients.
Using multielectrode arrays implanted in the context of the presurgical evaluation of epilepsy, we will record the activity of populations of visual neurons during object recognition and attention tasks. CNN-based encoding models will be fitted to predict neural responses to the presented visual stimuli. Subsequently, the same models will be augmented with attentional mechanisms in order to predict neural activity across the different attention conditions tested (e.g. spatial or feature-based attention). Implementing more accurate computational models of attention has the potential not only to provide new insights into the organization of the human visual system, but also to improve artificial models of vision themselves.
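The modeling pipeline described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the "CNN features" are simulated random matrices standing in for activations of a pretrained network, the neural responses are synthetic, the mapping is fitted with closed-form ridge regression, and attention is modeled as a simple multiplicative gain on feature channels (a simplified feature-gain account with illustrative gain values).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for CNN features: in practice these would be
# activations from an intermediate layer of a pretrained network,
# one row per presented stimulus.
n_stimuli, n_features, n_neurons = 200, 50, 12
X = rng.standard_normal((n_stimuli, n_features))

# Hypothetical ground-truth mapping from features to firing rates,
# used here only to generate synthetic "recorded" responses.
W_true = rng.standard_normal((n_features, n_neurons))
Y = X @ W_true + 0.1 * rng.standard_normal((n_stimuli, n_neurons))

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X'X + alpha*I)^{-1} X'Y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(k), X.T @ Y)

# Fit the encoding model on the neutral (no-attention) condition.
W = fit_ridge(X, Y)

# Attention modeled as a multiplicative gain applied to a subset of
# feature channels; which channels and by how much are assumptions.
gain = np.ones(n_features)
attended = rng.choice(n_features, size=5, replace=False)
gain[attended] = 1.5

Y_pred_neutral = X @ W           # predicted responses, neutral condition
Y_pred_attend = (X * gain) @ W   # predicted responses under attention

# In-sample prediction quality of the fitted encoding model.
r = np.corrcoef(Y.ravel(), Y_pred_neutral.ravel())[0, 1]
```

In the actual project, the gain parameters would be fitted to the recorded responses in each attention condition rather than set by hand, and model quality would be assessed on held-out stimuli.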