
Project

Visual factors to predict human aesthetic preferences for images: A deep-learning approach based on Fast Fourier Convolution and Vision Transformers

The main goal of this PhD thesis is to develop a deep-learning model that predicts human aesthetic preferences for images. The starting point is a benchmark preference data set from an extensive online study with large samples of both images (everyday scenes and paintings) and observers, collected by a PhD student in psychology who works on the same overarching project. The development of the model will comprise two main steps.

The first step, aimed at feature extraction, will combine handcrafted and deep features to achieve high performance. The proposed model will use Fast Fourier Convolution (FFC) and Vision Transformers (ViT) to extract meaningful deep features. Two strategies will be explored: (1) use FFC blocks to generate small image patches that are then fed to a ViT model to produce the final deep features (a minimal sketch of this pipeline is given below); (2) create deep features from FFC and ViT independently and then integrate them with the other features. The handcrafted features will be generated with traditional computer-vision algorithms. This will be done for typical low-level features such as edges, hue, saturation, entropy, and blurriness, computed for the entire image or for specific regions (also sketched below), as well as for typical high-level factors such as content, style, image category, and art period. (Another PhD student will develop computer-vision algorithms for typical mid-level factors such as symmetry, balance, composition, segmentability into different salient regions, and segmentability into fore- and background.)

The second step, aimed at aesthetic rating estimation, will fuse all available features into a single feature vector to predict average aesthetic preferences or aesthetic ratings. The proposed deep neural net will be compared against classical regression techniques applied after dimensionality reduction (sketched below). We will also try to optimize the model further by using eye-movement data, collected by the PhD student in psychology, to help the ViT attend to the most relevant regions in the image (a possible mechanism is sketched below). Once this universal model is trained and optimized, we will challenge it further by developing group-specific versions that take into account participant characteristics such as age, gender, familiarity with specific images, cultural background, educational level, and art interest and expertise.
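As an illustration of strategy (1), the following is a minimal PyTorch sketch, not the project's actual architecture: a simplified FFC-style block (a local convolution plus a single spectral branch that convolves in the Fourier domain) whose output is cut into patch tokens and passed to a small ViT-style transformer encoder. All module names, dimensions, and the reduced single-branch spectral transform are our own assumptions.

```python
import torch
import torch.nn as nn

class SpectralTransform(nn.Module):
    """Global branch of an FFC-style block: convolve in the Fourier domain."""
    def __init__(self, channels):
        super().__init__()
        # Real and imaginary parts are stacked along the channel axis.
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")          # complex, (b, c, h, w//2+1)
        freq = torch.cat([freq.real, freq.imag], dim=1)  # (b, 2c, h, w//2+1)
        freq = self.conv(freq)
        real, imag = freq.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

class FFCPatchEmbed(nn.Module):
    """Local conv branch + spectral branch, then non-overlapping patch tokens."""
    def __init__(self, in_ch=3, dim=192, patch=16):
        super().__init__()
        self.local = nn.Conv2d(in_ch, dim // 2, kernel_size=3, padding=1)
        self.pre = nn.Conv2d(in_ch, dim // 2, kernel_size=1)
        self.spectral = SpectralTransform(dim // 2)
        self.to_patches = nn.Conv2d(dim, dim, kernel_size=patch, stride=patch)

    def forward(self, x):
        feat = torch.cat([self.local(x), self.spectral(self.pre(x))], dim=1)
        patches = self.to_patches(feat)           # (b, dim, H/patch, W/patch)
        return patches.flatten(2).transpose(1, 2) # (b, n_patches, dim)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=192, nhead=4, batch_first=True),
    num_layers=4,
)

img = torch.randn(1, 3, 224, 224)
tokens = FFCPatchEmbed()(img)            # FFC-derived patch tokens
deep_features = encoder(tokens).mean(1)  # (1, 192) pooled deep feature vector
```

The full FFC design interleaves local and global branches with cross connections at every layer; the single pre-patching block above only conveys the idea of feeding spectrally informed patches to a ViT.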
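The low-level handcrafted descriptors named above can be illustrated with standard OpenCV/NumPy routines. The function below is a sketch under the assumption of whole-image statistics; the Canny thresholds, bin counts, and the variance-of-Laplacian blur proxy are illustrative choices, not the project's specification.

```python
import cv2
import numpy as np

def low_level_features(bgr: np.ndarray) -> dict:
    """Whole-image versions of edge density, hue, saturation, entropy, blur."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

    edges = cv2.Canny(gray, 100, 200)  # binary edge map, 0 or 255
    hist, _ = np.histogram(gray, bins=256, range=(0, 256), density=True)
    hist = hist[hist > 0]              # drop empty bins before taking logs

    return {
        "edge_density": float(edges.mean() / 255.0),  # fraction of edge pixels
        "mean_hue": float(hsv[..., 0].mean()),
        "mean_saturation": float(hsv[..., 1].mean()),
        "entropy": float(-(hist * np.log2(hist)).sum()),
        # Variance of the Laplacian: a common sharpness proxy
        # (low values suggest a blurry image).
        "blurriness": float(cv2.Laplacian(gray, cv2.CV_64F).var()),
    }

# Usage, e.g.: low_level_features(cv2.imread("painting.jpg"))
```

Region-specific variants would apply the same function to crops or segmentation masks rather than the full frame.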
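For the second step, here is a hedged sketch of the planned comparison: a fused feature vector regressed onto mean ratings with a small neural head versus a classical baseline that applies dimensionality reduction (PCA) before ridge regression. All array shapes, the synthetic data, and the 1-to-7 rating scale are placeholders.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
deep = rng.normal(size=(500, 192))       # e.g. pooled FFC/ViT features
handcrafted = rng.normal(size=(500, 5))  # e.g. the descriptors sketched above
ratings = rng.uniform(1, 7, size=500)    # placeholder mean aesthetic ratings

# Fuse all available features into a single vector per image.
X = np.concatenate([deep, handcrafted], axis=1).astype(np.float32)

# Classical baseline: dimensionality reduction, then linear regression.
pca = PCA(n_components=32)
baseline = Ridge(alpha=1.0).fit(pca.fit_transform(X), ratings)

# Deep head: fused feature vector -> predicted average rating.
head = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
optim = torch.optim.Adam(head.parameters(), lr=1e-3)
xt, yt = torch.from_numpy(X), torch.tensor(ratings, dtype=torch.float32)
for _ in range(100):
    optim.zero_grad()
    loss = nn.functional.mse_loss(head(xt).squeeze(1), yt)
    loss.backward()
    optim.step()
```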
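Finally, one simple way eye-movement data could guide the ViT (a hypothetical mechanism, not necessarily the one the project will adopt) is to pool a per-image fixation-density map down to the patch grid and re-weight the patch tokens so that fixated regions contribute more.

```python
import torch
import torch.nn.functional as F

tokens = torch.randn(1, 196, 192)          # (batch, 14*14 patches, dim)
fixation_map = torch.rand(1, 1, 224, 224)  # assumed fixation density per image

# Average the density over each 16x16 patch, then normalise across patches.
w = F.adaptive_avg_pool2d(fixation_map, (14, 14)).flatten(2).transpose(1, 2)
w = w / w.sum(dim=1, keepdim=True)         # (batch, 196, 1)
guided_tokens = tokens * (1.0 + w)         # soft emphasis on fixated patches
```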

Date: 3 Apr 2023 → Today
Keywords: Aesthetics, deep learning, image characteristics, visual perception, universal model, group models
Disciplines: Sensory processes and perception, Cognitive processes, Knowledge representation and machine learning, Computer vision, Image processing
Project type: PhD project