
Project

Mid-level visual factors to predict human aesthetic preferences for images: A deep-learning approach based on Bayesian neural networks and latent diffusion models

The main goal of this PhD thesis is to develop a deep-learning model to predict human aesthetic preferences for images. The starting point is a benchmark preference data set from an extensive online study with large samples of images of everyday scenes and paintings and large samples of observers, collected by a PhD student in psychology who also works on the larger project to which both PhDs belong.

In the first phase of the present PhD, we will develop a model based on Bayesian Neural Networks (BNNs), in which a deep-learning encoder (i.e., a convolutional neural network or CNN) is followed by a Neural Network Gaussian Process (NN-GP), so that the mean and the variance of aesthetic preferences can be predicted simultaneously (a minimal sketch of this idea is given below). In addition to the theoretical and practical advantages of this approach, we will use state-of-the-art segmentation models to analyze the composition of images, composition being one of the mid-level Gestalt factors known to be important for the aesthetics of images. Applying Fourier analysis to the segmentation maps allows us to characterize the composition in low- and high-frequency regimes (also sketched below).

In the second phase of the present PhD, the BNN model will be tested and further validated by developing an image synthesis model based on a diffusion model (DM) and a Generative Adversarial Network (GAN). More specifically, we will use a Latent Diffusion Model (LDM), in which the diffusion model is trained in the low-dimensional latent space of pretrained autoencoders instead of in the space of image pixels. In this way, we will extend GANalyze to DMs by integrating a pretrained LDM as the generator and the BNN aesthetic-preference model as the assessor. In addition, we can incorporate the segmentation map into the denoising step of the DM, allowing further control over the composition of synthesized images, which can then be tested empirically for their aesthetic value. As a last step, we can try to integrate the selection of image regions by eye movements to exert additional control over the saliency of the synthesized images.
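A minimal sketch of the phase-1 idea, not the project's actual model: a CNN encoder followed by a head that predicts both the mean and the variance of an aesthetic preference rating. The NN-GP is approximated here by a simple heteroscedastic Gaussian output layer; all layer sizes, names, and the rating scale are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PreferenceModel(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Small CNN encoder standing in for a pretrained backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Two heads: predicted mean and log-variance of the preference rating.
        self.mean_head = nn.Linear(feat_dim, 1)
        self.logvar_head = nn.Linear(feat_dim, 1)

    def forward(self, images: torch.Tensor):
        feats = self.encoder(images)
        return self.mean_head(feats), self.logvar_head(feats)

def gaussian_nll(mean, logvar, target):
    # Negative log-likelihood of ratings under the predicted Gaussian,
    # so the model is trained to match both mean and variance.
    return 0.5 * (logvar + (target - mean) ** 2 / logvar.exp()).mean()

# Toy usage: a batch of 4 RGB images and their observer ratings.
model = PreferenceModel()
images = torch.randn(4, 3, 224, 224)
ratings = torch.rand(4, 1) * 10          # e.g. ratings on a 0-10 scale
mean, logvar = model(images)
loss = gaussian_nll(mean, logvar, ratings)
loss.backward()
```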
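A minimal sketch of the composition analysis, assuming it works roughly as described: take a segmentation map, compute its 2D Fourier spectrum, and summarize the energy in low- versus high-frequency bands. The cutoff and the band summary are illustrative choices, not the project's actual parameters.

```python
import numpy as np

def composition_spectrum(seg_map: np.ndarray, cutoff: float = 0.1):
    """Split the power spectrum of a segmentation map into low/high bands.

    seg_map : 2D array, e.g. a binary figure/ground map or a label map.
    cutoff  : radial frequency (as a fraction of Nyquist) separating bands.
    """
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(seg_map))) ** 2
    h, w = seg_map.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]   # cycles per pixel
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2) / 0.5          # normalize by Nyquist
    low = spectrum[radius <= cutoff].sum()
    high = spectrum[radius > cutoff].sum()
    total = low + high
    return low / total, high / total                   # relative band energy

# Toy usage: a rectangular "object" segment on an empty background.
seg = np.zeros((128, 128))
seg[40:90, 30:100] = 1.0
low_energy, high_energy = composition_spectrum(seg)
print(f"low-frequency share: {low_energy:.2f}, high-frequency share: {high_energy:.2f}")
```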

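A minimal sketch of the GANalyze-style idea in phase 2: steer a latent code so that the assessor's predicted mean preference increases. The generator below is a dummy stand-in for a pretrained LDM decoder, the assessor is the PreferenceModel from the first sketch, and the step size and number of steps are illustrative assumptions.

```python
import torch

def steer_latent(generator, assessor, z, steps: int = 10, step_size: float = 0.05):
    """Gradient-ascent steering of a latent code toward higher predicted preference."""
    z = z.clone().detach().requires_grad_(True)
    for _ in range(steps):
        image = generator(z)                 # decode the latent to an image
        mean, _ = assessor(image)            # predicted mean aesthetic rating
        grad, = torch.autograd.grad(mean.sum(), z)
        with torch.no_grad():
            z = z + step_size * grad         # move the latent toward a higher score
        z.requires_grad_(True)
    return z.detach()

# Toy usage with a dummy "decoder" mapping a latent vector to a 224x224 image.
decoder = torch.nn.Sequential(
    torch.nn.Linear(64, 3 * 224 * 224),
    torch.nn.Unflatten(1, (3, 224, 224)),
)
assessor = PreferenceModel()                 # from the first sketch above
z0 = torch.randn(1, 64)
z_steered = steer_latent(decoder, assessor, z0)
```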
Date: 3 Apr 2023 → Today
Keywords: Aesthetics, Deep learning, Image characteristics, Bayesian Neural Networks, Segmentation, Diffusion
Disciplines: Sensory processes and perception, Cognitive processes, Knowledge representation and machine learning, Computer vision, Image processing
Project type: PhD project