
Publication

Components Matter: Considering Compositionality for Visual Representations

Book - Dissertation

In this thesis, we investigate the compositionality of single images and of image sets for visual representations, with the goal of exploring the benefits of taking this property into account. In realistic images, compositionality can be observed in the multiple different features that compose them. Image sets are compositional in the same way: a set is composed of multiple images, each consisting of multiple different features. In the first two parts of the thesis, we focus on the single-image scenario. We first propose a method to identify the component representations that are important to the prediction of a pre-trained model for a given input image. Leveraging a representation visualization method, we also generate visual explanations that highlight the important regions of the input image. From a complementary direction, in the next chapter we actively encode, separately, two types of features present in the inputs. Assuming that images are composed of style and shape features, we disentangle the two and then combine the disentangled style and shape representations from two different images to synthesize a novel image whose appearance (style) is the same as that of the original image while its shape is different. This achieves unpaired shape translation (changing shape while preserving appearance). We then turn to the scenario where sets of images are considered. First, we explore a set composed of images containing the same object in order to localize the object accurately. Based on the observation that object-specific representations should be very similar across different images from the same class, we design a regularization to adjust the Class Activation Mapping based localization map. Second, we use a set of high-resolution face images as exemplars to help a model hallucinate high-resolution images. We believe more exemplars bring more useful visual information.
To optimally extract the information from a set of exemplars, we design a module to find and combine the most useful component representations from the set. Finally, we tackle the general multiple instance learning problem, where the model learns and predicts based on unordered sets of elements. We propose to iteratively learn set-level representations via LSTMs. Although LSTMs are not often used for this purpose, we show that they are capable of modeling unordered sets thanks to their memory. The resulting performance is competitive with, and even surpasses, methods tailored to multiple instance learning problems. We also show that LSTMs can indirectly capture instance-level information using only set-level annotations.
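As a rough illustration of the last idea (reading a set one element at a time and using the LSTM's memory as the set-level representation), the following minimal sketch may help. It is not the thesis's implementation: the class `TinyLSTMCell`, the function `encode_set`, the layer sizes, and the random untrained weights are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """A standard LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input gate, f = forget gate, o = output gate, g = candidate.
        self.W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(hidden_size)]
                  for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        """Consume one instance x and update the hidden state h and memory c."""
        z = list(x) + list(h)
        i = [sigmoid(v) for v in self._affine("i", z)]
        f = [sigmoid(v) for v in self._affine("f", z)]
        o = [sigmoid(v) for v in self._affine("o", z)]
        g = [math.tanh(v) for v in self._affine("g", z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def encode_set(cell, instances):
    """Feed the instances one by one.

    Returns the final hidden state (used here as the set-level
    representation) and the list of hidden states after each instance.
    """
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    states = []
    for x in instances:
        h, c = cell.step(x, h, c)
        states.append(h)
    return h, states


if __name__ == "__main__":
    # A hypothetical bag of three 4-dimensional instance features.
    bag = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.5], [0.0, 0.0, 0.0, 0.0]]
    cell = TinyLSTMCell(input_size=4, hidden_size=5)
    set_repr, per_step = encode_set(cell, bag)
    print(len(set_repr), len(per_step))
```

In a multiple instance learning setup, a classifier on top of the final state would be trained with set-level labels only, and the per-step states are one place where instance-level information could be inspected. The instances are fed in an arbitrary order here; how ordering is handled during training is not specified in this abstract.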
Year of publication: 2021
Accessibility: Open