Prediction and Integration for Natural Language Representation
Giving machines the ability to represent and understand natural language, so that they can apply it in the real world, is a great challenge in AI. Recent work has shown that it is possible to generate word representations that capture syntactic and semantic relationships and even the context in which words are used. Language Models (LMs) such as ELMo and BERT have excelled in a range of Natural Language Processing (NLP) tasks. However, these models have been shown to have weak language skills: recent work shows that they are sensitive to noise in the input, do not produce good sentence-level representations, and contain little commonsense knowledge. Modern NLP has advanced rapidly, building mostly on work from recent years within the field itself. However, it may benefit from ideas from other fields that try to explain the mechanisms of language acquisition and processing in humans. Contrary to how current NLP models work, a human being can abstract relevant information from a linguistic cue (e.g., a sentence) and thus anticipate the idea that might come next. A person can also use prior knowledge (memory) to better understand the incoming linguistic cue. Predictive Coding and Construction-Integration are two cognitive science theories that try to explain the mechanisms behind these capabilities. Predictive Coding (PC) theory holds that the human brain is a prediction machine that allows us to anticipate future events. In addition, the brain continuously minimizes the mismatch between its generated predictions and the incoming sensory signals, which makes it possible to obtain high-level abstract linguistic representations. This thesis proposes to model the mechanisms of this theory by developing a language model that includes an encoder, which generates latent representations of sentences and at the same time tries to predict the latent states of future sentences.
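As a rough illustration of the anticipation mechanism described above, the following minimal sketch encodes two consecutive sentences into latent vectors and trains a linear predictor to reduce the mismatch between the predicted and the actual latent state of the next sentence. Everything in it is an assumption made for illustration and not the architecture of this thesis: the toy averaging encoder, the names `encode`, `predict_next` and `train_step`, the latent size `DIM`, and the squared-error mismatch. The encoder is kept fixed here, whereas the proposed model would learn it jointly.

```python
import random

DIM = 4  # hypothetical latent size, chosen only for illustration

def word_vector(word, dim=DIM):
    """Deterministic pseudo-random vector per word (stand-in for learned embeddings)."""
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def encode(sentence):
    """Toy encoder: average the word vectors to get one latent vector per sentence."""
    vectors = [word_vector(w) for w in sentence.lower().split()]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def predict_next(W, z):
    """Linear predictor: guess the latent state of the next sentence from the current one."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in W]

def mismatch(pred, target):
    """Squared error between the predicted and the actual future latent state."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def train_step(W, z_now, z_next, lr=0.1):
    """One gradient-descent step on the mismatch with respect to the predictor weights."""
    pred = predict_next(W, z_now)
    errors = [p - t for p, t in zip(pred, z_next)]
    return [[w_ij - lr * 2.0 * e_i * z_j for w_ij, z_j in zip(row, z_now)]
            for row, e_i in zip(W, errors)]

sentences = ["the sky turned dark", "it started to rain"]
z_now, z_next = encode(sentences[0]), encode(sentences[1])

W = [[0.0] * DIM for _ in range(DIM)]  # predictor starts with no knowledge
loss_before = mismatch(predict_next(W, z_now), z_next)
for _ in range(50):
    W = train_step(W, z_now, z_next)
loss_after = mismatch(predict_next(W, z_now), z_next)

print(f"mismatch before: {loss_before:.4f}, after: {loss_after:.4f}")
```

The design choice worth noting is that the prediction target is a latent vector rather than the words of the next sentence, which is what distinguishes this kind of objective from ordinary next-token language modeling.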
The Construction-Integration (CI) theory, in turn, argues that the text comprehension process goes beyond the relationships among the explicitly mentioned information. In other words, there is an interaction and fusion between the linguistic signal presented and the general knowledge or experience of the subject. This thesis proposes to implement the mechanisms of this theory through a language model that uses attention mechanisms to draw on information from an external knowledge base (KB). Finally, it proposes to couple the two models into a single architecture that can take advantage of the properties of each theory. To achieve this, the CI-based model (knowledge integration) will underlie the PC model (anticipation mechanism). The resulting model is expected to learn high-quality representations that are useful for tasks requiring sophisticated linguistic abilities for an adequate understanding of text and speech at the sentence level.
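The knowledge-integration idea can be sketched in a similar spirit. The example below assumes, purely for illustration, that the sentence and each KB entry are already available as small vectors, and it uses scaled dot-product attention followed by a simple additive fusion. The names `attend` and `integrate`, the vector values, and the additive fusion step are hypothetical and are not taken from the thesis.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, kb_keys, kb_values):
    """Scaled dot-product attention of one sentence vector over KB entries."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in kb_keys]
    weights = softmax(scores)
    dim = len(kb_values[0])
    context = [sum(w * v[i] for w, v in zip(weights, kb_values)) for i in range(dim)]
    return weights, context

def integrate(query, context):
    """Fuse the linguistic signal with the retrieved knowledge (additive fusion, an assumption)."""
    return [q + c for q, c in zip(query, context)]

# Hypothetical vectors: one sentence latent and two KB entries (key and value per entry).
sentence_latent = [1.0, 0.0, 0.5]
kb_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
kb_values = [[0.2, 0.1, 0.0], [0.0, 0.9, 0.3]]

weights, context = attend(sentence_latent, kb_keys, kb_values)
enriched = integrate(sentence_latent, context)

print("attention weights:", [round(w, 3) for w in weights])
print("knowledge-enriched representation:", [round(x, 3) for x in enriched])
```

In a coupled architecture of the kind proposed, a representation such as `enriched` would be the input from which future sentence states are anticipated, so that prior knowledge informs the prediction.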