
Project

Learning by Prediction and Integration: Human-inspired Approaches for Natural Language Understanding

Giving machines the ability to represent and understand natural language for real-world applications is a central challenge in Natural Language Processing. Pre-trained language models based on neural networks have recently achieved outstanding performance on several natural language understanding tasks. Although effective, these models lack certain abilities humans use to understand text. For example, as we read, we can anticipate what may come next or draw on prior knowledge to better understand a passage.

We hypothesize that current language models could benefit from mechanisms of human language processing. In this work, we investigate and propose different approaches to improve current language models, drawing inspiration from prediction and integration theories of human language comprehension. Our contributions show that pre-trained language models have limitations and that augmenting them with human-inspired mechanisms leads to improvements in natural language understanding across various tasks. We make six contributions, organized into the three parts described below.

First, we evaluate state-of-the-art pre-trained language models under stress conditions using competence, distraction, and noise tests. We show that these models are somewhat robust but still struggle with perturbed inputs, negations, and numerical reasoning. Furthermore, we evaluate the representations these models produce, showing that pre-trained Spanish models yield general-purpose representations of comparable quality to their English counterparts. However, we confirm their limited representational power at the sentence and discourse levels.
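To make the noise tests concrete, the following minimal Python sketch illustrates one common kind of perturbation test: swapping adjacent characters to simulate typos and measuring how often a classifier changes its prediction. The `predict` callable is a hypothetical placeholder, not the actual test suite used in this project.

```python
import random

def add_character_noise(sentence, swap_prob=0.1, seed=0):
    """Simulate typos by randomly swapping adjacent characters inside words."""
    rng = random.Random(seed)
    noisy_words = []
    for word in sentence.split():
        chars = list(word)
        i = 0
        while i < len(chars) - 1:
            if rng.random() < swap_prob:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                i += 2  # skip past the swapped pair
            else:
                i += 1
        noisy_words.append("".join(chars))
    return " ".join(noisy_words)

def prediction_change_rate(predict, sentences):
    """Fraction of sentences whose predicted label changes under the perturbation.

    `predict` is any callable mapping a sentence to a label (e.g. a wrapped
    pre-trained classifier); it is an assumed interface, not a real API.
    """
    changed = sum(predict(add_character_noise(s)) != predict(s) for s in sentences)
    return changed / len(sentences)
```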

Second, we explore memory population methods for pre-trained language models under the paradigm of lifelong learning with episodic memory. We show that randomly sampling from the global distribution works well enough to integrate previous knowledge and mitigate forgetting in the model, but also that some tasks benefit more from selective population methods. Moreover, we propose a method to deal with the stability-plasticity dilemma that arises in lifelong learning: we show that entropy can be used as a plasticity factor to decide how much each layer of a model should be modified for the current input, improving both performance and efficiency.
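A minimal sketch of the entropy-as-plasticity idea is shown below, under the assumption that the normalized entropy of a layer's output for the current input scales that layer's update; the names and the exact formulation used in the project may differ.

```python
import torch
import torch.nn.functional as F

def plasticity_factor(hidden, eps=1e-9):
    """Normalized entropy of a layer's output, used as a plasticity score in [0, 1].

    `hidden` has shape (batch, features); a higher score means the layer is less
    committed on this input and may therefore be modified more strongly.
    """
    probs = F.softmax(hidden, dim=-1)
    entropy = -(probs * (probs + eps).log()).sum(dim=-1)           # (batch,)
    max_entropy = torch.log(torch.tensor(float(hidden.size(-1))))  # entropy of a uniform distribution
    return (entropy / max_entropy).mean()

def entropy_gated_update(layer_outputs, loss, base_lr=1e-4):
    """One manual step where each layer's learning rate is scaled by its plasticity factor.

    `layer_outputs` is a list of (module, cached_output) pairs collected during
    the forward pass; both names are illustrative assumptions.
    """
    loss.backward()
    with torch.no_grad():
        for module, cached_output in layer_outputs:
            scale = plasticity_factor(cached_output)
            for param in module.parameters():
                if param.grad is not None:
                    param -= base_lr * scale * param.grad
                    param.grad.zero_()
```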

Third, we extend the architecture of pre-trained language models with insights from predictive coding theory. We demonstrate that introducing bottom-up and top-down computation that predicts future sentences in latent space improves sentence- and discourse-level representations. In addition, we propose a method that combines memory integration, memory rehearsal, and prediction to solve question answering over streaming data. Our approach uses cross-attention to integrate information into an external memory, supported by anticipation and rehearsal. We show the effectiveness of our model on both text-based and video-based sequences.
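The sketch below gives one plausible reading of cross-attention memory integration with a latent-space anticipation loss: a fixed set of memory slots attends to each incoming chunk of the stream, and the updated memory predicts a pooled representation of the next chunk. Module names, sizes, and pooling choices are illustrative assumptions, not the exact architecture from this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamingMemory(nn.Module):
    """External memory updated by cross-attention over a stream of input chunks."""

    def __init__(self, num_slots=32, dim=256, heads=4):
        super().__init__()
        self.init_memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.write_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.predictor = nn.Linear(dim, dim)  # anticipates the next chunk in latent space

    def initial_memory(self, batch_size):
        return self.init_memory.unsqueeze(0).expand(batch_size, -1, -1)

    def integrate(self, memory, chunk):
        """memory: (B, slots, dim); chunk: (B, tokens, dim) embeddings of the current chunk."""
        # Memory slots attend to the incoming chunk and absorb its content.
        update, _ = self.write_attn(query=memory, key=chunk, value=chunk)
        memory = memory + update
        # Anticipation: predict a pooled representation of the next chunk from memory.
        predicted_next = self.predictor(memory.mean(dim=1))
        return memory, predicted_next

def anticipation_loss(predicted_next, next_chunk):
    """Latent-space loss between the anticipated and actual next-chunk representation."""
    return F.mse_loss(predicted_next, next_chunk.mean(dim=1).detach())
```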

In summary, we present systematic evaluations that demonstrate the limitations of current pre-trained language models, together with several approaches that improve such models by following ideas from human language processing, showing that human inspiration remains a promising route for improving neural-network-based models. By incorporating human-inspired mechanisms, we strengthen or add abilities that language models currently lack and that are key to human-like language processing.

Date: 10 May 2021 → 12 Sep 2023
Keywords: Deep learning, Language model, Predictive coding, Integration, Memory
Disciplines: Natural language processing, Machine learning and decision making
Project type: PhD project