
Project

Sparse and robust estimation of vector autoregressive models with applications in marketing and economics.

Nowadays, a large amount of data is available in nearly every area of science and business. Information is typically collected in data sets where the variables are contained in the columns and the measurements on each variable are contained in the rows. Our interest mainly lies in settings where these measurements are collected over time. Such data sets are said to contain time series in their columns. A time series should be treated differently from a regular variable to account for the time-dependency of its measurements.

Moreover, given today's data abundance, our interest lies in high-dimensional data sets, as opposed to low-dimensional data sets. High-dimensional time series data sets contain many short time series: a large number of time series (columns) is available relative to the number of time points (rows); hence, these data sets are 'fat'. Low-dimensional time series data sets, in contrast, contain few long time series: a large number of time points (rows) is available relative to the number of time series (columns); hence, these data sets are 'thin'. High-dimensional time series data sets are commonplace in today's business practice, since many firms collect information on a large number of variables but discard data that are older than a few years.

The problem, however, is that traditional estimators are well suited for low-dimensional data sets, but not for high-dimensional data sets. On the one hand, these estimators suffer from very low estimation precision if the number of measurements (rows) is close to the number of variables (columns) in the data set. On the other hand, traditional estimators are not even computable if the number of measurements (rows) in the data set is smaller than the number of variables (columns). Hence, there is a need for new estimation methods especially designed for these high-dimensional data sets.
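The computability problem can be illustrated with a minimal sketch, taking ordinary least squares (OLS) as the representative traditional estimator (an assumption; the text does not name one). OLS requires inverting the p × p matrix X'X, whose rank can never exceed the number of rows n, so the inverse does not exist when there are fewer rows than columns. The helper names `gram`, `rank`, and `gram_rank` and the simulated Gaussian data are purely illustrative.

```python
import random

def gram(X):
    """Return X'X for a matrix X stored as a list of n rows with p entries each."""
    n, p = len(X), len(X[0])
    return [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
            for a in range(p)]

def rank(A, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]  # work on a copy
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            for j in range(c, cols):
                A[i][j] -= f * A[r][j]
        r += 1
    return r

def gram_rank(n, p, seed=0):
    """Rank of X'X for a simulated n-by-p data set (illustrative data only)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    return rank(gram(X))

thin_rank = gram_rank(n=50, p=5)   # 'thin': many rows, few columns
fat_rank = gram_rank(n=5, p=20)    # 'fat': few rows, many columns

print("thin data: rank of X'X =", thin_rank, "out of 5 columns")
print("fat data:  rank of X'X =", fat_rank, "out of 20 columns")
```

In the thin case X'X has full rank and can be inverted; in the fat case its rank is capped by the number of rows, so the OLS solution is not unique.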

In this thesis, we develop sparse estimation methods for high-dimensional data. Despite the data abundance, we do not expect each variable of these data sets to be equally informative. Sparse estimation methods rely on a simplicity assumption: we assume that only a relatively small number of variables in our data set play an important role. As such, sparse estimators retain the informative variables and remove the non-informative ones. This greatly facilitates interpretation.
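How a sparse estimator removes variables can be sketched with the lasso, used here only as one well-known example; the text above does not state which penalty the thesis uses. The lasso adds an L1 penalty to the least-squares criterion, and the soft-thresholding step below sets small coefficients exactly to zero. The function names, the penalty value `lam=0.5`, and the simulated data are assumptions made for illustration.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; any z within [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n))*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = y[:]  # residual y - Xb, which equals y while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # covariance of column j with the residual after adding b[j] back in
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

# Simulated 'fat' data: 20 rows, 40 columns, only the first 3 columns matter.
rng = random.Random(1)
n, p = 20, 40
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_b = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
y = [sum(X[i][j] * true_b[j] for j in range(p)) + rng.gauss(0, 0.1)
     for i in range(n)]

b_hat = lasso_cd(X, y, lam=0.5)
n_nonzero = sum(1 for v in b_hat if v != 0.0)
print("variables retained:", n_nonzero, "out of", p)
```

Unlike OLS, this estimator is computable even though there are fewer rows than columns, and a larger `lam` removes more variables.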

We develop sparse estimators for high-dimensional time series models in Chapters 1 to 4, and for Canonical Correlation Analysis (CCA) in Chapters 5 and 6. CCA is a multivariate statistical method that describes the associations between two data sets. Our interest lies in settings where both data sets are high-dimensional. Throughout the thesis, the usefulness and relevance of the sparse estimators are discussed for a wide variety of application domains, ranging from marketing (Chapter 1) and economics (Chapters 3 and 4) to biometrics (Chapters 2, 5, and 6).

Date: 1 Oct 2012 → 30 Sep 2016
Keywords: High-dimensional data, Forecasting, Sparse and robust estimation, Vector autoregressive model
Disciplines: Applied mathematics in specific fields, Statistics and numerical methods, Applied economics, Economic history, Macroeconomics and monetary economics, Microeconomics, Tourism
Project type: PhD project