Publication

Flexible modeling of hierarchical, overdispersed, and multivariate biomedical data

Book - Dissertation

As in many biomedical studies, the investigator collects longitudinal, continuous, binary, ordinal, categorical, and survival outcomes, possibly with some observations missing. Ordinal data, however, have not received enough attention in the statistical literature, which is why they are the main focus of this PhD project. The proportional odds model (Agresti, 2002) can be regarded as an instance of a generalized linear model (Breslow and Clayton, 1993) where ordinal data are conveniently replaced by parsimonious dummies. Non-Gaussian outcomes are frequently modeled as members of the exponential family. A key feature of this family is its mean-variance relationship, i.e., the variance is a deterministic function of the mean. There are two reasons to extend this family: (1) the occurrence of overdispersion, when the variability in the data is not adequately described by the model, and (2) the need to accommodate the hierarchical structure of the data. Molenberghs et al. (2007, 2010) introduced the so-called combined model to address both issues simultaneously. In this PhD project, a model for overdispersed ordinal responses with repeated measures was formulated and fitted to data from an epidemiological study in diabetes patients and to data from a fluvoxamine clinical trial in psychiatric patients. As is very often the case, the outcomes collected in medical studies jointly describe a patient's condition and cannot be regarded as separate outcomes. Hence, they should be modeled jointly. When ordinal longitudinal data are part of a joint model, complexity increases even further. To address this, random-effects-based models were formulated in this thesis. In these models, the corresponding variance components can be employed to capture the association between the various sequences. In some cases, random effects are considered common to various sequences, perhaps up to a scaling factor; in others, there are different yet correlated random effects.
Using two case studies, it was shown that the combination of random effects can improve the model's fit considerably and allows one to answer research questions that could otherwise not be addressed. An additional problem in model fitting is the size of the collected data. Pseudo-likelihood methods such as (1) pairwise fitting (Fieuws and Verbeke, 2006), (2) partitioned samples (Molenberghs et al., 2011b), and (3) the method proposed in our work, i.e., pairwise fitting within partitioned samples, enable the joint modeling of large numbers of responses. Based on the outcomes of the diabetes study, it was shown that the pseudo-likelihood methodology yields highly efficient and fast inferences for large, high-dimensional datasets. To address the missing value problem in the fluvoxamine trial, the missingness process was incorporated in the joint modeling of two repeatedly measured ordinal responses. As an extension of the methodology, several missing not at random models were fitted to a set of observed data and shown to yield approximately the same results as their missing at random counterparts, although precision is affected. In addition, the effect of various identifying restrictions on multiple imputation was investigated. Further, an alternative approach was suggested for modeling data with missingness when the models involve complex likelihoods. For data that are missing at random, Molenberghs et al. (2011a) proposed a suite of corrections to the standard form of pseudo-likelihood, taking the form of singly and doubly robust estimators. The contribution of this PhD project was the detailed development of pairwise marginal pseudo-likelihood for incomplete repeated binary data and its application to data from an analgesic trial using the Bahadur model. All formulated models were implemented in SAS (versions 9.3 to 9.4) using PROC NLMIXED; PROC IML was used for the matrix calculations.
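
As a small illustration of the proportional odds model mentioned in the abstract, the sketch below computes category probabilities for a four-level ordinal outcome under the parameterization logit P(Y <= k) = theta_k - eta. It is not taken from the thesis, whose models were fitted in SAS: the cutpoints, the effect size `beta`, and the function names are hypothetical values chosen only to show that the cumulative odds ratio is the same at every cutpoint.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(cutpoints, eta):
    """Category probabilities under logit P(Y <= k) = cutpoints[k] - eta.

    `cutpoints` must be strictly increasing; K cutpoints give K + 1 categories.
    """
    cumulative = [expit(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        probs.append(value - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = value
    return probs


def cumulative_odds(cutpoints, eta):
    """Odds of Y <= k versus Y > k at each cutpoint."""
    return [math.exp(c - eta) for c in cutpoints]


# Hypothetical values, for illustration only.
cutpoints = [-1.0, 0.5, 2.0]
beta = 0.8  # assumed effect of a binary covariate on the linear predictor

probs_reference = category_probabilities(cutpoints, 0.0)
probs_exposed = category_probabilities(cutpoints, beta)

# The proportional odds property: one odds ratio shared by all cutpoints.
odds_ratios = [
    exposed / reference
    for exposed, reference in zip(
        cumulative_odds(cutpoints, beta), cumulative_odds(cutpoints, 0.0)
    )
]

print(probs_reference)
print(probs_exposed)
print(odds_ratios)
```

Each entry of `odds_ratios` equals exp(-beta), which is what makes the model parsimonious: a single parameter per covariate describes its effect on all cumulative logits.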
Publication year: 2018
Accessibility: Closed