Publication
Hierarchical models for the analysis of spatial health surveys with missing information at individual and areal level
Book - Dissertation
Abstract: Statistics is used to draw conclusions about a population of interest based on a representative sample. Surveys are a common example: people sampled from the population answer questions or fill out a questionnaire. The distribution of certain characteristics (e.g. age, sex, socioeconomic status) may differ between the sample and the population, so a survey weight is assigned to every person in the sample to account for this difference. In addition, some respondents are unwilling or unable to answer certain questions, which leaves the survey data incomplete. Researchers must handle these missing data correctly to avoid biased estimates, and this requires an assumption about why someone did not respond to the question of interest. Following Rubin (1976), three possibilities are distinguished: (1) the missingness is completely random (missing completely at random, MCAR); (2) the missingness depends only on the observed measurements and not on the unobserved ones (missing at random, MAR); and (3) the missingness depends on both the observed and the unobserved measurements (missing not at random, MNAR). This thesis investigates models that can correctly analyse survey data with missing observations. It also accounts for the spatial context of the data. Estimates are provided at the level of “small areas” (e.g. districts, counties, provinces), and measurements from areas close to each other are assumed to be more alike than those from more distant areas. The goal of this thesis is to develop methodology that handles these three aspects simultaneously: the survey design, the missing data and the spatial structure.

In Chapter 2, the impact of missing data in health surveys on the estimation of area-specific prevalences was evaluated. The methods described by Mercer et al. (2014) and Vandendijck et al. (2016) served as a foundation. They range from the unweighted mean in the frequentist framework to the unit-specific spatial random effects model in the Bayesian framework. To account for missing observations in the analysis, a new missingness weight was defined, which can correct for distributional shifts caused by missing data. An extensive simulation study showed that this yielded unbiased prevalence estimates under the MCAR and MAR assumptions. As expected, however, the missingness weight could not fully account for the missing data under the MNAR assumption, because the missingness then depends on values that are not observed. In addition, a new weight smoothing model was defined that models the survey design and the missing data in a flexible, non-linear way. This model produced the best results when a strong spatial effect was present in the data. As an application, the proposed models were used to investigate the perceived health of respondents in the 43 administrative districts of Belgium, using the 2001 Belgian Health Interview Survey (HIS).

Chapter 3 further extended these weight smoothing models by adding covariate information, with the analysis carried out under the MAR assumption. The 2013 Florida Behavioral Risk Factor Surveillance System (BRFSS) survey was used as an example. The outcome of interest was the proportion of inhabitants without health insurance coverage in each of the 67 counties. The income of the inhabitants was incorporated in two ways: as a covariate in the weight smoothing model and through a subgroup analysis. Finally, the directly standardized rate was computed, which adjusts for risk factors and allows the results for different counties to be compared directly.

For economic or practical reasons, not every area may be included in the survey. This makes it more difficult to produce unbiased estimates for the areas missing from the sample.
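In its textbook form, the directly standardized rate mentioned for Chapter 3 is a weighted average of stratum-specific rates. The weights come from one common standard population, so areas with different population compositions can be compared on equal terms. The Python sketch below assumes the stratum-specific rates (for example, uninsured proportions per income group) have already been estimated for each county. In a survey setting these would be survey-weighted estimates. The function names, strata and numbers are hypothetical and are not taken from the thesis.

```python
from typing import Dict


def crude_rate(rates: Dict[str, float], counts: Dict[str, float]) -> float:
    """Overall rate of an area: stratum rates weighted by the area's own population."""
    total = sum(counts.values())
    return sum(rates[s] * n for s, n in counts.items()) / total


def direct_standardized_rate(rates: Dict[str, float], standard: Dict[str, float]) -> float:
    """Directly standardized rate: stratum rates weighted by a common standard population."""
    total = sum(standard.values())
    if total <= 0:
        raise ValueError("standard population must have a positive total")
    # A KeyError here means the area has no rate for one of the standard strata.
    return sum(rates[s] * n for s, n in standard.items()) / total


# Hypothetical stratum-specific rates (e.g. proportion uninsured per income group).
rates_a = {"low": 0.30, "middle": 0.15, "high": 0.05}
rates_b = {"low": 0.28, "middle": 0.14, "high": 0.04}

# Hypothetical population counts per income group in each county.
counts_a = {"low": 100, "middle": 300, "high": 600}
counts_b = {"low": 600, "middle": 300, "high": 100}

# Hypothetical standard population shared by all counties.
standard = {"low": 300, "middle": 500, "high": 200}

for name, rates, counts in [("A", rates_a, counts_a), ("B", rates_b, counts_b)]:
    print(name, round(crude_rate(rates, counts), 3),
          round(direct_standardized_rate(rates, standard), 3))
```

County B has the higher crude rate (0.214 vs 0.105) only because it has many more low-income residents. After both counties are standardized to the same population, county A's rate is the higher one (0.175 vs 0.162). Removing this kind of composition effect is the purpose of direct standardization.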
In Chapter 4, methods were introduced to cope with the lack of information in these unsampled areas, again using the methods of Mercer et al. (2014) as a foundation. A simulation study showed that the results remained stable as long as about 75% of the intended areas were included in the survey. With a strong spatial effect in the data, the results remained stable even when more areas were missing. Next, a new methodology was demonstrated that improves the estimates for non-sampled areas using census data on certain population characteristics. This method had no effect on the results for the sampled areas, but the results for the non-sampled areas improved greatly, provided that the support for these areas was strong enough. Lastly, the methodology was applied to the 2008 Mozambique Poverty and Social Impact Analysis (PSIA) survey to investigate the proportion of school attendance in the 125 districts.

Finally, in Chapter 5, the performance of several multivariate methods for modelling two outcome variables was compared. Since the two outcomes can be correlated, this correlation should be included when constructing the model. Of the four spatial multivariate models considered, the correlated random effects models produced the best results. This highlights the importance of including the correlation structure between the two outcomes in the analysis. The comparison was illustrated using the 2013 Florida BRFSS survey, in which the prevalences of asthma and COPD were jointly estimated.
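The abstract does not specify how the census-based methodology for non-sampled areas in Chapter 4 works. One simple, well-known way to use census composition is synthetic (post-stratified) estimation, which takes two steps:

1. Estimate subgroup prevalences from the sampled areas.
2. Average those prevalences for an unsampled area, weighting them by that area's census subgroup counts.

The Python sketch below illustrates only this idea, as an assumption, and is not the thesis's method. It ignores the spatial random effects and hierarchical structure of the thesis's models. The subgroups, records and census counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One respondent: (area, subgroup, survey weight, outcome: 1 = attends school, 0 = not).
Record = Tuple[str, str, float, int]


def subgroup_prevalences(records: List[Record]) -> Dict[str, float]:
    """Survey-weighted prevalence per subgroup, pooled over all sampled areas."""
    weighted_yes: Dict[str, float] = defaultdict(float)
    weight_total: Dict[str, float] = defaultdict(float)
    for _area, group, weight, outcome in records:
        weighted_yes[group] += weight * outcome
        weight_total[group] += weight
    return {g: weighted_yes[g] / weight_total[g] for g in weight_total}


def synthetic_prevalence(prevalences: Dict[str, float],
                         census_counts: Dict[str, int]) -> float:
    """Prevalence for one area: subgroup prevalences weighted by its census counts."""
    total = sum(census_counts.values())
    # A KeyError here means a census subgroup was never observed in the sample.
    return sum(prevalences[g] * n for g, n in census_counts.items()) / total


# Hypothetical survey records from two sampled districts, D1 and D2.
sample: List[Record] = [
    ("D1", "urban", 2.0, 1), ("D1", "urban", 1.0, 0), ("D1", "rural", 1.0, 1),
    ("D2", "rural", 3.0, 0), ("D2", "rural", 1.0, 1), ("D2", "urban", 1.0, 1),
]
prev = subgroup_prevalences(sample)

# Hypothetical census counts for district D3, which was not sampled.
census_d3 = {"urban": 200, "rural": 800}
estimate_d3 = synthetic_prevalence(prev, census_d3)
print(prev, round(estimate_d3, 3))
```

With these made-up records, the urban prevalence is 0.75 and the rural prevalence is 0.4. The unsampled district D3 is 80% rural in its census and therefore receives 0.47. Such an estimate is only as reliable as the assumption that subgroup prevalences do not vary between areas. Spatial models can relax this assumption by letting neighbouring areas share information.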
Number of pages: 211
Year of publication: 2020
Accessibility: Open
Review status: Peer review