
Publication

Development of real-time, robust statistical methods for novel applications in food sorting

Book - Dissertation

In industrial food sorting, fast sensor-based technologies are used for automated food inspection. These sensors typically produce multivariate data that serve as input for classification algorithms, which detect commonly found defects among the regular material. Because huge amounts of product are scanned automatically, food inspection machines generate gigabytes of multivariate data in milliseconds, frequently pushing the boundaries of available computing power.

Outliers can dramatically degrade the predictive performance of traditional classifiers. Robust algorithms are therefore essential, since industrial datasets are typically corrupted by outliers in the form of label noise and measurement noise. However, none of the well-known high-breakdown methods can handle the sheer volume of data produced by these machines. This thesis addresses the problem by introducing new robust statistical procedures that are fast to compute and specifically designed for outlier detection and multiclass classification.

This doctoral thesis contains four chapters. The first chapter discusses the relation between the different outlier detection techniques.

The second chapter focuses on speeding up the deterministic minimum covariance determinant method (DetMCD), which detects outliers by fitting a robust covariance matrix. We construct a much faster version of DetMCD by replacing its initial estimators with two new methods and by incorporating update-based concentration steps. Parallel computing reduces the computation time further; this required the development of a novel robust aggregation method to combine the results from the individual threads.

In the third chapter, we integrate the real-time DetMCD method into quadratic discriminant analysis (QDA), a widely used classification technique. This allows us to solve classification problems with multiple classes.
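The robust fitting behind DetMCD, and its use inside QDA, can be illustrated with a minimal sketch. MCD-type algorithms refine an h-subset with concentration steps (C-steps): fit a mean and covariance on the subset, then keep the h observations with the smallest Mahalanobis distances to that fit. The Python/NumPy code below shows plain C-steps and plugs the resulting per-class estimates into a QDA rule with an extra outlier class. It is an illustration under stated assumptions, not the thesis's implementation: the start (the h points nearest the coordinatewise median) stands in for DetMCD's own initial estimators, each C-step recomputes the fit from scratch instead of using updates, there is no parallel aggregation, priors are class frequencies, and the outlier rule with its `cutoff` is an assumed choice. All function names are hypothetical.

```python
import numpy as np


def sq_mahalanobis(X, center, scatter):
    """Squared Mahalanobis distances of the rows of X w.r.t. (center, scatter)."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(scatter), diff)


def c_step(X, subset):
    """One plain C-step: refit on the subset, keep the h points closest to that fit."""
    h = len(subset)
    center = X[subset].mean(axis=0)
    scatter = np.cov(X[subset], rowvar=False)
    d2 = sq_mahalanobis(X, center, scatter)
    return np.sort(np.argsort(d2)[:h])


def robust_fit(X, h, max_iter=100):
    """Raw MCD-style center and scatter obtained by iterating C-steps."""
    # Assumed start: the h points nearest the coordinatewise median
    # (a stand-in for DetMCD's own initial estimators).
    med = np.median(X, axis=0)
    subset = np.sort(np.argsort(np.linalg.norm(X - med, axis=1))[:h])
    for _ in range(max_iter):
        new_subset = c_step(X, subset)
        if np.array_equal(new_subset, subset):  # converged: subset no longer changes
            break
        subset = new_subset
    return X[subset].mean(axis=0), np.cov(X[subset], rowvar=False)


def robust_qda_fit(X, y, alpha=0.75):
    """Per class: robust center, scatter, and prior (class frequency, an assumption)."""
    model = {}
    for label in np.unique(y):
        Xc = X[y == label]
        center, scatter = robust_fit(Xc, int(np.ceil(alpha * len(Xc))))
        model[label] = (center, scatter, len(Xc) / len(X))
    return model


def robust_qda_predict(model, Z, cutoff):
    """QDA assignment, plus an assumed rule sending far-away points to an outlier class."""
    labels = list(model)
    scores = np.empty((len(Z), len(labels)))
    dists = np.empty_like(scores)
    for j, label in enumerate(labels):
        center, scatter, prior = model[label]
        dists[:, j] = sq_mahalanobis(Z, center, scatter)
        _, logdet = np.linalg.slogdet(scatter)
        scores[:, j] = -0.5 * logdet - 0.5 * dists[:, j] + np.log(prior)
    best = scores.argmax(axis=1)
    pred = np.array(labels, dtype=object)[best]
    # Flag points whose distance to their assigned class exceeds the cutoff
    pred[dists[np.arange(len(Z)), best] > cutoff] = "outlier"
    return pred
```

Each C-step cannot increase the determinant of the subset covariance, which is why iterating until the subset stops changing is a natural stopping rule. Because the raw subset covariance is not rescaled for consistency, chi-square quantiles are not calibrated for these distances, so `cutoff` is treated as a tuning parameter in this sketch.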
Based on a training dataset, each class in the data is characterized by an estimate of its center and shape, which can then be used to assign unseen observations to one of the classes. We present a novel robust QDA method that additionally integrates an anomaly detection step to classify the most suspicious observations into a separate class of outliers. We also introduce the label bias plot, a graphical display to identify label and measurement noise in the training data.

Most outlier detection techniques, however, assume that the non-outlying observations are roughly elliptically distributed, whereas many datasets are not of that form. Moreover, their computation time increases substantially as the number of variables grows. In the fourth chapter we therefore propose the Kernel Minimum Regularized Covariance Determinant (KMRCD) estimator, which addresses both issues. It is not restricted to elliptical data because it implicitly computes robust covariances in a kernel-induced feature space. A fast algorithm is constructed that starts from kernel-based initial estimates and exploits the kernel trick to speed up the subsequent computations.
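KMRCD works with covariances in a kernel-induced feature space that is never formed explicitly. The sketch below, assuming a regularized feature-space scatter (1 - ρ)C + ρI with C the covariance of a given h-subset (normalized by h - 1), shows how the corresponding squared Mahalanobis distances can be obtained from kernel values alone: all inner products are centered around the subset mean in feature space, and the Woodbury identity reduces the required inverse to an h × h linear system. It covers only this distance computation; the subset search, the choice of ρ, and the kernel-based initial estimates of KMRCD are not reproduced. The names `rbf_kernel` and `kernel_reg_mahalanobis`, the RBF kernel, and the bandwidth `gamma` are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_reg_mahalanobis(K_hh, K_zh, k_zz, rho):
    """Squared regularized Mahalanobis distances computed from kernel values only.

    K_hh: (h, h) kernel matrix of the subset; K_zh: (m, h) kernel values between
    m query points and the subset; k_zz: (m,) values k(z, z); 0 < rho <= 1.
    Assumed feature-space scatter: (1 - rho) * C + rho * I, with C the subset
    covariance normalized by h - 1.
    """
    h = K_hh.shape[0]
    col_mean = K_hh.mean(axis=0)   # mean_j k(x_j, x_i)
    total_mean = K_hh.mean()       # squared norm of the feature-space subset mean
    # Center every inner product around the subset mean in feature space
    Kc = K_hh - col_mean[None, :] - col_mean[:, None] + total_mean
    z_mean = K_zh.mean(axis=1)
    Kzc = K_zh - z_mean[:, None] - col_mean[None, :] + total_mean
    kzz_c = k_zz - 2.0 * z_mean + total_mean
    # Woodbury identity: solve an (h, h) system instead of inverting the feature-space scatter
    a = (1.0 - rho) / (h - 1)
    M = rho * np.eye(h) + a * Kc
    quad = np.einsum("ij,ji->i", Kzc, np.linalg.solve(M, Kzc.T))
    return (kzz_c - a * quad) / rho
```

With a linear kernel, k(x, z) = xᵀz, the result coincides with the ordinary regularized Mahalanobis distance in the original coordinates, which makes a convenient sanity check; the same function applies unchanged to a nonlinear kernel such as `rbf_kernel`, where the feature space is never built.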
Publication year: 2020
Accessibility: Closed