Genetic robust kernel sample selection for chemometric data analysis

Wiley Journal of Chemometrics
In this work, we propose a new algorithm that improves on existing techniques for regression analysis of spectroscopic data. It combines nonlinear kernel regressors (kernel ridge regression [KRR], kernel principal component regression [KPCR], and Gaussian process regression [GPR]) with an optimization based on the nondominated sorting multi-objective genetic algorithm (NSGA-II) to filter out residual outliers in the prediction space and leverage points in the feature space. Unlike most existing robust algorithms, the proposed algorithm optimizes several complementary objectives simultaneously, so that it adapts automatically and achieves better outlier detection. Because eliminating outliers is well known to greatly improve a regression model, the aim of this work is to develop a new robust regression algorithm on this basis. The algorithm was applied to five different datasets, and the results are compared with both classical nonlinear regression methods and the commonly used robust regression methods: robust continuum regression (RCR), partial robust M-regression (PRM), robust principal component regression (RPCR), robust PLSR (RSIMPLS), and locally weighted regression (LWR). The results show that the proposed algorithm outperforms the classical nonlinear regression methods and is a promising competitor to the robust methods, outperforming most of them. Although the results come from only five datasets, the algorithm can be considered an interesting contribution to improving data analysis in chemometrics.
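To make the idea of multi-objective sample selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a boolean mask over the calibration samples as the candidate solution, an RBF-kernel ridge regressor, and two illustrative objectives (fit error on the retained samples and the fraction of samples removed). The abstract does not state the actual objectives, so these, along with the names `rbf_kernel`, `krr_fit_predict`, `objectives`, and `nondominated`, are hypothetical. Only the nondominated (Pareto) filter at the core of NSGA-II is shown; the genetic operators are omitted.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def krr_fit_predict(X_train, y_train, X_test, lam=1e-2, gamma=1.0):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, then predict."""
    K = rbf_kernel(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_test, X_train, gamma) @ alpha


def objectives(mask, X, y):
    """Two objectives to minimize for one candidate mask (assumed, not from the paper):
    RMSE on the retained samples, and the fraction of samples removed."""
    Xs, ys = X[mask], y[mask]
    pred = krr_fit_predict(Xs, ys, Xs)
    rmse = float(np.sqrt(np.mean((ys - pred) ** 2)))
    removed = np.count_nonzero(~mask) / len(mask)
    return rmse, removed


def nondominated(F):
    """Indices of rows of F that no other row dominates (all objectives minimized)."""
    F = np.asarray(F, dtype=float)
    keep = []
    for i in range(len(F)):
        no_worse = np.all(F <= F[i], axis=1)
        strictly_better = np.any(F < F[i], axis=1)
        if not np.any(no_worse & strictly_better):
            keep.append(i)
    return keep


# Toy data: a smooth curve with one gross outlier at index 10.
X = np.linspace(0.0, 6.0, 30)[:, None]
y = np.sin(X[:, 0])
y[10] += 5.0

keep_all = np.ones(len(y), dtype=bool)
drop_outlier = keep_all.copy()
drop_outlier[10] = False

scores = [objectives(m, X, y) for m in (keep_all, drop_outlier)]
front = nondominated(scores)
```

In a full NSGA-II run, a population of such masks would be evolved by selection, crossover, and mutation, with the nondominated filter applied repeatedly to rank candidates. The second objective here simply discourages discarding too many samples; a leverage-based criterion in the kernel feature space would be a natural replacement.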