Regularized PCA to denoise and visualize data

January 2, 2017
Duration: 00:51:05

Principal component analysis (PCA) is a well-established method commonly used to explore and visualize data. A classical PCA model is the fixed-effect model, in which the data are generated as a fixed low-rank structure corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as in ridge regression, we propose a regularized version of PCA that boils down to thresholding the singular values. Each singular value is multiplied by a term that can be seen as the ratio of the signal variance to the total variance of the associated dimension. The regularization term is derived analytically using asymptotic results and can also be justified by a Bayesian treatment of the model. Regularized PCA gives promising results, both in the recovery of the true signal and in the graphical outputs, compared with classical PCA and with a soft-thresholding estimation strategy. The method is illustrated through a simulation study and a real genetics dataset. We will also highlight the method's ability to properly handle missing values.
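
The shrinkage idea described above can be illustrated with a minimal NumPy sketch. It is not the estimator presented in the talk: the function name `regularized_pca`, the choice of noise estimate (the mean of the discarded squared singular values), and the exact form of the shrinkage factor are assumptions made here for illustration only. The sketch keeps the first `rank` dimensions and multiplies each retained singular value by an estimated ratio of signal variance to total variance.

```python
import numpy as np


def regularized_pca(X, rank):
    """Illustrative low-rank reconstruction with shrunk singular values.

    Hypothetical helper, not the talk's exact estimator: the noise level
    is assumed to be the mean of the squared singular values that fall
    outside the retained rank.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)

    # SVD of the column-centered data; d is sorted in decreasing order.
    U, d, Vt = np.linalg.svd(X - mean, full_matrices=False)
    if not 0 < rank < len(d):
        raise ValueError("rank must be between 1 and min(n, p) - 1")

    total = d[:rank] ** 2              # total variance of each kept dimension
    noise = np.mean(d[rank:] ** 2)     # assumed noise variance per dimension
    signal = np.maximum(total - noise, 0.0)

    # Shrinkage factor: estimated signal variance over total variance.
    shrink = np.divide(signal, total, out=np.zeros_like(total), where=total > 0)

    # Rebuild the data from the shrunk singular values, then add the mean back.
    X_hat = (U[:, :rank] * (d[:rank] * shrink)) @ Vt[:rank] + mean
    return X_hat, shrink
```

Calling `regularized_pca(X, rank=2)` on an `n x p` array returns the denoised reconstruction together with the per-dimension shrinkage factors. Each factor lies between 0 and 1, so dimensions dominated by noise are pulled toward zero more strongly than dimensions with strong signal, whereas classical PCA would leave the retained singular values unchanged.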
