Clustering of variables combined with variable selection using random forests: application to gene expression data

The main goal of this work is to tackle dimension reduction for high-dimensional supervised classification, motivated by gene expression data. The proposed method works in two steps.

First, redundancy is eliminated by clustering the variables with the R package ClustOfVar. This step uses only the explanatory variables (genes), not the class labels.

Second, the synthetic variables that summarize the clusters from the first step are used to build a classifier (e.g. logistic regression, LDA or random forests).

We stress that the first step reduces the dimension and yields synthetic variables that are linear combinations of the original variables. It can therefore be considered an alternative to PCA. Selecting predictors (synthetic variables) in the second step then yields a set of relevant original variables (genes).

The numerical performance of the procedure is evaluated on gene expression datasets. On these datasets, we compare our methodology with the LASSO and with sparse PLS discriminant analysis.
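The first step can be illustrated with a small Python sketch of a k-means-type clustering of variables. ClustOfVar itself is an R package offering both hierarchical (`hclustvar`) and k-means-type (`kmeansvar`) algorithms; which one the authors use is not stated here. This code is a simplified re-implementation, not the package's code.

It assumes the following:

- The variables are quantitative and standardized.
- Each cluster's synthetic variable is the first principal component of its members, so it is a linear combination of the original genes.
- Each gene is reassigned to the cluster whose synthetic variable it is most squared-correlated with.

The helper names, the farthest-first initialization and the toy data are hypothetical.

```python
import numpy as np


def _standardize(X):
    """Center each column and scale it to unit (population) variance."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (X - X.mean(axis=0)) / sd


def _sq_corr(Xs, Z):
    """p x k matrix of squared correlations r^2(x_j, z_c); Xs must be standardized."""
    Zs = _standardize(Z)
    return (Xs.T @ Zs / Xs.shape[0]) ** 2


def _first_pc(Xc):
    """Scores of the first principal component of the standardized columns Xc."""
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, 0] * s[0]


def _synthetic(Xs, labels):
    """Relabel clusters as 0..m-1 (dropping empty ones); one synthetic variable each."""
    labels = np.unique(labels, return_inverse=True)[1].ravel()
    Z = np.column_stack(
        [_first_pc(Xs[:, labels == c]) for c in range(labels.max() + 1)]
    )
    return labels, Z


def cluster_variables(X, k, max_iter=50):
    """k-means-type partitioning of the columns (genes) of X into at most k clusters.

    Returns (labels, Z): labels[j] is the cluster of variable j and Z[:, c] is
    the synthetic variable (first principal component) of cluster c.
    """
    Xs = _standardize(X)
    p = Xs.shape[1]
    if not 1 <= k <= p:
        raise ValueError("k must be between 1 and the number of variables")
    # Hypothetical farthest-first initialization: start from column 0, then
    # add the column least correlated with the centers chosen so far.
    centers = [0]
    while len(centers) < k:
        r2 = _sq_corr(Xs, Xs[:, centers]).max(axis=1)
        r2[centers] = np.inf
        centers.append(int(np.argmin(r2)))
    labels, Z = _synthetic(Xs, np.argmax(_sq_corr(Xs, Xs[:, centers]), axis=1))
    for _ in range(max_iter):
        # Move each variable to the synthetic variable it is most correlated with.
        new_labels = np.argmax(_sq_corr(Xs, Z), axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels, Z = _synthetic(Xs, new_labels)
    return labels, Z


def homogeneity(X, labels, Z):
    """Sum of r^2 between each variable and its own cluster's synthetic variable."""
    r2 = _sq_corr(_standardize(X), Z)
    return float(r2[np.arange(len(labels)), labels].sum())


def genes_in_clusters(labels, selected):
    """Map selected synthetic variables (cluster ids) back to original gene indices."""
    return np.flatnonzero(np.isin(labels, selected)).tolist()


# Toy data: genes 0-2 are noisy copies of latent factor a, genes 3-5 of factor b.
rng = np.random.default_rng(0)
n = 100
a, b = rng.normal(size=(2, n))
X = np.column_stack([f + 0.3 * rng.normal(size=n) for f in (a, a, a, b, b, b)])
labels, Z = cluster_variables(X, k=2)
print(labels)  # genes 0-2 and 3-5 should fall in different clusters
print(round(homogeneity(X, labels, Z), 3))
```

In the second step, the columns of `Z` would serve as predictors for any classifier. For instance, scikit-learn's `RandomForestClassifier` exposes `feature_importances_`, which could rank the synthetic variables. `genes_in_clusters` then maps the selected synthetic variables back to the original genes. The `homogeneity` function sums squared correlations within clusters. It mirrors, for quantitative variables, the kind of criterion such partitioning tries to maximize.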

Added by: Yannick Mahe (ymahe)
Contributor(s):
- Université Paris 1 Panthéon - Sorbonne (production)
- Robin Genuer & Vanessa Kuentz-Simonet (speakers)
Updated: 21 July 2017 00:00
Channel:
- UFR 02 - Ecole d'Economie de la Sorbonne (EES)
Type: Course / MOOC / SPOC
Main language: English
Discipline(s):
- Mathematics and computer science applied to the humanities and social sciences
