Title
An Information Theoretic Approach to Quantify the Stability of Feature Selection and Ranking Algorithms
Author
Faculty/Center
Area of knowledge
Journal title
Knowledge-Based Systems
Citation
Alaiz-Rodríguez, R., & Parnell, A. C. (2020). An information theoretic approach to quantify the stability of feature selection and ranking algorithms. Knowledge-Based Systems, 195. https://doi.org/10.1016/J.KNOSYS.2020.105745
Publisher
Elsevier
Date
2020-03-06
ISSN
0950-7051
Abstract
Feature selection is a key step when dealing with high-dimensional data. In fields such as biomedicine, bioinformatics, genetics, and chemometrics, these techniques simplify knowledge discovery by separating the most relevant features from the noisy, redundant, and irrelevant ones. A problem that arises in many of these applications is that the outcome of the feature selection algorithm is not stable: small variations in the data may yield very different feature rankings. Assessing the stability of these methods is therefore an important issue in such settings, but it has long been overlooked in the literature. We propose an information-theoretic approach based on the Jensen-Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, top-k lists (feature subsets), and the lesser-studied partial ranked lists that keep the k best-ranked elements. This generalized metric quantifies the difference among a whole set of lists of the same size, following a probabilistic approach, and can give more importance to disagreements that appear at the top of the list. Moreover, it possesses desirable properties for a stability metric, including correction for chance, upper and lower bounds, and conditions for a deterministic selection. We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics, including Spearman's rank correlation and Kuncheva's index, on feature ranking and feature selection outcomes, respectively.
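The idea in the abstract, comparing a whole set of rankings through the Jensen-Shannon divergence while weighting the top of the list more heavily, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the 1/position weighting used to turn a ranking into a probability distribution is an assumption made here for illustration, and the sketch returns the raw divergence without the correction for chance or the normalization that the abstract mentions.

```python
import math


def rank_to_distribution(ranking):
    """Map a ranking (feature ids, best first) to a probability distribution.

    Assumption for illustration: the feature at position i (1-based) gets
    weight 1/i, so top-ranked features carry more probability mass.
    """
    weights = {feature: 1.0 / (pos + 1) for pos, feature in enumerate(ranking)}
    total = sum(weights.values())
    return {feature: w / total for feature, w in weights.items()}


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {item: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence(distributions):
    """Generalized Jensen-Shannon divergence with equal weights.

    Computed as the entropy of the average distribution minus the
    average of the individual entropies.
    """
    m = len(distributions)
    features = set().union(*distributions)
    mixture = {
        f: sum(d.get(f, 0.0) for d in distributions) / m for f in features
    }
    return entropy(mixture) - sum(entropy(d) for d in distributions) / m


def ranking_divergence(rankings):
    """Raw divergence among a set of rankings: 0 means all rankings agree."""
    return js_divergence([rank_to_distribution(r) for r in rankings])


if __name__ == "__main__":
    base = ["a", "b", "c", "d"]
    top_swap = ["b", "a", "c", "d"]      # disagreement at the top
    bottom_swap = ["a", "b", "d", "c"]   # disagreement at the bottom
    print(ranking_divergence([base, base]))
    print(ranking_divergence([base, top_swap]))
    print(ranking_divergence([base, bottom_swap]))
```

Under these assumptions, identical rankings give a divergence of zero, and swapping the two top features produces a larger divergence than swapping the two bottom ones, which mirrors the abstract's point about emphasizing disagreements at the top of the list.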
Subject
Keywords
Peer review
Yes
URI
DOI
Appears in collections
- Untitled [5590]
Files in this item
Size:
3.931 MB
Format:
Adobe PDF