RT info:eu-repo/semantics/article
T1 An Information Theoretic Approach to Quantify the Stability of Feature Selection and Ranking Algorithms
A1 Alaiz Rodríguez, Rocío
A1 Parnell, Andrew C.
A2 Ingenieria de Sistemas y Automatica
K1 Ingeniería de sistemas
K1 Feature selection
K1 Feature ranking
K1 Stability
K1 Robustness
K1 Jensen-Shannon divergence
AB [EN] Feature selection is a key step when dealing with high-dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data in fields like biomedicine, bioinformatics, genetics or chemometrics by selecting the most relevant features out of the noisy, redundant and irrelevant features. A problem that arises in many of these applications is that the outcome of the feature selection algorithm is not stable: small variations in the data may yield very different feature rankings. Assessing the stability of these methods is therefore an important issue in these applications, but it has long been overlooked in the literature. We propose an information-theoretic approach based on the Jensen-Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, top-k lists (feature subsets), as well as the lesser-studied partial ranked lists that keep the k best ranked elements. This generalized metric quantifies the difference among a whole set of lists of the same size, following a probabilistic approach and being able to give more importance to the disagreements that appear at the top of the list. Moreover, it possesses desirable properties for a stability metric, including correction for chance, upper and lower bounds, and conditions for a deterministic selection. We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics, including Spearman’s rank correlation and Kuncheva’s index, on feature ranking and selection outcomes, respectively.
PB Elsevier
SN 0950-7051
LK https://hdl.handle.net/10612/17647
UL https://hdl.handle.net/10612/17647
NO Alaiz-Rodríguez, R., & Parnell, A. C. (2020). An information theoretic approach to quantify the stability of feature selection and ranking algorithms. Knowledge-Based Systems, 195. https://doi.org/10.1016/J.KNOSYS.2020.105745
DS BULERIA. Repositorio Institucional de la Universidad de León
RD 03-jun-2024