dc.contributor | Escuela de Ingenierías Industrial, Informática y Aeroespacial | es_ES |
dc.contributor.author | Alaiz Rodríguez, Rocío | |
dc.contributor.author | Parnell, Andrew C. | |
dc.contributor.other | Ingeniería de Sistemas y Automática | es_ES |
dc.date | 2020-03-06 | |
dc.date.accessioned | 2024-01-17T13:04:47Z | |
dc.date.available | 2024-01-17T13:04:47Z | |
dc.identifier.citation | Alaiz-Rodríguez, R., & Parnell, A. C. (2020). An information theoretic approach to quantify the stability of feature selection and ranking algorithms. Knowledge-Based Systems, 195. https://doi.org/10.1016/J.KNOSYS.2020.105745 | es_ES |
dc.identifier.issn | 0950-7051 | |
dc.identifier.uri | https://hdl.handle.net/10612/17647 | |
dc.description.abstract | [EN] Feature selection is a key step when dealing with high-dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data in fields like biomedicine, bioinformatics, genetics or chemometrics by selecting the most relevant features out of the noisy, redundant and irrelevant features. A problem that arises in many of these applications is that the outcome of the feature selection algorithm is not stable. Thus, small variations in the data may yield very different feature rankings. Assessing the stability of these methods becomes an important issue in the previously mentioned situations, but it has been long overlooked in the literature. We propose an information-theoretic approach based on the Jensen-Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, top-k lists (feature subsets) as well as the lesser studied partial ranked lists that keep the k best ranked elements. This generalized metric quantifies the difference among a whole set of lists with the same size, following a probabilistic approach and being able to give more importance to the disagreements that appear at the top of the list. Moreover, it possesses desirable properties for a stability metric, including correction for chance, upper/lower bounds and conditions for a deterministic selection. We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics including Spearman's rank correlation and Kuncheva's index on feature ranking and selection outcomes respectively. | es_ES |
dc.language | eng | es_ES |
dc.publisher | Elsevier | es_ES |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 Internacional | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Ingeniería de sistemas | es_ES |
dc.subject.other | Feature selection | es_ES |
dc.subject.other | Feature ranking | es_ES |
dc.subject.other | Stability | es_ES |
dc.subject.other | Robustness | es_ES |
dc.subject.other | Jensen-Shannon divergence | es_ES |
dc.title | An Information Theoretic Approach to Quantify the Stability of Feature Selection and Ranking Algorithms | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.identifier.doi | 10.1016/j.knosys.2020.105745 | |
dc.description.peerreviewed | SI | es_ES |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es_ES |
dc.journal.title | Knowledge-Based Systems | es_ES |
dc.volume.number | 195 | es_ES |
dc.page.initial | 105745 | es_ES |
dc.type.hasVersion | info:eu-repo/semantics/submittedVersion | es_ES |
dc.description.project | This research has been funded with support from the European Commission under the 4NSEEK project with Grant Agreement 821966. Andrew Parnell's work was supported by a Science Foundation Ireland Career Development Award grant 17/CDA/4695 and an SFI centre, Ireland grant 12/RC/2289_P2. | |
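The abstract above describes quantifying the stability of feature rankings via the Jensen-Shannon divergence between probability distributions derived from ranked lists. The following is a minimal illustrative sketch of that idea, not the paper's exact metric: the `1/rank` weighting used to turn a ranking into a distribution is an assumption chosen for simplicity.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.
    With base-2 logs the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        # Kullback-Leibler divergence; 0 * log(0/x) is taken as 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_to_distribution(ranking):
    """Map a feature ranking (feature indices in rank order) to a
    probability distribution that weights top-ranked features more
    heavily. The 1/rank weighting is an illustrative assumption."""
    n = len(ranking)
    weights = [0.0] * n
    for position, feature in enumerate(ranking, start=1):
        weights[feature] = 1.0 / position
    total = sum(weights)
    return [w / total for w in weights]

# Two rankings of four features: identical vs. top-two swapped.
r1 = [0, 1, 2, 3]
r2 = [1, 0, 2, 3]
identical = js_divergence(rank_to_distribution(r1), rank_to_distribution(r1))
perturbed = js_divergence(rank_to_distribution(r1), rank_to_distribution(r2))
print(identical)  # 0.0: identical rankings show perfect stability
print(perturbed)  # > 0: divergence grows with disagreement at the top
```

Because the weighting concentrates mass on the highest ranks, a swap near the top of the list moves the divergence more than the same swap near the bottom, mirroring the top-weighted disagreement property the abstract highlights.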