
dc.contributor: Escuela de Ingenierías Industrial, Informática y Aeroespacial
dc.contributor.author: Alaiz Rodríguez, Rocío
dc.contributor.author: Parnell, Andrew C.
dc.contributor.other: Ingeniería de Sistemas y Automática
dc.date: 2020-03-06
dc.date.accessioned: 2024-01-17T13:04:47Z
dc.date.available: 2024-01-17T13:04:47Z
dc.identifier.citation: Alaiz-Rodríguez, R., & Parnell, A. C. (2020). An information theoretic approach to quantify the stability of feature selection and ranking algorithms. Knowledge-Based Systems, 195. https://doi.org/10.1016/J.KNOSYS.2020.105745
dc.identifier.issn: 0950-7051
dc.identifier.uri: https://hdl.handle.net/10612/17647
dc.description.abstract: [EN] Feature selection is a key step when dealing with high-dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data in fields like biomedicine, bioinformatics, genetics or chemometrics by selecting the most relevant features out of the noisy, redundant and irrelevant features. A problem that arises in many of these applications is that the outcome of the feature selection algorithm is not stable. Thus, small variations in the data may yield very different feature rankings. Assessing the stability of these methods becomes an important issue in the previously mentioned situations, but it has long been overlooked in the literature. We propose an information-theoretic approach based on the Jensen-Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, top-k lists (feature subsets) as well as the lesser-studied partial ranked lists that keep the k best ranked elements. This generalized metric quantifies the difference among a whole set of lists of the same size, following a probabilistic approach and being able to give more importance to the disagreements that appear at the top of the list. Moreover, it possesses desirable properties for a stability metric, including correction for chance, upper/lower bounds, and conditions for a deterministic selection. We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics, including Spearman's rank correlation and Kuncheva's index, on feature ranking and selection outcomes respectively.
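The abstract describes turning each ranked list into a probability distribution (weighting top positions more heavily) and measuring disagreement across the set of lists with a generalized Jensen-Shannon divergence. Below is a minimal sketch of that idea, assuming a simple 1/rank weighting; it is an illustration of the general technique, not the paper's exact formulation.

```python
import numpy as np

def rank_to_distribution(ranking, n_features):
    """Map a ranked list of feature indices to a probability distribution.

    Assumption: weight inversely proportional to rank position, so
    disagreements near the top of the list count more. This is one
    plausible weighting, not necessarily the paper's exact scheme.
    """
    weights = np.zeros(n_features)
    for position, feature in enumerate(ranking, start=1):
        weights[feature] = 1.0 / position
    return weights / weights.sum()

def entropy(p):
    """Shannon entropy in bits, skipping zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def generalized_jsd(rankings, n_features):
    """Generalized Jensen-Shannon divergence among a set of rankings:
    entropy of the mixture minus the mean of the individual entropies.
    Zero iff all rankings induce identical distributions (fully stable)."""
    dists = np.array([rank_to_distribution(r, n_features) for r in rankings])
    mixture = dists.mean(axis=0)
    return entropy(mixture) - np.mean([entropy(d) for d in dists])

# Identical rankings -> divergence ~0 (perfectly stable selector);
# a reversed ranking pushes the divergence above zero.
print(generalized_jsd([[0, 1, 2, 3]] * 3, 4))             # ~0
print(generalized_jsd([[0, 1, 2, 3], [3, 2, 1, 0]], 4))   # > 0
```

A lower value indicates a more stable feature ranking algorithm across data perturbations (e.g., bootstrap resamples).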
dc.language: eng
dc.publisher: Elsevier
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Ingeniería de sistemas
dc.subject.other: Feature selection
dc.subject.other: Feature ranking
dc.subject.other: Stability
dc.subject.other: Robustness
dc.subject.other: Jensen-Shannon divergence
dc.title: An Information Theoretic Approach to Quantify the Stability of Feature Selection and Ranking Algorithms
dc.type: info:eu-repo/semantics/article
dc.identifier.doi: 10.1016/j.knosys.2020.105745
dc.description.peerreviewed: SI
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.journal.title: Knowledge-Based Systems
dc.volume.number: 195
dc.page.initial: 105745
dc.type.hasVersion: info:eu-repo/semantics/submittedVersion
dc.description.project: This research has been funded with support from the European Commission under the 4NSEEK project with Grant Agreement 821966. Andrew Parnell's work was supported by a Science Foundation Ireland Career Development Award grant 17/CDA/4695 and an SFI centre, Ireland grant 12/RC/2289_P2.


