dc.contributor | Escuela de Ingenierías Industrial, Informática y Aeroespacial | es_ES |
dc.contributor.author | Alaiz Rodríguez, Rocío | |
dc.contributor.author | Parnell, Andrew C. | |
dc.contributor.other | Ingeniería de Sistemas y Automática | es_ES |
dc.date | 2020-03-06 | |
dc.date.accessioned | 2024-01-17T13:04:47Z | |
dc.date.available | 2024-01-17T13:04:47Z | |
dc.identifier.citation | Alaiz-Rodríguez, R., & Parnell, A. C. (2020). An information theoretic approach to quantify the stability of feature selection and ranking algorithms. Knowledge-Based Systems, 195. https://doi.org/10.1016/J.KNOSYS.2020.105745 | es_ES |
dc.identifier.issn | 0950-7051 | |
dc.identifier.uri | https://hdl.handle.net/10612/17647 | |
dc.description.abstract | [EN] Feature selection is a key step when dealing with high-dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data in fields like biomedicine, bioinformatics, genetics or chemometrics by selecting the most relevant features out of the noisy, redundant and irrelevant features. A problem that arises in many of these applications is that the outcome of the feature selection algorithm is not stable. Thus, small variations in the data may yield very different feature rankings. Assessing the stability of these methods becomes an important issue in the previously mentioned situations, but it has been long overlooked in the literature. We propose an information-theoretic approach based on the Jensen-Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, top-k lists (feature subsets) as well as the lesser studied partial ranked lists that keep the k best ranked elements. This generalized metric quantifies the difference among a whole set of lists with the same size, following a probabilistic approach and being able to give more importance to the disagreements that appear at the top of the list. Moreover, it possesses desirable properties for a stability metric, including correction for chance, upper/lower bounds and conditions for a deterministic selection. We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics including Spearman's rank correlation and Kuncheva's index on feature ranking and selection outcomes respectively. | es_ES |
dc.language | eng | es_ES |
dc.publisher | Elsevier | es_ES |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 Internacional | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Ingeniería de sistemas | es_ES |
dc.subject.other | Feature selection | es_ES |
dc.subject.other | Feature ranking | es_ES |
dc.subject.other | Stability | es_ES |
dc.subject.other | Robustness | es_ES |
dc.subject.other | Jensen-Shannon divergence | es_ES |
dc.title | An Information Theoretic Approach to Quantify the Stability of Feature Selection and Ranking Algorithms | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.identifier.doi | 10.1016/j.knosys.2020.105745 | |
dc.description.peerreviewed | SI | es_ES |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es_ES |
dc.journal.title | Knowledge-Based Systems | es_ES |
dc.volume.number | 195 | es_ES |
dc.page.initial | 105745 | es_ES |
dc.type.hasVersion | info:eu-repo/semantics/submittedVersion | es_ES |
dc.description.project | This research has been funded with support from the European Commission under the 4NSEEK project with Grant Agreement 821966. Andrew Parnell's work was supported by a Science Foundation Ireland Career Development Award grant 17/CDA/4695 and an SFI centre, Ireland grant 12/RC/2289_P2. | |
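The abstract above describes quantifying the stability of feature rankings via the Jensen-Shannon divergence between probability distributions derived from ranked lists. The following is a minimal illustrative sketch of that idea, not the paper's exact metric: the `1/rank` weighting used to turn a ranking into a distribution is an assumption chosen for simplicity.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.
    With base-2 logs the result lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        # Kullback-Leibler divergence; 0 * log(0/x) is taken as 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_to_distribution(ranking):
    """Map a feature ranking (feature indices in rank order) to a
    probability distribution that weights top-ranked features more
    heavily. The 1/rank weighting is an illustrative assumption."""
    n = len(ranking)
    weights = [0.0] * n
    for position, feature in enumerate(ranking, start=1):
        weights[feature] = 1.0 / position
    total = sum(weights)
    return [w / total for w in weights]

# Two rankings of four features: identical vs. top-two swapped.
r1 = [0, 1, 2, 3]
r2 = [1, 0, 2, 3]
identical = js_divergence(rank_to_distribution(r1), rank_to_distribution(r1))
perturbed = js_divergence(rank_to_distribution(r1), rank_to_distribution(r2))
print(identical)  # 0.0: identical rankings show perfect stability
print(perturbed)  # > 0: divergence grows with disagreement at the top
```

Because the weighting concentrates mass on the highest ranks, a swap near the top of the list moves the divergence more than the same swap near the bottom, mirroring the top-weighted disagreement property the abstract highlights.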