Show simple item record

dc.contributor: Escuela de Ingenierias Industrial, Informática y Aeroespacial [es_ES]
dc.contributor.author: Cueto López, Nahúm
dc.contributor.author: García Ordás, María Teresa
dc.contributor.author: Dávila Batista, Verónica
dc.contributor.author: Aragonés, Nuria
dc.contributor.author: Alaiz Rodríguez, Rocío
dc.contributor.author: Moreno, Víctor
dc.contributor.other: Ingenieria de Sistemas y Automatica [es_ES]
dc.date: 2019-06-02
dc.date.accessioned: 2024-01-18T07:53:47Z
dc.date.available: 2024-01-18T07:53:47Z
dc.identifier.citation: Cueto-López, N., García-Ordás, M. T., Dávila-Batista, V., Moreno, V., Aragonés, N., & Alaiz-Rodríguez, R. (2019). A comparative study on feature selection for a risk prediction model for colorectal cancer. Computer Methods and Programs in Biomedicine, 177, 219-229. https://doi.org/10.1016/J.CMPB.2019.06.001 [es_ES]
dc.identifier.issn: 0169-2607
dc.identifier.uri: https://hdl.handle.net/10612/17654
dc.description.abstract: [EN] Background and objective: Risk prediction models aim at identifying people at higher risk of developing a target disease. Feature selection is particularly important to improve the prediction model performance avoiding overfitting and to identify the leading cancer risk (and protective) factors. Assessing the stability of feature selection/ranking algorithms becomes an important issue when the aim is to analyze the features with more prediction power. Methods: This work is focused on colorectal cancer, assessing several feature ranking algorithms in terms of performance for a set of risk prediction models (Neural Networks, Support Vector Machines (SVM), Logistic Regression, k-Nearest Neighbors and Boosted Trees). Additionally, their robustness is evaluated following a conventional approach with scalar stability metrics and a visual approach proposed in this work to study both similarity among feature ranking techniques as well as their individual stability. A comparative analysis is carried out between the most relevant features found out in this study and features provided by the experts according to the state-of-the-art knowledge. Results: The two best performance results in terms of Area Under the ROC Curve (AUC) are achieved with a SVM classifier using the top-41 features selected by the SVM wrapper approach (AUC=0.693) and Logistic Regression with the top-40 features selected by the Pearson (AUC=0.689). Experiments showed that performing feature selection contributes to classification performance with a 3.9% and 1.9% improvement in AUC for the SVM and Logistic Regression classifier, respectively, with respect to the results using the full feature set. The visual approach proposed in this work allows to see that the Neural Network-based wrapper ranking is the most unstable while the Random Forest is the most stable.
Conclusions: This study demonstrates that stability and model performance should be studied jointly as Random Forest turned out to be the most stable algorithm but outperformed by others in terms of model performance while SVM wrapper and the Pearson correlation coefficient are moderately stable while achieving good model performance. © 2019 Elsevier B.V. All rights reserved [es_ES]
dc.language: eng [es_ES]
dc.publisher: Elsevier [es_ES]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Mathematics [es_ES]
dc.subject: Medicine. Health [es_ES]
dc.subject.other: Colorectal cancer [es_ES]
dc.subject.other: Risk prediction model [es_ES]
dc.subject.other: Feature selection [es_ES]
dc.subject.other: Stability [es_ES]
dc.title: A Comparative Study on Feature Selection for a Risk Prediction Model for Colorectal Cancer [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.identifier.doi: 10.1016/J.CMPB.2019.06.001
dc.description.peerreviewed: Yes [es_ES]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es_ES]
dc.journal.title: Computer Methods and Programs in Biomedicine [es_ES]
dc.volume.number: 177 [es_ES]
dc.page.initial: 219 [es_ES]
dc.page.final: 229 [es_ES]
dc.type.hasVersion: info:eu-repo/semantics/submittedVersion [es_ES]
dc.subject.unesco: 2404 Biomathematics [es_ES]


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International