
dc.contributor: Escuela de Ingenierias Industrial, Informática y Aeroespacial [es_ES]
dc.contributor.author: Sanchez Paniagua, Manuel
dc.contributor.author: Fidalgo Fernández, Eduardo
dc.contributor.author: Alegre Gutiérrez, Enrique
dc.contributor.author: Alaiz Rodríguez, Rocío
dc.contributor.other: Ingenieria de Sistemas y Automatica [es_ES]
dc.date: 2022-06-30
dc.date.accessioned: 2024-01-11T13:04:13Z
dc.date.available: 2024-01-11T13:04:13Z
dc.identifier.citation: Sánchez-Paniagua, M., Fidalgo, E., Alegre, E., & Alaiz-Rodríguez, R. (2022). Phishing websites detection using a novel multipurpose dataset and web technologies features. Expert Systems with Applications, 207. https://doi.org/10.1016/J.ESWA.2022.118010 [es_ES]
dc.identifier.issn: 0957-4174
dc.identifier.uri: https://hdl.handle.net/10612/17582
dc.description.abstract: [EN] Phishing attacks are one of the most challenging social engineering cyberattacks due to the large number of entities involved in online transactions and services. In these attacks, criminals deceive users to hijack their credentials or sensitive data through a login form which replicates the original website and submits the data to a malicious server. Many anti-phishing techniques have been developed in recent years, using different resources such as the URL and HTML code from legitimate index websites and phishing ones. These techniques have some limitations when predicting legitimate login websites, since, usually, no login forms are present in the legitimate class used for training the proposed model. Hence, in this work we present a methodology for phishing website detection in real scenarios, which uses URL, HTML, and web technology features. Since no up-to-date multipurpose dataset exists for this task, we crafted the Phishing Index Login Websites Dataset (PILWD), an offline phishing dataset composed of 134,000 verified samples, which offers researchers a wide variety of data to test and compare their approaches. Since approximately three-quarters of the collected phishing samples request the introduction of credentials, we decided to crawl legitimate login websites to match the phishing standpoint. The developed approach is independent of third-party services, and the method relies on a new set of features used for the very first time in this problem, some of them extracted from the web technologies used by each specific website. Experimental results show that phishing websites can be detected with 97.95% accuracy using a LightGBM classifier and the complete set of 54 selected features, when evaluated on the PILWD dataset. [es_ES]
dc.language: eng [es_ES]
dc.publisher: Elsevier [es_ES]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 Internacional [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Informática [es_ES]
dc.subject.other: Phishing detection [es_ES]
dc.subject.other: Phishing dataset [es_ES]
dc.subject.other: Web technologies [es_ES]
dc.subject.other: Machine learning [es_ES]
dc.subject.other: Login [es_ES]
dc.title: Phishing websites detection using a novel multipurpose dataset and web technologies features [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.identifier.doi: 10.1016/j.eswa.2022.118010
dc.description.peerreviewed: SI [es_ES]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es_ES]
dc.journal.title: Expert Systems with Applications [es_ES]
dc.volume.number: 207 [es_ES]
dc.page.initial: 118010 [es_ES]
dc.type.hasVersion: info:eu-repo/semantics/publishedVersion [es_ES]
dc.subject.unesco: 1203.17 Informática [es_ES]
dc.subject.unesco: 1 [es_ES]
dc.description.project: INCIBE [es_ES]
dc.description.project: Universidad de León [es_ES]



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International