Phishing websites detection using a novel multipurpose dataset and web technologies features

Sanchez Paniagua, Manuel; Fidalgo Fernández, Eduardo; Alegre Gutiérrez, Enrique; Alaiz Rodríguez, Rocío

doi:10.1016/j.eswa.2022.118010

dc.contributor	Escuela de Ingenierias Industrial, Informática y Aeroespacial	es_ES
dc.contributor.author	Sanchez Paniagua, Manuel
dc.contributor.author	Fidalgo Fernández, Eduardo
dc.contributor.author	Alegre Gutiérrez, Enrique
dc.contributor.author	Alaiz Rodríguez, Rocío
dc.contributor.other	Ingenieria de Sistemas y Automatica	es_ES
dc.date	2022-06-30
dc.date.accessioned	2024-01-11T13:04:13Z
dc.date.available	2024-01-11T13:04:13Z
dc.identifier.citation	Sánchez-Paniagua, M., Fidalgo, E., Alegre, E., & Alaiz-Rodríguez, R. (2022). Phishing websites detection using a novel multipurpose dataset and web technologies features. Expert Systems with Applications, 207. https://doi.org/10.1016/J.ESWA.2022.118010	es_ES
dc.identifier.issn	0957-4174
dc.identifier.uri	https://hdl.handle.net/10612/17582
dc.description.abstract	[EN] Phishing attacks are one of the most challenging social engineering cyberattacks due to the large number of entities involved in online transactions and services. In these attacks, criminals deceive users to hijack their credentials or sensitive data through a login form which replicates the original website and submits the data to a malicious server. Many anti-phishing techniques have been developed in recent years, using different resources such as the URL and HTML code from legitimate index websites and phishing ones. These techniques have some limitations when predicting legitimate login websites since, usually, no login forms are present in the legitimate class used for training the proposed model. Hence, in this work we present a methodology for phishing website detection in real scenarios, which uses URL, HTML, and web technology features. Since no up-to-date multipurpose dataset exists for this task, we crafted the Phishing Index Login Websites Dataset (PILWD), an offline phishing dataset composed of 134,000 verified samples, which offers researchers a wide variety of data to test and compare their approaches. Since approximately three-quarters of the collected phishing samples request the introduction of credentials, we decided to crawl legitimate login websites to match the phishing standpoint. The developed approach is independent of third-party services, and the method relies on a new set of features used for the very first time in this problem, some of them extracted from the web technologies used by each specific website. Experimental results show that phishing websites can be detected with 97.95% accuracy using a LightGBM classifier and the complete set of the 54 selected features, when evaluated on the PILWD dataset.	es_ES
dc.language	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Computer science	es_ES
dc.subject.other	Phishing detection	es_ES
dc.subject.other	Phishing dataset	es_ES
dc.subject.other	Web technologies	es_ES
dc.subject.other	Machine learning	es_ES
dc.subject.other	Login	es_ES
dc.title	Phishing websites detection using a novel multipurpose dataset and web technologies features	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.identifier.doi	10.1016/j.eswa.2022.118010
dc.description.peerreviewed	Yes	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.journal.title	Expert Systems with Applications	es_ES
dc.volume.number	207	es_ES
dc.page.initial	118010	es_ES
dc.type.hasVersion	info:eu-repo/semantics/publishedVersion	es_ES
dc.subject.unesco	1203.17 Informática	es_ES
dc.subject.unesco	1	es_ES
dc.description.project	INCIBE	es_ES
dc.description.project	Universidad de León	es_ES

Files in this item

Name: Phishing_Websites_Detection.pdf
Size: 2.829 MB
Format: application/pdf

This item appears in the following collections

Untitled [5241]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International