RT info:eu-repo/semantics/article
T1 A review of spam email detection: analysis of spammer strategies and the dataset shift problem
A1 Jáñez Martino, Francisco
A1 Alaiz Rodríguez, Rocío
A1 González Castro, Víctor
A1 López Fidalgo, Eduardo
A1 Alegre Gutiérrez, Enrique
A2 Ingeniería de Sistemas y Automática
K1 Systems engineering
K1 Spam email detection
K1 Dataset shift
K1 Adversarial machine learning
K1 Spammer strategies
K1 Feature selection
AB Spam emails have traditionally been seen as merely annoying, unsolicited emails containing advertisements, but they increasingly include scams, malware or phishing. To ensure users' security and integrity, organisations and researchers aim to develop robust filters for spam email detection. Most spam filters based on machine learning algorithms recently published in academic journals report very high performance, yet users still report a rising number of frauds and attacks via spam emails. Two main challenges arise in this field: (a) it is a very dynamic environment prone to the dataset shift problem and (b) it suffers from the presence of an adversarial figure, i.e. the spammer. Unlike classical spam email reviews, this one focuses particularly on the problems posed by this constantly changing environment. Moreover, we analyse the different strategies spammers use to contaminate emails, and we review state-of-the-art techniques for developing filters based on machine learning. Finally, we empirically evaluate and present the consequences of ignoring dataset shift in this practical field. Experimental results show that this shift may lead to severe degradation in the estimated generalisation performance, with error rates reaching up to 48.81%.
PB Springer
SN 0269-2821
LK http://hdl.handle.net/10612/14967
UL http://hdl.handle.net/10612/14967
DS BULERIA. Repositorio Institucional de la Universidad de León
RD 26-Apr-2024