Intelligent Spam Filtering and Analysis Using Web Automation, ReportLab, and Ensemble Machine Learning Techniques

Sathvik Eppakayala
Shiva Kumar Goud Ankula

Abstract

Spam email is a major cybersecurity problem: it creates security risks and clutters inboxes with unwanted mail. To address it, we developed a real-time automated spam filtering system that detects and filters spam emails. The goal was to build a model that achieves high accuracy and also runs continuously without user intervention. We trained and tested five machine learning classifiers: logistic regression, decision tree, K-nearest neighbors (KNN), Gaussian naive Bayes, and AdaBoost. We compared their performance using precision, recall, and F1-score. AdaBoost performed best, showing the highest accuracy in separating spam from legitimate emails. To improve reliability, we combined and balanced two different spam email datasets so that the model adapts well to varied types of email. To make the system fully automatic, we built a web application with Python Flask. We used Selenium to retrieve emails from a user's inbox automatically and classify them in real time. The system then generates a PDF report with ReportLab that highlights the detected spam patterns and the success rate of spam filtering. Because the process is hands-free, users do not need to check and sort through their spam emails manually, which improves both efficiency and security. Our results indicate that combining real-time machine learning-based spam filtering with automation improves accuracy and reliability. The system is scalable and adaptable, so it can be useful for many email platforms. In future work, we plan to improve the model further by adapting it to the evolving techniques of spammers, increasing processing speed, and integrating additional security features. Our contribution is a step toward advanced, real-time spam-filtering solutions that protect users from unwanted and harmful emails.
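The abstract mentions two steps that a short sketch can make concrete: combining and balancing two datasets, and scoring a classifier with precision, recall, and F1-score. The Python below is an illustration under stated assumptions, not the authors' implementation. The function names `combine_and_balance` and `precision_recall_f1`, the `(text, label)` row layout, the label convention (1 for spam, 0 for legitimate), and the choice of random undersampling as the balancing method are all hypothetical; the abstract does not say how balancing was done.

```python
import random


def combine_and_balance(dataset_a, dataset_b, seed=0):
    """Merge two lists of (text, label) rows and undersample the larger class.

    Assumption: label 1 means spam and label 0 means legitimate mail.
    Undersampling is one possible balancing method, chosen here for brevity.
    """
    merged = list(dataset_a) + list(dataset_b)
    spam = [row for row in merged if row[1] == 1]
    ham = [row for row in merged if row[1] == 0]
    n = min(len(spam), len(ham))  # size of the smaller class
    rng = random.Random(seed)     # seeded so the result is reproducible
    balanced = rng.sample(spam, n) + rng.sample(ham, n)
    rng.shuffle(balanced)
    return balanced


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With helpers like these, each of the five classifiers could be trained on the balanced rows and scored on held-out predictions, so the three metrics can be compared side by side.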

Version published to 10.21203/rs.3.rs-5941111/v1 on Research Square
Feb 11, 2025
