Leveraging Predominant Lexical Features to Enhance Malicious URL Detection for Cybersecurity Sustainability

Abstract

The Government of Malaysia aims to provide excellent services to its citizens. Delivering information through websites is one channel that lets citizens reassess their perception of the Government's reliability. However, attackers create look-alike websites that exploit weaknesses in the original webpage. They attempt to deceive victims into clicking the look-alike site in order to obtain the victims' information or take control of their computers. According to Google's Transparency Report, 2.195 million websites were classified as "Sites Deemed Dangerous by Safe Browsing" on January 17, 2021, with 2.1 million of them being phishing sites. This study aims to improve the accuracy of malicious URL detection with a machine learning model by optimizing the selection and extraction of lexical features through RFI, SFM, and N-Gram techniques. It also seeks to develop a model that copes better with imbalanced datasets by concentrating on raising the quality of the data, in order to detect malicious URLs more effectively. The study proposes an enhanced feature detection process that focuses on improving both the accuracy and the speed of malicious URL detection based on lexical features. Accuracy, precision, recall, and F-score are used as performance evaluation metrics to compare the findings. In conclusion, this study found that combining lexical features with a machine learning model produced promising results in detecting malicious URLs and in distinguishing effectively between benign and dangerous URLs.
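To make the terms "lexical features" and "N-Gram" concrete, the following is a minimal Python sketch of how such features might be derived from a URL string alone, without fetching the page. The function names (`lexical_features`, `char_ngrams`) and the particular features chosen are illustrative assumptions, not the feature set used in the study, and the RFI and SFM selection steps are not shown.

```python
from collections import Counter
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute a few simple lexical features from the URL text only.

    The feature list is a hypothetical example, not the study's own set.
    """
    # urlsplit only fills in the host when a scheme separator is present,
    # so prepend one for bare inputs such as "example.com/login".
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
    }


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in the lower-cased text."""
    text = text.lower()
    # A string shorter than n yields an empty range and an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    sample = "http://login-secure.example.com/a1"  # made-up URL for illustration
    print(lexical_features(sample))
    print(char_ngrams(sample, n=3).most_common(5))
```

In a pipeline of the kind the abstract describes, vectors like these would typically be passed to a feature selection step and then to a classifier; which features actually carry signal is an empirical question that this sketch does not address.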
