Efficient Malicious Domain Detection Using a Distributed Deep Forest Algorithm

Abstract

This paper addresses the growing problem of malicious activity that exploits the Internet's Domain Name System (DNS), a major cybersecurity concern. It reviews current methods for detecting malicious domains and points out the limitations of traditional reverse-engineering techniques and the difficulties deep learning methods face, particularly when identifying domains produced by Domain Generation Algorithms (DGAs). Because detection based on DNS traffic is becoming a widely accepted approach, this work introduces a system that uses HBase for data warehousing, Hive for DNS data analysis, and Spark to build a scalable domain detection engine. The approach extracts 24 features from DNS traffic, grouped into categories such as construction-based, time-based, TTL-based, and request-response-based features, and evaluates them through statistical distribution analysis to determine how well they separate benign from malicious domains. At the core of the detection engine is the Deep Forest algorithm, which offers several advantages over conventional neural networks: fewer parameters, faster training, and greater adaptability. We also present a Selective Integration Algorithm that optimizes the deep forest model and achieves higher accuracy and recall than traditional random forest and deep forest methods. The system processes real-time DNS traffic at tens of thousands of DNS messages per second, a throughput that large-scale deployments require. Experimental results confirm that the approach detects malicious domains efficiently and with high accuracy, contributing to ongoing efforts to defend against evolving online threats.
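
The abstract names the feature groups but does not list the 24 individual features. As an illustration only, the sketch below computes a few plausible construction-based features (label length, label count, digit and vowel ratios, character entropy) and TTL-based statistics for one domain. The specific features and the function names `construction_features` and `ttl_features` are hypothetical choices made here, not the paper's feature set.

```python
from __future__ import annotations

import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of a string."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def construction_features(domain: str) -> dict:
    """Hypothetical lexical features computed from the domain name alone."""
    labels = domain.lower().rstrip(".").split(".")
    # Naively treat the second-to-last label as the registered name;
    # a real system should consult the Public Suffix List (e.g. for .co.uk).
    name = labels[-2] if len(labels) >= 2 else labels[0]
    digits = sum(ch.isdigit() for ch in name)
    vowels = sum(ch in "aeiou" for ch in name)
    return {
        "length": len(name),
        "num_labels": len(labels),
        "digit_ratio": digits / len(name) if name else 0.0,
        "vowel_ratio": vowels / len(name) if name else 0.0,
        # DGA names tend to look random, which raises character entropy.
        "entropy": shannon_entropy(name),
    }


def ttl_features(ttls: list[int]) -> dict:
    """Hypothetical statistics over the TTLs seen in responses for one domain."""
    if not ttls:
        return {"ttl_mean": 0.0, "ttl_std": 0.0, "ttl_distinct": 0}
    return {
        "ttl_mean": mean(ttls),
        "ttl_std": pstdev(ttls),
        "ttl_distinct": len(set(ttls)),
    }
```

For a random-looking name such as `xj4kq9z2vb.com`, the registered label has 10 characters, a digit ratio of 0.3, no vowels, and an entropy of log2(10) ≈ 3.32 bits because every character is distinct; TTLs of `[300, 300, 60]` give a mean of 220 and two distinct values. In a pipeline like the one described, dictionaries of this kind would be assembled per domain into the feature vectors passed to the classifier.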
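
The abstract does not say how the Selective Integration Algorithm chooses which components to keep. "Selective integration" plausibly corresponds to what the ensemble-learning literature calls selective ensembling, where only a subset of trained base learners is combined. The sketch below shows one generic variant of that idea: greedy forward selection under majority voting on a validation set. It assumes each base learner (for example, one forest in a deep forest cascade level) has already produced 0/1 predictions on validation data. It is not the paper's algorithm, and all names are hypothetical.

```python
from __future__ import annotations


def majority_vote(predictions: list[list[int]]) -> list[int]:
    """Combine binary (0/1) predictions from several learners by majority vote.

    Ties are broken toward 1 (malicious) to favor recall.
    """
    n_samples = len(predictions[0])
    votes = [sum(p[i] for p in predictions) for i in range(n_samples)]
    return [1 if 2 * v >= len(predictions) else 0 for v in votes]


def accuracy(pred: list[int], truth: list[int]) -> float:
    """Fraction of samples where the prediction matches the label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def greedy_select(predictions: dict[str, list[int]], truth: list[int]) -> list[str]:
    """Greedy forward selection of ensemble members on a validation set.

    Starts with an empty ensemble, repeatedly adds the learner that gives the
    highest majority-vote accuracy, and stops when no addition improves it.
    """
    selected: list[str] = []
    best_score = 0.0
    remaining = set(predictions)
    while remaining:
        scored = []
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            members = [predictions[m] for m in selected + [name]]
            scored.append((accuracy(majority_vote(members), truth), name))
        # max() returns the first maximal entry, i.e. the alphabetically first name.
        score, name = max(scored, key=lambda s: s[0])
        if score <= best_score:
            break
        selected.append(name)
        remaining.remove(name)
        best_score = score
    return selected
```

As a usage example, take validation labels `[1, 1, 1, 0, 0, 0]`, three forests `forest_a`, `forest_b`, and `forest_c` that each miss a different malicious sample, and a `noisy` learner that is always wrong. `greedy_select` first picks `forest_a` (accuracy 5/6), then adds `forest_b`, which raises validation accuracy to 1.0, and stops because neither remaining learner improves on that, returning `["forest_a", "forest_b"]`. Greedy selection is cheap but can miss better subsets; the tie-break toward 1 is a deliberate recall-oriented choice in this sketch, and with an even number of members it makes the vote more permissive.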
