Graph-Based Phishing Domain Detection via Certificate–DNS Heterogeneous Networks

Abstract

Individual phishing URLs are often short-lived, but the underlying infrastructure (domains, IP addresses, and certificates) exhibits recurring patterns. We propose a graph-based detection framework that models this infrastructure as a heterogeneous network of domains, IP addresses, TLS certificates, and registrars. Node embeddings are learned with a relational graph convolutional network (R-GCN) trained on 3.1 million domains, of which 210,000 are labeled as phishing-related. The model incorporates structural features such as shared-IP communities, certificate reuse, and registrar clusters. The detector can flag suspicious domains before they are widely used in attacks: in a retrospective study, it identifies 73% of phishing domains at least 24 hours before they first appear in blacklists. Compared with domain-lexical baselines, the method improves precision at 90% recall by 15.6 percentage points. These findings demonstrate that infrastructure-level graph modeling provides signals complementary to content-based phishing detection and can strengthen proactive defense.
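
The abstract does not specify the graph schema, so the following is only a minimal sketch of how a heterogeneous certificate–DNS graph and two of the structural features it mentions (shared-IP communities and certificate reuse) might be derived before any embeddings are learned. The relation names (`resolves_to`, `uses_cert`, `registered_with`), the helper functions, and all sample records are hypothetical and are not taken from the paper; the R-GCN itself is not shown.

```python
from collections import defaultdict

# Hypothetical observations: (domain, relation, infrastructure node).
# Relation names and records are illustrative assumptions, not the paper's data.
EDGES = [
    ("login-paypa1.example", "resolves_to", "ip:203.0.113.7"),
    ("secure-bank.example", "resolves_to", "ip:203.0.113.7"),
    ("secure-bank.example", "uses_cert", "cert:ab12"),
    ("acct-verify.example", "uses_cert", "cert:ab12"),
    ("acct-verify.example", "registered_with", "registrar:R1"),
    ("blog.example", "resolves_to", "ip:198.51.100.9"),
    ("blog.example", "registered_with", "registrar:R2"),
]


def build_graph(edges):
    """Index typed edges as {relation: {infrastructure_node: set(domains)}}."""
    graph = defaultdict(lambda: defaultdict(set))
    for domain, relation, node in edges:
        graph[relation][node].add(domain)
    return graph


def shared_neighbours(graph, relation):
    """For each domain, count distinct other domains sharing a node under `relation`."""
    neighbours = defaultdict(set)
    for domains in graph[relation].values():
        for d in domains:
            neighbours[d] |= domains - {d}
    return {d: len(n) for d, n in neighbours.items()}


def communities(graph, relations):
    """Group domains connected through any shared node of the given relations (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for relation in relations:
        for domains in graph[relation].values():
            members = sorted(domains)
            root = find(members[0])
            for other in members[1:]:
                parent[find(other)] = find(root)

    groups = defaultdict(set)
    for d in list(parent):
        groups[find(d)].add(d)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


graph = build_graph(EDGES)
cert_reuse = shared_neighbours(graph, "uses_cert")
ip_sharing = shared_neighbours(graph, "resolves_to")
clusters = communities(graph, ["resolves_to", "uses_cert"])
```

In this toy data, `clusters` places the three look-alike domains in one community because they are linked through a shared IP address and a shared certificate, while `blog.example` stays on its own. In a setup like the one the abstract describes, counts such as `cert_reuse` could serve as node features, and the typed edges would be what a relational model consumes.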
