Text-Aware Contrastive Learning for Bridging Graph Components in a Joint Embedding Space
Abstract
Graph-structured data across many domains consists of multiple heterogeneous components, each with its own features, that are related to one another even in the absence of explicit edges. In cybersecurity, for example, MITRE ATT&CK and CWE provide complementary graph-structured knowledge of adversarial behaviors and system weaknesses, each with rich text-annotated nodes but no explicit links between the graphs. This disconnect prevents learning joint embeddings and limits downstream tasks such as similarity search, retrieval, and automated reasoning. We propose AttWeakBridge, a text-aware Graph Contrastive Learning (GCL) framework for embedding disconnected, text-rich graph components into a shared semantic space. Unlike conventional GCL methods that operate on augmented views of a single connected graph, AttWeakBridge performs contrastive learning across separate graph components, combining node text embeddings with structural signals via a dual-encoder GNN and an inter-/intra-graph triplet sampling strategy. Experimental results on real-world cybersecurity graphs (MITRE ATT&CK and CWE) show that AttWeakBridge (i) improves similarity learning and cross-graph retrieval compared to text-only and structure-only baselines, and (ii) yields more coherent neighborhoods that align attack techniques with relevant weaknesses. This work contributes a general paradigm for embedding heterogeneous, disconnected graphs into a common semantic space.
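To make the contrastive objective concrete, the sketch below illustrates one plausible reading of the approach: node text and structural embeddings are fused into a single vector, and a standard triplet margin loss pulls related nodes from the two graph components together while pushing unrelated ones apart. The fusion by concatenation, the function names, and the margin value are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def fuse(text_emb: np.ndarray, struct_emb: np.ndarray) -> np.ndarray:
    """Fuse a node's text embedding with its structural (GNN) embedding.

    Concatenation followed by L2 normalization is one simple fusion choice;
    the paper's dual-encoder details are not specified here.
    """
    v = np.concatenate([text_emb, struct_emb])
    return v / np.linalg.norm(v)

def triplet_loss(anchor: np.ndarray, positive: np.ndarray,
                 negative: np.ndarray, margin: float = 0.5) -> float:
    """Triplet margin loss on Euclidean distances.

    The anchor is pulled toward the positive (e.g., an ATT&CK technique
    paired with a related CWE weakness, an *inter-graph* triplet) and pushed
    away from the negative (an unrelated node from either graph).
    """
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy example with 2-D text and structure embeddings (hypothetical values).
anchor   = fuse(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
positive = fuse(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # related node
negative = fuse(np.array([0.0, 1.0]), np.array([1.0, 0.0]))  # unrelated node

loss = triplet_loss(anchor, positive, negative)
```

Here the positive coincides with the anchor and the negative is far away, so the margin is already satisfied and the loss is zero; during training, triplets that violate the margin produce a positive loss and drive the encoders to reshape the shared embedding space.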