A Study on Improving the Automatic Classification Performance of Cybersecurity MITRE ATT&CK Tactics Using NLP-Based ModernBERT and BERTopic Models

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Cyber Threat Intelligence (CTI) reports are essential resources for identifying the Tactics, Techniques, and Procedures (TTPs) of hackers and cyber threat actors. However, these reports are often lengthy and unstructured, which limits their suitability for automatic mapping to the MITRE ATT&CK framework. This study designs and compares five hybrid classification models that combine statistical features (TF-IDF), transformer-based contextual embeddings (BERT and ModernBERT), and topic-level representations (BERTopic) to automatically classify CTI reports into 12 ATT&CK tactic categories. Experiments using the rcATT dataset, consisting of 1490 public threat reports, show that the model integrating TF-IDF and ModernBERT achieved a micro-precision of 72.25%, reflecting a 10.07-percentage-point improvement in detection precision compared with the baseline. The model combining TF-IDF and BERTopic achieved a micro F0.5 of 67.14% and a macro F0.5 of 63.20%, demonstrating balanced performance across both frequent and rare tactic classes. These findings indicate that integrating statistical, contextual, and semantic representations can improve the balance between precision and recall while enabling clearer interpretation of model outputs in multi-label CTI classification. Furthermore, the proposed model shows potential applicability for improving detection efficiency and reducing analyst workload in Security Operations Center (SOC) environments.

Article activity feed