SafeSurf Darknet 2025: A Novel Dataset for Darknet Traffic Detection and Analysis
This article has been reviewed by the following groups
Listed in
- Evaluated articles (PREreview)
Abstract
The growing threat of darknet-related activities, ranging from illegal marketplaces to command-and-control infrastructures, has made the accurate identification of darknet traffic a critical concern for cybersecurity professionals. In response to the lack of high-quality, well-labeled datasets in this domain, we present a newly created darknet traffic dataset to support research and analysis efforts in network security. The dataset was developed to address challenges with data availability, consistency, and labeling accuracy. It comprises around 92 megabytes of traffic data on the first layer and 35 megabytes of traffic data on the second and third layers, including nearly 253K individual flows and 79 distinct features (source/destination IPs, ports, protocols, timestamps, etc.). Each entry is labeled according to its nature as darknet or non-darknet traffic in the first layer, and further labeled by darknet type and behavior in the second and third layers, respectively. Potential applications include threat intelligence research, network traffic analysis, and testing security tools and policies. The dataset has a comprehensive three-layered label, indicating its relevance and practical utility for understanding darknet traffic behavior in various applications.
Article activity feed
This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/16750812.
Summary
This paper introduces SafeSurf Darknet 2025, a new dataset aimed at improving the detection and classification of darknet traffic. The dataset is structured across three layers: from basic darknet vs normal classification to protocol-level and behaviour-level labelling. It includes data from five major darknet technologies (Tor, I2P, Freenet, ZeroNet, and VPN) and captures a wide range of user behaviours like email, video streaming, chat, and file transfer.
The authors collect this traffic in controlled environments using a combination of physical and virtual machines and process it with tools such as Wireshark and CICFlowMeter. They benchmark a variety of machine learning models (Decision Trees, XGBoost, MLP, etc.) across all three layers and report high classification accuracy, particularly with tree-based models.
This work contributes to the field by filling a significant gap in publicly available, behaviorally labelled darknet datasets. Most existing resources are limited to binary classification or single-platform data, but this dataset supports fine-grained, multi-class analysis. As a result, it can help researchers develop more nuanced models for darknet traffic classification, build better intrusion detection systems, and explore behavioural profiling in encrypted networks, which are areas that are becoming increasingly important in cybersecurity and threat intelligence.
Major issues
More clarity needed on model selection: The paper states that Decision Trees were chosen as the final model, but other models like XGBoost and ensemble methods also performed well. A bit more explanation on why Decision Trees were preferred beyond speed would help support the choice.
Simulated environments only: The dataset was collected in a controlled lab setup, which is fine, but there's no external validation to show how well the models would perform on real darknet traffic. This limits how generalizable the results might be.
Minor issues
1. Some grammar and phrasing inconsistencies
Examples:
"Decision Trees consistently demonstrated inferior results. Consequently, we selected the Decision Tree algorithm as the best algorithm." (This should read "superior", not "inferior".)
Awkward phrasing, such as "It can be used for both classification and regression tasks. The random forest classifier has two main parameters…": these two sentences read as disjointed.
2. Feature list missing
The paper mentions that 79 features were extracted using CICFlowMeter but doesn't include a list. A summary table or appendix would make the dataset easier to adopt.
3. Figures lack context
ROC curves and confusion matrices are presented (pages 12–13), but:
There's no legend for which class is which
Figures are helpful, but need more explanation
4. Missing context about the dataset collection timeframe
The authors describe how they generated traffic, but don't mention:
How long the data was collected for
Whether all behaviours were captured equally
This affects reproducibility and how representative the dataset may be.
5. Ethical clarity
The ethics section is thoughtful. It's clear they avoided illegal services and anonymised data, but it doesn't say:
Whether IRB review or institutional approval was needed or obtained
If there are any limitations on the dataset use for other researchers
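Two of the points above, the missing feature list and the question of whether all behaviours were captured equally, could be answered directly from the published CSV. The following is a minimal sketch using only the Python standard library. The label column names (`label_layer1`, `label_layer2`, `label_layer3`), the feature names, and the sample rows are all hypothetical placeholders; the real file's headers may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical label column names; the real CSV headers may differ.
LABEL_COLUMNS = ("label_layer1", "label_layer2", "label_layer3")


def summarize_flows(csv_file, label_columns=LABEL_COLUMNS):
    """Return (feature_names, {label_column: Counter}) for a flow CSV."""
    reader = csv.DictReader(csv_file)
    # Every non-label column is treated as a feature.
    feature_names = [
        column for column in (reader.fieldnames or [])
        if column not in label_columns
    ]
    counts = {column: Counter() for column in label_columns}
    for row in reader:
        for column in label_columns:
            counts[column][row[column]] += 1
    return feature_names, counts


# Tiny made-up sample standing in for the real file.
SAMPLE = """flow_duration,total_fwd_packets,label_layer1,label_layer2,label_layer3
1200,10,darknet,Tor,chat
300,4,darknet,I2P,email
50,2,non-darknet,none,none
900,7,darknet,Tor,video streaming
"""

features, counts = summarize_flows(io.StringIO(SAMPLE))
print("features:", features)
for column, counter in counts.items():
    print(column, dict(counter))
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle, for example `open(path, newline="")`. The per-layer counts show at a glance how balanced the behaviour classes are.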
What's Working Well
The dataset seems well thought out and addresses real gaps in the field.
Labelling by behaviour (not just a darknet yes/no flag) is especially valuable.
The breadth of ML model testing is impressive. It covers everything from Naive Bayes to XGBoost and neural nets.
Suggestions
Include a list of the 79 features used in the CSV dataset.
Explain more clearly why Decision Trees were chosen over other strong performers like XGBoost.
Consider including a short guide or example notebook to help people use the dataset easily.
If possible, add cross-validation with another dataset to show generalizability.
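The last suggestion amounts to training on one source and scoring on another that was captured under different conditions. The sketch below illustrates the shape of such a check under loud assumptions: both "datasets" are synthetic, and the external set is simulated by rescaling and adding noise to held-out samples, which is only a crude proxy for a genuinely different capture environment. With real data, the two sources would first need to share the same feature columns and label vocabulary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for flow features (assumption: NOT real darknet traffic).
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=8, random_state=0
)

# Split into a "source" dataset and a pretend "external" dataset.
X_src, X_ext, y_src, y_ext = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Simulate a different capture environment by perturbing the external features.
X_ext = X_ext * 1.2 + rng.normal(0.0, 0.5, size=X_ext.shape)

# Ordinary in-environment train/test split on the source dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X_src, y_src, test_size=0.25, random_state=0, stratify=y_src
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

in_env_acc = accuracy_score(y_test, model.predict(X_test))
external_acc = accuracy_score(y_ext, model.predict(X_ext))

print(f"in-environment accuracy: {in_env_acc:.3f}")
print(f"external accuracy:       {external_acc:.3f}")
```

Reporting both numbers side by side, rather than only the in-environment score, would give readers a direct sense of how much performance depends on the lab setup.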
Competing interests
The author declares that they have no competing interests.