An Evaluation-Aware Systematic Literature Review of Deep Learning-Based Recommender Systems: Domain Imbalance, Dataset Bias, and Evaluation Practices
Abstract
Deep learning-based recommender systems have been widely adopted across many application domains, leading to rapid growth in domain-specific research. Despite this expansion, researchers often face challenges when selecting appropriate application domains, identifying suitable datasets, and choosing evaluation metrics that accurately reflect real-world objectives. Existing studies primarily focus on performance improvements using a small set of well-established benchmarks, with limited attention to domain imbalance, dataset-driven bias, and gaps in evaluation practices. To address these gaps, this systematic literature review examines how application domains, datasets, and evaluation metrics are used across deep learning-based recommender systems research. The findings reveal a strong concentration of research in domains such as education, e-commerce, and social media, whereas application areas such as healthcare and scholarly publication remain comparatively underexplored. In addition, the review identifies heavy reliance on a limited set of public benchmark datasets, such as MovieLens and Amazon, alongside a predominant use of accuracy- and ranking-oriented evaluation metrics. These patterns indicate that reported model effectiveness is often shaped by dataset availability and evaluation conventions rather than genuine domain-specific requirements. By highlighting underexplored application domains, commonly used datasets, and overlooked evaluation metrics, this review offers decision-oriented, evaluation-aware insights to guide researchers in selecting research directions, choosing appropriate datasets, and designing more balanced evaluation strategies for future deep learning-based recommender systems research.