An Evaluation-Aware Systematic Literature Review of Deep Learning-Based Recommender Systems: Domain Imbalance, Dataset Bias, and Evaluation Practices

Abstract

Deep learning-based recommender systems have been widely adopted across diverse application domains, leading to rapid growth in domain-specific research. Despite this expansion, researchers often face challenges when selecting appropriate application domains, identifying suitable datasets, and choosing evaluation metrics that accurately reflect real-world objectives. Existing studies focus primarily on performance improvements over a small set of well-established benchmarks, with limited attention to domain imbalance, dataset-driven bias, and gaps in evaluation practices. To address these issues, this systematic literature review examines how application domains, datasets, and evaluation metrics are used across deep learning-based recommender systems research. The findings reveal a strong concentration of research in a few domains, notably education, e-commerce, and social media, whereas application areas such as healthcare and scholarly publication remain comparatively underexplored. In addition, the review identifies heavy reliance on a small set of public benchmark datasets, such as MovieLens and Amazon, alongside a predominant use of accuracy- and ranking-oriented evaluation metrics. These patterns indicate that reported model effectiveness is often shaped by dataset availability and evaluation conventions rather than genuine domain-specific requirements. By highlighting underexplored application domains, commonly used datasets, and overlooked evaluation metrics, this review provides decision-oriented and evaluation-aware insights to guide researchers in selecting research directions, choosing appropriate datasets, and designing more balanced evaluation strategies for future deep learning-based recommender systems research.