Quality-Aware Evaluation for Journal Recommendation

Abstract

Objective: Journal recommendation systems analyze a manuscript's content and suggest potential target journals. Most such tools are evaluated with simple Top-K accuracy metrics: a recommendation is counted as correct only if the exact target journal is predicted, and all other recommendations are counted as errors. This approach ignores journal quality: recommending a different journal of similar quality is very different from recommending a journal far below the appropriate quality level. We developed a Quality-Aware evaluation tool designed to assess whether journal recommendation tools suggest journals of an appropriate quality level, rather than simply whether they predict the exact target journal.

Methods: We developed specialty-specific journal recommendation models for medical fields using transformer-based architectures trained on nearly one million PubMed articles (2020–2024; 2020–2025 for Cardiology). Using SCImago quartiles as a proxy for journal quality, we evaluated the models with both standard metrics and novel Quality-Aware metrics (Quality Accuracy, Quality Consistency, Undersell Rate, and Severe Undersell Rate). We examined the relationship between these metric families to characterize when recommendation lists can be considered reliable.

Results: Across five specialties, mean accuracy@1 was 47.9% (range: 39.9%–55.0%), with a Mean Reciprocal Rank (MRR) of 0.60 and a Normalized Discounted Cumulative Gain (NDCG) of 0.68. However, Quality Accuracy@1 averaged 67.7%, exceeding raw accuracy by approximately 20 percentage points. This gap indicates that more than a third of all prediction "errors" under standard metrics involved journals of equivalent quality. Quality Consistency@3 averaged 54.0%: on average, just over half of the journals in the top-3 recommendations matched the target journal's quality tier. Severe Undersell Rates (recommendations two or more quartiles below the ground truth) averaged 7.9%, with the best-performing models achieving rates as low as 5.2%.

Conclusion: Standard evaluation metrics for journal recommendation are insufficient because they treat all errors as equivalent. In this study, Quality Accuracy@1 substantially exceeded accuracy@1, indicating that a large proportion of apparent "errors" involved journals of equivalent quality. Quality-Aware metrics, combined with ranking-quality measures, provide a more complete assessment of whether a system produces reliable recommendations. We propose that journal recommendation systems report Quality-Aware metrics alongside traditional accuracy to better characterize real-world utility.

MSC Codes: 68T10 (Pattern recognition); 62P10 (Applications of statistics to biology and medical sciences)
JEL Codes: O32 (Management of Technological Innovation and R&D)
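
To make the Quality-Aware metrics concrete, the following is a minimal Python sketch of how they could be computed from SCImago quartiles. It is not the authors' implementation: the function names, the quartile encoding (1 = Q1, best, through 4 = Q4), and the quartile_of mapping are illustrative assumptions.

    # Minimal sketch of the Quality-Aware metrics described in the abstract.
    # Assumed encoding: SCImago quartiles as integers 1 (Q1, best) to 4 (Q4);
    # recs is a ranked list of recommended journal names; quartile_of maps
    # each journal name to its quartile. All names here are hypothetical.

    from typing import Dict, List

    def quality_accuracy_at_1(recs: List[str], target: str,
                              quartile_of: Dict[str, int]) -> bool:
        # Counted as a hit if the top recommendation shares the
        # target journal's quality tier, even if it is a different journal.
        return quartile_of[recs[0]] == quartile_of[target]

    def quality_consistency_at_k(recs: List[str], target: str,
                                 quartile_of: Dict[str, int], k: int = 3) -> float:
        # Fraction of the top-k recommendations that fall in the
        # target journal's quality tier.
        top_k = recs[:k]
        matches = sum(quartile_of[j] == quartile_of[target] for j in top_k)
        return matches / len(top_k)

    def severe_undersell(recs: List[str], target: str,
                         quartile_of: Dict[str, int]) -> bool:
        # True if the top recommendation is two or more quartiles below
        # the ground truth (larger quartile number = lower quality here).
        return quartile_of[recs[0]] - quartile_of[target] >= 2

    # Example with hypothetical journals:
    quartiles = {"Journal A": 1, "Journal B": 1, "Journal C": 3}
    recs = ["Journal B", "Journal C", "Journal A"]
    print(quality_accuracy_at_1(recs, "Journal A", quartiles))      # True: both Q1
    print(quality_consistency_at_k(recs, "Journal A", quartiles))   # 2/3
    print(severe_undersell(recs, "Journal A", quartiles))           # False

Aggregating these per-manuscript values over a test set would yield the Quality Accuracy@1, Quality Consistency@3, and Severe Undersell Rate figures reported above; the aggregation details are a design choice not specified in the abstract.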
