A Method for Identifying Predatory Journals Driven by Large Language Models
Abstract
This study investigates whether fine-tuned language models can identify predatory journals and seeks the most effective fine-tuning strategy that remains feasible under comparable practical constraints. We use Low-Rank Adaptation (LoRA) to perform supervised instruction fine-tuning of open-source distilled models (such as DeepSeek-R1-Distill-Qwen-1.5B) under several strategies. For comparison, three machine learning classifiers and an API-based general-purpose large language model were evaluated alongside the fine-tuned models, contrasting fine-tuned models, traditional classifiers, and non-fine-tuned large models. The results show that a 1.5B model fine-tuned on 398 structured samples outperformed the non-fine-tuned general-purpose models on this task, reaching an accuracy of 76%; a 7B model fine-tuned with the same strategy reached 92%. The comparison indicates that fine-tuning improves the domain-specific performance of distilled models, and that increasing the parameter scale of the base model substantially improves the performance of its fine-tuned version on the task.
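As an illustration of the setup the abstract describes, the sketch below shows how LoRA adapters might be attached to the named 1.5B distilled model using the Hugging Face transformers and peft libraries. The adapter rank, scaling factor, target modules, and the instruction template are assumptions for illustration only; the paper's actual hyperparameters and its 398 structured samples are not given in this abstract.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Base model named in the abstract; all hyperparameters below are
# illustrative assumptions, not the study's reported configuration.
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections; only these
# small adapter matrices are trained while the base weights stay frozen.
lora_config = LoraConfig(
    r=8,                    # assumed adapter rank
    lora_alpha=16,          # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of base parameters

# Hypothetical instruction-style training sample; the format of the
# paper's structured samples is an assumption.
sample = {
    "instruction": "Decide whether the following journal is predatory. "
                   "Answer 'predatory' or 'legitimate'.",
    "input": "Journal name: ...; APC policy: ...; indexing claims: ...; peer review: ...",
    "output": "predatory",
}
text = f"{sample['instruction']}\n{sample['input']}\n{sample['output']}"
batch = tokenizer(text, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # standard causal-LM loss
```

In practice the single forward pass at the end would be replaced by a full training loop (for example, a Trainer or SFTTrainer run) over all structured samples, after which only the small adapter weights need to be saved.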