The Perfect AI Research Assistant: Are ScholarGPT’s References More Reliable than ChatGPT’s?

Abstract

Introduction: Artificial intelligence has made a significant impact on surgery, particularly in surgical academia. ChatGPT's role in research has been widely studied but has drawn scepticism for generating inaccurate and non-existent references. ScholarGPT has been introduced to address these concerns, but its reliability has yet to be assessed. This study therefore aims to evaluate the accuracy of orthopaedic and plastic surgery references generated by ScholarGPT and to compare its performance with that of ChatGPT.

Methods: References were collected and assessed systematically. Each model was asked to generate 50 references: 40 for orthopaedics and 10 for plastic surgery. References were collected over 5 rounds, each generating 10 references. Each reference was manually verified for its existence, the accuracy of its PubMed ID and the validity of its DOI. The overall accuracy rate for each model was then calculated.

Results: ScholarGPT demonstrated a 100% accuracy rate, generating 50 verifiable references with valid PubMed IDs and DOIs. ChatGPT demonstrated a 42% accuracy rate. Neither model generated non-existent references. ScholarGPT also provided a more diverse range of references, whereas ChatGPT's references were non-specific.

Conclusion: ScholarGPT outperformed ChatGPT and proved capable of providing reliable evidence for research outputs. Further research should explore its applications across other clinical specialties to validate its effectiveness, and should also examine integrating ScholarGPT into the writing of research articles.
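The Methods describe tallying three manual checks per reference (existence, PubMed ID accuracy, DOI validity) into an overall accuracy rate. A minimal sketch of that tally, assuming a reference counts as accurate only when all three checks pass (the function and field names here are illustrative assumptions, not the authors' procedure):

```python
# Illustrative sketch, not the authors' code: tallying per-reference
# verification results into an overall accuracy rate, as in Methods.

def accuracy_rate(checks):
    """checks: list of (exists, pmid_ok, doi_ok) booleans, one per reference.
    A reference is counted as accurate only if all three checks pass."""
    verified = sum(1 for exists, pmid_ok, doi_ok in checks
                   if exists and pmid_ok and doi_ok)
    return 100.0 * verified / len(checks)

# Hypothetical tallies mirroring the reported results: 50 references
# per model, ScholarGPT all valid (100%), ChatGPT 21 of 50 valid (42%).
scholargpt_checks = [(True, True, True)] * 50
chatgpt_checks = [(True, True, True)] * 21 + [(True, False, False)] * 29

print(accuracy_rate(scholargpt_checks))  # 100.0
print(accuracy_rate(chatgpt_checks))     # 42.0
```

This mirrors the reported figures: 21 of ChatGPT's 50 references passing all checks yields the stated 42% rate.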
