Assessing Large Language Model Alignment Towards Radiological Myths and Misconceptions


Abstract

Background/Objectives

Radiological topics, such as medical radiology and nuclear energy usage, are rife with speculation, opinion, and misconception, posing potential risks to public health and safety if misunderstandings are left uncorrected. Objective, unbiased discussion is therefore critical for advancing the field of radiation protection and ensuring that policy, research, and clinical practices are guided by accurate information. Moreover, the recent increase in the adoption of AI and large language models necessitates an investigation into AI sentiment towards radiological topics, as well as the use of AI to analyze or influence this sentiment.

Methods

A systematic framework was developed to extract agreement and sentiment towards radiological ideas in a structured format. Using this method, we test several large language models, primarily OpenAI’s GPT class of models, on their susceptibility to common radiological myths and misconceptions, their cultural and linguistic bias towards controversial radiological topics, and their philosophical/moral alignment in various radiation scenarios. We also use OpenAI’s GPT-4o mini as a tool to analyze community sentiment towards radiation in the /r/Radiation subreddit from February 2021 to December 2023. Finally, a novel AntiRadiophobeGPT is created to counter radiophobic and myth-laden rhetoric, and is then deployed and evaluated against actual user comments.
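The abstract does not specify the extraction framework in detail; the following is a minimal sketch, assuming the OpenAI Python SDK, of how agreement and sentiment towards a radiological statement might be elicited from GPT-4o mini in a structured format. The prompt wording, rating scale, and JSON fields are illustrative assumptions, not the authors’ actual implementation.

```python
# Minimal sketch (illustrative, not the paper's framework): elicit structured
# agreement and sentiment towards a radiological statement from GPT-4o mini.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def score_statement(statement: str) -> dict:
    """Return a dict with an assumed schema: {"agreement": int, "sentiment": str}."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # force JSON output
        messages=[
            {
                "role": "system",
                "content": (
                    "Rate your agreement with the user's statement about radiation "
                    "on a 1-5 scale (1 = strongly disagree, 5 = strongly agree) and "
                    "its sentiment as negative, neutral, or positive. "
                    'Reply only with JSON: {"agreement": int, "sentiment": str}.'
                ),
            },
            {"role": "user", "content": statement},
        ],
    )
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    print(score_statement("Living near a nuclear power plant causes cancer."))
```

A structured JSON response of this kind can be aggregated across many statements or subreddit comments to quantify agreement with myths and track sentiment over time.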

Results

We find that GPT-4o mini is more susceptible to overt agreement with controversial radiological views and myths/misconceptions than GPT-4o. In addition, the use of smaller models and/or Chinese-language prompts or models significantly increases model bias on a culturally controversial radiological topic. All GPT-class models tested for moral alignment show deontological leanings, although there is some per-scenario variance in utilitarianism. Our analysis of the /r/Radiation subreddit reveals that health-related myths are the most prevalent, but that overall community-wide myth prevalence, radiophobia, and hostility decreased significantly over the three-year period analyzed. Finally, our custom AntiRadiophobeGPT is shown to provide responses that address radiological myths and misconceptions with a high level of truthfulness but with significantly less hostility and radiophobia than actual users.

Conclusions

Our findings demonstrate that large language models can detect and counter radiological myths while also exhibiting vulnerabilities to similar misconceptions. By monitoring community sentiment and deploying targeted anti-misinformation tools, these models can strengthen public understanding of radiation and reduce harmful radiophobia. While AntiRadiophobeGPT shows promise in correcting misconceptions, its deployment must be approached with caution and robust oversight to safeguard against unintended manipulation and ensure responsible public discourse. This duality underscores both the potential and the limitations of LLMs in enhancing radiation protection strategies.

Simple Summary

Humans are susceptible to believing in or espousing myths and misconceptions in the radiological field. Since the advent of large language models (LLMs), with applications such as ChatGPT processing over one billion queries per day and Gemini now integrated into Google’s search engine, these models have steadily evolved into a major source of information for vast audiences. The aim of our study was to assess whether large language models, which are primarily trained on human-generated data, are susceptible to the same underlying sentiments and biases with regard to radiological topics. Furthermore, we assess the use of large language models in both detecting and analyzing trends in the circulation of radiological myths and misconceptions in online communities. Finally, we evaluate the use of large language models as supportive tools for improving communication on these controversial topics.
