Evaluating generative artificial intelligence’s limitations in health policy identification and interpretation

Rory Wilson
Ciara M. Weets
Amanda Rosner
Rebecca Katz

Abstract

Policy epidemiology relies on human subject-matter experts (SMEs) to systematically surface, analyze, and categorize legally enforceable policies. The Analysis and Mapping of Policies for Emerging Infectious Diseases project systematically collects and assesses health-related policies from all United Nations Member States. The recent proliferation of generative artificial intelligence (GAI) tools powered by large language models has led to suggestions that such technologies be incorporated into our project and similar research efforts to reduce the human resources required. To test the accuracy and precision of GAI in identifying and interpreting health policies, we designed a study to systematically compare the responses produced by a GAI tool with those produced by an SME.

We used two validated policy datasets, one on emergency and childhood vaccination policy and one on quarantine and isolation policy, each covering every United Nations Member State. The SME and the GAI tool were concordant 78.09% of the time for the vaccination dataset and 67.01% of the time for the quarantine and isolation dataset. The GAI tool also significantly hastened the data collection process.

However, our analysis of non-concordant results revealed systematic inaccuracies and imprecision across different World Health Organization regions. Regarding vaccination, over 50% of countries in the African, Southeast Asian, and Eastern Mediterranean regions were inaccurately represented in GAI responses. The trend was similar for quarantine and isolation, with the African and Eastern Mediterranean regions the least concordant. Furthermore, GAI responses provided laws or information missed by the SME only 2.14% of the time for the vaccination dataset and 2.48% of the time for the quarantine and isolation dataset. Notably, the GAI was least concordant with the SME when tasked with policy interpretation.

These results suggest that GAI tools require further development to accurately identify policies across diverse global regions and interpret context-specific information. However, we found that GAI is a useful tool for quality assurance and quality control processes in health policy identification.

Version published to 10.1101/2024.10.02.24314805v1 on medRxiv
Oct 4, 2024

Related articles

Estimating the Prevalence of Generative AI Use in Medical School Application Essays

This article has 3 authors:
1. Nicholas C. Spies
2. Valerie S. Ratts
3. Ian S. Hagemann

This article has no evaluations
Latest version Oct 22, 2024

An Active Inference Strategy for Prompting Reliable Responses from Large Language Models in Medical Practice

This article has 5 authors:
1. Roman Shusterman
2. Allison Waters
3. Shannon O’Neill
4. Phan Luu
5. Don Tucker

This article has no evaluations
Latest version Oct 21, 2024

A Large Language Model-based Approach for Analyzing Covariates of Health Equity in Registered Research Projects

This article has 2 authors:
1. Navapat Nananukul
2. Mayank Kejriwal

This article has no evaluations
Latest version Sep 26, 2024
