Artificial Intelligence in Healthcare: 2023 Year in Review

Abstract

Background

The infodemic we are experiencing with AI-related publications in healthcare is unparalleled. The excitement and fear surrounding the adoption of rapidly evolving AI in healthcare applications pose a real challenge. Collaborative learning from published research is one of the best ways to understand the associated opportunities and challenges in the field. To gain a deep understanding of recent developments in this field, we conducted a quantitative and qualitative review of AI in healthcare research articles published in 2023.

Methods

On January 1, 2024, we performed a PubMed search using the terms “machine learning” or “artificial intelligence” and “2023”, restricted to English-language, human-subject research published through December 31, 2023. Using a deep learning-based approach, we assessed the maturity of publications. We then manually annotated the healthcare specialty, data utilized, and models employed for the articles identified as mature. Subsequently, empirical data analysis was performed to elucidate trends and statistics. Similarly, we performed a search for large language model (LLM)-based publications for the year 2023.
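The search described above could be expressed against the public NCBI E-utilities `esearch` endpoint roughly as follows. This is only a sketch: the authors' exact query string is not given here, so the field tags chosen (`[dp]` for publication date, `[la]` for language, `[mh]` for MeSH terms) and the helper names `build_pubmed_query` and `build_pubmed_search_url` are assumptions made for illustration. The code only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(year: int = 2023) -> str:
    """Assumed reconstruction of the search terms, not the authors' exact query."""
    topic = '("machine learning" OR "artificial intelligence")'
    # Hypothetical filters: publication year, English language, human subjects.
    filters = [f"{year}[dp]", "english[la]", "humans[mh]"]
    return " AND ".join([topic, *filters])


def build_pubmed_search_url(year: int = 2023, retmax: int = 0) -> str:
    """Build an esearch URL; retmax=0 asks for the hit count without record IDs."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(year),
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_pubmed_query())
    print(build_pubmed_search_url())
```

Writing the query as code makes the sampling frame explicit, so a reader can see which terms and filters would need to change to broaden or narrow the search.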

Results

Our PubMed search yielded 23,306 articles, of which 1,612 were classified as mature. Following exclusions, 1,226 articles were selected for final analysis. Among these, the highest number of articles originated from the Imaging specialty (483), followed by Gastroenterology (86) and Ophthalmology (78). Analysis of data types revealed that image data was predominant, utilized in 75.2% of publications, followed by tabular data (12.9%) and text data (11.6%). Deep Learning models were extensively employed, constituting 59.8% of the models used. For the LLM-related publications, after exclusions, 584 publications were classified into 26 different healthcare specialties and used for further analysis. The utilization of Large Language Models (LLMs) is highest in general healthcare specialties, at 20.1%, followed by surgery at 8.5%.
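The counts reported above imply a steep screening funnel, which can be made explicit with a few lines of arithmetic. The sketch below uses only figures stated in the Results paragraph; the variable names and the `share` helper are illustrative and not part of the authors' analysis.

```python
# Counts taken from the Results paragraph above.
RETRIEVED = 23_306  # articles returned by the PubMed search
MATURE = 1_612      # articles classified as mature
ANALYZED = 1_226    # articles remaining after exclusions
IMAGING = 483       # analyzed articles in the Imaging specialty


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


funnel = {
    "mature / retrieved": share(MATURE, RETRIEVED),
    "analyzed / mature": share(ANALYZED, MATURE),
    "analyzed / retrieved": share(ANALYZED, RETRIEVED),
    "imaging / analyzed": share(IMAGING, ANALYZED),
}

if __name__ == "__main__":
    for label, pct in funnel.items():
        print(f"{label}: {pct}%")
```

Reading the ratios side by side shows how small a fraction of the retrieved literature reaches the final analysis, and how large a share of that final set comes from Imaging alone.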

Conclusion

Image-based healthcare specialties such as Radiology, Gastroenterology and Cardiology have dominated the landscape of AI in healthcare research for years. In the next era of AI in healthcare research and publications, we are likely to see other healthcare specialties, including the education and administrative areas of healthcare, driven by LLMs and possibly multimodal models.

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/11582961.

    This review is the result of a virtual, collaborative live review discussion organized and hosted by PREreview and JMIR Publications. The discussion was joined by 34 people: 2 facilitators, 3 members of the JMIR Publications team, 3 authors, and 26 live review participants. Rachel Grasfield, Ranjani Harish, Kolapo Oyebola, Arya Rahgozar, Renato Sabbatini, Nour Shaballout, and Trevor van Mierlo wished to be recognized for their participation in the live review discussion, even though they have not contributed to authoring the review below. We thank all participants who contributed to the discussion and made it possible for us to provide feedback on this preprint.

    Summary

    Research Question: The review aims to understand the development and applications of Artificial Intelligence (AI) in healthcare, assessing the prevalence and impact of AI methods in biomedical research. The focus is on the frequency and types of publications in 2023, aiming to provide a comprehensive overview of the current AI landscape in healthcare and identify areas needing further research. It also aims to address the specialties in medicine with greater use of AI.

    Research Approach: The authors employed a mixed-methods approach, combining classical bibliometric analysis and advanced deep learning (DL) techniques to analyze PubMed data. They established search criteria to collect papers published in 2023 related to AI and machine learning (ML) in healthcare, identifying subcategories and medical specialties within the dataset. 

    Research Findings: An increase in AI-related healthcare publications was observed in 2023, with a total of 23,306 articles, which constitutes a 133.7% increase from the previous year. The analysis revealed a differential uptake of AI models across medical fields, with imaging being the most prevalent. The review also noted a spread across various specialties, with cardiology, gastroenterology, ophthalmology, and general clinics being prominent, while psychiatry had fewer publications.

    Interesting Aspects: The review provides insights into future AI trends in medicine and healthcare, and a valuable snapshot of AI's role in healthcare research in 2023, highlighting the rapid growth and diverse applications of AI technologies across medical specialties. The growth of image-based publications and the usefulness of AI in specific healthcare specialties were particularly noted. The authors found that AI use in imaging was the most prevalent among the specialties and that other fields, such as gastroenterology, ophthalmology, and psychiatry, are promising for future use of AI.

    Relation to Published Literature: The authors attempt to establish their review as a foundational reference for the state of AI in healthcare literature over the past year and to set the stage for future comparative analyses. However, prior work reviewing the use of AI in certain areas, such as robotics and hospital administration, which are prevalent in the established literature, has not been thoroughly addressed in the paper and could warrant greater discussion.

    Strengths and Weaknesses: The review's reproducibility and the use of mixed methods for bibliometric analysis are strengths of the paper. However, the concept of "mature" AI models is vague and needs a clearer definition, and the limited keyword search is a potential weakness. Suggestions for improvement include adhering to reporting guidelines like PRISMA-ScR, clarifying the definition of "mature" research, and considering the inclusion of "healthcare" in the initial search criteria. Additionally, a time frame of a single recent year is very short, given that AI has been around since the 1956 Dartmouth conference and has long been used at the forefront of health research. The authors should consider suggesting that future studies with longer time frames be carried out as the applications of AI progress over the years.

    Below we list major and minor concerns that were discussed by participants of the Live Review and, where possible, we provide suggestions on how to address those issues.

    Major Concerns

    • The authors should state more clearly that their findings generalize only to the PubMed corpus and, more narrowly, to the results returned by their chosen query. Alternatively, the authors could consider expanding their search terms to include additional relevant keywords, to better cover the range of AI uses within health research, and/or extending their database search beyond PubMed to ensure broader coverage of the literature.

    • The BERT-based maturity classification model and other AI classification methods encountered difficulties in correctly categorizing publications into their respective medical specialties. This challenge stemmed from BERT's vocabulary limitations, as it was only trained on a specific set of tokens, making it struggle with unfamiliar or uncommon terms not present in its training data. The observed accuracy of these models ranged from 30% to 68%, suggesting frequent misclassifications or inaccurate categorizations of publication specialties. The authors should consider further improvements and training of AI models to enhance the accuracy of specialty classification and reduce the occurrence of false positives.

    • The adoption of multimodal models incorporating diverse data types remains restricted, with image data largely prevailing in AI applications within healthcare, despite the vast array of available data. The authors should consider implementing multimodal models that combine different data formats such as text, tabular, and voice, as this could significantly enhance the effectiveness and adaptability of AI in healthcare.

    • Authors should consider how their study deviates from the "gold standard" PRISMA framework and guidelines and discuss why their method would be preferred over PRISMA in this specific case. 

    • The figures need considerable revision before the paper would be acceptable for publication. Reviewers recommend the following changes:

      • For Figure 4B, authors should include a definition of terms, to make it clear what exactly AI, ML, LLM, NLP, etc. refer to in the context of the papers scoped.

      • To improve readability, the authors should organize the visualizations under sub-headers. For example, Figures 3 and 5 showing visualizations across healthcare specialties can be grouped together under one sub-header. 

      • For Figure 3, imaging should be analyzed separately, as it is a major outlier. For example, it could be broken down into the specialties in which imaging was applied, since imaging cuts across many specialties (e.g., radiology, oncology, histology).

    • Data and code should be made freely accessible by default and not incumbent on the reader to "request the data" from the authors. The authors should remedy this oversight before submitting for publication.
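The concern above about BERT's fixed vocabulary can be illustrated with a toy version of WordPiece-style tokenization, the greedy longest-match-first subword scheme BERT uses. This is a minimal sketch under stated assumptions: `TOY_VOCAB` is a hypothetical six-entry vocabulary, not BERT's real one, and `wordpiece_tokenize` is an illustrative helper rather than the authors' code or any library's API.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword split of a single lowercase word."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry matches at this position: the whole word is unknown.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"gastro", "##enter", "##ology", "cardio", "##logy", "imaging"}

if __name__ == "__main__":
    for term in ["imaging", "cardiology", "gastroenterology", "ophthalmology"]:
        print(term, "->", wordpiece_tokenize(term, TOY_VOCAB))
```

With this toy vocabulary, a common term stays whole, less common terms fragment into several pieces, and a term with no matching pieces collapses to the unknown token. Fragmentation of this kind is one plausible reason specialty-specific terminology is harder for a general-domain model to classify.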

    Minor Concerns

    • The authors should more carefully define "maturity." Typically, in AI, maturity refers to a property of how well established the use of AI is within an organization. It is not clear from the paper how that generalizes to a medical field per se.

    • The paper should provide more detail on the geographic sub-analysis and its results.

    • The authors should provide more details on the justification of why they chose the specific search terms used to define the sampling frame. If possible, the authors should point to an empirical determination of why those specific terms were chosen over other reasonably good choices. 

    • Authors should consider providing more details about the papers scoped, for example: distribution across journals/frequency of journals, and most common author-provided keywords.

    • The authors should explain the term "infodemic" as used in the introduction, perhaps by referencing the WHO definition.

    • Authors should provide support for their claims on the "exponential growth of AI", for example by referencing past studies or including visualizations to illustrate this.

    Concluding remarks

    The reviewers agree that this paper makes an important and timely contribution to the literature. There are several issues that should be addressed prior to finalization of the manuscript; however, the reviewers believe these concerns are easily addressable by the authors. We thank the authors of the preprint for posting their work openly for feedback. We also thank all participants of the Live Review call for their time and for engaging in the lively discussion that generated this review. For more information about PREreview's Live Review please see: https://prereview.org/live-reviews.

    Competing interests

    The authors declare that they have no competing interests.