The Art of AI Dialogue: Evaluating Applied AI Literacy in Medical Students Using a Performance-Based Rubric — A Single-Institution Observational Study
Abstract
Background: Artificial intelligence (AI) is rapidly transforming healthcare and medical education. While medical students increasingly use generative AI tools in their academic work, existing studies of AI literacy have relied largely on self-reported surveys, which provide limited insight into students' actual behaviors. There remains a critical need for performance-based assessments that evaluate how students engage with AI in real-world tasks. This study aimed to evaluate medical students' applied AI literacy through analysis of authentic academic artifacts using a structured, behaviorally anchored rubric.

Methods: As part of a required Evidence-Based Medicine course, thirty third-year medical students submitted research proposals along with their corresponding AI chat transcripts. Each submission was independently evaluated by three faculty members using a custom rubric assessing four domains: Transparency, Purposefulness (prompt generation), Verification & Critical Thinking (bias recognition), and Integration. Each domain was scored from 0 to 3, for a maximum total of 12.

Results: The mean total score was 5.47 (SD = 1.71), indicating moderate applied AI literacy. Domain-level analysis revealed the highest performance in Transparency (M = 2.08, SD = 0.55) and Integration (M = 1.64, SD = 0.67), whereas Purposefulness (M = 1.33, SD = 0.69) and Verification & Critical Thinking (M = 0.41, SD = 0.71) scored significantly lower. A Friedman test confirmed statistically significant differences across domains (χ²(3) = 50.36, p < 0.001), and post-hoc Wilcoxon signed-rank tests showed that Purposefulness and Verification & Critical Thinking scored significantly lower than both Transparency and Integration (all p < 0.001). Inter-rater reliability was high across domains (ICC = 0.83–0.93, all p < 0.001), supporting the consistency of the rubric-based evaluation.

Conclusions: Performance-based evaluation revealed domain-specific weaknesses in applied AI literacy that remain invisible in self-report-based assessments. These findings support integrating targeted instruction and authentic assessment into medical curricula to better prepare students for ethical and effective AI engagement. As AI continues to reshape clinical practice, equipping future physicians with these competencies is essential.
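For readers unfamiliar with the analytic approach described in the Results (a Friedman test across the four repeated-measures rubric domains, followed by pairwise Wilcoxon signed-rank post-hoc comparisons), the sketch below shows how such a comparison can be run with standard tools. It is an illustrative Python example using randomly generated placeholder scores, not the authors' analysis code or data; the domain labels and rubric ranges are taken from the abstract, everything else is an assumption for illustration.

```python
# Illustrative sketch only: mirrors the kind of domain-level comparison reported
# in the Results (Friedman test followed by pairwise Wilcoxon signed-rank tests).
# The scores below are randomly generated placeholders, NOT the study data.
from itertools import combinations

import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(seed=42)
n_students = 30  # cohort size reported in the abstract

# Hypothetical per-student scores (0-3 rubric scale), one array per domain.
scores = {
    "Transparency": rng.integers(1, 4, n_students),
    "Purposefulness": rng.integers(0, 3, n_students),
    "Verification & Critical Thinking": rng.integers(0, 2, n_students),
    "Integration": rng.integers(1, 3, n_students),
}

# Friedman test: do the four related (repeated-measures) domain scores differ?
chi2, p = friedmanchisquare(*scores.values())
print(f"Friedman chi2(3) = {chi2:.2f}, p = {p:.4f}")

# Post-hoc pairwise Wilcoxon signed-rank tests between domains.
# (A multiple-comparison correction, e.g. Bonferroni, would normally be applied.)
for a, b in combinations(scores, 2):
    stat, p_pair = wilcoxon(scores[a], scores[b], zero_method="zsplit")
    print(f"{a} vs {b}: W = {stat:.1f}, p = {p_pair:.4f}")
```

The inter-rater reliability analysis would additionally require the per-rater scores; with those in long format, a function such as pingouin.intraclass_corr could compute the ICC, though the abstract does not specify which ICC form the authors used.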