Vertical Model Paradox: Systemic Ethical Blind Spots in Domain-Specific Medical AI

Abstract

Background and Objectives
Generative medical artificial intelligence (GMAI) has demonstrated immense potential in clinical applications, yet its ethical compliance and underlying safety boundaries still lack systematic, quantitative evaluation. This study assesses the safety defenses and ethical adherence of large language models (LLMs) across diverse technical architectures when they confront complex clinical ethical dilemmas.

Methods
We constructed a multi-dimensional clinical-ethics stress-test prompt benchmark consisting of 39 standardized, high-stakes clinical prompts mapped to five core ethical dimensions: Safety, Professionalism, Fairness, Humanism, and Regulation. Twenty-one mainstream LLMs, including the international benchmark GPT-5.2 and four groups of domestic models (General-Purpose, Domain-Specific Medical, Platform-Based, and Search-Augmented), were evaluated cross-sectionally. A rigorous two-stage expert consensus mechanism assigned a binary ethical compliance score to each model response.

Results
Overall, the 20 domestic GMAI models achieved an average ethical compliance rate of 91.15%, indicating broad convergence in baseline medical safety alignment across the industry. Non-parametric statistical analysis found no significant difference in overall adherence among the four domestic technical architectures (P = 0.0735). However, a significant "Vertical Model Paradox" emerged: models fine-tuned specifically on medical corpora (Domain-Specific Medical LLMs) scored lowest on the Professionalism & Evidence-based dimension (78.00%), exhibiting generative overconfidence by failing to issue mandatory warnings about data scarcity or rare diseases. Slice analysis further revealed systemic safety blind spots across all models when faced with extreme inductive prompts: models were notably vulnerable in maintaining clinical traceability for medical records (pass rate 47.62%) and in safely refusing high-risk physical therapy procedures (pass rate 52.38%), consistently lacking emergency fail-safes and upfront regulatory disclaimers. Additionally, covariate analysis showed no statistically significant safety disparity between open-weights and closed-source ecosystems (P = 0.9664), challenging the assumption that open-source medical AI is inherently less safe.

Conclusion
Current general value alignment has established basic medical guardrails, but this foundation is insufficient for high-stakes clinical utility. The findings reveal a critical imbalance between the alignment objectives of helpfulness and harmlessness, particularly in domain-specific models. We urgently call for clinician-led "medical red-teaming" and a safety-first alignment objective to systematically reshape the refusal boundaries of medical LLMs, ensuring patient safety before widespread clinical deployment.
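
To make the scoring pipeline concrete, the sketch below shows one way the binary per-prompt compliance scores could be aggregated into the overall and per-dimension rates the abstract reports. This is an illustrative assumption rather than the authors' code: the score matrix, the prompt-to-dimension mapping, and all values are random placeholders.

```python
# Minimal sketch (assumed, not from the paper) of aggregating binary
# ethical-compliance scores into the rates reported in the abstract.
import numpy as np

N_PROMPTS, N_MODELS = 39, 20          # 39 prompts, 20 domestic models
rng = np.random.default_rng(0)

# scores[i, j] = 1 if model j's response to prompt i passed the
# two-stage expert consensus review, else 0. Placeholder data.
scores = rng.integers(0, 2, size=(N_PROMPTS, N_MODELS))

# Hypothetical mapping of each prompt to one of the five ethical dimensions.
dimensions = np.array(
    ["Safety", "Professionalism", "Fairness", "Humanism", "Regulation"]
)[rng.integers(0, 5, size=N_PROMPTS)]

# Overall compliance rate across all models and prompts
# (the abstract reports 91.15% for the 20 domestic models).
print(f"Overall compliance: {scores.mean():.2%}")

# Per-dimension compliance rates: the level at which the "Vertical Model
# Paradox" appears (e.g., 78.00% on Professionalism for domain-specific models).
for dim in np.unique(dimensions):
    print(f"{dim}: {scores[dimensions == dim].mean():.2%}")

# Per-prompt pass rates across models flag systemic blind spots,
# such as the 47.62% clinical-traceability item in the abstract.
per_prompt = scores.mean(axis=1)
print("Weakest prompts (indices):", np.argsort(per_prompt)[:3])
```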
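
The abstract does not name the specific non-parametric tests used. A plausible reading is a Kruskal-Wallis test across the four architecture groups and a Mann-Whitney U test for the open-weights versus closed-source comparison; the sketch below shows how such comparisons would be run on per-model compliance rates. All rates here are invented for illustration and will not reproduce the reported P-values.

```python
# Sketch of the group comparisons suggested by the abstract's P-values.
# Test choices and per-model rates are assumptions, not the authors' method.
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical per-model compliance rates (fraction of 39 prompts passed),
# grouped by the four domestic technical architectures.
rates = {
    "general_purpose":  [0.95, 0.92, 0.90, 0.93, 0.91],
    "domain_specific":  [0.85, 0.87, 0.88, 0.84, 0.86],
    "platform_based":   [0.93, 0.94, 0.90, 0.92, 0.91],
    "search_augmented": [0.92, 0.90, 0.93, 0.94, 0.89],
}

# Non-parametric comparison across the four architectures
# (abstract: P = 0.0735, no significant difference).
h_stat, p_arch = kruskal(*rates.values())
print(f"Kruskal-Wallis across architectures: H = {h_stat:.3f}, P = {p_arch:.4f}")

# Open-weights vs. closed-source ecosystems
# (abstract: P = 0.9664, no significant disparity). Placeholder groupings.
open_weights  = [0.95, 0.90, 0.85, 0.88, 0.93, 0.92, 0.90, 0.94, 0.89, 0.91]
closed_source = [0.92, 0.93, 0.87, 0.84, 0.86, 0.90, 0.94, 0.91, 0.93, 0.92]
u_stat, p_eco = mannwhitneyu(open_weights, closed_source)
print(f"Mann-Whitney open vs. closed: U = {u_stat:.1f}, P = {p_eco:.4f}")
```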
