MultiMed-ST Datasets for Machine Translation in Medical Applications

Abstract

Multilingual automatic speech recognition (ASR) in the medical domain is a crucial task for applications that build on it, such as speech translation (ST). Medical ST improves patient care by enabling communication across language barriers, easing shortages of specialized personnel, and supporting better diagnosis and treatment, particularly during pandemics. This study presents a systematic investigation of medical ST and introduces MultiMed-ST, a comprehensive ST dataset for the medical domain that covers all source–target directions across five languages (Vietnamese, English, German, French, and Traditional Chinese), together with the corresponding models. With a total of 290,000 samples, the corpus is the largest medical machine translation (MT) dataset and the largest many-to-many multilingual ST dataset in any domain. We also present what is, to the best of our knowledge, the most extensive ST analysis in the field, including empirical baselines, bilingual–multilingual comparisons, end-to-end versus cascaded models, task-specific versus multi-task sequence-to-sequence (seq2seq) evaluations, and both quantitative and qualitative error assessments.
