MFCS-based Barrier-Free Audio Description Production System for Film and Television Programs
Abstract
With the rapid advancement of artificial intelligence technology, the field of accessible film and television production is encountering unprecedented opportunities and challenges. Traditional audio description production processes are often complex, costly, and labour-intensive, and struggle to meet the growing and evolving demands for accessibility. In this paper, we propose and implement an automatic audio description generation system for accessible film and television programs based on the Model Function Calling Standard (MFCS). Leveraging the powerful semantic understanding and generation capabilities of large language models (LLMs), combined with MFCS's standardized tool invocation mechanism, the system integrates external tools and services such as natural language processing APIs, speech recognition and synthesis engines, and multilingual translation models. It achieves automatic generation of audio descriptions, multilingual support, emotional adaptation, and speech synthesis, constituting an efficient, flexible, and scalable audio description production platform. Experimental results show that the system significantly improves the efficiency and quality of audio description production and reduces costs, thereby providing visually impaired audiences with a more diverse and personalized viewing experience. Furthermore, we explore the potential for further application of MFCS in accessible film and television production, offering new insights for promoting the widespread adoption of accessible information services and technological innovation.
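The abstract describes an LLM that drives external tools (TTS engines, translation models) through MFCS's standardized invocation mechanism. The MFCS wire format is not reproduced here, so the sketch below is a hypothetical illustration only: the JSON schema (`name`/`arguments`), the tool registry, and the `synthesize_speech` tool are all assumptions modeled on common LLM function-calling conventions, not the paper's actual specification.

```python
import json

# Hypothetical tool registry; tool names and signatures are illustrative,
# not taken from the MFCS specification.
TOOL_REGISTRY = {}

def register_tool(name):
    """Decorator that exposes a Python function as a callable tool."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator

@register_tool("synthesize_speech")
def synthesize_speech(text, language="en", emotion="neutral"):
    # Stand-in for a real speech-synthesis engine call; returns a tag
    # describing the audio that would be produced.
    return f"<audio:{language}:{emotion}:{text}>"

def dispatch(call_json):
    """Parse a model-emitted tool call (assumed JSON shape) and invoke it."""
    call = json.loads(call_json)
    fn = TOOL_REGISTRY[call["name"]]
    return fn(**call.get("arguments", {}))

# Example: the LLM emits a structured call to voice a scene description
# with an emotion matched to the scene's mood.
model_output = json.dumps({
    "name": "synthesize_speech",
    "arguments": {"text": "A man walks alone into the rain.",
                  "emotion": "somber"},
})
print(dispatch(model_output))
```

The point of the pattern is that the LLM only emits structured calls; each engine (TTS, translation, recognition) is registered once behind a uniform interface, which is what makes the pipeline extensible to new tools without changing the model prompt logic.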