Multimodal RAG for Financial Documents: BART-Based Financial Named Entity Recognition and Attention-based Table Parsing for Financial QA Enhancement
Abstract
Document retrieval plays a vital role in the financial domain, particularly in investment decision-making, risk assessment, and market regulation. Financial documents often contain complex multimodal data, including text, tables, and charts, and both parsing and question answering over such documents remain error-prone. First, to address the insufficient semantic relevance between large language model responses and their queries in complex Chinese financial long-text scenarios, we propose a BART-based named entity recognition (NER) approach combined with a prompt-guided strategy. By explicitly capturing and modeling entity and relation information, the model improves the accuracy of entity recognition and semantic understanding, while also enhancing its logical reasoning and interpretability. Second, to address parsing errors in multimodal tabular data, we introduce a financial-domain-specific table structure recognition model that improves table parsing accuracy, significantly reduces GPU memory consumption, and ultimately enhances the answer accuracy of the multimodal RAG system. Third, to address the lack of high-quality named entity annotation data in the financial domain, we construct a Chinese financial multimodal NER dataset to support multimodal RAG models. Experimental results demonstrate the effectiveness of our approach in enhancing both table parsing performance and answer generation for multimodal financial documents. Our table structure recognition method requires only 1.5 GB of GPU memory, and the RAG approach achieves an Answer Correctness score of 48% on Ragas. More information and access to our code are available at our GitHub repository: https://github.com/LeKit089/NER_MultimodalRAG.
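For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Ragas-style Answer Correctness score combines a statement-level F1 with a semantic similarity term. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the statement counts and the similarity value are passed in directly (in Ragas they come from an LLM judge and an embedding model), and the 0.75/0.25 weighting is the Ragas default, which the abstract does not confirm was used.

```python
def factual_f1(tp: int, fp: int, fn: int) -> float:
    """F1 over answer statements.

    tp: statements in the answer that are supported by the ground truth
    fp: statements in the answer that are not supported by the ground truth
    fn: ground-truth statements missing from the answer
    """
    denom = tp + 0.5 * (fp + fn)
    return tp / denom if denom > 0 else 0.0


def answer_correctness(
    tp: int,
    fp: int,
    fn: int,
    semantic_similarity: float,
    weights: tuple[float, float] = (0.75, 0.25),  # assumed: Ragas default weighting
) -> float:
    """Weighted average of factual F1 and semantic similarity, in [0, 1]."""
    w_fact, w_sem = weights
    total = w_fact + w_sem
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    if not 0.0 <= semantic_similarity <= 1.0:
        raise ValueError("semantic_similarity must lie in [0, 1]")
    return (w_fact * factual_f1(tp, fp, fn) + w_sem * semantic_similarity) / total


if __name__ == "__main__":
    # Hypothetical example: 2 supported statements, 1 unsupported, 1 missing,
    # and an embedding similarity of 0.8 between answer and ground truth.
    print(round(answer_correctness(tp=2, fp=1, fn=1, semantic_similarity=0.8), 3))
```

Under this reading, a score of 48% means the weighted blend of factual overlap and semantic similarity, averaged over the evaluation questions, is 0.48.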