Using Multimodal Large Language Models for False Alarm Reduction in Image-based Fire Detection

Qie Gao
Haihui Wang
Zhenhai Qin
Linhao Fan
Kang Li
Chong Wang
Qixing Zhang

Abstract

Existing vision-based methods suffer from high false alarm rates in urban flame detection. Using Multimodal Large Language Models (MLLMs) as a secondary filter shows great potential for reducing false alarms, yet MLLMs have high inference latency and are prone to reasoning collapse on negative samples without explicit Chain-of-Thought (CoT) guidance. To overcome these challenges, this study proposes Flash-Cascade, the first sub-second MLLM-based firewall to leverage CoT to efficiently filter false alarms. We deconstruct the flame detection process into four logical stages (planning, observation, analysis, and judgment), which inform the design of three switchable reasoning modes (Detailed, Quick, and Rapid) that accelerate inference via CoT compression. We fine-tune Qwen2-VL-7B-Instruct on a multi-grained instruction dataset via Low-Rank Adaptation. This process internalizes explicit reasoning logic into implicit parameter representations, enabling the model to maintain robust reasoning capability even without explicit CoT guidance. On our newly constructed benchmark incorporating real-world hard negatives, Flash-Cascade achieves an accuracy of 97.79% and an F1-score of 0.9767 in Rapid mode, outperforming the baseline by 61.63 percentage points (pp) in accuracy and 0.5152 in F1-score. Furthermore, it outperforms the state-of-the-art object detector DEIMv2 by 14.64 pp in accuracy. The method exhibits exceptional sample efficiency, converging with only 600 samples and 2 epochs, and improves inference speed by 810% over standard CoT. This study opens a door to robust and efficient flame detection in high-interference scenarios.
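
The secondary-filtering idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the `Candidate` type, the `cascade_filter` and `build_prompt` helpers, the prompt wording, the detector threshold, and in particular the mapping from each reasoning mode to the stages it writes out (the abstract names the four stages and three modes but does not say which stages each mode keeps). The MLLM call is replaced by a stub so the example runs on its own.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Mode(Enum):
    DETAILED = "detailed"
    QUICK = "quick"
    RAPID = "rapid"


# The four stages named in the abstract.
STAGES = ("planning", "observation", "analysis", "judgment")

# ASSUMPTION: which stages each mode spells out is hypothetical; the
# abstract only says the modes compress the chain of thought.
STAGES_BY_MODE = {
    Mode.DETAILED: STAGES,
    Mode.QUICK: ("observation", "judgment"),
    Mode.RAPID: ("judgment",),
}


@dataclass
class Candidate:
    image_id: str          # hypothetical identifier for a detected region
    detector_score: float  # confidence from the first-stage detector


def build_prompt(mode: Mode) -> str:
    """Build a verification prompt listing the stages to write out."""
    steps = ", then ".join(STAGES_BY_MODE[mode])
    return (
        "Decide whether the image region shows a real flame. "
        f"Write out these steps: {steps}. "
        "End with the single word FIRE or NO_FIRE."
    )


def cascade_filter(
    candidates: List[Candidate],
    verify: Callable[[str, str], str],
    mode: Mode = Mode.RAPID,
    detector_threshold: float = 0.5,  # hypothetical value
) -> List[Candidate]:
    """Keep only detector alarms that the second-stage verifier confirms."""
    prompt = build_prompt(mode)
    confirmed = []
    for cand in candidates:
        if cand.detector_score < detector_threshold:
            continue  # the first stage did not raise an alarm
        tokens = verify(cand.image_id, prompt).split()
        verdict = tokens[-1].upper() if tokens else ""
        if verdict == "FIRE":
            confirmed.append(cand)
    return confirmed


def stub_verify(image_id: str, prompt: str) -> str:
    """Stand-in for an MLLM call; rejects a known hard negative."""
    return "NO_FIRE" if "sunset" in image_id else "FIRE"


alarms = cascade_filter(
    [
        Candidate("kitchen_flame", 0.9),
        Candidate("sunset_glow", 0.8),
        Candidate("dark_frame", 0.1),
    ],
    stub_verify,
)
print([c.image_id for c in alarms])
```

In this sketch the shorter prompts in the Quick and Rapid modes stand in for CoT compression: fewer written-out stages mean fewer generated tokens, which is where a latency saving would come from. A real deployment would replace `stub_verify` with a call to the fine-tuned model.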

Version published to 10.21203/rs.3.rs-8847038/v1 on Research Square
Feb 24, 2026

Related articles

HyFiD: LLM-ML Hybrid Framework for Subway Fire Detection

This article has 5 authors:
1. Kihwan Ko
2. Ikgeun Kwon
3. Yujin Kang
4. Hee-Dong Kim
5. Yoon-Sik Cho
This article has no evaluations. Latest version: Mar 3, 2026

Scalable Automated Video Labeling for Early Wildfire Smoke Detection with Fast-Then-Precise Two-Stage Inference

This article has 3 authors:
1. Srikantnag Angondalli Nagaraja
2. Imre Bartos
3. Chang Zhao
This article has no evaluations. Latest version: Feb 25, 2026

Real-time detection of fires and smoke in healthcare facilities using advanced deep learning models on live video streams of surveillance cameras

This article has 6 authors:
1. Mostafa Rizk
2. Houssein Taleb
3. Ali Rhayem
4. Jad Abou Chaaya
5. Chamseddine Zaki
6. Abbass Nasser
This article has no evaluations. Latest version: Mar 24, 2026
