Xinzhi Wang, Mengyue Li, Mingke Gao, Quanyi Liu, Zhennan Li, Luyao Kou
Early smoke and flame detection based on transformer
Journal of Safety Science and Resilience, vol. 4, no. 3, pp. 294-304. Published 2023-09-01.
DOI: 10.1016/j.jnlssr.2023.06.002
URL: https://www.sciencedirect.com/science/article/pii/S2666449623000282
Journal impact factor: 3.7 (JCR Q1, Public, Environmental & Occupational Health)
Citations: 3
Abstract
Fire-detection technology plays a critical role in ensuring public safety and facilitating the development of smart cities. Early fire detection is imperative to mitigate potential hazards and minimize associated losses. However, existing vision-based fire-detection methods generalize poorly and do not adequately account for the effect of fire object size on detection accuracy. To address these issues, this study uses a decoder-free fully transformer-based (DFFT) detector for early smoke and flame detection, improving detection performance for fires of different sizes. The method captures multi-level, multi-scale fire features with rich semantic information and uses two powerful encoders to maintain the accuracy of prediction from a single feature map. First, data augmentation is performed to enhance the generalizability of the model. Second, the detection-oriented transformer (DOT) backbone network is treated as a single-layer fire-feature extractor to obtain fire-related features at four scales, which are then fed into an encoder-only single-layer dense prediction module. Finally, the prediction module aggregates the multi-scale fire features into a single feature map using a scale-aggregated encoder (SAE) and then aligns the classification and regression features using a task-aligned encoder (TAE) to ensure semantic interaction between the classification and regression predictions. Experimental results on one private dataset and one public dataset demonstrate that the adopted DFFT achieves high detection accuracy and strong generalizability for fires of different sizes, particularly early small fires. The DFFT achieved mean average precision (mAP) values of 87.40% and 81.12% for the two datasets, outperforming the baseline models. It detects flame objects better than smoke objects because flame features are more prominent.
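The mAP figures reported above average a per-class average precision (AP) over the two object classes, smoke and flame. The abstract does not state the IoU threshold or the interpolation scheme, so the following is only a minimal sketch that assumes all-point interpolation of the precision-recall curve and assumes each detection has already been marked as a true or false positive. The function names and the toy detections are hypothetical and are not taken from the paper.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """AP for one class: area under the interpolated precision-recall curve.

    Assumption: detections were already matched to ground-truth boxes
    (e.g. by an IoU threshold), so each one carries a true/false-positive flag.
    """
    if num_ground_truth == 0:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(zip(scores, is_true_positive), key=lambda pair: -pair[0])

    recalls, precisions = [0.0], [0.0]
    tp = fp = 0
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: each point takes the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangles between consecutive recall values.
    return sum(
        (recalls[i] - recalls[i - 1]) * precisions[i]
        for i in range(1, len(recalls))
    )


def mean_average_precision(per_class):
    """mAP: unweighted mean of the per-class AP values."""
    aps = [average_precision(*args) for args in per_class.values()]
    return sum(aps) / len(aps)


# Toy detections (illustrative only, not data from the paper):
# class -> (confidence scores, true-positive flags, number of ground-truth boxes)
toy_detections = {
    "flame": ([0.9, 0.8, 0.7], [1, 1, 0], 2),
    "smoke": ([0.9, 0.6, 0.5], [1, 0, 1], 2),
}

if __name__ == "__main__":
    print(f"mAP = {mean_average_precision(toy_detections):.4f}")
```

In this toy case the flame class reaches full recall before its first false positive, whereas the smoke class recovers its second object only after a false positive, so its AP is lower. A gap in this direction matches the one the abstract describes between flame and smoke detection, although the numbers here are invented.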