{"title":"用于流量检测的多模态融合YoLo网络","authors":"Xinwang Zheng, Wenjie Zheng, Chujie Xu","doi":"10.1111/coin.12615","DOIUrl":null,"url":null,"abstract":"<p>Traffic detection (including lane detection and traffic sign detection) is one of the key technologies to realize driving assistance system and auto drive system. However, most of the existing detection methods are designed based on single-modal visible light data, when there are dramatic changes in lighting in the scene (such as insufficient lighting in night), it is difficult for these methods to obtain good detection results. In view of multi-modal data can provide complementary discriminative information, based on the YoLoV5 model, this paper proposes a multi-modal fusion YoLoV5 network, which consists of three key components: the dual stream feature extraction module, the correlation feature extraction module, and the self-attention fusion module. Specifically, the dual stream feature extraction module is used to extract the features of each of the two modalities. Secondly, input the features learned from the dual stream feature extraction module into the correlation feature extraction module to learn the features with maximum correlation. Then, the extracted maximum correlation features are used to achieve information exchange between modalities through a self-attention mechanism, and thus obtain fused features. Finally, the fused features are inputted into the detection layer to obtain the final detection result. Experimental results on different traffic detection tasks can demonstrate the superiority of the proposed method.</p>","PeriodicalId":55228,"journal":{"name":"Computational Intelligence","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A multi-modal fusion YoLo network for traffic detection\",\"authors\":\"Xinwang Zheng, Wenjie Zheng, Chujie Xu\",\"doi\":\"10.1111/coin.12615\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Traffic detection (including lane detection and traffic sign detection) is one of the key technologies to realize driving assistance system and auto drive system. However, most of the existing detection methods are designed based on single-modal visible light data, when there are dramatic changes in lighting in the scene (such as insufficient lighting in night), it is difficult for these methods to obtain good detection results. In view of multi-modal data can provide complementary discriminative information, based on the YoLoV5 model, this paper proposes a multi-modal fusion YoLoV5 network, which consists of three key components: the dual stream feature extraction module, the correlation feature extraction module, and the self-attention fusion module. Specifically, the dual stream feature extraction module is used to extract the features of each of the two modalities. Secondly, input the features learned from the dual stream feature extraction module into the correlation feature extraction module to learn the features with maximum correlation. Then, the extracted maximum correlation features are used to achieve information exchange between modalities through a self-attention mechanism, and thus obtain fused features. Finally, the fused features are inputted into the detection layer to obtain the final detection result. 
Experimental results on different traffic detection tasks can demonstrate the superiority of the proposed method.</p>\",\"PeriodicalId\":55228,\"journal\":{\"name\":\"Computational Intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-11-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/coin.12615\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/coin.12615","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A multi-modal fusion YoLo network for traffic detection
Traffic detection (including lane detection and traffic sign detection) is one of the key technologies for realizing driver assistance and autonomous driving systems. However, most existing detection methods are designed for single-modal visible-light data; when scene lighting changes dramatically (for example, insufficient illumination at night), these methods struggle to obtain good detection results. Since multi-modal data can provide complementary discriminative information, this paper proposes a multi-modal fusion YoLoV5 network built on the YoLoV5 model. It consists of three key components: a dual-stream feature extraction module, a correlation feature extraction module, and a self-attention fusion module. Specifically, the dual-stream feature extraction module extracts features from each of the two modalities. The features learned by the dual-stream module are then fed into the correlation feature extraction module to learn maximally correlated features. Next, the extracted maximum-correlation features are used to exchange information between the modalities through a self-attention mechanism, yielding fused features. Finally, the fused features are passed to the detection layer to obtain the final detection result. Experimental results on different traffic detection tasks demonstrate the superiority of the proposed method.
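
To make the described pipeline concrete, the minimal PyTorch sketch below wires the three components together in the order given in the abstract: dual-stream extraction, correlation projection, self-attention fusion, and a detection head. All class names, layer sizes, and the simple convolutional streams and head are illustrative assumptions; the paper's actual network is built on YoLoV5 and is not reproduced here.

```python
# Minimal sketch of the fusion pipeline described in the abstract.
# Module names, layer sizes, and the 1x1-conv "correlation" projection are
# assumptions for illustration only; they are not the authors' implementation.
import torch
import torch.nn as nn


class DualStreamBackbone(nn.Module):
    """Extracts per-modality features with two independent CNN streams."""
    def __init__(self, channels=64):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
            )
        self.rgb_stream = stream()   # visible-light branch
        self.aux_stream = stream()   # second-modality branch (e.g., infrared)

    def forward(self, rgb, aux):
        return self.rgb_stream(rgb), self.aux_stream(aux)


class CorrelationModule(nn.Module):
    """Projects both streams into a shared space to emphasize correlated features."""
    def __init__(self, channels=64):
        super().__init__()
        self.proj_rgb = nn.Conv2d(channels, channels, 1)
        self.proj_aux = nn.Conv2d(channels, channels, 1)

    def forward(self, f_rgb, f_aux):
        return self.proj_rgb(f_rgb), self.proj_aux(f_aux)


class SelfAttentionFusion(nn.Module):
    """Exchanges information between modalities via multi-head attention over tokens."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.out = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_rgb, f_aux):
        b, c, h, w = f_rgb.shape
        # Stack both modalities into one token sequence of shape (B, 2*H*W, C).
        tokens = torch.cat([f_rgb, f_aux], dim=2).flatten(2).transpose(1, 2)
        fused, _ = self.attn(tokens, tokens, tokens)
        fused = fused.transpose(1, 2).reshape(b, c, 2 * h, w)
        f_rgb_a, f_aux_a = fused.split(h, dim=2)
        return self.out(torch.cat([f_rgb_a, f_aux_a], dim=1))


class MultiModalFusionDetector(nn.Module):
    """Backbone -> correlation -> attention fusion -> placeholder detection head."""
    def __init__(self, channels=64, num_outputs=85):
        super().__init__()
        self.backbone = DualStreamBackbone(channels)
        self.correlation = CorrelationModule(channels)
        self.fusion = SelfAttentionFusion(channels)
        self.head = nn.Conv2d(channels, num_outputs, 1)  # stand-in for a YOLO head

    def forward(self, rgb, aux):
        f_rgb, f_aux = self.backbone(rgb, aux)
        f_rgb, f_aux = self.correlation(f_rgb, f_aux)
        return self.head(self.fusion(f_rgb, f_aux))


if __name__ == "__main__":
    model = MultiModalFusionDetector()
    rgb = torch.randn(1, 3, 64, 64)   # visible-light image
    aux = torch.randn(1, 3, 64, 64)   # second-modality image
    print(model(rgb, aux).shape)      # torch.Size([1, 85, 16, 16])
```

The sketch keeps the two modality streams separate until after the correlation projection, so the self-attention stage is the only place where cross-modal information is exchanged, matching the order of operations described in the abstract.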
Journal introduction:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.