Pub Date: 2023-03-01; DOI: 10.1109/prmvia58252.2023.00017
Jian-qi Li, Yincong Liang, Rui Du, Jingying Wan, Bin-fang Cao, Hui Liu
Defects generated during the production and transportation of punched nickel-plated steel strips are difficult to detect with deep learning methods. To address this, this paper proposes a lightweight, low-redundancy, high-precision detection method. First, a feature extraction network based on GhostNet is constructed, which reduces computation and feature redundancy while maintaining accuracy. Then, the ECA module is applied to the detection head to perform weighted fusion of the features from different channels for better differentiation. Finally, the YOLO detection head is used for multi-scale detection. In experiments, the method achieved an mAP of 84.86%, indicating that it can be applied to practical steel strip defect detection.
Title: Lightweight defect detection method of punched nickel-plated steel strip based on GhostNet; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
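The channel re-weighting that the abstract above attributes to the ECA module can be illustrated with a small sketch. This is a simplified, pure-Python rendering of the general ECA idea (global average pooling per channel, a 1D convolution across neighbouring channels, then a sigmoid gate), not the authors' implementation. The feature map, the odd-length kernel and all function names are assumptions made for illustration; in a real network the kernel would be learned.

```python
import math


def global_average_pool(feature_map):
    """Reduce a C x H x W nested list to one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def eca_weights(channel_means, kernel):
    """Slide an odd-length 1D kernel across channels, then apply a sigmoid.

    Zero padding keeps the output length equal to the number of channels.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        z = sum(k * padded[c + j] for j, k in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights


def apply_eca(feature_map, kernel):
    """Rescale every channel by its attention weight."""
    weights = eca_weights(global_average_pool(feature_map), kernel)
    return [
        [[w * v for v in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]


# Hypothetical 3-channel, 2x2 feature map (illustrative values only).
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
# Identity kernel, chosen only so the numbers are easy to follow.
kernel = [0.0, 1.0, 0.0]
weights = eca_weights(global_average_pool(features), kernel)
reweighted = apply_eca(features, kernel)
print(weights)
```

Channels with a larger pooled response receive a weight closer to 1, so they contribute more to the fused features passed on to the detection head.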
Pub Date: 2023-03-01; DOI: 10.1109/PRMVIA58252.2023.00011
Min Xiong, Wenming Cao, Jianqi Zhong
With the rapid development of image classification in computer vision, few-shot learning (FSL) has become a research hotspot for training classification models with a small number of samples. FSL aims to efficiently recognize samples from new categories given only a few annotations. Previous works focus on extracting information with a single model and fail to distinguish the differences between data samples. We therefore present a meta-learning-based dual model with knowledge clustering for few-shot image classification, which learns the correlation between the two models and captures the information embedded in the data samples. In addition, we introduce the center loss to cluster samples of the same class, maximizing intra-class similarity and inter-class difference. We adopt multiple meta-learning tasks during the training stage. For each task, the training of the dual models is divided into two phases, which depend on each other under the guidance of the center loss. In the first phase, the first model is trained with a soft label obtained from the predicted label of the second model. The second phase repeats the information exchange of the first phase. We find that the optimal predictions of the active model are close to both the soft and the actual labels. Extensive experimental results on three general benchmarks demonstrate the effectiveness of the proposed method on few-shot classification tasks.
Title: Collaborative Learning-based Dual Network for Few-Shot Image Classification; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
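The center loss mentioned in the abstract above pulls each sample's feature vector toward the center of its class. A minimal sketch of one common form, half the sum of squared distances to the class center, follows. The toy features, the labels and the use of plain class means as centers are assumptions for illustration; the paper's exact formulation and center update rule are not given in the abstract.

```python
def center_loss(features, labels, centers):
    """Half the sum of squared distances from each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return 0.5 * total


def class_means(features, labels):
    """Estimate each class center as the mean of that class's features.

    Assumption: a simplification; in training, centers are usually updated
    incrementally per mini-batch rather than recomputed from scratch.
    """
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, xi in enumerate(x):
            acc[d] += xi
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


# Hypothetical 2-D features for two classes (illustrative values only).
features = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
labels = [0, 0, 1, 1]
centers = class_means(features, labels)
loss = center_loss(features, labels, centers)
print(centers, loss)
```

The loss shrinks as same-class features move closer to their center, which is the clustering effect the abstract describes.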
Pub Date: 2023-03-01; DOI: 10.1109/PRMVIA58252.2023.00018
Yin Liu, Fangqiang Yu, Jinglin Xu, Peikang Xin
Identifying dangerous houses in rural areas is inefficient because of the heavy workload of on-site visits and the patchy, untimely manual registration and management of documents. This study first uses UAV oblique photography to quickly obtain high-resolution aerial images of villages and reconstruct three-dimensional reality models. Then, based on the YOLOv5 algorithm, the features of dangerous houses are automatically detected in the aerial images and mapped to the 3D reality model to accurately locate the dangerous buildings. Finally, a digital management platform for rural dangerous houses is developed to support rural managers in identifying, measuring and tracking dangerous houses. Application in a village along the coast of southern Fujian province showed that the final dangerous-house screening achieved an accuracy of 92% and a coverage rate of 95%. The method can greatly improve the efficiency, accuracy and coverage of dangerous house screening, reduce the workload of manual screening, and improve management efficiency through platform-based and visual methods.
Title: Identification of Dangerous Rural Houses Using Oblique Photogrammetry and Photo Recognition Technology; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
Pub Date: 2023-03-01; DOI: 10.1109/prmvia58252.2023.00054
Lei Jin, Chongxiao Qu, Yongjin Zhang, Changjun Fan, Zhongke Zhu, Shuo Liu
Transfer learning is becoming increasingly popular in both industry and academia. It lets people benefit from advanced AI technologies that used to be accessible only to professional teams with the strongest talent, software and hardware resources. Transfer learning has been shown to be the best available option for applying patterns learned on one problem to a different but related problem. However, little research has evaluated the performance of applying an existing model to a less related problem. In this paper, we apply VGG, a model pre-trained in the computer vision field, to Ionosphere, a radar dataset that is heterogeneous to vision data, and carry out extensive experiments. The results show that the classification accuracy is much lower than that reported in earlier research, and that whether to apply transfer learning should depend on the situation.
Title: Transfer Learning on Trial: A Case Study to Apply Existing Models to Heterogeneous Datasets; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
Pub Date: 2023-03-01; DOI: 10.1109/PRMVIA58252.2023.00023
Yongkang Lan
A new real-coded genetic algorithm is proposed. It discretizes the continuous feasible region and then restores continuity and completeness through a mutation operator and a local search operator, thus combining the uniformity of discretization with the continuity of the genetic algorithm. In comparisons with the binary genetic algorithm, differential evolution (DE), particle swarm optimization (PSO), simulated annealing (SA), and artificial bee colony (ABC), the proposed algorithm outperforms the others on all test functions. The algorithm is also applied to optimizing the weights of neural networks, where it obtains excellent results, which validates its effectiveness.
Title: Binary-like Real Coding Genetic Algorithm; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
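The general idea in the abstract above, starting from a uniform discretization of the feasible region and letting mutation and local search reach the points between grid values, can be sketched roughly as follows. This is not the paper's algorithm: the operators, the sphere test function, the parameter values and every name below are assumptions chosen only to make the idea concrete.

```python
import random


def sphere(x):
    """Test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


def decode(index, lo, hi, levels):
    """Map a grid index in [0, levels - 1] to a real value in [lo, hi]."""
    return lo + (hi - lo) * index / (levels - 1)


def run_ga(fitness, dim=2, lo=-5.0, hi=5.0, levels=33, pop_size=20,
           generations=40, seed=0):
    rng = random.Random(seed)
    step = (hi - lo) / (levels - 1)
    # The initial population lives on the uniform grid (the discretized region).
    pop = [[decode(rng.randrange(levels), lo, hi, levels) for _ in range(dim)]
           for _ in range(pop_size)]
    initial_best = min(fitness(ind) for ind in pop)
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]
        children = [list(pop[0])]  # elitism: the best individual always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene is copied from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: a continuous perturbation lets a gene leave the grid.
            if rng.random() < 0.3:
                d = rng.randrange(dim)
                child[d] = min(hi, max(lo, child[d] + rng.uniform(-step, step)))
            children.append(child)
        # Local search: nudge the best child by up to half a cell, keep if better.
        best = min(children, key=fitness)
        trial = [min(hi, max(lo, v + rng.uniform(-step / 2, step / 2)))
                 for v in best]
        if fitness(trial) < fitness(best):
            children[children.index(best)] = trial
        pop = children
    best = min(pop, key=fitness)
    return best, fitness(best), initial_best


best, best_fit, initial_best = run_ga(sphere)
print(best, best_fit, initial_best)
```

Because the best individual is always carried over and the local search only accepts improvements, the best fitness never gets worse from one generation to the next in this sketch.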
Pub Date: 2023-03-01; DOI: 10.1109/prmvia58252.2023.00024
Gengchen Yu, Birui Shao
With the improvement of living standards, garbage classification is gradually becoming mandatory. However, limited public awareness and knowledge make it difficult for the accuracy of garbage classification and disposal to keep pace with changing guidelines. Considering the low efficiency, heavy workload and poor working environment of manual garbage classification, an improved YOLOv7 object detection method is proposed for effective garbage classification. In this study, the recursive gated convolution (gnconv) is used to build the HorNet network architecture, and the model is trained on a purpose-built dataset. The C3HB module is added to the YOLO model, and the pooling layer is optimized to replace SPPFCSPC, improving detection accuracy. Experiments show that the mAP, accuracy and recall of the proposed model on garbage datasets are 99.25%, 99.33% and 98.03%, respectively, which are 1.50%, 3.99% and 1.41% higher than those of YOLOv7. The overall results are better than those of the original model.
Title: Garbage Classification and Detection Based on Improved YOLOv7 Network; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
Pub Date: 2023-03-01; DOI: 10.1109/PRMVIA58252.2023.00031
Xi Zhang, W. Wang, Jing Chen
Dynamic tolling adjusts toll rates according to changing road traffic conditions in order to alleviate congestion and improve commuting efficiency. To address the dynamic toll collection problem on Chinese expressways, we design a reinforcement learning simulation environment for China’s expressway network and propose a reinforcement learning dynamic tolling model based on an a priori lane selection strategy that adapts to the characteristics of the network and travelers’ habits. Experiments show that reinforcement learning-based dynamic tolling can increase total revenue by more than 10% compared with a fixed-rate tolling scheme while keeping the congestion rate low. In addition, ablation experiments demonstrate that the lane selection model based on a priori knowledge better balances the "total revenue", "system throughput" and "total system travel time" of the road network under the joint optimization objective.
Title: A Priori Lane Selection Strategy for Reinforcement Learning of Dynamic Expressway Tolling; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
Pub Date: 2023-03-01; DOI: 10.1109/PRMVIA58252.2023.00016
Yan Wang, Haijiang Zhu, Yutong Liu
Inspecting whether personnel are wearing safety protective clothing is of great practical significance for safe production in coal chemical plants. At present, coal chemical plants rely on manual inspection or traditional object detection methods for personnel safety inspection. However, clothing detection accuracy is seriously reduced by the installation position of cameras and changes in light intensity. A dual-attention method based on YOLOv5 is proposed for object detection in the coal chemical industry. Two attention modules, the Efficient Channel Attention (ECA) module and the Pyramid Split Attention (PSA) module, are integrated into the Spatial Pyramid Pooling (SPP) module and the Bottleneck module of the YOLOv5 network. As a result, more global context information is obtained to compensate for the lack of global convolution, and the ability to extract features and learn multi-scale information is enhanced. The Safety Helmet Wearing Detect dataset (SHWD) and a self-made dataset are used to demonstrate the effectiveness of the improved method. Compared with the original YOLOv5 algorithm, the improved method increases average accuracy by 2.7% across different thresholds. Numerous comparative experiments further verify the feasibility of the improved method.
Title: DA-YOLOv5: Improved YOLOv5 based on Dual Attention for Object Detection on Coal Chemical Industry; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
Pub Date: 2023-03-01; DOI: 10.1109/PRMVIA58252.2023.00008
Xiaosheng Wen, Ping Jian
Traditional dense captioning describes local details of an image in natural language. It usually performs object detection first and then describes the contents of each detected bounding box, which makes the descriptions rich. However, captioning based on object detection often pays little attention to the associations between objects and the environment, or among objects. Moreover, no existing dense captioning method can handle irregular regions. To solve these problems, we propose a region division method based on visual saliency, which focuses more on areas than on objects alone. Based on this division, local descriptions of irregular regions are generated. For each area, we combine the image with the target area to generate features, which are fed into the caption model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline under the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model performs well in image retrieval experiments and has less information redundancy. In the application, users can manually select a region of interest on the image for description, which helps expand the dataset.
Title: Image Dense Captioning of Irregular Regions Based on Visual Saliency; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
Pub Date: 2023-03-01; DOI: 10.1109/prmvia58252.2023.00015
Xin Xu, Haixia Pan, Hongqiang Wang, Yefan Cao
Driver-assistance systems tend to fuse multi-modal sensor data, for instance from infrared and RGB sensors, to detect intruding objects and enhance driving safety. However, the semantic misalignment and the spectral imbalance between infrared and RGB images make it hard to exploit the advantages of multiple sensors in an end-to-end learning system. To solve these problems, we apply the widely used affine transformation to our railway dataset to resolve the semantic misalignment issue. In addition, we propose a fusion module, DMF, to fuse the well-aligned features, which bridges the domain gap between different sensors. Building on these, we propose an efficient railway invasive object detection network, YOLOv5s-DMF. Compared with state-of-the-art methods, YOLOv5s-DMF substantially reduces the MR by 14.23% by employing the well-established decoupled head. YOLOv5s-DMF further increases mAP@0.5 by 5.7% and mAP@0.5:0.95 by 4.1%.
Title: Object Detection Algorithm for Railway Scenes Based on Infrared and RGB Image Fusion; Venue: 2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)
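The affine alignment step mentioned in the abstract above maps coordinates from one sensor's image into the other's. A minimal sketch of applying a 2x3 affine matrix to points and bounding boxes follows. The matrix values are a made-up calibration for illustration only; the abstract does not say how the authors estimate or apply their transformation.

```python
def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to an (x, y) point."""
    (a, b, tx), (c, d, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def warp_box(matrix, box):
    """Transform the four corners of a box (x1, y1, x2, y2).

    Returns the axis-aligned box that encloses the transformed corners.
    """
    x1, y1, x2, y2 = box
    corners = [apply_affine(matrix, p)
               for p in ((x1, y1), (x2, y1), (x2, y2), (x1, y2))]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical calibration: infrared pixels scaled by 2 and shifted by (10, -5).
ir_to_rgb = [[2.0, 0.0, 10.0],
             [0.0, 2.0, -5.0]]
rgb_point = apply_affine(ir_to_rgb, (3.0, 4.0))
rgb_box = warp_box(ir_to_rgb, (0.0, 0.0, 5.0, 5.0))
print(rgb_point, rgb_box)
```

Once both modalities share one coordinate frame, features from corresponding locations can be fused, which is the precondition the abstract gives for the DMF module.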