2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA): Latest Papers

Lightweight defect detection method of punched nickel-plated steel strip based on GhostNet
Jian-qi Li, Yincong Liang, Rui Du, Jingying Wan, Bin-fang Cao, Hui Liu
Defects that arise during the production and transportation of punched nickel-plated steel strips are difficult to detect with deep learning methods. This paper proposes a lightweight, low-redundancy, high-precision detection method. First, a feature extraction network based on GhostNet is constructed, which reduces computation and feature redundancy while preserving accuracy. Then, the ECA module is applied to the detection head to perform weighted fusion of features from different channels for better discrimination. Finally, the YOLO detection head is used for multi-scale detection. In experiments, the method achieved an mAP of 84.86%, which indicates that it can be applied to real steel strip defect detection.
DOI: https://doi.org/10.1109/prmvia58252.2023.00017 (published 2023-03-01)
Citations: 0
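The channel-weighting step that the steel strip abstract attributes to the ECA module can be illustrated with a small sketch. This is a generic Efficient Channel Attention computation (global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid gate), not the authors' implementation. The function names, the fixed kernel size of 3, and the uniform kernel weights are assumptions for illustration; in a trained network the kernel weights are learned.

```python
import math


def eca_weights(channel_means, kernel_size=3):
    """Return one sigmoid gate per channel from a 1D convolution over channel means."""
    # Hypothetical uniform kernel; a real ECA layer learns these weights.
    kernel = [1.0 / kernel_size] * kernel_size
    pad = kernel_size // 2
    # Zero-pad so the first and last channels also see a full window.
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    gates = []
    for i in range(len(channel_means)):
        s = sum(k * padded[i + j] for j, k in enumerate(kernel))
        gates.append(1.0 / (1.0 + math.exp(-s)))
    return gates


def eca(feature_map):
    """Rescale each channel of feature_map (a list of H x W nested lists) by its gate."""
    means = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    gates = eca_weights(means)
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
```

Each output channel keeps its spatial shape and is only scaled by a value between 0 and 1, which is why the module adds very few parameters to a lightweight detector.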
Collaborative Learning-based Dual Network for Few-Shot Image Classification
Min Xiong, Wenming Cao, Jianqi Zhong
With the rapid development of image classification in computer vision, few-shot learning (FSL) has become a research hotspot for training classification models from a small number of samples. FSL aims to efficiently recognize and process samples of new categories with few annotations. Previous work focuses on information extraction with a single model and does not distinguish the differences between data samples. We therefore present a meta-learning-based dual model with knowledge clustering for few-shot image classification, which tries to learn the correlation between the two models and capture the information embedded in the data samples. In addition, we introduce the center loss to cluster samples of the same class, maximizing intra-class similarity and inter-class difference. During training we adopt multiple meta-learning tasks. For each task, the training of the dual models is divided into two phases that depend on each other under the guidance of the center loss. In the first phase, the first model is trained with a soft label obtained from the predicted label of the second model. The second phase repeats the information exchange of the first phase. We find that the optimal predictions of the active model are close to both the soft and the actual labels. Extensive experiments on three general benchmarks illustrate the effectiveness of the proposed method on few-shot classification tasks.
DOI: https://doi.org/10.1109/PRMVIA58252.2023.00011 (published 2023-03-01)
Citations: 0
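The center loss mentioned in the few-shot abstract can be sketched as follows. This is the generic form of the loss (half the squared distance between each feature and the center of its class), not the authors' training code. Two choices here are assumptions: the centers are computed as per-class means of the batch rather than maintained as learned parameters, and the loss is averaged over the batch.

```python
def class_centers(features, labels):
    """Return {label: mean feature vector} for the given batch."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def center_loss(features, labels, centers):
    """Half the mean squared Euclidean distance from each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(x, centers[y]))
    return 0.5 * total / len(features)
```

Minimizing this term pulls samples of the same class together; separating different classes is left to the classification loss it is combined with.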
Identification of Dangerous Rural Houses Using Oblique Photogrammetry and Photo Recognition Technology
Yin Liu, Fangqiang Yu, Jinglin Xu, Peikang Xin
Identifying dangerous houses in rural areas is inefficient because of the heavy workload of on-site visits and the patchy, untimely manual registration of documents. This study first uses UAV oblique photography to quickly obtain high-resolution aerial images of villages and reconstruct three-dimensional reality models. Then, based on the YOLOv5 algorithm, the features of dangerous houses in the aerial images are detected automatically and mapped onto the real 3D model to accurately locate the dangerous buildings. Finally, a digital management platform for rural dangerous houses is developed to support rural managers in identifying, measuring, and tracking dangerous houses. Application in a village on the coast of southern Fujian province showed that the final screening accuracy of this method was 92% and its coverage rate was 95%. The method can greatly improve the efficiency, accuracy, and coverage of dangerous-house screening, reduce the workload of manual screening, and improve management efficiency through platform-based and visual methods.
DOI: https://doi.org/10.1109/PRMVIA58252.2023.00018 (published 2023-03-01)
Citations: 0
Transfer Learning on Trial: A Case Study to Apply Existing Models to Heterogeneous Datasets
Lei Jin, Chongxiao Qu, Yongjin Zhang, Changjun Fan, Zhongke Zhu, Shuo Liu
Transfer learning is becoming increasingly popular in both industry and academia. It lets people benefit from advanced AI technologies that used to be accessible only to professional teams with the strongest talent, software, and hardware resources. Transfer learning has been shown to be the best available option for applying patterns learned on one problem to a different but related problem. However, little research has evaluated the performance of applying an existing model to a less related problem. In this paper, we apply VGG, a model pre-trained in the computer vision field, to Ionosphere, a radar dataset that is heterogeneous to such vision data, and carry out extensive experiments. The results show that the classification accuracy is much lower than that reported in earlier research, and that whether to apply transfer learning should depend on the situation.
DOI: https://doi.org/10.1109/prmvia58252.2023.00054 (published 2023-03-01)
Citations: 0
Binary-like Real Coding Genetic Algorithm
Yongkang Lan
A new real coding genetic algorithm is proposed. It discretizes the continuous feasible region and then makes it continuous and complete again through a mutation operator and a local search operator, thereby unifying the discretization and continuity of the genetic algorithm. Compared with the binary genetic algorithm, differential evolution (DE), particle swarm optimization (PSO), simulated annealing (SA), and artificial bee colony (ABC), the proposed algorithm performs best on all test functions. The algorithm is also applied to optimizing the weights of neural networks, where it obtains excellent results, which validates its effectiveness.
DOI: https://doi.org/10.1109/PRMVIA58252.2023.00023 (published 2023-03-01)
Citations: 0
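The genetic algorithm abstract does not give its operators in detail, so the following is only one possible reading of "discretize, then restore continuity by mutation". It assumes a uniform grid of 2**bits points over the interval [lo, hi] and a mutation that perturbs a decoded value by at most half a grid step, so that points between grid nodes become reachable. All function names and parameters are hypothetical.

```python
import random


def grid_step(lo, hi, bits):
    """Spacing between neighbouring grid points on [lo, hi]."""
    return (hi - lo) / (2 ** bits - 1)


def decode(gene, lo, hi, bits):
    """Map an integer gene in [0, 2**bits - 1] to a grid point in [lo, hi]."""
    return lo + (hi - lo) * gene / (2 ** bits - 1)


def continuous_mutation(x, lo, hi, bits, rng):
    """Perturb x by at most half a grid step, clamped to the feasible interval."""
    y = x + rng.uniform(-0.5, 0.5) * grid_step(lo, hi, bits)
    return min(hi, max(lo, y))
```

With a seeded generator such as `random.Random(0)`, `continuous_mutation(decode(7, -5.0, 5.0, 4), -5.0, 5.0, 4, rng)` returns a value within half a grid step of the decoded point.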
Garbage Classification and Detection Based on Improved YOLOv7 Network
Gengchen Yu, Birui Shao
As living standards improve, garbage classification is gradually becoming mandatory. However, limited public awareness and knowledge make it difficult for classification accuracy and garbage disposal to keep pace with changing guidelines. To address the low efficiency, heavy workload, and poor working environment of manual garbage classification, an improved YOLOv7 object detection method is proposed for effective garbage classification. In this study, the recursive gated convolution gnconv is used to build the HorNet network architecture, and the model is trained on a purpose-built dataset. The C3HB module is added to the YOLO model, and the pooling layer is optimized to replace SPPFCSPC to improve detection accuracy. Experiments show that the mAP, accuracy, and recall of the proposed model on garbage datasets are 99.25%, 99.33%, and 98.03%, respectively, which are 1.50%, 3.99%, and 1.41% higher than those of YOLOv7. The overall results are better than those of the original model.
DOI: https://doi.org/10.1109/prmvia58252.2023.00024 (published 2023-03-01)
Citations: 1
A Priori Lane Selection Strategy for Reinforcement Learning of Dynamic Expressway Tolling
Xi Zhang, W. Wang, Jing Chen
Dynamic tolling adjusts toll rates according to changing road traffic conditions in order to alleviate congestion and improve commuting efficiency. Aiming at the dynamic toll collection problem of Chinese expressways, we design a reinforcement learning simulation environment for China’s expressway network and propose a reinforcement learning dynamic toll model based on an a priori lane selection strategy that adapts to the characteristics of the network and to travelers’ habits. Experiments show that reinforcement learning-based dynamic tolling can increase total revenue by more than 10% compared with a fixed-rate tolling scheme while keeping the congestion rate low. In addition, ablation experiments demonstrate that the prior knowledge-based lane selection model better balances the "total revenue", "system throughput", and "total system travel time" of the optimized road network under the joint optimization objective.
DOI: https://doi.org/10.1109/PRMVIA58252.2023.00031 (published 2023-03-01)
Citations: 0
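The tolling abstract does not spell out its lane selection prior. As a rough illustration of how a toll level can feed into traveler choice and revenue, the sketch below uses a generic binary logit choice model, which is a swapped-in textbook technique and not the paper's method. The coefficients `beta_time` and `beta_cost`, the function names, and the units are all hypothetical.

```python
import math


def toll_lane_share(toll, time_saved_min, beta_time=0.2, beta_cost=0.1):
    """Probability that a traveler picks the tolled option under a binary logit model."""
    # Utility rises with minutes saved and falls with the toll paid (assumed coefficients).
    utility = beta_time * time_saved_min - beta_cost * toll
    return 1.0 / (1.0 + math.exp(-utility))


def expected_revenue(toll, time_saved_min, flow):
    """Expected toll revenue for `flow` travelers facing the given toll."""
    return flow * toll_lane_share(toll, time_saved_min) * toll
```

A reinforcement learning agent could use a quantity like `expected_revenue` as one term of its reward, alongside throughput and travel time terms, but how the paper combines them is not stated in the abstract.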
DA-YOLOv5: Improved YOLOv5 based on Dual Attention for Object Detection on Coal Chemical Industry
Yan Wang, Haijiang Zhu, Yutong Liu
Inspecting whether personnel wear safety protective clothing is of practical importance for safe production in coal chemical plants. At present, coal chemical plants rely on manual inspection or traditional object detection methods for personnel safety checks. However, clothing detection accuracy is seriously reduced by camera installation positions and changes in light intensity. A dual attention method based on YOLOv5 is proposed for object detection in the coal chemical industry. Two attention modules, Efficient Channel Attention (ECA) and Pyramid Split Attention (PSA), are integrated into the Spatial Pyramid Pooling (SPP) module and the Bottleneck module of the YOLOv5 network. In this way, more global context information is obtained to make up for the lack of global convolution, and the ability to extract features and learn multi-scale information is enhanced. The Safety Helmet Wearing Detect dataset (SHWD) and a self-made dataset are used to demonstrate the effectiveness of the improved method. Compared with the original YOLOv5 algorithm, the improved method increases average accuracy by 2.7% across different thresholds. Numerous comparative experiments further verify the feasibility of the improved method.
DOI: https://doi.org/10.1109/PRMVIA58252.2023.00016 (published 2023-03-01)
Citations: 0
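The DA-YOLOv5 abstract reports its gain as average accuracy "at different thresholds". A common reading, which the abstract does not confirm, is accuracy averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05. Under that assumption, the sketch below shows how a single predicted box is scored against a ground-truth box at each threshold; the box format `(x1, y1, x2, y2)` and the helper names are illustrative.

```python
# IoU thresholds 0.50, 0.55, ..., 0.95 (assumed evaluation convention).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def matched_thresholds(pred, truth):
    """Count the thresholds at which pred would be accepted as a match for truth."""
    overlap = iou(pred, truth)
    return sum(1 for t in IOU_THRESHOLDS if overlap >= t)
```

A looser box that passes at 0.5 but fails at 0.75 contributes to fewer thresholds, so averaging over thresholds rewards tighter localization.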
Image Dense Captioning of Irregular Regions Based on Visual Saliency
Xiaosheng Wen, Ping Jian
Traditional dense captioning describes local details of an image in natural language. It usually runs object detection first and then describes the contents of each detected bounding box, which makes the descriptions rich. However, captioning based on object detection often pays too little attention to the associations between objects and the environment, or between objects. Moreover, no existing dense captioning method can handle irregular areas. To solve these problems, we propose a region division method based on visual saliency that focuses on areas rather than only on objects. Based on this division, local descriptions of irregular regions are generated. For each area, we combine the image with the target area to generate features, which are fed into the caption model. We use the Visual Genome dataset for training and testing. In experiments, our model is comparable to the baseline under the traditional bounding-box setting, and the descriptions it generates for irregular regions are equally good. Our model performs well in image retrieval experiments and has less information redundancy. In the application, users can manually select a region of interest on the image to be described, which helps expand the dataset.
DOI: https://doi.org/10.1109/PRMVIA58252.2023.00008 (published 2023-03-01)
Citations: 0
Object Detection Algorithm for Railway Scenes Based on Infrared and RGB Image Fusion
Xin Xu, Haixia Pan, Hongqiang Wang, Yefan Cao
Driver-assistance systems tend to fuse multi-modal sensor data, for instance from infrared and RGB sensors, to detect intruding objects and enhance driving safety. However, semantic misalignment and the spectral imbalance between infrared and RGB images make it hard to exploit the advantages of multiple sensors in an end-to-end learning system. To solve these problems, we apply the widely used affine transformation to our railway dataset to resolve the semantic-misalignment issue. In addition, we propose a fusion module, DMF, to fuse the well-aligned features, which can bridge the domain gap among different sensors. To this end, we propose an efficient railway intrusion object detection network, YOLOv5s-DMF. Compared with state-of-the-art methods, YOLOv5s-DMF substantially reduces the MR by 14.23% by employing the well-established decoupled head. YOLOv5s-DMF further increases mAP@0.5 by 5.7% and mAP@0.5:0.95 by 4.1%.
{"title":"Object Detection Algorithm for Railway Scenes Based on Infrared and RGB Image Fusion","authors":"Xin Xu, Haixia Pan, Hongqiang Wang, Yefan Cao","doi":"10.1109/prmvia58252.2023.00015","DOIUrl":"https://doi.org/10.1109/prmvia58252.2023.00015","url":null,"abstract":"The driver-assistance system tends to fuse multi-modal sensor data, for instance, the infrared and RGB sensors, to detect intrusion objects to enhance driving safety. However, the semantic misalignment dilemma and the spectral imbalance between infrared and RGB images make it hard to exploit the advantages of multi-sensors in the end-to-end learning system. To solve these problems, we employ the widely used affine transformation on our railway dataset to solve the semantic-misalignment issue, in addition, we propose a fusion module, DMF, to fuse the well-aligned features, which can bridge the domain gap among different sensors. To this end, we propose an efficient railway invasive object detection network, YOLOv5s-DMF. Compared with the state-of-the-art methods, the YOLOv5s-DMF substantially reduces the MR by 14.23% by employing the well-established decouple head. And our YOLOv5s-DMF further increases the mAP@0.5 by 5.7% and the mAP@0.5:0.95 by 4.1%.","PeriodicalId":221346,"journal":{"name":"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123444729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
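The railway-detection abstract above mentions using an affine transformation to align infrared and RGB data before fusion, but does not say how it is applied. As a rough illustration only, and not the authors' implementation, the following sketch maps coordinates from an infrared frame into an RGB frame with a 2x3 affine matrix. The function names `apply_affine` and `align_box` and the calibration values in `IR_TO_RGB` are hypothetical and chosen purely for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]
# 2x3 affine matrix [[a, b, tx], [c, d, ty]].
Affine = Sequence[Sequence[float]]


def apply_affine(matrix: Affine, points: Sequence[Point]) -> List[Point]:
    """Map (x, y) points through x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def align_box(matrix: Affine, box: Box) -> Box:
    """Transform an axis-aligned box (x1, y1, x2, y2) and return the
    axis-aligned box that encloses its four transformed corners."""
    x1, y1, x2, y2 = box
    corners = apply_affine(matrix, [(x1, y1), (x2, y1), (x2, y2), (x1, y2)])
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical calibration (an assumption, not taken from the paper):
# infrared coordinates are scaled by 2 and shifted by (10, 20) to reach
# the RGB frame.
IR_TO_RGB = [[2.0, 0.0, 10.0], [0.0, 2.0, 20.0]]

rgb_box = align_box(IR_TO_RGB, (5.0, 5.0, 15.0, 25.0))
print(rgb_box)  # (20.0, 30.0, 40.0, 70.0)
```

In practice the matrix would come from a calibration step between the two cameras, and whole images rather than boxes would typically be warped; the coordinate mapping shown here is the same operation applied per pixel.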