
Machine Vision and Applications: Latest Publications

Supervised contrastive learning with multi-scale interaction and integrity learning for salient object detection
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-29 | DOI: 10.1007/s00138-024-01552-0
Yu Bi, Zhenxue Chen, Chengyun Liu, Tian Liang, Fei Zheng

Salient object detection (SOD) is designed to mimic human visual mechanisms to identify and segment the most salient part of an image. Although related works have achieved great progress in SOD, they remain limited when facing interference from non-salient objects, finely shaped objects and co-salient objects. To improve the effectiveness and capability of SOD, we propose a supervised contrastive learning network with multi-scale interaction and integrity learning named SCLNet. It adopts contrastive learning (CL), multi-reception field confusion (MRFC) and context enhancement (CE) mechanisms. In this method, the input image is first split into two branches by two different data augmentations. Unlike existing models, which focus more on boundary guidance, we add a random position mask on one branch to break the continuity of objects. Through the CL module, we obtain more semantic information than appearance information by learning the invariance across different data augmentations. The MRFC module is then designed to learn the internal connections and common influences of various reception field features layer by layer. Next, the obtained features are refined by the CE module to preserve the integrity and continuity of salient objects. Finally, comprehensive evaluations on five challenging benchmark datasets show that SCLNet achieves superior results. Code is available at https://github.com/YuPangpangpang/SCLNet.
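The two-branch idea can be illustrated with a short sketch. This is not the authors' released code (see the GitHub link above); the stand-in encoder, patch size and SupCon-style loss below are assumptions chosen only to show how a random position mask on one branch combines with contrastive learning across two augmented views.

```python
import torch
import torch.nn.functional as F

def random_position_mask(x, patch=32, drop_ratio=0.3):
    """Zero out randomly placed square patches to break the continuity of objects."""
    b, c, h, w = x.shape
    x = x.clone()
    n_patches = int(drop_ratio * (h // patch) * (w // patch))
    for i in range(b):
        for _ in range(n_patches):
            top = torch.randint(0, h - patch + 1, (1,)).item()
            left = torch.randint(0, w - patch + 1, (1,)).item()
            x[i, :, top:top + patch, left:left + patch] = 0.0
    return x

def supervised_contrastive_loss(z1, z2, labels, tau=0.1):
    """SupCon-style loss over two views: pull same-label embeddings together."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)          # (2B, D)
    y = torch.cat([labels, labels], dim=0)                      # (2B,)
    sim = z @ z.t() / tau
    self_mask = torch.eye(len(y), dtype=torch.bool)
    exp_sim = torch.exp(sim).masked_fill(self_mask, 0.0)        # denominator excludes self
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))
    pos = ((y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask).float()
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()

# Toy usage with a stand-in encoder; a real model would use a deep backbone.
encoder = torch.nn.Sequential(torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
                              torch.nn.Linear(3, 128))
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))             # placeholder image-level labels
view_a = images                                # first augmentation (identity here)
view_b = random_position_mask(images)          # second augmentation + position mask
loss = supervised_contrastive_loss(encoder(view_a), encoder(view_b), labels)
```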

Citations: 0
Medtransnet: advanced gating transformer network for medical image classification
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-29 | DOI: 10.1007/s00138-024-01542-2
Nagur Shareef Shaik, Teja Krishna Cherukuri, N Veeranjaneulu, Jyostna Devi Bodapati

Accurate medical image classification poses a significant challenge in designing expert computer-aided diagnosis systems. While deep learning approaches have shown remarkable advancements over traditional techniques, addressing inter-class similarity and intra-class dissimilarity across medical imaging modalities remains challenging. This work introduces the advanced gating transformer network (MedTransNet), a deep learning model tailored for precise medical image classification. MedTransNet utilizes channel and multi-gate attention mechanisms, coupled with residual interconnections, to learn category-specific attention representations from diverse medical imaging modalities. Additionally, the use of gradient centralization during training helps prevent overfitting and improve generalization, which is especially important in medical imaging applications where labeled data is often limited. Evaluation on benchmark datasets, including APTOS-2019, Figshare, and SARS-CoV-2, demonstrates the effectiveness of the proposed MedTransNet across tasks such as diabetic retinopathy severity grading, multi-class brain tumor classification, and COVID-19 detection. Experimental results show MedTransNet achieving 85.68% accuracy for retinopathy grading, 98.37% (±0.44) for tumor classification, and 99.60% for COVID-19 detection, surpassing recent deep learning models. MedTransNet holds promise for significantly improving medical image classification accuracy.
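Gradient centralization, which the abstract credits with reducing overfitting, is a known optimizer-level technique: before each update, the gradient of every multi-dimensional weight is centered to zero mean over its non-output dimensions. The sketch below shows that general technique, not MedTransNet's training code; the toy model and data are placeholders.

```python
import torch

def centralize_gradients(model):
    """Center multi-dimensional weight gradients to zero mean (gradient centralization)."""
    for p in model.parameters():
        if p.grad is not None and p.grad.dim() > 1:
            dims = tuple(range(1, p.grad.dim()))          # all dims except output channels
            p.grad -= p.grad.mean(dim=dims, keepdim=True)

model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1),
                            torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
                            torch.nn.Linear(16, 4))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.rand(8, 3, 64, 64), torch.randint(0, 4, (8,))

loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
centralize_gradients(model)   # center gradients before the parameter update
opt.step()
opt.zero_grad()
```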

Citations: 0
Learning more discriminative local descriptors with parameter-free weighted attention for few-shot learning
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-28 | DOI: 10.1007/s00138-024-01551-1
Qijun Song, Siyun Zhou, Die Chen

Few-shot learning for image classification has become a hot topic in computer vision; it aims at fast learning from a limited number of labeled images and generalization to new tasks. In this paper, motivated by the idea of the Fisher Score, we propose a Discriminative Local Descriptors Attention model that uses the ratio of intra-class to inter-class similarity to adaptively highlight representative local descriptors without introducing any additional parameters, whereas most existing local-descriptor-based methods rely on neural networks that inevitably involve tedious parameter tuning. Experiments on four benchmark datasets show that our method achieves higher accuracy than state-of-the-art approaches for few-shot learning. Specifically, our method is optimal on the CUB-200 dataset, and outperforms the second best competing algorithm by 4.12% and 0.49% under the 5-way 1-shot and 5-way 5-shot settings, respectively.
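A hedged sketch of the parameter-free weighting idea follows: each local descriptor is scored by the ratio of its mean intra-class similarity to its mean inter-class similarity, so no learnable weights are introduced. The shapes, the 5-way 2-shot toy setup, and the softmax normalization are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def descriptor_weights(desc, labels):
    """desc: (N, M, D) local descriptors from N support images; labels: (N,) class ids."""
    n, m, d = desc.shape
    flat = F.normalize(desc.reshape(n * m, d), dim=1)
    owner = labels.repeat_interleave(m)                      # class of every descriptor
    sim = flat @ flat.t()                                    # pairwise cosine similarity
    same = owner.unsqueeze(0) == owner.unsqueeze(1)
    self_mask = torch.eye(n * m, dtype=torch.bool)
    pos = (same & ~self_mask).float()
    neg = (~same).float()
    intra = (sim * pos).sum(1) / pos.sum(1).clamp(min=1)     # mean same-class similarity
    inter = (sim * neg).sum(1) / neg.sum(1).clamp(min=1)     # mean other-class similarity
    ratio = intra / inter.clamp(min=1e-6)                    # Fisher-score-like ratio
    return torch.softmax(ratio.reshape(n, m), dim=1)         # per-image descriptor weights

support = torch.randn(10, 49, 64)                 # e.g. 5-way 2-shot, 7x7 map, 64-dim cells
labels = torch.arange(5).repeat_interleave(2)     # two support images per class
weights = descriptor_weights(support, labels)     # (10, 49), highlights discriminative cells
```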

Citations: 0
Human–object interaction detection based on disentangled axial attention transformer
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-28 | DOI: 10.1007/s00138-024-01558-8
Limin Xia, Qiyue Xiao

Human–object interaction (HOI) detection aims to localize and infer interactions between humans and objects in an image. Recent work proposed transformer encoder–decoder architectures for HOI detection with exceptional performance, but these architectures have certain drawbacks: they do not employ a complete disentanglement strategy to learn more discriminative features for different sub-tasks; they cannot achieve sufficient contextual exchange within each branch, which is crucial for accurate relational reasoning; and their transformer models suffer from high computational costs and large memory usage due to complex attention calculations. In this work, we propose a disentangled transformer network that disentangles both the encoder and decoder into three branches for human detection, object detection, and interaction classification. We then propose a novel feature unify decoder to associate the predictions of each disentangled decoder, and introduce a multiplex relation embedding module and an attentive fusion module to perform sufficient contextual information exchange among branches. Additionally, to reduce the model's computational cost, a position-sensitive axial attention is incorporated into the encoder, allowing our model to achieve a better accuracy-complexity trade-off. Extensive experiments are conducted on two public HOI benchmarks to demonstrate the effectiveness of our approach. The results indicate that our model outperforms other methods, achieving state-of-the-art performance.
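Axial attention is the ingredient that keeps the computational cost down: instead of full self-attention over all HxW positions (quadratic in HW), attention is applied along the height axis and then along the width axis, costing roughly O(HW(H+W)). The sketch below is a generic axial-attention layer for illustration; it omits the position-sensitive terms and is not the authors' module.

```python
import torch
import torch.nn as nn

class AxialAttention2D(nn.Module):
    """Self-attention applied along rows, then along columns, of a 2D feature map."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)    # sequences along the width axis
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c)
        cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)    # sequences along the height axis
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)  # back to (B, C, H, W)

feat = torch.rand(2, 64, 32, 32)
out = AxialAttention2D(64)(feat)                             # same shape as the input
```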

Citations: 0
EEA-Net: edge-enhanced assistance network for infrared small target detection
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-23 | DOI: 10.1007/s00138-024-01554-y
Chen Wang, Xiaopeng Hu, Xiang Gao, Haoyu Wei, Jiawei Tao, Fan Wang
{"title":"EEA-Net: edge-enhanced assistance network for infrared small target detection","authors":"Chen Wang, Xiaopeng Hu, Xiang Gao, Haoyu Wei, Jiawei Tao, Fan Wang","doi":"10.1007/s00138-024-01554-y","DOIUrl":"https://doi.org/10.1007/s00138-024-01554-y","url":null,"abstract":"","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141106759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A visual foreign object detection system for wireless charging of electric vehicles
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-22 | DOI: 10.1007/s00138-024-01553-z
Bijan Shahbaz Nejad, Peter Roch, M. Handte, P. J. Marrón
{"title":"A visual foreign object detection system for wireless charging of electric vehicles","authors":"Bijan Shahbaz Nejad, Peter Roch, M. Handte, P. J. Marrón","doi":"10.1007/s00138-024-01553-z","DOIUrl":"https://doi.org/10.1007/s00138-024-01553-z","url":null,"abstract":"","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141110486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FDT − Dr2T: a unified Dense Radiology Report Generation Transformer framework for X-ray images
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-21 | DOI: 10.1007/s00138-024-01544-0
Dhruv Sharma, Chhavi Dhiman, Dinesh Kumar
{"title":"FDT − Dr2T: a unified Dense Radiology Report Generation Transformer framework for X-ray images","authors":"Dhruv Sharma, Chhavi Dhiman, Dinesh Kumar","doi":"10.1007/s00138-024-01544-0","DOIUrl":"https://doi.org/10.1007/s00138-024-01544-0","url":null,"abstract":"","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141116678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-18 | DOI: 10.1007/s00138-024-01543-1
Sushmita Sarker, Prithul Sarker, Gunner Stone, Ryan Gorman, Alireza Tavakkoli, George Bebis, Javad Sattarvand

Point cloud analysis has a wide range of applications in many areas such as computer vision, robotic manipulation, and autonomous driving. While deep learning has achieved remarkable success on image-based tasks, deep neural networks face many unique challenges in processing massive, unordered, irregular and noisy 3D points. To stimulate future research, this paper analyzes recent progress in deep learning methods employed for point cloud processing and presents challenges and potential directions for advancing the field. It serves as a comprehensive review of two major tasks in 3D point cloud processing, namely 3D shape classification and semantic segmentation.
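For readers new to the area, the canonical answer to the "unordered points" challenge mentioned above is a PointNet-style encoder: a shared MLP applied to every point followed by a symmetric (max) pooling, which makes the output invariant to point order. The sketch below is a generic illustration under that assumption, not a method taken from the survey.

```python
import torch
import torch.nn as nn

class TinyPointEncoder(nn.Module):
    """Shared per-point MLP + max pooling: a permutation-invariant point cloud encoder."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, out_dim))

    def forward(self, pts):                 # pts: (B, N, 3), points in any order
        per_point = self.mlp(pts)           # the same weights are applied to every point
        return per_point.max(dim=1).values  # symmetric pooling -> order-invariant (B, out_dim)

cloud = torch.rand(4, 1024, 3)
shuffled = cloud[:, torch.randperm(1024)]
enc = TinyPointEncoder()
assert torch.allclose(enc(cloud), enc(shuffled), atol=1e-6)   # same output either way
```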

Citations: 0
Uncertainty estimates for semantic segmentation: providing enhanced reliability for automated motor claims handling
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-15 | DOI: 10.1007/s00138-024-01541-3
Jan Küchler, Daniel Kröll, Sebastian Schoenen, Andreas Witte

Deep neural network models for image segmentation can be a powerful tool for the automation of motor claims handling processes in the insurance industry. A crucial aspect is the reliability of the model outputs when facing adverse conditions, such as low-quality photos taken by claimants to document damages. We explore the use of a meta-classification model to empirically assess the precision of segments predicted by a model trained for the semantic segmentation of car body parts. Different sets of features correlated with the quality of a segment are compared, and an AUROC score of 0.915 is achieved for distinguishing between high- and low-quality segments. By removing low-quality segments, the average mIoU of the segmentation output is improved by 16 percentage points and the number of wrongly predicted segments is reduced by 77%.
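The meta-classification idea can be sketched as follows: hand-crafted per-segment features feed a small binary classifier that predicts whether a predicted segment is of high quality, and the classifier is scored with AUROC. The features and synthetic labels below are invented placeholders; the paper's actual feature sets and its 0.915 AUROC come from its own experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
features = np.column_stack([
    rng.uniform(0.3, 1.0, n),    # mean softmax confidence inside the segment (placeholder)
    rng.uniform(10, 5000, n),    # segment size in pixels (placeholder)
    rng.uniform(0.0, 1.0, n),    # relative boundary length (placeholder)
])
# stand-in label: 1 if the segment's IoU with the ground truth exceeds a threshold
quality = (features[:, 0] + 0.1 * rng.standard_normal(n) > 0.65).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(features, quality, random_state=0)
meta_clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auroc = roc_auc_score(y_te, meta_clf.predict_proba(X_te)[:, 1])
print(f"meta-classifier AUROC: {auroc:.3f}")
```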

Citations: 0
Thermal infrared action recognition with two-stream shift Graph Convolutional Network
IF 3.3 | CAS Zone 4, Computer Science | Q2 Computer Science | Pub Date: 2024-05-13 | DOI: 10.1007/s00138-024-01550-2
Jishi Liu, Huanyu Wang, Junnian Wang, Dalin He, Ruihan Xu, Xiongfeng Tang

The extensive deployment of camera-based IoT devices in our society is heightening the vulnerability of citizens' sensitive information and individual data privacy. In this context, thermal imaging techniques become essential for data desensitization, entailing the elimination of sensitive data to safeguard individual privacy. Thermal imaging can also play an important role in industry, where images often have low resolution, high noise and unclear object features. Moreover, existing works often process the entire video as a single entity, which results in suboptimal robustness because individual actions occurring at different times are overlooked. To address this, we propose a lightweight algorithm for action recognition in thermal infrared videos based on human skeletons. Our approach includes YOLOv7-tiny for target detection, Alphapose for pose estimation, dynamic skeleton modeling, and Graph Convolutional Networks (GCN) for spatial-temporal feature extraction in action prediction. To overcome detection and pose challenges, we created the OQ35-human and OQ35-keypoint datasets for training. In addition, the proposed model enhances robustness by using visible-spectrum data for GCN training. Furthermore, we introduce a two-stream shift Graph Convolutional Network to improve action recognition accuracy. Our experimental results on the custom thermal infrared action dataset (InfAR-skeleton) demonstrate Top-1 accuracy of 88.06% and Top-5 accuracy of 98.28%. On the filtered kinetics-skeleton dataset, the algorithm achieves Top-1 accuracy of 55.26% and Top-5 accuracy of 83.98%. Thermal infrared action recognition thus protects individual privacy while meeting the requirements of action recognition.
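Skeleton-based GCNs of this kind build on a spatial graph-convolution step in which joint features are aggregated over the skeleton graph through a normalized adjacency matrix and a learned projection. The sketch below shows only that generic step; the joint set, toy adjacency and dimensions are placeholders rather than the paper's two-stream shift architecture.

```python
import torch
import torch.nn as nn

class SkeletonGraphConv(nn.Module):
    """One spatial graph-convolution step over skeleton joints (ST-GCN-style)."""
    def __init__(self, in_dim, out_dim, adjacency):
        super().__init__()
        a = adjacency + torch.eye(adjacency.size(0))             # add self-loops
        d_inv_sqrt = a.sum(1).pow(-0.5)
        self.register_buffer("a_norm", d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :])
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):              # x: (B, T, V, C) = batch, frames, joints, channels
        x = torch.einsum("uv,btvc->btuc", self.a_norm, x)        # aggregate over neighbors
        return torch.relu(self.proj(x))

V = 17                                                            # e.g. COCO-style keypoints
adj = torch.zeros(V, V)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:                     # toy subset of bones
    adj[i, j] = adj[j, i] = 1
layer = SkeletonGraphConv(2, 64, adj)                             # (x, y) coords per joint
poses = torch.rand(8, 30, V, 2)                                   # 8 clips, 30 frames each
out = layer(poses)                                                # (8, 30, V, 64)
```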

Citations: 0