
Machine Vision and Applications: Latest Publications

YOLOMH: you only look once for multi-task driving perception with high efficiency
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-29 · DOI: 10.1007/s00138-024-01525-3
Liu Fang, Sun Bowen, Jianxi Miao, Weixing Su
Citations: 0
Ssman: self-supervised masked adaptive network for 3D human pose estimation
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-27 · DOI: 10.1007/s00138-024-01514-6

Abstract

Modern deep learning-based models for 3D human pose estimation from monocular images typically cannot adapt between occlusion and non-occlusion scenarios, which can limit their performance under varying degrees of occlusion. To tackle this problem, we propose a novel network called the self-supervised masked adaptive network (SSMAN). First, we apply masks of different levels to cover the range of occlusions found in fully in-the-wild environments. We then design a multi-line adaptive network that can be trained on images with various mask scales in parallel. We train this masked adaptive network with self-supervised learning to enforce consistency across its outputs under different mask ratios. Furthermore, a global refinement module leverages global features of the human body to refine poses estimated solely from local features. We perform extensive experiments on occlusion datasets such as 3DPW-OCC and OCHuman as well as general datasets such as Human3.6M and 3DPW. The results show that SSMAN achieves new state-of-the-art performance on both lightly and heavily occluded benchmarks and remains highly competitive, with significant improvements, on standard benchmarks.
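The masking-plus-consistency recipe above translates naturally into a short training step. The PyTorch sketch below is a minimal illustration of that idea, not the paper's implementation: the patch size, mask ratios, and the `pose_net` module are hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

def random_mask(images, ratio):
    # Zero out a random fraction of 16x16 patches (hypothetical masking scheme).
    b, c, h, w = images.shape
    keep = (torch.rand(b, 1, h // 16, w // 16, device=images.device) > ratio).float()
    return images * F.interpolate(keep, size=(h, w), mode="nearest")

def consistency_step(pose_net, images, mask_ratios=(0.25, 0.5, 0.75)):
    # Pull predictions on masked views toward the prediction on the clean view,
    # enforcing consistency across mask ratios.
    with torch.no_grad():
        target = pose_net(images)                 # pose from the unmasked view
    loss = sum(F.mse_loss(pose_net(random_mask(images, r)), target)
               for r in mask_ratios)
    return loss / len(mask_ratios)
```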

Citations: 0
Kernel based local matching network for video object segmentation
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-25 · DOI: 10.1007/s00138-024-01524-4
Guoqiang Wang, Lan Li, Min Zhu, Rui Zhao, Xiang Zhang

Recently, methods based on space-time memory networks have achieved strong performance in semi-supervised video object segmentation and attracted wide attention. However, these methods still have a critical limitation: the non-local matching they rely on causes interference from similar objects, which severely limits segmentation performance. To solve this problem, we propose a Kernel-guided Attention Matching Network (KAMNet) that uses local matching instead of non-local matching. First, KAMNet uses a spatio-temporal attention mechanism to sharpen the model's discrimination between foreground objects and background areas. KAMNet then uses a Gaussian kernel to guide the matching between the current frame and the reference set. Because the Gaussian kernel decays away from the center, it restricts matching to the central region, thus achieving local matching. Our KAMNet achieves a speed-accuracy trade-off on the benchmark datasets DAVIS 2016 ($\mathcal{J}\&\mathcal{F}$ of 87.6%) and DAVIS 2017 ($\mathcal{J}\&\mathcal{F}$ of 76.0%) at 0.12 seconds per frame.
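The Gaussian-kernel idea admits a compact reading: compute the usual non-local similarity, then bias it by a spatial Gaussian before the softmax, which is equivalent to multiplying each query's attention weights by a kernel centred on its location. The PyTorch sketch below illustrates this under assumed feature shapes; `sigma` and the scaling are hypothetical choices, not KAMNet's exact formulation.

```python
import torch

def gaussian_local_matching(feat_cur, feat_ref, sigma=8.0):
    # feat_cur, feat_ref: (C, H, W) feature maps of the current and reference
    # frames; sigma is an assumed bandwidth.
    c, h, w = feat_cur.shape
    q = feat_cur.flatten(1).t()                    # (HW, C) queries
    k = feat_ref.flatten(1)                        # (C, HW) keys
    sim = (q @ k) / c ** 0.5                       # non-local similarity
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pos = torch.stack([ys.flatten(), xs.flatten()], dim=1).float()  # (HW, 2)
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)         # sq. dist
    # Subtracting d2 / (2 sigma^2) inside the softmax multiplies each row of
    # attention by a Gaussian centred at the query pixel, so distant matches
    # decay and only the local neighbourhood contributes.
    attn = torch.softmax(sim - d2 / (2 * sigma ** 2), dim=1)        # (HW, HW)
    return (attn @ k.t()).t().reshape(c, h, w)     # locally matched readout
```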

Citations: 0
Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-23 · DOI: 10.1007/s00138-024-01510-w
David Bojanić, Kristijan Bartol, Josep Forest, Tomislav Petković, Tomislav Pribanić

Most recent 3D registration methods are learning-based: they either find correspondences in feature space and match them, or directly estimate the registration transformation from the given point cloud features. These feature-based methods therefore have difficulty generalizing to point clouds that differ substantially from their training data. The issue is not readily apparent because of problematic benchmark definitions, which cannot support in-depth analysis and are biased toward similar data. We therefore propose a methodology that, given a point cloud dataset, creates a 3D registration benchmark providing a more informative evaluation of a method than other benchmarks. Using this methodology, we create a novel FAUST-partial (FP) benchmark, based on the FAUST dataset, with several difficulty levels. The FP benchmark addresses the limitations of current benchmarks, namely the lack of data and parameter-range variability, and allows the strengths and weaknesses of a 3D registration method to be evaluated with respect to a single registration parameter. Using the new FP benchmark, we provide a thorough analysis of the current state-of-the-art methods and observe that they still struggle to generalize to severely different out-of-sample data. We therefore propose a simple featureless traditional 3D registration baseline based on the weighted cross-correlation between two given point clouds. Our method achieves strong results on current benchmark datasets, outperforming most deep learning methods. Our source code is available at github.com/DavidBoja/exhaustive-grid-search.
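The featureless baseline is described as a weighted cross-correlation between the two clouds, searched exhaustively. A stripped-down version of that idea, translation-only and using plain binary occupancy grids rather than the paper's weighting, might look as follows; the voxel size is arbitrary, and rotations would be handled by looping over candidate angles.

```python
import numpy as np
from scipy.signal import fftconvolve

def voxelize(points, origin, shape, voxel):
    # Binary occupancy grid for an (N, 3) point cloud.
    idx = np.floor((points - origin) / voxel).astype(int)
    keep = ((idx >= 0) & (idx < shape)).all(axis=1)
    grid = np.zeros(shape, dtype=np.float32)
    grid[tuple(idx[keep].T)] = 1.0
    return grid

def best_translation(src, tgt, voxel=0.05):
    # Translation maximizing the cross-correlation of the two occupancy grids,
    # computed as a convolution with the flipped source grid.
    lo = np.minimum(src.min(0), tgt.min(0))
    hi = np.maximum(src.max(0), tgt.max(0))
    shape = np.ceil((hi - lo) / voxel).astype(int) + 1
    g_src = voxelize(src, lo, shape, voxel)
    g_tgt = voxelize(tgt, lo, shape, voxel)
    corr = fftconvolve(g_tgt, g_src[::-1, ::-1, ::-1], mode="full")
    peak = np.array(np.unravel_index(corr.argmax(), corr.shape))
    return (peak - (np.array(g_src.shape) - 1)) * voxel  # metric offset
```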

Citations: 0
AFMCT: adaptive fusion module based on cross-modal transformer block for 3D object detection
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-23 · DOI: 10.1007/s00138-024-01509-3
Bingli Zhang, Yixin Wang, Chengbiao Zhang, Junzhao Jiang, Zehao Pan, Jin Cheng, Yangyang Zhang, Xinyu Wang, Chenglei Yang, Yanhui Wang

Lidar and cameras are essential sensors for environment perception in autonomous driving. However, fully fusing heterogeneous data from multiple sources remains a non-trivial challenge. As a result, 3D object detection based on multi-modal sensor fusion is often inferior to single-modal methods based on Lidar alone, which indicates that multi-sensor machine vision still needs development. In this paper, we propose an adaptive fusion module based on a cross-modal transformer block (AFMCT) for 3D object detection, using a bidirectional enhancing strategy. Specifically, we first enhance the image features by extracting attention-based point features with a cross-modal transformer block and linking the two by concatenation; a second cross-modal transformer block then acts on the enhanced image features to strengthen the point features with image semantic information. Extensive experiments on the 3D detection benchmark of the KITTI dataset reveal that our proposed structure significantly improves the detection accuracy of Lidar-only methods and outperforms existing advanced multi-sensor fusion modules by at least 0.45%, which indicates that our method may be a feasible way to improve 3D object detection based on multi-sensor fusion.
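One plausible reading of the bidirectional enhancing strategy is a pair of cross-attention blocks, each taking queries from one modality and keys/values from the other. The PyTorch sketch below illustrates that pattern; the dimensions, token counts, and the exact linking of the two passes are assumptions, not the AFMCT architecture itself.

```python
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    # Queries come from one modality; keys and values from the other.
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries, context):
        fused, _ = self.attn(queries, context, context)
        return self.norm(queries + fused)          # residual + norm

dim = 128
img_tokens = torch.randn(2, 196, dim)   # flattened image feature map (assumed)
pt_tokens = torch.randn(2, 512, dim)    # projected per-point features (assumed)
img_branch, pt_branch = CrossModalBlock(dim), CrossModalBlock(dim)
# First pass: enhance image tokens with attention-based point features;
# second pass: strengthen point tokens with the enhanced image semantics.
img_enhanced = img_branch(img_tokens, pt_tokens)
pt_enhanced = pt_branch(pt_tokens, img_enhanced)
```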

Citations: 0
Hyperspectral image dynamic range reconstruction using deep neural network-based denoising methods
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-22 · DOI: 10.1007/s00138-024-01523-5
Loran Cheplanov, Shai Avidan, David J. Bonfil, Iftach Klapp

Hyperspectral (HS) measurement is among the most useful tools in agriculture for early disease detection. However, the cost of HS cameras that can perform the desired detection tasks is prohibitive: typically fifty thousand to hundreds of thousands of dollars. In a previous study at the Agricultural Research Organization's Volcani Institute (Israel), a low-cost, high-performing HS system was developed that included a point spectrometer and optical components. Its main disadvantage was a long shooting time for each image. Shooting time strongly depends on the predetermined integration time of the point spectrometer. While essential for performing monitoring tasks in a reasonable time, shortening the integration time from a typical value of around 200 ms to the 10 ms range degrades the dynamic range of the captured scene. In this work, we suggest correcting this by learning the transformation from data measured with a short integration time to data measured with a long integration time. The reduced dynamic range and consequent low SNR were successfully overcome using three deep neural network models built on denoising auto-encoder, DnCNN, and LambdaNetworks backbones. The best model was based on DnCNN, using a loss function combining $\ell_2$ and Kullback–Leibler divergence on images with 20 consecutive channels. Over the full spectrum, the model achieved a mean PSNR of 30.61 and a mean SSIM of 0.9, improving on the 10 ms measurements' mean PSNR and mean SSIM by 60.43% and 94.51%, respectively.
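The combined loss is concrete enough to sketch: an $\ell_2$ term plus a Kullback–Leibler term between prediction and target treated as normalized intensity distributions. The weighting `alpha` and the normalization scheme below are hypothetical choices for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def combined_loss(pred, target, alpha=0.5, eps=1e-8):
    # pred, target: (B, C, H, W) reconstructed and long-integration images.
    l2 = F.mse_loss(pred, target)
    # Treat each image as a distribution over pixels for the KL term.
    p = pred.clamp_min(0).flatten(1)
    q = target.clamp_min(0).flatten(1)
    p = p / (p.sum(dim=1, keepdim=True) + eps) + eps
    q = q / (q.sum(dim=1, keepdim=True) + eps) + eps
    kl = (q * (q / p).log()).sum(dim=1).mean()     # KL(target || prediction)
    return l2 + alpha * kl                         # alpha: assumed weighting
```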

Citations: 0
Point cloud registration with quantile assignment
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-19 · DOI: 10.1007/s00138-024-01517-3
Ecenur Oğuz, Yalım Doğan, Uğur Güdükbay, Oya Karaşan, Mustafa Pınar

Point cloud registration is a fundamental problem in computer vision. It encompasses critical tasks such as feature estimation, correspondence matching, and transformation estimation, and it can be cast as a quantile matching problem. We refined the quantile assignment algorithm by integrating prevalent feature descriptors and transformation estimation methods to enhance the correspondence between the source and target point clouds. We evaluated the performance of these descriptors and methods within our approach through controlled experiments on a dataset we constructed from well-known 3D models. This systematic investigation identified the methods best suited to complementing our approach. We then devised a new end-to-end, coarse-to-fine pairwise point cloud registration framework. Finally, we tested our framework on indoor and outdoor benchmark datasets and compared our results with state-of-the-art point cloud registration methods.
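As one illustration of quantile-based assignment (a plausible reading, not the paper's exact algorithm), correspondences can be restricted to mutual nearest neighbours whose feature distance falls below a chosen quantile of the matched distances:

```python
import numpy as np

def quantile_correspondences(feat_src, feat_tgt, q=0.1):
    # Pairwise feature distances between the two clouds' descriptors.
    d = np.linalg.norm(feat_src[:, None, :] - feat_tgt[None, :, :], axis=-1)
    nn_st = d.argmin(axis=1)                      # source -> target NN
    nn_ts = d.argmin(axis=0)                      # target -> source NN
    src_idx = np.arange(len(feat_src))
    mutual = nn_ts[nn_st] == src_idx              # mutual nearest neighbours
    dists = d[src_idx, nn_st]
    thresh = np.quantile(dists[mutual], q) if mutual.any() else np.inf
    keep = mutual & (dists <= thresh)             # q-quantile cut-off
    return np.stack([src_idx[keep], nn_st[keep]], axis=1)
```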

Citations: 0
An image quality assessment method based on edge extraction and singular value for blurriness
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-19 · DOI: 10.1007/s00138-024-01522-6
Lei Zhou, Chuanlin Liu, Amit Yadav, Sami Azam, Asif Karim

Automatic assessment of perceived image quality is crucial in image processing. To this end, we propose an image quality assessment (IQA) method for blurriness that extracts gradient and singular-value features instead of the single feature used in traditional IQA algorithms. Because existing public IQA datasets are too small to support deep learning, machine learning is used to fuse features from multiple domains, yielding a new no-reference (NR) IQA method for blurriness denoted Feature-fusion IQA (Ffu-IQA). Ffu-IQA uses a probabilistic model to estimate the probability of blur at each detected edge in the image and then aggregates the probability information with machine learning to obtain an edge quality score. It then computes a singular-value score from the singular values obtained by singular value decomposition of the image matrix. Finally, machine-learning pooling yields the true quality score. Ffu-IQA achieves PLCC scores of 0.9570 and 0.9616 on CSIQ and TID2013, respectively, and SROCC scores of 0.9380 and 0.9531, outperforming most traditional IQA methods for blurriness.
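The singular-value score rests on a simple observation: blur concentrates the image matrix's energy into its leading singular values. A toy version of such a score is sketched below; the choice of `k` and the ratio form are hypothetical, and a regressor trained on such scores together with the edge-blur probabilities would then play the role of the machine-learning pooling stage.

```python
import numpy as np

def singular_value_score(gray_image, k=10, eps=1e-12):
    # Blur shrinks the trailing singular values, so the energy share held by
    # the top-k values rises as the image gets blurrier.
    s = np.linalg.svd(gray_image.astype(np.float64), compute_uv=False)
    return s[:k].sum() / (s.sum() + eps)          # k is an arbitrary choice
```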

Citations: 0
Temporal teacher with masked transformers for semi-supervised action proposal generation
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-15 · DOI: 10.1007/s00138-024-01521-7
Selen Pehlivan, Jorma Laaksonen

By conditioning on unit-level predictions, anchor-free models for action proposal generation have displayed impressive capabilities, such as lightweight architectures. However, task performance depends heavily on the quality of the training data, and the most effective models have relied on human-annotated data. Semi-supervised learning, i.e., jointly training deep neural networks on a labeled dataset and an unlabeled dataset, has made significant progress recently. Existing works have either focused primarily on classification tasks, which may require less annotation effort, or considered anchor-based detection models. Inspired by recent advances in semi-supervised methods for anchor-free object detectors, we propose a teacher-student framework for a two-stage action detection pipeline, named Temporal Teacher with Masked Transformers (TTMT), to generate high-quality action proposals from an anchor-free transformer model. Leveraging consistency learning as a self-training technique, the model jointly trains an anchor-free student and a gradually progressing teacher counterpart in a mutually beneficial manner. As the core model, we design a Transformer-based anchor-free model to improve the effectiveness of temporal evaluation. We integrate bi-directional masks and devise encoder-only Masked Transformers for sequences. Trained jointly on boundary locations and various local snippet-based features, our model generates proposal candidates via the proposed scoring function. Experiments on the THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of our model for the temporal proposal generation task.
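A "gradually progressing teacher" is conventionally realised as an exponential moving average of the student's weights. The sketch below shows that standard mean-teacher update; the momentum value is a typical default, not taken from the paper.

```python
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # Exponential moving average: the teacher trails the student smoothly.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(momentum).add_(s_p, alpha=1 - momentum)

# Usage: the teacher starts as a frozen copy of the student and is nudged
# toward it once per training step.
student = torch.nn.Linear(16, 4)          # stand-in for the proposal model
teacher = copy.deepcopy(student).requires_grad_(False)
ema_update(teacher, student)
```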

Citations: 0
Adversarial robustness improvement for deep neural networks
IF 3.3 · CAS Tier 4 (Computer Science) · Q2 Computer Science · Pub Date: 2024-03-14 · DOI: 10.1007/s00138-024-01519-1
Charis Eleftheriadis, Andreas Symeonidis, Panagiotis Katsaros

Deep neural networks (DNNs) are key components for implementing autonomy in systems that operate in highly complex and unpredictable environments (self-driving cars, smart traffic systems, smart manufacturing, etc.). It is well known that DNNs are vulnerable to adversarial examples: minimal and usually imperceptible perturbations applied to their inputs that lead to false predictions. This threat poses critical challenges, especially when DNNs are deployed in safety- or security-critical systems, and makes defences that can improve the trustworthiness of DNN functions an urgent need. Adversarial training has proven effective at improving the robustness of DNNs against a wide range of adversarial perturbations. However, a general framework for adversarial defences is needed that extends beyond a single-dimensional assessment of robustness improvement; several distance metrics and adversarial attack strategies must be considered simultaneously. Using such an approach, we report results from extensive experimentation on adversarial defence methods that could improve DNN resilience to adversarial threats. We conclude by introducing a general adversarial training methodology which, according to our experimental results, opens prospects for a holistic defence against a range of diverse types of adversarial perturbations.
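As background for the technique the abstract builds on, a standard adversarial training step pairs an inner attack, here $L_\infty$ PGD, with an outer update on the resulting examples. The sketch below shows the common Madry-style recipe with conventional hyperparameters; it is generic background, not the paper's specific methodology.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Standard L-infinity PGD: random start, signed-gradient steps,
    # projection back into the eps-ball and the valid pixel range.
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    # Train on adversarial examples generated on the fly.
    model.train()
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```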

Citations: 0