Deep learning has been used in many computer-vision-based applications. However, deep neural networks are vulnerable to adversarial examples that have been crafted specifically to fool a system while being imperceptible to humans. In this paper, we propose a detection defense method based on heterogeneous denoising on foreground and background (HDFB). Since the image region that dominates the output classification is usually sensitive to adversarial perturbations, HDFB focuses its defense on the foreground region rather than the whole image. First, HDFB uses a class activation map to segment examples into foreground and background regions. Second, the foreground and background are encoded into square patches. Third, the encoded foreground is zoomed in and out and denoised at two scales. Subsequently, the encoded background is denoised once using bilateral filtering. After that, the denoised foreground and background patches are decoded. Finally, the decoded foreground and background are stitched together as a denoised sample for classification. If the classifications of the denoised and input images differ, the input image is detected as an adversarial example. Comparison experiments are conducted on CIFAR-10 and MiniImageNet. The average detection rate (DR) against white-box attacks on the test sets of the two datasets is 86.4%. The average DR against black-box attacks on MiniImageNet is 88.4%. The experimental results suggest that HDFB achieves high detection performance on adversarial examples and is robust against white-box and black-box adversarial attacks. However, HDFB is insecure if its defense parameters are exposed to attackers.
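A minimal sketch of the detection rule described above, assuming a `classify` callable (the protected classifier) and a binary CAM-derived foreground mask are available; the patch encoding/decoding step is omitted, the filter choice for the foreground and all parameter values are illustrative assumptions, not the published defense parameters.

```python
import numpy as np
import cv2

def heterogeneous_denoise(image, cam_mask):
    """Denoise the CAM-salient foreground at two scales and the background once.
    Simplified sketch: the paper's square-patch encoding/decoding is omitted."""
    fg_mask = cam_mask.astype(bool)
    # Foreground: denoise at the original scale and at a 2x zoom, then average
    # (the use of a bilateral filter for the foreground is an assumption here).
    denoised_1 = cv2.bilateralFilter(image, d=5, sigmaColor=50, sigmaSpace=50)
    zoomed = cv2.resize(image, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_LINEAR)
    zoomed = cv2.bilateralFilter(zoomed, d=5, sigmaColor=50, sigmaSpace=50)
    denoised_2 = cv2.resize(zoomed, (image.shape[1], image.shape[0]))
    foreground = ((denoised_1.astype(np.float32) + denoised_2.astype(np.float32)) / 2).astype(np.uint8)
    # Background: a single bilateral-filter pass, as in the abstract.
    background = cv2.bilateralFilter(image, d=9, sigmaColor=75, sigmaSpace=75)
    # Stitch the denoised foreground and background back into one image.
    return np.where(fg_mask[..., None], foreground, background)

def detect_adversarial(image, cam_mask, classify):
    """Flag the input as adversarial if the label changes after denoising."""
    return classify(image) != classify(heterogeneous_denoise(image, cam_mask))
```

The zoom factor and filter strengths play the role of the defense parameters that, as noted above, must stay hidden from attackers.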
{"title":"An adversarial sample detection method based on heterogeneous denoising","authors":"Lifang Zhu, Chao Liu, Zhiqiang Zhang, Yifan Cheng, Biao Jie, Xintao Ding","doi":"10.1007/s00138-024-01579-3","DOIUrl":"https://doi.org/10.1007/s00138-024-01579-3","url":null,"abstract":"<p>Deep learning has been used in many computer-vision-based applications. However, deep neural networks are vulnerable to adversarial examples that have been crafted specifically to fool a system while being imperceptible to humans. In this paper, we propose a detection defense method based on heterogeneous denoising on foreground and background (HDFB). Since an image region that dominates to the output classification is usually sensitive to adversarial perturbations, HDFB focuses defense on the foreground region rather than the whole image. First, HDFB uses class activation map to segment examples into foreground and background regions. Second, the foreground and background are encoded to square patches. Third, the encoded foreground is zoomed in and out and is denoised in two scales. Subsequently, the encoded background is denoised once using bilateral filtering. After that, the denoised foreground and background patches are decoded. Finally, the decoded foreground and background are stitched together as a denoised sample for classification. If the classifications of the denoised and input images are different, the input image is detected as an adversarial example. The comparison experiments are implemented on CIFAR-10 and MiniImageNet. The average detection rate (DR) against white-box attacks on the test sets of the two datasets is 86.4%. The average DR against black-box attacks on MiniImageNet is 88.4%. The experimental results suggest that HDFB shows high performance on adversarial examples and is robust against white-box and black-box adversarial attacks. However, HDFB is insecure if its defense parameters are exposed to attackers.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"30 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141573670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-08, DOI: 10.1007/s00138-024-01577-5
Zhicheng Li, Chao Yang, Longyu Jiang
Feature pyramid network (FPN) improves object detection performance by means of top-down multilevel feature fusion. However, current FPN-based methods do not effectively utilize interlayer features to suppress the aliasing effects that arise in the top-down fusion process. We propose an interlayer attention feature pyramid network that integrates attention gates into the FPN through interlayer enhancement to establish the correlation between context and model, thereby highlighting the salient region of each layer and suppressing aliasing effects. Moreover, to avoid feature dilution in the top-down fusion process and the inability of multilayer features to exploit one another, a simplified non-local algorithm is used in the multilayer fusion module to fuse and enhance the multiscale features. A comprehensive analysis on the MS COCO and PASCAL VOC benchmarks demonstrates that our network achieves precise object localization and outperforms current FPN-based object detection algorithms.
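A hedged PyTorch sketch of the interlayer attention idea: the gate below follows the common additive-attention form (an assumption, since the abstract does not spell out the gate), and its output would replace the plain element-wise sum of the standard FPN top-down pathway.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterlayerAttentionGate(nn.Module):
    """Attention gate applied when fusing a top-down FPN feature into a lateral one.
    Sketch only: channel sizes and the gating form are assumptions."""
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels, kernel_size=1)  # lateral branch
        self.phi = nn.Conv2d(channels, channels, kernel_size=1)    # top-down branch
        self.psi = nn.Conv2d(channels, 1, kernel_size=1)           # attention map

    def forward(self, lateral, top_down):
        # Upsample the coarser top-down feature to the lateral resolution.
        top_down = F.interpolate(top_down, size=lateral.shape[-2:], mode="nearest")
        attn = torch.sigmoid(self.psi(F.relu(self.theta(lateral) + self.phi(top_down))))
        # Highlight salient lateral responses before the usual FPN addition,
        # which is intended to suppress aliasing from the top-down fusion.
        return lateral * attn + top_down
```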
{"title":"IAFPN: interlayer enhancement and multilayer fusion network for object detection","authors":"Zhicheng Li, Chao Yang, Longyu Jiang","doi":"10.1007/s00138-024-01577-5","DOIUrl":"https://doi.org/10.1007/s00138-024-01577-5","url":null,"abstract":"<p>Feature pyramid network (FPN) improves object detection performance by means of top-down multilevel feature fusion. However, the current FPN-based methods have not effectively utilized the interlayer features to suppress the aliasing effects in the feature downward fusion process. We propose an interlayer attention feature pyramid network that attempts to integrate attention gates into FPN through interlayer enhancement to establish the correlation between context and model, thereby highlighting the salient region of each layer and suppressing the aliasing effects. Moreover, in order to avoid feature dilution in the feature downward fusion process and inability of multilayer features to utilize each other, simplified non-local algorithm is used in the multilayer fusion module to fuse and enhance the multiscale features. A comprehensive analysis of MS COCO and PASCAL VOC benchmarks demonstrate that our network achieves precise object localization and also outperforms current FPN-based object detection algorithms.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"28 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141573671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-07, DOI: 10.1007/s00138-024-01580-w
Mohana Murali Dasari, Rama Krishna Gorthi
Occlusion is a frequent phenomenon that hinders the task of visual object tracking. Since occlusion can be caused by any object and take any shape, data augmentation techniques do little to help identify it or mitigate tracker loss. Some existing works deal with occlusion only in an unsupervised manner. This paper proposes a generic deep learning framework for identifying occlusion in a given frame by formulating it, for the first time, as a supervised classification task. The proposed architecture introduces an “occlusion classification” branch into supervised trackers. This branch aids effective feature learning and also provides an occlusion status for each frame. A metric is proposed to measure the performance of trackers under occlusion at the frame level. The efficacy of the proposed framework is demonstrated on two supervised tracking paradigms: one from the widely used Siamese region-proposal class of trackers and another from the emerging transformer-based trackers. The framework is tested on six diverse datasets (GOT-10k, LaSOT, OTB2015, TrackingNet, UAV123, and VOT2018) and achieves significant improvements over the corresponding baselines while performing on par with state-of-the-art trackers. The contributions of this work are generic, as any supervised tracker can easily adopt them.
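A hedged PyTorch sketch of the added branch: the 256-channel input and the two-layer head are assumptions for illustration; the branch is trained jointly with the tracker using a per-frame binary occlusion label.

```python
import torch
import torch.nn as nn

class OcclusionHead(nn.Module):
    """Per-frame 'occlusion classification' branch attached to a supervised tracker.
    Sketch with an assumed 256-channel search-region feature map as input."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # pool the search-region feature map
            nn.Flatten(),
            nn.Linear(in_channels, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 1),          # one occlusion logit per frame
        )

    def forward(self, feat):
        return self.net(feat)

# Joint training sketch: add the occlusion loss to the tracker's own loss, e.g.
# occ_loss = nn.BCEWithLogitsLoss()(occlusion_head(search_feat), occ_label)
```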
{"title":"GOA-net: generic occlusion aware networks for visual tracking","authors":"Mohana Murali Dasari, Rama Krishna Gorthi","doi":"10.1007/s00138-024-01580-w","DOIUrl":"https://doi.org/10.1007/s00138-024-01580-w","url":null,"abstract":"<p><i>Occlusion</i> is a frequent phenomenon that hinders the task of visual object tracking. Since occlusion can be from any object and in any shape, data augmentation techniques will not greatly help identify or mitigate the tracker loss. Some of the existing works deal with occlusion only in an unsupervised manner. This paper proposes a generic deep learning framework for identifying occlusion in a given frame by formulating it as a supervised classification task for the first time. The proposed architecture introduces an “occlusion classification” branch into supervised trackers. This branch helps in the effective learning of features and also provides occlusion status for each frame. A metric is proposed to measure the performance of trackers under occlusion at frame level. The efficacy of the proposed framework is demonstrated on two supervised tracking paradigms: One is from the most commonly used Siamese region proposal class of trackers, and another from the emerging transformer-based trackers. This framework is tested on six diverse datasets (GOT-10k, LaSOT, OTB2015, TrackingNet, UAV123, and VOT2018), and it achieved significant improvements in performance over the corresponding baselines while performing on par with the state-of-the-art trackers. The contributions in this work are more generic, as any supervised tracker can easily adopt them.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"38 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141573672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Camera calibration is an essential prerequisite for road surveillance applications, as it determines the accuracy of the three-dimensional spatial information obtained from surveillance video. The common practice for calibration is to collect correspondences between object points and their projections in the surveillance image, which usually requires operating the calibrator manually. However, complex traffic and the calibrator requirement limit the applicability of existing methods to road scenes. This paper proposes an online camera auto-calibration method for road surveillance to overcome this problem. It constructs a large-scale virtual checkerboard from the road information in the surveillance video; the structural size of the checkerboard can easily be obtained in advance because road design is standardized. The position coordinates of the checkerboard corners are used to calibrate the camera in a “coarse-to-fine” two-step procedure that efficiently recovers the intrinsic and extrinsic parameters. Experimental results on real datasets demonstrate that the proposed approach can accurately estimate camera parameters without manual involvement or additional information input. It achieves competitive results on road surveillance auto-calibration while having lower requirements and computational cost than state-of-the-art automatic methods.
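The sketch below illustrates a coarse-to-fine calibration from virtual checkerboard correspondences using OpenCV's standard routines; it is a stand-in under stated assumptions, not the paper's exact procedure.

```python
import cv2

def calibrate_from_virtual_checkerboard(object_points, image_points, image_size):
    """object_points: Nx3 float32 array of 3-D corner coordinates of the virtual
    road checkerboard (known in advance from standardized road design);
    image_points: Nx2 float32 array of their pixel positions in the frame;
    image_size: (width, height) of the surveillance frame."""
    # Coarse step: closed-form initial intrinsics from the correspondences.
    K0 = cv2.initCameraMatrix2D([object_points], [image_points], image_size)
    # Fine step: non-linear refinement of intrinsics and extrinsics.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        [object_points], [image_points], image_size, K0, None,
        flags=cv2.CALIB_USE_INTRINSIC_GUESS)
    return K, dist, rvecs[0], tvecs[0], rms
```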
{"title":"Online camera auto-calibration appliable to road surveillance","authors":"Shusen Guo, Xianwen Yu, Yuejin Sha, Yifan Ju, Mingchen Zhu, Jiafu Wang","doi":"10.1007/s00138-024-01576-6","DOIUrl":"https://doi.org/10.1007/s00138-024-01576-6","url":null,"abstract":"<p>Camera calibration is an essential prerequisite for road surveillance applications, which determines the accuracy of obtaining three-dimensional spatial information from surveillance video. The common practice for calibration is collecting the correspondences between the object points and their projections on surveillance, which usually needs to operate the calibrator manually. However, complex traffic and calibrator requirement limit the applicability of existing methods to road scenes. This paper proposes an online camera auto-calibration method for road surveillance to overcome the above problem. It constructs a large-scale virtual checkerboard adopting the road information from surveillance video, in which the structural size of the checkerboard can be easily obtained in advance because of the standardization for road design. The position coordinates of checkerboard corners are used for calibrating camera parameters, which is designed as a “coarse-to-fine” two-step procedure to recover the camera intrinsic and extrinsic parameters efficiently. Experimental results based on real datasets demonstrate that the proposed approach can accurately estimate camera parameters without manual involvement or additional information input. It achieves competitive effects on road surveillance auto-calibration while having lower requirements and computational costs than the automatic state-of-the-art.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"25 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141547677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-04, DOI: 10.1007/s00138-024-01575-7
Everett Fall, Kai-Wei Chang, Liang-Gee Chen
This paper presents an innovative approach that leverages a tree structure to effectively manage a large ensemble of neural networks for tackling complex video prediction tasks. Our proposed method introduces a novel technique for partitioning the function domain into simpler subsets, enabling piecewise learning by the ensemble. Accessed through an accompanying tree structure with O(log(N)) time complexity, this ensemble-tree framework progressively expands as training examples become more complex. The tree construction process incorporates a specialized algorithm that utilizes localized comparison functions learned at each decision node. To evaluate the effectiveness of our method, we conducted experiments in two challenging scenarios: action-conditional video prediction in a 3D video game environment and error detection in real-world 3D printing. Our approach consistently outperformed existing methods by a significant margin across various experiments. Additionally, we introduce a new evaluation methodology for long-term video prediction tasks that demonstrates improved alignment with qualitative observations. The results highlight the efficacy and superiority of our ensemble-tree approach in addressing complex video prediction challenges.
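A minimal sketch of the tree-managed lookup: `compare` stands for the localized comparison function learned at a decision node, and each leaf holds one ensemble member (all names are hypothetical).

```python
class Node:
    """Binary decision node: a learned comparison routes an input left or right."""
    def __init__(self, compare=None, left=None, right=None, model=None):
        self.compare = compare        # callable x -> bool, learned at this node
        self.left, self.right = left, right
        self.model = model            # leaf: the ensemble member for this subset

def route(node, x):
    """Descend the tree in O(log N) comparisons to pick the responsible network."""
    while node.model is None:
        node = node.left if node.compare(x) else node.right
    return node.model

# Usage sketch: prediction = route(root, frame_features)(frame_features)
```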
{"title":"Tree-managed network ensembles for video prediction","authors":"Everett Fall, Kai-Wei Chang, Liang-Gee Chen","doi":"10.1007/s00138-024-01575-7","DOIUrl":"https://doi.org/10.1007/s00138-024-01575-7","url":null,"abstract":"<p>This paper presents an innovative approach that leverages a tree structure to effectively manage a large ensemble of neural networks for tackling complex video prediction tasks. Our proposed method introduces a novel technique for partitioning the function domain into simpler subsets, enabling piecewise learning by the ensemble. Seamlessly accessed by an accompanying tree structure with a time complexity of O(log(N)), this ensemble-tree framework progressively expands while training examples become more complex. The tree construction process incorporates a specialized algorithm that utilizes localized comparison functions, learned at each decision node. To evaluate the effectiveness of our method, we conducted experiments in two challenging scenarios: action-conditional video prediction in a 3D video game environment and error detection in real-world 3D printing scenarios. Our approach consistently outperformed existing methods by a significant margin across various experiments. Additionally, we introduce a new evaluation methodology for long-term video prediction tasks, which demonstrates improved alignment with qualitative observations. The results highlight the efficacy and superiority of our ensemble-tree approach in addressing complex video prediction challenges.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"29 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141547676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-03, DOI: 10.1007/s00138-024-01567-7
Alexandre Englebert, Olivier Cornu, Christophe De Vleeschouwer
The demand for explainable AI continues to rise alongside advancements in deep learning technology. Existing methods for explaining convolutional neural networks often struggle to accurately pinpoint the image features justifying a network’s prediction, owing to low-resolution saliency maps (e.g., CAM), the smooth visualizations produced by perturbation-based techniques, or the numerous isolated peaky spots of gradient-based approaches. In response, our work merges information from earlier and later layers within the network to create high-resolution class activation maps that not only remain competitive with previous art on insertion-deletion faithfulness metrics but also significantly surpass it in the precision of localizing class-specific features.
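A hedged sketch of the layer-fusion idea: each layer's class activation map is upsampled to the input resolution and the maps are combined multiplicatively so that early, high-resolution layers sharpen the later, semantically strong ones; the actual Poly-CAM recursion differs in detail.

```python
import torch
import torch.nn.functional as F

def multi_layer_cam(activations, class_weights_per_layer, out_size):
    """activations: list of (C, H, W) feature maps from early to late layers;
    class_weights_per_layer: list of (C,) class weights; out_size: (H_in, W_in).
    Sketch only, not the published Poly-CAM formulation."""
    fused = None
    for act, w in zip(activations, class_weights_per_layer):
        cam = F.relu(torch.einsum("chw,c->hw", act, w))   # weighted sum over channels
        cam = cam / (cam.max() + 1e-8)                     # normalise to [0, 1]
        cam = F.interpolate(cam[None, None], size=out_size,
                            mode="bilinear", align_corners=False)[0, 0]
        fused = cam if fused is None else fused * cam      # early layers sharpen late ones
    return fused
```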
{"title":"Poly-cam: high resolution class activation map for convolutional neural networks","authors":"Alexandre Englebert, Olivier Cornu, Christophe De Vleeschouwer","doi":"10.1007/s00138-024-01567-7","DOIUrl":"https://doi.org/10.1007/s00138-024-01567-7","url":null,"abstract":"<p>The demand for explainable AI continues to rise alongside advancements in deep learning technology. Existing methods such as convolutional neural networks often struggle to accurately pinpoint the image features justifying a network’s prediction due to low-resolution saliency maps (e.g., CAM), smooth visualizations from perturbation-based techniques, or numerous isolated peaky spots in gradient-based approaches. In response, our work seeks to merge information from earlier and later layers within the network to create high-resolution class activation maps that not only maintain a level of competitiveness with previous art in terms of insertion-deletion faithfulness metrics but also significantly surpass it regarding the precision in localizing class-specific features.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"37 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141531056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-01, DOI: 10.1007/s00138-024-01571-x
Xi Chen, Huang Wei, Wei Guo, Fan Zhang, Jiayu Du, Zhizhong Zhou
Deep learning models have been shown to be vulnerable to critical attacks under adversarial conditions. Attackers are able to generate powerful adversarial examples by searching for adversarial perturbations, without interfering with model training or directly modifying the model. This phenomenon indicates an endogenous problem in existing deep learning frameworks. Therefore, optimizing individual models for defense is often limited and can always be defeated by new attack methods. Ensemble defense has been shown to be effective in defending against adversarial attacks by combining diverse models. However, the problem of insufficient differentiation among existing models persists. Active defense in cyberspace security has successfully defended against unknown vulnerabilities by integrating subsystems with multiple different implementations to achieve a unified mission objective. Inspired by this, we propose exploring the feasibility of achieving model differentiation by changing the data features used in training individual models, as they are the core factor of functional implementation. We utilize several feature extraction methods to preprocess the data and train differentiated models based on these features. By generating adversarial perturbations to attack different models, we demonstrate that the feature representation of the data is highly resistant to adversarial perturbations. The entire ensemble is able to operate normally in an error-bearing environment.
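A minimal sketch of inference with such an ensemble, assuming each member was trained on its own differentiated feature view; majority voting is used here for illustration, since the abstract does not specify the aggregation rule, and all names are placeholders.

```python
from collections import Counter

def ensemble_predict(x, extractors, models):
    """Majority vote over models trained on differentiated feature representations.
    extractors[i] maps a raw input to the i-th feature view; models[i] is a
    classifier (callable returning a label) trained only on that view."""
    votes = [model(extract(x)) for extract, model in zip(extractors, models)]
    # The ensemble output stays correct as long as most members resist the perturbation.
    return Counter(votes).most_common(1)[0][0]
```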
{"title":"Adversarial defence by learning differentiated feature representation in deep ensemble","authors":"Xi Chen, Huang Wei, Wei Guo, Fan Zhang, Jiayu Du, Zhizhong Zhou","doi":"10.1007/s00138-024-01571-x","DOIUrl":"https://doi.org/10.1007/s00138-024-01571-x","url":null,"abstract":"<p>Deep learning models have been shown to be vulnerable to critical attacks under adversarial conditions. Attackers are able to generate powerful adversarial examples by searching for adversarial perturbations, without interfering with model training or directly modifying the model. This phenomenon indicates an endogenous problem in existing deep learning frameworks. Therefore, optimizing individual models for defense is often limited and can always be defeated by new attack methods. Ensemble defense has been shown to be effective in defending against adversarial attacks by combining diverse models. However, the problem of insufficient differentiation among existing models persists. Active defense in cyberspace security has successfully defended against unknown vulnerabilities by integrating subsystems with multiple different implementations to achieve a unified mission objective. Inspired by this, we propose exploring the feasibility of achieving model differentiation by changing the data features used in training individual models, as they are the core factor of functional implementation. We utilize several feature extraction methods to preprocess the data and train differentiated models based on these features. By generating adversarial perturbations to attack different models, we demonstrate that the feature representation of the data is highly resistant to adversarial perturbations. The entire ensemble is able to operate normally in an error-bearing environment.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"16 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-07-01, DOI: 10.1007/s00138-024-01573-9
Sheikh Shah Mohammad Motiur Rahman, Michel Salomon, Sounkalo Dembélé
The scanning electron microscope (SEM) enables imaging of micro- and nano-scale objects. It is an analytical tool widely used in the materials, earth, and life sciences. However, SEM images often suffer from high noise levels, influenced by factors such as dwell time, the time the electron beam spends on each pixel during acquisition. Slower dwell times reduce noise but risk damaging the sample, while faster ones introduce uncertainty. To this end, the latest state-of-the-art denoising techniques must be explored. Experimentation is crucial to identify the most effective methods that balance noise reduction and sample preservation, ensuring high-quality SEM images with enhanced clarity and accuracy. A thorough analysis tracing the evolution of image denoising techniques was conducted, ranging from classical methods to deep learning approaches. A comprehensive taxonomy of solutions to this inverse problem was established, detailing the developmental flow of these methods. Subsequently, the latest state-of-the-art techniques were identified and reviewed based on their reproducibility and the public availability of their source code. The selected techniques were then tested and investigated on scanning electron microscope images. After in-depth analysis and benchmarking, it is clear that existing deep learning-based denoising techniques fall short of maintaining a balance between noise reduction and the preservation of information crucial to SEM images. Issues such as information removal and over-smoothing have been identified. To address these constraints, there is a critical need for SEM image denoising techniques that prioritize both noise reduction and information preservation. Additionally, combining several networks, such as a generative adversarial network with a convolutional neural network (CNN), as in BoostNet, or a vision transformer with a CNN, as in SCUNet, improves denoising performance. It is recommended to use blind techniques to denoise real noise while taking detail preservation into account and tackling excessive smoothing, particularly in the context of SEM. In the future, the use of explainable AI will facilitate debugging and the identification of these problems.
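A sketch of the kind of benchmarking loop described above, assuming paired noisy/clean SEM images are available and scoring each candidate denoiser with the standard PSNR and SSIM metrics (the metric choice is an assumption).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def benchmark(denoisers, noisy_images, clean_images):
    """Score each denoiser on paired noisy/clean uint8 SEM images.
    denoisers: dict mapping a method name to a callable image -> image."""
    scores = {}
    for name, denoise in denoisers.items():
        psnr, ssim = [], []
        for noisy, clean in zip(noisy_images, clean_images):
            restored = denoise(noisy)
            psnr.append(peak_signal_noise_ratio(clean, restored, data_range=255))
            ssim.append(structural_similarity(clean, restored, data_range=255))
        # Average scores expose the trade-off between noise reduction and detail loss.
        scores[name] = (sum(psnr) / len(psnr), sum(ssim) / len(ssim))
    return scores
```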
{"title":"Towards scanning electron microscopy image denoising: a state-of-the-art overview, benchmark, taxonomies, and future direction","authors":"Sheikh Shah Mohammad Motiur Rahman, Michel Salomon, Sounkalo Dembélé","doi":"10.1007/s00138-024-01573-9","DOIUrl":"https://doi.org/10.1007/s00138-024-01573-9","url":null,"abstract":"<p>Scanning electron microscope (SEM) enables imaging of micro-nano scale objects. It is an analytical tool widely used in the material, earth and life sciences. However, SEM images often suffer from high noise levels, influenced by factors such as dwell time, the time during which the electron beam remains per pixel during acquisition. Slower dwell times reduce noise but risk damaging the sample, while faster ones introduce uncertainty. To this end, the latest state-of-the-art denoising techniques must be explored. Experimentation is crucial to identify the most effective methods that balance noise reduction and sample preservation, ensuring high-quality SEM images with enhanced clarity and accuracy. A thorough analysis tracing the evolution of image denoising techniques was conducted, ranging from classical methods to deep learning approaches. A comprehensive taxonomy of this reverse problem solutions was established, detailing the developmental flow of these methods. Subsequently, the latest state-of-the-art techniques were identified and reviewed based on their reproducibility and the public availability of their source code. The selected techniques were then tested and investigated using scanning electron microscope images. After in-depth analysis and benchmarking, it is clear that the existing deep learning-based denoising techniques fall short in maintaining a balance between noise reduction and preserving crucial information for SEM images. Issues like information removal and over-smoothing have been identified. To address these constraints, there is a critical need for the development of SEM image denoising techniques that prioritize both noise reduction and information preservation. Additionally, one can see that the combination of several networks, such as the generative adversarial network and the convolutional neural network (CNN), known as BoostNet, or the vision transformer and the CNN, known as SCUNet, improves denoising performance. It is recommended to use blind techniques to denoise real noise while taking into account detail preservation and tackling excessive smoothing, particularly in the context of SEM. In the future the use of explainable AI will facilitate the debugging and the identification of these problems.\u0000</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"28 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-27, DOI: 10.1007/s00138-024-01570-y
Dimitrios Banelas, Euripides G. M. Petrakis
MotionInsights facilitates object detection and tracking from multiple video streams in real time. Leveraging the distributed stream processing capabilities of Apache Flink and Apache Kafka (as an intermediate message broker), the system models video processing as a data-flow stream processing pipeline. Each video frame is split into smaller blocks, which are dispatched to be processed in parallel by a number of Flink operators. In the first stage, each block undergoes background subtraction and component labeling. The connected components from each frame are grouped, and the eligible components are merged into objects. In the last stage of the pipeline, all objects from each frame are consolidated to produce the trajectory of each object. The Flink application is deployed on a Kubernetes cluster on the Google Cloud Platform. Experiments in a Flink cluster with 7 machines revealed that MotionInsights achieves up to 6 times speedup compared to a monolithic (non-parallel) implementation while providing accurate trajectory patterns. The highest (i.e., more than 6 times) speedup was observed with video streams of the highest resolution. Compared to existing systems that use custom or proprietary architectures, MotionInsights is independent of the underlying hardware platform and can be deployed on common CPU architectures and the cloud.
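A local Python/OpenCV stand-in for the first pipeline stage (background subtraction plus connected-component labeling on one frame block); in MotionInsights this step runs inside a parallel Flink operator per block, so the code below is illustrative only.

```python
import cv2

def process_block(block, bg_model):
    """First pipeline stage for one frame block. bg_model is the per-block
    background subtractor owned by the operator handling this block position,
    e.g. cv2.createBackgroundSubtractorMOG2()."""
    fg_mask = bg_model.apply(block)
    # Keep confident foreground pixels only (MOG2 marks shadows as 127).
    _, fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(fg_mask)
    # Return component statistics; downstream stages group and merge them into
    # objects and finally into per-object trajectories.
    return stats[1:], centroids[1:]   # skip label 0, the background component
```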
{"title":"Motioninsights: real-time object tracking in streaming video","authors":"Dimitrios Banelas, Euripides G. M. Petrakis","doi":"10.1007/s00138-024-01570-y","DOIUrl":"https://doi.org/10.1007/s00138-024-01570-y","url":null,"abstract":"<p>MotionInsights facilitates object detection and tracking from multiple video streams in real-time. Leveraging the distributed stream processing capabilities of Apache Flink and Apache Kafka (as an intermediate message broker), the system models video processing as a data flow stream processing pipeline. Each video frame is split into smaller blocks, which are dispatched to be processed in parallel by a number of Flink operators. In the first stage, each block undergoes background subtraction and component labeling. The connected components from each frame are grouped, and the eligible components are merged into objects. In the last stage of the pipeline, all objects from each frame are concentrated to produce the trajectory of each object. The Flink application is deployed as a Kubernetes cluster in the Google Cloud Platform. Experimenting in a Flink cluster with 7 machines, revealed that MotionInsights achieves up to 6 times speedup compared to a monolithic (nonparallel) implementation while providing accurate trajectory patterns. The highest (i.e., more than 6 times) speed-up was observed with video streams of the highest resolution. Compared to existing systems that use custom or proprietary architectures, MotionInsights is independent of the underlying hardware platform and can be deployed on common CPU architectures and the cloud.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"34 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gesture recognition, with its multitude of real-world applications, is one of the core areas of research in the field of human-computer interaction. In this paper, we propose a novel method for isolated and continuous hand gesture recognition utilizing movement epenthesis detection and removal. For this purpose, the present work detects and removes the movement epenthesis frames from isolated and continuous hand gesture videos. We also propose a novel modality based on the temporal difference that extracts hand regions, removes gesture-irrelevant factors, and provides the temporal information contained in hand gesture videos. Using the proposed modality together with other modalities such as the RGB modality, depth modality, and segmented hand modality, features are extracted using the GoogLeNet Caffe model. Next, we derive a set of discriminative features by fusing the acquired features into a feature vector representing the sign gesture in question. We have designed and used a Bidirectional Long Short-Term Memory network (Bi-LSTM) for classification. To test the efficacy of the proposed work, we applied our method to several publicly available continuous and isolated hand gesture datasets: ChaLearn LAP IsoGD, ChaLearn LAP ConGD, IPN Hand, and NVGesture. We observe in our experiments that the proposed method performs exceptionally well with several individual modalities as well as with combinations of modalities from these datasets. The combined effect of the proposed modality and the removal of movement epenthesis frames leads to a significant improvement in gesture recognition accuracy and a considerable reduction in computational burden. Thus, the obtained results show our proposed approach to be on par with existing state-of-the-art methods.
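A hedged sketch of the temporal-difference modality: consecutive frames are differenced and thresholded so that only moving (hand) pixels of the current frame are kept, suppressing static, gesture-irrelevant background; the threshold value is an assumption for illustration.

```python
import cv2

def temporal_difference_modality(frames, thresh=25):
    """frames: list of consecutive BGR uint8 frames from a gesture video.
    Returns one motion-masked frame per consecutive pair."""
    diffs = []
    for prev, curr in zip(frames[:-1], frames[1:]):
        # Per-pixel absolute difference between consecutive grayscale frames.
        d = cv2.absdiff(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY),
                        cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY))
        _, mask = cv2.threshold(d, thresh, 255, cv2.THRESH_BINARY)
        # Keep only the moving pixels of the current frame.
        diffs.append(cv2.bitwise_and(curr, curr, mask=mask))
    return diffs
```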
{"title":"A multi-modal framework for continuous and isolated hand gesture recognition utilizing movement epenthesis detection","authors":"Navneet Nayan, Debashis Ghosh, Pyari Mohan Pradhan","doi":"10.1007/s00138-024-01565-9","DOIUrl":"https://doi.org/10.1007/s00138-024-01565-9","url":null,"abstract":"<p>Gesture recognition, having multitudinous applications in the real world, is one of the core areas of research in the field of human-computer interaction. In this paper, we propose a novel method for isolated and continuous hand gesture recognition utilizing the movement epenthesis detection and removal. For this purpose, the present work detects and removes the movement epenthesis frames from the isolated and continuous hand gesture videos. In this paper, we have also proposed a novel modality based on the temporal difference that extracts hand regions, removes gesture irrelevant factors and provides temporal information contained in the hand gesture videos. Using the proposed modality and other modalities such as the RGB modality, depth modality and segmented hand modality, features are extracted using Googlenet Caffe Model. Next, we derive a set of discriminative features by fusing the acquired features that form a feature vector representing the sign gesture in question. We have designed and used a Bidirectional Long Short-Term Memory Network (Bi-LSTM) for classification purpose. To test the efficacy of our proposed work, we applied our method on various publicly available continuous and isolated hand gesture datasets like ChaLearn LAP IsoGD, ChaLearn LAP ConGD, IPN Hand, and NVGesture. We observe in our experiments that our proposed method performs exceptionally well with several individual modalities as well as combination of modalities of these datasets. The combined effect of the proposed modality and movement epenthesis frames removal led to significant improvement in gesture recognition accuracy and considerable reduction in computational burden. Thus the obtained results advocate our proposed approach to be at par with the existing state-of-the-art methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"12 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}