Deep learning has been used in many computer-vision-based applications. However, deep neural networks are vulnerable to adversarial examples that have been crafted specifically to fool a system while being imperceptible to humans. In this paper, we propose a detection defense method based on heterogeneous denoising on foreground and background (HDFB). Since the image region that dominates the output classification is usually sensitive to adversarial perturbations, HDFB focuses its defense on the foreground region rather than the whole image. First, HDFB uses a class activation map to segment examples into foreground and background regions. Second, the foreground and background are encoded into square patches. Third, the encoded foreground is zoomed in and out and denoised at two scales. The encoded background is then denoised once using bilateral filtering. After that, the denoised foreground and background patches are decoded. Finally, the decoded foreground and background are stitched together as a denoised sample for classification. If the classifications of the denoised and input images differ, the input image is detected as an adversarial example. Comparison experiments are conducted on CIFAR-10 and MiniImageNet. The average detection rate (DR) against white-box attacks on the test sets of the two datasets is 86.4%. The average DR against black-box attacks on MiniImageNet is 88.4%. The experimental results suggest that HDFB performs well on adversarial examples and is robust against white-box and black-box adversarial attacks. However, HDFB is insecure if its defense parameters are exposed to attackers.
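The mismatch test at the core of HDFB can be sketched in a few lines: denoise the input, classify both versions, and flag a disagreement as adversarial. The "classifier" and "denoiser" below are toy stand-ins (a max-intensity threshold and a 3x3 mean filter), not the paper's CNN and CAM-guided bilateral pipeline.

```python
import numpy as np

def detect_adversarial(classify, denoise, image):
    """HDFB-style mismatch test: flag the input as adversarial when the
    prediction on the denoised image differs from the raw prediction."""
    return classify(image) != classify(denoise(image))

# Toy stand-ins, purely illustrative: a spike-sensitive "classifier"
# and a 3x3 mean-filter "denoiser" applied to the image interior.
def toy_classify(img):
    return int(img.max() > 0.9)

def toy_denoise(img):
    out = img.copy()
    h, w = img.shape
    out[1:-1, 1:-1] = sum(
        img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out

# A lone bright spike (a crude stand-in for a perturbation) is smoothed
# away by denoising, so the two predictions disagree; a uniformly bright
# image survives denoising and the predictions agree.
spike = np.zeros((8, 8)); spike[4, 4] = 1.0
clean = np.full((8, 8), 0.95)
```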
Lifang Zhu, Chao Liu, Zhiqiang Zhang, Yifan Cheng, Biao Jie, Xintao Ding. "An adversarial sample detection method based on heterogeneous denoising." Machine Vision and Applications, 2024-07-09. DOI: 10.1007/s00138-024-01579-3
Pub Date: 2024-07-08, DOI: 10.1007/s00138-024-01577-5
Zhicheng Li, Chao Yang, Longyu Jiang
The feature pyramid network (FPN) improves object detection performance by means of top-down multilevel feature fusion. However, current FPN-based methods have not effectively utilized interlayer features to suppress the aliasing effects that arise during top-down feature fusion. We propose an interlayer attention feature pyramid network that integrates attention gates into the FPN through interlayer enhancement to establish the correlation between context and model, thereby highlighting the salient region of each layer and suppressing aliasing effects. Moreover, to avoid feature dilution during top-down fusion and to let multilayer features reinforce one another, a simplified non-local algorithm is used in the multilayer fusion module to fuse and enhance the multiscale features. A comprehensive analysis on the MS COCO and PASCAL VOC benchmarks demonstrates that our network achieves precise object localization and outperforms current FPN-based object detection algorithms.
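The attention-gated top-down fusion step described above can be sketched as follows. This is a minimal numpy sketch under assumed shapes; the hand-set per-position gate stands in for the learned attention convolutions of the actual network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_topdown_fuse(lateral, top, gate_logits):
    """Fuse an upsampled top-level feature map into a lateral feature map,
    modulated by a per-position attention gate (sigmoid of gate_logits).
    lateral: (H, W, C); top: (H/2, W/2, C); gate_logits: (H, W, 1)."""
    top_up = top.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbor x2
    att = sigmoid(gate_logits)                        # gate values in (0, 1)
    return lateral + att * top_up

# A fully open gate passes the top-down signal through; a closed gate
# suppresses it, leaving the lateral features untouched.
lateral = np.ones((4, 4, 2))
top = np.full((2, 2, 2), 3.0)
open_gate = np.full((4, 4, 1), 50.0)
closed_gate = np.full((4, 4, 1), -50.0)
```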
Zhicheng Li, Chao Yang, Longyu Jiang. "IAFPN: interlayer enhancement and multilayer fusion network for object detection." Machine Vision and Applications, 2024-07-08. DOI: 10.1007/s00138-024-01577-5
Pub Date: 2024-07-07, DOI: 10.1007/s00138-024-01580-w
Mohana Murali Dasari, Rama Krishna Gorthi
Occlusion is a frequent phenomenon that hinders visual object tracking. Since occlusion can be caused by any object and take any shape, data augmentation techniques are of little help in identifying or mitigating the resulting tracker loss. Some existing works deal with occlusion only in an unsupervised manner. This paper proposes a generic deep learning framework that, for the first time, identifies occlusion in a given frame by formulating it as a supervised classification task. The proposed architecture introduces an "occlusion classification" branch into supervised trackers. This branch aids the effective learning of features and also provides an occlusion status for each frame. A metric is proposed to measure the performance of trackers under occlusion at the frame level. The efficacy of the proposed framework is demonstrated on two supervised tracking paradigms: one from the widely used Siamese region-proposal class of trackers, and another from the emerging transformer-based trackers. The framework is tested on six diverse datasets (GOT-10k, LaSOT, OTB2015, TrackingNet, UAV123, and VOT2018), achieving significant improvements over the corresponding baselines while performing on par with state-of-the-art trackers. The contributions are generic, as any supervised tracker can easily adopt them.
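The idea of attaching an occlusion branch next to a tracker's localization head can be sketched as below. This is a hypothetical toy model with random weights, only illustrating the two-headed output structure, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyTrackerWithOcclusionHead:
    """Minimal sketch of adding an 'occlusion classification' branch
    alongside a tracker's localization head; weights are random and
    purely illustrative."""
    def __init__(self, feat_dim=64):
        self.w_loc = rng.normal(size=(feat_dim, 4))   # box offsets x, y, w, h
        self.w_occ = rng.normal(size=(feat_dim, 1))   # occlusion logit

    def forward(self, feat):
        loc = feat @ self.w_loc                        # localization output
        occ_prob = 1.0 / (1.0 + np.exp(-(feat @ self.w_occ)))  # per-frame
        return loc, occ_prob

model = ToyTrackerWithOcclusionHead()
feat = rng.normal(size=(5, 64))   # features for 5 frames
loc, occ = model.forward(feat)
```

In training, the occlusion head would receive a binary supervision signal per frame, letting the shared features learn occlusion-aware representations.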
Mohana Murali Dasari, Rama Krishna Gorthi. "GOA-net: generic occlusion aware networks for visual tracking." Machine Vision and Applications, 2024-07-07. DOI: 10.1007/s00138-024-01580-w
Camera calibration is an essential prerequisite for road surveillance applications, as it determines the accuracy of recovering three-dimensional spatial information from surveillance video. The common practice for calibration is to collect correspondences between object points and their projections in the surveillance image, which usually requires operating a calibration target manually. However, complex traffic and the calibration-target requirement limit the applicability of existing methods in road scenes. This paper proposes an online camera auto-calibration method for road surveillance that overcomes this problem. It constructs a large-scale virtual checkerboard from the road information in the surveillance video; the structural size of the checkerboard can easily be obtained in advance because road design is standardized. The position coordinates of the checkerboard corners are used to calibrate the camera parameters in a "coarse-to-fine" two-step procedure that recovers the camera intrinsic and extrinsic parameters efficiently. Experimental results on real datasets demonstrate that the proposed approach accurately estimates camera parameters without manual involvement or additional information input. It achieves competitive accuracy on road surveillance auto-calibration while having lower requirements and computational costs than state-of-the-art automatic methods.
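Corner-to-image correspondences of the kind described above can be used to estimate a road-plane-to-image homography. The standard direct linear transform (DLT) sketch below illustrates this building block; it is not the paper's two-step intrinsic/extrinsic procedure.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography mapping planar points src -> dst via the
    direct linear transform. src, dst: (N, 2) arrays, N >= 4."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (last right-singular vector).
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]   # fix the projective scale

# Synthetic check: project known plane points with a ground-truth
# homography, then recover it from the correspondences.
H_true = np.array([[1.0, 0.2, 5.0],
                   [0.1, 1.1, -3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [5, 3]], dtype=float)
proj = np.hstack([src, np.ones((5, 1))]) @ H_true.T
dst = proj[:, :2] / proj[:, 2:3]
H_est = estimate_homography(src, dst)
```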
Shusen Guo, Xianwen Yu, Yuejin Sha, Yifan Ju, Mingchen Zhu, Jiafu Wang. "Online camera auto-calibration appliable to road surveillance." Machine Vision and Applications, 2024-07-05. DOI: 10.1007/s00138-024-01576-6
Pub Date: 2024-07-04, DOI: 10.1007/s00138-024-01575-7
Everett Fall, Kai-Wei Chang, Liang-Gee Chen
This paper presents an approach that leverages a tree structure to effectively manage a large ensemble of neural networks for complex video prediction tasks. The proposed method introduces a technique for partitioning the function domain into simpler subsets, enabling piecewise learning by the ensemble. Accessed through an accompanying tree structure in O(log N) time, the ensemble-tree framework progressively expands as training examples become more complex. The tree construction process incorporates a specialized algorithm that uses localized comparison functions learned at each decision node. To evaluate the method, we conducted experiments in two challenging scenarios: action-conditional video prediction in a 3D video-game environment and error detection in real-world 3D printing. Our approach consistently outperformed existing methods by a significant margin across all experiments. Additionally, we introduce a new evaluation methodology for long-term video prediction that aligns better with qualitative observations. The results highlight the efficacy of the ensemble-tree approach on complex video prediction challenges.
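The O(log N) routing through comparison functions at decision nodes can be sketched as below. The threshold-on-one-coordinate comparison is a hypothetical stand-in for the learned comparison functions; only the tree-descent structure is the point.

```python
import numpy as np

class Node:
    """Decision node (with a comparison function) or leaf (with an expert)."""
    def __init__(self, compare=None, left=None, right=None, expert=None):
        self.compare, self.left, self.right = compare, left, right
        self.expert = expert

def route(node, x, depth=0):
    """Descend the ensemble tree using the comparison function at each
    decision node; return the selected expert and the path length."""
    if node.expert is not None:
        return node.expert, depth
    branch = node.left if node.compare(x) else node.right
    return route(branch, x, depth + 1)

def build_balanced(experts, level=0):
    """Toy balanced tree over a list of expert ids; the comparison just
    thresholds one input coordinate per level (illustrative only)."""
    if len(experts) == 1:
        return Node(expert=experts[0])
    mid = len(experts) // 2
    cmp = (lambda d: (lambda x: x[d % len(x)] < 0.5))(level)
    return Node(compare=cmp,
                left=build_balanced(experts[:mid], level + 1),
                right=build_balanced(experts[mid:], level + 1))

# With 8 experts, every lookup touches exactly log2(8) = 3 decision nodes.
tree = build_balanced(list(range(8)))
expert, depth = route(tree, np.array([0.2, 0.8, 0.4]))
```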
Everett Fall, Kai-Wei Chang, Liang-Gee Chen. "Tree-managed network ensembles for video prediction." Machine Vision and Applications, 2024-07-04. DOI: 10.1007/s00138-024-01575-7
Pub Date: 2024-07-03, DOI: 10.1007/s00138-024-01567-7
Alexandre Englebert, Olivier Cornu, Christophe De Vleeschouwer
The demand for explainable AI continues to rise alongside advances in deep learning. Existing methods for explaining convolutional neural networks often struggle to accurately pinpoint the image features that justify a network's prediction, whether because of low-resolution saliency maps (e.g., CAM), overly smooth visualizations from perturbation-based techniques, or numerous isolated peaky spots in gradient-based approaches. In response, our work merges information from earlier and later layers of the network to create high-resolution class activation maps that remain competitive with prior art on insertion-deletion faithfulness metrics while significantly surpassing it in the precision of localizing class-specific features.
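One simple way to combine a coarse late-layer CAM with a high-resolution earlier layer is to upsample the CAM and reweight it by the fine layer's activation energy. This is a hedged sketch of that general idea, not the Poly-cam algorithm itself.

```python
import numpy as np

def upsample_nn(m, factor):
    """Nearest-neighbor upsampling of a 2D map."""
    return m.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_cam(coarse_cam, fine_act):
    """Merge a low-resolution class activation map with a high-resolution
    earlier layer: upsample the CAM, then reweight it by the normalized
    per-position channel energy of the fine layer.
    coarse_cam: (h, w); fine_act: (H, W, C) with H = factor * h."""
    factor = fine_act.shape[0] // coarse_cam.shape[0]
    cam_up = upsample_nn(coarse_cam, factor)
    energy = np.abs(fine_act).sum(axis=-1)            # (H, W)
    energy = energy / (energy.max() + 1e-8)           # normalize to [0, 1]
    return cam_up * energy

# A uniform coarse CAM becomes sharply localized where the fine layer fires.
coarse = np.ones((2, 2))
fine = np.zeros((4, 4, 3))
fine[0, 0, :] = 1.0
out = fuse_cam(coarse, fine)
```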
Alexandre Englebert, Olivier Cornu, Christophe De Vleeschouwer. "Poly-cam: high resolution class activation map for convolutional neural networks." Machine Vision and Applications, 2024-07-03. DOI: 10.1007/s00138-024-01567-7
Pub Date: 2024-07-01, DOI: 10.1007/s00138-024-01583-7
Xi Chen, Wei Huang, Wei Guo, Fan Zhang, Jiayu Du, Zhizhong Zhou
Xi Chen, Wei Huang, Wei Guo, Fan Zhang, Jiayu Du, Zhizhong Zhou. "Correction: Adversarial defence by learning differentiated feature representation in deep ensemble." Machine Vision and Applications, 2024-07-01. DOI: 10.1007/s00138-024-01583-7
Pub Date: 2024-07-01, DOI: 10.1007/s00138-024-01571-x
Xi Chen, Huang Wei, Wei Guo, Fan Zhang, Jiayu Du, Zhizhong Zhou
Deep learning models have been shown to be vulnerable to critical attacks under adversarial conditions. Attackers can generate powerful adversarial examples by searching for adversarial perturbations, without interfering with model training or directly modifying the model. This phenomenon indicates an endogenous problem in existing deep learning frameworks, so optimizing an individual model for defense is of limited benefit and can be defeated by new attack methods. Ensemble defense, which combines diverse models, has been shown to be effective against adversarial attacks; however, the differentiation among existing models remains insufficient. Active defense in cyberspace security has successfully defended against unknown vulnerabilities by integrating subsystems with multiple different implementations to achieve a unified mission objective. Inspired by this, we explore the feasibility of achieving model differentiation by changing the data features used to train individual models, since features are the core of a model's functional implementation. We use several feature extraction methods to preprocess the data and train differentiated models on these features. By generating adversarial perturbations to attack the different models, we demonstrate that these feature representations are highly resistant to adversarial perturbations.
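The differentiated-feature ensemble idea can be sketched as a majority vote over models that each see a different view of the input. The three feature extractors and the stub "models" below are hypothetical stand-ins, only illustrating the voting structure.

```python
import numpy as np
from collections import Counter

# Hypothetical feature extractors: each model sees a different view of the
# input, so a perturbation tuned to one view is less likely to fool all.
FEATURES = {
    "raw":       lambda x: x,
    "gradient":  lambda x: np.diff(x, axis=-1),
    "histogram": lambda x: np.histogram(x, bins=8, range=(0.0, 1.0))[0],
}

def ensemble_predict(models, x):
    """Majority vote across models trained on differentiated features.
    models maps a feature name to a callable taking that feature view."""
    votes = [models[name](feat(x)) for name, feat in FEATURES.items()]
    return Counter(votes).most_common(1)[0][0]

# Stub models: two agree on class 1, one (say, a fooled model) says 0;
# the vote absorbs the single erroneous member.
models = {"raw": lambda f: 1, "gradient": lambda f: 0, "histogram": lambda f: 1}
x = np.linspace(0.0, 1.0, 16)
```

This mirrors the paper's point that the ensemble keeps operating normally even when individual members are in error.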
The entire ensemble is able to operate normally in an error-bearing environment.

Xi Chen, Huang Wei, Wei Guo, Fan Zhang, Jiayu Du, Zhizhong Zhou. "Adversarial defence by learning differentiated feature representation in deep ensemble." Machine Vision and Applications, 2024-07-01. DOI: 10.1007/s00138-024-01571-x
Pub Date: 2024-07-01, DOI: 10.1007/s00138-024-01573-9
Sheikh Shah Mohammad Motiur Rahman, Michel Salomon, Sounkalo Dembélé
The scanning electron microscope (SEM) enables imaging of micro-nano scale objects. It is an analytical tool widely used in the material, earth, and life sciences. However, SEM images often suffer from high noise levels, influenced by factors such as dwell time, the time the electron beam spends on each pixel during acquisition. Slower dwell times reduce noise but risk damaging the sample, while faster ones introduce uncertainty. To this end, the latest state-of-the-art denoising techniques must be explored. Experimentation is crucial to identify the most effective methods that balance noise reduction and sample preservation, ensuring high-quality SEM images with enhanced clarity and accuracy. A thorough analysis tracing the evolution of image denoising techniques was conducted, ranging from classical methods to deep learning approaches. A comprehensive taxonomy of solutions to this inverse problem was established, detailing the developmental flow of these methods. Subsequently, the latest state-of-the-art techniques were identified and reviewed based on their reproducibility and the public availability of their source code. The selected techniques were then tested and investigated on scanning electron microscope images. After in-depth analysis and benchmarking, it is clear that existing deep learning-based denoising techniques fall short of maintaining a balance between noise reduction and preserving crucial information in SEM images. Issues such as information removal and over-smoothing were identified. To address these constraints, there is a critical need for SEM image denoising techniques that prioritize both noise reduction and information preservation. Additionally, the combination of several networks, such as the generative adversarial network with the convolutional neural network (CNN), known as BoostNet, or the vision transformer with the CNN, known as SCUNet, improves denoising performance. It is recommended to use blind techniques to denoise real noise while taking detail preservation into account and tackling excessive smoothing, particularly in the context of SEM. In the future, the use of explainable AI will facilitate the debugging and identification of these problems.
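Denoising benchmarks of the kind surveyed above typically score methods by peak signal-to-noise ratio against a clean reference. A minimal PSNR implementation (a standard metric, not a contribution of the survey):

```python
import numpy as np

def psnr(reference, denoised, peak=1.0):
    """Peak signal-to-noise ratio in dB between a clean reference and a
    denoised image, both scaled to [0, peak]. Higher is better."""
    mse = np.mean((reference - denoised) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A constant 0.1 error on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
ref = np.zeros((4, 4))
noisy = ref + 0.1
```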
Sheikh Shah Mohammad Motiur Rahman, Michel Salomon, Sounkalo Dembélé. "Towards scanning electron microscopy image denoising: a state-of-the-art overview, benchmark, taxonomies, and future direction." Machine Vision and Applications, 2024-07-01. DOI: 10.1007/s00138-024-01573-9
Pub Date: 2024-07-01, DOI: 10.1007/s00138-024-01584-6
Karim Slimani, Brahim Tamadazte, Catherine Achard
Karim Slimani, Brahim Tamadazte, Catherine Achard. "Rocnet: 3D robust registration of points clouds using deep learning." Machine Vision and Applications, 2024-07-01. DOI: 10.1007/s00138-024-01584-6