Pub Date: 2024-07-08 | DOI: 10.1016/j.patrec.2024.07.005
Ruben van Heusden, Maarten Marx
The Panoptic Quality metric, developed by Kirillov et al. in 2019, makes object-level precision, recall and F1 measures available for evaluating image segmentation, and more generally any partitioning task, against a gold standard. Panoptic Quality is based on partial isomorphisms between hypothesized and true segmentations. Kirillov et al. desire that functions defining these one-to-one matchings should be simple, interpretable and effectively computable. They show that for t and h, true and hypothesized segments, the condition stating that there are more correct than wrongly predicted pixels, formalized as IoU(t,h) > .5 or equivalently as |t∩h| > .5|t∪h|, has these properties. We show that a weaker function, requiring that more than half of the pixels in the hypothesized segment are in the true segment and vice-versa, formalized as |t∩h| > .5|t| and |t∩h| > .5|h|, is not only sufficient but also necessary. With a small proviso, every function defining a partial isomorphism satisfies this condition. We theoretically and empirically compare the two conditions.
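A minimal sketch of the two matching conditions on toy segments (segments modeled as sets of pixel indices; the names `iou_condition` and `half_half_condition` are illustrative, not from the paper):

```python
def iou_condition(t: set, h: set) -> bool:
    """Kirillov et al.'s condition: IoU(t, h) > 0.5, i.e. |t ∩ h| > 0.5 |t ∪ h|."""
    return len(t & h) > 0.5 * len(t | h)

def half_half_condition(t: set, h: set) -> bool:
    """The weaker condition: |t ∩ h| > 0.5 |t| and |t ∩ h| > 0.5 |h|."""
    inter = len(t & h)
    return inter > 0.5 * len(t) and inter > 0.5 * len(h)

t = set(range(0, 10))   # true segment: pixels 0..9
h = set(range(4, 12))   # hypothesized segment: pixels 4..11
# |t ∩ h| = 6, |t ∪ h| = 12, so IoU = 0.5 exactly and the strict IoU test fails,
# but 6 > 0.5*10 and 6 > 0.5*8, so the weaker condition accepts the match.
print(iou_condition(t, h), half_half_condition(t, h))  # → False True
```

The example shows a pair that the weaker condition matches but the IoU condition rejects, which is exactly the gap the abstract discusses.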
A sharper definition of alignment for Panoptic Quality. Pattern Recognition Letters 185 (2024), pp. 87-93. Open access.
Pub Date: 2024-07-06 | DOI: 10.1016/j.patrec.2024.06.033
Ivan Karpukhin, Stanislav Dereka, Sergey Kolesnikov
Classification tasks are typically evaluated based on accuracy. However, due to the discontinuous nature of accuracy, it cannot be directly optimized using gradient-based methods. The conventional approach involves minimizing surrogate losses such as cross-entropy or hinge loss, which may result in suboptimal performance. In this paper, we introduce a novel optimization technique that incorporates stochasticity into the model’s output and focuses on optimizing the expected accuracy, defined as the accuracy of the stochastic model. Comprehensive experimental evaluations demonstrate that our proposed optimization method significantly enhances performance across various classification tasks, including SVHN, CIFAR-10, CIFAR-100, and ImageNet.
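A toy sketch of the idea (not the authors' EXACT implementation): perturbing the logits with Gaussian noise makes the model stochastic, and the expected accuracy — the probability that the noisy argmax picks the true label — becomes a smooth function of the logits that can be estimated by sampling. The function name and noise scale are assumptions for illustration:

```python
import random

def expected_accuracy(logits, label, sigma=1.0, n_samples=2000, seed=0):
    """Monte-Carlo estimate of P(argmax(logits + Gaussian noise) == label)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        noisy = [z + rng.gauss(0.0, sigma) for z in logits]
        hits += max(range(len(noisy)), key=noisy.__getitem__) == label
    return hits / n_samples

# A confident correct prediction has expected accuracy near 1; uniform logits
# give about 1/3 over three classes. The smoothing is what enables gradients.
print(expected_accuracy([5.0, 0.0, 0.0], label=0))
print(expected_accuracy([0.0, 0.0, 0.0], label=0))
```

In practice the paper optimizes this expectation with gradient-based methods rather than Monte-Carlo evaluation; the sketch only illustrates why the objective is differentiable in expectation while plain accuracy is not.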
EXACT: How to train your accuracy. Pattern Recognition Letters 185 (2024), pp. 23-30.
Pub Date: 2024-07-05 | DOI: 10.1016/j.patrec.2024.06.031
Yuanhao Yue, Laixiang Shi, Zheng Zheng, Long Chen, Zhongyuan Wang, Qin Zou
Gait recognition is a form of identity verification that can be performed over long distances without requiring the subject’s cooperation, making it particularly valuable for applications such as access control, surveillance, and criminal investigation. The essence of gait lies in the motion dynamics of a walking individual. Accurate gait-motion estimation is crucial for high-performance gait recognition. In this paper, we introduce two main designs for gait motion estimation. Firstly, we propose a fully convolutional neural network named W-Net for silhouette segmentation from video sequences. Secondly, we present an adversarial learning-based algorithm for robust gait motion estimation. Together, these designs contribute to a high-performance system for gait recognition and user authentication. In the experiments, two datasets, i.e., OU-IRIS and our own dataset, are used for performance evaluation. Experimental results show that the W-Net achieves an accuracy of 89.46% in silhouette segmentation, and the proposed user-authentication method achieves over 99.6% and 93.8% accuracy on the two datasets, respectively.
Deep motion estimation through adversarial learning for gait recognition. Pattern Recognition Letters 184 (2024), pp. 232-237.
Pub Date: 2024-07-05 | DOI: 10.1016/j.patrec.2024.06.032
Sona Taheri, Adil M. Bagirov, Nargiz Sultanova, Burak Ordin
The presence of noise or outliers in data sets may heavily affect the performance of clustering algorithms and lead to unsatisfactory results. The majority of conventional clustering algorithms are sensitive to noise and outliers. Robust clustering algorithms often overcome difficulties associated with noise and outliers and find true cluster structures. We introduce a soft trimming approach for the hard clustering problem, whose objective is modeled as a sum of the cluster function and a function represented as a composition of the algebraic and distance functions. We utilize the composite function to estimate the degree of significance of each data point in clustering. A robust clustering algorithm based on the new model and a procedure for generating starting cluster centers is developed. We demonstrate the performance of the proposed algorithm using synthetic and real-world data sets containing noise and outliers. We also compare its performance with that of some well-known clustering techniques. Results show that the new algorithm is robust to noise and outliers and finds true cluster structures.
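A hedged illustration of the soft-trimming idea (the paper's exact composite objective is not reproduced here): instead of hard-removing outliers, each point gets a significance weight that decays with its distance to the nearest cluster center, so far-away points barely pull the centers. The weight form `exp(-d/scale)` is an assumption for the sketch:

```python
import math

def soft_trim_weights(points, centers, scale=1.0):
    """Significance weight w_i = exp(-d_i / scale), d_i = distance to nearest center."""
    weights = []
    for p in points:
        d = min(math.dist(p, c) for c in centers)
        weights.append(math.exp(-d / scale))
    return weights

def weighted_center(points, weights):
    """One weighted-mean update of a cluster center."""
    total = sum(weights)
    return tuple(sum(w * x for w, x in zip(weights, coord)) / total
                 for coord in zip(*points))

points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (10.0, 10.0)]  # last point is an outlier
w = soft_trim_weights(points, centers=[(0.0, 0.0)])
print(weighted_center(points, w))  # stays near the origin despite the outlier
```

With a plain (unweighted) mean the outlier would drag the center roughly a quarter of the way to (10, 10); the soft weights suppress it to near-zero influence while keeping it in the data set.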
Robust clustering algorithm: The use of soft trimming approach. Pattern Recognition Letters 185 (2024), pp. 15-22. Open access.
Pub Date: 2024-07-05 | DOI: 10.1016/j.patrec.2024.07.002
Dominik Zimny, Joanna Waczyńska, Tomasz Trzciński, Przemysław Spurek
Neural Radiance Fields (NeRFs) offer state-of-the-art quality in synthesizing novel views of complex 3D scenes from a small subset of base images. For NeRFs to perform optimally, the registration of base images has to follow certain assumptions, including maintaining a constant distance between the camera and the object. We can address this limitation by training NeRFs with 3D point clouds instead of images, yet a straightforward substitution is impossible due to the sparsity of 3D clouds in under-sampled regions, which leads to incomplete reconstruction output by NeRFs. To solve this problem, we propose an auto-encoder-based architecture that leverages a hypernetwork paradigm to transfer 3D points with their associated color values through a lower-dimensional latent space and generate the weights of a NeRF model. This way, we can accommodate the sparsity of 3D point clouds and fully exploit the potential of point cloud data. As a side benefit, our method offers an implicit way of representing 3D scenes and objects that can be employed to condition NeRFs and hence generalize the models beyond objects seen during training. The empirical evaluation confirms the advantages of our method over conventional NeRFs and proves its superiority in practical applications.
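A minimal sketch of the hypernetwork paradigm described above (all shapes, names, and the linear form of the hypernetwork are illustrative assumptions, not the paper's architecture): a latent code z, which an encoder would produce from a point cloud, is mapped to the full weight vector of a small target network, so one set of hypernetwork parameters can emit a different target network per scene.

```python
import random

def hypernetwork(z, n_target_weights, seed=0):
    """Linear hypernetwork: w = H @ z (H fixed at random here, purely for illustration)."""
    rng = random.Random(seed)
    H = [[rng.gauss(0, 0.1) for _ in z] for _ in range(n_target_weights)]
    return [sum(h_ij * z_j for h_ij, z_j in zip(row, z)) for row in H]

def target_net(x, w):
    """Target network: a one-hidden-unit MLP whose weights come from the hypernetwork."""
    w1, b1, w2, b2 = w
    h = max(0.0, w1 * x + b1)  # ReLU
    return w2 * h + b2

z = [0.5, -1.0, 2.0]                    # latent code for one scene
w = hypernetwork(z, n_target_weights=4)
print(target_net(0.3, w))
```

The design point is that gradients flow through the generated weights back into the hypernetwork, so training it on many point clouds amortizes NeRF fitting across scenes instead of optimizing each scene's weights from scratch.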
Points2NeRF: Generating Neural Radiance Fields from 3D point cloud. Pattern Recognition Letters 185 (2024), pp. 8-14.
Pub Date: 2024-07-05 | DOI: 10.1016/j.patrec.2024.07.001
Thanh Tuan Nguyen, Thanh Phuong Nguyen
Several categories in a large dataset are no longer difficult for recent advanced deep neural networks to recognize. Eliminating them to form a more challenging smaller subset lets early network proposals be verified quickly. To this end, we propose an efficient rescaling method based on the validation outcomes of a pre-trained model. Firstly, we take out the sensitive images of the lowest-accuracy classes in the validation outcomes. Each such image is then examined to identify which label it was confused with. Gathering the lowest-accuracy classes along with the most-confused ones produces a smaller, more challenging subset for quick validation of an early network draft. Finally, a rescaling application is introduced to rescale two popular large datasets (ImageNet and Places365) into tiny subsets (ReIN^Ω and RePL^Ω, respectively). Experiments on image classification show that neural networks obtaining good performance on the original datasets also achieve good results on their rescaled subsets. For instance, MobileNetV1 and MobileNetV2, with 70.6% and 72% on ImageNet respectively, obtained 46.53% and 47.47% on its small subset ReIN^30, which contains only about 39,000 images. The better performance of MobileNetV2 on ImageNet correspondingly leads to a better rate on its rescaled subset. Utilizing these rescaled sets can thus help researchers save time and computational cost when designing deep neural architectures. All code related to the rescaling proposal and the resultant subsets is available at http://github.com/nttbdrk25/ImageNetPlaces365.
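A hedged sketch of the selection rule described above (the paper's exact thresholds and image-level filtering may differ): from a validation confusion matrix, keep the lowest-accuracy classes plus, for each of them, the class they are most often confused with.

```python
def challenging_subset(confusion, n_lowest=2):
    """confusion[i][j] = # validation images of class i predicted as class j.
    Returns the sorted class indices forming the challenging subset."""
    n = len(confusion)
    acc = [confusion[i][i] / sum(confusion[i]) for i in range(n)]
    lowest = sorted(range(n), key=acc.__getitem__)[:n_lowest]
    keep = set(lowest)
    for i in lowest:
        # most-confused partner: the off-diagonal prediction with the highest count
        j = max((j for j in range(n) if j != i), key=lambda j: confusion[i][j])
        keep.add(j)
    return sorted(keep)

# 3-class toy matrix: class 2 is weakest and mostly confused with class 0.
conf = [[90, 5, 5],
        [4, 92, 4],
        [40, 10, 50]]
print(challenging_subset(conf, n_lowest=1))  # → [0, 2]
```

Restricting validation to such a subset preserves the hard decision boundaries (here, the 0-vs-2 confusion) while discarding classes the pre-trained model already separates cleanly.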
Rescaling large datasets based on validation outcomes of a pre-trained network. Pattern Recognition Letters 185 (2024), pp. 73-80.
Pub Date: 2024-07-02 | DOI: 10.1016/j.patrec.2024.06.030
Yonghyun Jeong, Doyeon Kim, Pyounggeon Kim, Youngmin Ro, Jongwon Choi
Although recent advances in generative models bring diverse benefits to society, they can also be abused for malicious purposes, such as fraud, defamation, and fake news. To prevent such cases, vigorous research has been conducted to distinguish generated images from real images, but distinguishing generated images outside of the training settings remains challenging. These limitations stem from data dependency arising from the model’s overfitting to the specific Generative Adversarial Networks (GANs) and categories of the training data. To overcome this issue, we adopt a self-supervised scheme. Our method is composed of an artificial artifact generator, which reconstructs high-quality artificial artifacts of GAN images, and a GAN detector, which distinguishes GAN images by learning the reconstructed artificial artifacts. To improve the generalization of the artificial artifact generator, we build multiple autoencoders with different numbers of upconvolution layers. Through numerous ablation studies, the robust generalization of our method is validated: it outperforms the generalization of previous state-of-the-art algorithms, even without utilizing the GAN images of the training dataset.
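An illustrative sketch of the residual idea behind such artifact generators (the function names and the toy "autoencoder" are assumptions; the paper's generator is a trained upconvolution network, not this): an autoencoder that reconstructs an image imperfectly leaves a per-pixel residual |x - AE(x)|, and maps of this kind can serve as artificial artifacts for training a detector.

```python
def artifact_map(image, autoencoder):
    """Per-pixel absolute residual between an image and its reconstruction."""
    recon = autoencoder(image)
    return [[abs(a - b) for a, b in zip(row_x, row_r)]
            for row_x, row_r in zip(image, recon)]

# A toy stand-in "autoencoder" that returns a dimmed copy of its input,
# so every pixel leaves a nonzero residual proportional to its intensity.
def toy_autoencoder(image):
    return [[0.5 * v for v in row] for row in image]

x = [[0.2, 0.8], [0.4, 0.6]]
print(artifact_map(x, toy_autoencoder))
```

The detector in the paper then learns from such reconstructed artifact maps instead of from GAN images directly, which is what decouples it from any specific GAN seen at training time.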
Self-supervised scheme for generalizing GAN image detection. Pattern Recognition Letters 184 (2024), pp. 219-224.
Pub Date: 2024-07-01 | DOI: 10.1016/j.patrec.2024.06.029
Luis M. Valentín-Coronado, Rodolfo Martínez-Manuel, Jonathan Esquivel-Hernández, Maria de los Angeles Martínez-Guerrero, Sophie LaRochelle
Bending monitoring is critical in engineering applications, as it helps determine any structural deformation caused by load action or fatigue effects. While strain gauges and accelerometers were previously used to measure bending magnitude, optical fiber sensors have emerged as a reliable alternative. In this work, a machine-learning-based model is proposed to analyze the interference signal of an interferometric fiber sensor system and characterize the bending magnitude and direction. In particular, shallow learning-based and convolutional neural network-based (CNN) models have been implemented to perform this task. Furthermore, given the repeatability of the interference signals, a synthetic dataset was created to train the models, whereas real interferometric signals were used to evaluate the models’ performance. Experiments were conducted on a flexible rod in fixed-free and fixed-fixed end configurations for bending monitoring. Although both models achieved mean accuracies above 91%, only the CNN-based model reached a mean accuracy above 98%. This confirms that monitoring bending movements through interference signal analysis by means of a CNN-based model is a viable approach.
Bending classification from interference signals of a fiber optic sensor using shallow learning and convolutional neural networks. Pattern Recognition Letters (2024).
Pub Date: 2024-06-28 | DOI: 10.1016/j.patrec.2024.06.028
Taejune Kim, Yun-Gyoo Lee, Inho Jeong, Soo-Youn Ham, Simon S. Woo
Radiography images inherently possess globally consistent structures while exhibiting significant diversity in local anatomical regions, making it challenging to model their normal features through unsupervised anomaly detection. Since unsupervised anomaly detection methods localize anomalies by utilizing discrepancies between learned normal features and input abnormal features, previous studies introduce a memory structure to capture the normal features of radiography images. However, these approaches store extremely localized image segments in their memory, causing the model to represent both normal and pathological features with the stored components. This poses a significant challenge in unsupervised anomaly detection by reducing the disparity between learned features and abnormal features. Furthermore, the diverse settings in radiography imaging exacerbate the issue: more diversity in the normal images results in stronger representation of pathological features. To resolve these issues, we propose a novel pathology detection method called Patch-wise Vector Quantization (P-VQ). Unlike previous methods, P-VQ learns vector-quantized representations of normal “patches” while preserving their spatial information by incorporating a vector similarity metric. Furthermore, we introduce a novel method for selecting features in the memory to further enhance robustness against diverse imaging settings. P-VQ even mitigates the “index collapse” problem of vector quantization by proposing top-k% dropout. Our extensive experiments on the BMAD benchmark demonstrate the superior performance of P-VQ against existing state-of-the-art methods.
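A sketch of the two mechanisms named above, under assumed details (the paper's actual similarity metric, dropout schedule, and codebook training are not specified here): codebook lookup by cosine similarity rather than L2 distance, and a "top-k% dropout" that occasionally masks the most-used codes so that rarely-used codes still get selected, countering index collapse.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def quantize(patch, codebook, usage, k_pct=0.0):
    """Return the index of the most similar code; optionally mask the top-k% most-used codes."""
    candidates = list(range(len(codebook)))
    if k_pct > 0:
        n_drop = int(len(codebook) * k_pct / 100)
        busiest = sorted(candidates, key=usage.__getitem__, reverse=True)[:n_drop]
        candidates = [i for i in candidates if i not in busiest] or candidates
    best = max(candidates, key=lambda i: cosine(patch, codebook[i]))
    usage[best] += 1
    return best

codebook = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
usage = [0, 0, 0]
print(quantize([0.9, 0.1], codebook, usage))             # most similar: code 0
print(quantize([0.9, 0.1], codebook, usage, k_pct=34))   # code 0 masked → code 2
```

The second call shows the dropout effect: the dominant code is temporarily excluded, forcing the second-best code to absorb the assignment and keep receiving updates.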
Patch-wise vector quantization for unsupervised medical anomaly detection. Pattern Recognition Letters 184 (2024), pp. 205-211.
Pub Date : 2024-06-28DOI: 10.1016/j.patrec.2024.06.027
Shuixin Deng , Lei Deng , Xiangze Meng , Ting Sun , Baohua Chen , Zhixiang Chen , Hao Hu , Yusen Xie , Hanxi Yin , Shijie Yu
Printed Circuit Board (PCB) surface defect detection is crucial for ensuring the quality of electronic products in the manufacturing industry. Detection methods can be divided into non-referential and referential methods. Non-referential methods employ designed rules or a learned data distribution without template images, but struggle to address the uncertainty and subjectivity of defects. In contrast, referential methods use templates to achieve better performance, but rely on precise image registration. However, feature extraction and matching for image registration are especially challenging for PCB images with defective, repetitive, or sparse features. To address these issues, we propose a novel Energy-based Hierarchical Iterative Image Registration method (EHIR), which formulates image registration as an energy optimization problem over edge points rather than a finite set of features. Our framework consists of three stages: Edge-guided Energy Transformation (EET), EHIR and Edge-guided Energy-based Defect Detection (EEDD). The key novelty is that the consistency of contours drives image alignment, while their differences highlight defect locations. Extensive experiments show that the method achieves high accuracy and strong robustness, especially in the presence of defect feature interference, where it demonstrates an overwhelming advantage over other methods.
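To make the idea of registration as energy optimization over edge points concrete, the sketch below minimizes a chamfer-style alignment energy (sum of nearest-neighbor distances between edge point sets) over integer translations by brute-force search. This is a deliberately simplified stand-in, not EHIR itself: the function names, the chamfer energy, the translation-only search, and the `search` radius are all assumptions for illustration; the paper's method is hierarchical and iterative.

```python
import numpy as np

def chamfer_energy(src_pts, dst_pts):
    """Sum of distances from each source edge point to its nearest
    destination edge point. Both inputs are (N, 2) / (M, 2) arrays
    of (row, col) edge coordinates."""
    d = np.linalg.norm(src_pts[:, None, :] - dst_pts[None, :, :], axis=2)
    return d.min(axis=1).sum()

def register_translation(src_pts, dst_pts, search=3):
    """Brute-force search for the integer (dy, dx) translation that
    minimizes the chamfer energy within +/- `search` pixels."""
    best, best_e = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            e = chamfer_energy(src_pts + np.array([dy, dx], float), dst_pts)
            if e < best_e:
                best_e, best = e, (dy, dx)
    return best, best_e

# Toy usage: recover a known shift between two edge point sets.
rng = np.random.default_rng(1)
src = rng.integers(0, 20, size=(15, 2)).astype(float)
dst = src + np.array([2.0, -1.0])  # template edges shifted by (2, -1)
best, energy = register_translation(src, dst, search=3)
```

Once the images are aligned this way, the residual (unmatched edge points with high per-point energy) is what a referential detector would flag as candidate defects.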
{"title":"EHIR: Energy-based Hierarchical Iterative Image Registration for Accurate PCB Defect Detection","authors":"Shuixin Deng , Lei Deng , Xiangze Meng , Ting Sun , Baohua Chen , Zhixiang Chen , Hao Hu , Yusen Xie , Hanxi Yin , Shijie Yu","doi":"10.1016/j.patrec.2024.06.027","DOIUrl":"10.1016/j.patrec.2024.06.027","url":null,"abstract":"<div><p>Printed Circuit Board (PCB) Surface defect detection is crucial to ensure the quality of electronic products in manufacturing industry. Detection methods can be divided into non-referential and referential methods. Non-referential methods employ designed rules or learned data distribution without template images but are difficult to address the uncertainty and subjectivity issues of defects. In contrast, referential methods use templates to achieve better performance but rely on precise image registration. However, image registration is especially challenging in feature extracting and matching for PCB images with defective, reduplicated or less features. To address these issues, we propose a novel <strong>E</strong>nergy-based <strong>H</strong>ierarchical <strong>I</strong>terative Image <strong>R</strong>egistration method (EHIR) to formulate image registration as an energy optimization problem based on the edge points rather than finite features. Our framework consists of three stages: Edge-guided Energy Transformation (EET), EHIR and Edge-guided Energy-based Defect Detection (EEDD). The novelty is that the consistency of contours contributes to aligning images and the difference is highlighted for defect location. 
Extensive experiments show that this method has high accuracy and strong robustness, especially in the presence of defect feature interference, where our method demonstrates an overwhelming advantage over other methods.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 38-44"},"PeriodicalIF":3.9,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141630040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}