Pub Date: 2024-07-02 | DOI: 10.1007/s10044-024-01295-8
Simon Thomine, Hichem Snoussi
Unsupervised anomaly detection holds significant importance in large-scale industrial manufacturing. Recent methods have capitalized on the benefits of employing a classifier pretrained on natural images to extract representative features from specific layers, which are subsequently processed using various techniques. Notably, memory bank-based methods, which have demonstrated exceptional accuracy, often incur a trade-off in terms of latency, posing a challenge in real-time industrial applications where prompt anomaly detection and response are crucial. Alternative approaches such as knowledge distillation and normalizing flows have demonstrated promising performance in unsupervised anomaly detection while maintaining low latency. In this paper, we revisit knowledge distillation in the context of unsupervised anomaly detection, emphasizing the significance of feature selection. By employing distinctive features and leveraging different models, we highlight the importance of carefully selecting and utilizing features specifically tailored to anomaly detection. This article presents a novel approach that employs dual-model knowledge distillation and incorporates both high- and low-level semantic information.
{"title":"Dual model knowledge distillation for industrial anomaly detection","authors":"Simon Thomine, Hichem Snoussi","doi":"10.1007/s10044-024-01295-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01295-8","url":null,"abstract":"<p>Unsupervised anomaly detection holds significant importance in large-scale industrial manufacturing. Recent methods have capitalized on the benefits of employing a classifier pretrained on natural images to extract representative features from specific layers, which are subsequently processed using various techniques. Notably, memory bank-based methods, which have demonstrated exceptional accuracy, often incur a trade-off in terms of latency, posing a challenge in real-time industrial applications where prompt anomaly detection and response are crucial. Indeed, alternative approaches such as knowledge distillation and normalized flow have demonstrated promising performance in unsupervised anomaly detection while maintaining low latency. In this paper, we aim to revisit the concept of knowledge distillation in the context of unsupervised anomaly detection, emphasizing the significance of feature selection. By employing distinctive features and leveraging different models, we intend to highlight the importance of carefully selecting and utilizing relevant features specifically tailored for the task of anomaly detection. 
This article presents a novel approach for anomaly detection, which employs dual model knowledge distillation and incorporates various types of semantic information by leveraging high and low-level semantic information.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
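As a hedged illustration of the student–teacher principle this abstract builds on (not the authors' dual-model architecture), the sketch below scores anomalies as the per-pixel cosine discrepancy between a pretrained teacher's features and a student's features; the function name, shapes, and averaging scheme are assumptions:

```python
import numpy as np

def anomaly_map(teacher_feats, student_feats):
    """Per-pixel anomaly score as cosine distance between matching
    teacher and student feature maps of shape (C, H, W).
    Illustrative only: assumes the maps share spatial resolution."""
    maps = []
    for t, s in zip(teacher_feats, student_feats):
        t = t / (np.linalg.norm(t, axis=0, keepdims=True) + 1e-8)
        s = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-8)
        maps.append(1.0 - (t * s).sum(axis=0))  # (H, W); 0 = identical features
    return np.mean(maps, axis=0)

# Toy check: a student that reproduces the teacher exactly yields ~zero anomaly.
feats = [np.random.rand(8, 4, 4)]
amap = anomaly_map(feats, [f.copy() for f in feats])
```

At test time, regions where the student (trained only on normal samples) fails to imitate the teacher receive high scores.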
Pub Date: 2024-07-02 | DOI: 10.1007/s10044-024-01293-w
Shekhar Karanwal
Among the many local descriptors in the literature, the local binary pattern (LBP) is one of the most prolific. Despite its advantages, such as low computational complexity and invariance to monotonic gray-level changes, LBP exhibits several demerits: a limited spatial patch, a high-dimensional feature, a noise-sensitive thresholding function, and poor performance under harsh illumination variations. To overcome these issues, the presented work introduces a novel local descriptor called the discriminative binary pattern (DBP). Specifically, two descriptors are introduced under DBP: the radial orthogonal binary pattern (ROBP) and the radial variance binary pattern (RVBP). In the former, for the neighborhood comparison, the center pixel is replaced by the mean of the medians computed from the orthogonal pixels plus the center pixel of two 3 × 3 pixel windows, formed from radii S1 and S2 of the 5 × 5 image patch. In the latter, the radial variances generated from 8 pairs of two pixels are compared with their mean value. For both proposed descriptors, sub-region-wise histograms are extracted and fused to form the full feature vector, and the ROBP and RVBP feature vectors are then merged to form the DBP descriptor. Dimensionality reduction is performed with principal component analysis (PCA) and Fisher's linear discriminant analysis, and support vector machines are used for matching. Experiments conducted on 8 benchmark datasets reveal the effectiveness of the proposed DBP compared with other state-of-the-art methods.
{"title":"Discriminative binary pattern descriptor for face recognition","authors":"Shekhar Karanwal","doi":"10.1007/s10044-024-01293-w","DOIUrl":"https://doi.org/10.1007/s10044-024-01293-w","url":null,"abstract":"<p>Among several local descriptors invented in literature, the local binary pattern (LBP) is the prolific one. Despite its advantages like low computational complexity and monotonic gray invariance property, there are various demerits are observed in LBP and these are limited spatial patch, high dimension feature, noisy thresholding function and un-affective in harsh illumination variations. To overcome these issues presented work introduces the novel local descriptor called as discriminative binary pattern (DBP). Precisely two descriptors are introduced under DBP so-called Radial orthogonal binary pattern (ROBP) and radial variance binary pattern (RVBP). In former proposed descriptor, for neighborhood comparison, the center pixel is replaced by mean of medians computed from [orthogonal pixels + center pixel] of two 3 × 3 pixel window, formed from radius S1 and S2 of the 5 × 5 image patch. In latter proposed descriptor, the radial variances generated from 8 pair of two pixels are utilized for comparison with their mean value. In case of the both proposed descriptors, the sub-region wise histograms are extracted and fused to develop the entire feature size. Further the feature length of ROBP and RVBP are merged to form the size of the DBP descriptor. The compression is conducted by principal component analysis (PCA) and Fishers linear discriminant analysis). For matching support vector machines is used. 
Experiments conducted on 8 benchmark datasets reveals the effectiveness of the proposed DBP as compared to the other state of art benchmark methods.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
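For reference, the classic 8-neighbour LBP baseline that ROBP and RVBP modify (by replacing the centre-pixel threshold) can be sketched as follows; this is the textbook operator, not the paper's descriptors:

```python
import numpy as np

def lbp_3x3(img):
    """Classic LBP: threshold the 8 neighbours of each interior pixel
    against the centre and pack the results into an 8-bit code."""
    h, w = img.shape
    # neighbour offsets, clockwise from the top-left corner
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h-1, 1:w-1]
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for k, (dy, dx) in enumerate(offs):
        neigh = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        out |= (neigh >= center).astype(np.uint8) << k
    return out

codes = lbp_3x3(np.array([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]], dtype=float))
# neighbours 6, 9, 8, 7 exceed the centre 5, setting bits 3-6 -> code 120
```

DBP's contribution is precisely in swapping the `center` reference for more robust statistics (medians, radial variances) over the 5 × 5 patch.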
Pub Date: 2024-07-01 | DOI: 10.1007/s10044-024-01297-6
S. Muhammad Ahmed Hassan Shah, Muhammad Qasim Khan, Atif Rizwan, Sana Ullah Jan, Nagwan Abdel Samee, Mona M. Jamjoom
Cognitive disorders affect various cognitive functions and can have a substantial impact on an individual's daily life. Alzheimer's disease (AD) is one such well-known cognitive disorder. Early detection and treatment of cognitive diseases using artificial intelligence can help contain them. However, the complex spatial relationships and long-range dependencies found in medical imaging data present challenges in achieving this objective. In recent years, the application of transformers to imaging has emerged as a promising area of research, owing to the transformer's capacity to tackle spatial relationships and long-range dependencies in two ways: (1) using the self-attention mechanism to generate comprehensive features, and (2) capturing complex patterns by incorporating global context and long-range dependencies. In this work, a Bi-Vision Transformer (BiViT) architecture is proposed for classifying different stages of AD and multiple types of cognitive disorders from 2-dimensional MRI imaging data. More specifically, the transformer is composed of two novel modules, namely Mutual Latent Fusion (MLF) and Parallel Coupled Encoding Strategy (PCES), for effective feature learning. Two different datasets have been used to evaluate the performance of the proposed BiViT-based architecture. The first dataset contains several classes, such as mild or moderate demented stages of AD. The other dataset is composed of samples from patients with AD and different cognitive disorders, such as mild, early, or moderate impairments. For a comprehensive comparison, a multiple transfer learning algorithm and a deep autoencoder were each trained on both datasets. The results show that the proposed BiViT-based model achieves an accuracy of 96.38% on the AD dataset.
However, when applied to the cognitive disease data, the accuracy decreases slightly below 96%, which may result from the smaller amount of data and the imbalance in its distribution. Nevertheless, given these results, it can be hypothesized that the proposed algorithm could perform better if the imbalanced distribution and limited data availability were addressed.
Graphical abstract
{"title":"Computer-aided diagnosis of Alzheimer’s disease and neurocognitive disorders with multimodal Bi-Vision Transformer (BiViT)","authors":"S. Muhammad Ahmed Hassan Shah, Muhammad Qasim Khan, Atif Rizwan, Sana Ullah Jan, Nagwan Abdel Samee, Mona M. Jamjoom","doi":"10.1007/s10044-024-01297-6","DOIUrl":"https://doi.org/10.1007/s10044-024-01297-6","url":null,"abstract":"<p>Cognitive disorders affect various cognitive functions that can have a substantial impact on individual’s daily life. Alzheimer’s disease (AD) is one of such well-known cognitive disorders. Early detection and treatment of cognitive diseases using artificial intelligence can help contain them. However, the complex spatial relationships and long-range dependencies found in medical imaging data present challenges in achieving the objective. Moreover, for a few years, the application of transformers in imaging has emerged as a promising area of research. A reason can be transformer’s impressive capabilities of tackling spatial relationships and long-range dependency challenges in two ways, i.e., (1) using their self-attention mechanism to generate comprehensive features, and (2) capture complex patterns by incorporating global context and long-range dependencies. In this work, a Bi-Vision Transformer (BiViT) architecture is proposed for classifying different stages of AD, and multiple types of cognitive disorders from 2-dimensional MRI imaging data. More specifically, the transformer is composed of two novel modules, namely Mutual Latent Fusion (MLF) and Parallel Coupled Encoding Strategy (PCES), for effective feature learning. Two different datasets have been used to evaluate the performance of proposed BiViT-based architecture. The first dataset contain several classes such as mild or moderate demented stages of the AD. The other dataset is composed of samples from patients with AD and different cognitive disorders such as mild, early, or moderate impairments. 
For comprehensive comparison, a multiple transfer learning algorithm and a deep autoencoder have been each trained on both datasets. The results show that the proposed BiViT-based model achieves an accuracy of 96.38% on the AD dataset. However, when applied to cognitive disease data, the accuracy slightly decreases below 96% which can be resulted due to smaller amount of data and imbalance in data distribution. Nevertheless, given the results, it can be hypothesized that the proposed algorithm can perform better if the imbalanced distribution and limited availability problems in data can be addressed.</p><h3 data-test=\"abstract-sub-heading\">Graphical abstract</h3>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
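The self-attention mechanism the abstract credits for capturing global context can be sketched minimally as scaled dot-product attention over a token sequence; identity query/key/value projections are an assumption made for brevity, and this is not the BiViT modules themselves:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over tokens x of shape (n, d).
    Every output token is a softmax-weighted mixture of all tokens,
    which is how long-range dependencies enter the representation."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                 # (n, n) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # rows sum to 1
    return attn @ x                               # context-mixed tokens

tokens = np.random.rand(5, 16)   # e.g. 5 image patches embedded in 16 dims
out = self_attention(tokens)
```

In a ViT-style encoder, the tokens are flattened image patches, so every patch attends to every other patch regardless of spatial distance.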
Pub Date: 2024-06-28 | DOI: 10.1007/s10044-024-01286-9
Andrés Bribiesca-Sánchez, Adolfo Guzmán, Fernando Montoya, Dan S. Díaz-Guerrero, Haydeé O. Hernández, Paul Hernández-Herrera, Alberto Darszon, Gabriel Corkidi, Ernesto Bribiesca
In the realm of 3D image processing, accurately representing the geometric nuances of line curves is crucial. Building upon the foundation set by the slope chain code, which adeptly represents intricate two-dimensional curves using an array capturing the exterior angles at each vertex, this study introduces an innovative 3D encoding method tailored for polygonal curves. This 3D encoding employs parallel slope and torsion chains, ensuring invariance to common transformations like translations, rotations, and uniform scaling, while also demonstrating robustness against mirror imaging and variable starting points. A hallmark feature of this method is its ability to compute tortuosity, a descriptor of curve complexity or winding nature. By applying this technique to biomedical engineering, we delved into the flagellar beat patterns of human sperm. These insights underscore the versatility of our 3D encoding across diverse computer vision applications.
{"title":"A three-dimensional extension of the slope chain code: analyzing the tortuosity of the flagellar beat of human sperm","authors":"Andrés Bribiesca-Sánchez, Adolfo Guzmán, Fernando Montoya, Dan S. Díaz-Guerrero, Haydeé O. Hernández, Paul Hernández-Herrera, Alberto Darszon, Gabriel Corkidi, Ernesto Bribiesca","doi":"10.1007/s10044-024-01286-9","DOIUrl":"https://doi.org/10.1007/s10044-024-01286-9","url":null,"abstract":"<p>In the realm of 3D image processing, accurately representing the geometric nuances of line curves is crucial. Building upon the foundation set by the slope chain code, which adeptly represents intricate two-dimensional curves using an array capturing the exterior angles at each vertex, this study introduces an innovative 3D encoding method tailored for polygonal curves. This 3D encoding employs parallel slope and torsion chains, ensuring invariance to common transformations like translations, rotations, and uniform scaling, while also demonstrating robustness against mirror imaging and variable starting points. A hallmark feature of this method is its ability to compute tortuosity, a descriptor of curve complexity or winding nature. By applying this technique to biomedical engineering, we delved into the flagellar beat patterns of human sperm. These insights underscore the versatility of our 3D encoding across diverse computer vision applications.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-28 | DOI: 10.1007/s10044-024-01290-z
Haigang Deng, Chuanxu Wang, Chengwei Li, Zhang Hao
For a channel and spatial feature map C×W×H in an object detection task, information fusion usually relies on an attention mechanism: all C channels and the entire space W×H are each compressed via average/max pooling, and their attention weight masks are then obtained through correlation calculations. This coarse-grained global operation ignores the differences among multiple channels and diverse spatial regions, resulting in inaccurate attention weights. In addition, how to mine the contextual information in the space W×H is also a challenge for object recognition and localization. To this end, we propose a Fine-Grained Dual Level Attention Mechanism joint Spacial Context Information Fusion module for object detection (FGDLAM&SCIF). It is a cascaded structure: first, we subdivide the feature space W×H into n (optimized as n = 4 in experiments) subspaces and construct a global adaptive pooling and one-dimensional convolution algorithm to extract the feature channel weights on each subspace. Second, the C feature channels are divided into n (n = 4) sub-channels, and a multi-scale module is constructed in the feature space W×H to mine context information. Finally, row and column coding is used to fuse them orthogonally to obtain enhanced features. The module is embeddable and can be transplanted into any object detection network, such as YOLOv4/v5, PPYOLOE, and YOLOX, as well as MobileNet and ResNet backbones. Experiments are conducted on the MS COCO 2017 and Pascal VOC 2007 datasets to verify its effectiveness and portability.
{"title":"Fine grained dual level attention mechanisms with spacial context information fusion for object detection","authors":"Haigang Deng, Chuanxu Wang, Chengwei Li, Zhang Hao","doi":"10.1007/s10044-024-01290-z","DOIUrl":"https://doi.org/10.1007/s10044-024-01290-z","url":null,"abstract":"<p>For channel and spatial feature map C×W×H in object detection task, its information fusion usually relies on attention mechanism, that is, all C channels and the entire space W×H are all compressed respectively via average/max pooling, and then their attention weight masks are obtained based on correlation calculation. This coarse-grained global operation ignores the differences among multiple channels and diverse spatial regions, resulting in inaccurate attention weights. In addition, how to mine the contextual information in the space W×H is also a challenge for object recognition and localization. To this end, we propose a Fine-Grained Dual Level Attention Mechanism joint Spacial Context Information Fusion module for object detection (FGDLAM&SCIF). It is a cascaded structure, firstly, we subdivide the feature space W×H into <i>n</i> (optimized as <i>n</i> = 4 in experiments) subspaces and construct a global adaptive pooling and one-dimensional convolution algorithm to effectively extract the feature channel weights on each subspace respectively. Secondly, the C feature channels are divided into <i>n</i> (<i>n</i> = 4) sub-channels, and then a multi-scale module is constructed in the feature space W×H to mine context information. Finally, row and column coding is used to fuse them orthogonally to obtain enhanced features. This module is embeddable, which can be transplanted into any object detection network, such as YOLOv4/v5, PPYOLOE, YOLOX and MobileNet, ResNet as well. 
Experiments are conducted on the MS COCO 2017 and Pascal VOC 2007 datasets to verify its effectiveness and good portability.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
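The first cascade step (subdivide the space, pool each subspace, slide a 1D convolution over the channel axis) can be sketched as below. This is a hedged approximation: the kernel here is random, the split is along W only, and none of the names come from the paper:

```python
import numpy as np

def subspace_channel_weights(fmap, n=4, k=3):
    """Per-subspace channel attention sketch: split a (C, H, W) map into
    n subspaces along W, global-average-pool each, run a length-k 1D
    convolution across channels, and squash with a sigmoid, yielding n
    separate C-dimensional weight vectors instead of one global mask."""
    rng = np.random.default_rng(0)
    kern = rng.standard_normal(k)       # stand-in for learned conv weights
    weights = []
    for sub in np.array_split(fmap, n, axis=2):        # n subspaces
        pooled = sub.mean(axis=(1, 2))                 # (C,) pooled stats
        conv = np.convolve(pooled, kern, mode="same")  # mix nearby channels
        weights.append(1.0 / (1.0 + np.exp(-conv)))    # sigmoid in (0, 1)
    return np.stack(weights)                           # (n, C)

w = subspace_channel_weights(np.random.rand(16, 8, 8))
```

The point of the finer granularity is visible in the output shape: n distinct weight vectors, one per spatial subspace, rather than a single C-vector for the whole map.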
Pub Date: 2024-06-26 | DOI: 10.1007/s10044-024-01292-x
Mengping Yu, Huan Huang, Rui Hou, Xiaoxuan Ma, Shuai Yuan
Time series data are sequences of values obtained by sampling a signal at a fixed frequency, and time series classification algorithms assign time series to different categories. Among the many time series classification algorithms, subseries-based algorithms have received widespread attention because of their high accuracy and low computational complexity. However, subseries-based algorithms consider the similarity of subseries only by shape and ignore semantic similarity. This paper therefore addresses the problem that subseries-based time series classification algorithms ignore the semantic similarity between subseries. To do so, we introduce the deep graph kernel technique to capture the semantic similarity between subseries. To verify the performance of the method, we test the proposed algorithm on publicly available datasets from the UCR repository. The experimental results show that the deep graph kernel plays an important role in enhancing the accuracy of the algorithm, and that the proposed algorithm performs well in terms of accuracy, with a considerable advantage over other representative algorithms.
{"title":"A deep graph kernel-based time series classification algorithm","authors":"Mengping Yu, Huan Huang, Rui Hou, Xiaoxuan Ma, Shuai Yuan","doi":"10.1007/s10044-024-01292-x","DOIUrl":"https://doi.org/10.1007/s10044-024-01292-x","url":null,"abstract":"<p>Time series data are sequences of values that are obtained by sampling a signal at a fixed frequency, and time series classification algorithms distinguish time series into different categories. Among many time series classification algorithms, subseries-based algorithms have received widespread attention because of their high accuracy and low computational complexity. However, subseries-based algorithms consider the similarity of subseries only by shape and ignore semantic similarity. Therefore, the purpose of this paper is to determine how to solve the problem that subseries-based time series classification algorithms ignore the semantic similarity between subseries. To address this issue, we introduce the deep graph kernel technique to capture the semantic similarity between subseries. 
To verify the performance of the method, we test the proposed algorithm on publicly available datasets from the UCR repository and the experimental results prove that the deep graph kernel has an important role in enhancing the accuracy of the algorithm and that the proposed algorithm performs quite well in terms of accuracy and has a considerable advantage over other representative algorithms.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
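The subseries that such algorithms compare are simply fixed-length sliding windows over the series; a minimal extraction sketch (names and parameters assumed, not taken from the paper) is:

```python
import numpy as np

def subseries(ts, length, stride=1):
    """Extract all fixed-length subseries from a 1D time series.
    These windows are the units whose shape -- and, in the paper's
    approach, semantic -- similarity is compared between series."""
    ts = np.asarray(ts, dtype=float)
    starts = range(0, len(ts) - length + 1, stride)
    return np.stack([ts[s:s + length] for s in starts])

windows = subseries([1, 2, 3, 4, 5, 6], length=3, stride=1)
# 4 overlapping windows: [1,2,3], [2,3,4], [3,4,5], [4,5,6]
```

A purely shape-based method would compare these windows by, e.g., Euclidean distance; the paper's deep graph kernel instead measures similarity between graph structures built over them.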
Pub Date: 2024-06-24 | DOI: 10.1007/s10044-024-01288-7
Zhong Wang, Yi Liu, Lanfang Lei, Peibei Shi
This study aims to address the challenges of detecting smoking behavior among workers in chemical plant environments. Smoking behavior is difficult to discern in images, with the cigarette occupying only a small pixel area, compounded by the complex background of chemical plants. Traditional models struggle to accurately capture smoking features, leading to feature loss, reduced recognition accuracy, and issues like false positives and missed detections. To overcome these challenges, we have developed a smoking behavior recognition method based on the YOLOv8 model, named Smoking-YOLOv8. Our approach introduces an SD attention mechanism that focuses on the smoking areas within images. By aggregating information from different positions through weighted averaging, it effectively manages long-distance dependencies and suppresses irrelevant background noise, thereby enhancing detection performance. Furthermore, we utilize Wise-IoU as the regression loss for bounding boxes, along with a rational gradient distribution strategy that prioritizes samples of average quality to improve the model’s precision in localization. Finally, the introduction of SPPCSPC and PConv modules in the neck section of the network allows for multi-faceted feature extraction from images, reducing redundant computation and memory access, and effectively extracting spatial features to balance computational load and optimize network architecture. Experimental results on a custom dataset of smoking behavior in chemical plants show that our model outperforms the standard YOLOv8 model in mean Average Precision (mAP@0.5) by 6.18%, surpassing other mainstream models in overall performance.
{"title":"Smoking-YOLOv8: a novel smoking detection algorithm for chemical plant personnel","authors":"Zhong Wang, Yi Liu, Lanfang Lei, Peibei Shi","doi":"10.1007/s10044-024-01288-7","DOIUrl":"https://doi.org/10.1007/s10044-024-01288-7","url":null,"abstract":"<p>This study aims to address the challenges of detecting smoking behavior among workers in chemical plant environments. Smoking behavior is difficult to discern in images, with the cigarette occupying only a small pixel area, compounded by the complex background of chemical plants. Traditional models struggle to accurately capture smoking features, leading to feature loss, reduced recognition accuracy, and issues like false positives and missed detections. To overcome these challenges, we have developed a smoking behavior recognition method based on the YOLOv8 model, named Smoking-YOLOv8. Our approach introduces an SD attention mechanism that focuses on the smoking areas within images. By aggregating information from different positions through weighted averaging, it effectively manages long-distance dependencies and suppresses irrelevant background noise, thereby enhancing detection performance. Furthermore, we utilize Wise-IoU as the regression loss for bounding boxes, along with a rational gradient distribution strategy that prioritizes samples of average quality to improve the model’s precision in localization. Finally, the introduction of SPPCSPC and PConv modules in the neck section of the network allows for multi-faceted feature extraction from images, reducing redundant computation and memory access, and effectively extracting spatial features to balance computational load and optimize network architecture. 
Experimental results on a custom dataset of smoking behavior in chemical plants show that our model outperforms the standard YOLOv8 model in mean Average Precision (mAP@0.5) by 6.18%, surpassing other mainstream models in overall performance.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
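The IoU term underlying the Wise-IoU regression loss can be sketched as follows. Note this is only the plain 1 − IoU core: Wise-IoU adds a dynamic, quality-aware focusing weight (the "rational gradient distribution strategy" above), which is omitted here:

```python
def iou_loss(box_a, box_b):
    """1 - IoU for axis-aligned boxes given as (x1, y1, x2, y2).
    Wise-IoU scales a term like this by a per-sample focusing weight
    that emphasises average-quality anchors; that weight is omitted."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return 1.0 - inter / union if union > 0 else 1.0

loss_same = iou_loss((0, 0, 2, 2), (0, 0, 2, 2))   # identical boxes -> 0.0
loss_half = iou_loss((0, 0, 2, 2), (1, 0, 3, 2))   # partial overlap -> 2/3
```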
Pub Date: 2024-06-21 | DOI: 10.1007/s10044-024-01287-8
Taojie Kuang, Yiming Ren, Zhixiang Ren
Molecular property prediction, crucial for early drug candidate screening and optimization, has advanced considerably with deep learning-based methods, yet these methods often fall short in fully leveraging 3D spatial information. Specifically, current molecular encoding techniques tend to extract spatial information inadequately, leading to ambiguous representations in which a single representation may correspond to multiple distinct molecules. Moreover, existing molecular modeling methods focus predominantly on the most stable 3D conformation, neglecting other viable conformations present in reality. To address these issues, we propose 3D-Mol, a novel approach designed for more accurate spatial structure representation. It deconstructs molecules into three hierarchical graphs to better extract geometric information. Additionally, 3D-Mol leverages contrastive learning for pretraining on 20 million unlabeled molecules, treating conformations with identical topological structures as weighted positive pairs and contrasting ones as negatives, based on the similarity of their 3D conformation descriptors and fingerprints. We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate outstanding performance.
{"title":"3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information","authors":"Taojie Kuang, Yiming Ren, Zhixiang Ren","doi":"10.1007/s10044-024-01287-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01287-8","url":null,"abstract":"<p>Molecular property prediction, crucial for early drug candidate screening and optimization, has seen advancements with deep learning-based methods. While deep learning-based methods have advanced considerably, they often fall short in fully leveraging 3D spatial information. Specifically, current molecular encoding techniques tend to inadequately extract spatial information, leading to ambiguous representations where a single one might represent multiple distinct molecules. Moreover, existing molecular modeling methods focus predominantly on the most stable 3D conformations, neglecting other viable conformations present in reality. To address these issues, we propose 3D-Mol, a novel approach designed for more accurate spatial structure representation. It deconstructs molecules into three hierarchical graphs to better extract geometric information. Additionally, 3D-Mol leverages contrastive learning for pretraining on 20 million unlabeled data, treating their conformations with identical topological structures as weighted positive pairs and contrasting ones as negatives, based on the similarity of their 3D conformation descriptors and fingerprints. 
We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate our outstanding performance.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
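The weighted-positive contrastive pretraining described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the InfoNCE-style loss, the function names, and the idea of deriving the per-pair weights from descriptor/fingerprint similarity are assumptions for exposition.

```python
import numpy as np

def weighted_contrastive_loss(z_a, z_b, weights, temperature=0.1):
    """Weighted InfoNCE over a batch of conformation embedding pairs.

    z_a, z_b : (N, D) L2-normalized embeddings of two conformations of the
               same N molecules (row i of z_a and z_b form a positive pair).
    weights  : (N,) positive-pair weights, e.g. computed from 3D descriptor
               and fingerprint similarity (hypothetical weighting scheme).
    """
    sim = z_a @ z_b.T / temperature              # (N, N) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = np.diag(log_prob)                      # log-prob of matching pairs
    return float(-(weights * pos).sum() / weights.sum())

# toy usage: 4 molecules with 8-d embeddings, identical views as positives
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)
loss = weighted_contrastive_loss(z, z, np.ones(4))
```

Up-weighting a positive pair makes its conformations attract more strongly, which matches the abstract's notion of treating same-topology conformations as *weighted* positives rather than all-equal ones.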
Pub Date : 2024-06-19DOI: 10.1007/s10044-024-01284-x
Florian Lardeux, Petra Gomez-Krämer, Sylvain Marchand
We present a new recognition framework for ancient coins struck from the same die, called Low-complexity Arrays of Patch Signatures. To overcome the problem of varying illumination conditions, we use multi-light energy maps, a light-independent, 2.5D representation of the coin. Coin recognition is based on a local texture analysis of the energy maps. Descriptors of patches, tailored to coin images via the properties provided by the energy map, are matched against a database using a system of associative arrays. This system generalizes the Low-complexity Arrays of Contour Signatures; hence, the matching is very efficient and runs in nearly constant time. Due to the lack of available data, we present two new datasets of artificial and real ancient coins, respectively. Theoretical insights into the framework are discussed, and various experiments demonstrate the promising efficiency of our method.
{"title":"Low-complexity arrays of patch signature for efficient ancient coin retrieval","authors":"Florian Lardeux, Petra Gomez-Krämer, Sylvain Marchand","doi":"10.1007/s10044-024-01284-x","DOIUrl":"https://doi.org/10.1007/s10044-024-01284-x","url":null,"abstract":"<p>We present a new recognition framework for ancient coins struck from the same die. It is called Low-complexity Arrays of Patch Signatures. To overcome the problem of illumination conditions we use multi-light energy maps which are a light-independent, 2.5D representation of the coin. The coin recognition is based on a local texture analysis of the energy maps. Descriptors of patches, tailored to coin images via the properties provided by the energy map, are matched against a database using a system of associative arrays. The system of associative arrays used for the matching is a generalization of the Low-complexity Arrays of Contour Signatures. Hence, the matching is very efficient and nearly at constant time. Due to the lack of available data, we present two new data sets of artificial and real ancient coins respectively. Theoretical insights for the framework are discussed and various experiments demonstrate the promising efficiency of our method.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
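The near-constant-time matching via associative arrays can be illustrated with a toy hash-table index: each quantized patch signature maps to the coin identifiers that contain it, so a query is a handful of hash lookups plus voting. This is a sketch in the spirit of the abstract, not the paper's actual data structure; the quantization step and class/function names are assumptions.

```python
from collections import defaultdict

def quantize(signature, step=0.25):
    """Quantize a real-valued patch descriptor into a hashable key."""
    return tuple(int(round(v / step)) for v in signature)

class PatchSignatureIndex:
    """Toy associative-array index: quantized signature -> coin ids."""

    def __init__(self):
        self.table = defaultdict(set)

    def add_coin(self, coin_id, signatures):
        for s in signatures:
            self.table[quantize(s)].add(coin_id)

    def query(self, signatures):
        """Vote for coins sharing patch signatures; each probe is O(1)."""
        votes = defaultdict(int)
        for s in signatures:
            for coin_id in self.table.get(quantize(s), ()):
                votes[coin_id] += 1
        return max(votes, key=votes.get) if votes else None

index = PatchSignatureIndex()
index.add_coin("die_A", [(0.1, 0.9), (0.5, 0.5)])
index.add_coin("die_B", [(0.9, 0.1)])
best = index.query([(0.12, 0.88), (0.52, 0.48)])  # → "die_A"
```

Because lookup cost depends only on the number of query patches, not on database size, retrieval time stays roughly constant as the coin collection grows, which is the efficiency property the abstract emphasizes.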
Pub Date : 2024-06-11DOI: 10.1007/s10044-024-01285-w
Alejandro Galán-Cuenca, Antonio Javier Gallego, Marcelo Saval-Calvo, Antonio Pertusa
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research. However, these datasets pose several challenges, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained on generic images. This work studies the effect of these challenges at the intra- and inter-domain level in few-shot learning scenarios with severe data imbalance. To this end, we propose a methodology based on Siamese neural networks that integrates a series of techniques to mitigate the effects of data scarcity and distribution imbalance. Specifically, different initialization and data augmentation methods are analyzed, and four adaptations of imbalanced-data solutions to Siamese networks are introduced, including data balancing and weighted loss, both separately and combined, and with different pairing ratios. Moreover, we assess the inference process with four classifiers, namely Histogram, kNN, SVM, and Random Forest. Evaluation is performed on three chest X-ray datasets with annotated cases of both positive and negative COVID-19 diagnoses. The accuracy of each technique proposed for the Siamese architecture is analyzed separately. The results are compared to those obtained using equivalent methods on a state-of-the-art CNN, achieving an average F1 improvement of up to 3.6%, and up to 5.6% for intra-domain cases. We conclude that the introduced techniques offer promising improvements over the baseline in almost all cases and that the best technique may vary depending on the amount of data available and the level of imbalance.
{"title":"Few-shot learning for COVID-19 chest X-ray classification with imbalanced data: an inter vs. intra domain study","authors":"Alejandro Galán-Cuenca, Antonio Javier Gallego, Marcelo Saval-Calvo, Antonio Pertusa","doi":"10.1007/s10044-024-01285-w","DOIUrl":"https://doi.org/10.1007/s10044-024-01285-w","url":null,"abstract":"<p>Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research. However, some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images. This work studies the effect of these challenges at the intra- and inter-domain level in few-shot learning scenarios with severe data imbalance. For this, we propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance. Specifically, different initialization and data augmentation methods are analyzed, and four adaptations to Siamese networks of solutions to deal with imbalanced data are introduced, including data balancing and weighted loss, both separately and combined, and with a different balance of pairing ratios. Moreover, we also assess the inference process considering four classifiers, namely Histogram, <i>k</i>NN, SVM, and Random Forest. Evaluation is performed on three chest X-ray datasets with annotated cases of both positive and negative COVID-19 diagnoses. The accuracy of each technique proposed for the Siamese architecture is analyzed separately. The results are compared to those obtained using equivalent methods on a state-of-the-art CNN, achieving an average F1 improvement of up to 3.6%, and up to 5.6% of F1 for intra-domain cases. 
We conclude that the introduced techniques offer promising improvements over the baseline in almost all cases and that the technique selection may vary depending on the amount of data available and the level of imbalance.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
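The weighted-loss adaptation mentioned in the abstract above can be sketched as a class-weighted contrastive loss for Siamese pairs. This is a minimal illustration under assumptions: the margin-based contrastive formulation and the separate positive/negative weights are a generic imbalance-handling device, not the paper's exact loss.

```python
import numpy as np

def weighted_pair_loss(d, y, w_pos=1.0, w_neg=1.0, margin=1.0):
    """Class-weighted contrastive loss for a Siamese network.

    d : (N,) embedding distances for N image pairs.
    y : (N,) pair labels, 1 = same class, 0 = different class.
    Weighting the two pair types separately lets the rarer kind of pair
    contribute more to the gradient (hypothetical weighting scheme).
    """
    pos = y * d**2                                   # pull same-class pairs together
    neg = (1 - y) * np.maximum(margin - d, 0.0)**2   # push different-class pairs apart
    return float(np.mean(w_pos * pos + w_neg * neg))

d = np.array([0.2, 1.5, 0.9])
y = np.array([1, 0, 0])
base = weighted_pair_loss(d, y)                   # unweighted baseline
rebalanced = weighted_pair_loss(d, y, w_pos=2.0)  # up-weight scarce positive pairs
```

Adjusting `w_pos`/`w_neg` (or, equivalently, the pairing ratio when sampling pairs) is how such a setup compensates for the severe class imbalance the study targets.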