Pub Date: 2024-07-02 | DOI: 10.1007/s10044-024-01295-8
Simon Thomine, Hichem Snoussi
Unsupervised anomaly detection holds significant importance in large-scale industrial manufacturing. Recent methods have capitalized on the benefits of employing a classifier pretrained on natural images to extract representative features from specific layers, which are subsequently processed using various techniques. Notably, memory bank-based methods, which have demonstrated exceptional accuracy, often incur a trade-off in terms of latency, posing a challenge in real-time industrial applications where prompt anomaly detection and response are crucial. Alternative approaches such as knowledge distillation and normalizing flows have demonstrated promising performance in unsupervised anomaly detection while maintaining low latency. In this paper, we revisit knowledge distillation in the context of unsupervised anomaly detection, emphasizing the significance of feature selection. By employing distinctive features and leveraging different models, we highlight the importance of carefully selecting and utilizing features specifically tailored to anomaly detection. This article presents a novel approach that employs dual-model knowledge distillation and incorporates both high- and low-level semantic information.
{"title":"Dual model knowledge distillation for industrial anomaly detection","authors":"Simon Thomine, Hichem Snoussi","doi":"10.1007/s10044-024-01295-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01295-8","url":null,"abstract":"<p>Unsupervised anomaly detection holds significant importance in large-scale industrial manufacturing. Recent methods have capitalized on the benefits of employing a classifier pretrained on natural images to extract representative features from specific layers, which are subsequently processed using various techniques. Notably, memory bank-based methods, which have demonstrated exceptional accuracy, often incur a trade-off in terms of latency, posing a challenge in real-time industrial applications where prompt anomaly detection and response are crucial. Indeed, alternative approaches such as knowledge distillation and normalized flow have demonstrated promising performance in unsupervised anomaly detection while maintaining low latency. In this paper, we aim to revisit the concept of knowledge distillation in the context of unsupervised anomaly detection, emphasizing the significance of feature selection. By employing distinctive features and leveraging different models, we intend to highlight the importance of carefully selecting and utilizing relevant features specifically tailored for the task of anomaly detection. 
This article presents a novel approach for anomaly detection, which employs dual model knowledge distillation and incorporates various types of semantic information by leveraging high and low-level semantic information.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
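As a hedged illustration of the student–teacher principle this abstract builds on (not the authors' dual-model architecture), the sketch below scores anomalies as the per-pixel cosine discrepancy between a pretrained teacher's features and a student's features; the function name, shapes, and averaging scheme are assumptions:

```python
import numpy as np

def anomaly_map(teacher_feats, student_feats):
    """Per-pixel anomaly score as cosine distance between matching
    teacher and student feature maps of shape (C, H, W).
    Illustrative only: assumes the maps share spatial resolution."""
    maps = []
    for t, s in zip(teacher_feats, student_feats):
        t = t / (np.linalg.norm(t, axis=0, keepdims=True) + 1e-8)
        s = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-8)
        maps.append(1.0 - (t * s).sum(axis=0))  # (H, W); 0 = identical features
    return np.mean(maps, axis=0)

# Toy check: a student that reproduces the teacher exactly yields ~zero anomaly.
feats = [np.random.rand(8, 4, 4)]
amap = anomaly_map(feats, [f.copy() for f in feats])
```

At test time, regions where the student (trained only on normal samples) fails to imitate the teacher receive high scores.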
Pub Date: 2024-07-02 | DOI: 10.1007/s10044-024-01293-w
Shekhar Karanwal
Among the many local descriptors in the literature, the local binary pattern (LBP) is one of the most prolific. Despite its advantages, such as low computational complexity and invariance to monotonic gray-level changes, LBP exhibits several demerits: a limited spatial patch, a high-dimensional feature, a noise-sensitive thresholding function, and poor performance under harsh illumination variations. To overcome these issues, the presented work introduces a novel local descriptor called the discriminative binary pattern (DBP). Specifically, two descriptors are introduced under DBP: the radial orthogonal binary pattern (ROBP) and the radial variance binary pattern (RVBP). In the former, for the neighborhood comparison, the center pixel is replaced by the mean of the medians computed from the orthogonal pixels plus the center pixel of two 3 × 3 pixel windows, formed from radii S1 and S2 of the 5 × 5 image patch. In the latter, the radial variances generated from 8 pairs of two pixels are compared with their mean value. For both proposed descriptors, sub-region-wise histograms are extracted and fused to form the full feature vector, and the ROBP and RVBP feature vectors are then merged to form the DBP descriptor. Dimensionality reduction is performed with principal component analysis (PCA) and Fisher's linear discriminant analysis, and support vector machines are used for matching. Experiments conducted on 8 benchmark datasets reveal the effectiveness of the proposed DBP compared with other state-of-the-art methods.
{"title":"Discriminative binary pattern descriptor for face recognition","authors":"Shekhar Karanwal","doi":"10.1007/s10044-024-01293-w","DOIUrl":"https://doi.org/10.1007/s10044-024-01293-w","url":null,"abstract":"<p>Among several local descriptors invented in literature, the local binary pattern (LBP) is the prolific one. Despite its advantages like low computational complexity and monotonic gray invariance property, there are various demerits are observed in LBP and these are limited spatial patch, high dimension feature, noisy thresholding function and un-affective in harsh illumination variations. To overcome these issues presented work introduces the novel local descriptor called as discriminative binary pattern (DBP). Precisely two descriptors are introduced under DBP so-called Radial orthogonal binary pattern (ROBP) and radial variance binary pattern (RVBP). In former proposed descriptor, for neighborhood comparison, the center pixel is replaced by mean of medians computed from [orthogonal pixels + center pixel] of two 3 × 3 pixel window, formed from radius S1 and S2 of the 5 × 5 image patch. In latter proposed descriptor, the radial variances generated from 8 pair of two pixels are utilized for comparison with their mean value. In case of the both proposed descriptors, the sub-region wise histograms are extracted and fused to develop the entire feature size. Further the feature length of ROBP and RVBP are merged to form the size of the DBP descriptor. The compression is conducted by principal component analysis (PCA) and Fishers linear discriminant analysis). For matching support vector machines is used. 
Experiments conducted on 8 benchmark datasets reveals the effectiveness of the proposed DBP as compared to the other state of art benchmark methods.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
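For reference, the classic 8-neighbour LBP baseline that ROBP and RVBP modify (by replacing the centre-pixel threshold) can be sketched as follows; this is the textbook operator, not the paper's descriptors:

```python
import numpy as np

def lbp_3x3(img):
    """Classic LBP: threshold the 8 neighbours of each interior pixel
    against the centre and pack the results into an 8-bit code."""
    h, w = img.shape
    # neighbour offsets, clockwise from the top-left corner
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h-1, 1:w-1]
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for k, (dy, dx) in enumerate(offs):
        neigh = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        out |= (neigh >= center).astype(np.uint8) << k
    return out

codes = lbp_3x3(np.array([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]], dtype=float))
# neighbours 6, 9, 8, 7 exceed the centre 5, setting bits 3-6 -> code 120
```

DBP's contribution is precisely in swapping the `center` reference for more robust statistics (medians, radial variances) over the 5 × 5 patch.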
Pub Date: 2024-07-01 | DOI: 10.1007/s10044-024-01297-6
S. Muhammad Ahmed Hassan Shah, Muhammad Qasim Khan, Atif Rizwan, Sana Ullah Jan, Nagwan Abdel Samee, Mona M. Jamjoom
Cognitive disorders affect various cognitive functions and can have a substantial impact on an individual's daily life. Alzheimer's disease (AD) is one such well-known cognitive disorder. Early detection and treatment of cognitive diseases using artificial intelligence can help contain them. However, the complex spatial relationships and long-range dependencies found in medical imaging data present challenges in achieving this objective. In recent years, the application of transformers to imaging has emerged as a promising area of research, owing to the transformer's capacity to tackle spatial relationships and long-range dependencies in two ways: (1) using the self-attention mechanism to generate comprehensive features, and (2) capturing complex patterns by incorporating global context and long-range dependencies. In this work, a Bi-Vision Transformer (BiViT) architecture is proposed for classifying different stages of AD and multiple types of cognitive disorders from 2-dimensional MRI imaging data. More specifically, the transformer is composed of two novel modules, namely Mutual Latent Fusion (MLF) and Parallel Coupled Encoding Strategy (PCES), for effective feature learning. Two different datasets have been used to evaluate the performance of the proposed BiViT-based architecture. The first dataset contains several classes, such as mild or moderate demented stages of AD. The other dataset is composed of samples from patients with AD and different cognitive disorders, such as mild, early, or moderate impairments. For a comprehensive comparison, a multiple transfer learning algorithm and a deep autoencoder were each trained on both datasets. The results show that the proposed BiViT-based model achieves an accuracy of 96.38% on the AD dataset.
However, when applied to the cognitive disease data, the accuracy decreases slightly below 96%, which may result from the smaller amount of data and the imbalance in its distribution. Nevertheless, given these results, it can be hypothesized that the proposed algorithm could perform better if the imbalanced distribution and limited data availability were addressed.
Graphical abstract
{"title":"Computer-aided diagnosis of Alzheimer’s disease and neurocognitive disorders with multimodal Bi-Vision Transformer (BiViT)","authors":"S. Muhammad Ahmed Hassan Shah, Muhammad Qasim Khan, Atif Rizwan, Sana Ullah Jan, Nagwan Abdel Samee, Mona M. Jamjoom","doi":"10.1007/s10044-024-01297-6","DOIUrl":"https://doi.org/10.1007/s10044-024-01297-6","url":null,"abstract":"<p>Cognitive disorders affect various cognitive functions that can have a substantial impact on individual’s daily life. Alzheimer’s disease (AD) is one of such well-known cognitive disorders. Early detection and treatment of cognitive diseases using artificial intelligence can help contain them. However, the complex spatial relationships and long-range dependencies found in medical imaging data present challenges in achieving the objective. Moreover, for a few years, the application of transformers in imaging has emerged as a promising area of research. A reason can be transformer’s impressive capabilities of tackling spatial relationships and long-range dependency challenges in two ways, i.e., (1) using their self-attention mechanism to generate comprehensive features, and (2) capture complex patterns by incorporating global context and long-range dependencies. In this work, a Bi-Vision Transformer (BiViT) architecture is proposed for classifying different stages of AD, and multiple types of cognitive disorders from 2-dimensional MRI imaging data. More specifically, the transformer is composed of two novel modules, namely Mutual Latent Fusion (MLF) and Parallel Coupled Encoding Strategy (PCES), for effective feature learning. Two different datasets have been used to evaluate the performance of proposed BiViT-based architecture. The first dataset contain several classes such as mild or moderate demented stages of the AD. The other dataset is composed of samples from patients with AD and different cognitive disorders such as mild, early, or moderate impairments. 
For comprehensive comparison, a multiple transfer learning algorithm and a deep autoencoder have been each trained on both datasets. The results show that the proposed BiViT-based model achieves an accuracy of 96.38% on the AD dataset. However, when applied to cognitive disease data, the accuracy slightly decreases below 96% which can be resulted due to smaller amount of data and imbalance in data distribution. Nevertheless, given the results, it can be hypothesized that the proposed algorithm can perform better if the imbalanced distribution and limited availability problems in data can be addressed.</p><h3 data-test=\"abstract-sub-heading\">Graphical abstract</h3>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
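The self-attention mechanism the abstract credits for capturing global context can be sketched minimally as scaled dot-product attention over a token sequence; identity query/key/value projections are an assumption made for brevity, and this is not the BiViT modules themselves:

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over tokens x of shape (n, d).
    Every output token is a softmax-weighted mixture of all tokens,
    which is how long-range dependencies enter the representation."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                 # (n, n) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # rows sum to 1
    return attn @ x                               # context-mixed tokens

tokens = np.random.rand(5, 16)   # e.g. 5 image patches embedded in 16 dims
out = self_attention(tokens)
```

In a ViT-style encoder, the tokens are flattened image patches, so every patch attends to every other patch regardless of spatial distance.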
Pub Date: 2024-06-28 | DOI: 10.1007/s10044-024-01286-9
Andrés Bribiesca-Sánchez, Adolfo Guzmán, Fernando Montoya, Dan S. Díaz-Guerrero, Haydeé O. Hernández, Paul Hernández-Herrera, Alberto Darszon, Gabriel Corkidi, Ernesto Bribiesca
In the realm of 3D image processing, accurately representing the geometric nuances of line curves is crucial. Building upon the foundation set by the slope chain code, which adeptly represents intricate two-dimensional curves using an array capturing the exterior angles at each vertex, this study introduces an innovative 3D encoding method tailored for polygonal curves. This 3D encoding employs parallel slope and torsion chains, ensuring invariance to common transformations like translations, rotations, and uniform scaling, while also demonstrating robustness against mirror imaging and variable starting points. A hallmark feature of this method is its ability to compute tortuosity, a descriptor of curve complexity or winding nature. By applying this technique to biomedical engineering, we delved into the flagellar beat patterns of human sperm. These insights underscore the versatility of our 3D encoding across diverse computer vision applications.
{"title":"A three-dimensional extension of the slope chain code: analyzing the tortuosity of the flagellar beat of human sperm","authors":"Andrés Bribiesca-Sánchez, Adolfo Guzmán, Fernando Montoya, Dan S. Díaz-Guerrero, Haydeé O. Hernández, Paul Hernández-Herrera, Alberto Darszon, Gabriel Corkidi, Ernesto Bribiesca","doi":"10.1007/s10044-024-01286-9","DOIUrl":"https://doi.org/10.1007/s10044-024-01286-9","url":null,"abstract":"<p>In the realm of 3D image processing, accurately representing the geometric nuances of line curves is crucial. Building upon the foundation set by the slope chain code, which adeptly represents intricate two-dimensional curves using an array capturing the exterior angles at each vertex, this study introduces an innovative 3D encoding method tailored for polygonal curves. This 3D encoding employs parallel slope and torsion chains, ensuring invariance to common transformations like translations, rotations, and uniform scaling, while also demonstrating robustness against mirror imaging and variable starting points. A hallmark feature of this method is its ability to compute tortuosity, a descriptor of curve complexity or winding nature. By applying this technique to biomedical engineering, we delved into the flagellar beat patterns of human sperm. These insights underscore the versatility of our 3D encoding across diverse computer vision applications.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-28 | DOI: 10.1007/s10044-024-01290-z
Haigang Deng, Chuanxu Wang, Chengwei Li, Zhang Hao
For a channel and spatial feature map C×W×H in an object detection task, information fusion usually relies on an attention mechanism: all C channels and the entire space W×H are each compressed via average/max pooling, and their attention weight masks are then obtained through correlation calculations. This coarse-grained global operation ignores the differences among multiple channels and diverse spatial regions, resulting in inaccurate attention weights. In addition, how to mine the contextual information in the space W×H is also a challenge for object recognition and localization. To this end, we propose a Fine-Grained Dual Level Attention Mechanism joint Spacial Context Information Fusion module for object detection (FGDLAM&SCIF). It is a cascaded structure: first, we subdivide the feature space W×H into n (optimized as n = 4 in experiments) subspaces and construct a global adaptive pooling and one-dimensional convolution algorithm to extract the feature channel weights on each subspace. Second, the C feature channels are divided into n (n = 4) sub-channels, and a multi-scale module is constructed in the feature space W×H to mine context information. Finally, row and column coding is used to fuse them orthogonally to obtain enhanced features. The module is embeddable and can be transplanted into any object detection network, such as YOLOv4/v5, PPYOLOE, and YOLOX, as well as MobileNet and ResNet backbones. Experiments are conducted on the MS COCO 2017 and Pascal VOC 2007 datasets to verify its effectiveness and portability.
{"title":"Fine grained dual level attention mechanisms with spacial context information fusion for object detection","authors":"Haigang Deng, Chuanxu Wang, Chengwei Li, Zhang Hao","doi":"10.1007/s10044-024-01290-z","DOIUrl":"https://doi.org/10.1007/s10044-024-01290-z","url":null,"abstract":"<p>For channel and spatial feature map C×W×H in object detection task, its information fusion usually relies on attention mechanism, that is, all C channels and the entire space W×H are all compressed respectively via average/max pooling, and then their attention weight masks are obtained based on correlation calculation. This coarse-grained global operation ignores the differences among multiple channels and diverse spatial regions, resulting in inaccurate attention weights. In addition, how to mine the contextual information in the space W×H is also a challenge for object recognition and localization. To this end, we propose a Fine-Grained Dual Level Attention Mechanism joint Spacial Context Information Fusion module for object detection (FGDLAM&SCIF). It is a cascaded structure, firstly, we subdivide the feature space W×H into <i>n</i> (optimized as <i>n</i> = 4 in experiments) subspaces and construct a global adaptive pooling and one-dimensional convolution algorithm to effectively extract the feature channel weights on each subspace respectively. Secondly, the C feature channels are divided into <i>n</i> (<i>n</i> = 4) sub-channels, and then a multi-scale module is constructed in the feature space W×H to mine context information. Finally, row and column coding is used to fuse them orthogonally to obtain enhanced features. This module is embeddable, which can be transplanted into any object detection network, such as YOLOv4/v5, PPYOLOE, YOLOX and MobileNet, ResNet as well. 
Experiments are conducted on the MS COCO 2017 and Pascal VOC 2007 datasets to verify its effectiveness and good portability.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
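The first cascade step (subdivide the space, pool each subspace, slide a 1D convolution over the channel axis) can be sketched as below. This is a hedged approximation: the kernel here is random, the split is along W only, and none of the names come from the paper:

```python
import numpy as np

def subspace_channel_weights(fmap, n=4, k=3):
    """Per-subspace channel attention sketch: split a (C, H, W) map into
    n subspaces along W, global-average-pool each, run a length-k 1D
    convolution across channels, and squash with a sigmoid, yielding n
    separate C-dimensional weight vectors instead of one global mask."""
    rng = np.random.default_rng(0)
    kern = rng.standard_normal(k)       # stand-in for learned conv weights
    weights = []
    for sub in np.array_split(fmap, n, axis=2):        # n subspaces
        pooled = sub.mean(axis=(1, 2))                 # (C,) pooled stats
        conv = np.convolve(pooled, kern, mode="same")  # mix nearby channels
        weights.append(1.0 / (1.0 + np.exp(-conv)))    # sigmoid in (0, 1)
    return np.stack(weights)                           # (n, C)

w = subspace_channel_weights(np.random.rand(16, 8, 8))
```

The point of the finer granularity is visible in the output shape: n distinct weight vectors, one per spatial subspace, rather than a single C-vector for the whole map.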
Pub Date: 2024-06-26 | DOI: 10.1007/s10044-024-01292-x
Mengping Yu, Huan Huang, Rui Hou, Xiaoxuan Ma, Shuai Yuan
Time series data are sequences of values obtained by sampling a signal at a fixed frequency, and time series classification algorithms assign time series to different categories. Among the many time series classification algorithms, subseries-based algorithms have received widespread attention because of their high accuracy and low computational complexity. However, subseries-based algorithms consider the similarity of subseries only by shape and ignore semantic similarity. This paper therefore addresses the problem that subseries-based time series classification algorithms ignore the semantic similarity between subseries. To do so, we introduce the deep graph kernel technique to capture the semantic similarity between subseries. To verify the performance of the method, we test the proposed algorithm on publicly available datasets from the UCR repository. The experimental results show that the deep graph kernel plays an important role in enhancing the accuracy of the algorithm, and that the proposed algorithm performs well in terms of accuracy, with a considerable advantage over other representative algorithms.
{"title":"A deep graph kernel-based time series classification algorithm","authors":"Mengping Yu, Huan Huang, Rui Hou, Xiaoxuan Ma, Shuai Yuan","doi":"10.1007/s10044-024-01292-x","DOIUrl":"https://doi.org/10.1007/s10044-024-01292-x","url":null,"abstract":"<p>Time series data are sequences of values that are obtained by sampling a signal at a fixed frequency, and time series classification algorithms distinguish time series into different categories. Among many time series classification algorithms, subseries-based algorithms have received widespread attention because of their high accuracy and low computational complexity. However, subseries-based algorithms consider the similarity of subseries only by shape and ignore semantic similarity. Therefore, the purpose of this paper is to determine how to solve the problem that subseries-based time series classification algorithms ignore the semantic similarity between subseries. To address this issue, we introduce the deep graph kernel technique to capture the semantic similarity between subseries. 
To verify the performance of the method, we test the proposed algorithm on publicly available datasets from the UCR repository and the experimental results prove that the deep graph kernel has an important role in enhancing the accuracy of the algorithm and that the proposed algorithm performs quite well in terms of accuracy and has a considerable advantage over other representative algorithms.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141529259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
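The subseries that such algorithms compare are simply fixed-length sliding windows over the series; a minimal extraction sketch (names and parameters assumed, not taken from the paper) is:

```python
import numpy as np

def subseries(ts, length, stride=1):
    """Extract all fixed-length subseries from a 1D time series.
    These windows are the units whose shape -- and, in the paper's
    approach, semantic -- similarity is compared between series."""
    ts = np.asarray(ts, dtype=float)
    starts = range(0, len(ts) - length + 1, stride)
    return np.stack([ts[s:s + length] for s in starts])

windows = subseries([1, 2, 3, 4, 5, 6], length=3, stride=1)
# 4 overlapping windows: [1,2,3], [2,3,4], [3,4,5], [4,5,6]
```

A purely shape-based method would compare these windows by, e.g., Euclidean distance; the paper's deep graph kernel instead measures similarity between graph structures built over them.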
Pub Date: 2024-06-24 | DOI: 10.1007/s10044-024-01288-7
Zhong Wang, Yi Liu, Lanfang Lei, Peibei Shi
This study aims to address the challenges of detecting smoking behavior among workers in chemical plant environments. Smoking behavior is difficult to discern in images, with the cigarette occupying only a small pixel area, compounded by the complex background of chemical plants. Traditional models struggle to accurately capture smoking features, leading to feature loss, reduced recognition accuracy, and issues like false positives and missed detections. To overcome these challenges, we have developed a smoking behavior recognition method based on the YOLOv8 model, named Smoking-YOLOv8. Our approach introduces an SD attention mechanism that focuses on the smoking areas within images. By aggregating information from different positions through weighted averaging, it effectively manages long-distance dependencies and suppresses irrelevant background noise, thereby enhancing detection performance. Furthermore, we utilize Wise-IoU as the regression loss for bounding boxes, along with a rational gradient distribution strategy that prioritizes samples of average quality to improve the model’s precision in localization. Finally, the introduction of SPPCSPC and PConv modules in the neck section of the network allows for multi-faceted feature extraction from images, reducing redundant computation and memory access, and effectively extracting spatial features to balance computational load and optimize network architecture. Experimental results on a custom dataset of smoking behavior in chemical plants show that our model outperforms the standard YOLOv8 model in mean Average Precision (mAP@0.5) by 6.18%, surpassing other mainstream models in overall performance.
{"title":"Smoking-YOLOv8: a novel smoking detection algorithm for chemical plant personnel","authors":"Zhong Wang, Yi Liu, Lanfang Lei, Peibei Shi","doi":"10.1007/s10044-024-01288-7","DOIUrl":"https://doi.org/10.1007/s10044-024-01288-7","url":null,"abstract":"<p>This study aims to address the challenges of detecting smoking behavior among workers in chemical plant environments. Smoking behavior is difficult to discern in images, with the cigarette occupying only a small pixel area, compounded by the complex background of chemical plants. Traditional models struggle to accurately capture smoking features, leading to feature loss, reduced recognition accuracy, and issues like false positives and missed detections. To overcome these challenges, we have developed a smoking behavior recognition method based on the YOLOv8 model, named Smoking-YOLOv8. Our approach introduces an SD attention mechanism that focuses on the smoking areas within images. By aggregating information from different positions through weighted averaging, it effectively manages long-distance dependencies and suppresses irrelevant background noise, thereby enhancing detection performance. Furthermore, we utilize Wise-IoU as the regression loss for bounding boxes, along with a rational gradient distribution strategy that prioritizes samples of average quality to improve the model’s precision in localization. Finally, the introduction of SPPCSPC and PConv modules in the neck section of the network allows for multi-faceted feature extraction from images, reducing redundant computation and memory access, and effectively extracting spatial features to balance computational load and optimize network architecture. 
Experimental results on a custom dataset of smoking behavior in chemical plants show that our model outperforms the standard YOLOv8 model in mean Average Precision (mAP@0.5) by 6.18%, surpassing other mainstream models in overall performance.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
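The IoU term underlying the Wise-IoU regression loss can be sketched as follows. Note this is only the plain 1 − IoU core: Wise-IoU adds a dynamic, quality-aware focusing weight (the "rational gradient distribution strategy" above), which is omitted here:

```python
def iou_loss(box_a, box_b):
    """1 - IoU for axis-aligned boxes given as (x1, y1, x2, y2).
    Wise-IoU scales a term like this by a per-sample focusing weight
    that emphasises average-quality anchors; that weight is omitted."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return 1.0 - inter / union if union > 0 else 1.0

loss_same = iou_loss((0, 0, 2, 2), (0, 0, 2, 2))   # identical boxes -> 0.0
loss_half = iou_loss((0, 0, 2, 2), (1, 0, 3, 2))   # partial overlap -> 2/3
```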
Pub Date: 2024-06-21 | DOI: 10.1007/s10044-024-01287-8
Taojie Kuang, Yiming Ren, Zhixiang Ren
Molecular property prediction, crucial for early drug candidate screening and optimization, has advanced considerably with deep learning-based methods, yet these methods often fall short in fully leveraging 3D spatial information. Specifically, current molecular encoding techniques tend to extract spatial information inadequately, leading to ambiguous representations in which a single representation may correspond to multiple distinct molecules. Moreover, existing molecular modeling methods focus predominantly on the most stable 3D conformation, neglecting other viable conformations present in reality. To address these issues, we propose 3D-Mol, a novel approach designed for more accurate spatial structure representation. It deconstructs molecules into three hierarchical graphs to better extract geometric information. Additionally, 3D-Mol leverages contrastive learning for pretraining on 20 million unlabeled molecules, treating conformations with identical topological structures as weighted positive pairs and contrasting ones as negatives, based on the similarity of their 3D conformation descriptors and fingerprints. We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate outstanding performance.
{"title":"3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information","authors":"Taojie Kuang, Yiming Ren, Zhixiang Ren","doi":"10.1007/s10044-024-01287-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01287-8","url":null,"abstract":"<p>Molecular property prediction, crucial for early drug candidate screening and optimization, has seen advancements with deep learning-based methods. While deep learning-based methods have advanced considerably, they often fall short in fully leveraging 3D spatial information. Specifically, current molecular encoding techniques tend to inadequately extract spatial information, leading to ambiguous representations where a single one might represent multiple distinct molecules. Moreover, existing molecular modeling methods focus predominantly on the most stable 3D conformations, neglecting other viable conformations present in reality. To address these issues, we propose 3D-Mol, a novel approach designed for more accurate spatial structure representation. It deconstructs molecules into three hierarchical graphs to better extract geometric information. Additionally, 3D-Mol leverages contrastive learning for pretraining on 20 million unlabeled data, treating their conformations with identical topological structures as weighted positive pairs and contrasting ones as negatives, based on the similarity of their 3D conformation descriptors and fingerprints. 
We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate our outstanding performance.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
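The weighted-positive contrastive pretraining described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the InfoNCE-style loss, the function names, and the idea of deriving the per-pair weights from descriptor/fingerprint similarity are assumptions for exposition.

```python
import numpy as np

def weighted_contrastive_loss(z_a, z_b, weights, temperature=0.1):
    """Weighted InfoNCE over a batch of conformation embedding pairs.

    z_a, z_b : (N, D) L2-normalized embeddings of two conformations of the
               same N molecules (row i of z_a and z_b form a positive pair).
    weights  : (N,) positive-pair weights, e.g. computed from 3D descriptor
               and fingerprint similarity (hypothetical weighting scheme).
    """
    sim = z_a @ z_b.T / temperature              # (N, N) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)   # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = np.diag(log_prob)                      # log-prob of matching pairs
    return float(-(weights * pos).sum() / weights.sum())

# toy usage: 4 molecules with 8-d embeddings, identical views as positives
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)
loss = weighted_contrastive_loss(z, z, np.ones(4))
```

Up-weighting a positive pair makes its conformations attract more strongly, which matches the abstract's notion of treating same-topology conformations as *weighted* positives rather than all-equal ones.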
Pub Date : 2024-06-19DOI: 10.1007/s10044-024-01284-x
Florian Lardeux, Petra Gomez-Krämer, Sylvain Marchand
We present a new recognition framework for ancient coins struck from the same die, called Low-complexity Arrays of Patch Signatures. To overcome the problem of varying illumination conditions, we use multi-light energy maps, a light-independent, 2.5D representation of the coin. Coin recognition is based on a local texture analysis of the energy maps. Descriptors of patches, tailored to coin images via the properties provided by the energy map, are matched against a database using a system of associative arrays. This system generalizes the Low-complexity Arrays of Contour Signatures; hence, the matching is very efficient and runs in nearly constant time. Due to the lack of available data, we present two new datasets of artificial and real ancient coins, respectively. Theoretical insights into the framework are discussed, and various experiments demonstrate the promising efficiency of our method.
{"title":"Low-complexity arrays of patch signature for efficient ancient coin retrieval","authors":"Florian Lardeux, Petra Gomez-Krämer, Sylvain Marchand","doi":"10.1007/s10044-024-01284-x","DOIUrl":"https://doi.org/10.1007/s10044-024-01284-x","url":null,"abstract":"<p>We present a new recognition framework for ancient coins struck from the same die. It is called Low-complexity Arrays of Patch Signatures. To overcome the problem of illumination conditions we use multi-light energy maps which are a light-independent, 2.5D representation of the coin. The coin recognition is based on a local texture analysis of the energy maps. Descriptors of patches, tailored to coin images via the properties provided by the energy map, are matched against a database using a system of associative arrays. The system of associative arrays used for the matching is a generalization of the Low-complexity Arrays of Contour Signatures. Hence, the matching is very efficient and nearly at constant time. Due to the lack of available data, we present two new data sets of artificial and real ancient coins respectively. Theoretical insights for the framework are discussed and various experiments demonstrate the promising efficiency of our method.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141517358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
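The near-constant-time matching via associative arrays can be illustrated with a toy hash-table index: each quantized patch signature maps to the coin identifiers that contain it, so a query is a handful of hash lookups plus voting. This is a sketch in the spirit of the abstract, not the paper's actual data structure; the quantization step and class/function names are assumptions.

```python
from collections import defaultdict

def quantize(signature, step=0.25):
    """Quantize a real-valued patch descriptor into a hashable key."""
    return tuple(int(round(v / step)) for v in signature)

class PatchSignatureIndex:
    """Toy associative-array index: quantized signature -> coin ids."""

    def __init__(self):
        self.table = defaultdict(set)

    def add_coin(self, coin_id, signatures):
        for s in signatures:
            self.table[quantize(s)].add(coin_id)

    def query(self, signatures):
        """Vote for coins sharing patch signatures; each probe is O(1)."""
        votes = defaultdict(int)
        for s in signatures:
            for coin_id in self.table.get(quantize(s), ()):
                votes[coin_id] += 1
        return max(votes, key=votes.get) if votes else None

index = PatchSignatureIndex()
index.add_coin("die_A", [(0.1, 0.9), (0.5, 0.5)])
index.add_coin("die_B", [(0.9, 0.1)])
best = index.query([(0.12, 0.88), (0.52, 0.48)])  # → "die_A"
```

Because lookup cost depends only on the number of query patches, not on database size, retrieval time stays roughly constant as the coin collection grows, which is the efficiency property the abstract emphasizes.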
Pub Date : 2024-06-11DOI: 10.1007/s10044-024-01285-w
Alejandro Galán-Cuenca, Antonio Javier Gallego, Marcelo Saval-Calvo, Antonio Pertusa
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research. However, these datasets pose several challenges, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained on generic images. This work studies the effect of these challenges at the intra- and inter-domain level in few-shot learning scenarios with severe data imbalance. To this end, we propose a methodology based on Siamese neural networks that integrates a series of techniques to mitigate the effects of data scarcity and distribution imbalance. Specifically, different initialization and data augmentation methods are analyzed, and four adaptations of imbalanced-data solutions to Siamese networks are introduced, including data balancing and weighted loss, both separately and combined, and with different pairing ratios. Moreover, we assess the inference process with four classifiers, namely Histogram, kNN, SVM, and Random Forest. Evaluation is performed on three chest X-ray datasets with annotated cases of both positive and negative COVID-19 diagnoses. The accuracy of each technique proposed for the Siamese architecture is analyzed separately. The results are compared to those obtained using equivalent methods on a state-of-the-art CNN, achieving an average F1 improvement of up to 3.6%, and up to 5.6% for intra-domain cases. We conclude that the introduced techniques offer promising improvements over the baseline in almost all cases and that the best technique may vary depending on the amount of data available and the level of imbalance.
{"title":"Few-shot learning for COVID-19 chest X-ray classification with imbalanced data: an inter vs. intra domain study","authors":"Alejandro Galán-Cuenca, Antonio Javier Gallego, Marcelo Saval-Calvo, Antonio Pertusa","doi":"10.1007/s10044-024-01285-w","DOIUrl":"https://doi.org/10.1007/s10044-024-01285-w","url":null,"abstract":"<p>Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research. However, some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images. This work studies the effect of these challenges at the intra- and inter-domain level in few-shot learning scenarios with severe data imbalance. For this, we propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance. Specifically, different initialization and data augmentation methods are analyzed, and four adaptations to Siamese networks of solutions to deal with imbalanced data are introduced, including data balancing and weighted loss, both separately and combined, and with a different balance of pairing ratios. Moreover, we also assess the inference process considering four classifiers, namely Histogram, <i>k</i>NN, SVM, and Random Forest. Evaluation is performed on three chest X-ray datasets with annotated cases of both positive and negative COVID-19 diagnoses. The accuracy of each technique proposed for the Siamese architecture is analyzed separately. The results are compared to those obtained using equivalent methods on a state-of-the-art CNN, achieving an average F1 improvement of up to 3.6%, and up to 5.6% of F1 for intra-domain cases. 
We conclude that the introduced techniques offer promising improvements over the baseline in almost all cases and that the technique selection may vary depending on the amount of data available and the level of imbalance.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
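The weighted-loss adaptation mentioned in the abstract above can be sketched as a class-weighted contrastive loss for Siamese pairs. This is a minimal illustration under assumptions: the margin-based contrastive formulation and the separate positive/negative weights are a generic imbalance-handling device, not the paper's exact loss.

```python
import numpy as np

def weighted_pair_loss(d, y, w_pos=1.0, w_neg=1.0, margin=1.0):
    """Class-weighted contrastive loss for a Siamese network.

    d : (N,) embedding distances for N image pairs.
    y : (N,) pair labels, 1 = same class, 0 = different class.
    Weighting the two pair types separately lets the rarer kind of pair
    contribute more to the gradient (hypothetical weighting scheme).
    """
    pos = y * d**2                                   # pull same-class pairs together
    neg = (1 - y) * np.maximum(margin - d, 0.0)**2   # push different-class pairs apart
    return float(np.mean(w_pos * pos + w_neg * neg))

d = np.array([0.2, 1.5, 0.9])
y = np.array([1, 0, 0])
base = weighted_pair_loss(d, y)                   # unweighted baseline
rebalanced = weighted_pair_loss(d, y, w_pos=2.0)  # up-weight scarce positive pairs
```

Adjusting `w_pos`/`w_neg` (or, equivalently, the pairing ratio when sampling pairs) is how such a setup compensates for the severe class imbalance the study targets.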