
Latest publications in Pattern Analysis and Applications

K-BEST subspace clustering: kernel-friendly block-diagonal embedded and similarity-preserving transformed subspace clustering
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-19 | DOI: 10.1007/s10044-024-01336-2
Jyoti Maggu, Anurag Goel

Subspace clustering methods, employing sparse and low-rank models, have demonstrated efficacy in clustering high-dimensional data. These approaches typically assume the separability of input data into distinct subspaces, a premise that does not hold true in general. Furthermore, prevalent low-rank and sparse methods relying on self-expression exhibit effectiveness primarily with linear structure data, facing limitations in processing datasets with intricate nonlinear structures. While kernel subspace clustering methods excel in handling nonlinear structures, they may compromise similarity information during the reconstruction of original data in kernel space. Additionally, these methods may fall short of attaining an affinity matrix with an optimal block-diagonal property. In response to these challenges, this paper introduces a novel subspace clustering approach named Similarity Preserving Kernel Block Diagonal Representation based Transformed Subspace Clustering (KBD-TSC). KBD-TSC contributes in three key aspects: (1) integration of a kernelized version of transform learning within a subspace clustering framework, introducing a block diagonal representation term to generate an affinity matrix with a block-diagonal structure. (2) Construction and integration of a similarity preserving regularizer into the model by minimizing the discrepancy between inner products of the original data and those of the reconstructed data in kernel space. This facilitates enhanced preservation of similarity information between the original data points. (3) Proposal of KBD-TSC by integrating the block diagonal representation term and similarity preserving regularizer into a kernel self-expressing model. The optimization of the proposed model is efficiently addressed through the alternating direction method of multipliers. This study validates the effectiveness of the proposed KBD-TSC method through experimental results obtained from nine datasets, showcasing its potential in addressing the limitations of existing subspace clustering techniques.
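
As a concrete illustration of the kernel self-expression idea on which KBD-TSC builds, the following is a minimal sketch (not the authors' code): a ridge-regularized kernel self-expressive coefficient matrix is computed in closed form and its symmetrized magnitude is used as an affinity for spectral clustering. The block-diagonal regularizer, transform learning, similarity-preserving term, and ADMM solver of KBD-TSC are omitted here; `gamma`, `lam`, and the toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

def kernel_self_expressive_clustering(X, n_clusters, gamma=0.5, lam=1.0):
    """Cluster rows of X via a ridge-regularized kernel self-expression.

    Solves min_Z ||phi(X) - phi(X) Z||_F^2 + lam ||Z||_F^2 in kernel form,
    which has the closed-form solution Z = (K + lam I)^{-1} K.
    """
    K = rbf_kernel(X, gamma=gamma)                  # kernel Gram matrix
    n = K.shape[0]
    Z = np.linalg.solve(K + lam * np.eye(n), K)     # self-expression coefficients
    W = 0.5 * (np.abs(Z) + np.abs(Z).T)             # symmetric, non-negative affinity
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed",
                                random_state=0).fit_predict(W)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(4, 1, (30, 5))])
    print(kernel_self_expressive_clustering(X, n_clusters=2))
```

In the full method, the affinity would additionally be pushed toward a block-diagonal structure and the similarity-preserving term would constrain the kernel-space reconstruction before this final spectral step.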

Citations: 0
Hidden Markov models with multivariate bounded asymmetric Student’s t-mixture model emissions
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-18 | DOI: 10.1007/s10044-024-01341-5
Ons Bouarada, Muhammad Azam, Manar Amayri, Nizar Bouguila

Hidden Markov models (HMMs) are popular methods for continuous sequential data modeling and classification tasks. In such applications, the observation emission densities of the HMM hidden states are generally continuous, can vary from one model to the other, and are typically modeled by elliptically contoured distributions, namely Gaussians or Student’s t-distributions. In this context, this paper proposes a novel HMM with Bounded Asymmetric Student’s t-Mixture Model (BASMM) emissions. Our new BASMMHMM is introduced in the light of the added robustness guaranteed by the BASMM in comparison to other popular emission distributions such as the Gaussian Mixture Model (GMM). In fact, GMMs generally have limited performance when there are outliers in the data sets (observations) that the HMM is fitted to. Also, GMMs cannot sufficiently model skewed populations, which are typical in many fields, such as financial or signal processing-related data sets. An excellent alternative for solving this problem is found in Student’s t-mixture models. They have similar behaviour and shape to GMMs, but with heavier tails, which allows for more tolerance towards data sets that span extensive ranges and include outliers. Asymmetry and bounded support are also important features that can further extend the model’s flexibility and fit the imperfections of real-world data. This leads us to explore the effectiveness of the BASMM as an observation emission distribution in HMMs, hence the proposed BASMMHMM. We also demonstrate the improved robustness of our model by presenting the results of three different experiments: occupancy estimation, stock price prediction, and human activity recognition.
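
To make the emission model concrete, here is a minimal sketch (independent of the authors' implementation) of a multivariate Student's t mixture log-likelihood plugged into the HMM forward recursion in log space. The bounded-support and asymmetry extensions of the BASMM and all parameter learning are omitted, and every parameter in the demo is an illustrative assumption.

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def mvt_logpdf(x, mu, sigma, nu):
    """Log-density of a multivariate Student's t distribution."""
    d = mu.shape[0]
    diff = x - mu
    maha = diff @ np.linalg.solve(sigma, diff)          # Mahalanobis distance squared
    _, logdet = np.linalg.slogdet(sigma)
    return (gammaln((nu + d) / 2) - gammaln(nu / 2)
            - 0.5 * d * np.log(nu * np.pi) - 0.5 * logdet
            - 0.5 * (nu + d) * np.log1p(maha / nu))

def mixture_loglik(x, weights, mus, sigmas, nus):
    """Log-likelihood of x under a Student's t mixture (one HMM state)."""
    comps = [np.log(w) + mvt_logpdf(x, m, s, v)
             for w, m, s, v in zip(weights, mus, sigmas, nus)]
    return logsumexp(comps)

def forward_loglik(obs, log_pi, log_A, emission_params):
    """HMM forward algorithm in log space with t-mixture emissions."""
    n_states = len(log_pi)
    log_alpha = log_pi + np.array(
        [mixture_loglik(obs[0], *emission_params[s]) for s in range(n_states)])
    for t in range(1, len(obs)):
        log_b = np.array(
            [mixture_loglik(obs[t], *emission_params[s]) for s in range(n_states)])
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_b
    return logsumexp(log_alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.normal(size=(20, 2))                      # toy observation sequence
    params = [([0.5, 0.5], [np.zeros(2), np.ones(2)],
               [np.eye(2), np.eye(2)], [4.0, 4.0]) for _ in range(2)]
    log_pi = np.log(np.full(2, 0.5))
    log_A = np.log(np.full((2, 2), 0.5))
    print(forward_loglik(obs, log_pi, log_A, params))
```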

Citations: 0
Research on decoupled adaptive graph convolution networks based on skeleton data for action recognition
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-18 | DOI: 10.1007/s10044-024-01319-3
Haigang Deng, Guocheng Lin, Chengwei Li, Chuanxu Wang

Graph convolutional networks are apt for feature extraction from non-Euclidean human skeleton data, but their adjacency matrix is fixed and the receptive field is small, which results in a biased representation of the skeleton's intrinsic information. In addition, the mean pooling of spatio-temporal features in the classification layer loses information and degrades recognition accuracy. To this end, the Decoupled Adaptive Graph Convolutional Network (DAGCN) is proposed. Specifically, a multi-level adaptive adjacency matrix is designed, which can dynamically obtain the rich correlation information among the skeleton nodes through a non-local adaptive algorithm. Thereafter, a new Residual Multi-scale Temporal Convolution Network (RMTCN) is proposed to fully extract the temporal features of the decoupled skeleton data. For the second problem, the spatio-temporal features in the classification stage are decomposed into three parts (spatial, temporal, and spatio-temporal information), each part is average-pooled, and the results are added together for classification; this is denoted the STMP (spatio-temporal mean pooling) module. Experimental results show that our algorithm achieves accuracies of 96.5%, 90.6%, and 96.4% on the NTU-RGB+D60, NTU-RGB+D120, and NW-UCLA data sets, respectively.
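
A minimal PyTorch sketch of the kind of adaptive, data-dependent adjacency the abstract describes is given below; it combines a fixed skeleton graph, a learned offset, and a non-local similarity term, but it is not the DAGCN architecture itself, and the joint count, embedding size, and placeholder adjacency are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    """Graph convolution whose adjacency = fixed skeleton graph + learned bias
    + a non-local, data-dependent term computed from embedded node features."""

    def __init__(self, in_channels, out_channels, adjacency, embed_dim=16):
        super().__init__()
        self.register_buffer("A_fixed", adjacency)            # (V, V) skeleton graph
        self.B = nn.Parameter(torch.zeros_like(adjacency))    # learned global offset
        self.theta = nn.Conv2d(in_channels, embed_dim, 1)     # embeddings for the
        self.phi = nn.Conv2d(in_channels, embed_dim, 1)       # non-local similarity
        self.out = nn.Conv2d(in_channels, out_channels, 1)

    def forward(self, x):                  # x: (N, C, T, V)
        q = self.theta(x).mean(dim=2)      # (N, E, V): average over time
        k = self.phi(x).mean(dim=2)
        C_adaptive = torch.softmax(torch.einsum("nev,neu->nvu", q, k), dim=-1)
        A = self.A_fixed + self.B + C_adaptive         # (N, V, V) after broadcast
        y = torch.einsum("nctv,nvu->nctu", x, A)       # aggregate over joints
        return F.relu(self.out(y))

if __name__ == "__main__":
    V = 25                                  # e.g. NTU-RGB+D skeletons have 25 joints
    A = torch.eye(V)                        # placeholder skeleton adjacency
    layer = AdaptiveGraphConv(3, 64, A)
    x = torch.randn(2, 3, 50, V)            # (batch, channels, frames, joints)
    print(layer(x).shape)                   # torch.Size([2, 64, 50, 25])
```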

Citations: 0
YOLOv7-GCM: a detection algorithm for creek waste based on improved YOLOv7 model
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-17 | DOI: 10.1007/s10044-024-01338-0
Jianhua Qin, Honglan Zhou, Huaian Yi, Luyao Ma, Jianhan Nie, Tingting Huang

To enhance the cleanliness of creek environments, quadruped robots can be utilized to detect creek waste. However, continuous changes in the water environment significantly reduce the accuracy of image detection when quadruped robots are used for image acquisition. In order to improve the accuracy of quadruped robots in waste detection, this article proposes a detection model for creek waste called YOLOv7-GCM. The model integrates a global attention mechanism (GAM) into the YOLOv7 model, which achieves accurate waste detection against ever-changing backgrounds and underwater conditions. A content-aware reassembly of features (CARAFE) replaces the up-sampling of the YOLOv7 model to achieve more accurate and efficient feature reconstruction. A minimum point distance intersection over union (MPDIoU) loss function replaces the CIoU loss function of the YOLOv7 model to more accurately measure the similarity between target boxes and predicted boxes. After the aforementioned improvements, the YOLOv7-GCM model is obtained. A quadruped robot was used to patrol the creek and collect images of creek waste. Finally, the YOLOv7-GCM model was trained on the creek waste dataset. The outcomes of the experiment show that the precision rate of the YOLOv7-GCM model increased by 4.2% and the mean average precision (mAP@0.5) by 2.1%. The YOLOv7-GCM model provides a new method for identifying creek waste, which may help promote efficient waste management.
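
For reference, the MPDIoU loss mentioned above can be sketched as follows, using its published definition (IoU penalized by the squared distances between corresponding box corners, normalized by the squared image diagonal). The box format (x1, y1, x2, y2) and the image size are assumptions, and this is not code from YOLOv7-GCM.

```python
import numpy as np

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """MPDIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format.

    MPDIoU = IoU - d1^2 / (w^2 + h^2) - d2^2 / (w^2 + h^2),
    where d1, d2 are the distances between the top-left and bottom-right
    corners of the predicted and target boxes. Loss = 1 - MPDIoU.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection-over-union of the two boxes
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter + eps
    iou = inter / union

    # Squared corner distances, normalized by the squared image diagonal
    diag2 = img_w ** 2 + img_h ** 2
    d1 = ((px1 - tx1) ** 2 + (py1 - ty1) ** 2) / diag2
    d2 = ((px2 - tx2) ** 2 + (py2 - ty2) ** 2) / diag2
    return 1.0 - (iou - d1 - d2)

print(mpdiou_loss((10, 10, 50, 50), (12, 14, 48, 52), img_w=640, img_h=640))
```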

Citations: 0
Unveiling the unseen: novel strategies for object detection beyond known distributions
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-13 | DOI: 10.1007/s10044-024-01334-4
S. Devi, R. Dayana, P. Malarvezhi

In contemporary machine learning, models often struggle with data distribution variations, severely impacting their out-of-distribution (OOD) generalization and detection capabilities. Current object detection methods, relying on virtual outlier synthesis and class-conditional density estimation, struggle to effectively distinguish OOD samples. They often depend on accurate density estimation and may produce virtual outliers that lack realism, particularly in complex or dynamic environments. Furthermore, previous research has typically addressed covariate and semantic shifts independently, resulting in fragmented solutions that fail to comprehensively tackle OOD generalization. This study introduces a unified approach to enhance OOD generalization in object recognition models, addressing these critical gaps. The strategy involves employing adversarial perturbations on the ID (In-Distribution) dataset to enhance the model’s resilience to distribution shifts, thereby simulating potential real-world scenarios characterized by imperceptible variations. Additionally, the integration of Maximum Mean Discrepancy (MMD) at the object level effectively discriminates between ID and OOD samples by quantifying distributional differences. For precise OOD detection, a K-nearest neighbors (KNN) algorithm is used during inference to measure similarity between samples and their closest neighbors in the training data. Evaluations on benchmark datasets, including PASCAL VOC and BDD100K as ID, with COCO and Open Images subsets as OOD, demonstrate significant improvements in OOD generalization compared to existing methods. These discoveries underscore the framework’s potential to elevate the dependability and flexibility of object recognition systems in practical scenarios, particularly in autonomous vehicles where accurate object detection under diverse conditions is critical for safety. This research contributes to advancing OOD generalization techniques and lays the groundwork for future refinement to address evolving challenges in machine learning applications. The code can be accessed from https://github.com/DeviSPhd/OODG_OD.
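
The KNN-based OOD scoring used at inference can be sketched as follows on top of arbitrary feature vectors: the score of a test sample is its distance to the k-th nearest in-distribution training feature, and a threshold chosen on a validation split turns the score into an accept/reject decision. The adversarial perturbations, object-level MMD term, and detector backbone are out of scope here, and k and the toy Gaussian features are illustrative assumptions (in practice one would use detector features, often L2-normalized).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fit_knn_scorer(train_features, k=10):
    """Index the in-distribution training features for k-NN lookups."""
    return NearestNeighbors(n_neighbors=k).fit(train_features)

def ood_scores(scorer, test_features):
    """OOD score = distance to the k-th nearest in-distribution feature
    (larger means more likely out-of-distribution)."""
    dists, _ = scorer.kneighbors(test_features)
    return dists[:, -1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    id_train = rng.normal(0.0, 1.0, (500, 128))    # stand-in for detector features
    id_test = rng.normal(0.0, 1.0, (5, 128))
    ood_test = rng.normal(5.0, 1.0, (5, 128))      # shifted distribution
    scorer = fit_knn_scorer(id_train, k=10)
    print(ood_scores(scorer, id_test))             # smaller scores
    print(ood_scores(scorer, ood_test))            # larger scores -> flag as OOD
```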

Citations: 0
Methods for calculating gliding-box lacunarity efficiently on large datasets
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-13 | DOI: 10.1007/s10044-024-01332-6
Bálint Barna H. Kovács, Miklós Erdélyi

Lacunarity has proven to be a useful, multifaceted tool for image analysis in several different scientific fields, from geography to virology, which has lent increasing importance to the lacunarity analysis of large datasets. It can be most reliably calculated with the so-called gliding-box method, but the evaluation process can be exceedingly time-consuming and unviable as this algorithm is not designed to operate on large datasets. Here we introduce two novel methods that can calculate gliding-box lacunarity orders of magnitude faster than the original method without any loss of accuracy. We compare these methods with the original as well as with two already existing optimized methods based on runtime memory usage and complexity. The application of all five methods for both 2D and 3D datasets analysis confirms that each of the four optimized methods are orders of magnitude faster than the original one, but each has its advantages and limitations.
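
One standard way to make the gliding-box computation fast on large 2D arrays is to obtain all box masses from a summed-area table (cumulative sums), after which the lacunarity of box size r follows from the first and second moments of the masses. The sketch below illustrates only this general idea; it is not necessarily either of the two methods proposed in the paper.

```python
import numpy as np

def gliding_box_lacunarity(image, r):
    """Gliding-box lacunarity of a 2D array for box size r.

    Box masses for every r-by-r window are obtained from a summed-area table
    via inclusion-exclusion; lacunarity = E[S^2] / E[S]^2 over all masses S.
    """
    img = np.asarray(image, dtype=np.float64)
    # Summed-area table with a leading row/column of zeros
    sat = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    sat[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    # Mass of every r x r gliding box (all window positions at once)
    S = sat[r:, r:] - sat[:-r, r:] - sat[r:, :-r] + sat[:-r, :-r]
    return np.mean(S ** 2) / (np.mean(S) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    binary = (rng.random((512, 512)) < 0.2).astype(float)   # sparse test pattern
    for r in (2, 4, 8, 16):
        print(r, gliding_box_lacunarity(binary, r))
```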

Citations: 0
LDC-PP-YOLOE: a lightweight model for detecting and counting citrus fruit
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-13 | DOI: 10.1007/s10044-024-01329-1
Yibo Lv, Shenglian Lu, Xiaoyu Liu, Jiangchuan Bao, Binghao Liu, Ming Chen, Guo Li

In the citrus orchard environment, accurate counting of the fruit and the use of lightweight detection methods are the key presteps to automate citrus picking and yield estimation. Most high-precision fruit detection models based on deep learning use complex models on devices that require large amounts of computational resources and memory. Devices with limited resources cannot meet the requirements of these models. Thus, to overcome this problem, we focus on creating a lightweight model with a convolutional neural network. In this research, we propose a lightweight citrus detection model for mobile devices, LDC-PP-YOLOE. LDC-PP-YOLOE improves on PP-YOLOE by using localized knowledge distillation and CBAM, with a mAP@0.5 of 88%, a mAP@0.95 of 51.3%, 8 M parameters, and a speed of 0.34 s, respectively. The performance of LDC-PP-YOLOE was compared against commonly used detectors: LDC-PP-YOLOE's mAP@0.5 was 2.5%, 6.9% and 16.3%, and was 4.3% greater than Faster R-CNN, YOLOX-s and PicoDet-L, respectively. LDC-PP-YOLOE achieved an RMSE of 8.63 and an MSE of 5.27 compared to the ground truth on citrus applications. In addition, we used apple and passion fruit datasets to verify the generalization of the model; the mAP@0.5 is improved by 1% and 0.7%. LDC-PP-YOLOE can be used as a lightweight model to help growers track citrus populations and optimize citrus yields in complex citrus orchard environments with resource-limited equipment. It also provides a solution for lightweight models.
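
For context, the CBAM block named above (channel attention followed by spatial attention) can be sketched in PyTorch as follows; the reduction ratio and spatial kernel size are the commonly used defaults rather than values from the paper, and the distillation setup and PP-YOLOE integration are not shown.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""

    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):                              # x: (N, C, H, W)
        n, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))             # channel attention from avg pool
        mx = self.mlp(x.amax(dim=(2, 3)))              # ...and from max pool
        x = x * torch.sigmoid(avg + mx).view(n, c, 1, 1)
        spatial_in = torch.cat([x.mean(dim=1, keepdim=True),
                                x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(spatial_in))

if __name__ == "__main__":
    block = CBAM(64)
    feat = torch.randn(2, 64, 32, 32)
    print(block(feat).shape)                           # torch.Size([2, 64, 32, 32])
```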

Citations: 0
DN3MF: deep neural network for non-negative matrix factorization towards low rank approximation
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-11 | DOI: 10.1007/s10044-024-01335-3
Prasun Dutta, Rajat K. De

Dimension reduction is one of the most sought-after methodologies to deal with high-dimensional ever-expanding complex datasets. Non-negative matrix factorization (NMF) is one such technique for dimension reduction. Here, a multiple deconstruction multiple reconstruction deep learning model (DN3MF) for NMF targeted towards low rank approximation, has been developed. Non-negative input data has been processed using hierarchical learning to generate part-based sparse and meaningful representation. The novel design of DN3MF ensures the non-negativity requirement of the model. The use of Xavier initialization technique solves the exploding or vanishing gradient problem. The objective function of the model has been designed employing regularization, ensuring the best possible approximation of the input matrix. A novel adaptive learning mechanism has been developed to accomplish the objective of the model. The superior performance of the proposed model has been established by comparing the results obtained by the model with that of six other well-established dimension reduction algorithms on three well-known datasets in terms of preservation of the local structure of data in low rank embedding, and in the context of downstream analyses using classification and clustering. The statistical significance of the results has also been established. The outcome clearly demonstrates DN3MF’s superiority over compared dimension reduction approaches in terms of both statistical and intrinsic property preservation standards. The comparative analysis of all seven dimensionality reduction algorithms including DN3MF with respect to the computational complexity and a pictorial depiction of the convergence analysis for both stages of DN3MF have also been presented.
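
As a point of reference for the factorization problem DN3MF addresses, here is a minimal sketch of classical NMF with Lee-Seung multiplicative updates, which keeps the factors non-negative by construction; the hierarchical multiple-deconstruction/multiple-reconstruction network, Xavier initialization, regularized objective, and adaptive learning mechanism of DN3MF are not reproduced here.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Classical NMF: find non-negative W (n x r) and H (r x m) with X ~ W H,
    minimizing ||X - WH||_F^2 via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # update H, stays non-negative
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # update W, stays non-negative
    return W, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((100, 40))                    # non-negative data matrix
    W, H = nmf_multiplicative(X, rank=5)
    err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
    print(f"relative reconstruction error: {err:.3f}")
```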

Citations: 0
MFA U-Net: a U-Net like multi-stage feature analysis network for medical image segmentation
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-09 | DOI: 10.1007/s10044-024-01331-7
Yupeng Wang, Suyu Wang, Jian He

The U-Net and its extensions have achieved good success in medical image segmentation. However, fine-grained segmentation of objects at their fuzzy edges, which are commonly found in medical images, is still challenging. In this paper, we propose a U-Net-like Multi-Stage Feature Analysis Network (MFA U-Net) for medical image segmentation, which focuses on mining the reusability of the images and features from several perspectives. Firstly, a multi-channel dimensional feature extraction module is proposed, in which the input image is reused by multiple convolution branches with different channels to generate features that supplement the original U-shaped network. Next, a cascaded U-shaped network is designed for deeper feature mining and analysis, which enables progressive refinement of the features. In the neck of the cascaded network, a parallel hybrid convolution module is designed that concatenates several types of convolutional methods to enhance the semantic representation ability of the model. In short, by reusing the input images and detected features over several stages, more effective features are extracted and the segmentation performance is improved. The proposed algorithm was evaluated on three mainstream 2D color medical image segmentation datasets and achieves significant improvements over the traditional U-Net framework as well as its latest improved variants. Compared to the baseline network, it gains 0.93% (Dice) and 1.45% (IoU) on GlaS, 2.09% (Dice) and 2.87% (IoU) on MoNuSeg, and 0.17% (F1) and 1.72% (SE) on DRIVE.
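
The "parallel branches of different convolutions, concatenated" idea can be sketched as below; the choice of branches (pointwise, standard, dilated, depthwise-separable), channel counts, and kernel sizes are illustrative assumptions and not taken from the MFA U-Net paper.

```python
import torch
import torch.nn as nn

class ParallelHybridConv(nn.Module):
    """Apply several convolution variants to the same input in parallel,
    concatenate the results, and fuse them with a 1x1 convolution."""

    def __init__(self, in_ch, out_ch, branch_ch=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, 1),                          # pointwise
            nn.Conv2d(in_ch, branch_ch, 3, padding=1),               # standard 3x3
            nn.Conv2d(in_ch, branch_ch, 3, padding=2, dilation=2),   # dilated 3x3
            nn.Sequential(                                           # depthwise separable
                nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),
                nn.Conv2d(in_ch, branch_ch, 1),
            ),
        ])
        self.fuse = nn.Conv2d(4 * branch_ch, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.act(self.fuse(feats))

if __name__ == "__main__":
    block = ParallelHybridConv(in_ch=32, out_ch=64)
    x = torch.randn(1, 32, 128, 128)
    print(block(x).shape)      # torch.Size([1, 64, 128, 128])
```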

Citations: 0
A novel residual Fourier convolution model for brain tumor segmentation of MR images
IF 3.9 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-09 | DOI: 10.1007/s10044-024-01312-w
Haipeng Zhu, Hong He

Magnetic resonance imaging is an essential tool for the early diagnosis of brain tumors. However, segmenting brain tumors in magnetic resonance images is challenging due to the severe problems of blurred boundaries and variable spatial structure. Therefore, combining multiple brain datasets, a novel residual Fourier convolution model with local interpretability is presented in this study to address the problem mentioned above. Firstly, an interpretable residual Fourier convolution encoder is constructed from the Fourier transform and its inverse for fast extraction of the spectral features of the brain tumor regions. Furthermore, a dilated-gated attention mechanism is designed to expand the receptive fields and extract blurred, irregular boundary features that are closer to the lesion regions. Finally, an encoder-decoder spatial attention fusion mechanism is developed to further extract more fine-grained contextual spatial features from the variable spatial structure of adjacent magnetic resonance slices. Compared to other advanced models, our proposed model achieves state-of-the-art average segmentation performance in tests on the BraTS2019, Figshare, and TCIA datasets. The average Dice coefficient, sensitivity, MIoU, and PPV reach 0.892, 87.1%, 0.843, and 91.5%, respectively. The proposed segmentation framework can provide more reliable segmentation results for the early diagnosis of brain tumors because of its robust feature extraction ability, interpretability, and generalization ability.
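
To illustrate the kind of operation a residual Fourier convolution block performs, here is a generic sketch (in the spirit of fast Fourier convolutions, not the authors' implementation): features are taken to the frequency domain with an FFT, mixed by a 1x1 convolution acting on stacked real and imaginary channels, transformed back, and fused with a spatial path through a residual connection.

```python
import torch
import torch.nn as nn

class ResidualFourierConv(nn.Module):
    """Residual block whose main path mixes features in the Fourier domain."""

    def __init__(self, channels):
        super().__init__()
        # 1x1 conv applied to real and imaginary parts stacked along channels
        self.freq_conv = nn.Conv2d(2 * channels, 2 * channels, 1)
        self.spatial_conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                        # x: (N, C, H, W)
        n, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")  # (N, C, H, W//2+1), complex
        spec = torch.cat([spec.real, spec.imag], dim=1)
        spec = self.act(self.freq_conv(spec))
        real, imag = spec[:, :c], spec[:, c:]
        y = torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
        return self.act(x + y + self.spatial_conv(x))   # residual fusion

if __name__ == "__main__":
    block = ResidualFourierConv(channels=16)
    x = torch.randn(2, 16, 64, 64)
    print(block(x).shape)                        # torch.Size([2, 16, 64, 64])
```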

Citations: 0