Saliency information and mosaic based data augmentation method for densely occluded object recognition
Pub Date: 2024-03-29. DOI: 10.1007/s10044-024-01258-z
Ying Tong, Xiangfeng Luo, Liyan Ma, Shaorong Xie, Wenbin Yang, Yinsai Guo
Data augmentation methods are crucial for improving the accuracy of densely occluded object recognition in scenes where the quantity and diversity of training images are insufficient. However, current methods based on regional dropping and mixing strategies suffer from missing foreground objects and redundant background features, which degrades the recognition of densely occluded objects in classification and detection tasks. Herein, a saliency information and mosaic based data augmentation method for densely occluded object recognition is proposed, which uses saliency information as prior knowledge to supervise the mosaic process applied to training images containing densely occluded objects. The method further applies fogging and class label mixing to construct new augmented images, increasing the quantity and diversity of training images and thereby improving the accuracy of image classification and object recognition. Extensive experiments on different classification datasets with various CNN architectures demonstrate the effectiveness of the method.
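The abstract gives the pipeline only at a high level, so the following is a minimal sketch of how saliency-supervised mosaicking with fogging and label mixing might look; the spectral-residual saliency model (from opencv-contrib), the white-blend fogging, the 2x2 layout, and the equal 1/4 label weights are assumptions, not the authors' exact formulation.

```python
import numpy as np
import cv2  # the cv2.saliency module below requires the opencv-contrib build


def saliency_map(img):
    # Spectral-residual saliency as a stand-in for the paper's saliency prior.
    sal = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, smap = sal.computeSaliency(img)
    return smap.astype(np.float32) if ok else np.ones(img.shape[:2], np.float32)


def salient_crop(img, out_h, out_w, stride=16):
    # Choose the crop window with the largest summed saliency, so the salient
    # foreground objects are preserved instead of being dropped.
    smap = saliency_map(img)
    integ = cv2.integral(smap)  # (h+1, w+1) integral image
    h, w = smap.shape
    best, best_yx = -1.0, (0, 0)
    for y in range(0, h - out_h + 1, stride):
        for x in range(0, w - out_w + 1, stride):
            s = (integ[y + out_h, x + out_w] - integ[y, x + out_w]
                 - integ[y + out_h, x] + integ[y, x])
            if s > best:
                best, best_yx = s, (y, x)
    y, x = best_yx
    return img[y:y + out_h, x:x + out_w]


def fog(img, strength=0.3):
    # Simple fogging: blend the tile toward white (an assumed stand-in).
    return cv2.addWeighted(img, 1.0 - strength, np.full_like(img, 255), strength, 0)


def saliency_mosaic(images, labels, num_classes, size=448):
    # 2x2 mosaic of saliency-preserving, fogged crops; class labels are mixed
    # in proportion to tile area (equal quarters here).
    half = size // 2
    tiles = [fog(salient_crop(cv2.resize(im, (size, size)), half, half))
             for im in images[:4]]
    mosaic = np.vstack([np.hstack(tiles[:2]), np.hstack(tiles[2:])])
    mixed = np.zeros(num_classes, np.float32)
    for lbl in labels[:4]:
        mixed[lbl] += 0.25
    return mosaic, mixed
```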
{"title":"Saliency information and mosaic based data augmentation method for densely occluded object recognition","authors":"Ying Tong, Xiangfeng Luo, Liyan Ma, Shaorong Xie, Wenbin Yang, Yinsai Guo","doi":"10.1007/s10044-024-01258-z","DOIUrl":"https://doi.org/10.1007/s10044-024-01258-z","url":null,"abstract":"<p>Data augmentation methods are crucial to improve the accuracy of densely occluded object recognition in the scene where the quantity and diversity of training images are insufficient. However, the current methods that use regional dropping and mixing strategies suffer from the problem of missing foreground objects and redundant background features, which can lead to densely occluded object recognition issues in classification or detection tasks. Herein, saliency information and mosaic based data augmentation method for densely occluded object recognition is proposed, which utilizes saliency information as prior knowledge to supervise the mosaic process of training images containing densely occluded objects. And the method uses fogging processing and class label mixing to construct new augmented images, in order to improve the accuracy of image classification and object recognition tasks by augmenting the quantity and diversity of training images. Extensive experiments on different classification datasets with various CNN architectures prove the effectiveness of our method.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"43 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140322598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scene text detection using structured information and an end-to-end trainable generative adversarial networks
Pub Date: 2024-03-19. DOI: 10.1007/s10044-024-01259-y
Palanichamy Naveen, Mahmoud Hassaballah
Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust and potent text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module plays a pivotal role in generating diverse and variable text regions. Subsequently, the GAN module refines and enhances these regions, ensuring heightened realism and accuracy. The text detection module then identifies text regions in the input image by assigning confidence scores to each region. The entire network is trained by minimizing a joint loss function that encompasses the VAE loss, the GAN loss, and the text detection loss: the VAE loss ensures diversity in the generated text regions, the GAN loss guarantees realism and accuracy, and the text detection loss ensures high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets, including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text, demonstrates the superior performance of the proposed method compared to existing approaches.
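As a reading aid, here is a minimal PyTorch-style sketch of the joint objective described above; the concrete loss choices (MSE plus KL divergence for the VAE term, binary cross-entropy for the adversarial and detection terms) and the unit weights are assumptions, since the paper's exact formulation is not reproduced here.

```python
import torch
import torch.nn.functional as F


def vae_loss(recon, target, mu, logvar):
    # Standard VAE objective: reconstruction error plus KL divergence.
    rec = F.mse_loss(recon, target)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld


def gan_loss(d_real, d_fake):
    # Binary cross-entropy adversarial loss on discriminator logits (assumed form).
    real = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    fake = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return real + fake


def detection_loss(pred_scores, gt_labels):
    # Confidence-score supervision for candidate text regions (text vs. non-text).
    return F.binary_cross_entropy_with_logits(pred_scores, gt_labels)


def joint_loss(recon, target, mu, logvar, d_real, d_fake, pred_scores, gt_labels,
               w_vae=1.0, w_gan=1.0, w_det=1.0):
    # Weighted sum of the three terms; the weights are placeholders, not the paper's values.
    return (w_vae * vae_loss(recon, target, mu, logvar)
            + w_gan * gan_loss(d_real, d_fake)
            + w_det * detection_loss(pred_scores, gt_labels))
```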
{"title":"Scene text detection using structured information and an end-to-end trainable generative adversarial networks","authors":"Palanichamy Naveen, Mahmoud Hassaballah","doi":"10.1007/s10044-024-01259-y","DOIUrl":"https://doi.org/10.1007/s10044-024-01259-y","url":null,"abstract":"<p>Scene text detection poses a considerable challenge due to the diverse nature of text appearance, backgrounds, and orientations. Enhancing robustness, accuracy, and efficiency in this context is vital for several applications, such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of generative adversarial network (GAN) and network variational autoencoder (VAE) to create a robust and potent text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. In this framework, the VAE module plays a pivotal role in generating diverse and variable text regions. Subsequently, the GAN module refines and enhances these regions, ensuring heightened realism and accuracy. Then, the text detection module takes charge of identifying text regions in the input image via assigning confidence scores to each region. The comprehensive training of the entire network involves minimizing a joint loss function that encompasses the VAE loss, the GAN loss, and the text detection loss. The VAE loss ensures diversity in generated text regions and the GAN loss guarantees realism and accuracy, while the text detection loss ensures high-precision identification of text regions. The proposed method employs an encoder-decoder structure within the VAE module and a generator-discriminator structure in the GAN module. Rigorous testing on diverse datasets including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KIAST Scene Text demonstrates the superior performance of the proposed method compared to existing approaches.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"1 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140168470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DA-ResNet: dual-stream ResNet with attention mechanism for classroom video summary
Pub Date: 2024-03-14. DOI: 10.1007/s10044-024-01256-1
Yuxiang Wu, Xiaoyan Wang, Tianpan Chen, Yan Dou
It is important to generate video summaries that are both diverse and representative for massive video collections. In this paper, a convolutional neural network based on a dual-stream attention mechanism (DA-ResNet) is designed to obtain candidate summary sequences for classroom scenes. DA-ResNet constructs a dual-stream input of an image frame sequence and an optical flow frame sequence to enhance its representational ability, and embeds the attention mechanism into ResNet. The final video summary is then obtained by removing redundant frames with an improved hash clustering algorithm: preprocessing is performed first to reduce computational complexity, and hash clustering then retains the frame with the highest entropy value in each cluster while removing the other, similar frames. To verify its effectiveness in classroom scenes, we also created ClassVideo, a real dataset consisting of 45 videos from the normal teaching environment of our school. Experimental results show the competitiveness of the proposed method: DA-ResNet outperforms existing methods by about 8% in terms of the F-measure. The visual results also demonstrate its ability to produce classroom video summaries that are close to human preferences.
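A minimal sketch of the redundancy-removal step as described (hash clustering that keeps the highest-entropy frame per cluster) might look like the following; the average hash, the Hamming-distance threshold, and the greedy cluster assignment are assumptions rather than the authors' exact algorithm.

```python
import numpy as np
import cv2


def average_hash(frame, hash_size=8):
    # Average hash: downsample to an 8x8 grayscale patch and threshold at its mean,
    # giving a 64-bit boolean signature.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (hash_size, hash_size), interpolation=cv2.INTER_AREA)
    return (small > small.mean()).flatten()


def frame_entropy(frame):
    # Shannon entropy of the grayscale histogram, used to rank frames in a cluster.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    hist = np.bincount(gray.flatten(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def hash_cluster_summary(frames, threshold=10):
    # Greedy clustering: a frame joins the first cluster whose hash is within
    # `threshold` Hamming distance; otherwise it starts a new cluster. Only the
    # highest-entropy frame of each cluster is kept in the summary.
    clusters = []  # each entry: {"hash": ..., "best_frame": ..., "best_entropy": ...}
    for f in frames:
        h, e = average_hash(f), frame_entropy(f)
        for c in clusters:
            if np.count_nonzero(c["hash"] != h) <= threshold:
                if e > c["best_entropy"]:
                    c["best_frame"], c["best_entropy"] = f, e
                break
        else:
            clusters.append({"hash": h, "best_frame": f, "best_entropy": e})
    return [c["best_frame"] for c in clusters]
```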
{"title":"DA-ResNet: dual-stream ResNet with attention mechanism for classroom video summary","authors":"Yuxiang Wu, Xiaoyan Wang, Tianpan Chen, Yan Dou","doi":"10.1007/s10044-024-01256-1","DOIUrl":"https://doi.org/10.1007/s10044-024-01256-1","url":null,"abstract":"<p>It is important to generate both diverse and representative video summary for massive videos. In this paper, a convolution neural network based on dual-stream attention mechanism(DA-ResNet) is designed to obtain candidate summary sequences for classroom scenes. DA-ResNet constructs a dual stream input of image frame sequence and optical flow frame sequence to enhance the expression ability. The network also embeds the attention mechanism into ResNet. On the other hand, the final video summary is obtained by removing redundant frames with the improved hash clustering algorithm. In this process, preprocessing is performed first to reduce computational complexity. And then hash clustering is used to retain the frame with the highest entropy value in each class, removing other similar frames. To verify its effectiveness in classroom scenes, we also created ClassVideo, a real dataset consisting of 45 videos from the normal teaching environment of our school. The results of the experiments show the competitiveness of the proposed method DA-ResNet outperforms the existing methods by about 8% in terms of the F-measure. Besides, the visual results also demonstrate its ability to produce classroom video summaries that are very close to the human preferences.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":"20 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140155039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The article presents a novel methodology comprising an end-to-end workflow for processing Venus' visible images. The visible raw image is denoised using a Tri-State median filter with background dark subtraction and then enhanced using Contrast Limited Adaptive Histogram Equalization. A multi-modal image registration technique is developed using a Segmented Affine Scale Invariant Feature Transform and Motion Smoothness Constraint outlier removal for co-registration of the Venus visible and radar images. A novel image fusion algorithm using a guided filter is developed to merge the multi-modal visible-radar Venus image pair into a fused image. Venus visible image quality is assessed at each processing step, and the results are quantified and visualized. In addition, a fuzzy color-coded segmentation map is generated to retrieve crucial information about Venus' surface feature characteristics. The fused Venus image clearly demarcates planetary morphological features and is validated against the publicly available Venus radar nomenclature map.
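A rough sketch of such a processing chain using standard OpenCV operations is given below; the plain median filter, ORB-plus-RANSAC registration, and Laplacian-contrast fusion weights are simplified stand-ins for the paper's Tri-State median filter, Segmented Affine SIFT registration with Motion Smoothness Constraint outlier removal, and guided-filter fusion rule, and cv2.ximgproc requires the opencv-contrib build. Inputs are assumed to be single-channel 8-bit images.

```python
import numpy as np
import cv2  # cv2.ximgproc.guidedFilter needs the opencv-contrib build


def denoise(visible, dark_frame, ksize=3):
    # Background dark subtraction followed by a plain median filter
    # (a stand-in for the Tri-State median filter).
    return cv2.medianBlur(cv2.subtract(visible, dark_frame), ksize)


def enhance(img):
    # Contrast Limited Adaptive Histogram Equalization.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(img)


def register(moving, fixed):
    # ORB features + RANSAC homography as a simplified stand-in for the paper's
    # Segmented Affine SIFT registration with outlier removal.
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(moving, None)
    k2, d2 = orb.detectAndCompute(fixed, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return cv2.warpPerspective(moving, H, (fixed.shape[1], fixed.shape[0]))


def fuse(visible, radar, radius=8, eps=1e-3):
    # Guided-filter fusion: per-pixel weights come from guided-filtered local
    # contrast maps (one plausible guided-filter fusion scheme).
    def contrast(img):
        g = img.astype(np.float32)
        lap = np.abs(cv2.Laplacian(g, cv2.CV_32F))
        return cv2.ximgproc.guidedFilter(g, lap, radius, eps)

    wv, wr = contrast(visible), contrast(radar)
    w = wv / (wv + wr + 1e-6)
    return (w * visible + (1.0 - w) * radar).astype(np.uint8)
```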