
Pattern Recognition: Latest Articles

A wrapper feature selection approach using Markov blankets
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-05 | DOI: 10.1016/j.patcog.2024.111069
In feature selection, Markov Blanket (MB) based approaches have attracted considerable attention, with most MB discovery algorithms being categorized as filter-based techniques. Typically, the Conditional Independence (CI) test employed by such methods differs across data types. In this article, we propose a novel Markov Blanket based wrapper feature selection method. The proposed approach employs Predictive Permutation Independence (PPI), a novel CI test that allows it to work out-of-the-box for both classification and regression tasks on mixed data. PPI can work with any supervised algorithm to estimate the association of a feature with the target variable while also providing a measure of feature importance. The proposed approach also includes an optional MB aggregation step that can be used to find the optimal MB under non-faithful conditions. Our method outperforms other MB discovery methods, in terms of F1-score, by 7% on average over 3 large-scale BN datasets. It also outperforms state-of-the-art feature selection techniques on 13 real-world datasets.
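The abstract does not give the PPI formula, so the sketch below shows only the general flavour of a permutation-based association check: train a supervised model, then measure how much held-out accuracy drops when the candidate feature is shuffled. The function names, the random-forest choice, and the number of permutations are illustrative assumptions, not the authors' method.

```python
# Hypothetical permutation-based association check (illustrative, not the PPI statistic).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def permutation_association(X, y, feature_idx, n_permutations=30, random_state=0):
    """Return the mean drop in held-out accuracy when one feature is permuted."""
    rng = np.random.default_rng(random_state)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=random_state)
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_tr, y_tr)
    base_score = model.score(X_te, y_te)

    drops = []
    for _ in range(n_permutations):
        X_perm = X_te.copy()
        X_perm[:, feature_idx] = rng.permutation(X_perm[:, feature_idx])
        drops.append(base_score - model.score(X_perm, y_te))
    return float(np.mean(drops))  # near zero => the feature adds little predictive information

# Toy usage: feature 0 drives the label, feature 3 is pure noise.
X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] + 0.1 * np.random.default_rng(2).normal(size=200) > 0).astype(int)
print(permutation_association(X, y, feature_idx=0))  # clearly positive drop
print(permutation_association(X, y, feature_idx=3))  # roughly zero
```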
Citations: 0
Intuitive-K-prototypes: A mixed data clustering algorithm with intuitionistic distribution centroid
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-05 | DOI: 10.1016/j.patcog.2024.111062
Real-world data sets usually mix numerical and categorical attributes, so clustering methods that handle mixed data directly are of practical value. This paper proposes an Intuitive-K-prototypes clustering algorithm with improved prototype representation and attribute weights. The proposed algorithm defines an intuitionistic distribution centroid for categorical attributes. In our approach, a heuristic search for initial prototypes is performed. Then, we combine the mean of the numerical attributes with the intuitionistic distribution centroid to represent the cluster prototype. In addition, intra-cluster complexity and inter-cluster similarity are used to adjust attribute weights, with higher priority given to attributes with lower complexity and similarity. The membership and non-membership distances are calculated using the intuitionistic distribution centroid, and these distances are then combined parametrically to obtain a composite distance. The algorithm's clustering effectiveness is evaluated on real UCI data sets, and the results show that it outperforms traditional clustering algorithms in most cases.
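As a rough illustration of a mixed-attribute distance of this kind, the sketch below combines a numeric distance to the cluster mean with a categorical distance to a frequency-based distribution centroid, weighted by a parameter gamma. The intuitionistic membership/non-membership machinery of the paper is not reproduced; the centroid form and gamma are assumptions made for the sketch.

```python
# Minimal k-prototypes-style mixed distance with a frequency ("distribution") centroid.
import numpy as np

def cluster_prototype(num_part, cat_part):
    """Prototype = mean of numeric columns + per-column category frequency tables."""
    num_center = num_part.mean(axis=0)
    cat_center = [{c: np.mean(col == c) for c in np.unique(col)} for col in cat_part.T]
    return num_center, cat_center

def mixed_distance(x_num, x_cat, prototype, gamma=0.5):
    """Squared Euclidean distance on the numeric part plus (1 - in-cluster frequency of the
    point's category) on the categorical part, combined by the weight gamma (an assumption)."""
    num_center, cat_center = prototype
    d_num = np.sum((x_num - num_center) ** 2)
    d_cat = sum(1.0 - freq.get(v, 0.0) for freq, v in zip(cat_center, x_cat))
    return d_num + gamma * d_cat

# Tiny example: two numeric columns, one categorical column, one "cluster" of three points.
num = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 5.5]])
cat = np.array([["a"], ["a"], ["b"]])
proto = cluster_prototype(num, cat)
print(mixed_distance(np.array([1.1, 2.1]), np.array(["a"]), proto))
```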
Citations: 0
RP-Net: A Robust Polar Transformation Network for rotation-invariant face detection
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-04 | DOI: 10.1016/j.patcog.2024.111044
Face detection in unconstrained environments is difficult due to variations in orientation, pose, and occlusion. Deep convolutional neural networks, particularly cascaded ones, have greatly improved detection performance but still struggle with rotated faces because of limitations of the Cartesian coordinate system. Although data augmentation can mitigate this issue, it also increases computational demands. This paper introduces the Robust Polar Transformation Network (RP-Net) for rotation-invariant face detection. RP-Net converts the complex rotational problem into a simpler translational one to enhance feature extraction and computational efficiency. Additionally, the Advanced Spatial-Channel Restoration (ASCR) module optimizes facial landmark detection within the polar domain and restores critical details lost during the transformation. Experimental results on benchmark datasets show that RP-Net significantly improves rotation invariance over traditional CNNs and surpasses several state-of-the-art rotation-invariant face detection methods.
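The core trick of working in polar coordinates can be illustrated independently of RP-Net: after resampling an image onto an (r, theta) grid, a rotation about the image centre becomes a circular shift along the angular axis, which translation-friendly convolutions handle more naturally. The resampling below is a simple nearest-neighbour sketch with hand-picked grid sizes, not the network's transformation layer.

```python
# Cartesian-to-polar resampling: rotation about the centre becomes an angular shift.
import numpy as np

def to_polar(img, n_radii=64, n_angles=128):
    """Nearest-neighbour resampling of a square grayscale image onto an (r, theta) grid."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radii = np.linspace(0, min(cy, cx), n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(aa)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(aa)).astype(int), 0, w - 1)
    return img[ys, xs]  # shape (n_radii, n_angles)

# A 90-degree rotation of the input corresponds to shifting the polar image by a quarter
# of the angular axis (up to resampling error).
img = np.zeros((65, 65)); img[10:20, 30:35] = 1.0
polar = to_polar(img)
polar_rot = to_polar(np.rot90(img))
shift = polar.shape[1] // 4
print(np.abs(np.roll(polar, -shift, axis=1) - polar_rot).mean())  # approximately 0
```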
Citations: 0
CRCGAN: Toward robust feature extraction in finger vein recognition
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-04 | DOI: 10.1016/j.patcog.2024.111064
Deep convolutional neural networks (CNNs) have produced remarkable results in finger vein recognition. However, these networks often overfit label information, losing essential image features, and they are sensitive to noise, with minor input changes leading to incorrect recognition. To address the above problems, this paper presents a new classification reconstruction cycle generative adversarial network (CRCGAN) for finger vein recognition. CRCGAN comprises a feature generator, a feature discriminator, an image generator, and an image discriminator, which are designed for robust feature extraction. Concretely, the feature generator extracts features for classification, while the image generator reconstructs images from these features. Two discriminators provide feedback that guides the generators to improve the quality of the generated data. With this design of bi-directional image-to-feature mapping and cyclic adversarial training, CRCGAN extracts essential features and minimizes overfitting. Moreover, precisely because it extracts essential features, CRCGAN is insensitive to noise. Experimental results on three public databases, THU-FVFDT2, HKPU, and USM, demonstrate CRCGAN's competitive performance and strong noise resistance, achieving recognition accuracies of 98.36%, 99.17%, and 99.49% respectively, with less than 0.5% degradation on the HKPU and USM databases under noisy conditions.
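The pairing of a feature extractor with an image reconstructor, so that features must both classify well and retain enough information to rebuild the input, can be sketched in a few lines. The toy modules, loss weights, and omission of both discriminators below are all simplifying assumptions; this is not the CRCGAN architecture itself.

```python
# Toy PyTorch sketch: classification loss plus reconstruction loss on shared features.
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))
    def forward(self, x):
        return self.net(x)

class ImageGenerator(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 32 * 32))
    def forward(self, z):
        return self.net(z).view(-1, 1, 32, 32)

feat_gen, img_gen, classifier = FeatureGenerator(), ImageGenerator(), nn.Linear(64, 10)
params = list(feat_gen.parameters()) + list(img_gen.parameters()) + list(classifier.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(8, 1, 32, 32)     # dummy batch standing in for vein images
y = torch.randint(0, 10, (8,))   # dummy identity labels

feats = feat_gen(x)
loss = nn.functional.cross_entropy(classifier(feats), y) \
     + 1.0 * nn.functional.l1_loss(img_gen(feats), x)   # reconstruction keeps image detail
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```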
Citations: 0
Semantic aware representation learning for optimizing image retrieval systems in radiology
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-01 | DOI: 10.1016/j.patcog.2024.111060
Content-based image retrieval (CBIR), which ranks a set of images with respect to a query image based on visual similarity, can assist diagnostic radiologists in assessing medical images by identifying similar digital images in large image databases. Despite the many recent advances and innovations in CBIR for general images, its adoption in radiology has been slow and limited. In this paper we attempt to close the gap between the two domains and adapt modern CBIR techniques to radiology images: by extending the latest representation learning techniques so that they overcome the unique challenges of radiology while exploiting its specific opportunities, we obtain novel and effective medical image retrieval methods. Our method achieves the highest CUI@5 scores (18.48 and 15.95) on two widely used datasets (ROCO and MEDICAT respectively), showcasing its superiority over state-of-the-art alternatives.
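For readers unfamiliar with the CBIR ranking step that the paper builds on, the sketch below ranks database embeddings by cosine similarity to a query embedding. The random vectors stand in for learned representations; nothing here reflects the paper's semantic-aware training.

```python
# Bare-bones CBIR ranking: cosine similarity between a query embedding and a database.
import numpy as np

def rank_by_similarity(query_vec, db_vecs, top_k=5):
    q = query_vec / np.linalg.norm(query_vec)
    d = db_vecs / np.linalg.norm(db_vecs, axis=1, keepdims=True)
    sims = d @ q
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 128))                 # embeddings of 1000 database images
query = db[42] + 0.05 * rng.normal(size=128)      # a query close to image 42
print(rank_by_similarity(query, db)[0])           # image 42 should rank first
```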
Citations: 0
An interpretable unsupervised capsule network via comprehensive contrastive learning and two-stage training
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-30 | DOI: 10.1016/j.patcog.2024.111059
Limited attention has been given to unsupervised capsule networks (CapsNets) with contrastive learning, owing to the challenge of jointly learning interpretable primary and high-level capsules. To address this issue, we focus on three aspects: the loss function, the routing algorithm, and the training strategy. First, we propose a comprehensive contrastive loss to ensure consistency in learning both high-level and primary capsules across different objects. Next, we introduce an agreement-based routing mechanism for the activation of high-level capsules. Finally, we present a two-stage training strategy to resolve conflicts between the multiple losses. Ablation experiments show that each of these components improves model performance. Results from linear evaluation and semi-supervised learning demonstrate that our model outperforms other CapsNets and convolutional neural networks in learning high-level capsules. Additionally, visualizing the capsules provides insights into the primary capsules, which remain consistent across images and align with human vision.
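The paper's comprehensive contrastive loss is not spelled out in the abstract; as background, the sketch below shows a generic InfoNCE/NT-Xent contrastive loss between two augmented views' embeddings, which is the standard ingredient such losses build on. The temperature and embedding sizes are arbitrary choices for the sketch.

```python
# Generic NT-Xent contrastive loss between two views of the same batch of objects.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, D) embeddings of two augmented views of the same N objects."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-norm rows
    sim = z @ z.t() / temperature                         # (2N, 2N) similarity logits
    n = z1.shape[0]
    sim.fill_diagonal_(float("-inf"))                     # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])  # positive pairs
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(16, 32), torch.randn(16, 32)
print(float(nt_xent(z1, z2)))
```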
Citations: 0
Image dehazing via self-supervised depth guidance
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-30 | DOI: 10.1016/j.patcog.2024.111051
Self-supervised learning methods have demonstrated promising benefits for feature representation learning in image dehazing, especially by avoiding the laborious collection of hazy-clean image pairs while also improving the generalization ability of the model. Despite long-standing interest in depth estimation for image dehazing, few works have fully explored the interactions between depth and dehazing in an unsupervised manner. In this paper, a self-supervised image dehazing framework guided by self-supervised depth estimation is proposed to fully exploit the interactions between depth and haze. Specifically, the hazy image and the corresponding depth estimate are generated and optimized from the clear image in a dual-network self-supervised manner. The correlations between depth and hazy images are exploited in depth-guided hybrid attention Transformer blocks, which adaptively leverage both cross-attention and self-attention to model haze densities via cross-modality fusion and to capture global context information for better feature representations. In addition, the depth estimates of hazy images are further exploited for detection tasks on hazy images. Extensive experiments demonstrate that the depth estimation not only enhances generalization across different dehazing datasets, leading to state-of-the-art self-supervised dehazing performance, but also benefits downstream detection tasks on hazy images. Our code is available at https://github.com/DongLiangSXU/Depth-Guidance-dehazing.git.
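The coupling between depth and haze that the framework exploits is conventionally described by the atmospheric scattering model I = J * t + A * (1 - t), with transmission t = exp(-beta * depth). The sketch below synthesises a hazy image from a clear image and a depth map under that standard model; the beta and airlight values are arbitrary, and this is background rather than the paper's network.

```python
# Standard atmospheric scattering model: haze increases with scene depth.
import numpy as np

def synthesize_haze(clear, depth, beta=1.0, airlight=0.9):
    """clear: (H, W, 3) image in [0, 1]; depth: (H, W) relative depth map."""
    t = np.exp(-beta * depth)[..., None]        # transmission falls off with depth
    return clear * t + airlight * (1.0 - t)     # blend toward the airlight as t -> 0

clear = np.random.default_rng(0).random((4, 4, 3))
depth = np.linspace(0.0, 2.0, 16).reshape(4, 4)  # farther pixels get hazier
print(synthesize_haze(clear, depth).shape)
```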
Citations: 0
IIS-FVIQA: Finger Vein Image Quality Assessment with intra-class and inter-class similarity
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-29 | DOI: 10.1016/j.patcog.2024.111056
In recent years, Finger Vein Image Quality Assessment (FVIQA) has been recognized as an effective solution to the problem of erroneous recognition caused by low-quality finger vein images with false or missing information, and it has become an important part of finger vein recognition systems. Compared to traditional FVIQA methods that rely on domain knowledge, newer methods that reject low-quality images have been favored for their independence from human intervention. However, these methods only consider intra-class similarity information and ignore valuable information from the inter-class distribution, which is also an important factor in evaluating the performance of recognition systems. In this work, we propose a novel FVIQA approach, named IIS-FVIQA, which concurrently takes into account the intra-class similarity density and the inter-class similarity distribution distance within recognition systems. Specifically, our method generates quality scores for finger vein images by combining the information entropy of the intra-class similarity distribution and the Wasserstein distance of the inter-class distribution. We then train a regression network for quality prediction using training images and their corresponding quality scores. When a new image enters the recognition system, the trained regression network directly predicts its quality score, making it easy for the system to select the appropriate operation based on that score. Extensive experiments on benchmark datasets demonstrate that the proposed IIS-FVIQA method consistently achieves top performance across multiple public datasets. After filtering out 10% of the low-quality images predicted by the quality regression network, the recognition system's performance improves by 43.96% (SDUMLA), 32.23% (MMCBNU_6000), and 21.20% (FV-USM), respectively. Furthermore, the method exhibits strong generalizability across different recognition algorithms (e.g., LBP, MC, and Inception V3) and datasets (e.g., SDUMLA, MMCBNU_6000, and FV-USM).
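A hedged sketch of the two ingredients of the score is given below: the entropy of an image's intra-class similarity distribution and the Wasserstein distance between its intra-class and inter-class similarity distributions. The binning, the weighting parameter alpha, and the sign convention are illustrative guesses, not the paper's exact formulation.

```python
# Illustrative quality score combining similarity-distribution entropy and Wasserstein distance.
import numpy as np
from scipy.stats import entropy, wasserstein_distance

def quality_score(intra_sims, inter_sims, alpha=0.5, n_bins=20):
    """intra_sims: similarities to same-finger images; inter_sims: to other fingers."""
    hist, _ = np.histogram(intra_sims, bins=n_bins, range=(0.0, 1.0), density=True)
    h = entropy(hist + 1e-12)                         # spread of intra-class similarities
    w = wasserstein_distance(intra_sims, inter_sims)  # separation of the two distributions
    return alpha * w - (1 - alpha) * h                # higher = tighter class, better separated

rng = np.random.default_rng(0)
good = quality_score(rng.normal(0.85, 0.03, 200).clip(0, 1), rng.normal(0.40, 0.10, 200).clip(0, 1))
bad = quality_score(rng.normal(0.60, 0.15, 200).clip(0, 1), rng.normal(0.45, 0.12, 200).clip(0, 1))
print(good > bad)  # the cleaner image should score higher
```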
Citations: 0
Efficient time series adaptive representation learning via Dynamic Routing Sparse Attention
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-28 | DOI: 10.1016/j.patcog.2024.111058
Time series prediction plays a crucial role in many fields but also faces significant challenges. Converting the original 1D time series into 2D data through a dimension transformation allows more hidden features to be captured, but it incurs high memory consumption and low time efficiency. We design a sparse attention mechanism with dynamic routing perception, called Dynamic Routing Sparse Attention (DRSA), to address these issues. Specifically, DRSA can effectively handle variations in complex time series data. Meanwhile, under memory constraints, the Dynamic Routing Filter (DRF) module further refines the representation by filtering the blocked 2D time series data to identify the most relevant feature vectors in the local context. We conducted predictive experiments on six real-world time series datasets with fine granularity and long sequence dependencies. Compared to eight state-of-the-art (SOTA) models, DRSA demonstrated relative improvements ranging from 4.18% to 81.02%. Furthermore, its time efficiency is 2 to 5 times higher than the baseline. Our code and dataset will be available at https://github.com/wwy8/DRSA_main.
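The 1D-to-2D transformation mentioned above can be as simple as folding a univariate series by a period, so that one axis indexes cycles and the other indexes positions within a cycle; attention can then attend along both axes. The sketch below does exactly that with a hand-chosen period, whereas the paper's routing mechanism selects relevant blocks adaptively.

```python
# Fold a 1D series into a (cycles x period) grid so periodic structure becomes a 2D pattern.
import numpy as np

def fold_series(x, period):
    """Reshape a 1D series into (n_periods, period), truncating the ragged tail."""
    n = (len(x) // period) * period
    return x[:n].reshape(-1, period)

t = np.arange(0, 400)
series = np.sin(2 * np.pi * t / 50) + 0.1 * np.random.default_rng(0).normal(size=t.size)
grid = fold_series(series, period=50)   # shape (8, 50): rows are repeated cycles
print(grid.shape, np.corrcoef(grid[0], grid[1])[0, 1])  # consecutive cycles are highly correlated
```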
Citations: 0
SLAM2: Simultaneous Localization and Multimode Mapping for indoor dynamic environments
IF 7.5 | Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-28 | DOI: 10.1016/j.patcog.2024.111054
Traditional visual Simultaneous Localization and Mapping (SLAM) methods based on point features are often limited by strong static-scene assumptions and their reliance on texture information, resulting in inaccurate camera pose estimation and object localization. To address these challenges, we present SLAM2, a novel semantic RGB-D SLAM system that obtains accurate estimates of the camera pose and the 6DOF poses of other objects, producing complete and clean static 3D maps in dynamic environments. Our system makes full use of point, line, and plane features in space to enhance camera pose estimation accuracy. It combines a traditional geometric method with a deep learning method to detect both known and unknown dynamic objects in the scene. Moreover, our system offers three mapping modes (dense, semi-dense, and sparse), selectable according to the needs of different tasks. This makes our visual SLAM system applicable to diverse application areas. Evaluation on the TUM RGB-D and Bonn RGB-D datasets demonstrates that, compared to state-of-the-art methods, our SLAM system achieves the most accurate localization and the cleanest static 3D maps of the scene in dynamic environments. Specifically, our system achieves a root mean square error (RMSE) of 0.018 m on the highly dynamic TUM w/half sequence, outperforming ORB-SLAM3 (0.231 m) and DRG-SLAM (0.025 m). On the Bonn dataset, our system delivers superior performance on 14 out of 18 sequences, with an average RMSE reduction of 27.3% compared to the next best method.
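For context on the reported numbers, trajectory accuracy of this kind is usually measured as the absolute trajectory error RMSE: the root mean square of per-pose position errors after aligning the estimated trajectory to ground truth. The sketch below skips the rigid alignment step, which is a simplification; benchmark tools such as the TUM evaluation scripts perform that alignment first.

```python
# Absolute trajectory error RMSE over corresponding camera positions (alignment omitted).
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: (N, 3) arrays of corresponding camera positions in metres."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

gt = np.cumsum(np.random.default_rng(0).normal(size=(100, 3)) * 0.01, axis=0)
est = gt + np.random.default_rng(1).normal(scale=0.02, size=gt.shape)
print(round(ate_rmse(est, gt), 3))  # on the order of the injected 0.02 m noise
```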
Citations: 0