A wrapper feature selection approach using Markov blankets
Pub Date: 2024-10-05, DOI: 10.1016/j.patcog.2024.111069
In feature selection, Markov Blanket (MB) based approaches have attracted considerable attention, with most MB discovery algorithms being categorized as filter-based techniques. Typically, the Conditional Independence (CI) test employed by such methods differs across data types. In this article, we propose a novel Markov Blanket based wrapper feature selection method. The proposed approach employs Predictive Permutation Independence (PPI), a novel CI test that allows it to work out-of-the-box for both classification and regression tasks on mixed data. PPI can work with any supervised algorithm to estimate the association of a feature with the target variable while also providing a measure of feature importance. The proposed approach also includes an optional MB aggregation step that can be used to find the optimal MB under non-faithful conditions. Our method outperforms other MB discovery methods, in terms of F1-score, by 7% on average over 3 large-scale BN datasets. It also outperforms state-of-the-art feature selection techniques on 13 real-world datasets.
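The abstract does not spell out how PPI is computed; as a rough illustration of the general idea of a permutation-based conditional independence test built on an arbitrary supervised learner, the sketch below (hypothetical helper `permutation_ci_test`, scikit-learn assumed) compares a model's cross-validated score using the candidate feature against scores obtained after permuting that feature.

```python
# Illustrative sketch only, not the paper's exact PPI test: a generic
# permutation-based conditional-independence check on top of any supervised learner.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def permutation_ci_test(X_cond, x_feat, y, n_perm=30, model=None, seed=0):
    """Empirical p-value for 'x_feat is independent of y given X_cond'.

    X_cond: (n, k) conditioning features; x_feat: (n,) candidate feature; y: (n,) target.
    A large p-value suggests the feature adds no predictive value beyond X_cond.
    """
    rng = np.random.default_rng(seed)
    if model is None:
        model = RandomForestRegressor(n_estimators=50, random_state=seed)

    def cv_score(feature):
        Z = np.column_stack([X_cond, feature])           # conditioning set plus candidate
        return cross_val_score(model, Z, y, cv=3).mean()

    observed = cv_score(x_feat)                           # score with the real feature
    null = [cv_score(rng.permutation(x_feat)) for _ in range(n_perm)]
    return float(np.mean([s >= observed for s in null]))  # fraction of permutations doing as well
```

A wrapper selector could call such a test repeatedly while growing or shrinking a candidate Markov blanket; the per-feature score drop doubles as a feature-importance measure, mirroring the dual role described for PPI.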
{"title":"A wrapper feature selection approach using Markov blankets","authors":"","doi":"10.1016/j.patcog.2024.111069","DOIUrl":"10.1016/j.patcog.2024.111069","url":null,"abstract":"<div><div>In feature selection, Markov Blanket (MB) based approaches have attracted considerable attention with most MB discovery algorithms being categorized as filter based techniques. Typically, the Conditional Independence (CI) test employed by such methods is different for different data types. In this article, we propose a novel Markov Blanket based wrapper feature selection method. The proposed approach employs Predictive Permutation Independence (PPI), a novel Conditional Independence (CI) test that allows it to work out-of-the-box for both classification and regression tasks on mixed data. PPI can work with any supervised algorithm to estimate the association of a feature with the target variable while also providing a measure of feature importance. The proposed approach also includes an optional MB aggregation step that can be used to find the optimal MB under non-faithful conditions. Our method<span><span><sup>1</sup></span></span> outperforms other MB discovery methods, in terms of F1-score, by 7% on average, over 3 large-scale BN datasets. It also outperforms state-of-the-art feature selection techniques on 13 real-world datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intuitive-K-prototypes: A mixed data clustering algorithm with intuitionistic distribution centroid
Pub Date: 2024-10-05, DOI: 10.1016/j.patcog.2024.111062
Real-world data sets usually contain a mix of numerical and categorical attributes, which makes mining mixed data an important problem. This paper proposes an Intuitive-K-prototypes clustering algorithm with improved prototype representation and attribute weights. The proposed algorithm defines an intuitionistic distribution centroid for categorical attributes. In our approach, a heuristic search for initial prototypes is performed. Then, we combine the mean of the numerical attributes and the intuitionistic distribution centroid to represent the cluster prototype. In addition, intra-cluster complexity and inter-cluster similarity are used to adjust attribute weights, with higher priority given to attributes with lower complexity and similarity. The membership and non-membership distances are calculated using the intuitionistic distribution centroid and then combined parametrically to obtain a composite distance. The algorithm's clustering effectiveness is evaluated on real-world UCI data sets, and the results show that it outperforms traditional clustering algorithms in most cases.
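For context, the classic k-prototypes dissimilarity that this algorithm builds on combines a numerical term and a categorical term; the sketch below shows only that baseline measure (the intuitionistic distribution centroid and the adaptive attribute weights described above are not reproduced, and `gamma` is an assumed balancing parameter).

```python
# Minimal sketch of the classic k-prototypes mixed dissimilarity, not the
# paper's intuitionistic variant.
import numpy as np

def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Numerical part: squared Euclidean; categorical part: simple mismatch count."""
    num_part = np.sum((x_num - proto_num) ** 2)
    cat_part = np.sum(x_cat != proto_cat)
    return num_part + gamma * cat_part

# Example: one mixed-type point compared against a cluster prototype.
x_num, x_cat = np.array([1.2, 0.5]), np.array(["red", "A"])
p_num, p_cat = np.array([1.0, 0.0]), np.array(["blue", "A"])
print(mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma=0.5))
```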
{"title":"Intuitive-K-prototypes: A mixed data clustering algorithm with intuitionistic distribution centroid","authors":"","doi":"10.1016/j.patcog.2024.111062","DOIUrl":"10.1016/j.patcog.2024.111062","url":null,"abstract":"<div><div>Data sets are usually mixed with numerical and categorical attributes in the real world. Data mining of mixed data makes a lot of sense. This paper proposes an Intuitive-K-prototypes clustering algorithm with improved prototype representation and attribute weights. The proposed algorithm defines intuitionistic distribution centroid for categorical attributes. In our approach, a heuristic search for initial prototypes is performed. Then, we combine the mean of numerical attributes and intuitionistic distribution centroid to represent the cluster prototype. In addition, intra-cluster complexity and inter-cluster similarity are used to adjust attribute weights, with higher priority given to those with lower complexity and similarity. The membership and non-membership distance are calculated using the intuitionistic distribution centroid. These distances are then combined parametrically to obtain the composite distance. The algorithm is judged for its clustering effectiveness on the real UCI data set, and the results show that the proposed algorithm outperforms the traditional clustering algorithm in most cases.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142425149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RP-Net: A Robust Polar Transformation Network for rotation-invariant face detection
Pub Date: 2024-10-04, DOI: 10.1016/j.patcog.2024.111044
Face detection in unconstrained environments is challenging due to factors such as orientation, pose, and occlusion. Deep convolutional neural networks, particularly cascaded ones, have greatly improved detection performance but still struggle with rotated objects due to limitations of the Cartesian coordinate system. Although data augmentation can mitigate this issue, it also increases computational demands. This paper introduces the Robust Polar Transformation Network (RP-Net) for rotation-invariant face detection. RP-Net converts the complex rotational problem into a simpler translational one to enhance feature extraction and computational efficiency. Additionally, the Advanced Spatial-Channel Restoration (ASCR) module optimizes facial landmark detection within the polar domain and restores critical details lost during transformation. Experimental results on benchmark datasets show that RP-Net significantly improves rotation invariance over traditional CNNs and surpasses several state-of-the-art rotation-invariant face detection methods.
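The key trick, resampling the image on a polar grid so that rotation about the image centre becomes translation along the angle axis, can be sketched in a few lines of NumPy (nearest-neighbour sampling; `to_polar` is a hypothetical helper, not RP-Net's actual layer).

```python
# Polar resampling: a rotation of the input about the centre corresponds
# (approximately) to a circular shift of the output along its theta axis.
import numpy as np

def to_polar(img, n_r=64, n_theta=128):
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.linspace(0, min(cy, cx), n_r)                     # radial samples
    t = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)   # angular samples
    rr, tt = np.meshgrid(r, t, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return img[ys, xs]                                        # shape (n_r, n_theta)

img = np.random.rand(128, 128)
polar = to_polar(img)   # a CNN now sees in-plane rotation as a simple translation
```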
{"title":"RP-Net: A Robust Polar Transformation Network for rotation-invariant face detection","authors":"","doi":"10.1016/j.patcog.2024.111044","DOIUrl":"10.1016/j.patcog.2024.111044","url":null,"abstract":"<div><div>Face detection is challenging in unconstrained environments, where it encounters various challenges such as orientation, pose, and occlusion. Deep convolutional neural networks, particularly cascaded ones, have greatly improved detection performance but still struggle with rotating objects due to limitations in the Cartesian coordinate system. Although data augmentation can mitigate this issue, it also increases computational demands. This paper introduces the Robust Polar Transformation Network (RP-Net) for rotation-invariant face detection. RP-Net converts the complex rotational problem into a simpler translational one to enhance feature extraction and computational efficiency. Additionally, the Advanced Spatial-Channel Restoration (ASCR) module optimizes facial landmark detection within polar domains and restores critical details lost during transformation. Experimental results on benchmark datasets show that RP-Net significantly improves rotation invariance over traditional CNNs and surpasses several state-of-the-art rotation-invariant face detection methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142425144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CRCGAN: Toward robust feature extraction in finger vein recognition
Pub Date: 2024-10-04, DOI: 10.1016/j.patcog.2024.111064
Deep convolutional neural networks (CNNs) have produced remarkable outcomes in finger vein recognition. However, these networks often overfit label information, losing essential image features, and are sensitive to noise, with minor input changes leading to incorrect recognition. To address these problems, this paper presents a new classification reconstruction cycle generative adversarial network (CRCGAN) for finger vein recognition. CRCGAN comprises a feature generator, a feature discriminator, an image generator, and an image discriminator, which are designed for robust feature extraction. Concretely, the feature generator extracts features for classification, while the image generator reconstructs images from these features. Two discriminators provide feedback, guiding the generators to improve the quality of the generated data. With this design of bi-directional image-to-feature mapping and cyclic adversarial training, CRCGAN extracts essential features and minimizes overfitting. Precisely because it extracts essential features, CRCGAN is also insensitive to noise. Experimental results on three public databases, THU-FVFDT2, HKPU, and USM, demonstrate CRCGAN's competitive performance and strong noise resistance, achieving recognition accuracies of 98.36%, 99.17%, and 99.49%, respectively, with less than 0.5% degradation on the HKPU and USM databases under noisy conditions.
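As a rough structural sketch of the four components and the loss terms named above, the PyTorch snippet below wires a feature generator, image generator, classification head, and discriminators together; the layer sizes, loss weights, and simple MLP architectures are placeholders, not the paper's configuration.

```python
# Structural sketch of the CRCGAN-style components and losses (illustrative only).
import torch
import torch.nn as nn

feat_dim, img_dim, n_classes = 128, 64 * 64, 100

feature_generator = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
image_generator   = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, img_dim))
feature_discriminator = nn.Sequential(nn.Linear(feat_dim, 1))   # real vs. generated features
image_discriminator   = nn.Sequential(nn.Linear(img_dim, 1))    # real vs. reconstructed images
classifier = nn.Linear(feat_dim, n_classes)                     # identity prediction head

x = torch.randn(8, img_dim)                   # a batch of flattened vein images
z = feature_generator(x)                      # image -> feature
x_rec = image_generator(z)                    # feature -> reconstructed image

cls_loss = nn.CrossEntropyLoss()(classifier(z), torch.randint(0, n_classes, (8,)))
rec_loss = nn.L1Loss()(x_rec, x)              # cycle-style reconstruction constraint
adv_loss = nn.BCEWithLogitsLoss()(image_discriminator(x_rec), torch.ones(8, 1))
total = cls_loss + rec_loss + 0.1 * adv_loss  # loss weights are illustrative only
```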
{"title":"CRCGAN: Toward robust feature extraction in finger vein recognition","authors":"","doi":"10.1016/j.patcog.2024.111064","DOIUrl":"10.1016/j.patcog.2024.111064","url":null,"abstract":"<div><div>Deep convolutional neural networks (CNNs) have produced remarkable outcomes in finger vein recognition. However, these networks often overfit label information, losing essential image features, and are sensitive to noise, with minor input changes leading to incorrect recognition. To address above problems, this paper presents a new classification reconstruction cycle generative adversarial network (CRCGAN) for finger vein recognition. CRCGAN comprises a feature generator, a feature discriminator, an image generator, and an image discriminator, which are designed for robust feature extraction. Concretely, the feature generator extracts features for classification, while the image generator reconstructs images from these features. Two discriminators provide feedback, guiding the generators to improve the quality of generated data. With this design of bi-directional image-to-feature mapping and cyclic adversarial training, CRCGAN achieves the extraction of essential features and minimizes overfitting. Additionally, precisely due to the extraction of essential features, CRCGAN is not sensitive to noise. Experimental results on three public databases, including THU-FVFDT2, HKPU, and USM, demonstrate CRCGAN’s competitive performance and strong noise resistance, achieving recognition accuracies of 98.36%, 99.17% and 99.49% respectively, with less than 0.5% degradation on HKPU and USM databases under noisy conditions.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142425143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic aware representation learning for optimizing image retrieval systems in radiology
Pub Date: 2024-10-01, DOI: 10.1016/j.patcog.2024.111060
Content-based image retrieval (CBIR), which ranks a set of images with respect to a query image based on visual similarity, can assist diagnostic radiologists in assessing medical images by identifying similar digital images in large image databases. Despite the many recent advances and innovations in CBIR for general images, their adoption in radiology has been slow and limited. In this paper we attempt to close the gap between the two domains and adapt modern CBIR techniques to radiology images: by extending the latest representation learning techniques in a way that overcomes the unique challenges of radiology while taking advantage of its specific opportunities, we arrive at novel and effective medical image retrieval methods. Our method achieves the highest CUI@5 scores (18.48 and 15.95) on two widely used datasets (ROCO and MEDICAT, respectively), showcasing its superiority over state-of-the-art alternatives.
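The retrieval step itself is standard: embed the query and database images with the learned encoder and rank by similarity. The sketch below shows only that ranking stage, with random vectors standing in for the paper's semantic-aware embeddings.

```python
# Generic CBIR ranking step: cosine similarity between a query embedding and
# database embeddings, followed by top-k selection.
import numpy as np

def rank_by_similarity(query_emb, db_embs, top_k=5):
    q = query_emb / np.linalg.norm(query_emb)
    d = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = d @ q                             # cosine similarities to every database image
    order = np.argsort(-sims)[:top_k]        # indices of the top-k most similar images
    return order, sims[order]

db = np.random.randn(1000, 512)              # stand-in for 1000 database image embeddings
query = np.random.randn(512)                 # stand-in for the query embedding
print(rank_by_similarity(query, db))
```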
{"title":"Semantic aware representation learning for optimizing image retrieval systems in radiology","authors":"","doi":"10.1016/j.patcog.2024.111060","DOIUrl":"10.1016/j.patcog.2024.111060","url":null,"abstract":"<div><div>Content-based image retrieval (CBIR), which consists of ranking a set of images with respect to a query image based on visual similarity, can assist diagnostic radiologists in assessing medical images, by identifying similar digital images in large image databases. Despite the many recent advances and innovations in CBIR for general images, their adoption in radiology has been slow and limited. In the current paper we attempt to close the gap between the two domains and wisely adapt modern CBIR techniques to radiology images: by extending the latest representation learning techniques in a way that can overcome the unique challenges and at the same time take advantage of the specific opportunities that are present in radiology we were able to come up with novel and effective medical image retrieval methods. Our method achieves the highest CUI@5 scores (18.48, 15.95) on two widely used datasets (ROCO and MEDICAT respectively), showcasing the superiority of the proposed method in comparison with state-of-the-art relevant alternatives.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An interpretable unsupervised capsule network via comprehensive contrastive learning and two-stage training
Pub Date: 2024-09-30, DOI: 10.1016/j.patcog.2024.111059
Limited attention has been given to unsupervised capsule networks (CapsNets) with contrastive learning due to the challenge of harmoniously learning interpretable primary and high-level capsules. To address this issue, we focus on three aspects: loss function, routing algorithm, and training strategy. First, we propose a comprehensive contrastive loss to ensure consistency in learning both high-level and primary capsules across different objects. Next, we introduce an agreement-based routing mechanism for the activation of high-level capsules. Finally, we present a two-stage training strategy to resolve conflicts between multiple losses. Ablation experiments show that these methods all improve model performance. Results from linear evaluation and semi-supervised learning demonstrate that our model outperforms other CapsNets and convolutional neural networks in learning high-level capsules. Additionally, visualizing capsules provides insights into the primary capsules, which remain consistent across images and align with human vision.
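The paper's comprehensive contrastive loss is tailored to capsule outputs and is not detailed in the abstract; for readers unfamiliar with the underlying mechanism, a standard NT-Xent contrastive loss (PyTorch sketch below) shows the general idea of pulling two augmented views of the same object together while pushing other objects apart.

```python
# Standard NT-Xent (SimCLR-style) contrastive loss; illustrative of the general
# mechanism only, not the paper's capsule-specific loss.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, D) embeddings of two augmented views of the same N objects."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)    # (2N, D) unit vectors
    sim = z @ z.t() / temperature                          # (2N, 2N) similarity logits
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))             # ignore self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)                   # positive pair = the other view

loss = nt_xent(torch.randn(16, 128), torch.randn(16, 128))
```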
{"title":"An interpretable unsupervised capsule network via comprehensive contrastive learning and two-stage training","authors":"","doi":"10.1016/j.patcog.2024.111059","DOIUrl":"10.1016/j.patcog.2024.111059","url":null,"abstract":"<div><div>Limited attention has been given to unsupervised capsule networks (CapsNets) with contrastive learning due to the challenge of harmoniously learning interpretable primary and high-level capsules. To address this issue, we focus on three aspects: loss function, routing algorithm, and training strategy. First, we propose a comprehensive contrastive loss to ensure consistency in learning both high-level and primary capsules across different objects. Next, we introduce an agreement-based routing mechanism for the activation of high-level capsules. Finally, we present a two-stage training strategy to resolve conflicts between multiple losses. Ablation experiments show that these methods all improve model performance. Results from linear evaluation and semi-supervised learning demonstrate that our model outperforms other CapsNets and convolutional neural networks in learning high-level capsules. Additionally, visualizing capsules provides insights into the primary capsules, which remain consistent across images and align with human vision.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image dehazing via self-supervised depth guidance
Pub Date: 2024-09-30, DOI: 10.1016/j.patcog.2024.111051
Self-supervised learning methods have demonstrated promising benefits for feature representation learning in image dehazing, especially by avoiding the laborious work of collecting hazy-clean image pairs while also enabling better generalization of the model. Despite long-standing interest in depth estimation for image dehazing, few works have fully explored the interactions between depth and dehazing in an unsupervised manner. In this paper, we propose a self-supervised image dehazing framework guided by self-supervised depth estimation, to fully exploit the interactions between depth and haze for image dehazing. Specifically, the hazy image and the corresponding depth estimate are generated and optimized from the clear image in a dual-network self-supervised manner. The correlations between depth and hazy images are exploited in depth-guided hybrid attention Transformer blocks, which adaptively leverage both cross-attention and self-attention to effectively model haze density via cross-modality fusion and capture global context for better feature representations. In addition, the depth estimates of hazy images are further exploited for detection tasks on hazy images. Extensive experiments demonstrate that the depth estimation not only enhances the model's generalization across different dehazing datasets, leading to state-of-the-art self-supervised dehazing performance, but also benefits downstream detection tasks on hazy images. Our code is available at https://github.com/DongLiangSXU/Depth-Guidance-dehazing.git.
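Generating a hazy image from a clear image and a depth map, as the framework above does, is commonly based on the atmospheric scattering model I = J·t + A·(1 − t) with transmission t = exp(−β·d); the snippet below is a minimal sketch of that synthesis step under this assumption, not the paper's exact generator.

```python
# Synthesize a hazy image from a clear image and a (relative) depth map using
# the standard atmospheric scattering model (assumed here for illustration).
import numpy as np

def synthesize_haze(clear, depth, beta=1.0, airlight=0.9):
    """clear: (H, W, 3) float image in [0, 1]; depth: (H, W) relative depth map."""
    t = np.exp(-beta * depth)[..., None]       # transmission map derived from depth
    return clear * t + airlight * (1.0 - t)    # hazy image

clear = np.random.rand(240, 320, 3)            # stand-in for a clear RGB image
depth = np.random.rand(240, 320) * 3.0         # stand-in for an estimated depth map
hazy = synthesize_haze(clear, depth, beta=0.8)
```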
{"title":"Image dehazing via self-supervised depth guidance","authors":"","doi":"10.1016/j.patcog.2024.111051","DOIUrl":"10.1016/j.patcog.2024.111051","url":null,"abstract":"<div><div>Self-supervised learning methods have demonstrated promising benefits to feature representation learning for image dehazing tasks, especially for avoiding the laborious work of collecting hazy-clean image pairs, while also enabling better generalization abilities of the model. Despite the long-standing interests in depth estimation for image dehazing tasks, few works have fully explored the interactions between depth and dehazing tasks in an unsupervised manner. In this paper, a self-supervised image dehazing framework under the guidance of self-supervised depth estimation has been proposed, to fully exploit the interactions between depth and hazes for image dehazing. Specifically, the hazy image and the corresponding depth estimation are generated and optimized from the clear image in a dual-network self-supervised manner. The correlations between depth and hazy images are exploited in depth-guided hybrid attention Transformer blocks, which adaptively leverage both the cross-attention and self-attention to effectively model hazy densities via cross-modality fusion and capture global context information for better feature representations. In addition, the depth estimations of hazy images are further explored for the detection tasks on hazy images. Extensive experiments demonstrate that the depth estimation not only enhances the model generalization ability across different dehazing datasets, leading to state-of-the-art self-supervised dehazing performance, but also benefits downstream detection tasks on hazy images. Our code is available at <span><span>https://github.com/DongLiangSXU/Depth-Guidance-dehazing.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IIS-FVIQA: Finger Vein Image Quality Assessment with intra-class and inter-class similarity
Pub Date: 2024-09-29, DOI: 10.1016/j.patcog.2024.111056
In recent years, Finger Vein Image Quality Assessment (FVIQA) has been recognized as an effective remedy for erroneous recognition caused by low-quality finger vein images with false or missing information, and it has become an important part of finger vein recognition systems. Compared to traditional FVIQA methods that rely on domain knowledge, newer methods that reject low-quality images have been favored because they do not depend on human intervention. However, these methods only consider intra-class similarity information and ignore valuable information from the inter-class distribution, which is also an important factor in evaluating the performance of recognition systems. In this work, we propose a novel FVIQA approach, named IIS-FVIQA, which concurrently takes into account the intra-class similarity density and the inter-class similarity distribution distance within recognition systems. Specifically, our method generates quality scores for finger vein images by combining the information entropy of the intra-class similarity distribution with the Wasserstein distance of the inter-class distribution. We then train a regression network for quality prediction using training images and their corresponding quality scores. When a new image enters the recognition system, the trained regression network directly predicts its quality score, allowing the system to select the corresponding operation based on that score. Extensive experiments on benchmark datasets demonstrate that IIS-FVIQA consistently achieves top performance across multiple public datasets. After filtering out the 10% of low-quality images predicted by the quality regression network, the recognition system's performance improves by 43.96% (SDUMLA), 32.23% (MMCBNU_6000), and 21.20% (FV-USM), respectively. Furthermore, the method generalizes well across different recognition algorithms (e.g., LBP, MC, and Inception V3) and datasets (e.g., SDUMLA, MMCBNU_6000, and FV-USM).
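The two ingredients named above can be computed directly from genuine (intra-class) and impostor (inter-class) matching scores; the sketch below shows them in isolation. How IIS-FVIQA combines them into the final quality score, and how the matching scores are obtained, is not reproduced here.

```python
# Entropy of the intra-class similarity distribution and Wasserstein distance
# between intra- and inter-class similarity distributions (illustrative sketch).
import numpy as np
from scipy.stats import entropy, wasserstein_distance

def quality_ingredients(intra_sims, inter_sims, n_bins=32):
    """intra_sims / inter_sims: 1D arrays of matching scores for one probe image."""
    hist, _ = np.histogram(intra_sims, bins=n_bins, range=(0.0, 1.0), density=True)
    intra_entropy = entropy(hist + 1e-12)                       # spread of genuine scores
    separation = wasserstein_distance(intra_sims, inter_sims)   # genuine vs. impostor gap
    return intra_entropy, separation

intra = np.clip(np.random.normal(0.8, 0.05, 200), 0, 1)    # toy genuine similarities
inter = np.clip(np.random.normal(0.3, 0.10, 2000), 0, 1)   # toy impostor similarities
print(quality_ingredients(intra, inter))
```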
{"title":"IIS-FVIQA: Finger Vein Image Quality Assessment with intra-class and inter-class similarity","authors":"","doi":"10.1016/j.patcog.2024.111056","DOIUrl":"10.1016/j.patcog.2024.111056","url":null,"abstract":"<div><div>In recent years, Finger Vein Image Quality Assessment (FVIQA) has been recognized as an effective solution to the problem of erroneous recognition resulting from low image quality due to false and missing information in finger vein images, and has become an important part of finger vein recognition systems. Compared to traditional FVIQA methods that rely on domain knowledge, newer methods that reject low-quality images have been favored for their independence from human interference. However, these methods only consider intra-class similarity information and ignore valuable information from inter-class distribution, which is also an important factor in evaluating the performance of recognition systems. In this work, we propose a novel FVIQA approach, named IIS-FVIQA, which concurrently takes into account the intra-class similarity density and inter-class similarity distribution distance within recognition systems. Specifically, our method generates quality scores for finger vein images by combining the information entropy of intra-class similarity distribution and Wasserstein distance of inter-class distribution. Then, we train a regression network for quality prediction using training images and corresponding quality scores. When a new image enters the recognition system, the trained regression network directly predicts the quality score of the image, making it easier for the system to select the corresponding operation based on the quality score of the image. Extensive experiments conducted on benchmark datasets demonstrate that the IIS-FVIQA method proposed in this paper consistently achieves top performance across multiple public datasets. After filtering out 10% of low-quality images predicted by the quality regression network, the recognition system’s performance improves by 43.96% (SDUMLA), 32.23% (MMCBNU_6000), and 21.20% (FV-USM), respectively. Furthermore, the method exhibits strong generalizability across different recognition algorithms (e.g., LBP, MC, and Inception V3) and datasets (e.g., SDUMLA, MMCBNU_6000, and FV-USM).</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142425142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient time series adaptive representation learning via Dynamic Routing Sparse Attention
Pub Date: 2024-09-28, DOI: 10.1016/j.patcog.2024.111058
Time series prediction plays a crucial role in various fields but still faces significant challenges. Converting the original 1D time series into 2D data through dimension transformation captures more hidden features but incurs high memory consumption and low time efficiency. To address these issues, we design a sparse attention mechanism with dynamic routing perception called Dynamic Routing Sparse Attention (DRSA). Specifically, DRSA can effectively handle variations in complex time series data. Meanwhile, under memory constraints, the Dynamic Routing Filter (DRF) module further refines the representation by filtering the blocked 2D time series data to identify the most relevant feature vectors in the local context. We conducted predictive experiments on six real-world time series datasets with fine granularity and long-range dependencies. Compared to eight state-of-the-art (SOTA) models, DRSA demonstrated relative improvements ranging from 4.18% to 81.02%. Furthermore, its time efficiency is 2 to 5 times higher than the baseline. Our code and dataset are available at https://github.com/wwy8/DRSA_main.
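The 1D-to-2D dimension transformation mentioned above typically folds the series by a period so that each row holds one cycle, letting 2D operators see both within-period and across-period structure. A minimal sketch follows (the period is assumed known, and the DRSA/DRF modules themselves are not reproduced).

```python
# Fold a 1D series into a 2D array of shape (n_cycles, period).
import numpy as np

def fold_series(x, period):
    """x: 1D array; returns an (n_cycles, period) array, truncating the remainder."""
    n_cycles = len(x) // period
    return x[: n_cycles * period].reshape(n_cycles, period)

x = np.sin(np.linspace(0, 40 * np.pi, 2000)) + 0.1 * np.random.randn(2000)
x2d = fold_series(x, period=100)   # shape (20, 100): rows are cycles, columns are phases
```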
{"title":"Efficient time series adaptive representation learning via Dynamic Routing Sparse Attention","authors":"","doi":"10.1016/j.patcog.2024.111058","DOIUrl":"10.1016/j.patcog.2024.111058","url":null,"abstract":"<div><div>Time series prediction plays a crucial role in various fields but also faces significant challenges. Converting original 1D time series data into 2D data through dimension transformation allows capturing more hidden features but incurs high memory consumption and low time efficiency. We have designed a sparse attention mechanism with dynamic routing perception called <strong>D</strong>ynamic <strong>R</strong>outing <strong>S</strong>parse <strong>A</strong>ttention (DRSA) to address these issues. Specifically, DRSA can effectively handle variations of complex time series data. Meanwhile, under memory constraints, the <strong>D</strong>ynamic <strong>R</strong>outing <strong>F</strong>ilter (DRF) module further refines it by filtering the blocked 2D time series data to identify the most relevant feature vectors in the local context. We conducted predictive experiments on six real-world time series datasets with fine granularity and long sequence dependencies. Compared to eight state-of-the-art (SOTA) models, DRSA demonstrated relative improvements ranging from 4.18% to 81.02%. Furthermore, its time efficiency is 2 to 5 times higher than the baseline. Our code and dataset will be available at <span><span>https://github.com/wwy8/DRSA_main</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SLAM²: Simultaneous Localization and Multimode Mapping for indoor dynamic environments
Pub Date: 2024-09-28, DOI: 10.1016/j.patcog.2024.111054
Traditional visual Simultaneous Localization and Mapping (SLAM) methods based on point features are often limited by strong static-scene assumptions and texture information, resulting in inaccurate camera pose estimation and object localization. To address these challenges, we present SLAM², a novel semantic RGB-D SLAM system that obtains accurate estimates of the camera pose and the 6-DOF poses of other objects, resulting in complete and clean static 3D model mapping in dynamic environments. Our system makes full use of point, line, and plane features in space to enhance the accuracy of camera pose estimation. It combines traditional geometric methods with a deep learning method to detect both known and unknown dynamic objects in the scene. Moreover, our system is designed with a three-mode mapping method (dense, semi-dense, and sparse), where the mode can be selected according to the needs of different tasks. This makes our visual SLAM system applicable to diverse application areas. Evaluation on the TUM RGB-D and Bonn RGB-D datasets demonstrates that our SLAM system achieves the most advanced localization accuracy and the cleanest static 3D mapping of the scene in dynamic environments compared to state-of-the-art methods. Specifically, our system achieves a root mean square error (RMSE) of 0.018 m on the highly dynamic TUM w/half sequence, outperforming ORB-SLAM3 (0.231 m) and DRG-SLAM (0.025 m). On the Bonn dataset, our system demonstrates superior performance in 14 out of 18 sequences, with an average RMSE reduction of 27.3% compared to the next best method.
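The RMSE figures quoted here are trajectory errors of the kind reported on the TUM and Bonn benchmarks; as a minimal sketch, such an RMSE can be computed from already-aligned estimated and ground-truth camera positions (the alignment step and the paper's full evaluation protocol are omitted).

```python
# Trajectory RMSE over aligned camera positions (illustrative sketch).
import numpy as np

def trajectory_rmse(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: (N, 3) arrays of already-aligned camera positions in metres."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)   # per-frame position error
    return float(np.sqrt(np.mean(err ** 2)))

gt = np.cumsum(np.random.randn(500, 3) * 0.01, axis=0)   # toy ground-truth track
est = gt + np.random.randn(500, 3) * 0.005                # toy estimated track
print(trajectory_rmse(est, gt))
```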
{"title":"SLAM2: Simultaneous Localization and Multimode Mapping for indoor dynamic environments","authors":"","doi":"10.1016/j.patcog.2024.111054","DOIUrl":"10.1016/j.patcog.2024.111054","url":null,"abstract":"<div><div>Traditional visual Simultaneous Localization and Mapping (SLAM) methods based on point features are often limited by strong static assumptions and texture information, resulting in inaccurate camera pose estimation and object localization. To address these challenges, we present SLAM<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>, a novel semantic RGB-D SLAM system that can obtain accurate estimation of the camera pose and the 6DOF pose of other objects, resulting in complete and clean static 3D model mapping in dynamic environments. Our system makes full use of the point, line, and plane features in space to enhance the camera pose estimation accuracy. It combines the traditional geometric method with a deep learning method to detect both known and unknown dynamic objects in the scene. Moreover, our system is designed with a three-mode mapping method, including dense, semi-dense, and sparse, where the mode can be selected according to the needs of different tasks. This makes our visual SLAM system applicable to diverse application areas. Evaluation in the TUM RGB-D and Bonn RGB-D datasets demonstrates that our SLAM system achieves the most advanced localization accuracy and the cleanest static 3D mapping of the scene in dynamic environments, compared to state-of-the-art methods. Specifically, our system achieves a root mean square error (RMSE) of 0.018 m in the highly dynamic TUM w/half sequence, outperforming ORB-SLAM3 (0.231 m) and DRG-SLAM (0.025 m). In the Bonn dataset, our system demonstrates superior performance in 14 out of 18 sequences, with an average RMSE reduction of 27.3% compared to the next best method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":7.5,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}