2024 Index IEEE Transactions on Artificial Intelligence Vol. 5
Pub Date: 2025-01-20 | DOI: 10.1109/TAI.2025.3531741
IEEE Transactions on Artificial Intelligence, vol. 5, no. 12, pp. 1–93. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10847313
ClusVPR: Efficient Visual Place Recognition With Clustering-Based Weighted Transformer
Yifan Xu; Pourya Shamsolmoali; Masoume Zareapoor; Jie Yang
Pub Date: 2024-12-02 | DOI: 10.1109/TAI.2024.3510479
Visual place recognition (VPR) is a challenging task with a wide range of applications, including robot navigation and self-driving vehicles. It is difficult because duplicate regions and insufficient attention to small objects in complex scenes lead to recognition deviations. In this article, we present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and poor representation of small objects. Unlike existing methods that rely on convolutional neural networks (CNNs) for feature map generation, ClusVPR introduces a unique paradigm called the clustering-based weighted transformer network (CWTNet). CWTNet leverages clustering-based weighted feature maps and integrates global dependencies to effectively address the visual deviations encountered in large-scale VPR problems. We also introduce the optimized-VLAD (OptLAD) layer, which significantly reduces the number of parameters and enhances model efficiency. This layer is specifically designed to aggregate the information obtained from scale-wise image patches. Additionally, our pyramid self-supervised strategy extracts representative and diverse features from scale-wise image patches rather than from entire images, which is essential for capturing the broader range of information required for robust VPR. Extensive experiments on four VPR datasets show that our model outperforms existing models while being less complex.
{"title":"ClusVPR: Efficient Visual Place Recognition With Clustering-Based Weighted Transformer","authors":"Yifan Xu;Pourya Shamsolmoali;Masoume Zareapoor;Jie Yang","doi":"10.1109/TAI.2024.3510479","DOIUrl":"https://doi.org/10.1109/TAI.2024.3510479","url":null,"abstract":"Visual place recognition (VPR) is a highly challenging task that has a wide range of applications, including robot navigation and self-driving vehicles. VPR is a difficult task due to duplicate regions and insufficient attention to small objects in complex scenes, resulting in recognition deviations. In this article, we present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects. Different from existing methods that rely on convolutional neural networks (CNNs) for feature map generation, ClusVPR introduces a unique paradigm called clustering-based weighted transformer network (CWTNet). CWTNet uses the power of clustering-based weighted feature maps and integrates global dependencies to effectively address visual deviations encountered in large-scale VPR problems. We also introduce the optimized-VLAD (OptLAD) layer, which significantly reduces the number of parameters and enhances model efficiency. This layer is specifically designed to aggregate the information obtained from scale-wise image patches. Additionally, our pyramid self-supervised strategy focuses on extracting representative and diverse features from scale-wise image patches rather than from entire images. This approach is essential for capturing a broader range of information required for robust VPR. Extensive experiments on four VPR datasets show our model's superior performance compared to existing models while being less complex.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"1038-1049"},"PeriodicalIF":0.0,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143761486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unformer: A Transformer-Based Approach for Adaptive Multiscale Feature Aggregation in Underwater Image Enhancement
Yuhao Qing; Yueying Wang; Huaicheng Yan; Xiangpeng Xie; Zhengguang Wu
Pub Date: 2024-11-29 | DOI: 10.1109/TAI.2024.3508667
Underwater imaging is often compromised by light scattering and absorption, resulting in image degradation and distortion: blurred details, color shifts, and diminished illumination and contrast, all of which hinder progress in underwater research. To mitigate these issues, we propose Unformer, an underwater image enhancement (UIE) technique that leverages a transformer-based architecture for multiscale adaptive feature aggregation. Our approach employs a multiscale feature fusion strategy that adaptively restores illumination and detail features. We reexamine the relationship between convolution and the transformer to develop a novel encoder structure that integrates both long-range and short-range dependencies, dynamically combines local and global features, and constructs a comprehensive global context. Furthermore, we propose a multibranch decoder architecture that enhances and efficiently extracts spatial context information through the transformer module. Extensive experiments on three datasets demonstrate that the proposed method outperforms other techniques in both subjective and objective evaluations. Compared with the latest methods, Unformer improves the peak signal-to-noise ratio (PSNR) by 19.5% and 14.8% on the LSUI and EUVP datasets, respectively. The code is available at: https://github.com/yhflq/Unformer.
{"title":"Unformer: A Transformer-Based Approach for Adaptive Multiscale Feature Aggregation in Underwater Image Enhancement","authors":"Yuhao Qing;Yueying Wang;Huaicheng Yan;Xiangpeng Xie;Zhengguang Wu","doi":"10.1109/TAI.2024.3508667","DOIUrl":"https://doi.org/10.1109/TAI.2024.3508667","url":null,"abstract":"Underwater imaging is often compromised by light scattering and absorption, resulting in image degradation and distortion. This manifests as blurred details, color shifts, and diminished illumination and contrast, thereby hindering advancements in underwater research. To mitigate these issues, we propose Unformer, an innovative underwater image enhancement (UIE) technique that leverages a transformer-based architecture for multiscale adaptive feature aggregation. Our approach employs a multiscale feature fusion strategy that adaptively restores illumination and detail features. We reevaluate the relationship between convolution and transformer to develop a novel encoder structure. This structure effectively integrates both long-range and short-range dependencies, dynamically combines local and global features, and constructs a comprehensive global context. Furthermore, we propose a unique multibranch decoder architecture that enhances and efficiently extracts spatial context information through the transformer module. Extensive experiments on three datasets demonstrate that our proposed method outperforms other techniques in both subjective and objective evaluations. Compared with the latest methods, Unformer has improved the peak signal-to-noise ratio (PSNR) by 19.5% and 14.8% respectively on the LSUI and EUVP datasets. The code is available at: <uri>https://github.com/yhflq/Unformer</uri>.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 4","pages":"1024-1037"},"PeriodicalIF":0.0,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143740385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}