OMSF2: optimizing multi-scale feature fusion learning for pneumoconiosis staging diagnosis through data specificity augmentation
Pub Date: 2024-12-30 | DOI: 10.1007/s40747-024-01729-0
Xueting Ren, Surong Chu, Guohua Ji, Zijuan Zhao, Juanjuan Zhao, Yan Qiang, Yangyang Wei, Yan Wang
Diagnosing pneumoconiosis is challenging because the lesions are not easily visible on chest X-rays, and the images often lack clear details. Existing deep detection models use Feature Pyramid Networks (FPNs) to identify objects at different scales. However, they struggle with insufficient perception of small targets and gradient inconsistency in medical image detection tasks, hindering the full utilization of multi-scale features. To address these issues, we propose an Optimized Multi-Scale Feature Fusion learning framework, OMSF2, with the following components: (1) a data specificity augmentation module captures intrinsic data representations and introduces diversity by learning morphological variations and lesion locations; (2) a multi-scale feature learning module refines micro-feature localization under heatmap guidance, enabling full extraction of multi-directional features of subtle diffuse targets; (3) a multi-scale feature fusion module fuses high-level and low-level features to better capture subtle differences between disease stages. Notably, this paper proposes a method for fine-grained learning of low-resolution micro-features in pneumoconiosis, addressing the problem of maintaining cross-layer gradient consistency under multi-scale feature fusion. We established an enhanced pneumoconiosis X-ray dataset to optimize the lesion detection capability of OMSF2, and introduced an external dataset to evaluate other chest X-rays with complex lesions. OMSF2 improved AP-50 and R-50 by 3.25% and 3.31% on the internal dataset, and by 2.28% and 0.24% on the external dataset, respectively. Experimental results show that OMSF2 significantly outperforms state-of-the-art baselines in medical image detection tasks.
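For context, the FPN-style top-down fusion that such detectors build on can be stated compactly. The sketch below is a minimal PyTorch version, assuming illustrative channel widths and a nearest-neighbor upsample-and-add fusion rule; it is not the authors' OMSF2 implementation.

```python
# Minimal FPN-style top-down fusion sketch (assumed layer names and widths).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # 1x1 lateral convs project each backbone level to a common width
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convs smooth each fused map
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, feats):
        # feats: backbone maps, highest resolution first
        lats = [lat(f) for lat, f in zip(self.laterals, feats)]
        # walk from the coarsest level down, upsampling and adding
        for i in range(len(lats) - 1, 0, -1):
            lats[i - 1] = lats[i - 1] + F.interpolate(
                lats[i], size=lats[i - 1].shape[-2:], mode="nearest")
        return [s(x) for s, x in zip(self.smooth, lats)]

fused = TopDownFusion()([torch.randn(1, 256, 64, 64),
                         torch.randn(1, 512, 32, 32),
                         torch.randn(1, 1024, 16, 16)])
```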
{"title":"OMSF2: optimizing multi-scale feature fusion learning for pneumoconiosis staging diagnosis through data specificity augmentation","authors":"Xueting Ren, Surong Chu, Guohua Ji, Zijuan Zhao, Juanjuan Zhao, Yan Qiang, Yangyang Wei, Yan Wang","doi":"10.1007/s40747-024-01729-0","DOIUrl":"https://doi.org/10.1007/s40747-024-01729-0","url":null,"abstract":"<p>Diagnosing pneumoconiosis is challenging because the lesions are not easily visible on chest X-rays, and the images often lack clear details. Existing deep detection models utilize Feature Pyramid Networks (FPNs) to identify objects at different scales. However, they struggle with insufficient perception of small targets and gradient inconsistency in medical image detection tasks, hindering the full utilization of multi-scale features. To address these issues, we propose an Optimized Multi-Scale Feature Fusion learning framework, OMSF2, which includes the following components: (1) Data specificity augmentation module is introduced to capture intrinsic data representations and introduce diversity by learning morphological variations and lesion locations. (2) Multi-scale feature learning module is utilized that refines micro-feature localization guided by heatmaps, enabling full extraction of multi-directional features of subtle diffuse targets. (3) Multi-scale feature fusion module is employed that facilitates the fusion of high-level and low-level features to better understand subtle differences between disease stages. Notably, this paper innovatively proposes a method for fine learning of low-resolution micro-features in pneumoconiosis, addressing the issue of maintaining cross-layer gradient consistency under multi-scale feature fusion. We established an enhanced pneumoconiosis X-ray dataset to optimize the lesion detection capability of the OMSF2 model. We also introduced an external dataset to evaluate other chest X-rays with complex lesions. On the AP-50 and R-50 evaluation metrics, OMSF2 improved by 3.25% and 3.31% on the internal dataset, and by 2.28% and 0.24% on the external dataset, respectively. Experimental results show that OMSF2 achieves significantly better performance than state-of-the-art baselines in medical image detection tasks.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"33 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142905453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FSPPCFs: a privacy-preserving collaborative filtering recommendation scheme based on fuzzy C-means and Shapley value
Pub Date: 2024-12-30 | DOI: 10.1007/s40747-024-01758-9
Weiwei Wang, Wenping Ma, Kun Yan
Collaborative filtering recommendation systems generate personalized recommendations by analyzing and collaboratively processing large volumes of user ratings or behavior data. The widespread use of recommendation systems in daily decision-making, however, brings potential risks of privacy leakage. Recent literature predominantly employs differential privacy to achieve privacy protection, yet many schemes struggle to balance user privacy and recommendation performance effectively. In this work, we present FSPPCFs, a practical privacy-preserving scheme for user-based collaborative filtering recommendation that utilizes fuzzy C-means clustering and the Shapley value, aiming to enhance recommendation performance while ensuring privacy protection. Specifically, (i) we modify the traditional recommendation scheme by introducing a similarity balance factor into the Pearson similarity algorithm, enhancing recommendation performance; (ii) FSPPCFs first clusters the dataset using fuzzy C-means clustering and the Shapley value, grouping users with similar interests and attributes into the same cluster and thereby providing more accurate data support for recommendations; differential privacy is then used to protect users' personal privacy when selecting the neighbor set from the target cluster. Finally, we prove theoretically that our scheme satisfies differential privacy. Experimental results illustrate that our scheme significantly outperforms existing methods.
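As a rough illustration of two ingredients named above, the sketch below pairs a Pearson similarity weighted by a co-rating balance factor with the Laplace mechanism; the balance-factor form and where the noise is injected are assumptions, not the paper's construction.

```python
# Sketch: balanced Pearson similarity + Laplace noise (forms are assumed).
import numpy as np

def pearson_balanced(u, v, alpha=0.5):
    # u, v: rating vectors with 0 meaning "not rated"; alpha tunes the weight
    mask = (u > 0) & (v > 0)
    if mask.sum() < 2:
        return 0.0
    cu, cv = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    denom = np.sqrt((cu ** 2).sum() * (cv ** 2).sum())
    if denom == 0:
        return 0.0
    sim = (cu * cv).sum() / denom
    # down-weight similarities computed on few co-rated items
    balance = mask.sum() / (mask.sum() + alpha * len(u))
    return sim * balance

def dp_similarities(sims, epsilon=1.0, sensitivity=2.0):
    # Laplace mechanism: similarities lie in [-1, 1], so sensitivity <= 2
    return sims + np.random.laplace(0.0, sensitivity / epsilon, size=sims.shape)
```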
{"title":"FSPPCFs: a privacy-preserving collaborative filtering recommendation scheme based on fuzzy C-means and Shapley value","authors":"Weiwei Wang, Wenping Ma, Kun Yan","doi":"10.1007/s40747-024-01758-9","DOIUrl":"https://doi.org/10.1007/s40747-024-01758-9","url":null,"abstract":"<p>Collaborative filtering recommendation systems generate personalized recommendation results by analyzing and collaboratively processing a large numerous of user ratings or behavior data. The widespread use of recommendation systems in daily decision-making also brings potential risks of privacy leakage. Recent literature predominantly employs differential privacy to achieve privacy protection, however, many schemes struggle to balance user privacy and recommendation performance effectively. In this work, we present a practical privacy-preserving scheme for user-based collaborative filtering recommendation that utilizes fuzzy C-means clustering and Shapley value, FSPPCFs, aiming to enhance the recommendation performance while ensuring privacy protection. Specifically, (i) we have modified the traditional recommendation scheme by introducing a similarity balance factor integrated into the Pearson similarity algorithm, enhancing recommendation system performance; (ii) FSPPCFs first clusters the dataset through fuzzy C-means clustering and Shapley value, grouping users with similar interests and attributes into the same cluster, thereby providing more accurate data support for recommendations. Then, differential privacy is used to achieve the user’s personal privacy protection when selecting the neighbor set from the target cluster. Finally, it is theoretically proved that our scheme satisfies differential privacy. Experimental results illustrate that our scheme significantly outperforms existing methods.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"13 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142905452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Calibration between a panoramic LiDAR and a limited field-of-view depth camera
Pub Date: 2024-12-30 | DOI: 10.1007/s40747-024-01710-x
Weijie Tang, Bin Wang, Longxiang Huang, Xu Yang, Qian Zhang, Sulei Zhu, Yan Ma
Depth cameras and LiDARs are sensing devices widely applied in fields such as autonomous driving, navigation, and robotics. Precise calibration between the two is crucial for accurate environmental perception and localization. Methods that utilize the point cloud features of both sensors to estimate extrinsic parameters can also be extended to calibrate limited field-of-view (FOV) LiDARs and panoramic LiDARs, which holds significant research value. However, calibrating point clouds from two sensors with different fields of view and densities presents challenges. This paper proposes methods for automatic calibration of the two sensors by extracting and registering features in three scenarios: environments with one plane, two planes, and three planes. For the one-plane and two-plane scenarios, we propose constructing feature histogram descriptors based on plane constraints for the remaining points, in addition to planar features, for registration. Experimental results on simulated and real-world data demonstrate that the proposed methods achieve precise calibration in all three scenarios, maintaining average rotation and translation errors within 2° and 0.05 m, respectively, for a 360° linear LiDAR and a depth camera with a field of view of 100° vertically and 70° horizontally.
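Once correspondences are established, the core of such extrinsic estimation is a rigid alignment. The sketch below is the textbook SVD (Kabsch) solution on matched 3-D points, shown as a stand-in for the paper's plane-constrained registration rather than its actual algorithm.

```python
# Rigid alignment of matched point sets via SVD (Kabsch), a stand-in step.
import numpy as np

def rigid_align(P, Q):
    """Find R, t minimizing ||R @ P + t - Q|| for matched 3xN point sets."""
    cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
    H = (P - cp) @ (Q - cq).T                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # reflection guard keeps det(R) = +1 (a proper rotation)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```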
{"title":"Calibration between a panoramic LiDAR and a limited field-of-view depth camera","authors":"Weijie Tang, Bin Wang, Longxiang Huang, Xu Yang, Qian Zhang, Sulei Zhu, Yan Ma","doi":"10.1007/s40747-024-01710-x","DOIUrl":"https://doi.org/10.1007/s40747-024-01710-x","url":null,"abstract":"<p>Depth cameras and LiDARs are commonly used sensing devices widely applied in fields such as autonomous driving, navigation, and robotics. Precise calibration between the two is crucial for accurate environmental perception and localization. Methods that utilize the point cloud features of both sensors to estimate extrinsic parameters can also be extended to calibrate limited Field-of-View (FOV) LiDARs and panoramic LiDARs, which holds significant research value. However, calibrating the point clouds from two sensors with different fields of view and densities presents challenges. This paper proposes methods for automatic calibration of the two sensors by extracting and registering features in three scenarios: environments with one plane, two planes, and three planes. For the one-plane and two-plane scenarios, we propose constructing feature histogram descriptors based on plane constraints for the remaining points, in addition to planar features, for registration. Experimental results on simulation and real-world data demonstrate that the proposed methods in all three scenarios achieve precise calibration, maintaining average rotation and translation calibration errors within 2 degrees and 0.05 meters respectively for a <span>(360^{circ })</span> linear LiDAR and a depth camera with a field of view of <span>(100^{circ })</span> vertically and <span>(70^{circ })</span> degrees horizontally.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"48 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142905138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic-enhanced panoptic scene graph generation through hybrid and axial attentions
Pub Date: 2024-12-30 | DOI: 10.1007/s40747-024-01746-z
Xinhe Kuang, Yuxin Che, Huiyan Han, Yimin Liu
Panoptic scene graph generation represents a cutting-edge challenge in image scene understanding, necessitating sophisticated predictions of both intra-object relationships and interactions between objects and their backgrounds. This complexity tests the limits of current predictive models' ability to discern nuanced relationships within images. Conventional approaches often fail to combine visual and semantic data effectively, leading to semantically impoverished predictions. To address these issues, we propose PSGAtten, a novel method for semantic-enhanced panoptic scene graph generation through hybrid and axial attentions. Specifically, a series of hybrid attention networks are stacked within both the object context encoding and relationship context encoding modules, enhancing the refinement and fusion of visual and semantic information. Within the hybrid attention networks, self-attention mechanisms facilitate feature refinement within modalities, while cross-attention mechanisms promote feature fusion across modalities. An axial attention module is further applied to enhance the integration of global information. Experimental validation on the PSG dataset confirms that our approach not only surpasses existing methods in generating detailed panoptic scene graphs but also significantly improves recall, thereby enhancing relationship prediction in scene graph generation.
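The cross-modal half of such a hybrid block can be illustrated with a single cross-attention layer in which semantic tokens query visual tokens. The sketch below is a minimal PyTorch version with assumed dimensions; the paper's hybrid/axial blocks add more structure than this.

```python
# Minimal cross-attention fusion: semantic queries attend over visual tokens.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, semantic, visual):
        # semantic: (B, Ns, dim) queries; visual: (B, Nv, dim) keys/values
        fused, _ = self.attn(query=semantic, key=visual, value=visual)
        return self.norm(semantic + fused)   # residual keeps the query stream

x = CrossAttentionFusion()(torch.randn(2, 10, 256), torch.randn(2, 49, 256))
```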
{"title":"Semantic-enhanced panoptic scene graph generation through hybrid and axial attentions","authors":"Xinhe Kuang, Yuxin Che, Huiyan Han, Yimin Liu","doi":"10.1007/s40747-024-01746-z","DOIUrl":"https://doi.org/10.1007/s40747-024-01746-z","url":null,"abstract":"<p>The generation of panoramic scene graphs represents a cutting-edge challenge in image scene understanding, necessitating sophisticated predictions of both intra-object relationships and interactions between objects and their backgrounds. This complexity tests the limits of current predictive models' ability to discern nuanced relationships within images. Conventional approaches often fail to effectively combine visual and semantic data, leading to predictions that are semantically impoverished. To address these issues, we propose a novel method of semantic-enhanced panoramic scene graph generation through hybrid and axial attentions (PSGAtten). Specifically, a series of hybrid attention networks are stacked within both the object context encoding and relationship context encoding modules, enhancing the refinement and fusion of visual and semantic information. Within the hybrid attention networks, self-attention mechanisms facilitate feature refinement within modalities, while cross-attention mechanisms promote feature fusion across modalities. The axial attention model is further applied to enhance the integration ability of global information. Experimental validation on the PSG dataset confirms that our approach not only surpasses existing methods in generating detailed panoramic scene graphs but also significantly improves recall rates, thereby enhancing the ability to predict relationships in scene graph generation.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"65 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142905454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DCTnet: a double-channel transformer network for peach disease detection using UAVs
Pub Date: 2024-12-30 | DOI: 10.1007/s40747-024-01749-w
The use of unmanned aerial vehicle (UAV) technology to inspect extensive peach orchards to improve fruit yield and quality is currently a major area of research. The challenge is to accurately detect peach diseases in real time, which is critical to improving peach production. The dense arrangement of peaches and uneven lighting conditions significantly hamper the accuracy of disease detection. To overcome this, this paper presents a dual-channel transformer network (DCTNet) for peach disease detection. First, an Adaptive Dual-Channel Affine Transformer (ADCT) is developed to efficiently capture key information in images of diseased peaches by integrating features across spatial and channel dimensions within blocks. Next, a Robust Gated Feed-Forward Network (RGFN) is constructed to extend the receptive field of the model by improving its context aggregation capabilities. Finally, a Local-Global Network is proposed to fully capture the multi-scale features of peach disease images through a collaborative training approach with input images. Furthermore, a peach disease dataset covering different growth stages of peaches is constructed to evaluate the detection performance of the proposed method. Extensive experimental results show that our model outperforms other sophisticated models, achieving an AP50 of 95.57% and an F1 score of 0.91. The integration of this method into UAV systems for surveying large peach orchards ensures accurate disease detection, thereby safeguarding peach production.
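The AP50 figure quoted above rests on a standard matching rule: a prediction counts as a true positive when its IoU with an unmatched ground-truth box is at least 0.5. The sketch below shows that rule in isolation; the full evaluation protocol is an assumption, not taken from the paper.

```python
# IoU-threshold matching behind AP50-style metrics (protocol assumed).
import numpy as np

def iou(a, b):
    # boxes as [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match_at_iou(preds, gts, thr=0.5):
    """preds sorted by confidence; returns (TP, FP, FN) at the threshold."""
    used, tp = set(), 0
    for p in preds:
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j not in used and iou(p, g) > best:
                best, best_j = iou(p, g), j
        if best >= thr:
            used.add(best_j)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp
```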
{"title":"DCTnet: a double-channel transformer network for peach disease detection using UAVs","authors":"Jie Zhang, Dailin Li, Xiaoping Shi, Fengxian Wang, Linwei Li, Yibin Chen","doi":"10.1007/s40747-024-01749-w","DOIUrl":"https://doi.org/10.1007/s40747-024-01749-w","url":null,"abstract":"<p>The use of unmanned aerial vehicle (UAV) technology to inspect extensive peach orchards to improve fruit yield and quality is currently a major area of research. The challenge is to accurately detect peach diseases in real time, which is critical to improving peach production. The dense arrangement of peaches and the uneven lighting conditions significantly hamper the accuracy of disease detection. To overcome this, this paper presents a dual-channel transformer network (DCTNet) for peach disease detection. First, an Adaptive Dual-Channel Affine Transformer (ADCT) is developed to efficiently capture key information in images of diseased peaches by integrating features across spatial and channel dimensions within blocks. Next, a Robust Gated Feed Forward Network (RGFN) is constructed to extend the receptive field of the model by improving its context aggregation capabilities. Finally, a Local–Global Network is proposed to fully capture the multi-scale features of peach disease images through a collaborative training approach with input images. Furthermore, a peach disease dataset including different growth stages of peaches is constructed to evaluate the detection performance of the proposed method. Extensive experimental results show that our model outperforms other sophisticated models, achieving an <span>({AP}_{50})</span> of 95.57% and an F1 score of 0.91. The integration of this method into UAV systems for surveying large peach orchards ensures accurate disease detection, thereby safeguarding peach production.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"160 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142905455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A disturbance suppression second-order penalty-like neurodynamic approach to distributed optimal allocation
Pub Date: 2024-12-30 | DOI: 10.1007/s40747-024-01732-5
Wenwen Jia, Wenbin Zhao, Sitian Qin
This paper proposes an efficient penalty-like neurodynamic approach, modeled as a second-order multi-agent system under external disturbances, to investigate distributed optimal allocation problems. Sliding-mode control is integrated into the neurodynamic approach to suppress the influence of unknown external disturbances on the system's stability within a fixed time. Then, based on a finite-time tracking technique, resource allocation constraints are handled with a penalty-parameter approach, and their global information is processed in a distributed manner via a multi-agent system. Compared with existing neurodynamic approaches developed from projection theory, the proposed approach uses the penalty method and tracking technique to avoid introducing projection operators. Additionally, the convergence of the proposed neurodynamic approach is proven, and an optimal solution to the distributed optimal allocation problem is obtained. Finally, the main results are validated through a numerical simulation of a power dispatch problem.
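To make the penalty idea concrete, the toy below runs an Euler simulation of a penalized gradient flow on a small economic-dispatch instance (minimize the sum of quadratic generation costs subject to total output meeting demand). This is only a centralized first-order illustration; the paper's model is second-order, distributed, and disturbance-rejecting.

```python
# Toy penalty gradient flow for min sum_i a_i*x_i^2/2 + b_i*x_i, sum_i x_i = d.
import numpy as np

a = np.array([1.0, 2.0, 0.5])      # quadratic cost curvatures
b = np.array([2.0, 1.0, 3.0])      # linear cost terms
d, rho, dt = 10.0, 50.0, 1e-3      # demand, penalty weight, Euler step

x = np.zeros(3)
for _ in range(20000):
    # gradient of sum_i f_i(x_i) + (rho/2) * (sum_i x_i - d)^2
    grad = a * x + b + rho * (x.sum() - d)
    x -= dt * grad

# at the (approximate) optimum, marginal costs a*x+b equalize and sum(x) ~ d
print(x, a * x + b, x.sum())
```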
{"title":"A disturbance suppression second-order penalty-like neurodynamic approach to distributed optimal allocation","authors":"Wenwen Jia, Wenbin Zhao, Sitian Qin","doi":"10.1007/s40747-024-01732-5","DOIUrl":"https://doi.org/10.1007/s40747-024-01732-5","url":null,"abstract":"<p>This paper proposes an efficient penalty-like neurodynamic approach modeled as a second-order multi-agent system under external disturbances to investigate the distributed optimal allocation problems. The sliding mode control technology is integrated into the neurodynamic approach for suppressing the influence of the unknown external disturbance on the system’s stability within a fixed time. Then, based on a finite-time tracking technique, resource allocation constraints are handled by using a penalty parameter approach, and their global information is processed in a distributed manner via a multi-agent system. Compared with the existing neurodynamic approaches developed based on the projection theory, the proposed neurodynamic approach utilizes the penalty method and tracking technique to avoid introducing projection operators. Additionally, the convergence of the proposed neurodynamic approach is proven, and an optimal solution to the distributed optimal allocation problem is obtained. Finally, the main results are validated through a numerical simulation involving a power dispatch problem.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"81 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142905139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dual medical image watermarking using SRU-enhanced network and EICC chaotic map
Pub Date: 2024-12-28 | DOI: 10.1007/s40747-024-01723-6
Fei Yan, Zeqian Wang, Kaoru Hirota
With the rapid advancement of next-generation information technology, smart healthcare has been seamlessly integrated into many facets of people's daily routines. Accordingly, enhancing the integrity and security of medical images has gained significant prominence as a crucial research direction. In this study, a dual watermarking scheme based on the SRU-ConvNeXt V2 (SCNeXt) model and the exponential iterative-cubic-cosine (EICC) chaotic map is proposed for medical image integrity verification, tamper localization, and copyright protection. A logo image for integrity verification is embedded into the region of interest within the medical image, and a text image containing copyright information is combined with the feature vectors extracted by SCNeXt to generate zero-watermark information. The security of the watermarks is strengthened through a pre-embedding encryption algorithm that uses the chaotic sequence produced by the EICC map. A comprehensive set of experiments validates the proposed dual watermarking scheme. The results demonstrate that the scheme offers significant advantages in both imperceptibility and robustness over traditional methods, including those that rely on manually extracted medical image features. The scheme achieves excellent imperceptibility, with an average PSNR of 52.29 dB and an average SSIM of 0.9962. Moreover, it shows strong resilience against various attacks, particularly high-strength common and geometric attacks, maintaining an NC value above 0.84, which confirms its robustness. These findings highlight the superiority of the proposed dual watermarking scheme, establishing its potential as an advanced solution for secure and reliable medical image management.
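The pre-embedding encryption pattern can be illustrated generically: a key-seeded chaotic sequence XOR-scrambles the watermark bits, and the same keystream decrypts them. The EICC map is specific to the paper, so the classic logistic map below is a labeled stand-in, not the authors' map.

```python
# Chaotic XOR scrambling sketch; the logistic map stands in for EICC.
import numpy as np

def logistic_keystream(n, x0=0.3141, r=3.99, burn_in=1000):
    x, out = x0, np.empty(n, dtype=np.uint8)
    for _ in range(burn_in):            # discard the transient
        x = r * x * (1 - x)
    for i in range(n):
        x = r * x * (1 - x)
        out[i] = int(x * 256) & 0xFF    # quantize chaotic state to one byte
    return out

def xor_scramble(watermark_bytes, x0=0.3141):
    ks = logistic_keystream(len(watermark_bytes), x0=x0)
    return np.bitwise_xor(watermark_bytes, ks)   # the same call descrambles

wm = np.frombuffer(b"copyright:hospital-A", dtype=np.uint8)  # toy payload
enc = xor_scramble(wm)
assert bytes(xor_scramble(enc)) == bytes(wm)     # round-trip recovers it
```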
{"title":"Dual medical image watermarking using SRU-enhanced network and EICC chaotic map","authors":"Fei Yan, Zeqian Wang, Kaoru Hirota","doi":"10.1007/s40747-024-01723-6","DOIUrl":"https://doi.org/10.1007/s40747-024-01723-6","url":null,"abstract":"<p>With the rapid advancement of next-generation information technology, smart healthcare has seamlessly integrated into various facets of people’s daily routines. Accordingly, enhancing the integrity and security of medical images has gained significant prominence as a crucial research trajectory. In this study, a dual watermarking scheme based on SRU-ConvNeXt V2 (SCNeXt) model and exponential iterative-cubic-cosine (EICC) chaotic map is proposed for medical image integrity verification, tamper localization, and copyright protection. A logo image for integrity verification is embedded into the region of interest within the medical image, and a text image containing copyright information is combined with the feature vectors extracted by SCNeXt for generating zero-watermark information. The security of watermarks is strengthened through a pre-embedding encryption algorithm using the chaotic sequence produced by the EICC map. A comprehensive set of experiments was conducted to validate the proposed dual watermarking scheme. The results demonstrate that the scheme offers significant advantages in both imperceptibility and robustness over traditional methods, including those that rely on manual extraction of medical image features. The scheme achieves excellent imperceptibility, with an average PSNR of 52.29 dB and an average SSIM of 0.9962. Moreover, it displays strong resilience against various attacks, particularly high-strength common and geometric attacks, maintaining an NC value above 0.84, which confirms its robustness. These findings highlight the superiority of the proposed dual watermarking scheme, establishing its potential as an advanced solution for secure and reliable medical image management.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"19 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142888259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rethinking spatial-temporal contrastive learning for urban traffic flow forecasting: multi-level augmentation framework
Pub Date: 2024-12-28 | DOI: 10.1007/s40747-024-01754-z
Lin Pan, Qianqian Ren, Zilong Li, Xingfeng Lv
Graph neural networks that integrate contrastive learning have attracted growing attention in urban traffic flow forecasting. However, most existing graph contrastive learning methods do not capture local-global spatial dependencies well, nor do they design contrastive learning schemes for both the spatial and temporal dimensions. We argue that these methods cannot adequately extract spatial-temporal features and are easily affected by data noise. In light of these challenges, this paper proposes UrbanGCL, an innovative Urban spatial-temporal Graph Contrastive Learning framework, to improve the accuracy of urban traffic flow forecasting. Specifically, UrbanGCL applies multi-level data augmentation to address data noise and incompleteness and to learn both local and global topology features. The augmented traffic feature matrices and adjacency matrices are then fed into a simple yet effective dual-branch network with shared parameters to capture spatial-temporal correlations within traffic sequences. Moreover, we introduce spatial and temporal contrastive learning auxiliary tasks to alleviate the sparsity of the supervision signal and extract the most critical spatial-temporal information. Extensive experimental results on four real-world urban datasets demonstrate that UrbanGCL significantly outperforms other state-of-the-art methods, with a maximum improvement of nearly 8.80%.
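Contrastive auxiliary tasks of this kind typically use an InfoNCE-style objective in which the two augmented views of each node are positives and all other nodes in the batch are negatives. The sketch below shows that standard form; UrbanGCL's exact losses are not specified here.

```python
# Standard InfoNCE loss over two augmented views of the same N nodes.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    # z1, z2: (N, d) embeddings of two views; positives sit on the diagonal
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau              # (N, N) scaled cosine similarities
    labels = torch.arange(z1.size(0))       # view i of node i matches view 2
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(32, 64), torch.randn(32, 64))
```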
{"title":"Rethinking spatial-temporal contrastive learning for Urban traffic flow forecasting: multi-level augmentation framework","authors":"Lin Pan, Qianqian Ren, Zilong Li, Xingfeng Lv","doi":"10.1007/s40747-024-01754-z","DOIUrl":"https://doi.org/10.1007/s40747-024-01754-z","url":null,"abstract":"<p>Graph neural networks integrating contrastive learning have attracted growing attention in urban traffic flow forecasting. However, most existing graph contrastive learning methods do not perform well in capturing local–global spatial dependencies or designing contrastive learning schemes for both spatial and temporal dimensions. We argue that these methods can not well extract the spatial-temporal features and are easily affected by data noise. In light of these challenges, this paper proposes an innovative <u>Urban</u> Spatial-Temporal <u>G</u>raph <u>C</u>ontrastive <u>L</u>earning framework (UrbanGCL) to improve the accuracy of urban traffic flow forecasting. Specifically, UrbanGCL proposes multi-level data augmentation to address data noise and incompleteness, learn both local and global topology features. The augmented traffic feature matrices and adjacency matrices are then fed into a simple yet effective dual-branch network with shared parameters to capture spatial-temporal correlations within traffic sequences. Moreover, we introduce spatial and temporal contrastive learning auxiliary tasks to alleviate the sparsity of supervision signal and extract the most critical spatial-temporal information. Extensive experimental results on four real-world urban datasets demonstrate that UrbanGCL significantly outperforms other state-of-the-art methods, with the maximum improvement reaching nearly 8.80%.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"33 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142888937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sample-prototype optimal transport-based universal domain adaptation for remote sensing image classification
Pub Date: 2024-12-28 | DOI: 10.1007/s40747-024-01747-y
Xiaosong Chen, Yongbo Yang, Dong Liu, Shengsheng Wang
In recent years, there has been growing interest in domain adaptation for remote sensing image scene classification, particularly in universal domain adaptation, where the source and target domains each possess their own private categories. Existing methods often lack precision on remote sensing image datasets due to insufficient prior knowledge shared between the source and target domains. This study aims to effectively distinguish between common and private classes despite large intra-class and small inter-class sample discrepancies in remote sensing images. To address these challenges, we propose Sample-Prototype Optimal Transport-based universal domain adaptation (SPOT). The proposed approach comprises two key components. First, we use an unbalanced optimal transport algorithm together with a sample-complement mechanism to identify common and private classes based on the optimal transport assignment matrix. Second, we leverage the optimal transport algorithm to enhance discriminability among different classes while promoting similarity within the same class. Experimental results demonstrate that SPOT significantly enhances classification accuracy and robustness in universal domain adaptation for remote sensing images, underscoring its efficacy in addressing the identified challenges.
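The transport assignment matrix at the heart of such methods is usually computed with entropic Sinkhorn iterations. The sketch below is the balanced baseline on toy sample-to-prototype costs; the paper's unbalanced variant relaxes the marginal constraints, which this sketch does not do.

```python
# Entropic (balanced) Sinkhorn iterations producing a transport plan.
import numpy as np

def sinkhorn(cost, a, b, eps=0.05, n_iter=200):
    # cost: (n, m) sample-to-prototype costs; a, b: marginal weights
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)              # alternate scaling of the two marginals
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # plan: rows sum to a, columns to b

n, m = 8, 3
P = sinkhorn(np.random.rand(n, m), np.full(n, 1 / n), np.full(m, 1 / m))
```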
{"title":"Sample-prototype optimal transport-based universal domain adaptation for remote sensing image classification","authors":"Xiaosong Chen, Yongbo Yang, Dong Liu, Shengsheng Wang","doi":"10.1007/s40747-024-01747-y","DOIUrl":"https://doi.org/10.1007/s40747-024-01747-y","url":null,"abstract":"<p>In recent years, there is a growing interest in domain adaptation for remote sensing image scene classification, particularly in universal domain adaptation, where both source and target domains possess their unique private categories. Existing methods often lack precision on remote sensing image datasets due to insufficient prior knowledge between the source and target domains. This study aims to effectively distinguish between common and private classes despite large intra-class sample discrepancies and small inter-class sample discrepancies in remote sensing images. To address these challenges, we propose Sample-Prototype Optimal Transport-Based Universal Domain Adaptation (SPOT). The proposed approach comprises two key components. Firstly, we utilize an unbalanced optimal transport algorithm along with a sample complement mechanism to identify common and private classes based on the optimal transport assignment matrix. Secondly, we leverage the optimal transport algorithm to enhance discriminability among different classes while promoting similarity within the same class. Experimental results demonstrate that SPOT significantly enhances classification accuracy and robustness in universal domain adaptation for remote sensing images, underscoring its efficacy in addressing the identified challenges.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"313 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142888350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rain removal method for single image of dual-branch joint network based on sparse transformer
Pub Date: 2024-12-28 | DOI: 10.1007/s40747-024-01711-w
Fangfang Qin, Zongpu Jia, Xiaoyan Pang, Shan Zhao
In response to image degradation caused by rain during image acquisition, this paper proposes a single-image rain removal method built on a dual-branch joint network with a sparse Transformer (DBSTNet). The model comprises a rain removal subnet and a background recovery subnet: the former extracts rain-streak information using a rain removal strategy, while the latter employs this information to restore background details. Furthermore, a U-shaped encoder-decoder branch (UEDB) focuses on local features to mitigate the impact of rainwater on background textures; it incorporates a feature refinement unit so that the channel attention mechanism contributes fully to recovering local detail. Additionally, since tokens with low relevance in the Transformer may degrade image recovery, this study introduces a residual sparse Transformer branch (RSTB) to overcome the limitations of the convolutional neural network's (CNN's) receptive field. RSTB preserves only the most valuable self-attention values for feature aggregation, facilitating high-quality image reconstruction from a global perspective. Finally, the parallel dual-branch joint module composed of the RSTB and UEDB branches effectively captures local context and global structure, culminating in a clear background image. Experimental validation on synthetic and real datasets demonstrates that the derained images retain richer detail, significantly improving the overall visual effect.
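The "preserve only the most valuable self-attention values" idea can be illustrated with top-k sparsification: per query, keep the k highest attention scores and mask the rest before the softmax, so low-relevance tokens cannot pollute the aggregation. The k value and shapes below are assumptions, not DBSTNet's configuration.

```python
# Top-k sparse self-attention sketch (k and shapes are illustrative).
import torch

def topk_sparse_attention(q, k, v, keep=8):
    # q, k, v: (B, N, d)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)   # (B, N, N)
    kth = scores.topk(keep, dim=-1).values[..., -1:]         # per-query cutoff
    scores = scores.masked_fill(scores < kth, float("-inf")) # drop the rest
    return torch.softmax(scores, dim=-1) @ v

out = topk_sparse_attention(torch.randn(2, 16, 32),
                            torch.randn(2, 16, 32),
                            torch.randn(2, 16, 32))
```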
{"title":"Rain removal method for single image of dual-branch joint network based on sparse transformer","authors":"Fangfang Qin, Zongpu Jia, Xiaoyan Pang, Shan Zhao","doi":"10.1007/s40747-024-01711-w","DOIUrl":"https://doi.org/10.1007/s40747-024-01711-w","url":null,"abstract":"<p>In response to image degradation caused by rain during image acquisition, this paper proposes a rain removal method for single image of dual-branch joint network based on a sparse Transformer (DBSTNet). The developed model comprises a rain removal subnet and a background recovery subnet. The former extracts rain trace information utilizing a rain removal strategy, while the latter employs this information to restore background details. Furthermore, a U-shaped encoder-decoder branch (UEDB) focuses on local features to mitigate the impact of rainwater on background detail textures. UEDB incorporates a feature refinement unit to maximize the contribution of the channel attention mechanism in recovering local detail features. Additionally, since tokens with low relevance in the Transformer may influence image recovery, this study introduces a residual sparse Transformer branch (RSTB) to overcome the limitations of the Convolutional Neural Network’s (CNN’s) receptive field. Indeed, RSTB preserves the most valuable self-attention values for the aggregation of features, facilitating high-quality image reconstruction from a global perspective. Finally, the parallel dual-branch joint module, composed of RSTB and UEDB branches, effectively captures the local context and global structure, culminating in a clear background image. Experimental validation on synthetic and real datasets demonstrates that rain removal images exhibit richer detail information, significantly improving the overall visual effect.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"25 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2024-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142888980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}