Pub Date: 2024-08-11 DOI: 10.1007/s13042-024-02295-0
Peiqiu Yu, Lei Chen, Weiwei Li, Xiuyi Jia
Label distribution learning is an effective approach to label polysemy in machine learning. In contrast to multi-label learning, it can accurately represent the relative importance of labels and carries richer semantic information about them. Current label distribution learning algorithms frequently integrate label correlation into their models to narrow the hypothesis space. However, existing work on label correlation uses one-to-one or many-to-one correlations, which are limited in representing more complex relationships. To address this issue, we attempt to extend the existing correlations to many-to-many relationships. Specifically, we first construct a many-to-many correlation mining framework based on self-representation. We then design a label distribution learning algorithm that uses the learned many-to-many correlation. Our algorithm achieved the best performance in 78.21% of cases across all datasets and performance metrics, and it had the best average ranking. It also demonstrated statistical superiority over the comparison algorithms in pairwise two-tailed t-tests. This paper introduces a novel approach to representing and applying label correlations in label distribution learning; exploiting this many-to-many correlation can enhance the representational capabilities of label distribution learning models.
Title: Label distribution learning via second-order self-representation. Journal: International Journal of Machine Learning and Cybernetics (IF 5.6).
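As background for the self-representation idea mentioned in the abstract, the block below writes out a generic ridge-regularized self-representation of a label distribution matrix. It is a textbook formulation under stated assumptions, not the second-order, many-to-many model of the paper: the symbols $D$ (an $n \times c$ matrix of label distributions), $C$ (a $c \times c$ coefficient matrix), and $\lambda$ (a regularization weight) are introduced here for illustration only.

```latex
% Generic self-representation: each label column of D is approximated by a
% linear combination of the label columns of D (assumed notation).
\min_{C \in \mathbb{R}^{c \times c}} \; \lVert D - DC \rVert_F^2 + \lambda \lVert C \rVert_F^2

% Setting the gradient to zero:
%   -2 D^\top (D - DC) + 2 \lambda C = 0
\quad\Longrightarrow\quad
C = \left( D^\top D + \lambda I \right)^{-1} D^\top D
```

In this generic form, column $j$ of $C$ describes how label $j$ is reconstructed from all labels, which corresponds to a many-to-one correlation. Practical variants usually add a constraint such as $\operatorname{diag}(C) = 0$ so that the solution does not collapse toward the identity.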
Pub Date: 2024-08-06 DOI: 10.1007/s13042-024-02289-y
Qing Liu, Hao Wu, Yu Zong, Zheng-Yu Liu
To speed up traditional nuclear norm minimization methods, a fast tri-factorization method (FTF) was recently proposed for matrix completion, and it has received widespread attention in machine learning, image processing, and signal processing. However, its low convergence accuracy has become increasingly apparent, limiting its further application. To improve the accuracy of FTF, this paper proposes a generalized tri-factorization method (GTF). In GTF, the nuclear norm minimization model of FTF is replaced by a novel $L_{1,p}$ ($0 < p < 2$) norm minimization model that can be optimized very efficiently using QR decomposition. Since the $L_{1,p}$ norm is a tighter relaxation of the rank function than the nuclear norm, the GTF method is much more accurate than the traditional methods. The experimental results demonstrate that GTF is more accurate and faster than the state-of-the-art methods.
Title: A generalized tri-factorization method for accurate matrix completion. Journal: International Journal of Machine Learning and Cybernetics (IF 5.6).
Pub Date: 2024-08-06 DOI: 10.1007/s13042-024-02302-4
Yang Yang, Yuchao Gao, Zijin Wang, Xi’an Li, Hu Zhou, Jinran Wu
Accurate short-term load forecasting (STLF) is crucial for the power system. Traditional methods generally use signal decomposition techniques for feature extraction. However, these methods are limited in extrapolation performance, and the parameter governing the decomposition modes must be preset. To this end, this paper develops a novel STLF algorithm based on multi-scale perspective decomposition. The proposed algorithm adopts a multi-scale deep neural network (MscaleDNN) to decompose the load series into low- and high-frequency components. To handle outliers in the load series, the paper introduces the adaptive rescaled lncosh (ARlncosh) loss, which fits the distribution of the load data and improves robustness. Furthermore, an attention mechanism (ATTN) extracts the correlations between different time points. On two power load data sets from Portugal and Australia, the proposed model generates competitive forecasting results.
Title: Multiscale-integrated deep learning approaches for short-term load forecasting. Journal: International Journal of Machine Learning and Cybernetics (IF 5.6).
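The robustness argument behind an lncosh-type loss can be illustrated with the plain ln-cosh function, which behaves like a squared error for small residuals and like an absolute error for large ones. The sketch below shows only this base loss; the adaptive rescaling that defines ARlncosh is not specified in the abstract and is not reproduced here. The function names `lncosh` and `mean_lncosh_loss` are hypothetical.

```python
import math


def lncosh(residual: float) -> float:
    """Numerically stable ln(cosh(r)).

    Uses the identity ln(cosh(r)) = |r| + ln(1 + exp(-2|r|)) - ln(2),
    which avoids overflow in cosh for large residuals.
    """
    a = abs(residual)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def mean_lncosh_loss(y_true, y_pred):
    """Average ln-cosh loss over paired observations and forecasts."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(lncosh(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For a small residual the loss is close to half the squared residual, while for a large residual it grows only linearly, so a single outlier in the load series contributes far less than it would under a squared error.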
Heterogeneous graph neural networks have attracted considerable attention for their proficiency in handling intricate heterogeneous structures. However, most existing methods model semantic relationships in heterogeneous graphs by manually defining meta-paths, overlooking the inherent incompleteness of such graphs. To address this issue, we propose a multi-graph aggregated graph neural network (MGAGNN) for heterogeneous graph representation learning, which simultaneously leverages attribute similarity and high-order semantic information between nodes. Specifically, MGAGNN first employs a feature graph generator to produce a feature graph that completes the original graph structure. A semantic graph generator then produces a semantic graph, capturing higher-order semantic information through automatic meta-path learning. Finally, we aggregate the two candidate graphs to reconstruct a new heterogeneous graph and learn node embeddings with graph convolutional networks. Extensive experiments on real-world datasets demonstrate the superior performance of the proposed method over state-of-the-art approaches.
Pub Date: 2024-08-05 DOI: 10.1007/s13042-024-02294-1
Shuailei Zhu, Xiaofeng Wang, Shuaiming Lai, Yuntao Chen, Wenchao Zhai, Daying Quan, Yuanyuan Qi, Laishui Lv
Title: Multi-graph aggregated graph neural network for heterogeneous graph representation learning. Journal: International Journal of Machine Learning and Cybernetics (IF 5.6).
Pub Date: 2024-08-04 DOI: 10.1007/s13042-024-02281-6
Linxin Wei, Quanxing Xu, Ziyu Hu
Path planning for unmanned mobile robots has always been a crucial issue, especially in unknown environments. Reinforcement learning is widely used in path planning because it can learn from unknown environments. However, in unknown environments, deep reinforcement learning algorithms suffer from problems such as long training time and instability. In this article, the deep deterministic policy gradient (DDPG) algorithm is improved to address these issues. First, the experience pool is divided into several experience pools based on the difference between adjacent states. Second, experience is drawn from the various pools in different proportions for training, enabling the robot to achieve good obstacle avoidance. Finally, a guided reward function is designed to improve the convergence speed of the algorithm so that the robot can find the target point faster. The algorithm has been tested in practice and in simulation, and the results show that it enables robots to complete path planning tasks in complex unknown environments.
Title: Mobile robot path planning based on multi-experience pool deep deterministic policy gradient in unknown environment. Journal: International Journal of Machine Learning and Cybernetics (IF 5.6).
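The two replay-related steps in the abstract (splitting experience by the difference between adjacent states, then sampling from the pools in fixed proportions) can be sketched as below. This is a minimal illustration, not the authors' implementation: the class name `MultiPoolReplay`, the use of exactly two pools, the maximum absolute coordinate change as the state-difference measure, and the `threshold` and `ratios` parameters are all assumptions made for the example.

```python
import random
from collections import deque


class MultiPoolReplay:
    """Replay buffer split into two pools by state change (hypothetical design)."""

    def __init__(self, capacity, threshold, ratios=(0.5, 0.5), seed=None):
        # pools[0] holds transitions with a small state change,
        # pools[1] holds transitions with a large state change.
        self.pools = [deque(maxlen=capacity), deque(maxlen=capacity)]
        self.threshold = threshold
        self.ratios = ratios
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        # Assumed difference measure: largest absolute coordinate change.
        diff = max(abs(a - b) for a, b in zip(state, next_state))
        index = 1 if diff > self.threshold else 0
        self.pools[index].append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw from each pool in proportion to its ratio; a pool that holds
        # fewer transitions than requested contributes everything it has,
        # so the batch can be smaller than batch_size early in training.
        batch = []
        for pool, ratio in zip(self.pools, self.ratios):
            count = min(len(pool), round(batch_size * ratio))
            batch.extend(self.rng.sample(list(pool), count))
        self.rng.shuffle(batch)
        return batch
```

Keeping the pools separate lets the sampling ratio, rather than the raw frequency of each kind of transition, decide how often rare but informative experiences appear in a training batch.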
Pub Date: 2024-07-26 DOI: 10.1007/s13042-024-02278-1
Zhongxu Li, Qihan He, Hong Zhao, Wenyuan Yang
Unmanned aerial vehicles (UAVs) are extensively applied in military operations, rescue operations, and traffic detection because of their flexibility, low cost, and autonomous flight capabilities. However, due to the drone’s flight height and shooting angle, the objects in aerial images are smaller, denser, and more complex than those in general images, which leads to unsatisfactory target detection. In this paper, we propose a model for UAV detection called DoubleM-Net, which contains a multi-scale spatial pyramid pooling-fast (MS-SPPF) module and a Multi-Path Adaptive Feature Pyramid Network (MPA-FPN). DoubleM-Net utilizes the MS-SPPF module to extract feature maps with multiple receptive field sizes; the MPA-FPN module then fuses features from every two adjacent scales, followed by a level-by-level interactive fusion of features. First, using the backbone network as the feature extractor, multiple feature maps of different scale ranges are extracted from the input image. Second, MS-SPPF applies pooling kernels of different sizes repeatedly at various scales to obtain rich multi-receptive-field features. Finally, the MPA-FPN module fuses semantic information between each pair of adjacent scale layers. The top-level features are then passed back down level by level to enhance the lower-level features, enabling interaction and integration of features at different scales. The experimental results show that DoubleM-Net achieves an mAP50-95 of 27.5% on the VisDrone dataset, and 55.0% and 60.4% on the DroneVehicle dataset in RGB and infrared modes, respectively. Our model demonstrates excellent performance in air-to-ground image detection tasks, with exceptional results in detecting small objects.
Title: Doublem-net: multi-scale spatial pyramid pooling-fast and multi-path adaptive feature pyramid network for UAV detection. Journal: International Journal of Machine Learning and Cybernetics (IF 5.6).
Pub Date: 2024-07-26 DOI: 10.1007/s13042-024-02284-3
Zhenyang Huang, Yixing Zhao, Jinjiang Li, Yepeng Liu
Skin lesion segmentation is a fundamental task in medical image analysis. Deep learning approaches have become essential tools for segmenting medical images, as their accuracy in analyzing abnormalities plays a critical role in determining the ultimate diagnostic results. Because of the inherent difficulties presented by medical images, including variations in lesion shape and size and the indistinct boundaries between lesions and the surrounding background, certain conventional algorithms struggle to meet the growing accuracy requirements of medical image processing. To better capture the edge features and fine details of lesions, this paper presents the Boundary-Prior-Guided Multi-Scale Aggregation Network for skin lesion segmentation (BGMAN). The proposed BGMAN follows a basic Encoder–Decoder structure, wherein the encoder network employs prevalent CNN-based architectures to capture semantic information. We propose the Transformer Bridge Block (TBB) and employ it to enhance the multi-scale features captured by the encoder. The TBB strengthens weak feature information and establishes long-distance relationships between features. To augment BGMAN’s capability to identify boundaries, a boundary-guided decoder is designed, utilizing the Boundary Aware Block (BAB) and the Cross Scale Fusion Block (CSFB) to guide the decoding learning process. BAB acquires features embedded with explicit boundary information under the supervision of a boundary mask, while CSFB aggregates boundary features from different scales using learnable embeddings. The proposed method has been validated on the ISIC2016, ISIC2017, and $PH^2$ datasets. It outperforms current mainstream networks with the following results: F1 92.99 and IoU 87.71 on ISIC2016, F1 86.42 and IoU 78.34 on ISIC2017, and F1 94.83 and IoU 90.26 on $PH^2$.
Title: Bgman: Boundary-Prior-Guided Multi-scale Aggregation Network for skin lesion segmentation. Journal: International Journal of Machine Learning and Cybernetics (IF 5.6).
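For readers unfamiliar with the two metrics reported above, the sketch below computes F1 (equivalent to the Dice coefficient for binary masks) and IoU for a single predicted mask against a ground-truth mask. It is a generic illustration, not the evaluation code of the paper: the function name `f1_and_iou` is hypothetical, masks are assumed to be nested lists of 0/1 values, and returning 1.0 when both masks are empty is a convention chosen here.

```python
def f1_and_iou(pred, target):
    """Return (F1, IoU) for two binary masks given as nested 0/1 lists."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # lesion pixel correctly predicted
            elif p and not t:
                fp += 1  # background predicted as lesion
            elif t and not p:
                fn += 1  # lesion pixel missed
    f1_denominator = 2 * tp + fp + fn
    iou_denominator = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    f1 = 2 * tp / f1_denominator if f1_denominator else 1.0
    iou = tp / iou_denominator if iou_denominator else 1.0
    return f1, iou
```

Dataset-level scores such as those in the abstract are typically obtained by aggregating per-image values, and the aggregation scheme affects the final numbers, so figures from different papers are comparable only when the same protocol is used.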
Pub Date: 2024-07-26 DOI: 10.1007/s13042-024-02286-1
Mengxi Yang, Dai Shi, Xuebin Zheng, Jie Yin, Junbin Gao
This paper aims to provide a novel design of a multiscale framelet convolution for spectral graph neural networks (GNNs). While current spectral methods excel in various graph learning tasks, they often lack the flexibility to adapt to noisy, incomplete, or perturbed graph signals, making them fragile in such conditions. Our newly proposed framelet convolution addresses these limitations by decomposing graph data into low-pass and high-pass spectra through a finely-tuned multiscale approach. Our approach directly designs filtering functions within the spectral domain, allowing for precise control over the spectral components. The proposed design excels in filtering out unwanted spectral information and significantly reduces the adverse effects of noisy graph signals. Our approach not only enhances the robustness of GNNs but also preserves crucial graph features and structures. Through extensive experiments on diverse, real-world graph datasets, we demonstrate that our framelet convolution achieves superior performance in node classification tasks. It exhibits remarkable resilience to noisy data and adversarial attacks, highlighting its potential as a robust solution for real-world graph applications. This advancement opens new avenues for more adaptive and reliable spectral GNN architectures.
Title: Quasi-framelets: robust graph neural networks via adaptive framelet convolution. Journal: International Journal of Machine Learning and Cybernetics (IF 5.6).
Pub Date: 2024-07-24 DOI: 10.1007/s13042-024-02282-5
Yiming Wang, Xiaolong Chen, Yi Chai, Kaixiong Xu, Yutao Jiang, Bowen Liu
Dual-mode 24/7 monitoring systems continuously capture visible and infrared images of a real scene. However, differences such as color and texture between these cross-modality images pose challenges for visible-infrared person re-identification (ReID). Current methods generally rely on modality-shared feature learning or on modality-specific information compensation based on style transfer, but the modality differences often result in the inevitable loss of valuable feature information during training. To address this issue, a complementary feature fusion and identity consistency learning (CFF-ICL) method is proposed. On the one hand, a multiple feature fusion mechanism based on cross attention encourages the features extracted by two groups of networks from the same-modality image to be more clearly complementary, improving the comprehensiveness of the feature information. On the other hand, a collaborative adversarial mechanism between dual discriminators and the feature extraction network is designed to remove the modality differences and then establish identity consistency between visible and infrared images. Experimental results on the SYSU-MM01 and RegDB datasets verify the method’s effectiveness and superiority.
Yiming Wang, Xiaolong Chen, Yi Chai, Kaixiong Xu, Yutao Jiang, Bowen Liu. "Visible-infrared person re-identification with complementary feature fusion and identity consistency learning." International Journal of Machine Learning and Cybernetics, 2024-07-24. https://doi.org/10.1007/s13042-024-02282-5
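The cross-attention fusion idea in the abstract above can be illustrated with a small NumPy sketch. This is not the CFF-ICL implementation: the abstract does not give the architecture, so the function name `cross_attention_fuse`, the residual connection, the final concatenation, and the absence of learned projection matrices are all assumptions made for illustration. The sketch only shows how two feature sets extracted from the same image can each attend to the other so that the fused result carries information from both.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(feat_a, feat_b):
    """Fuse two (n_tokens, dim) feature sets from the same image.

    Hypothetical helper: each branch queries the other, and the attended
    result is added back as a residual so each branch keeps its own
    information. No learned projections are used in this sketch.
    """
    d = feat_a.shape[-1]
    # Branch A attends to branch B.
    attn_ab = softmax(feat_a @ feat_b.T / np.sqrt(d))
    a_enriched = feat_a + attn_ab @ feat_b
    # Branch B attends to branch A.
    attn_ba = softmax(feat_b @ feat_a.T / np.sqrt(d))
    b_enriched = feat_b + attn_ba @ feat_a
    # Concatenate both enriched views along the channel axis.
    return np.concatenate([a_enriched, b_enriched], axis=-1)


# Random stand-ins for the outputs of two feature extractors.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(6, 8))
feat_b = rng.normal(size=(6, 8))
fused = cross_attention_fuse(feat_a, feat_b)
print(fused.shape)  # (6, 16)
```

In a trained model the queries, keys, and values would normally pass through learned linear layers first; they are omitted here to keep the mechanism visible.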
Pub Date : 2024-07-24 DOI: 10.1007/s13042-024-02285-2
Jie Hu, Yinglian Zhu, Lishan Wu, Qilei Luo, Fei Teng, Tianrui Li
Measuring the semantic similarity between two texts is fundamental to text semantic matching. Each word in the texts carries a weighted meaning, and the model must capture the most important knowledge effectively. However, current BERT-based text matching methods have limitations in acquiring professional domain knowledge. BERT requires extensive domain-specific training data to perform well in specialized fields such as medicine, where labeled data is hard to obtain. In addition, current text matching models that inject domain knowledge often rely on creating new training tasks to fine-tune the model, which is time-consuming. Although existing works have injected domain knowledge directly into BERT through similarity matrices, they struggle with the small sample sizes typical of professional fields. Contrastive learning trains a representation learning model by generating similar and dissimilar instances, so that a more general representation can be learned from a small number of samples. In this paper, we propose to integrate the word similarity matrix directly into BERT’s multi-head attention mechanism under a contrastive learning framework to align similar words during training. Furthermore, for Chinese medical applications, we propose an entity MASK approach to improve pre-trained models’ understanding of medical terms. The proposed method helps BERT acquire domain knowledge and thus learn better text representations in professional fields. Extensive experiments show that the algorithm significantly improves the performance of the text matching model, especially when training data is limited.
Jie Hu, Yinglian Zhu, Lishan Wu, Qilei Luo, Fei Teng, Tianrui Li. "Text semantic matching algorithm based on the introduction of external knowledge under contrastive learning." International Journal of Machine Learning and Cybernetics, 2024-07-24. https://doi.org/10.1007/s13042-024-02285-2
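One plausible way to read "integrating a word similarity matrix into multi-head attention" is as an additive bias on the attention scores. The abstract above does not say how the paper does it, so the sketch below is an assumption, not the authors' method: the function `similarity_biased_attention`, the weight `alpha`, and the single-head setup are hypothetical and chosen only to make the idea concrete.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def similarity_biased_attention(q, k, v, sim, alpha=1.0):
    """Single-head scaled dot-product attention with an additive prior.

    q, k, v: arrays of shape (seq_len, dim).
    sim:     (seq_len, seq_len) external word-similarity matrix.
    alpha:   hypothetical weight controlling how strongly the prior
             shifts the attention scores (alpha=0 gives plain attention).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d) + alpha * sim
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))

# Toy prior: tokens 0 and 2 are marked as similar (e.g. synonymous terms).
sim = np.zeros((4, 4))
sim[0, 2] = sim[2, 0] = 2.0

_, plain = similarity_biased_attention(q, k, v, sim, alpha=0.0)
_, biased = similarity_biased_attention(q, k, v, sim, alpha=1.0)

# The prior raises the attention token 0 pays to token 2.
print(biased[0, 2] > plain[0, 2])  # True
```

Because the bias is added before the softmax, each row of attention weights still sums to one; the prior only redistributes attention toward word pairs that the external knowledge marks as similar.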