Pub Date: 2024-11-26; DOI: 10.1016/j.knosys.2024.112765
Ali Atghaei, Mohammad Rahmati
This article addresses the challenge of adapting deep learning models trained on specific datasets so that they generalize effectively to datasets that share the same classes but follow different underlying distributions. We introduce a novel deep representation learning method for domain generalization that takes into account both the statistical and the geometric properties of features. Our approach uses Fourier augmentation and Nyström estimation to evaluate the similarity between graphs derived from original and augmented data features. Furthermore, we employ a contrastive loss function to keep samples of the same class close together while separating samples of different classes in the feature space. By minimizing these loss functions, our method aims to enhance model generalizability across diverse domains. Comprehensive experiments on real-world benchmark datasets, including PACS, Office-Home, VLCS, Digits-DG and UTKFace, demonstrate the effectiveness of the proposed method. The results consistently indicate superior performance compared to other approaches under various conditions, underscoring its robustness in achieving improved generalization across domains.
Title: Domain generalization via geometric adaptation over augmented data. Journal: Knowledge-Based Systems (IF 7.2), vol. 309, Article 112765. Journal Article, published 2024-11-26.
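The abstract above does not give the form of its contrastive loss. As a minimal sketch, assuming a classic margin-based pairwise formulation (the function name, the `margin` parameter, and the toy features are all hypothetical, not the authors' code), the idea of pulling same-class samples together and pushing different-class samples apart can be illustrated as follows:

```python
import math


def pairwise_contrastive_loss(features, labels, margin=1.0):
    """Mean margin-based contrastive loss over all unordered sample pairs.

    Same-class pairs are penalized by their squared distance; different-class
    pairs are penalized only when they are closer than `margin`.
    This is an illustrative stand-in, not the loss used in the paper.
    """
    total, pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            dist = math.dist(features[i], features[j])
            if labels[i] == labels[j]:
                total += dist ** 2
            else:
                total += max(0.0, margin - dist) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy 2-D features: classes form tight clusters in the first set,
# and are interleaved in the second set.
labels = [0, 0, 1, 1]
well_separated = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0), (3.1, 0.0)]
mixed_up = [(0.0, 0.0), (3.0, 0.0), (0.1, 0.0), (3.1, 0.0)]

loss_good = pairwise_contrastive_loss(well_separated, labels)
loss_bad = pairwise_contrastive_loss(mixed_up, labels)
print(loss_good < loss_bad)
```

Minimizing such a loss favors feature spaces like the first toy set, where class clusters are compact and far apart.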
Pub Date: 2024-11-26; DOI: 10.1016/j.knosys.2024.112771
Shushuai Xie, Wei Cheng, Ji Xing, Xuefeng Chen, Zelin Nie, Qian Huang, Rongyong Zhang
Recently, fault diagnosis methods based on domain generalization (DG) have been developed to improve diagnostic performance on unseen target domains through multi-source domain knowledge transfer. However, existing methods assume that the source domains are discrete and that domain labels are known a priori, assumptions that are difficult to satisfy in complex and changing industrial systems. In addition, gradient update conflicts caused by source-domain-specific information degrade DG performance. Therefore, in this study, we relax the discrete domain assumption to a mixed-domain setting and propose a novel gradient-consistency strategy cooperative meta-feature learning method for mixed-domain generalized machine fault diagnosis. First, a domain feature-guided adaptive normalization module is proposed to normalize the underlying distribution of multi-source domains, and the mixed-source domains are divided into potential domain clusters. Then, a novel meta-feature encoding method is proposed to explicitly encode the overall fault feature structure, which is used to learn the generalized fault feature representation. Finally, a novel gradient consistency update strategy is designed to reduce the impact of domain-specific differences on model generalization. The effectiveness and superiority of the proposed method are verified on many DG diagnostic tasks using two public bearing datasets and a nuclear circulating water pump planetary gearbox dataset.
Title: Gradient consistency strategy cooperative meta-feature learning for mixed domain generalized machine fault diagnosis. Journal: Knowledge-Based Systems (IF 7.2), vol. 309, Article 112771. Journal Article, published 2024-11-26.
Pub Date: 2024-11-26; DOI: 10.1016/j.knosys.2024.112772
Jili Xia, Lihuo He, Xinbo Gao, Bo Hu
Images taken in natural environments often exhibit complicated distortions, posing significant challenges for assessing their quality. Although current methods prioritize the perception of image contents and distortions, few explicitly investigate local distortions, a crucial factor affecting human visual perception. To address this gap, this paper proposes a novel blind image quality assessment (IQA) method for in-the-wild images, termed DPSF, which integrates Distorted Patch Selection and multi-scale and multi-granularity feature Fusion. Specifically, it is first explained that the distributions of the mean subtracted contrast normalized (MSCN) coefficients of distorted patches differ from those of undistorted patches. Building upon this, an effective strategy for distorted patch selection is devised. Subsequently, a hybrid Transformer-convolutional neural network (CNN) module is proposed to exploit the benefits of both Transformer and CNN for distortion perception, in which the long-range dependencies of the selected patches are considered. Finally, an effective fusion module is employed for image quality evaluation, amalgamating finer and richer semantic and distortion features from multiple scales and granularities. Experimental results on five authentic IQA databases demonstrate that the proposed method yields more precise quality predictions than state-of-the-art methods.
Title: Blind image quality assessment for in-the-wild images by integrating distorted patch selection and multi-scale-and-granularity fusion. Journal: Knowledge-Based Systems (IF 7.2), vol. 309, Article 112772. Journal Article, published 2024-11-26.
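The mean subtracted contrast normalized coefficients mentioned in the abstract above are commonly computed as (I - mu) / (sigma + C), where mu and sigma are a local mean and standard deviation. The sketch below is a simplified illustration under stated assumptions: it uses a plain box window instead of the Gaussian window that is usual in the IQA literature, and the `radius` and `c` parameters are hypothetical defaults, not values from the paper.

```python
import math


def mscn(image, radius=1, c=1.0):
    """MSCN coefficients of a grayscale image given as a list of rows.

    Assumption: local statistics come from an unweighted (2*radius+1)^2 box
    window clipped at the borders; practical implementations often use a
    Gaussian-weighted window instead.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            # c keeps the division stable in flat regions where var is 0.
            out[y][x] = (image[y][x] - mean) / (math.sqrt(var) + c)
    return out


flat = [[128.0] * 4 for _ in range(4)]                # no local structure
edge = [[0.0, 0.0, 255.0, 255.0] for _ in range(4)]   # vertical step edge
flat_mscn = mscn(flat)
edge_mscn = mscn(edge)
print(edge_mscn[1][1] < 0 < edge_mscn[1][2])
```

A flat patch yields coefficients of exactly zero, while the step edge yields negative values on the dark side and positive values on the bright side. Differences of this kind in the coefficient distribution are what a patch selection strategy could exploit.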
Pub Date: 2024-11-23; DOI: 10.1016/j.knosys.2024.112764
Yufei Ge, Xiaoli Zhang, Bo Huang, Xiongfei Li, Siwei Ma
Pansharpening aims to obtain high-resolution multispectral images by fusing panchromatic and low-resolution multispectral images. However, current deep learning-based methods lack reasonable interpretability and suffer from certain spectral and spatial distortions. To address these issues, we propose an interpretable deep unfolding network based on intrinsic image decomposition (IID), called DUN-IID. IID decomposes the multispectral image into a reflectance component and a shading component to formulate a novel variational optimization function. The reflectance and shading components effectively reflect spectral and spatial information, respectively. This decoupling strategy enhances feature fidelity by preventing interference between spectral and spatial information, enabling independent optimization of spatial reconstruction and spectral correction during fusion. The optimization function is solved by the half-quadratic splitting method and unfolded into the end-to-end DUN-IID, which consists of two primal update blocks for prior learning and two dual update blocks for reconstruction. To alleviate the effects of information loss across intermediate stages, we introduce the source images into the two primal update blocks for information enhancement. Accordingly, the reflectance and shading primal update blocks are customized as a multi-scale structure and a band-aware construction, respectively. In addition, a multi-dimension attention mechanism is adopted to improve feature representation. Extensive experiments validate that our method is superior to other state-of-the-art methods.
Title: A deep unfolding network based on intrinsic image decomposition for pansharpening. Journal: Knowledge-Based Systems (IF 7.2), vol. 308, Article 112764. Journal Article, published 2024-11-23.
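For readers unfamiliar with the half-quadratic splitting (HQS) method named in the abstract above, the generic scheme can be sketched as follows. The symbols f (data-fidelity term), \Phi (prior), \lambda, \mu and the auxiliary variable z are illustrative assumptions; the paper's actual objective, built on reflectance and shading components, is not given in the abstract.

```latex
% Generic HQS sketch; f, \Phi, \lambda, \mu and z are illustrative symbols,
% not the notation of the paper.
\[
\begin{aligned}
&\min_{x}\; f(x) + \lambda\,\Phi(x)
\;\;\longrightarrow\;\;
\min_{x,z}\; f(x) + \lambda\,\Phi(z) + \frac{\mu}{2}\,\lVert x - z\rVert_2^2, \\
&z^{k+1} = \arg\min_{z}\; \lambda\,\Phi(z) + \frac{\mu}{2}\,\lVert x^{k} - z\rVert_2^2, \\
&x^{k+1} = \arg\min_{x}\; f(x) + \frac{\mu}{2}\,\lVert x - z^{k+1}\rVert_2^2 .
\end{aligned}
\]
```

In a deep unfolding network, each alternating update is typically replaced by a learnable block and a fixed number of iterations is stacked into an end-to-end model.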
Pub Date: 2024-11-23; DOI: 10.1016/j.knosys.2024.112782
Bo He, Ruoyu Zhao, Dali Tang
Aspect-based sentiment analysis (ABSA) models typically focus on learning contextual syntactic information and dependency relations. However, these models often lose or forget implicit feature information from shallow and intermediate layers during the learning process, potentially compromising classification performance. We consider the implicit feature information in each layer of the model to be equally important. Therefore, this paper proposes the CABiLSTM-BERT model, which aims to fully leverage implicit features at each layer to address this information loss problem and improve accuracy. The CABiLSTM-BERT model employs a frozen BERT pre-trained model to extract text word vector features, reducing overfitting and accelerating training. These word vectors are then processed through CABiLSTM, which preserves implicit feature representations of input sequences and LSTMs in each direction and layer. After highlighting important features through multi-head self-attention calculations for each feature group, the model applies convolution to merge all features into a set of embedding representations. This approach minimizes information loss and maximizes utilization of important implicit feature information at each layer. Finally, the feature representations undergo average pooling before passing through the sentiment classification layer for polarity prediction. The effectiveness of the CABiLSTM-BERT model is validated using five publicly available real-world datasets and evaluated using metrics such as accuracy and Macro-F1. Results demonstrate the model's efficacy in addressing ABSA tasks.
Title: CABiLSTM-BERT: Aspect-based sentiment analysis model based on deep implicit feature extraction. Journal: Knowledge-Based Systems (IF 7.2), vol. 309, Article 112782. Journal Article, published 2024-11-23.
Pub Date: 2024-11-23; DOI: 10.1016/j.knosys.2024.112738
Hengdong Zhu, Yingshan Shen, Choujun Zhan, Fu Lee Wang, Heng Weng, Tianyong Hao
The performance of graph-based clustering is commonly limited by two-stage processing (constructing and then partitioning the similarity graph) and by the quality of the similarity graph. To this end, we propose a new graph-based clustering method with dual-feature regularization and Laplacian rank constraint. Specifically, our method reveals the clustering structure and unifies the two-stage process. It imposes a Laplacian rank constraint on the similarity graph to ensure that it has C connected components. In addition, a method based on dual-feature regularization is designed to capture local data feature information from both feature extraction and adaptive regression, and is applied to learn an accurate distance metric. A reweighting optimization is integrated to learn a high-quality robust similarity graph. Comprehensive experiments on Ecoli, Yale and Yeast datasets show that our method outperforms the existing graph-based clustering methods with an average improvement of about 4%, 5% and 7% on the evaluation metrics ACC, NMI and RI, respectively.
Title: A new graph-based clustering method with dual-feature regularization and Laplacian rank constraint. Journal: Knowledge-Based Systems (IF 7.2), vol. 309, Article 112738. Journal Article, published 2024-11-23.
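The Laplacian rank constraint mentioned in the abstract above rests on a standard graph-theory fact: for an undirected graph with n nodes and non-negative edge weights, the Laplacian L = D - W has rank n - C exactly when the graph has C connected components. The sketch below only checks this fact on a toy graph with exact rational arithmetic. All function names are hypothetical, and it does not implement the paper's optimization.

```python
from fractions import Fraction


def laplacian(adjacency):
    """Graph Laplacian L = D - W for a symmetric adjacency matrix with zero diagonal."""
    n = len(adjacency)
    return [
        [
            Fraction(sum(adjacency[i])) if i == j else Fraction(-adjacency[i][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def matrix_rank(matrix):
    """Rank via Gaussian elimination; exact because entries are Fractions."""
    rows = [row[:] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def count_components(adjacency):
    """Number of connected components, found by depth-first search."""
    n, seen, components = len(adjacency), set(), 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(j for j in range(n) if adjacency[node][j] and j not in seen)
    return components


# Toy graph: a path 0-1-2 and a separate edge 3-4, so two components.
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
n_components = count_components(adjacency)
rank = matrix_rank(laplacian(adjacency))
print(rank == len(adjacency) - n_components)
```

Constraining the rank of the learned graph's Laplacian therefore fixes the number of components, so cluster assignments can be read directly from the graph without a separate partitioning stage.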
Pub Date: 2024-11-23; DOI: 10.1016/j.knosys.2024.112787
Jiayang Liu, Chaobing Wang, Rui Wang, Qian Xiao, Xiaosun Wang, Shijing Wu, Long Zhang
The measured signals of mechanical equipment may exhibit distribution discrepancies due to object variations or operational conditions. The multiscale framework has been demonstrated to be effective in enriching features. However, existing methods overlook the structural discrepancies and collaborative contributions at specific scales. Meanwhile, most current domain adaptation strategies predominantly consider class and domain labels, neglecting feature structural shifts, especially in subdomain feature structures. To address these limitations, this paper proposes a new cross-domain fault diagnosis method for mechanical equipment, called the multiscale adaptive graph adversarial network (MAGAN). MAGAN consists of a feature extractor, a domain adaptation module, and a classifier. In the feature extractor, a hierarchical residual multiscale graph learning module is employed to obtain rich features, an adaptive graph learning module is utilized to learn differentiated representations of specific scale structures, and a multiscale fusion module is applied to facilitate the collaboration of different scale features. After that, the domain adaptation module aids the feature extractor in learning transferable features by constructing a measure of subdomain feature structure discrepancy and an adversarial domain discriminator. The classifier is then utilized for cross-domain fault diagnosis on the extracted transferable features. Finally, the proposed MAGAN is evaluated using 3 cross-machine transfer scenarios based on a scaled-down test rig for a wind turbine gearbox and 12 cross-operating-condition transfer scenarios based on a published bearing dataset. The results validate the transferability and generalization of MAGAN.
Title: A novel multiscale adaptive graph adversarial network for mechanical fault diagnosis. Journal: Knowledge-Based Systems (IF 7.2), vol. 309, Article 112787. Journal Article, published 2024-11-23.
Pub Date: 2024-11-22; DOI: 10.1016/j.knosys.2024.112739
Rui Bing, Guan Yuan, Yanmei Zhang, Yong Zhou, Qiuyan Yan
Heterogeneous Graph Contrastive Learning (HGCL) has attracted considerable attention because it eliminates the need for node labels. The encoders used in HGCL are mainly message-passing based Heterogeneous Graph Neural Networks, which are vulnerable to edge perturbations. Recently, a few HGCL models have replaced the polluted graph with a KNN graph or a threshold graph, both of which have flaws: (1) every node in a KNN graph has the same degree K, which is irrational and loses the original structural features; (2) the threshold is selected manually, which hinders both effectiveness and interpretability. To tackle these issues, we propose an Automated Message Selection based Heterogeneous Graph Contrastive Learning (AMS-HGCL) model. We first set up a relation view and a meta-path view for contrast. Then, a robust encoder is proposed to defend against structural attacks in both views by automatically selecting harmless messages, without forcing all nodes to have the same number of neighbors. The learned probabilities of messages can show harmful features directly, which makes our model interpretable. Finally, we design a novel cross-view contrastive loss to optimize AMS-HGCL and output robust node representations. The experimental results on four real datasets demonstrate that AMS-HGCL is feasible and effective.
Title: Automated message selection for robust Heterogeneous Graph Contrastive Learning. Journal: Knowledge-Based Systems (IF 7.2), vol. 307, Article 112739. Journal Article, published 2024-11-22.
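The two flaws the abstract above attributes to KNN graphs and threshold graphs can be seen on a toy similarity matrix. The sketch below is only an illustration under stated assumptions: the similarity values, `k`, `tau`, and both function names are hypothetical, and it does not reproduce the AMS-HGCL message selection mechanism.

```python
def knn_graph(sim, k):
    """Connect each node to its k most similar other nodes."""
    neighbors = {}
    for i, row in enumerate(sim):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j], reverse=True)
        neighbors[i] = sorted(others[:k])
    return neighbors


def threshold_graph(sim, tau):
    """Connect each node to every other node whose similarity is at least tau."""
    return {
        i: [j for j, s in enumerate(row) if j != i and s >= tau]
        for i, row in enumerate(sim)
    }


# Nodes 0-2 are mutually similar; node 3 is an outlier.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.05],
    [0.1, 0.2, 0.05, 1.0],
]

knn = knn_graph(sim, k=2)
thr = threshold_graph(sim, tau=0.5)  # tau is picked by hand here
print(knn[3], thr[3])
```

With k = 2 the outlier node 3 is forced to keep two weakly related neighbors, whereas the threshold graph leaves it isolated, and that outcome depends entirely on the hand-picked `tau`.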
Pub Date: 2024-11-22; DOI: 10.1016/j.knosys.2024.112750
Yu Jin, Jie Liu, Shaowei Chen
Universal information extraction (Universal IE) aims to develop one model capable of solving multiple IE target tasks. Previous works have enhanced extraction performance on target tasks through auxiliary tasks. However, their learning strategies remain limited. On the one hand, joint learning-based universal IE approaches, which simply mix auxiliary tasks with target tasks, fail to enable the model to master basic knowledge from auxiliary tasks before learning target tasks. On the other hand, continual learning-based universal IE approaches, which sequentially update all the model parameters on auxiliary tasks and target tasks, tend to cause catastrophic forgetting. In this study, we design a multi-LoRA continual learning-based instruction fine-tuning framework for universal IE. Specifically, we design separate LoRA modules for learning auxiliary tasks and target tasks. We first freeze the pre-trained weights and update additional parameters on auxiliary tasks through one LoRA module. Subsequently, we keep the weights frozen and further adjust parameters through another LoRA module to adapt the model to the target tasks. Finally, we merge the frozen weights with the learned weights, thereby enabling the model to better leverage the acquired abilities during the inference phase. As a result, our model masters basic extraction abilities before learning target tasks and does not forget this basic knowledge during the target learning process. Moreover, we regard extraction, classification, and recognition as basic abilities and further design auxiliary tasks based on these basic abilities. Experimental results on 37 datasets across 3 tasks show that our approach reaches state-of-the-art performance.
{"title":"Multi-LoRA continual learning based instruction tuning framework for universal information extraction","authors":"Yu Jin, Jie Liu, Shaowei Chen","doi":"10.1016/j.knosys.2024.112750","DOIUrl":"10.1016/j.knosys.2024.112750","url":null,"abstract":"<div><div>Universal information extraction (Universal IE) aims to develop one model capable of solving multiple IE target tasks. Previous work has improved extraction performance on target tasks through auxiliary tasks. However, the learning strategies used remain limited. On the one hand, joint learning-based universal IE approaches, which simply mix auxiliary tasks with target tasks, fail to enable the model to master basic knowledge from auxiliary tasks before learning target tasks. On the other hand, continual learning-based universal IE approaches, which sequentially update all the model parameters on auxiliary tasks and target tasks, tend to cause catastrophic forgetting. In this study, we design a multi-LoRA continual learning-based instruction fine-tuning framework for universal IE. Specifically, we design separate LoRA modules for learning auxiliary tasks and target tasks. We first freeze the pre-trained weights and update additional parameters on auxiliary tasks through one LoRA module. Subsequently, we keep the weights frozen and further adjust parameters through another LoRA module to adapt the model to the target tasks. Finally, we merge the frozen weights with the learned weights, thereby enabling the model to better leverage the acquired abilities during the inference phase. As a result, our model masters basic extraction abilities before learning target tasks and does not forget this basic knowledge during the target learning process. Moreover, we regard extraction, classification, and recognition as basic abilities and further design auxiliary tasks based on these basic abilities. 
Experimental results on 37 datasets across 3 tasks show that our approach reaches state-of-the-art performance.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"308 ","pages":"Article 112750"},"PeriodicalIF":7.2,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142720586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-22 DOI: 10.1016/j.knosys.2024.112762
Zhizhen Wang, Liu Fu, Meng Ma, Zhi Zhai, Hui Chen
With the advancement of intelligent sensing techniques, massive volumes of monitoring signals are collected from industrial systems. Because sensors are often correlated and arranged in a way that reflects a graph topology, these signals can be treated as graph data. Polynomial filter-based graph neural networks (GNNs) are commonly employed to exploit information from node features and graph topology for graph data analysis. However, polynomial filter-based GNNs have difficulty accurately modeling sharp changes, and their coefficients can vary widely, making them hard to learn. To address this problem, a novel graph neural network named the extended auto-regressive moving average graph neural network (eAGNN) is proposed. Compared with the auto-regressive moving average (ARMA) neural network, the order restriction is removed, which yields a more general neural network that can model filters with a wider variety of shapes. Furthermore, low-frequency and high-frequency information are extracted explicitly and separately to ease the learning process and further enhance learning capability. Finally, several experiments, including public node classification and fault diagnosis, were conducted. The results demonstrate that the proposed eAGNN exhibits high performance compared to alternative methods.
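As background for the filter terminology in this abstract, the following sketch runs a first-order ARMA graph filter as a recursion. This is a standard construction from graph signal processing, not the eAGNN model itself. The graph (a three-node path), the coefficients `a` and `b`, and the helper names are assumptions chosen for illustration. Starting from zero, T steps of the recursion equal a polynomial filter of order T-1 in M, and the limit is the rational response b(I - aM)^{-1}x, which is the sense in which an ARMA filter is rational and not polynomial.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


# Symmetrically normalized adjacency of the path graph 0 - 1 - 2.
# Node degrees are 1, 2, 1, so each edge weight is 1 / sqrt(1 * 2).
s = 2 ** -0.5
M = [
    [0.0, s, 0.0],
    [s, 0.0, s],
    [0.0, s, 0.0],
]


def arma1_filter(M, x, a, b, steps):
    """Iterate y <- a * M y + b * x from y = 0 (first-order ARMA recursion)."""
    y = [0.0] * len(x)
    for _ in range(steps):
        y = [a * my + b * xi for my, xi in zip(matvec(M, y), x)]
    return y


def polynomial_filter(M, x, a, b, order):
    """Compute b * sum_{k=0}^{order} (a M)^k x term by term."""
    term = list(x)
    total = [b * t for t in term]
    for _ in range(order):
        term = [a * v for v in matvec(M, term)]
        total = [acc + b * t for acc, t in zip(total, term)]
    return total


x = [1.0, 0.0, 0.0]  # unit impulse on node 0
a, b = 0.5, 1.0      # |a| times the spectral radius of M (here 1) is below 1

# Five recursion steps match an order-4 polynomial filter.
poly_gap = max(
    abs(u - v)
    for u, v in zip(arma1_filter(M, x, a, b, steps=5),
                    polynomial_filter(M, x, a, b, order=4))
)

# Many steps approach the fixed point y = a * M y + b * x.
y = arma1_filter(M, x, a, b, steps=60)
residual = max(
    abs(yi - (a * myi + b * xi))
    for yi, myi, xi in zip(y, matvec(M, y), x)
)
```

The recursion converges here only because `a` keeps the spectral radius of `a * M` below one; with a larger `a` the iteration would diverge.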
{"title":"Extended ARMA graph neural networks for the prognosis of complex systems","authors":"Zhizhen Wang, Liu Fu, Meng Ma, Zhi Zhai, Hui Chen","doi":"10.1016/j.knosys.2024.112762","DOIUrl":"10.1016/j.knosys.2024.112762","url":null,"abstract":"<div><div>With the advancement of intelligent sensing techniques, massive volumes of monitoring signals are collected from industrial systems. Because sensors are often correlated and arranged in a way that reflects a graph topology, these signals can be treated as graph data. Polynomial filter-based graph neural networks (GNNs) are commonly employed to exploit information from node features and graph topology for graph data analysis. However, polynomial filter-based GNNs have difficulty accurately modeling sharp changes, and their coefficients can vary widely, making them hard to learn. To address this problem, a novel graph neural network named the extended auto-regressive moving average graph neural network (eAGNN) is proposed. Compared with the auto-regressive moving average (ARMA) neural network, the order restriction is removed, which yields a more general neural network that can model filters with a wider variety of shapes. Furthermore, low-frequency and high-frequency information are extracted explicitly and separately to ease the learning process and further enhance learning capability. Finally, several experiments, including public node classification and fault diagnosis, were conducted. 
The results demonstrate that the proposed eAGNN exhibits high performance compared to alternative methods.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"309 ","pages":"Article 112762"},"PeriodicalIF":7.2,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142748613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}