A novel swarm budorcas taxicolor optimization-based multi-support vector method for transformer fault diagnosis
Neural Networks 184: 107120 | Pub Date: 2025-01-06 | DOI: 10.1016/j.neunet.2024.107120
Yong Ding, Weijian Mai, Zhijun Zhang
To address the challenge of low recognition accuracy in transformer fault detection, a novel method called swarm budorcas taxicolor optimization-based multi-support vector (SBTO-MSV) is proposed. First, a multi-support vector (MSV) model is proposed to realize multi-class classification of transformer faults based on dissolved-gas data. Then, a swarm budorcas taxicolor optimization (SBTO) algorithm is proposed to iteratively search for the optimal model parameters during MSV training, yielding the most effective transformer fault diagnosis model. Experimental results on the IEC TC 10 dataset demonstrate that the SBTO-MSV method markedly outperforms traditional methods and state-of-the-art machine learning algorithms, achieving the best average accuracy of 98.1%; this highlights the superior classification performance of the SBTO-MSV model and the strong parameter-searching ability of the SBTO algorithm. Validation on a separately collected dataset and a UCI dataset further confirms the classification performance and generalization ability of the SBTO-MSV model. This advancement provides robust technical support for improving transformer fault diagnosis and ensuring the reliable operation of power systems.
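The abstract specifies the overall pattern (a swarm of candidate SVM hyperparameters scored during training and iteratively improved) but not the SBTO update rules. The hedged sketch below illustrates that generic pattern only: scikit-learn's `SVC` with cross-validation as the fitness function, a naive best-member-attraction move standing in for SBTO, and the iris data standing in for dissolved-gas features.

```python
# Illustrative swarm search over RBF-SVM hyperparameters. The SBTO update
# rules are not given in the abstract; a generic best-member-attraction move
# with random exploration stands in for them, and the iris data stands in
# for dissolved-gas features.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

# Each swarm member is a candidate (log10 C, log10 gamma) pair.
swarm = rng.uniform(low=[-2.0, -4.0], high=[3.0, 1.0], size=(12, 2))

def fitness(p):
    # Multi-class SVM (one-vs-one under the hood), scored by 5-fold CV accuracy.
    model = SVC(C=10.0 ** p[0], gamma=10.0 ** p[1])
    return cross_val_score(model, X, y, cv=5).mean()

for _ in range(20):
    scores = np.array([fitness(p) for p in swarm])
    best = swarm[scores.argmax()].copy()
    # Pull every member toward the current best, plus a small random step.
    swarm += 0.5 * (best - swarm) + rng.normal(scale=0.1, size=swarm.shape)

print("best (log10 C, log10 gamma):", best, "CV accuracy:", scores.max())
```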
{"title":"A novel swarm budorcas taxicolor optimization-based multi-support vector method for transformer fault diagnosis.","authors":"Yong Ding, Weijian Mai, Zhijun Zhang","doi":"10.1016/j.neunet.2024.107120","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107120","url":null,"abstract":"<p><p>To address the challenge of low recognition accuracy in transformer fault detection, a novel method called swarm budorcas taxicolor optimization-based multi-support vector (SBTO-MSV) is proposed. Firstly, a multi-support vector (MSV) model is proposed to realize multi-classification of transformer faults based on dissolved gas data. Then, a swarm budorcas taxicolor optimization (SBTO) algorithm is proposed to iteratively search the optimal model parameters during MSV model training, so as to obtain the most effective transformer fault diagnosis model. Experimental results on the IEC TC 10 dataset demonstrate that the SBTO-MSV method markedly outperforms traditional methods and state-of-the-art machine learning algorithms with the best average accuracy of 98.1%, effectively highlighting the superior classification performance of SBTO-MSV model and excellent parameter searching ability of SBTO algorithm. Additionally, validation on the collected dataset and UCI dataset further confirms the excellent classification performance and generalization ability of the SBTO-MSV model. This advancement provides robust technical support for improving transformer fault diagnosis and ensuring the reliable operation of power systems.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107120"},"PeriodicalIF":6.0,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142985288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User preference interaction fusion and swap attention graph neural network for recommender system
Neural Networks 184: 107116 | Pub Date: 2025-01-04 | DOI: 10.1016/j.neunet.2024.107116
Mingqi Li, Wenming Ma, Zihao Chu
Recommender systems are widely used in various applications. Knowledge graphs are increasingly used to improve recommendation performance by extracting valuable information from user-item interactions. However, current methods do not effectively use fine-grained information within the knowledge graph, and some recommendation methods based on graph neural networks tend to overlook the importance of entities to users when performing aggregation operations. To alleviate these issues, we introduce a knowledge-graph-based graph neural network (PIFSA-GNN) for recommendation with two key components. The first component, user preference interaction fusion, incorporates user auxiliary information in the recommendation process, enhancing the influence of users on the recommendation model. The second component is an attention mechanism called user preference swap attention, which improves entity weight calculation for effectively aggregating neighboring entities. Our method was extensively tested on three real-world datasets. On the movie dataset, it outperforms the best baseline by 1.3% in AUC and 2.8% in F1; Hit@1 increases by 0.7%, Hit@5 by 0.6%, and Hit@10 by 1.0%. On the restaurant dataset, AUC improves by 2.6% and F1 by 7.2%; Hit@1 increases by 1.3%, Hit@5 by 3.7%, and Hit@10 by 2.9%. On the music dataset, AUC improves by 0.9% and F1 by 0.4%; Hit@1 increases by 3.3%, Hit@5 by 1.2%, and Hit@10 by 0.2%. These consistent gains show that PIFSA-GNN outperforms the baseline methods.
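The swap-attention mechanism itself is not detailed in the abstract, but the underlying idea (weighting a knowledge-graph entity's neighbors by their relevance to the current user before aggregating) can be shown in a few lines. Everything below, including dimensions and random embeddings, is an illustrative stand-in:

```python
# Toy user-conditioned neighbor aggregation over a knowledge graph. The
# dimensions and random embeddings are illustrative; the paper's swap
# attention is not described in the abstract.
import numpy as np

rng = np.random.default_rng(1)
d = 8
user = rng.normal(size=d)                 # user embedding
neighbors = rng.normal(size=(5, d))       # KG entities adjacent to the item

# Score each neighboring entity by its relevance to this particular user,
# so the same item aggregates differently for different users.
logits = neighbors @ user
weights = np.exp(logits - logits.max())
weights /= weights.sum()                  # softmax attention weights

item_repr = weights @ neighbors           # user-aware item representation
print(item_repr.shape)                    # (8,)
```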
{"title":"User preference interaction fusion and swap attention graph neural network for recommender system.","authors":"Mingqi Li, Wenming Ma, Zihao Chu","doi":"10.1016/j.neunet.2024.107116","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107116","url":null,"abstract":"<p><p>Recommender systems are widely used in various applications. Knowledge graphs are increasingly used to improve recommendation performance by extracting valuable information from user-item interactions. However, current methods do not effectively use fine-grained information within the knowledge graph. Additionally, some recommendation methods based on graph neural networks tend to overlook the importance of entities to users when performing aggregation operations. To alleviate these issues, we introduce a knowledge-graph-based graph neural network (PIFSA-GNN) for recommendation with two key components. The first component, user preference interaction fusion, incorporates user auxiliary information in the recommendation process. This enhances the influence of users on the recommendation model. The second component is an attention mechanism called user preference swap attention, which improves entity weight calculation for effectively aggregating neighboring entities. Our method was extensively tested on three real-world datasets. On the movie dataset, our method outperforms the best baseline by 1.3% in AUC and 2.8% in F1; Hit@1 increases by 0.7%, Hit@5 by 0.6%, and Hit@10 by 1.0%. On the restaurant dataset, AUC improves by 2.6% and F1 by 7.2%; Hit@1 increases by 1.3%, Hit@5 by 3.7%, and Hit@10 by 2.9%. On the music dataset, AUC improves by 0.9% and F1 by 0.4%; Hit@1 increases by 3.3%, Hit@5 by 1.2%, and Hit@10 by 0.2%. The results show that it outperforms baseline methods.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107116"},"PeriodicalIF":6.0,"publicationDate":"2025-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142972998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-level feature fusion networks for smoke recognition in remote sensing imagery
Neural Networks 184: 107112 | Pub Date: 2025-01-04 | DOI: 10.1016/j.neunet.2024.107112
Yupeng Wang, Yongli Wang, Zaki Ahmad Khan, Anqi Huang, Jianghui Sang
Smoke is a critical indicator of forest fires, often detectable before flames ignite. Accurate smoke identification in remote sensing images is vital for effective forest fire monitoring within Internet of Things (IoT) systems. However, existing detection methods frequently falter in complex real-world scenarios, where variable smoke shapes and sizes, intricate backgrounds, and smoke-like phenomena (e.g., clouds and haze) lead to missed detections and false alarms. To address these challenges, we propose the Multi-level Feature Fusion Network (MFFNet), a novel framework grounded in contrastive learning. MFFNet begins by extracting multi-scale features from remote sensing images using a pre-trained ConvNeXt model, capturing information across different levels of granularity to accommodate variations in smoke appearance. The Attention Feature Enhancement Module further refines these multi-scale features, enhancing fine-grained, discriminative attributes relevant to smoke detection. Subsequently, the Bilinear Feature Fusion Module combines these enriched features, effectively reducing background interference and improving the model's ability to distinguish smoke from visually similar phenomena. Finally, contrastive feature learning is employed to improve robustness against intra-class variations by focusing on unique regions within the smoke patterns. Evaluated on the benchmark dataset USTC_SmokeRS, MFFNet achieves an accuracy of 98.87%. Additionally, our model demonstrates a detection rate of 94.54% on the extended E_SmokeRS dataset, with a low false alarm rate of 3.30%. These results highlight the effectiveness of MFFNet in recognizing smoke in remote sensing images, surpassing existing methodologies. The code is accessible at https://github.com/WangYuPeng1/MFFNet.
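As a rough illustration of two of the named ingredients, the hedged sketch below brings a coarse and a fine feature map to a common resolution and fuses them with a channel-wise bilinear (outer-product) pooling. The shapes are arbitrary; MFFNet's actual module designs and ConvNeXt backbone are not reproduced here.

```python
# Hedged sketch of two named ingredients: multi-scale feature maps and a
# channel-wise bilinear fusion. Shapes are arbitrary; MFFNet's real modules
# and ConvNeXt backbone are not reproduced here.
import torch
import torch.nn.functional as F

b, c, h, w = 2, 16, 8, 8
coarse = torch.randn(b, c, h, w)          # deep, low-resolution features
fine = torch.randn(b, c, 2 * h, 2 * w)    # shallow, high-resolution features

# Bring both scales to a common spatial resolution before fusing.
fine_ds = F.adaptive_avg_pool2d(fine, (h, w))

# Bilinear fusion: outer product of channel responses, averaged over locations.
fa = coarse.flatten(2)                    # (b, c, h*w)
fb = fine_ds.flatten(2)                   # (b, c, h*w)
bilinear = torch.einsum("bik,bjk->bij", fa, fb) / fa.shape[-1]  # (b, c, c)
fused = F.normalize(bilinear.flatten(1), dim=1)
print(fused.shape)                        # (2, 256)
```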
{"title":"Multi-level feature fusion networks for smoke recognition in remote sensing imagery.","authors":"Yupeng Wang, Yongli Wang, Zaki Ahmad Khan, Anqi Huang, Jianghui Sang","doi":"10.1016/j.neunet.2024.107112","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107112","url":null,"abstract":"<p><p>Smoke is a critical indicator of forest fires, often detectable before flames ignite. Accurate smoke identification in remote sensing images is vital for effective forest fire monitoring within Internet of Things (IoT) systems. However, existing detection methods frequently falter in complex real-world scenarios, where variable smoke shapes and sizes, intricate backgrounds, and smoke-like phenomena (e.g., clouds and haze) lead to missed detections and false alarms. To address these challenges, we propose the Multi-level Feature Fusion Network (MFFNet), a novel framework grounded in contrastive learning. MFFNet begins by extracting multi-scale features from remote sensing images using a pre-trained ConvNeXt model, capturing information across different levels of granularity to accommodate variations in smoke appearance. The Attention Feature Enhancement Module further refines these multi-scale features, enhancing fine-grained, discriminative attributes relevant to smoke detection. Subsequently, the Bilinear Feature Fusion Module combines these enriched features, effectively reducing background interference and improving the model's ability to distinguish smoke from visually similar phenomena. Finally, contrastive feature learning is employed to improve robustness against intra-class variations by focusing on unique regions within the smoke patterns. Evaluated on the benchmark dataset USTC_SmokeRS, MFFNet achieves an accuracy of 98.87%. Additionally, our model demonstrates a detection rate of 94.54% on the extended E_SmokeRS dataset, with a low false alarm rate of 3.30%. These results highlight the effectiveness of MFFNet in recognizing smoke in remote sensing images, surpassing existing methodologies. The code is accessible at https://github.com/WangYuPeng1/MFFNet.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107112"},"PeriodicalIF":6.0,"publicationDate":"2025-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142967303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convergence analysis of deep Ritz method with over-parameterization
Neural Networks 184: 107110 | Pub Date: 2025-01-04 | DOI: 10.1016/j.neunet.2024.107110
Zhao Ding, Yuling Jiao, Xiliang Lu, Peiying Wu, Jerry Zhijian Yang
The deep Ritz method (DRM) has recently been shown to be a simple and effective method for solving PDEs. However, the numerical analysis of DRM is still incomplete; in particular, why over-parameterized DRM works remains unknown. This paper presents the first convergence analysis of the over-parameterized DRM for second-order elliptic equations with Robin boundary conditions. We demonstrate that the convergence rate can be controlled by the weight norm, regardless of the number of parameters in the network. To this end, we establish novel approximation results in Sobolev spaces with norm constraints, which are of independent interest.
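For orientation, the energy functional that DRM minimizes in a typical Robin-boundary model problem has the following standard form (assuming the model problem -Δu + wu = f in Ω with Robin condition ∂ₙu + αu = g on ∂Ω; the paper's exact setting and coefficients may differ):

```latex
% Ritz energy for  -\Delta u + w u = f  in  \Omega,
% with Robin condition  \partial_n u + \alpha u = g  on  \partial\Omega:
\mathcal{E}(u_\theta)
  = \int_\Omega \Big( \tfrac{1}{2}\,\lvert \nabla u_\theta \rvert^{2}
      + \tfrac{w}{2}\, u_\theta^{2} - f\, u_\theta \Big)\,\mathrm{d}x
  + \int_{\partial\Omega} \Big( \tfrac{\alpha}{2}\, u_\theta^{2} - g\, u_\theta \Big)\,\mathrm{d}s
```

DRM replaces both integrals with Monte Carlo averages over sampled points and minimizes over the network parameters; the paper's question is how this minimizer behaves when the network is over-parameterized, with the rate controlled by the weight norm rather than the parameter count.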
{"title":"Convergence analysis of deep Ritz method with over-parameterization.","authors":"Zhao Ding, Yuling Jiao, Xiliang Lu, Peiying Wu, Jerry Zhijian Yang","doi":"10.1016/j.neunet.2024.107110","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107110","url":null,"abstract":"<p><p>The deep Ritz method (DRM) has recently been shown to be a simple and effective method for solving PDEs. However, the numerical analysis of DRM is still incomplete, especially why over-parameterized DRM works remains unknown. This paper presents the first convergence analysis of the over-parameterized DRM for second-order elliptic equations with Robin boundary conditions. We demonstrate that the convergence rate can be controlled by the weight norm, regardless of the number of parameters in the network. To this end, we establish novel approximation results in Sobolev spaces with norm constraints, which have independent significance.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107110"},"PeriodicalIF":6.0,"publicationDate":"2025-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142967296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view clustering based on feature selection and semi-non-negative anchor graph factorization
Neural Networks 184: 107111 | Pub Date: 2025-01-03 | DOI: 10.1016/j.neunet.2024.107111
Shikun Mei, Qianqian Wang, Quanxue Gao, Ming Yang
Multi-view clustering has garnered significant attention due to its capacity to utilize information from multiple perspectives. Anchor graph-based techniques were introduced to better manage large-scale data. However, current methods rely on K-means or uniform sampling to select anchors in the original space, which results in a disjointed approach that separates anchor selection from subsequent graph construction. Moreover, these methods typically require additional K-means or spectral clustering to derive labels, often leading to suboptimal outcomes. To address these challenges, we present a novel approach called Multi-view Clustering based on Feature Selection and Semi-Non-Negative Anchor Graph Factorization (MCFSAF). This method unifies feature selection, anchor and anchor graph learning, and semi-non-negative factorization of the anchor graph into a cohesive framework. Within this framework, the anchors and anchor graph are learned in the embedding space following feature selection, and the clustering indicator matrix is obtained via semi-non-negative factorization of the anchor graph in each view. By minimizing the tensor Schatten p-norm, we can efficiently uncover complementary information across multiple views. This synergistic process of anchor selection, anchor graph learning, and indicator matrix updating can effectively enhance the clustering quality. Critically, the fused indicator matrix enables us to acquire clustering labels directly, without additional K-means, thereby significantly improving the stability of the clustering process. Our method is optimized via an alternating-iteration algorithm. Comprehensive experimental evaluations underscore the superior performance of our approach.
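For reference, the matrix Schatten p-norm has the standard definition below; the tensor extension shown after it is one common form used in multi-view clustering work (assuming the t-SVD framework, which the abstract does not confirm):

```latex
% Matrix case; for 0 < p <= 1 this promotes low rank more strongly
% than the nuclear norm (the p = 1 case):
\lVert A \rVert_{S_p} = \Big( \sum_{i} \sigma_i(A)^{p} \Big)^{1/p}

% Common tensor extension: sum over the frontal slices \bar{\mathcal{A}}^{(k)}
% obtained after a DFT along the view mode:
\lVert \mathcal{A} \rVert_{S_p}^{p}
  = \sum_{k} \sum_{i} \sigma_i\big( \bar{\mathcal{A}}^{(k)} \big)^{p}
```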
{"title":"Multi-view clustering based on feature selection and semi-non-negative anchor graph factorization.","authors":"Shikun Mei, Qianqian Wang, Quanxue Gao, Ming Yang","doi":"10.1016/j.neunet.2024.107111","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107111","url":null,"abstract":"<p><p>Multi-view clustering has garnered significant attention due to its capacity to utilize information from multiple perspectives. The concept of anchor graph-based techniques was introduced to manage large-scale data better. However, current methods rely on K-means or uniform sampling to select anchors in the original space. This results in a disjointed approach separating anchor selection and subsequent graph construction. Moreover, these methods typically require additional K-means or spectral clustering to derive labels, often leading to suboptimal outcomes. To address these challenges, we present a novel approach called Multi-view Clustering based on Feature Selection and Semi-Non-Negative Anchor Graph Factorization (MCFSAF). This method unifies feature selection, anchor and anchor graph learning, and semi-non-negative factorization of the anchor graph into a cohesive framework. Within this framework, the anchors and anchor graph are learned in the embedding space following feature selection, and the clustering indicator matrix is obtained via semi-non-negative factorization of the anchor graph in each view. By applying the minimization of the tensor Schatten p-norm, we can uncover complementary information across multiple views efficiently. This synergetic process of anchor selection, anchor graph learning, and indicator matrix updating can effectively enhance the clustering quality. Critically, the fused indicator matrix enables us to directly acquire clustering labels without requiring additional K-means, thereby significantly improving the stability of the clustering process. Our method is optimized via an alternating iterations algorithm. Comprehensive experimental evaluations underscore the superior performance of our approach.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107111"},"PeriodicalIF":6.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142957836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A discriminative multi-modal adaptation neural network model for video action recognition
Neural Networks 185: 107114 | Pub Date: 2025-01-03 | DOI: 10.1016/j.neunet.2024.107114
Lei Gao, Kai Liu, Ling Guan
Research on video-based understanding and learning has attracted widespread interest and has been adopted in various real applications, such as e-healthcare, action recognition, and affective computing. Amongst them, video-based action recognition is one of the most representative examples. With the advancement of multi-sensory technology, action recognition using multi-modal data has recently drawn wide attention. However, the research community faces new challenges in effectively exploring and utilizing the discriminative and complementary information across different modalities. Although score-level fusion approaches have been popular for multi-modal action recognition, they simply add the scores derived separately from different modalities without proper consideration of cross-modality semantics among the input data sources, invariably causing sub-optimal performance. To address this issue, this paper presents a two-stream heterogeneous network to extract and jointly process complementary features derived from the RGB and skeleton modalities, respectively. Then, a discriminative multi-modal adaptation neural network model (DMANNM) is proposed and applied to the heterogeneous network by integrating statistical machine learning (SML) principles with a convolutional neural network (CNN) architecture. In addition, to achieve high recognition accuracy with the generated multi-modal structure, an effective nonlinear classification algorithm is presented. Leveraging the joint strength of SML and the CNN architecture, the proposed model forms an adaptive platform for handling datasets of different scales. To demonstrate the effectiveness and generic nature of the proposed model, we conducted experiments on four popular video-based action recognition datasets of different scales: NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA (N-UCLA), and SYSU. The experimental results show the superiority of the proposed method over state-of-the-art approaches.
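The score-level versus feature-level distinction the abstract draws can be made concrete with a toy two-stream example; the heads and sizes below are illustrative stand-ins, not the DMANNM architecture:

```python
# Toy contrast between score-level and feature-level fusion of two streams.
# Layer sizes and the joint head are illustrative stand-ins, not DMANNM.
import torch
import torch.nn as nn

n_classes, d_rgb, d_skel = 60, 512, 256
rgb_feat = torch.randn(4, d_rgb)          # RGB-stream features for 4 clips
skel_feat = torch.randn(4, d_skel)        # skeleton-stream features

# (a) Score-level fusion: per-modality class scores are simply added,
# so no cross-modality interaction can be learned.
rgb_head = nn.Linear(d_rgb, n_classes)
skel_head = nn.Linear(d_skel, n_classes)
score_fused = rgb_head(rgb_feat) + skel_head(skel_feat)

# (b) Feature-level fusion: one head sees both modalities jointly,
# letting cross-modality semantics inform the decision.
joint_head = nn.Sequential(
    nn.Linear(d_rgb + d_skel, 256), nn.ReLU(), nn.Linear(256, n_classes)
)
joint_scores = joint_head(torch.cat([rgb_feat, skel_feat], dim=1))
print(score_fused.shape, joint_scores.shape)   # both (4, 60)
```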
{"title":"A discriminative multi-modal adaptation neural network model for video action recognition.","authors":"Lei Gao, Kai Liu, Ling Guan","doi":"10.1016/j.neunet.2024.107114","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107114","url":null,"abstract":"<p><p>Research on video-based understanding and learning has attracted widespread interest and has been adopted in various real applications, such as e-healthcare, action recognition, affective computing, to name a few. Amongst them, video-based action recognition is one of the most representative examples. With the advancement of multi-sensory technology, action recognition using multi-modal data has recently drawn wide attention. However, the research community faces new challenges in effectively exploring and utilizing the discriminative and complementary information across different modalities. Although score level fusion approaches have been popularly employed for multi-modal action recognition, they simply add the scores derived separately from different modalities without proper consideration of cross-modality semantics amongst multiple input data sources, invariably causing sub-optimal performance. To address this issue, this paper presents a two-stream heterogeneous network to extract and jointly process complementary features derived from RGB and skeleton modalities, respectively. Then, a discriminative multi-modal adaptation neural network model (DMANNM) is proposed and applied to the heterogeneous network, by integrating statistical machine learning (SML) principles with convolutional neural network (CNN) architecture. In addition, to achieve high recognition accuracy by the generated multi-modal structure, an effective nonlinear classification algorithm is presented in this work. Leveraging the joint strength of SML and CNN architecture, the proposed model forms an adaptive platform for handling datasets of different scales. To demonstrate the effectiveness and the generic nature of the proposed model, we conducted experiments on four popular video-based action recognition datasets with different scales: NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA (N-UCLA), and SYSU. The experimental results show the superiority of the proposed method over state-of-the-art compared.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107114"},"PeriodicalIF":6.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synergistic learning with multi-task DeepONet for efficient PDE problem solving
Neural Networks 184: 107113 | Pub Date: 2025-01-03 | DOI: 10.1016/j.neunet.2024.107113
Varun Kumar, Somdatta Goswami, Katiana Kontolati, Michael D Shields, George Em Karniadakis
Multi-task learning (MTL) is an inductive transfer mechanism designed to leverage useful information from multiple tasks to improve generalization performance compared to single-task learning. It has been extensively explored in traditional machine learning to address issues such as data sparsity and overfitting in neural networks. In this work, we apply MTL to problems in science and engineering governed by partial differential equations (PDEs). However, implementing MTL in this context is complex, as it requires task-specific modifications to accommodate various scenarios representing different physical processes. To this end, we present a multi-task deep operator network (MT-DeepONet) that learns solutions across various functional forms of source terms in a PDE and across multiple geometries in a single concurrent training session. We introduce modifications in the branch network of the vanilla DeepONet to account for various functional forms of a parameterized coefficient in a PDE. Additionally, we handle parameterized geometries by introducing a binary mask in the branch network and incorporating it into the loss term to improve convergence and generalization to new geometry tasks. Our approach is demonstrated on three benchmark problems: (1) learning different functional forms of the source term in the Fisher equation; (2) learning multiple geometries in a 2D Darcy flow problem, showcasing better transfer learning capabilities to new geometries; and (3) learning 3D parameterized geometries for a heat transfer problem, demonstrating the ability to predict on new but similar geometries. Our MT-DeepONet framework offers a novel approach to solving PDE problems in engineering and science under a unified umbrella based on synergistic learning that reduces the overall training cost for neural operators.
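A minimal sketch of the two DeepONet-specific ideas mentioned, the branch/trunk inner-product structure and a binary geometry mask entering the loss, is shown below. All sizes, the networks, and the toy disk geometry are assumptions for illustration; the paper's MT-DeepONet branch modifications are not reproduced.

```python
# Minimal DeepONet-style sketch with a binary geometry mask in the loss.
import torch
import torch.nn as nn

m, p, n_pts = 100, 64, 256                # sensors, latent width, query points
branch = nn.Sequential(nn.Linear(m, 128), nn.Tanh(), nn.Linear(128, p))
trunk = nn.Sequential(nn.Linear(2, 128), nn.Tanh(), nn.Linear(128, p))

f_sensors = torch.randn(8, m)             # sampled source terms, one per task instance
xy = torch.rand(n_pts, 2)                 # query coordinates in the unit square
mask = (xy.pow(2).sum(1) < 1.0).float()   # toy geometry: points inside the unit disk

# Operator output G(f)(x) = <branch(f), trunk(x)>.
u_pred = torch.einsum("bp,np->bn", branch(f_sensors), trunk(xy))
u_true = torch.randn(8, n_pts)            # placeholder targets

# Masked MSE: points outside the geometry contribute nothing to the loss.
sq_err = (u_pred - u_true).pow(2) * mask
loss = sq_err.sum() / (mask.sum().clamp(min=1.0) * u_pred.shape[0])
print(loss.item())
```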
{"title":"Synergistic learning with multi-task DeepONet for efficient PDE problem solving.","authors":"Varun Kumar, Somdatta Goswami, Katiana Kontolati, Michael D Shields, George Em Karniadakis","doi":"10.1016/j.neunet.2024.107113","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107113","url":null,"abstract":"<p><p>Multi-task learning (MTL) is an inductive transfer mechanism designed to leverage useful information from multiple tasks to improve generalization performance compared to single-task learning. It has been extensively explored in traditional machine learning to address issues such as data sparsity and overfitting in neural networks. In this work, we apply MTL to problems in science and engineering governed by partial differential equations (PDEs). However, implementing MTL in this context is complex, as it requires task-specific modifications to accommodate various scenarios representing different physical processes. To this end, we present a multi-task deep operator network (MT-DeepONet) to learn solutions across various functional forms of source terms in a PDE and multiple geometries in a single concurrent training session. We introduce modifications in the branch network of the vanilla DeepONet to account for various functional forms of a parameterized coefficient in a PDE. Additionally, we handle parameterized geometries by introducing a binary mask in the branch network and incorporating it into the loss term to improve convergence and generalization to new geometry tasks. Our approach is demonstrated on three benchmark problems: (1) learning different functional forms of the source term in the Fisher equation; (2) learning multiple geometries in a 2D Darcy Flow problem and showcasing better transfer learning capabilities to new geometries; and (3) learning 3D parameterized geometries for a heat transfer problem and demonstrate the ability to predict on new but similar geometries. Our MT-DeepONet framework offers a novel approach to solving PDE problems in engineering and science under a unified umbrella based on synergistic learning that reduces the overall training cost for neural operators.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107113"},"PeriodicalIF":6.0,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142967318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MIFS: An adaptive multipath information fused self-supervised framework for drug discovery
Neural Networks 184: 107088 | Pub Date: 2025-01-02 | DOI: 10.1016/j.neunet.2024.107088
Xu Gong, Qun Liu, Rui Han, Yike Guo, Guoyin Wang
Producing expressive molecular representations from scarce labeled data is challenging for AI-driven drug discovery. Mainstream studies often follow a pipeline that pre-trains a specific molecular encoder and then fine-tunes it. However, two significant shortcomings of these methods are (1) neglecting the propagation of diverse information within molecules and (2) the absence of knowledge and chemical constraints in the pre-training strategy. In this study, we propose an adaptive multipath information fused self-supervised framework (MIFS) that explores molecular representations from large-scale unlabeled data to aid drug discovery. In MIFS, we design a dedicated molecular graph encoder called Mol-EN, which implements three pathways of information propagation (atom-to-atom, chemical bond-to-atom, and group-to-atom) to comprehensively perceive and capture abundant semantic information. Furthermore, a novel adaptive pre-training strategy based on molecular scaffolds is devised to pre-train Mol-EN on 11 million unlabeled molecules. It optimizes Mol-EN by constructing a topological contrastive loss to provide additional chemical insights into molecular structures. Subsequently, the pre-trained Mol-EN is fine-tuned on 14 widespread drug discovery benchmark datasets, covering molecular property prediction, drug-target interactions, and drug-drug interactions. Notably, to further enhance chemical knowledge, we introduce an elemental knowledge graph (ElementKG) in the fine-tuning phase. Extensive experiments show that MIFS achieves competitive performance while providing plausible explanations for predictions from a chemical perspective.
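The topological contrastive loss is not specified in the abstract; as a generic stand-in, the sketch below shows the common InfoNCE pattern of pulling together two embeddings of the same molecule while pushing apart embeddings of different molecules:

```python
# Generic InfoNCE-style contrastive loss between two embeddings of the same
# molecule; MIFS's topological contrastive loss over scaffolds is not
# specified in the abstract, so only the common pattern is shown.
import torch
import torch.nn.functional as F

b, d = 32, 128
z1 = F.normalize(torch.randn(b, d), dim=1)   # embeddings of 32 molecules
z2 = F.normalize(torch.randn(b, d), dim=1)   # embeddings of their paired views

tau = 0.2
logits = z1 @ z2.t() / tau                   # (b, b) cosine-similarity matrix
labels = torch.arange(b)                     # positives lie on the diagonal
loss = F.cross_entropy(logits, labels)       # pull positives, push negatives
print(loss.item())
```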
{"title":"MIFS: An adaptive multipath information fused self-supervised framework for drug discovery.","authors":"Xu Gong, Qun Liu, Rui Han, Yike Guo, Guoyin Wang","doi":"10.1016/j.neunet.2024.107088","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107088","url":null,"abstract":"<p><p>The production of expressive molecular representations with scarce labeled data is challenging for AI-driven drug discovery. Mainstream studies often follow a pipeline that pre-trains a specific molecular encoder and then fine-tunes it. However, the significant challenges of these methods are (1) neglecting the propagation of diverse information within molecules and (2) the absence of knowledge and chemical constraints in the pre-training strategy. In this study, we propose an adaptive multipath information fused self-supervised framework (MIFS) that explores molecular representations from large-scale unlabeled data to aid drug discovery. In MIFS, we innovatively design a dedicated molecular graph encoder called Mol-EN, which implements three pathways of information propagation: atom-to-atom, chemical bond-to-atom, and group-to-atom, to comprehensively perceive and capture abundant semantic information. Furthermore, a novel adaptive pre-training strategy based on molecular scaffolds is devised to pre-train Mol-EN on 11 million unlabeled molecules. It optimizes Mol-EN by constructing a topological contrastive loss to provide additional chemical insights into molecular structures. Subsequently, the pre-trained Mol-EN is fine-tuned on 14 widespread drug discovery benchmark datasets, including molecular properties prediction, drug-target interactions, and drug-drug interactions. Notably, to further enhance chemical knowledge, we introduce an elemental knowledge graph (ElementKG) in the fine-tuning phase. Extensive experiments show that MIFS achieves competitive performance while providing plausible explanations for predictions from a chemical perspective.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107088"},"PeriodicalIF":6.0,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142957818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Temporal multi-modal knowledge graph generation for link prediction
Neural Networks 185: 107108 | Pub Date: 2025-01-02 | DOI: 10.1016/j.neunet.2024.107108
Yuandi Li, Hui Ji, Fei Yu, Lechao Cheng, Nan Che
Temporal Multi-Modal Knowledge Graphs (TMMKGs) can be regarded as a synthesis of Temporal Knowledge Graphs (TKGs) and Multi-Modal Knowledge Graphs (MMKGs), combining the characteristics of both. TMMKGs can effectively model dynamic real-world phenomena, particularly in scenarios involving multiple heterogeneous information sources and time-series characteristics, such as e-commerce websites, scene recording data, and intelligent transportation systems. We propose a Temporal Multi-Modal Knowledge Graph Generation (TMMKGG) method that can automatically construct TMMKGs, aiming to reduce construction costs. To support this, we construct a dynamic Visual-Audio-Language Multimodal (VALM) dataset, which is particularly suitable for extracting structured knowledge from temporal multimodal perception data. TMMKGG explores temporal dynamics and cross-modal integration, enabling multimodal data processing for dynamic knowledge graph generation and utilizing alignment strategies to enhance scene perception. To validate the effectiveness of TMMKGG, we compare it with state-of-the-art dynamic graph generation methods on the VALM dataset. Furthermore, TMMKGs exhibit a significant disparity, relative to TKGs, in the ratio of newly introduced entities to their associated newly introduced edges. Motivated by this phenomenon, we introduce a Temporal Multi-Modal Link Prediction (TMMLP) method, which outperforms existing state-of-the-art techniques.
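The TMMLP model is likewise not described in the abstract; purely to illustrate how a timestamp can enter a link-prediction score, here is a hypothetical time-aware TransE-style scorer (entity, relation, and time embedding sizes are arbitrary):

```python
# Hypothetical time-aware TransE-style scorer for temporal link prediction.
# This only shows how a timestamp embedding can enter a triple score.
import torch
import torch.nn as nn

n_ent, n_rel, n_time, d = 1000, 50, 365, 64
ent = nn.Embedding(n_ent, d)
rel = nn.Embedding(n_rel, d)
tem = nn.Embedding(n_time, d)

def score(h, r, t, ts):
    # Higher is more plausible: head + relation + time should land near tail.
    return -(ent(h) + rel(r) + tem(ts) - ent(t)).norm(p=1, dim=-1)

h = torch.tensor([3]); r = torch.tensor([7]); ts = torch.tensor([120])
tails = torch.arange(n_ent)                  # rank every entity as the tail
scores = score(h.expand(n_ent), r.expand(n_ent), tails, ts.expand(n_ent))
print(scores.topk(5).indices)                # top-5 predicted tail entities
```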
{"title":"Temporal multi-modal knowledge graph generation for link prediction.","authors":"Yuandi Li, Hui Ji, Fei Yu, Lechao Cheng, Nan Che","doi":"10.1016/j.neunet.2024.107108","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107108","url":null,"abstract":"<p><p>Temporal Multi-Modal Knowledge Graphs (TMMKGs) can be regarded as a synthesis of Temporal Knowledge Graphs (TKGs) and Multi-Modal Knowledge Graphs (MMKGs), combining the characteristics of both. TMMKGs can effectively model dynamic real-world phenomena, particularly in scenarios involving multiple heterogeneous information sources and time series characteristics, such as e-commerce websites, scene recording data, and intelligent transportation systems. We propose a Temporal Multi-Modal Knowledge Graph Generation (TMMKGG) method that can automatically construct TMMKGs, aiming to reduce construction costs. To support this, we construct a dynamic Visual-Audio-Language Multimodal (VALM) dataset, which is particularly suitable for extracting structured knowledge in response to temporal multimodal perception data. TMMKGG explores temporal dynamics and cross-modal integration, enabling multimodal data processing for dynamic knowledge graph generation and utilizing alignment strategies to enhance scene perception. To validate the effectiveness of TMMKGG, we compare it with state-of-the-art dynamic graph generation methods using the VALM dataset. Furthermore, TMMKG exhibits a significant disparity in the ratio of newly introduced entities to their associated newly introduced edges compared to TKGs. Based on this phenomenon, we introduce a Temporal Multi-Modal Link Prediction (TMMLP) method, which outperforms existing state-of-the-art techniques.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"185 ","pages":"107108"},"PeriodicalIF":6.0,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GSE: A global-local storage enhanced video object recognition model
Neural Networks 184: 107109 | Pub Date: 2025-01-02 | DOI: 10.1016/j.neunet.2024.107109
Yuhong Shi, Hongguang Pan, Ze Jiang, Libin Zhang, Rui Miao, Zheng Wang, Xinyu Lei
The presence of substantial similarities and redundant information within video data limits the performance of video object recognition models. To address this issue, a Global-Local Storage Enhanced video object recognition model (GSE) is proposed in this paper. First, the model incorporates a two-stage dynamic multi-frame aggregation module to aggregate shallow frame features. This module aggregates features in batches from each input video using feature extraction, dynamic multi-frame aggregation, and centralized concatenation, significantly reducing the model's computational burden while retaining key information. In addition, a Global-Local Storage (GS) module is constructed to retain and utilize the information in the frame sequence effectively. This module classifies features using a temporal difference threshold method and employs an inheritance, storage, and output procedure to filter and retain features. By integrating global, local, and key features, the model can accurately capture important temporal features in complex video scenes. Subsequently, a Cascaded Multi-head Attention (CMA) mechanism is designed. The multi-head cascade structure in this mechanism progressively focuses on object features and explores the correlations between key features and the global and local features. A differential-step attention calculation is used to ensure computational efficiency. Finally, we optimize the model structure, tune its parameters, and verify the GSE model's performance through comprehensive experiments. Experimental results on the ImageNet 2015 and NPS-Drones datasets demonstrate that the GSE model achieves the highest mAP, 0.8352 and 0.8617 respectively. Compared with other models, the GSE model achieves a commendable balance across precision, efficiency, and power consumption.
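The temporal difference threshold suggests a simple retention rule: store a frame's features only when they differ enough from the last stored features. The sketch below implements that generic rule with an arbitrary threshold and L2 distance; the GS module's actual distance measure and inheritance/storage/output logic are not given in the abstract.

```python
# Sketch of a temporal-difference-threshold retention rule: keep a frame's
# features only when they differ enough from the last stored features.
import torch

torch.manual_seed(0)
frames = torch.randn(30, 256)             # per-frame feature vectors
threshold = 22.0                          # tuning knob, arbitrary here

stored = [frames[0]]
for feat in frames[1:]:
    if (feat - stored[-1]).norm() > threshold:
        stored.append(feat)               # enough temporal change: new key feature
    # otherwise the frame is treated as redundant and skipped

print(f"kept {len(stored)} of {len(frames)} frames")
```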
{"title":"GSE: A global-local storage enhanced video object recognition model.","authors":"Yuhong Shi, Hongguang Pan, Ze Jiang, Libin Zhang, Rui Miao, Zheng Wang, Xinyu Lei","doi":"10.1016/j.neunet.2024.107109","DOIUrl":"https://doi.org/10.1016/j.neunet.2024.107109","url":null,"abstract":"<p><p>The presence of substantial similarities and redundant information within video data limits the performance of video object recognition models. To address this issue, a Global-Local Storage Enhanced video object recognition model (GSE) is proposed in this paper. Firstly, the model incorporates a two-stage dynamic multi-frame aggregation module to aggregate shallow frame features. This module aggregates features in batches from each input video using feature extraction, dynamic multi-frame aggregation, and centralized concatenations, significantly reducing the model's computational burden while retaining key information. In addition, a Global-Local Storage (GS) module is constructed to retain and utilize the information in the frame sequence effectively. This module classifies features using a temporal difference threshold method and employs a processing approach of inheritance, storage, and output to filter and retain features. By integrating global, local and key features, the model can accurately capture important temporal features when facing complex video scenes. Subsequently, a Cascaded Multi-head Attention (CMA) mechanism is designed. The multi-head cascade structure in this mechanism progressively focuses on object features and explores the correlations between key and global, local features. The differential step attention calculation is used to ensure computational efficiency. Finally, we optimize the model structure and adjust parameters, and verify the GSE model performance through comprehensive experiments. Experimental results on the ImageNet 2015 and NPS-Drones datasets demonstrate that the GSE model achieves the highest mAP of 0.8352 and 0.8617, respectively. Compared with other models, the GSE model achieves a commendable balance across metrics such as precision, efficiency, and power consumption.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"184 ","pages":"107109"},"PeriodicalIF":6.0,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142957816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}