
Neural Networks: Latest Publications

A novel swarm budorcas taxicolor optimization-based multi-support vector method for transformer fault diagnosis.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-06 | DOI: 10.1016/j.neunet.2024.107120
Yong Ding, Weijian Mai, Zhijun Zhang

To address the challenge of low recognition accuracy in transformer fault detection, a novel method called swarm budorcas taxicolor optimization-based multi-support vector (SBTO-MSV) is proposed. Firstly, a multi-support vector (MSV) model is proposed to realize multi-class classification of transformer faults based on dissolved gas data. Then, a swarm budorcas taxicolor optimization (SBTO) algorithm is proposed to iteratively search for the optimal model parameters during MSV model training, so as to obtain the most effective transformer fault diagnosis model. Experimental results on the IEC TC 10 dataset demonstrate that the SBTO-MSV method markedly outperforms traditional methods and state-of-the-art machine learning algorithms, achieving the best average accuracy of 98.1% and highlighting the superior classification performance of the SBTO-MSV model and the excellent parameter-searching ability of the SBTO algorithm. Additionally, validation on a collected dataset and a UCI dataset further confirms the excellent classification performance and generalization ability of the SBTO-MSV model. This advancement provides robust technical support for improving transformer fault diagnosis and ensuring the reliable operation of power systems.
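The abstract does not spell out the SBTO update rule or the MSV formulation, so the following Python sketch only illustrates the overall training loop: a multi-class (one-vs-rest) SVM over dissolved-gas features whose (C, gamma) hyperparameters are tuned by a generic swarm-style search. The synthetic data, the particle update, the search ranges, and the iteration counts are placeholder assumptions, not the authors' method.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in for dissolved-gas-analysis features (H2, CH4, C2H6, C2H4, C2H2) with
# multi-class fault labels; the paper uses the IEC TC 10 dataset instead.
X, y = make_classification(n_samples=300, n_features=5, n_informative=5, n_redundant=0,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

def fitness(log_params):
    # Cross-validated accuracy of a one-vs-rest SVM for the given (log C, log gamma).
    C, gamma = np.exp(log_params)
    clf = SVC(C=C, gamma=gamma, decision_function_shape="ovr")
    return cross_val_score(clf, X, y, cv=5).mean()

rng = np.random.default_rng(0)
swarm = rng.uniform(low=[-2.0, -6.0], high=[6.0, 2.0], size=(20, 2))   # positions in log-space
best_pos, best_fit = swarm[0].copy(), -np.inf

for _ in range(30):                       # iterative parameter search during model training
    fits = np.array([fitness(p) for p in swarm])
    if fits.max() > best_fit:
        best_fit, best_pos = fits.max(), swarm[fits.argmax()].copy()
    # Generic swarm update: drift toward the best-known position plus random exploration.
    swarm += 0.5 * (best_pos - swarm) + rng.normal(scale=0.3, size=swarm.shape)

print("best cross-validated accuracy:", best_fit, "best (C, gamma):", np.exp(best_pos))

On the real data the fitness would be the cross-validated diagnostic accuracy on dissolved-gas samples, which is exactly the role the dummy fitness function plays here.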

Neural Networks, Volume 184, Article 107120. Citations: 0.
User preference interaction fusion and swap attention graph neural network for recommender system.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-04 | DOI: 10.1016/j.neunet.2024.107116
Mingqi Li, Wenming Ma, Zihao Chu

Recommender systems are widely used in various applications. Knowledge graphs are increasingly used to improve recommendation performance by extracting valuable information from user-item interactions. However, current methods do not effectively use fine-grained information within the knowledge graph. Additionally, some recommendation methods based on graph neural networks tend to overlook the importance of entities to users when performing aggregation operations. To alleviate these issues, we introduce a knowledge-graph-based graph neural network (PIFSA-GNN) for recommendation with two key components. The first component, user preference interaction fusion, incorporates user auxiliary information in the recommendation process. This enhances the influence of users on the recommendation model. The second component is an attention mechanism called user preference swap attention, which improves entity weight calculation for effectively aggregating neighboring entities. Our method was extensively tested on three real-world datasets. On the movie dataset, our method outperforms the best baseline by 1.3% in AUC and 2.8% in F1; Hit@1 increases by 0.7%, Hit@5 by 0.6%, and Hit@10 by 1.0%. On the restaurant dataset, AUC improves by 2.6% and F1 by 7.2%; Hit@1 increases by 1.3%, Hit@5 by 3.7%, and Hit@10 by 2.9%. On the music dataset, AUC improves by 0.9% and F1 by 0.4%; Hit@1 increases by 3.3%, Hit@5 by 1.2%, and Hit@10 by 0.2%. The results show that it outperforms baseline methods.
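The abstract describes the aggregation step only at a high level; the short PyTorch sketch below shows the generic pattern it refines: scoring an item's knowledge-graph neighbors against the user embedding and aggregating them by the resulting attention weights. The embedding size, neighbor count, and scoring function are illustrative assumptions, not the paper's swap-attention formulation.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 16                                   # embedding size (assumption)
user = torch.randn(d)                    # user embedding; enriched with auxiliary info in the paper
neighbors = torch.randn(8, d)            # embeddings of KG entities linked to a candidate item

# Generic user-conditioned attention over neighboring entities (not the exact swap attention):
# score each neighbor against the user, normalize, and aggregate.
scores = neighbors @ user / d ** 0.5     # (8,) relevance of each entity to this user
weights = F.softmax(scores, dim=0)
item_repr = weights @ neighbors          # (d,) user-aware item representation

print(weights)
print(item_repr.shape)                   # torch.Size([16])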

Neural Networks, Volume 184, Article 107116. Citations: 0.
Multi-level feature fusion networks for smoke recognition in remote sensing imagery.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-04 | DOI: 10.1016/j.neunet.2024.107112
Yupeng Wang, Yongli Wang, Zaki Ahmad Khan, Anqi Huang, Jianghui Sang

Smoke is a critical indicator of forest fires, often detectable before flames ignite. Accurate smoke identification in remote sensing images is vital for effective forest fire monitoring within Internet of Things (IoT) systems. However, existing detection methods frequently falter in complex real-world scenarios, where variable smoke shapes and sizes, intricate backgrounds, and smoke-like phenomena (e.g., clouds and haze) lead to missed detections and false alarms. To address these challenges, we propose the Multi-level Feature Fusion Network (MFFNet), a novel framework grounded in contrastive learning. MFFNet begins by extracting multi-scale features from remote sensing images using a pre-trained ConvNeXt model, capturing information across different levels of granularity to accommodate variations in smoke appearance. The Attention Feature Enhancement Module further refines these multi-scale features, enhancing fine-grained, discriminative attributes relevant to smoke detection. Subsequently, the Bilinear Feature Fusion Module combines these enriched features, effectively reducing background interference and improving the model's ability to distinguish smoke from visually similar phenomena. Finally, contrastive feature learning is employed to improve robustness against intra-class variations by focusing on unique regions within the smoke patterns. Evaluated on the benchmark dataset USTC_SmokeRS, MFFNet achieves an accuracy of 98.87%. Additionally, our model demonstrates a detection rate of 94.54% on the extended E_SmokeRS dataset, with a low false alarm rate of 3.30%. These results highlight the effectiveness of MFFNet in recognizing smoke in remote sensing images, surpassing existing methodologies. The code is accessible at https://github.com/WangYuPeng1/MFFNet.
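The Bilinear Feature Fusion Module is not specified in detail in the abstract, so the snippet below is only a minimal sketch of a standard bilinear fusion layer (torch.nn.Bilinear) combining a coarse-scale and a fine-scale descriptor. The dimensions and the ReLU are arbitrary choices, not the released MFFNet code (see the linked repository for that).

import torch
import torch.nn as nn

class BilinearFusion(nn.Module):
    """Fuse two feature vectors with a learned bilinear form, then apply a nonlinearity."""
    def __init__(self, d_a, d_b, d_out):
        super().__init__()
        self.bilinear = nn.Bilinear(d_a, d_b, d_out)

    def forward(self, feat_a, feat_b):
        return torch.relu(self.bilinear(feat_a, feat_b))

# Example: fuse a coarse-level and a fine-level descriptor extracted from the same image.
fusion = BilinearFusion(d_a=256, d_b=256, d_out=128)
coarse = torch.randn(4, 256)    # batch of 4 coarse-scale descriptors (illustrative)
fine = torch.randn(4, 256)      # batch of 4 fine-scale descriptors (illustrative)
print(fusion(coarse, fine).shape)   # torch.Size([4, 128])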

Neural Networks, Volume 184, Article 107112. Citations: 0.
Convergence analysis of deep Ritz method with over-parameterization.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-04 | DOI: 10.1016/j.neunet.2024.107110
Zhao Ding, Yuling Jiao, Xiliang Lu, Peiying Wu, Jerry Zhijian Yang

The deep Ritz method (DRM) has recently been shown to be a simple and effective method for solving PDEs. However, the numerical analysis of DRM is still incomplete; in particular, why over-parameterized DRM works remains unknown. This paper presents the first convergence analysis of the over-parameterized DRM for second-order elliptic equations with Robin boundary conditions. We demonstrate that the convergence rate can be controlled by the weight norm, regardless of the number of parameters in the network. To this end, we establish novel approximation results in Sobolev spaces with norm constraints, which are of independent interest.
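For context, DRM trains a network u_\theta by minimizing the variational (Ritz) energy of the PDE rather than its strong-form residual. For the model problem -\Delta u + c u = f in \Omega with Robin boundary condition \partial_n u + \alpha u = g on \partial\Omega (a standard textbook instance; the exact equation class analyzed in the paper may be more general), the energy is

E(u_\theta) = \int_{\Omega} \left( \tfrac{1}{2}\,|\nabla u_\theta|^{2} + \tfrac{c}{2}\,u_\theta^{2} - f\,u_\theta \right) dx + \int_{\partial\Omega} \left( \tfrac{\alpha}{2}\,u_\theta^{2} - g\,u_\theta \right) ds,

and training minimizes a Monte Carlo estimate of E over the network parameters, with sample points drawn in \Omega and on \partial\Omega.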

Neural Networks, Volume 184, Article 107110. Citations: 0.
Multi-view clustering based on feature selection and semi-non-negative anchor graph factorization.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-03 | DOI: 10.1016/j.neunet.2024.107111
Shikun Mei, Qianqian Wang, Quanxue Gao, Ming Yang

Multi-view clustering has garnered significant attention due to its capacity to utilize information from multiple perspectives. The concept of anchor graph-based techniques was introduced to manage large-scale data better. However, current methods rely on K-means or uniform sampling to select anchors in the original space. This results in a disjointed approach separating anchor selection and subsequent graph construction. Moreover, these methods typically require additional K-means or spectral clustering to derive labels, often leading to suboptimal outcomes. To address these challenges, we present a novel approach called Multi-view Clustering based on Feature Selection and Semi-Non-Negative Anchor Graph Factorization (MCFSAF). This method unifies feature selection, anchor and anchor graph learning, and semi-non-negative factorization of the anchor graph into a cohesive framework. Within this framework, the anchors and anchor graph are learned in the embedding space following feature selection, and the clustering indicator matrix is obtained via semi-non-negative factorization of the anchor graph in each view. By applying the minimization of the tensor Schatten p-norm, we can uncover complementary information across multiple views efficiently. This synergetic process of anchor selection, anchor graph learning, and indicator matrix updating can effectively enhance the clustering quality. Critically, the fused indicator matrix enables us to directly acquire clustering labels without requiring additional K-means, thereby significantly improving the stability of the clustering process. Our method is optimized via an alternating iterations algorithm. Comprehensive experimental evaluations underscore the superior performance of our approach.
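As a reminder of the regularizer being minimized: for a matrix X with singular values \sigma_i(X), the Schatten p-norm is defined below. The tensor version used in the paper is typically obtained by applying this slice-wise in a transformed (t-SVD) domain; those details are not given in the abstract, so this is only the reference matrix case.

\|X\|_{S_p} = \left( \sum_{i} \sigma_i(X)^{p} \right)^{1/p}

For p = 1 this reduces to the nuclear norm, while 0 < p < 1 penalizes rank more aggressively, which is why its minimization encourages consistent low-rank structure across views.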

Neural Networks, Volume 184, Article 107111. Citations: 0.
A discriminative multi-modal adaptation neural network model for video action recognition.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-03 | DOI: 10.1016/j.neunet.2024.107114
Lei Gao, Kai Liu, Ling Guan

Research on video-based understanding and learning has attracted widespread interest and has been adopted in various real applications, such as e-healthcare, action recognition, and affective computing, to name a few. Amongst them, video-based action recognition is one of the most representative examples. With the advancement of multi-sensory technology, action recognition using multi-modal data has recently drawn wide attention. However, the research community faces new challenges in effectively exploring and utilizing the discriminative and complementary information across different modalities. Although score-level fusion approaches have been widely employed for multi-modal action recognition, they simply add the scores derived separately from different modalities without proper consideration of cross-modality semantics amongst multiple input data sources, invariably causing sub-optimal performance. To address this issue, this paper presents a two-stream heterogeneous network to extract and jointly process complementary features derived from RGB and skeleton modalities, respectively. Then, a discriminative multi-modal adaptation neural network model (DMANNM) is proposed and applied to the heterogeneous network, by integrating statistical machine learning (SML) principles with convolutional neural network (CNN) architecture. In addition, to achieve high recognition accuracy with the generated multi-modal structure, an effective nonlinear classification algorithm is presented in this work. Leveraging the joint strength of SML and CNN architecture, the proposed model forms an adaptive platform for handling datasets of different scales. To demonstrate the effectiveness and the generic nature of the proposed model, we conducted experiments on four popular video-based action recognition datasets of different scales: NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA (N-UCLA), and SYSU. The experimental results show the superiority of the proposed method over state-of-the-art alternatives.
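DMANNM's layers are not detailed in the abstract, so the PyTorch sketch below shows only the generic two-stream starting point it builds on: a small CNN branch for RGB frames, an MLP branch for skeleton coordinates, and concatenation before a classifier. All layer sizes, the 25-joint skeleton input, and the fusion-by-concatenation choice are assumptions rather than the paper's architecture.

import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Heterogeneous two-stream model: CNN features for RGB frames, MLP features for skeletons."""
    def __init__(self, num_classes=60):
        super().__init__()
        self.rgb_net = nn.Sequential(                 # toy RGB branch (stand-in for a deep CNN)
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.skel_net = nn.Sequential(                # toy skeleton branch: 25 joints x 3 coords
            nn.Linear(75, 64), nn.ReLU())
        self.classifier = nn.Linear(16 + 64, num_classes)

    def forward(self, rgb, skel):
        fused = torch.cat([self.rgb_net(rgb), self.skel_net(skel)], dim=1)
        return self.classifier(fused)

model = TwoStreamFusion()
logits = model(torch.randn(2, 3, 64, 64), torch.randn(2, 75))
print(logits.shape)   # torch.Size([2, 60])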

Neural Networks, Volume 185, Article 107114. Citations: 0.
Synergistic learning with multi-task DeepONet for efficient PDE problem solving.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-03 | DOI: 10.1016/j.neunet.2024.107113
Varun Kumar, Somdatta Goswami, Katiana Kontolati, Michael D Shields, George Em Karniadakis

Multi-task learning (MTL) is an inductive transfer mechanism designed to leverage useful information from multiple tasks to improve generalization performance compared to single-task learning. It has been extensively explored in traditional machine learning to address issues such as data sparsity and overfitting in neural networks. In this work, we apply MTL to problems in science and engineering governed by partial differential equations (PDEs). However, implementing MTL in this context is complex, as it requires task-specific modifications to accommodate various scenarios representing different physical processes. To this end, we present a multi-task deep operator network (MT-DeepONet) to learn solutions across various functional forms of source terms in a PDE and multiple geometries in a single concurrent training session. We introduce modifications in the branch network of the vanilla DeepONet to account for various functional forms of a parameterized coefficient in a PDE. Additionally, we handle parameterized geometries by introducing a binary mask in the branch network and incorporating it into the loss term to improve convergence and generalization to new geometry tasks. Our approach is demonstrated on three benchmark problems: (1) learning different functional forms of the source term in the Fisher equation; (2) learning multiple geometries in a 2D Darcy Flow problem and showcasing better transfer learning capabilities to new geometries; and (3) learning 3D parameterized geometries for a heat transfer problem and demonstrating the ability to predict on new but similar geometries. Our MT-DeepONet framework offers a novel approach to solving PDE problems in engineering and science under a unified umbrella based on synergistic learning that reduces the overall training cost for neural operators.
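The sketch below is the vanilla DeepONet that MT-DeepONet starts from: a branch network encodes the input function sampled at m sensors, a trunk network encodes a query coordinate, and the output is their inner product plus a bias. The multi-task branch modifications and the binary geometry mask described above are not reproduced; the sensor count, widths, and latent size p are placeholders.

import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Vanilla DeepONet: branch net encodes the input function sampled at m sensors,
    trunk net encodes a query coordinate; the output is their inner product plus a bias."""
    def __init__(self, m_sensors=100, coord_dim=2, width=64, p=32):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m_sensors, width), nn.Tanh(), nn.Linear(width, p))
        self.trunk = nn.Sequential(nn.Linear(coord_dim, width), nn.Tanh(), nn.Linear(width, p))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors, coords):
        # u_sensors: (batch, m_sensors) samples of the source term / coefficient
        # coords:    (batch, coord_dim) query locations
        b = self.branch(u_sensors)
        t = self.trunk(coords)
        return (b * t).sum(dim=-1, keepdim=True) + self.bias

net = DeepONet()
out = net(torch.randn(8, 100), torch.rand(8, 2))
print(out.shape)   # torch.Size([8, 1])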

Neural Networks, Volume 184, Article 107113. Citations: 0.
MIFS: An adaptive multipath information fused self-supervised framework for drug discovery.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-02 | DOI: 10.1016/j.neunet.2024.107088
Xu Gong, Qun Liu, Rui Han, Yike Guo, Guoyin Wang

The production of expressive molecular representations with scarce labeled data is challenging for AI-driven drug discovery. Mainstream studies often follow a pipeline that pre-trains a specific molecular encoder and then fine-tunes it. However, the significant challenges of these methods are (1) neglecting the propagation of diverse information within molecules and (2) the absence of knowledge and chemical constraints in the pre-training strategy. In this study, we propose an adaptive multipath information fused self-supervised framework (MIFS) that explores molecular representations from large-scale unlabeled data to aid drug discovery. In MIFS, we innovatively design a dedicated molecular graph encoder called Mol-EN, which implements three pathways of information propagation: atom-to-atom, chemical bond-to-atom, and group-to-atom, to comprehensively perceive and capture abundant semantic information. Furthermore, a novel adaptive pre-training strategy based on molecular scaffolds is devised to pre-train Mol-EN on 11 million unlabeled molecules. It optimizes Mol-EN by constructing a topological contrastive loss to provide additional chemical insights into molecular structures. Subsequently, the pre-trained Mol-EN is fine-tuned on 14 widespread drug discovery benchmark datasets, including molecular properties prediction, drug-target interactions, and drug-drug interactions. Notably, to further enhance chemical knowledge, we introduce an elemental knowledge graph (ElementKG) in the fine-tuning phase. Extensive experiments show that MIFS achieves competitive performance while providing plausible explanations for predictions from a chemical perspective.
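The topological contrastive loss is not written out in the abstract; as a stand-in for the general idea, the snippet below implements the standard NT-Xent contrastive loss between two embedding views of the same batch of molecules. The batch size, embedding width, and temperature are arbitrary, and the actual MIFS loss may weight pairs differently.

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Standard NT-Xent loss: each molecule's two views are positives, all other molecules negatives."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)           # (2N, d) unit-norm embeddings
    sim = z @ z.t() / temperature                                # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float("-inf"))                        # drop self-similarities
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])  # index of each positive
    return F.cross_entropy(sim, targets)

z_view1 = torch.randn(16, 128)   # e.g., embeddings of two augmented views of 16 molecular graphs
z_view2 = torch.randn(16, 128)
print(nt_xent(z_view1, z_view2))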

Neural Networks, Volume 184, Article 107088. Citations: 0.
Temporal multi-modal knowledge graph generation for link prediction.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-02 | DOI: 10.1016/j.neunet.2024.107108
Yuandi Li, Hui Ji, Fei Yu, Lechao Cheng, Nan Che

Temporal Multi-Modal Knowledge Graphs (TMMKGs) can be regarded as a synthesis of Temporal Knowledge Graphs (TKGs) and Multi-Modal Knowledge Graphs (MMKGs), combining the characteristics of both. TMMKGs can effectively model dynamic real-world phenomena, particularly in scenarios involving multiple heterogeneous information sources and time series characteristics, such as e-commerce websites, scene recording data, and intelligent transportation systems. We propose a Temporal Multi-Modal Knowledge Graph Generation (TMMKGG) method that can automatically construct TMMKGs, aiming to reduce construction costs. To support this, we construct a dynamic Visual-Audio-Language Multimodal (VALM) dataset, which is particularly suitable for extracting structured knowledge in response to temporal multimodal perception data. TMMKGG explores temporal dynamics and cross-modal integration, enabling multimodal data processing for dynamic knowledge graph generation and utilizing alignment strategies to enhance scene perception. To validate the effectiveness of TMMKGG, we compare it with state-of-the-art dynamic graph generation methods using the VALM dataset. Furthermore, TMMKG exhibits a significant disparity in the ratio of newly introduced entities to their associated newly introduced edges compared to TKGs. Based on this phenomenon, we introduce a Temporal Multi-Modal Link Prediction (TMMLP) method, which outperforms existing state-of-the-art techniques.
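As a generic illustration of temporal link prediction (not the TMMLP model, whose scoring function is not given here), the sketch below scores a (head, relation, tail, timestamp) quadruple in the TTransE style by translating the head with relation and time embeddings. The entity, relation, and timestamp counts and the embedding size are made-up placeholders.

import torch
import torch.nn as nn

class TTransEScorer(nn.Module):
    """TTransE-style scorer: plausibility of (h, r, t, tau) is -||e_h + e_r + e_tau - e_t||."""
    def __init__(self, n_entities, n_relations, n_timestamps, dim=64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.time = nn.Embedding(n_timestamps, dim)

    def forward(self, h, r, t, tau):
        trans = self.ent(h) + self.rel(r) + self.time(tau) - self.ent(t)
        return -trans.norm(p=2, dim=-1)        # higher score means a more plausible link

scorer = TTransEScorer(n_entities=1000, n_relations=50, n_timestamps=365)
h = torch.tensor([3, 7])
r = torch.tensor([1, 4])
t = torch.tensor([42, 99])
tau = torch.tensor([10, 200])
print(scorer(h, r, t, tau))    # one score per candidate quadruple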

Neural Networks, Volume 185, Article 107108. Citations: 0.
GSE: A global-local storage enhanced video object recognition model.
IF 6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-01-02 | DOI: 10.1016/j.neunet.2024.107109
Yuhong Shi, Hongguang Pan, Ze Jiang, Libin Zhang, Rui Miao, Zheng Wang, Xinyu Lei

The presence of substantial similarities and redundant information within video data limits the performance of video object recognition models. To address this issue, a Global-Local Storage Enhanced video object recognition model (GSE) is proposed in this paper. Firstly, the model incorporates a two-stage dynamic multi-frame aggregation module to aggregate shallow frame features. This module aggregates features in batches from each input video using feature extraction, dynamic multi-frame aggregation, and centralized concatenations, significantly reducing the model's computational burden while retaining key information. In addition, a Global-Local Storage (GS) module is constructed to retain and utilize the information in the frame sequence effectively. This module classifies features using a temporal difference threshold method and employs a processing approach of inheritance, storage, and output to filter and retain features. By integrating global, local and key features, the model can accurately capture important temporal features when facing complex video scenes. Subsequently, a Cascaded Multi-head Attention (CMA) mechanism is designed. The multi-head cascade structure in this mechanism progressively focuses on object features and explores the correlations between key and global, local features. The differential step attention calculation is used to ensure computational efficiency. Finally, we optimize the model structure and adjust parameters, and verify the GSE model performance through comprehensive experiments. Experimental results on the ImageNet 2015 and NPS-Drones datasets demonstrate that the GSE model achieves the highest mAP of 0.8352 and 0.8617, respectively. Compared with other models, the GSE model achieves a commendable balance across metrics such as precision, efficiency, and power consumption.
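The temporal-difference threshold and the cascaded attention are described only at a high level, so the snippet below is a minimal illustration of those two ingredients: frames are retained only when their features differ sufficiently from the last retained frame, and a key-frame query then attends over the retained memory with torch.nn.MultiheadAttention. The threshold value, feature size, and single attention stage are placeholder assumptions, not the GSE architecture.

import torch
import torch.nn as nn

d = 64
frames = torch.randn(30, d)                     # per-frame features of one video (illustrative)

# Temporal-difference filtering: keep a frame only if it differs enough from the last kept one.
kept = [frames[0]]
for f in frames[1:]:
    if (f - kept[-1]).norm() > 8.0:             # threshold is an arbitrary placeholder
        kept.append(f)
memory = torch.stack(kept).unsqueeze(1)         # (kept_len, batch=1, d)

# One attention stage: a key-frame query attends over the retained global/local memory.
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4)
query = frames[-1].view(1, 1, d)                # treat the latest frame as the key-frame query
out, weights = attn(query, memory, memory)
print(len(kept), out.shape)                     # number of kept frames, torch.Size([1, 1, 64])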

Neural Networks, Volume 184, Article 107109. Citations: 0.