A discriminative multi-modal adaptation neural network model for video action recognition.

Neural Networks · IF 6.0 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Pub Date: 2025-01-03 · DOI: 10.1016/j.neunet.2024.107114
Lei Gao, Kai Liu, Ling Guan
Neural Networks, vol. 185, article 107114. Journal Article.
Citations: 0

Abstract

Research on video-based understanding and learning has attracted widespread interest and has been adopted in various real-world applications, such as e-healthcare, action recognition, and affective computing. Among them, video-based action recognition is one of the most representative examples. With the advancement of multi-sensory technology, action recognition using multi-modal data has recently drawn wide attention. However, the research community faces new challenges in effectively exploring and utilizing the discriminative and complementary information across different modalities. Although score-level fusion approaches are widely employed for multi-modal action recognition, they simply add the scores derived separately from each modality without properly considering cross-modality semantics among the input data sources, which invariably leads to sub-optimal performance. To address this issue, this paper presents a two-stream heterogeneous network that extracts and jointly processes complementary features derived from the RGB and skeleton modalities. Then, a discriminative multi-modal adaptation neural network model (DMANNM) is proposed and applied to the heterogeneous network by integrating statistical machine learning (SML) principles with a convolutional neural network (CNN) architecture. In addition, to achieve high recognition accuracy with the generated multi-modal structure, an effective nonlinear classification algorithm is presented. Leveraging the joint strength of SML and the CNN architecture, the proposed model forms an adaptive platform for handling datasets of different scales. To demonstrate the effectiveness and generality of the proposed model, we conducted experiments on four popular video-based action recognition datasets of different scales: NTU RGB+D, NTU RGB+D 120, Northwestern-UCLA (N-UCLA), and SYSU. The experimental results show that the proposed method outperforms the state-of-the-art methods it is compared against.
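For orientation, the score-level fusion baseline that the abstract criticizes can be sketched roughly as follows. This is a minimal illustration and not the DMANNM method. The function names, the use of softmax probabilities as "scores", and the logit values are all assumptions made for the example. Each stream is scored independently and the per-class scores are simply added, so no cross-modality information is exchanged before the final decision.

```python
import math


def softmax(logits):
    """Convert raw class logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_level_fusion(rgb_logits, skeleton_logits):
    """Late fusion: add per-class scores computed separately for each stream."""
    rgb_scores = softmax(rgb_logits)
    skeleton_scores = softmax(skeleton_logits)
    # The only interaction between modalities is this element-wise sum.
    fused = [r + s for r, s in zip(rgb_scores, skeleton_scores)]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Hypothetical logits for three action classes (illustrative values only).
rgb_logits = [2.0, 1.0, 0.1]
skeleton_logits = [0.5, 2.5, 0.3]
fused, predicted = score_level_fusion(rgb_logits, skeleton_logits)
print(predicted)
```

Because the streams only meet at the final sum, a confident but wrong stream can outvote the other. That is the limitation that motivates jointly processing RGB and skeleton features instead.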

Source journal
Neural Networks (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days
About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.