Jing Liu , Jin Hou , Dan Liu , Qijun Zhao , Rui Chen , Xiaoyuan Chen , Vanessa Hull , Jindong Zhang , Jifeng Ning
{"title":"识别野生大熊猫行为的基于时间和空间注意力的联合变换器方法","authors":"Jing Liu , Jin Hou , Dan Liu , Qijun Zhao , Rui Chen , Xiaoyuan Chen , Vanessa Hull , Jindong Zhang , Jifeng Ning","doi":"10.1016/j.ecoinf.2024.102797","DOIUrl":null,"url":null,"abstract":"<div><p>Wild giant pandas, an endangered species exclusive to China, are a focus of conservation efforts. The behavior of giant pandas reflects their health conditions and activity capabilities, which play an important role in formulating and implementing conservation measures. Researching and developing efficient behavior recognition methods based on deep learning can significantly advance the study of wild giant panda behavior. This study introduces, for the first time, a transformer-based behavior recognition method termed PandaFormer, which employs time-spatial attention to analyze behavioral temporal patterns and estimate activity spaces. The method integrates advanced techniques such as cross-fusion recurrent time encoding and transformer modules, which handle both temporal dynamics and spatial relationships within panda behavior videos. First, we design cross-fusion recurrent time encoding to represent the occurrence time of behaviors effectively. By leveraging the multimodal processing capability of the transformer, we input time and video tokens into the transformer module to explore the relation between behavior and occurrence time. Second, we introduce relative temporal weights between video frames to enable the model to learn sequential relationships. Finally, considering the fixed position of the camera during recording, we propose a spatial attention mechanism based on the estimation of the panda's activity area. To validate the effectiveness of the model, a video dataset of wild giant pandas, encompassing five typical behaviors, was constructed. The proposed method is evaluated on this video-level annotated dataset. It achieves a Top-1 accuracy of 92.25 % and a mean class precision of 91.19 %, surpassing state-of-the-art behavior recognition algorithms by a large margin. Furthermore, the ablation experiments validate the effectiveness of the proposed temporal and spatial attention mechanisms. In conclusion, the proposed method offers an effective way of studying panda behavior and holds potential for application to other wildlife species.</p></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S157495412400339X/pdfft?md5=789f7bb46c25667b7b6903e3a1edf5d4&pid=1-s2.0-S157495412400339X-main.pdf","citationCount":"0","resultStr":"{\"title\":\"A joint time and spatial attention-based transformer approach for recognizing the behaviors of wild giant pandas\",\"authors\":\"Jing Liu , Jin Hou , Dan Liu , Qijun Zhao , Rui Chen , Xiaoyuan Chen , Vanessa Hull , Jindong Zhang , Jifeng Ning\",\"doi\":\"10.1016/j.ecoinf.2024.102797\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Wild giant pandas, an endangered species exclusive to China, are a focus of conservation efforts. The behavior of giant pandas reflects their health conditions and activity capabilities, which play an important role in formulating and implementing conservation measures. Researching and developing efficient behavior recognition methods based on deep learning can significantly advance the study of wild giant panda behavior. 
This study introduces, for the first time, a transformer-based behavior recognition method termed PandaFormer, which employs time-spatial attention to analyze behavioral temporal patterns and estimate activity spaces. The method integrates advanced techniques such as cross-fusion recurrent time encoding and transformer modules, which handle both temporal dynamics and spatial relationships within panda behavior videos. First, we design cross-fusion recurrent time encoding to represent the occurrence time of behaviors effectively. By leveraging the multimodal processing capability of the transformer, we input time and video tokens into the transformer module to explore the relation between behavior and occurrence time. Second, we introduce relative temporal weights between video frames to enable the model to learn sequential relationships. Finally, considering the fixed position of the camera during recording, we propose a spatial attention mechanism based on the estimation of the panda's activity area. To validate the effectiveness of the model, a video dataset of wild giant pandas, encompassing five typical behaviors, was constructed. The proposed method is evaluated on this video-level annotated dataset. It achieves a Top-1 accuracy of 92.25 % and a mean class precision of 91.19 %, surpassing state-of-the-art behavior recognition algorithms by a large margin. Furthermore, the ablation experiments validate the effectiveness of the proposed temporal and spatial attention mechanisms. In conclusion, the proposed method offers an effective way of studying panda behavior and holds potential for application to other wildlife species.</p></div>\",\"PeriodicalId\":51024,\"journal\":{\"name\":\"Ecological Informatics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":5.8000,\"publicationDate\":\"2024-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S157495412400339X/pdfft?md5=789f7bb46c25667b7b6903e3a1edf5d4&pid=1-s2.0-S157495412400339X-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ecological Informatics\",\"FirstCategoryId\":\"93\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S157495412400339X\",\"RegionNum\":2,\"RegionCategory\":\"环境科学与生态学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ECOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ecological Informatics","FirstCategoryId":"93","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S157495412400339X","RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ECOLOGY","Score":null,"Total":0}
A joint time and spatial attention-based transformer approach for recognizing the behaviors of wild giant pandas
Wild giant pandas, an endangered species exclusive to China, are a focus of conservation efforts. The behavior of giant pandas reflects their health conditions and activity capabilities, which play an important role in formulating and implementing conservation measures. Researching and developing efficient deep learning-based behavior recognition methods can therefore significantly advance the study of wild giant panda behavior. This study introduces, for the first time, a transformer-based behavior recognition method termed PandaFormer, which employs joint temporal and spatial attention to analyze behavioral temporal patterns and estimate activity spaces. The method integrates cross-fusion recurrent time encoding and transformer modules, handling both temporal dynamics and spatial relationships within panda behavior videos. First, we design a cross-fusion recurrent time encoding to represent the occurrence time of behaviors effectively. Leveraging the multimodal processing capability of the transformer, we feed time and video tokens into the transformer module to explore the relationship between behaviors and their occurrence times. Second, we introduce relative temporal weights between video frames so that the model can learn sequential relationships. Finally, exploiting the fixed position of the camera during recording, we propose a spatial attention mechanism based on an estimate of the panda's activity area. To validate the model, a video dataset of wild giant pandas covering five typical behaviors was constructed. Evaluated on this video-level annotated dataset, the proposed method achieves a Top-1 accuracy of 92.25% and a mean class precision of 91.19%, surpassing state-of-the-art behavior recognition algorithms by a large margin. Furthermore, ablation experiments validate the effectiveness of the proposed temporal and spatial attention mechanisms. In conclusion, the proposed method offers an effective way of studying panda behavior and holds potential for application to other wildlife species.
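The abstract describes the components only at a high level, so the following is a minimal sketch of how an occurrence-time token, frame ordering, and an activity-area weighting could be combined in a single transformer encoder. All module names (e.g. `PandaFormerSketch`, `TimeOfDayEncoding`), feature dimensions, the learned positional embedding standing in for the paper's relative temporal weights, and the per-frame activity-score re-weighting are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only (PyTorch); architecture details are assumed, not taken
# from the paper.
import torch
import torch.nn as nn

class TimeOfDayEncoding(nn.Module):
    """Hypothetical encoding of a behavior's occurrence time (hour of day)
    as a learnable token, loosely standing in for the paper's time encoding."""
    def __init__(self, dim, num_bins=24):
        super().__init__()
        self.embed = nn.Embedding(num_bins, dim)

    def forward(self, hour):                  # hour: (B,) integers in [0, 24)
        return self.embed(hour).unsqueeze(1)  # (B, 1, dim) time token

class PandaFormerSketch(nn.Module):
    """Minimal joint time/space attention sketch: per-frame tokens, a prepended
    time token, and an additive re-weighting from an estimated activity area."""
    def __init__(self, dim=256, num_frames=16, num_classes=5, heads=8):
        super().__init__()
        self.frame_proj = nn.Linear(2048, dim)   # pre-extracted frame features -> tokens
        self.time_enc = TimeOfDayEncoding(dim)
        # Learned positional embedding as a simple stand-in for relative temporal weights.
        self.pos = nn.Parameter(torch.zeros(1, num_frames + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, frame_feats, hour, activity_score):
        # frame_feats:    (B, T, 2048) pre-extracted per-frame features
        # hour:           (B,) occurrence time of the clip
        # activity_score: (B, T) in [0, 1], e.g. fraction of each frame covered
        #                 by the estimated activity area (assumed input)
        x = self.frame_proj(frame_feats)                      # (B, T, dim)
        x = x * activity_score.unsqueeze(-1)                  # crude spatial re-weighting
        tokens = torch.cat([self.time_enc(hour), x], dim=1)   # prepend time token
        tokens = tokens + self.pos
        out = self.encoder(tokens)
        return self.head(out[:, 0])                           # classify from time token

# Usage with random tensors standing in for real data:
model = PandaFormerSketch()
logits = model(torch.randn(2, 16, 2048), torch.tensor([6, 18]), torch.rand(2, 16))
print(logits.shape)  # torch.Size([2, 5]) -> five behavior classes
```

Classifying from the prepended time token (used here like a CLS token) is one plausible way to let the model relate behaviors to their occurrence times; the paper's actual fusion and attention mechanisms may differ.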
Journal overview:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal reflects the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change.
The journal is interdisciplinary, sitting at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and the modelling and prediction of ecological data.