Jing Liu , Jin Hou , Dan Liu , Qijun Zhao , Rui Chen , Xiaoyuan Chen , Vanessa Hull , Jindong Zhang , Jifeng Ning
{"title":"识别野生大熊猫行为的基于时间和空间注意力的联合变换器方法","authors":"Jing Liu , Jin Hou , Dan Liu , Qijun Zhao , Rui Chen , Xiaoyuan Chen , Vanessa Hull , Jindong Zhang , Jifeng Ning","doi":"10.1016/j.ecoinf.2024.102797","DOIUrl":null,"url":null,"abstract":"<div><p>Wild giant pandas, an endangered species exclusive to China, are a focus of conservation efforts. The behavior of giant pandas reflects their health conditions and activity capabilities, which play an important role in formulating and implementing conservation measures. Researching and developing efficient behavior recognition methods based on deep learning can significantly advance the study of wild giant panda behavior. This study introduces, for the first time, a transformer-based behavior recognition method termed PandaFormer, which employs time-spatial attention to analyze behavioral temporal patterns and estimate activity spaces. The method integrates advanced techniques such as cross-fusion recurrent time encoding and transformer modules, which handle both temporal dynamics and spatial relationships within panda behavior videos. First, we design cross-fusion recurrent time encoding to represent the occurrence time of behaviors effectively. By leveraging the multimodal processing capability of the transformer, we input time and video tokens into the transformer module to explore the relation between behavior and occurrence time. Second, we introduce relative temporal weights between video frames to enable the model to learn sequential relationships. Finally, considering the fixed position of the camera during recording, we propose a spatial attention mechanism based on the estimation of the panda's activity area. To validate the effectiveness of the model, a video dataset of wild giant pandas, encompassing five typical behaviors, was constructed. The proposed method is evaluated on this video-level annotated dataset. It achieves a Top-1 accuracy of 92.25 % and a mean class precision of 91.19 %, surpassing state-of-the-art behavior recognition algorithms by a large margin. Furthermore, the ablation experiments validate the effectiveness of the proposed temporal and spatial attention mechanisms. In conclusion, the proposed method offers an effective way of studying panda behavior and holds potential for application to other wildlife species.</p></div>","PeriodicalId":51024,"journal":{"name":"Ecological Informatics","volume":null,"pages":null},"PeriodicalIF":5.8000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S157495412400339X/pdfft?md5=789f7bb46c25667b7b6903e3a1edf5d4&pid=1-s2.0-S157495412400339X-main.pdf","citationCount":"0","resultStr":"{\"title\":\"A joint time and spatial attention-based transformer approach for recognizing the behaviors of wild giant pandas\",\"authors\":\"Jing Liu , Jin Hou , Dan Liu , Qijun Zhao , Rui Chen , Xiaoyuan Chen , Vanessa Hull , Jindong Zhang , Jifeng Ning\",\"doi\":\"10.1016/j.ecoinf.2024.102797\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Wild giant pandas, an endangered species exclusive to China, are a focus of conservation efforts. The behavior of giant pandas reflects their health conditions and activity capabilities, which play an important role in formulating and implementing conservation measures. Researching and developing efficient behavior recognition methods based on deep learning can significantly advance the study of wild giant panda behavior. 
This study introduces, for the first time, a transformer-based behavior recognition method termed PandaFormer, which employs time-spatial attention to analyze behavioral temporal patterns and estimate activity spaces. The method integrates advanced techniques such as cross-fusion recurrent time encoding and transformer modules, which handle both temporal dynamics and spatial relationships within panda behavior videos. First, we design cross-fusion recurrent time encoding to represent the occurrence time of behaviors effectively. By leveraging the multimodal processing capability of the transformer, we input time and video tokens into the transformer module to explore the relation between behavior and occurrence time. Second, we introduce relative temporal weights between video frames to enable the model to learn sequential relationships. Finally, considering the fixed position of the camera during recording, we propose a spatial attention mechanism based on the estimation of the panda's activity area. To validate the effectiveness of the model, a video dataset of wild giant pandas, encompassing five typical behaviors, was constructed. The proposed method is evaluated on this video-level annotated dataset. It achieves a Top-1 accuracy of 92.25 % and a mean class precision of 91.19 %, surpassing state-of-the-art behavior recognition algorithms by a large margin. Furthermore, the ablation experiments validate the effectiveness of the proposed temporal and spatial attention mechanisms. In conclusion, the proposed method offers an effective way of studying panda behavior and holds potential for application to other wildlife species.</p></div>\",\"PeriodicalId\":51024,\"journal\":{\"name\":\"Ecological Informatics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":5.8000,\"publicationDate\":\"2024-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S157495412400339X/pdfft?md5=789f7bb46c25667b7b6903e3a1edf5d4&pid=1-s2.0-S157495412400339X-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ecological Informatics\",\"FirstCategoryId\":\"93\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S157495412400339X\",\"RegionNum\":2,\"RegionCategory\":\"环境科学与生态学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ECOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ecological Informatics","FirstCategoryId":"93","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S157495412400339X","RegionNum":2,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ECOLOGY","Score":null,"Total":0}
A joint time and spatial attention-based transformer approach for recognizing the behaviors of wild giant pandas
Wild giant pandas, an endangered species exclusive to China, are a focus of conservation efforts. The behavior of giant pandas reflects their health conditions and activity capabilities, which play an important role in formulating and implementing conservation measures. Researching and developing efficient deep learning-based behavior recognition methods can therefore significantly advance the study of wild giant panda behavior. This study introduces, for the first time, a transformer-based behavior recognition method termed PandaFormer, which employs joint temporal and spatial attention to analyze behavioral temporal patterns and estimate activity spaces. The method integrates cross-fusion recurrent time encoding and transformer modules, handling both temporal dynamics and spatial relationships within panda behavior videos. First, we design a cross-fusion recurrent time encoding to represent the occurrence time of behaviors effectively. Leveraging the multimodal processing capability of the transformer, we feed time and video tokens into the transformer module to explore the relationship between behaviors and their occurrence times. Second, we introduce relative temporal weights between video frames so that the model can learn sequential relationships. Finally, exploiting the fixed position of the camera during recording, we propose a spatial attention mechanism based on an estimate of the panda's activity area. To validate the model, a video dataset of wild giant pandas covering five typical behaviors was constructed. Evaluated on this video-level annotated dataset, the proposed method achieves a Top-1 accuracy of 92.25% and a mean class precision of 91.19%, surpassing state-of-the-art behavior recognition algorithms by a large margin. Furthermore, ablation experiments validate the effectiveness of the proposed temporal and spatial attention mechanisms. In conclusion, the proposed method offers an effective way of studying panda behavior and holds potential for application to other wildlife species.
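The abstract describes the components only at a high level, so the following is a minimal sketch of how an occurrence-time token, frame ordering, and an activity-area weighting could be combined in a single transformer encoder. All module names (e.g. `PandaFormerSketch`, `TimeOfDayEncoding`), feature dimensions, the learned positional embedding standing in for the paper's relative temporal weights, and the per-frame activity-score re-weighting are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only (PyTorch); architecture details are assumed, not taken
# from the paper.
import torch
import torch.nn as nn

class TimeOfDayEncoding(nn.Module):
    """Hypothetical encoding of a behavior's occurrence time (hour of day)
    as a learnable token, loosely standing in for the paper's time encoding."""
    def __init__(self, dim, num_bins=24):
        super().__init__()
        self.embed = nn.Embedding(num_bins, dim)

    def forward(self, hour):                  # hour: (B,) integers in [0, 24)
        return self.embed(hour).unsqueeze(1)  # (B, 1, dim) time token

class PandaFormerSketch(nn.Module):
    """Minimal joint time/space attention sketch: per-frame tokens, a prepended
    time token, and an additive re-weighting from an estimated activity area."""
    def __init__(self, dim=256, num_frames=16, num_classes=5, heads=8):
        super().__init__()
        self.frame_proj = nn.Linear(2048, dim)   # pre-extracted frame features -> tokens
        self.time_enc = TimeOfDayEncoding(dim)
        # Learned positional embedding as a simple stand-in for relative temporal weights.
        self.pos = nn.Parameter(torch.zeros(1, num_frames + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, frame_feats, hour, activity_score):
        # frame_feats:    (B, T, 2048) pre-extracted per-frame features
        # hour:           (B,) occurrence time of the clip
        # activity_score: (B, T) in [0, 1], e.g. fraction of each frame covered
        #                 by the estimated activity area (assumed input)
        x = self.frame_proj(frame_feats)                      # (B, T, dim)
        x = x * activity_score.unsqueeze(-1)                  # crude spatial re-weighting
        tokens = torch.cat([self.time_enc(hour), x], dim=1)   # prepend time token
        tokens = tokens + self.pos
        out = self.encoder(tokens)
        return self.head(out[:, 0])                           # classify from time token

# Usage with random tensors standing in for real data:
model = PandaFormerSketch()
logits = model(torch.randn(2, 16, 2048), torch.tensor([6, 18]), torch.rand(2, 16))
print(logits.shape)  # torch.Size([2, 5]) -> five behavior classes
```

Classifying from the prepended time token (used here like a CLS token) is one plausible way to let the model relate behaviors to their occurrence times; the paper's actual fusion and attention mechanisms may differ.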
Journal overview:
The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal reflects the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change.
The journal is interdisciplinary, sitting at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and the modelling and prediction of ecological data.