Yamin Han, Jie Wu, Qi Zhang, Xilong Feng, Yang Xu, Taoping Zhang, Bowen Wang, Hongming Zhang
ASGP-IDet: Temporal behaviour localisation of beef cattle in untrimmed surveillance videos
Computers and Electronics in Agriculture, vol. 232, Article 110059. Published 2025-02-18. DOI: 10.1016/j.compag.2025.110059
Impact factor: 7.7 (JCR Q1, Agriculture, Multidisciplinary)
https://www.sciencedirect.com/science/article/pii/S0168169925001656
Citations: 0
Abstract
An accurate analysis of beef cattle behaviour provides valuable information about important characteristics such as health status and fertility. Recent studies have utilised computer vision technologies to recognise beef cattle behaviour in trimmed videos that each contain a single behaviour. However, these methods ignore the fact that surveillance videos in real farm circumstances are usually untrimmed and contain multiple behaviour instances and background scenes, which limits their applicability. To address this issue, we propose a temporal behaviour localisation method using aggregate scalable-granularity perception instance detection (ASGP-IDet) to localise beef cattle behaviours in untrimmed videos. It provides semantic information, such as “when does a specific behaviour start and end?” and “duration of a specific behaviour”. To this end, a feature pyramid with ASGP blocks was designed to aggregate information across different temporal granularities. The trident head was then employed to achieve precise behaviour boundary predictions, and the classification head was used to predict the behaviour category of the instance. Finally, a novel centre–start–end instant offset loss (CSEIO Loss) is proposed to correct offsets at the start, end, and temporal centre of behaviours. Experiments on the newly collected Cattle Temporal Action dataset demonstrated that ASGP-IDet outperformed other state-of-the-art approaches. It achieved mAP scores of 93.93%, 93.74%, 93.22%, 92.29%, and 87.46% at tIoU thresholds of 0.3, 0.4, 0.5, 0.6, and 0.7 ([0.3:0.7:0.1]), respectively, giving an average mAP of 92.13%, with an average processing time of 92.9 ms per video. These findings introduce an efficient method for localising the temporal behaviour of beef cattle in untrimmed farm surveillance videos and further support precision livestock farming.
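The tIoU thresholds and the average mAP quoted in the abstract can be illustrated with a short sketch. It is not the authors' evaluation code: the function name `temporal_iou` and the example segments are hypothetical, and only the five mAP values are taken from the abstract. It uses the standard definition of temporal IoU (overlap duration divided by union duration of two time segments) and checks that the five reported mAP values average to the reported 92.13%.

```python
from statistics import mean


def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments given in seconds.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# mAP values reported in the abstract, keyed by tIoU threshold.
reported_map = {0.3: 93.93, 0.4: 93.74, 0.5: 93.22, 0.6: 92.29, 0.7: 87.46}
average_map = round(mean(reported_map.values()), 2)

# Made-up example: a predicted behaviour instance vs. its annotation.
example_tiou = temporal_iou((12.0, 30.0), (10.0, 28.0))
# A prediction counts as a match at a threshold when its tIoU reaches it.
matched_thresholds = [t for t in reported_map if example_tiou >= t]

print(f"example tIoU: {example_tiou:.2f}")
print(f"average mAP over thresholds: {average_map:.2f}%")
```

In this made-up example the two 18-second segments overlap for 16 seconds, so the prediction would count as correct at every threshold from 0.3 to 0.7. Stricter thresholds demand tighter boundaries, which is consistent with the reported mAP falling as the threshold rises.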
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.