ASGP-IDet: Temporal behaviour localisation of beef cattle in untrimmed surveillance videos

Computers and Electronics in Agriculture (IF 7.7; CAS Zone 1, Agricultural and Forestry Sciences; JCR Q1, Agriculture, Multidisciplinary) Pub Date: 2025-02-18 DOI: 10.1016/j.compag.2025.110059

Yamin Han, Jie Wu, Qi Zhang, Xilong Feng, Yang Xu, Taoping Zhang, Bowen Wang, Hongming Zhang

Citation: Computers and Electronics in Agriculture, vol. 232, Article 110059 (2025). Full text: https://www.sciencedirect.com/science/article/pii/S0168169925001656

Citations: 0

Abstract

Accurate analysis of beef cattle behaviour provides valuable information about important characteristics such as health status and fertility. Recent studies have used computer vision to recognise beef cattle behaviour in trimmed videos that each contain a single behaviour. However, these methods ignore the fact that surveillance videos from real farms are usually untrimmed and contain multiple behaviour instances and background scenes, which limits their applicability. To address this issue, we propose a temporal behaviour localisation method, aggregate scalable-granularity perception instance detection (ASGP-IDet), to localise beef cattle behaviours in untrimmed videos. It provides semantic information such as “when does a specific behaviour start and end?” and “what is the duration of a specific behaviour?”. To this end, a feature pyramid with ASGP blocks was designed to aggregate information across different temporal granularities. A trident head was then employed to predict behaviour boundaries precisely, and a classification head was used to predict the behaviour category of each instance. Finally, a novel centre–start–end instant offset loss (CSEIO Loss) was proposed to correct offsets at the start, end, and temporal centre of behaviours. Experiments on the newly collected Cattle Temporal Action dataset demonstrated that ASGP-IDet outperformed other state-of-the-art approaches. It achieved mAP scores of 93.93%, 93.74%, 93.22%, 92.29%, and 87.46% at tIoU thresholds of 0.3, 0.4, 0.5, 0.6, and 0.7 ([0.3:0.7:0.1]), respectively, giving an average mAP of 92.13%, with an average processing time of 92.9 ms per video. These findings introduce an efficient method for localising the temporal behaviour of beef cattle in untrimmed farm surveillance videos and further support precision livestock farming.
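
The evaluation quantities mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `temporal_iou` and the example segment times are hypothetical, and only the five mAP values and the threshold range are taken from the abstract. The sketch shows how the temporal IoU (tIoU) between a predicted and a ground-truth behaviour segment is commonly computed, and checks that the reported per-threshold mAP values average to roughly 92.13%.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    # Length of the overlap between the two segments (0 if disjoint).
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Length of the union: sum of both lengths minus the overlap.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Thresholds written as [0.3:0.7:0.1] in the abstract: 0.3 to 0.7 in steps of 0.1.
thresholds = [round(0.3 + 0.1 * i, 1) for i in range(5)]

# mAP values (%) reported in the abstract, one per threshold above.
reported_map = [93.93, 93.74, 93.22, 92.29, 87.46]
average_map = sum(reported_map) / len(reported_map)

# Hypothetical example: a predicted segment against a ground-truth segment.
example_pred = (10.0, 20.0)
example_gt = (15.0, 25.0)
example_tiou = temporal_iou(example_pred, example_gt)

# A prediction counts as a match at a threshold when its tIoU reaches it.
matches = {t: example_tiou >= t for t in thresholds}

if __name__ == "__main__":
    print(f"example tIoU: {example_tiou:.3f}")
    print(f"matches per threshold: {matches}")
    print(f"average mAP: {average_map:.2f}%")
```

In this hypothetical example the two segments overlap for 5 s out of a 15 s union, so the prediction would count as correct only at the 0.3 threshold. Stricter thresholds demand tighter boundary agreement, which is consistent with the reported mAP falling as the threshold rises.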

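Source journal

Computers and Electronics in Agriculture (category: Engineering and Technology, Computer Science: Interdisciplinary Applications)

CiteScore: 15.30

Self-citation rate: 14.50%

Publication volume: 800

Review time: 62 days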

Journal description: Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.