Video Foundation Models for Animal Behavior Analysis

Jennifer J Sun, Hao Zhou, Long Zhao, Liangzhe Yuan, Bryan Seybold, David Hendon, Florian Schroff, David A Ross, Hartwig Adam, Bo Hu, Ting Liu
bioRxiv - Animal Behavior and Cognition. Published 2024-07-31. DOI: 10.1101/2024.07.30.605655 (https://doi.org/10.1101/2024.07.30.605655)
Citation count: 0

Abstract

Computational approaches leveraging computer vision and machine learning have transformed the quantification of animal behavior from video. However, existing methods often rely on task-specific features or models, which struggle to generalize across diverse datasets and tasks. Recent advances in machine learning, particularly the emergence of vision foundation models (large-scale models pre-trained on massive, diverse visual repositories), offer a way to tackle these challenges. Here, we investigate the potential of frozen video foundation models across a range of behavior analysis tasks, including classification, retrieval, and localization. We use a single, frozen model to extract general-purpose representations from video data, and perform extensive evaluations on diverse open-source animal behavior datasets. Our results demonstrate that features from foundation models, with minimal adaptation, achieve performance competitive with existing methods specifically designed for each dataset, across species, behaviors, and experimental contexts. This highlights the potential of frozen video foundation models as a powerful and accessible backbone for automated behavior analysis, with the ability to accelerate research across diverse fields, from neuroscience to ethology and ecology.
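The frozen-backbone workflow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the authors' implementation: the "encoder" is a fixed random projection standing in for a real video foundation model, the clips are synthetic vectors, the behavior labels ("groom", "attack") are hypothetical, and the nearest-centroid readout is just one simple form that "minimal adaptation" could take. What it shows is that the encoder's weights are never updated, and that the same embeddings serve both classification and retrieval.

```python
import math
import random


def make_frozen_encoder(input_dim, embed_dim, seed=0):
    """Hypothetical stand-in for a frozen video foundation model.

    The weights are drawn once and never updated, mirroring a frozen backbone.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(input_dim)] for _ in range(embed_dim)]

    def encode(clip):
        # Project the clip, then L2-normalize the resulting embedding.
        vec = [sum(w * x for w, x in zip(row, clip)) for row in weights]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    return encode


def similarity(a, b):
    """Dot product; equals cosine similarity when both inputs are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


def fit_centroids(embeddings, labels):
    """Lightweight adaptation: one mean embedding per behavior label."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(embedding, centroids):
    """Classification: pick the label whose centroid is most similar."""
    return max(centroids, key=lambda label: similarity(embedding, centroids[label]))


def retrieve(query, gallery, k=3):
    """Retrieval: indices of the k gallery embeddings most similar to the query."""
    ranked = sorted(range(len(gallery)),
                    key=lambda i: similarity(query, gallery[i]),
                    reverse=True)
    return ranked[:k]


# Synthetic demo data (assumption): two behaviors with opposite mean signals.
INPUT_DIM = 16
data_rng = random.Random(1)


def synthetic_clip(label):
    base = 1.0 if label == "groom" else -1.0
    return [base + data_rng.gauss(0, 0.1) for _ in range(INPUT_DIM)]


encode = make_frozen_encoder(input_dim=INPUT_DIM, embed_dim=8)
train_labels = ["groom", "attack"] * 10
train_embeddings = [encode(synthetic_clip(label)) for label in train_labels]
centroids = fit_centroids(train_embeddings, train_labels)

query = encode(synthetic_clip("groom"))
print("predicted behavior:", classify(query, centroids))
print("nearest training clips:", retrieve(query, train_embeddings, k=3))
```

In practice the stand-in encoder would be replaced by a pre-trained video model run in inference mode, and the readout by whichever lightweight head a given dataset calls for. The only part of the pipeline fitted to the target data is the small readout on top of the fixed embeddings.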