Pub Date : 2026-02-06 | DOI: 10.1007/s11263-025-02709-8
Mitchell Rogers, Kobe Knowles, Gaël Gendron, Shahrokh Heidari, Isla Duporge, David Arturo Soriano Valdez, Mihailo Azhar, Padriac O’Leary, Simon Eyre, Michael Witbrock, Patrice Delmas
Recent advances in deep learning have greatly enhanced the accuracy and scalability of animal re-identification by automating the extraction of subtle distinguishing features from images and videos. This enables large-scale, non-invasive monitoring of animal populations. This article proposes a segmentation pipeline and a re-identification model that identify animals without ground-truth IDs. The segmentation pipeline isolates animals from the background using bounding boxes and leverages the DINOv2 and Segment Anything Model 2 (SAM2) foundation models. For re-identification, Recurrence over Video Frames (RoVF) is introduced, a novel approach that employs a recurrent component based on the Perceiver transformer atop a DINOv2 image model, iteratively refining embeddings across video frames. The proposed methods are evaluated on video datasets of meerkats and polar bears (PolarBearVidID). The proposed segmentation model achieved high accuracy (94.36% and 97.26%) and IoU (73.14% and 92.77%) for meerkats and polar bears, respectively. RoVF outperformed frame- and video-based re-identification baselines, achieving top-1 accuracies of 46.5% and 55% on masked test sets for meerkats and polar bears, respectively, as well as higher top-3 accuracy. These results highlight the potential of the proposed approach to reduce annotation burdens in future individual-based ecological studies. The code is available at https://github.com/Strong-AI-Lab/RoVF-Meerkat-Reidentification.
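The RoVF recurrence described in the abstract can be illustrated with a minimal toy sketch: a small set of latent vectors is refined once per frame via Perceiver-style cross-attention over that frame's patch tokens (e.g. DINOv2 features), and the final latents are pooled into one identity embedding. This is a hypothetical numpy illustration, not the authors' implementation; the class name, dimensions, single-head attention, and residual update are all assumptions. See the linked repository for the actual model.

```python
import numpy as np

class RoVFSketch:
    """Toy sketch of Recurrence over Video Frames (RoVF).

    A bank of latent vectors is updated frame by frame: the latents
    provide attention queries, each frame's patch tokens (stand-ins for
    DINOv2 features) provide keys and values. All weights here are
    random placeholders, purely for illustration.
    """

    def __init__(self, dim, n_latents, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.Wq = rng.standard_normal((dim, dim)) * scale
        self.Wk = rng.standard_normal((dim, dim)) * scale
        self.Wv = rng.standard_normal((dim, dim)) * scale
        self.latent0 = rng.standard_normal((n_latents, dim)) * scale
        self.dim = dim

    @staticmethod
    def _softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def _step(self, latent, frame_tokens):
        # Single-head cross-attention: latents attend over frame tokens.
        q = latent @ self.Wq                      # (n_latents, dim)
        k = frame_tokens @ self.Wk                # (n_tokens, dim)
        v = frame_tokens @ self.Wv                # (n_tokens, dim)
        attn = self._softmax(q @ k.T / np.sqrt(self.dim))
        return latent + attn @ v                  # residual refinement

    def embed(self, video_tokens):
        """video_tokens: (n_frames, n_tokens, dim) -> (dim,) embedding."""
        latent = self.latent0
        for frame in video_tokens:                # recurrence over frames
            latent = self._step(latent, frame)
        return latent.mean(axis=0)                # pool latents into one ID vector
```

In a re-identification setting, such embeddings would be compared by nearest-neighbour search (e.g. cosine distance) to rank candidate identities, which is what the top-1/top-3 accuracies above measure; the sketch omits the metric-learning training loop entirely.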
Title: Recurrence over Video Frames (RoVF) for Animal Re-identification
Journal: International Journal of Computer Vision
Code: https://github.com/Strong-AI-Lab/RoVF-Meerkat-Reidentification
Pub Date : 2026-02-06 | DOI: 10.1007/s11263-025-02667-1
Songtao Li, Hao Tang
Title: Multimodal Alignment and Fusion: A Survey
Journal: International Journal of Computer Vision
Pub Date : 2026-02-06 | DOI: 10.1007/s11263-025-02689-9
Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu
Title: High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
Journal: International Journal of Computer Vision
Pub Date : 2026-02-06 | DOI: 10.1007/s11263-025-02663-5
Thang-Anh-Quan Nguyen, Amine Bourki, Mátyás Macudzinski, Anthony Brunel, Mohammed Bennamoun
Title: Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review
Journal: International Journal of Computer Vision
Pub Date : 2026-02-06 | DOI: 10.1007/s11263-025-02726-7
Federico Nocentini, Thomas Besnier, Claudio Ferrari, Sylvain Arguillere, Mohamed Daoudi, Stefano Berretti
Title: Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads
Journal: International Journal of Computer Vision
Pub Date : 2026-02-06 | DOI: 10.1007/s11263-025-02711-0
Anne Marthe Sophie Ngo Bibinbe, Chiron Bang, Patrick Gagnon, Jamie Ahloy-Dallaire, Eric R. Paquet
Title: An HMM-Based Framework for Identity-Aware Long-Term Multi-Object Tracking From Sparse and Uncertain Identification: Use Case on Long-Term Tracking in Livestock
Journal: International Journal of Computer Vision