
Latest publications from the International Journal of Computer Vision

Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-09 · DOI: 10.1007/s11263-025-02620-2
Jiajun Liu, Yibing Wang, Hanghang Ma, Xiaoping Wu, Xiaoqi Ma, Xiaoming Wei, Jianbin Jiao, Enhua Wu, Jie Hu
Citations: 0
AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1007/s11263-025-02727-6
Xinyu Hou, Xiaoming Li, Chen Change Loy
Citations: 0
Recurrence over Video Frames (RoVF) for Animal Re-identification
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1007/s11263-025-02709-8
Mitchell Rogers, Kobe Knowles, Gaël Gendron, Shahrokh Heidari, Isla Duporge, David Arturo Soriano Valdez, Mihailo Azhar, Padriac O’Leary, Simon Eyre, Michael Witbrock, Patrice Delmas
Recent advances in deep learning have greatly enhanced the accuracy and scalability of animal re-identification by automating the extraction of subtle distinguishing features from images and videos. This enables large-scale, non-invasive monitoring of animal populations. This article proposes a segmentation pipeline and a re-identification model to identify animals without ground-truth IDs. The segmentation pipeline isolates animals from the background using bounding boxes and leverages the DINOv2 and Segment Anything Model 2 (SAM2) foundation models. For re-identification, Recurrence over Video Frames (RoVF) is introduced, a novel approach that employs a recurrent component based on the Perceiver transformer atop a DINOv2 image model, iteratively refining embeddings from video frames. The proposed methods are evaluated on video datasets of meerkats and polar bears (PolarBearVidID). The proposed segmentation model achieved high accuracy (94.36% and 97.26%) and IoU (73.14% and 92.77%) for meerkats and polar bears, respectively. RoVF outperformed frame- and video-based re-identification baselines, achieving a top-1 accuracy of 46.5% and 55% on masked test sets for meerkats and polar bears, respectively, as well as higher top-3 accuracy. These results highlight the potential of the proposed approach to reduce annotation burdens in future individual-based ecological studies. The code is available at https://github.com/Strong-AI-Lab/RoVF-Meerkat-Reidentification .
Citations: 0
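The recurrence the RoVF abstract describes — a latent video embedding refined frame by frame, with each step attending over that frame's image features — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: RoVF uses a learned Perceiver transformer over DINOv2 features, whereas the single-head dot-product attention, random initial latents, and the name `rovf_sketch` below are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def rovf_sketch(frame_embeddings, num_latents=4, seed=0):
    """Refine a latent video embedding one frame at a time.

    frame_embeddings: list of (tokens, dim) arrays, one per frame,
    standing in for per-frame image features (e.g. from DINOv2).
    Returns a single (dim,) video-level embedding.
    """
    rng = np.random.default_rng(seed)
    dim = frame_embeddings[0].shape[1]
    # Random init here; a Perceiver-style model would learn these latents.
    latent = rng.standard_normal((num_latents, dim))
    for tokens in frame_embeddings:
        # Cross-attention: latent queries attend to this frame's tokens.
        attn = softmax(latent @ tokens.T / np.sqrt(dim))
        # Residual update carries state forward to the next frame.
        latent = latent + attn @ tokens
    return latent.mean(axis=0)  # pool latents into one video embedding
```

Because the latent state is updated sequentially, the resulting embedding depends on frame order — the property that distinguishes this recurrence from simple per-frame feature pooling.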
AutoIT: Automated Image Tagging with Random Perturbation
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1007/s11263-026-02737-y
Xuelin Zhu, Jianshu Li, Jian Liu, Dongqi Tang, Jiawei Ge, Weijia Liu, Bo Liu, Jiuxin Cao
Citations: 0
Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1007/s11263-025-02695-x
Chaolong Yang, Kai Yao, Yuyao Yan, Chenru Jiang, Weiguang Zhao, Jie Sun, Guangliang Cheng, Yifei Zhang, Bin Dong, Kaizhu Huang
Citations: 0
Multimodal Alignment and Fusion: A Survey
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1007/s11263-025-02667-1
Songtao Li, Hao Tang
Citations: 0
High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1007/s11263-025-02689-9
Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu
Citations: 0
Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1007/s11263-025-02663-5
Thang-Anh-Quan Nguyen, Amine Bourki, Mátyás Macudzinski, Anthony Brunel, Mohammed Bennamoun
Citations: 0
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1007/s11263-025-02726-7
Federico Nocentini, Thomas Besnier, Claudio Ferrari, Sylvain Arguillere, Mohamed Daoudi, Stefano Berretti
Citations: 0
An HMM-Based Framework for Identity-Aware Long-Term Multi-Object Tracking From Sparse and Uncertain Identification: Use Case on Long-Term Tracking in Livestock
IF 19.5 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-02-06 · DOI: 10.1007/s11263-025-02711-0
Anne Marthe Sophie Ngo Bibinbe, Chiron Bang, Patrick Gagnon, Jamie Ahloy-Dallaire, Eric R. Paquet
Citations: 0