Pub Date : 2026-01-13DOI: 10.1007/s11263-025-02664-4
Toby Perrett, Tengda Han, Dima Damen, Andrew Zisserman
Long videos contain many repeating actions, events and shots. These repetitions are frequently given identical captions, which makes it difficult to retrieve the exact desired clip using a text search. In this paper, we formulate the problem of unique captioning: Given multiple clips with the same caption, we generate a new caption for each clip that uniquely identifies it. We propose Captioning by Discriminative Prompting (CDP), which predicts a property that can separate identically captioned clips, and use it to generate unique captions. We introduce two benchmarks for unique captioning, based on egocentric footage and timeloop movies – where repeating actions are common. We demonstrate that captions generated by CDP improve text-to-video R@1 by 15% for egocentric videos and 10% in timeloop movies. https://tobyperrett.github.io/its-just-another-day
{"title":"It’s Just Another Day: Unique Video Captioning by Discriminitive Prompting","authors":"Toby Perrett, Tengda Han, Dima Damen, Andrew Zisserman","doi":"10.1007/s11263-025-02664-4","DOIUrl":"https://doi.org/10.1007/s11263-025-02664-4","url":null,"abstract":"Long videos contain many repeating actions, events and shots. These repetitions are frequently given identical captions, which makes it difficult to retrieve the exact desired clip using a text search. In this paper, we formulate the problem of unique captioning: Given multiple clips with the same caption, we generate a new caption for each clip that uniquely identifies it. We propose Captioning by Discriminative Prompting (CDP), which predicts a property that can separate identically captioned clips, and use it to generate unique captions. We introduce two benchmarks for unique captioning, based on egocentric footage and timeloop movies – where repeating actions are common. We demonstrate that captions generated by CDP improve text-to-video R@1 by 15% for egocentric videos and 10% in timeloop movies. <jats:ext-link xmlns:xlink=\"http://www.w3.org/1999/xlink\" xlink:href=\"https://tobyperrett.github.io/its-just-another-day\" ext-link-type=\"uri\">https://tobyperrett.github.io/its-just-another-day</jats:ext-link>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"33 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-13DOI: 10.1007/s11263-025-02635-9
Evgeniy Martyushev, Snehal Bhayani, Tomas Pajdla
{"title":"Automatic Solver Generator for Systems of Laurent Polynomial Equations","authors":"Evgeniy Martyushev, Snehal Bhayani, Tomas Pajdla","doi":"10.1007/s11263-025-02635-9","DOIUrl":"https://doi.org/10.1007/s11263-025-02635-9","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"15 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-13DOI: 10.1007/s11263-025-02611-3
Yuqin Dai, Wanlu Zhu, Ronghui Li, Xiu Li, Zhenyu Zhang, Jun Li, Jian Yang
{"title":"TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography","authors":"Yuqin Dai, Wanlu Zhu, Ronghui Li, Xiu Li, Zhenyu Zhang, Jun Li, Jian Yang","doi":"10.1007/s11263-025-02611-3","DOIUrl":"https://doi.org/10.1007/s11263-025-02611-3","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"9 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-12DOI: 10.1007/s11263-025-02721-y
Carlos Francisco Moreno-García, Gerardo Aragon Camarasa, Edmond S. L. Ho, Paul Henderson, Nicolas Pugeault, Jungong Han, Sergio Escalera
{"title":"Guest Editorial: Special Issue for the British Machine Vision Conference (BMVC), 2024 (Glasgow, Scotland, UK)","authors":"Carlos Francisco Moreno-García, Gerardo Aragon Camarasa, Edmond S. L. Ho, Paul Henderson, Nicolas Pugeault, Jungong Han, Sergio Escalera","doi":"10.1007/s11263-025-02721-y","DOIUrl":"https://doi.org/10.1007/s11263-025-02721-y","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"6 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-11DOI: 10.1007/s11263-025-02601-5
Hongduan Tian, Feng Liu, Ka Chun Cheung, Zhen Fang, Simon See, Tongliang Liu, Bo Han
In cross-domain few-shot classification (CFC), mainstream studies aim to train a simple module (e.g. a linear transformation head) to select or transform features (a.k.a., the high-level semantic features) for previously unseen domains with a few labeled training data available on top of a powerful pre-trained model. These studies usually assume that high-level semantic features are shared across these domains, and just simple feature selection or transformations are enough to adapt features to previously unseen domains. However, in this paper, we find that the simply transformed features are too general to fully cover the key content features regarding each class. Thus, we propose an effective method, invariant-content feature reconstruction (IFR), to train a simple module that simultaneously considers both high-level and fine-grained invariant-content features for the previously unseen domains. Specifically, the fine-grained invariant-content features are considered as a set of informative and discriminative features learned from a few labeled training data of tasks sampled from unseen domains and are extracted by retrieving features that are invariant to style modifications from a set of content-preserving augmented data in pixel level with an attention module. Extensive experiments on the Meta-Dataset benchmark show that IFR achieves good generalization performance on unseen domains, which demonstrates the effectiveness of the fusion of the high-level features and the fine-grained invariant-content features. Specifically, IFR improves the average accuracy on unseen domains by 1.6% and 6.5% respectively under two different cross-domain few-shot classification settings.
{"title":"Cross-domain Few-shot Classification via Invariant-content Feature Reconstruction","authors":"Hongduan Tian, Feng Liu, Ka Chun Cheung, Zhen Fang, Simon See, Tongliang Liu, Bo Han","doi":"10.1007/s11263-025-02601-5","DOIUrl":"https://doi.org/10.1007/s11263-025-02601-5","url":null,"abstract":"In <jats:italic>cross-domain few-shot classification</jats:italic> (CFC), mainstream studies aim to train a simple module (e.g. a linear transformation head) to select or transform features (a.k.a., the high-level semantic features) for previously unseen domains with a few labeled training data available on top of a powerful pre-trained model. These studies usually <jats:italic>assume</jats:italic> that high-level semantic features are shared across these domains, and just simple feature selection or transformations are enough to adapt features to previously unseen domains. However, in this paper, we find that the simply transformed features are too general to fully cover the key content features regarding each class. Thus, we propose an effective method, <jats:italic>invariant-content feature reconstruction</jats:italic> (IFR), to train a simple module that simultaneously considers both high-level and fine-grained invariant-content features for the previously unseen domains. Specifically, the fine-grained invariant-content features are considered as a set of <jats:italic>informative</jats:italic> and <jats:italic>discriminative</jats:italic> features learned from a few labeled training data of tasks sampled from unseen domains and are extracted by retrieving features that are invariant to style modifications from a set of content-preserving augmented data in pixel level with an attention module. Extensive experiments on the Meta-Dataset benchmark show that IFR achieves good generalization performance on unseen domains, which demonstrates the effectiveness of the fusion of the high-level features and the fine-grained invariant-content features. Specifically, IFR improves the average accuracy on unseen domains by 1.6% and 6.5% respectively under two different cross-domain few-shot classification settings.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"38 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145955236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}