Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection

arXiv - CS - Multimedia | Pub Date: 2024-08-06 | DOI: arxiv-2408.02901

Taichi Nishimura, Shota Nakada, Hokuto Munakata, Tatsuya Komatsu


Citations: 0

Abstract




We propose Lighthouse, a user-friendly library for reproducible video moment retrieval and highlight detection (MR-HD). Although researchers have proposed various MR-HD approaches, the research community faces two main issues. The first is a lack of comprehensive and reproducible experiments across methods, datasets, and video-text features, because no unified training and evaluation codebase covers multiple settings. The second is user-unfriendly design. Because previous works use different libraries, researchers must set up a separate environment for each one. In addition, most works release only training code, requiring users to implement the whole MR-HD inference process themselves. Lighthouse addresses these issues with a unified, reproducible codebase that includes six models, three features, and five datasets. It also provides an inference API and a web demo to make these methods easily accessible to researchers and developers. Our experiments demonstrate that Lighthouse generally reproduces the scores reported in the reference papers. The code is available at https://github.com/line/lighthouse.
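The abstract does not say which scores are reproduced. As a rough illustration only, moment retrieval is commonly scored with Recall@1 at a temporal IoU threshold, and the sketch below shows that computation in plain Python. The names `temporal_iou` and `recall_at_1` are hypothetical helpers written for this example; they are not part of the Lighthouse API, and the paper's actual evaluation code may differ.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) span within a video.
Span = Tuple[float, float]


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two time spans (hypothetical helper)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Span], gts: List[Span], threshold: float = 0.5) -> float:
    """Fraction of queries whose top-1 predicted span reaches the IoU threshold.

    Assumes preds[i] is the top-ranked prediction for query i and gts[i]
    is its single ground-truth span.
    """
    if not preds:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


if __name__ == "__main__":
    top1 = [(0.0, 10.0), (0.0, 10.0)]
    truth = [(0.0, 10.0), (20.0, 30.0)]
    print(recall_at_1(top1, truth, threshold=0.5))
```

Comparing a reproduced number against a reported one only makes sense when both use the same threshold and the same ground-truth convention, which is one reason a unified evaluation codebase helps.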
