Instance Tracking in 3D Scenes from Egocentric Videos

IF 4.2 | Zone 2 (Medicine) | Q1, INTEGRATIVE & COMPLEMENTARY MEDICINE | Journal of Integrative Medicine-Jim | Pub Date: 2023-12-07 | DOI: arxiv-2312.04117
Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes
Citations: 0

Abstract

Egocentric sensors such as AR/VR devices capture human-object interactions and offer the potential to provide task assistance by recalling the 3D locations of objects of interest in the surrounding environment. This capability requires instance tracking in real-world 3D scenes from egocentric videos (IT3DEgo). We explore this problem by first introducing a new benchmark dataset consisting of RGB and depth videos, per-frame camera pose, and instance-level annotations in both 2D camera and 3D world coordinates. We present an evaluation protocol that measures tracking performance in 3D coordinates under two settings for enrolling instances to track: (1) single-view online enrollment, where an instance is specified on the fly based on the human wearer's interactions, and (2) multi-view pre-enrollment, where images of an instance to be tracked are stored in memory ahead of time. To address IT3DEgo, we first repurpose methods from relevant areas, e.g., single object tracking (SOT): we run SOT methods to track instances in 2D frames and lift them to 3D using camera pose and depth. We also present a simple method that leverages pretrained segmentation and detection models to generate proposals from RGB frames and match proposals with enrolled instance images. Perhaps surprisingly, our extensive experiments show that our method (with no finetuning) significantly outperforms SOT-based approaches. We conclude by arguing that the problem of egocentric instance tracking is made easier by leveraging camera pose and using a 3D allocentric (world) coordinate representation.
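The two steps the abstract mentions, lifting a 2D track to 3D with depth and camera pose and matching proposals against enrolled instance images, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a 4x4 camera-to-world pose matrix, and that each proposal and each enrolled image has already been turned into a feature vector by some pretrained model. The function names, the cosine-similarity score, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def lift_pixel_to_world(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
    cam_to_world: Sequence[Sequence[float]],
) -> Vec3:
    """Back-project pixel (u, v) with metric depth into world coordinates.

    Assumption: pinhole camera model and a 4x4 camera-to-world pose matrix.
    """
    # Pixel -> homogeneous 3D point in the camera frame (z is the depth axis).
    x_cam = (u - cx) / fx * depth
    y_cam = (v - cy) / fy * depth
    p_cam = (x_cam, y_cam, depth, 1.0)
    # Camera frame -> world frame: apply the first three rows of the pose.
    x, y, z = (
        sum(cam_to_world[r][c] * p_cam[c] for c in range(4)) for r in range(3)
    )
    return (x, y, z)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_proposal(
    enrolled_features: List[Sequence[float]],
    proposal_features: List[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> Optional[int]:
    """Return the index of the proposal most similar to any enrolled view,
    or None if no proposal scores above the threshold."""
    best_idx: Optional[int] = None
    best_score = threshold
    for i, feat in enumerate(proposal_features):
        # Score a proposal by its best match over all enrolled views.
        score = max(
            (cosine_similarity(feat, e) for e in enrolled_features),
            default=0.0,
        )
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

In a per-frame loop under these assumptions, one would call `match_proposal` on the current frame's proposals, take the centre pixel of the matched proposal, and pass it with its depth reading to `lift_pixel_to_world`. Because the result is expressed in world coordinates, the last known location remains valid even after the instance leaves the camera's field of view.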