Authors: Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes
arXiv: 2312.04117
Published: 2023-12-07
Citations: 0
Instance Tracking in 3D Scenes from Egocentric Videos
Abstract
Egocentric sensors such as AR/VR devices capture human-object interactions
and offer the potential to provide task assistance by recalling 3D locations of
objects of interest in the surrounding environment. This capability requires
instance tracking in real-world 3D scenes from egocentric videos (IT3DEgo). We
explore this problem by first introducing a new benchmark dataset, consisting
of RGB and depth videos, per-frame camera pose, and instance-level annotations
in both 2D camera and 3D world coordinates. We present an evaluation protocol
that measures tracking performance in 3D coordinates under two settings for
enrolling instances to track: (1) single-view online enrollment, where an
instance is specified on the fly based on the human wearer's interactions, and
(2) multi-view pre-enrollment, where images of an instance to be tracked are
stored in memory ahead of time. To address IT3DEgo, we first repurpose methods
from related areas, e.g., single object tracking (SOT): we run SOT methods
to track instances in 2D frames and lift them to 3D using camera pose and
depth. We also present a simple method that leverages pretrained segmentation
and detection models to generate proposals from RGB frames and match proposals
with enrolled instance images. Perhaps surprisingly, our extensive experiments
show that our method (with no fine-tuning) significantly outperforms SOT-based
approaches. We conclude by arguing that the problem of egocentric instance
tracking is made easier by leveraging camera pose and using a 3D allocentric
(world) coordinate representation.
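
The two geometric steps mentioned in the abstract, lifting a 2D detection to 3D with depth and camera pose and matching proposals against enrolled instance images, can be illustrated with a minimal sketch. Everything below is an assumption for illustration rather than the authors' implementation: a pinhole camera with intrinsics `K`, a 4x4 camera-to-world pose `T_world_cam`, depth measured along the camera z-axis, and cosine similarity between generic feature vectors as the matching score. The function names `lift_to_world` and `match_proposal` are hypothetical.

```python
import numpy as np


def lift_to_world(u, v, depth, K, T_world_cam):
    """Back-project pixel (u, v) with depth into world coordinates.

    Assumptions (not from the paper): K is a 3x3 pinhole intrinsics matrix,
    depth is the distance along the camera z-axis, and T_world_cam is a 4x4
    camera-to-world rigid transform.
    """
    pixel_h = np.array([u, v, 1.0])
    # Ray through the pixel, scaled so that its z-component equals depth.
    p_cam = depth * (np.linalg.inv(K) @ pixel_h)
    # Homogeneous point in the camera frame, then moved to the world frame.
    p_cam_h = np.append(p_cam, 1.0)
    return (T_world_cam @ p_cam_h)[:3]


def match_proposal(enrolled_feats, proposal_feats):
    """Pick the proposal most similar to any enrolled view.

    Both inputs are 2D arrays of feature vectors (one row per image or
    proposal). Cosine similarity is an assumed scoring rule.
    """
    e = enrolled_feats / np.linalg.norm(enrolled_feats, axis=1, keepdims=True)
    p = proposal_feats / np.linalg.norm(proposal_feats, axis=1, keepdims=True)
    sim = p @ e.T                 # shape: (num_proposals, num_enrolled_views)
    scores = sim.max(axis=1)      # best enrolled view for each proposal
    best = int(np.argmax(scores))
    return best, float(scores[best])


# Hypothetical camera: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# Hypothetical pose: no rotation, camera located at x = 1 in the world frame.
T_world_cam = np.eye(4)
T_world_cam[0, 3] = 1.0

point_world = lift_to_world(420.0, 240.0, 2.0, K, T_world_cam)

# Toy 2D features: one enrolled view and three proposals.
enrolled = np.array([[1.0, 0.0]])
proposals = np.array([[0.9, 0.1], [-1.0, 0.0], [0.5, 0.5]])
best_idx, best_score = match_proposal(enrolled, proposals)
```

Because the lifted point lives in a fixed world frame, a stationary object keeps the same coordinates as the wearer moves, which is one way to read the abstract's closing argument for an allocentric representation.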