
Latest publications in IEEE Transactions on Visualization and Computer Graphics

IEEE ISMAR 2024 Steering Committee Members
Pub Date : 2024-10-10 DOI: 10.1109/TVCG.2024.3453149
{"title":"IEEE ISMAR 2024 Steering Committee Members","authors":"","doi":"10.1109/TVCG.2024.3453149","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3453149","url":null,"abstract":"","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10713480","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142430824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Message from the ISMAR 2024 Science and Technology Program Chairs and TVCG Guest Editors
Pub Date : 2024-10-10 DOI: 10.1109/TVCG.2024.3453128
Ulrich Eck, Maki Sugimoto, Misha Sra, Markus Tatzgern, Jeanine Stefanucci, Ian Williams
In this special issue of IEEE Transactions on Visualization and Computer Graphics (TVCG), we are pleased to present the journal papers from the 23rd IEEE International Symposium on Mixed and Augmented Reality (ISMAR 2024), which will be held as a hybrid conference between October 21 and 25, 2024 in the Greater Seattle Area, USA. ISMAR continues the over twenty-year long tradition of IWAR, ISMR, and ISAR, and is the premier conference for Mixed and Augmented Reality in the world.
Citations: 0
IEEE ISMAR 2024 - Paper Reviewers for Journal Papers
Pub Date : 2024-10-10 DOI: 10.1109/TVCG.2024.3453151
{"title":"IEEE ISMAR 2024 - Paper Reviewers for Journal Papers","authors":"","doi":"10.1109/TVCG.2024.3453151","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3453151","url":null,"abstract":"","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10713477","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142430823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
2024 IEEE International Symposium on Mixed and Augmented Reality
Pub Date : 2024-10-10 DOI: 10.1109/TVCG.2024.3453109
{"title":"2024 IEEE International Symposium on Mixed and Augmented Reality","authors":"","doi":"10.1109/TVCG.2024.3453109","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3453109","url":null,"abstract":"","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10713476","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142430871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Charting EDA: Characterizing Interactive Visualization Use in Computational Notebooks with a Mixed-Methods Formalism.
Pub Date : 2024-10-10 DOI: 10.1109/TVCG.2024.3456217
Dylan Wootton, Amy Rae Fox, Evan Peck, Arvind Satyanarayan

Interactive visualizations are powerful tools for Exploratory Data Analysis (EDA), but how do they affect the observations analysts make about their data? We conducted a qualitative experiment with 13 professional data scientists analyzing two datasets with Jupyter notebooks, collecting a rich dataset of interaction traces and think-aloud utterances. By qualitatively coding participant utterances, we introduce a formalism that describes EDA as a sequence of analysis states, where each state is comprised of either a representation an analyst constructs (e.g., the output of a data frame, an interactive visualization, etc.) or an observation the analyst makes (e.g., about missing data, the relationship between variables, etc.). By applying our formalism to our dataset, we identify that interactive visualizations, on average, lead to earlier and more complex insights about relationships between dataset attributes compared to static visualizations. Moreover, by calculating metrics such as revisit count and representational diversity, we uncover that some representations serve more as "planning aids" during EDA rather than tools strictly for hypothesis-answering. We show how these measures help identify other patterns of analysis behavior, such as the "80-20 rule", where a small subset of representations drove the majority of observations. Based on these findings, we offer design guidelines for interactive exploratory analysis tooling and reflect on future directions for studying the role that visualizations play in EDA.
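To make the abstract's formalism concrete, here is a minimal hypothetical sketch (not the authors' code) of an EDA session encoded as a sequence of analysis states, with toy versions of the revisit-count and representational-diversity metrics; all class, label, and identifier names are illustrative assumptions.

```python
from dataclasses import dataclass
from collections import Counter
from typing import Literal

# An EDA session as a sequence of analysis states: each state is either a
# representation the analyst constructs or an observation the analyst makes.
@dataclass
class State:
    kind: Literal["representation", "observation"]
    label: str    # e.g. "dataframe output", "interactive vis", "missing data"
    target: str   # identifier of the representation being built or examined

def revisit_counts(states: list[State]) -> Counter:
    """How often each representation is returned to after its first use."""
    seen, revisits = set(), Counter()
    for s in states:
        if s.kind != "representation":
            continue
        if s.target in seen:
            revisits[s.target] += 1
        seen.add(s.target)
    return revisits

def representational_diversity(states: list[State]) -> int:
    """Number of distinct representation types used in the session."""
    return len({s.label for s in states if s.kind == "representation"})

# Toy trace: one interactive visualization is revisited once.
session = [
    State("representation", "dataframe output", "df_head"),
    State("observation", "missing data", "df_head"),
    State("representation", "interactive vis", "scatter_1"),
    State("observation", "relationship between variables", "scatter_1"),
    State("representation", "interactive vis", "scatter_1"),
]
print(revisit_counts(session))              # Counter({'scatter_1': 1})
print(representational_diversity(session))  # 2
```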

Citations: 0
Message from the Editor-in-Chief and from the Associate Editor-in-Chief
Pub Date : 2024-10-10 DOI: 10.1109/TVCG.2024.3453148
Han-Wei Shen, Kiyoshi Kiyokawa
Welcome to the 10th IEEE Transactions on Visualization and Computer Graphics (TVCG) special issue on IEEE International Symposium on Mixed and Augmented Reality (ISMAR). This volume contains a total of 44 full papers selected for and presented at ISMAR 2024, held from October 21 to 25, 2024 in the Greater Seattle Area, USA, in a hybrid mode.
Citations: 0
ChartKG: A Knowledge-Graph-Based Representation for Chart Images.
Pub Date : 2024-10-09 DOI: 10.1109/TVCG.2024.3476508
Zhiguang Zhou, Haoxuan Wang, Zhengqing Zhao, Fengling Zheng, Yongheng Wang, Wei Chen, Yong Wang

Chart images, such as bar charts, pie charts, and line charts, are explosively produced due to the wide usage of data visualizations. Accordingly, knowledge mining from chart images is becoming increasingly important, which can benefit downstream tasks like chart retrieval and knowledge graph completion. However, existing methods for chart knowledge mining mainly focus on converting chart images into raw data and often ignore their visual encodings and semantic meanings, which can result in information loss for many downstream tasks. In this paper, we propose ChartKG, a novel knowledge graph (KG) based representation for chart images, which can model the visual elements in a chart image and the semantic relations among them, including visual encodings and visual insights, in a unified manner. Further, we develop a general framework to convert chart images to the proposed KG-based representation. It integrates a series of image processing techniques to identify visual elements and relations, e.g., CNNs to classify charts, yolov5 and optical character recognition to parse charts, and rule-based methods to construct graphs. We present four cases to illustrate how our knowledge-graph-based representation can model the detailed visual elements and semantic relations in charts, and further demonstrate how our approach can benefit downstream applications such as semantic-aware chart retrieval and chart question answering. We also conduct quantitative evaluations to assess the two fundamental building blocks of our chart-to-KG framework, i.e., object recognition and optical character recognition. The results provide support for the usefulness and effectiveness of ChartKG.
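As a rough illustration of the kind of representation the abstract describes, the sketch below encodes a bar chart as a small graph of visual elements and relations; the schema, relation names, and hard-coded detections are hypothetical stand-ins for the paper's classification, detection, OCR, and rule-based stages, not the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical knowledge-graph container for one chart image: nodes are
# visual elements (chart, bars, text labels), edges carry semantic relations
# such as visual encodings ("encodes") and derived insights ("greater_than").
@dataclass
class ChartKG:
    nodes: dict = field(default_factory=dict)   # node_id -> {"type": ..., "value": ...}
    edges: list = field(default_factory=list)   # (source, relation, target)

    def add_node(self, node_id, node_type, value=None):
        self.nodes[node_id] = {"type": node_type, "value": value}

    def add_edge(self, src, relation, dst):
        self.edges.append((src, relation, dst))

# In a full pipeline, a classifier would identify the chart type, a detector
# and OCR would extract bars and text, and rules would link them. Here the
# detections are hard-coded just to show the resulting graph structure.
kg = ChartKG()
kg.add_node("chart", "bar_chart")
kg.add_node("bar_A", "bar", value=30)
kg.add_node("bar_B", "bar", value=12)
kg.add_node("label_A", "text", value="2023")
kg.add_node("label_B", "text", value="2024")
kg.add_edge("chart", "contains", "bar_A")
kg.add_edge("chart", "contains", "bar_B")
kg.add_edge("bar_A", "encodes", "label_A")     # visual encoding relation
kg.add_edge("bar_B", "encodes", "label_B")
kg.add_edge("bar_A", "greater_than", "bar_B")  # derived visual insight

for src, rel, dst in kg.edges:
    print(f"{src} --{rel}--> {dst}")
```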

Citations: 0
MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field.
Pub Date : 2024-10-08 DOI: 10.1109/TVCG.2024.3476331
Zijiang Yang, Zhongwei Qiu, Chang Xu, Dongmei Fu

3D style transfer aims to generate stylized views of 3D scenes with specified styles, which requires high-quality generation and multi-view consistency. Existing methods still face the challenges of high-quality stylization with texture details and of stylization under multimodal guidance. In this paper, we reveal that the common training method for stylization with NeRF, which generates stylized multi-view supervision with 2D style transfer models, causes the same object in the supervision to show various states (color tone, details, etc.) in different views; NeRF therefore tends to smooth the texture details, resulting in low-quality rendering for 3D multi-style transfer. To tackle these problems, we propose a novel Multimodal-guided 3D Multi-style transfer of NeRF, termed MM-NeRF. First, MM-NeRF projects multimodal guidance into a unified space to keep the multimodal styles consistent and extracts multimodal features to guide the 3D stylization. Second, a novel multi-head learning scheme is proposed to ease the difficulty of learning multi-style transfer, and a multi-view style consistent loss is proposed to track the inconsistency of multi-view supervision data. Finally, a novel incremental learning mechanism is proposed to generalize MM-NeRF to any new style at small cost. Extensive experiments on several real-world datasets show that MM-NeRF achieves high-quality 3D multi-style stylization with multimodal guidance, and keeps multi-view consistency and style consistency across multimodal guidance.
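One plausible reading of a multi-view style consistency term is sketched below in PyTorch: it penalizes divergence between Gram-matrix style statistics of renderings of the same stylized scene from different views. This is a toy interpretation, not the authors' loss; the Gram-based formulation, feature shapes, and function names are assumptions.

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Style statistics of a feature map: (B, C, H, W) -> (B, C, C)."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def multiview_style_consistency(view_feats: list[torch.Tensor]) -> torch.Tensor:
    """Penalize differences in style statistics across rendered views of the
    same stylized scene (toy stand-in for a multi-view style consistency loss)."""
    grams = [gram_matrix(f) for f in view_feats]
    mean_gram = torch.stack(grams).mean(dim=0)
    return sum(F.mse_loss(g, mean_gram) for g in grams) / len(grams)

# Toy usage: VGG-like features of three rendered views of one scene.
views = [torch.randn(1, 64, 32, 32) for _ in range(3)]
print(multiview_style_consistency(views).item())
```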

Citations: 0
Learn2Talk: 3D Talking Face Learns from 2D Talking Face.
Pub Date : 2024-10-07 DOI: 10.1109/TVCG.2024.3476275
Yixiang Zhuang, Baoping Cheng, Yao Cheng, Yuntao Jin, Renshuai Liu, Chengyang Li, Xuan Cheng, Jing Liao, Juncong Lin

Speech-driven facial animation technology is generally categorized into two main types: 3D and 2D talking face. Both have garnered considerable research attention in recent years. However, to our knowledge, research into 3D talking face has not progressed as deeply as that of 2D talking face, particularly in terms of lip-sync and perceptual mouth movements. Lip-sync necessitates impeccable synchronization between mouth motion and speech audio, and the speech perception derived from the perceptual mouth movements should resemble that of the driving audio. To bridge the gap between the two sub-fields, we propose Learn2Talk, a learning framework that enhances the 3D talking face network by integrating two key insights from the field of 2D talking face. Firstly, drawing inspiration from the audio-video sync network, we develop a 3D sync-lip expert model for the pursuit of lip-sync between audio and 3D facial motions. Secondly, we utilize a teacher model, carefully chosen from among 2D talking face methods, to guide the training of the audio-to-3D motions regression network, thereby increasing the accuracy of 3D vertex movements. Extensive experiments demonstrate the superiority of our proposed framework over state-of-the-art methods in terms of lip-sync, vertex accuracy and perceptual movements. Finally, we showcase two applications of our framework: audio-visual speech recognition and speech-driven 3D Gaussian Splatting-based avatar animation. The project page of this paper is: https://lkjkjoiuiu.github.io/Learn2Talk/.
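The training recipe the abstract outlines, vertex supervision plus a lip-sync expert plus a 2D teacher guiding the audio-to-3D regressor, can be summarized as a weighted loss. The PyTorch snippet below is a hypothetical outline only; the function name, weights, tensor shapes, and the exact form of each term are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def learn2talk_style_loss(pred_verts, gt_verts, sync_score, teacher_verts,
                          w_rec=1.0, w_sync=0.1, w_teacher=0.5):
    """Hypothetical combination of the three training signals described in the
    abstract: vertex reconstruction against ground truth, a lip-sync score from
    a 3D sync expert, and guidance distilled from a 2D-talking-face teacher."""
    rec = F.mse_loss(pred_verts, gt_verts)           # vertex accuracy
    sync = -sync_score.mean()                        # encourage high audio/mouth sync
    teacher = F.mse_loss(pred_verts, teacher_verts)  # follow the teacher's motion
    return w_rec * rec + w_sync * sync + w_teacher * teacher

# Toy shapes: batch of 2 frames, 5023 vertices (FLAME-like mesh), xyz coordinates.
pred = torch.randn(2, 5023, 3)
gt = torch.randn(2, 5023, 3)
teacher = torch.randn(2, 5023, 3)
sync = torch.rand(2)   # expert's per-frame sync confidence in [0, 1]
print(learn2talk_style_loss(pred, gt, sync, teacher).item())
```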

Citations: 0
Parametric Body Reconstruction Based on a Single Front Scan Point Cloud.
Pub Date : 2024-10-07 DOI: 10.1109/TVCG.2024.3475414
Xihang Li, Guiqin Li, Ming Li, Haoju Song

Full-body 3D scanning simplifies the acquisition of digital body models. However, current systems are bulky, intricate, and costly, with strict clothing constraints. We propose a pipeline that combines inner body shape inference and parametric model registration for reconstructing the corresponding body model from a single front scan of a clothed body. Three network modules (Scan2Front-Net, Front2Back-Net, and Inner2Corr-Net) with relatively independent functions are proposed for predicting the front inner, back inner, and parametric model reference point clouds, respectively. We consider the back inner point cloud as an axial offset of the front inner point cloud and divide the body into 14 parts. This offset relationship is then learned within the same body parts to reduce the ambiguity of the inference. The predicted front and back inner point clouds are concatenated as the inner body point cloud, and reconstruction is achieved by registering the parametric body model through a point-to-point correspondence between the reference point cloud and the inner body point cloud. Qualitative and quantitative analyses show that the proposed method has significant advantages in terms of body shape completion and reconstructed body model accuracy.
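The key geometric idea, predicting the back inner surface as a per-part axial offset of the front inner surface, is compact enough to sketch directly. The NumPy snippet below is a hypothetical illustration of that offset step alone; in the described pipeline the part labels and offsets would come from the learned networks, whereas here they are hard-coded constants.

```python
import numpy as np

def back_from_front(front_pts: np.ndarray, part_ids: np.ndarray,
                    part_offsets: np.ndarray, axis: int = 2) -> np.ndarray:
    """Toy version of the offset idea: each front inner point is shifted along
    one body axis by an offset associated with its body part (14 parts),
    yielding the corresponding back inner point.

    front_pts:    (N, 3) front inner point cloud
    part_ids:     (N,)   integer body-part label in [0, 13] per point
    part_offsets: (14,)  axial offset per part (learned in the real pipeline)
    """
    back_pts = front_pts.copy()
    back_pts[:, axis] -= part_offsets[part_ids]   # move from front to back along the axis
    return back_pts

# Toy example: 4 points on two body parts, offset along the depth (z) axis.
front = np.array([[0.0, 1.5, 0.10],
                  [0.1, 1.5, 0.12],
                  [0.0, 0.9, 0.08],
                  [0.1, 0.9, 0.09]])
parts = np.array([3, 3, 7, 7])        # illustrative part indices, e.g. torso and thigh
offsets = np.zeros(14)
offsets[3], offsets[7] = 0.25, 0.18   # assumed per-part body thicknesses
back = back_from_front(front, parts, offsets)
inner_body = np.concatenate([front, back], axis=0)   # concatenated inner point cloud
print(inner_body.shape)                              # (8, 3)
```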

Citations: 0