Expressive Visual Text-to-Speech Using Active Appearance Models

2013 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2013-06-23. DOI: 10.1109/CVPR.2013.434

Robert Anderson, B. Stenger, V. Wan, R. Cipolla

Citations: 84

Abstract

This paper presents a complete system for expressive visual text-to-speech (VTTS) that produces expressive output in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed that make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state, which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in reconstruction error over a million frames, as well as in large-scale user studies comparing the output of different systems.



