Automatic lip-synchronized video-self-modeling intervention for voice disorders

Ju Shen, Changpeng Ti, S. Cheung, Rita R. Patel
DOI: 10.1109/HealthCom.2012.6379415
Published in: 2012 IEEE 14th International Conference on e-Health Networking, Applications and Services (Healthcom)
Publication date: December 13, 2012
Citations: 7

Abstract

Video self-modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of himself or herself. In the field of speech-language pathology, VSM has been used successfully for language treatment in children with autism and in individuals with the fluency disorder of stuttering. Technical challenges remain in creating VSM content that depicts previously unseen behaviors. In this paper, we propose a novel system that synthesizes new video sequences for VSM treatment of patients with voice disorders. Starting with a video recording of a voice-disorder patient, the proposed system replaces the coarse speech with clean, healthier speech that resembles the patient's original voice. The replacement speech is either synthesized by a text-to-speech engine or selected from a database of clean speech based on a voice similarity metric. To realign the replacement speech with the original video, a novel audiovisual algorithm that combines audio segmentation with lip-state detection is proposed to identify corresponding time markers in the audio and video tracks. Lip synchronization is then accomplished using an adaptive video resampling scheme that minimizes motion jitter and preserves spatial sharpness. Experimental evaluations on a dataset of 31 subjects demonstrate the effectiveness of the proposed techniques.
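The abstract mentions selecting replacement speech from a database of clean recordings "based on a voice similarity metric," but does not specify the metric. A minimal sketch of one plausible choice is shown below: compare average log-magnitude spectra by cosine similarity. The function names, frame sizes, and the spectral-signature feature itself are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def spectral_signature(signal, frame_len=512, hop=256):
    # Average log-magnitude spectrum over Hann-windowed frames:
    # a crude, speaker-dependent "voice signature" (assumed feature).
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spectra = [np.abs(np.fft.rfft(f * np.hanning(frame_len))) for f in frames]
    return np.log1p(np.mean(spectra, axis=0))

def voice_similarity(sig_a, sig_b):
    # Cosine similarity between signatures; 1.0 means identical spectral shape.
    a, b = spectral_signature(sig_a), spectral_signature(sig_b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_replacement(patient_sig, database):
    # Choose the clean recording whose voice best matches the patient's.
    return max(database, key=lambda entry: voice_similarity(patient_sig, entry))
```

In practice a system like this would more likely use MFCCs or a learned speaker embedding, but the selection logic — score every database entry against the patient's recording and keep the best match — stays the same.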
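The realignment step pairs time markers found by audio segmentation with lip-state events detected in the video. The paper does not give the pairing rule; one simple possibility is a greedy nearest-neighbor match within a tolerance window, sketched here with illustrative names and an assumed tolerance:

```python
def match_markers(audio_marks, video_marks, tol=0.25):
    """Pair each audio time marker (seconds) with the nearest unused video
    marker within `tol` seconds; markers with no partner are dropped.
    The 0.25 s tolerance is an assumption for illustration."""
    pairs, used = [], set()
    for a in audio_marks:
        best, best_d = None, tol
        for j, v in enumerate(video_marks):
            if j not in used and abs(v - a) <= best_d:
                best, best_d = j, abs(v - a)
        if best is not None:
            used.add(best)
            pairs.append((a, video_marks[best]))
    return pairs
```

For example, with audio markers at 0.0, 1.0, and 2.0 s and lip-state events at 0.1, 1.9, and 3.5 s, the pairs (0.0, 0.1) and (2.0, 1.9) survive while the 1.0 s marker, having no nearby lip event, is discarded.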
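Once corresponding time markers exist in both tracks, the video must be resampled so lip motion lands on the new speech timing. A minimal sketch of one such scheme, assuming piecewise-linear retiming between anchor pairs and nearest-neighbor frame selection (which copies whole frames rather than blending them, preserving sharpness), is shown below. The interface and anchor format are assumptions; the paper's adaptive scheme additionally optimizes for motion jitter.

```python
import numpy as np

def retime_indices(fps, anchors):
    """anchors: (new_time, orig_time) pairs, both strictly increasing,
    taken from matched audio/video markers. Returns, for each output frame
    on the new timeline, the source frame index to copy."""
    new_t = np.array([a for a, _ in anchors])
    orig_t = np.array([o for _, o in anchors])
    # Output frame times on the replacement-speech timeline.
    out_times = np.arange(0.0, new_t[-1] + 0.5 / fps, 1.0 / fps)
    # Piecewise-linear map from the new timeline back to the original one.
    src_times = np.interp(out_times, new_t, orig_t)
    # Nearest source frame: no blending, so spatial sharpness is kept.
    return np.rint(src_times * fps).astype(int)
```

Stretching 0.5 s of source video over a 1.0 s target segment, for instance, makes each source frame appear roughly twice, and because the retiming is linear within each anchor segment the playback-speed change is spread evenly rather than concentrated at one jarring point.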