{"title":"A unified trajectory tiling approach to high quality TTS and cross-lingual voice transformation","authors":"Yao Qian, F. Soong","doi":"10.1109/ISCSLP.2012.6423506","DOIUrl":null,"url":null,"abstract":"In human-machine speech communication, it is technically challenging to make the machine talk as naturally as human so as to facilitate “frictionless” interactions, or make a human user to feel the communication is as natural as human-human. We propose a trajectory tiling approach to high quality speech synthesis, where the speech parameter trajectories, extracted from natural, processed, or synthesized speech, are used to guide the search for the best sequence of waveform segment “tiles” stored in a pre-recorded speech database. We test our approach in both TTS and cross-lingual voice transformation applications. Experimental results show that the proposed trajectory tiling approach can render speech which is both natural and highly intelligible. The perceived high quality speech is also confirmed in objective and subjective tests.","PeriodicalId":186099,"journal":{"name":"2012 8th International Symposium on Chinese Spoken Language Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 8th International Symposium on Chinese Spoken Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCSLP.2012.6423506","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In human-machine speech communication, it is technically challenging to make the machine talk as naturally as human so as to facilitate “frictionless” interactions, or make a human user to feel the communication is as natural as human-human. We propose a trajectory tiling approach to high quality speech synthesis, where the speech parameter trajectories, extracted from natural, processed, or synthesized speech, are used to guide the search for the best sequence of waveform segment “tiles” stored in a pre-recorded speech database. We test our approach in both TTS and cross-lingual voice transformation applications. Experimental results show that the proposed trajectory tiling approach can render speech which is both natural and highly intelligible. The perceived high quality speech is also confirmed in objective and subjective tests.