{"title":"A segmental time-alignment technique for text-speech synchronization","authors":"F. Vignoli, F. Lavagetto","doi":"10.1109/ICCIMA.1999.798565","DOIUrl":null,"url":null,"abstract":"The bimodal acoustic-visual effect is of extreme importance in human face-to-face communication; it has been broadly investigated and the improvement in understanding when visual cues are integrated with speech has been clearly demonstrated, with particular emphasis in noisy environments. In this paper, we propose a novel synchronization procedure for speech and text, consisting of a neural network-based acoustic segmentation method for phoneme classes and a phonetic-acoustic time alignment algorithm which we call Segmental Time-Alignment (STA). The proposed algorithm is fast and speaker-independent since it uses neural networks trained to discriminate among broad phoneme classes. This technique has been used to animate the MPEG-4 compliant DIST face model.","PeriodicalId":110736,"journal":{"name":"Proceedings Third International Conference on Computational Intelligence and Multimedia Applications. ICCIMA'99 (Cat. No.PR00300)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings Third International Conference on Computational Intelligence and Multimedia Applications. ICCIMA'99 (Cat. No.PR00300)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCIMA.1999.798565","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The bimodal acoustic-visual effect plays a central role in human face-to-face communication; it has been widely investigated, and the improvement in speech understanding when visual cues are integrated with the acoustic signal has been clearly demonstrated, particularly in noisy environments. In this paper, we propose a novel synchronization procedure for speech and text, consisting of a neural network-based acoustic segmentation method for broad phoneme classes and a phonetic-acoustic time-alignment algorithm that we call Segmental Time-Alignment (STA). The proposed algorithm is fast and speaker-independent, since it relies on neural networks trained to discriminate among broad phoneme classes. The technique has been used to animate the MPEG-4-compliant DIST face model.
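The abstract describes the STA procedure only at a high level: a neural network labels the audio with broad phoneme classes, and an alignment step maps those acoustic segments onto the phonetic sequence derived from the text. The sketch below is a hypothetical illustration of that kind of frame-to-text alignment as a simple dynamic-programming pass; the class labels, cost values, and the `align_segments` function are assumptions made for illustration, not the authors' actual STA algorithm.

```python
# Minimal sketch of a segmental time-alignment step, assuming an acoustic
# front end has already produced one broad phoneme-class label per frame
# (the neural-network segmentation stage, not shown here) and the text has
# been converted to its expected broad-class sequence. All labels and costs
# below are illustrative assumptions, not the paper's exact formulation.

def align_segments(frame_classes, text_classes, switch_cost=1.0, mismatch_cost=2.0):
    """Assign each audio frame to a position in the text-derived class
    sequence via dynamic programming, returning per-frame text indices."""
    n_frames, n_text = len(frame_classes), len(text_classes)
    INF = float("inf")
    # cost[t][j]: best cost of explaining frames 0..t with frame t mapped to text unit j
    cost = [[INF] * n_text for _ in range(n_frames)]
    back = [[0] * n_text for _ in range(n_frames)]

    cost[0][0] = 0.0 if frame_classes[0] == text_classes[0] else mismatch_cost
    for t in range(1, n_frames):
        for j in range(n_text):
            local = 0.0 if frame_classes[t] == text_classes[j] else mismatch_cost
            # either stay on the same text unit or advance to the next one
            stay = cost[t - 1][j]
            advance = cost[t - 1][j - 1] + switch_cost if j > 0 else INF
            if stay <= advance:
                cost[t][j], back[t][j] = stay + local, j
            else:
                cost[t][j], back[t][j] = advance + local, j - 1

    # backtrack from the last text unit to recover the frame-to-text mapping
    path, j = [], n_text - 1
    for t in range(n_frames - 1, -1, -1):
        path.append(j)
        j = back[t][j]
    return list(reversed(path))


if __name__ == "__main__":
    # Toy example: 10 frames of broad-class labels ("V" vowel, "F" fricative,
    # "P" plosive, "S" silence) aligned against a 4-unit text-derived sequence.
    frames = ["S", "S", "P", "V", "V", "V", "F", "F", "V", "S"]
    text = ["P", "V", "F", "V"]
    print(align_segments(frames, text))  # -> [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
```

Because the alignment operates on a handful of broad classes rather than a full phoneme inventory, a classifier of this kind needs no speaker-specific training, which is consistent with the speaker independence the abstract claims for the method.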