{"title":"文本驱动的虚拟扬声器","authors":"V. Obradović, Ilija Rajak, M. Secujski, V. Delić","doi":"10.23919/eusipco55093.2022.9909813","DOIUrl":null,"url":null,"abstract":"Online courses have had exponential growth during COVID-19 pandemic, and video lectures are also important for lifelong learning. However, lecturers experience a number of challenges in creating video lectures, related to both speech recording (microphone and noise; diction, articulation and intonation) and video recording (camera and light; consistency in appearance). It is particularly difficult to modify and update recorded content. The paper presents a solution for these problems based on the application of artificial intelligence in creating virtual speakers based on TTS synthesis and Wav2Lip GAN trained on a custom data set. A pilot project which included the evaluation and testing of the developed system by dozens of teachers will be presented in detail. The use of TTS overcomes the problems in achieving speaker consistency by providing high quality speech in different languages, while the attention and motivation of students is improved by using animated virtual speakers.","PeriodicalId":231263,"journal":{"name":"2022 30th European Signal Processing Conference (EUSIPCO)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Text driven virtual speakers\",\"authors\":\"V. Obradović, Ilija Rajak, M. Secujski, V. Delić\",\"doi\":\"10.23919/eusipco55093.2022.9909813\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Online courses have had exponential growth during COVID-19 pandemic, and video lectures are also important for lifelong learning. However, lecturers experience a number of challenges in creating video lectures, related to both speech recording (microphone and noise; diction, articulation and intonation) and video recording (camera and light; consistency in appearance). It is particularly difficult to modify and update recorded content. The paper presents a solution for these problems based on the application of artificial intelligence in creating virtual speakers based on TTS synthesis and Wav2Lip GAN trained on a custom data set. A pilot project which included the evaluation and testing of the developed system by dozens of teachers will be presented in detail. 
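The abstract describes a two-stage pipeline: text is first converted to speech by a TTS engine, and the resulting audio then drives a lip-synced video of the lecturer via a Wav2Lip GAN. As a rough illustration of that flow (not the authors' implementation), the sketch below uses gTTS as a stand-in for their TTS and calls the public Wav2Lip inference script; the checkpoint path, reference video, and flag names follow the public Wav2Lip repository and are assumptions here.

```python
# Minimal sketch of a text -> speech -> lip-synced video pipeline, assuming
# gTTS as a stand-in TTS and the public Wav2Lip repo (github.com/Rudrabha/Wav2Lip)
# cloned locally with a pretrained checkpoint downloaded.
import subprocess
from gtts import gTTS


def synthesize_speech(text: str, audio_path: str = "lecture.mp3") -> str:
    """Convert lecture text to speech with an off-the-shelf TTS (illustrative stand-in)."""
    gTTS(text=text, lang="en").save(audio_path)
    return audio_path


def lip_sync(face_video: str, audio_path: str, out_path: str = "lecture_video.mp4") -> str:
    """Drive a reference video of the speaker with the synthesized audio
    using the Wav2Lip inference script (paths and flags as in the public repo)."""
    subprocess.run(
        [
            "python", "Wav2Lip/inference.py",
            "--checkpoint_path", "checkpoints/wav2lip_gan.pth",
            "--face", face_video,
            "--audio", audio_path,
            "--outfile", out_path,
        ],
        check=True,
    )
    return out_path


if __name__ == "__main__":
    lecture_text = "Welcome to today's lecture on signal processing."
    audio = synthesize_speech(lecture_text)
    video = lip_sync("speaker_reference.mp4", audio)
    print("Generated virtual-speaker video:", video)
```

Swapping the TTS stage for a different language or voice leaves the lip-sync stage unchanged, which is consistent with the abstract's point that TTS provides speaker consistency across languages.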