{"title":"基于TDPSOLA的语音修改基本信号的基音标记","authors":"F. Ykhlef, L. Bendaouia","doi":"10.1109/ISM.2013.28","DOIUrl":null,"url":null,"abstract":"The quality of synthetic speech offered by pitch and duration modifications via Time Domain Pitch Synchronous Overlap Add method (TD-PSOLA) relies on an accurate positioning of pitch marks. In this paper, we propose a new pitch marking technique of voiced regions based on the fundamental signal of the speech waveform. By using the valleys of the fundamental signal, we locate a set of precise intervals where the exact instants of pitch marks are expected to be found. The fundamental signal is composed only from the fundamental frequency (pitch) of speech. It is represented by a specific signal named \"mean based signal\" (MBS). The optimal pitch marks are found by extracting the set of global peak instants within the obtained intervals. To improve the performance of the proposed technique, we have proposed a post processing stage which allows us to correct the erroneous pitch marks that may occur due to some synchronization problems. The proposed technique is evaluated on CMU ACRTIC database by using objective and subjective measures. The experiments demonstrate that the proposed technique allows pitch and duration modifications via TD-PSOLA with high quality.","PeriodicalId":6311,"journal":{"name":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","volume":"57 1","pages":"118-124"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Pitch Marking Using the Fundamental Signal for Speech Modifications via TDPSOLA\",\"authors\":\"F. Ykhlef, L. Bendaouia\",\"doi\":\"10.1109/ISM.2013.28\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The quality of synthetic speech offered by pitch and duration modifications via Time Domain Pitch Synchronous Overlap Add method (TD-PSOLA) relies on an accurate positioning of pitch marks. In this paper, we propose a new pitch marking technique of voiced regions based on the fundamental signal of the speech waveform. By using the valleys of the fundamental signal, we locate a set of precise intervals where the exact instants of pitch marks are expected to be found. The fundamental signal is composed only from the fundamental frequency (pitch) of speech. It is represented by a specific signal named \\\"mean based signal\\\" (MBS). The optimal pitch marks are found by extracting the set of global peak instants within the obtained intervals. To improve the performance of the proposed technique, we have proposed a post processing stage which allows us to correct the erroneous pitch marks that may occur due to some synchronization problems. The proposed technique is evaluated on CMU ACRTIC database by using objective and subjective measures. The experiments demonstrate that the proposed technique allows pitch and duration modifications via TD-PSOLA with high quality.\",\"PeriodicalId\":6311,\"journal\":{\"name\":\"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)\",\"volume\":\"57 1\",\"pages\":\"118-124\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISM.2013.28\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISM.2013.28","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Pitch Marking Using the Fundamental Signal for Speech Modifications via TDPSOLA
The quality of synthetic speech offered by pitch and duration modifications via Time Domain Pitch Synchronous Overlap Add method (TD-PSOLA) relies on an accurate positioning of pitch marks. In this paper, we propose a new pitch marking technique of voiced regions based on the fundamental signal of the speech waveform. By using the valleys of the fundamental signal, we locate a set of precise intervals where the exact instants of pitch marks are expected to be found. The fundamental signal is composed only from the fundamental frequency (pitch) of speech. It is represented by a specific signal named "mean based signal" (MBS). The optimal pitch marks are found by extracting the set of global peak instants within the obtained intervals. To improve the performance of the proposed technique, we have proposed a post processing stage which allows us to correct the erroneous pitch marks that may occur due to some synchronization problems. The proposed technique is evaluated on CMU ACRTIC database by using objective and subjective measures. The experiments demonstrate that the proposed technique allows pitch and duration modifications via TD-PSOLA with high quality.