An acoustic analysis of fluctuations for inter- and intra-speaker variability in speech sounds
J. Kaur, K. Juglan, Kush Sharma, Vishal Sharma
Journal of Forensic Science and Medicine, vol. 9, no. 1, pp. 38-43. Published 2023-01-01. DOI: https://doi.org/10.2139/ssrn.3960502
Background: Variation in the speech of speakers is a crucial issue for the forensic system. The main reason behind incorrect speaker identification is greater intra-speaker fluctuation. In the current forensic state of play, a lot of research has been carried out on speaker identification. However, inter-speaker variation and intra-speaker fluctuation in the Punjabi language are still a grey area.

Aims and Objectives: Our aim is to carry out an acoustic analysis of fluctuations for inter- and intra-speaker variability in speech sounds. In our study, we consider Punjabi vowels combined with consonants. Statistical methods will be applied to analyze the data: first, the Shapiro-Wilk test will be used to check normality, and then Levene's test will be used to assess the equality of variances.

Materials and Method: Five vowels were selected with different consonants and combined to make meaningful words, which were then embedded in sentences. Ten speakers participated voluntarily. All were students of A.S College at Khanna in Punjab, aged between 20 and 22 years, with no hearing or speech disorder. The voice samples were recorded in a soundproof lab with a good-quality microphone and Goldwave software. Samples were fed directly into PRAAT software through a Sony microphone at a sampling rate of 44100 Hz. The acoustic analysis was done with the help of Goldwave software in the form of spectrograms.

Results and Conclusion: Each formant shows a different value for inter-speaker variations and intra-speaker fluctuations. F1 and F2 show less speaker variation than the high-frequency region of F3 and F4, so we can say that the high-frequency region is more valuable than the lower one. Because the assumptions of two-way ANOVA were violated, we used the non-parametric Friedman test and performed its post hoc analysis. From the post hoc analysis, we can say that F1 and F2 (p > 0.05) and F2 and F3 (p > 0.05) gave the same type of results. Hence, from the results of these statistical tests, we conclude that F1 is recommended over F2, F3, and F4, because the frequency of F1 is high and this choice is in line with the results of the statistical tests. We prefer more variation among frequencies because it makes different speakers easier to distinguish, which is more beneficial for the study of inter-speaker variations and intra-speaker fluctuations.
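
The Friedman test named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the data were arranged, so this example assumes one row per speaker and one column per formant (F1 to F4), and the numbers (a per-speaker coefficient of variation in percent) are invented for illustration only; they are not the study's measurements. The helper names `average_ranks`, `chi2_sf`, and `friedman_test` are hypothetical. The Shapiro-Wilk and Levene checks and the post hoc comparisons are not reproduced here.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Closed form for odd df: erfc term plus a finite sum of gamma terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def friedman_test(blocks):
    """Friedman chi-square statistic (tie-corrected) and its p-value.

    blocks: list of rows, one per block (here: speaker), each holding one
    value per treatment (here: formant).
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):
            t = row.count(v)
            tie_term += t ** 3 - t
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    q /= 1 - tie_term / (n * k * (k * k - 1))  # equals 1 when there are no ties
    return q, chi2_sf(q, k - 1)


# Invented illustration data: rows = speakers, columns = F1, F2, F3, F4.
cv_percent = [
    [4.1, 3.2, 5.6, 6.3],
    [3.8, 3.5, 5.1, 5.9],
    [4.4, 4.6, 5.0, 6.8],
    [3.9, 3.1, 6.2, 5.7],
    [4.0, 3.6, 5.3, 6.1],
]

statistic, p_value = friedman_test(cv_percent)
print(f"Friedman chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value here would only indicate that at least one formant differs from the others in rank; which pairs differ is the job of the post hoc analysis, as in the abstract's pairwise F1/F2 and F2/F3 comparisons.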