Authors: A.R. Abu-El-Quran, R. Goubran
Published in: The 2nd IEEE International Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings.
Publication date: 2003-11-10
DOI: 10.1109/HAVE.2003.1244723
Pitch-based feature extraction for audio classification
This paper proposes a new algorithm to discriminate between speech and non-speech audio segments. It is intended for security applications and for talker location identification in audio conferencing systems equipped with microphone arrays. The method splits the audio segment into small frames and detects the presence of pitch in each frame. The pitch ratio, defined as the number of frames in which pitch is detected divided by the total number of frames, is the main feature used to classify segments as speech or non-speech. Performance is evaluated on a library of audio segments containing female and male speech and non-speech sounds such as computer fan noise, cocktail noise, footsteps, and traffic noise. For 0.5-second segments, the algorithm achieves correct decisions for 97% of speech segments and 98% of non-speech segments.
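The pitch-ratio feature described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the pitch detector, the frame length, or the decision threshold, so everything below is an assumption made for illustration: a normalized-autocorrelation peak test stands in for the pitch detector, and the function names (`has_pitch`, `pitch_ratio`, `classify`, `make_demo_signals`), the 30 ms frame length, the 60 to 400 Hz search range, and both thresholds are hypothetical choices, not values from the paper.

```python
import math
import random


def has_pitch(frame, sample_rate, fmin=60.0, fmax=400.0, peak_threshold=0.5):
    """Assumed pitch test: True if the normalized autocorrelation of the
    frame has a strong peak at a lag inside the [fmin, fmax] pitch range."""
    n = len(frame)
    if n < 2:
        return False
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return False  # silent or constant frame: no pitch
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        # Biased autocorrelation estimate, normalized by the frame energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        best = max(best, r / energy)
    return best >= peak_threshold


def pitch_ratio(segment, sample_rate, frame_ms=30.0):
    """Fraction of non-overlapping frames in which pitch is detected."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 2:
        return 0.0
    frames = [segment[i:i + frame_len]
              for i in range(0, len(segment) - frame_len + 1, frame_len)]
    if not frames:
        return 0.0
    pitched = sum(1 for f in frames if has_pitch(f, sample_rate))
    return pitched / len(frames)


def classify(segment, sample_rate, ratio_threshold=0.3):
    """Label a segment by thresholding its pitch ratio (threshold assumed)."""
    if pitch_ratio(segment, sample_rate) >= ratio_threshold:
        return "speech"
    return "non-speech"


def make_demo_signals(sample_rate=8000, seconds=0.5, seed=0):
    """Synthetic stand-ins: a 150 Hz harmonic tone and uniform white noise."""
    n = int(sample_rate * seconds)
    tone = [math.sin(2 * math.pi * 150 * t / sample_rate)
            + 0.5 * math.sin(2 * math.pi * 300 * t / sample_rate)
            for t in range(n)]
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return tone, noise
```

Calling `classify(tone, 8000)` and `classify(noise, 8000)` on the two signals returned by `make_demo_signals()` separates a strongly periodic signal from an aperiodic one. Real speech contains unvoiced and silent frames, so its pitch ratio falls well below 1, and the ratio threshold would need tuning on recorded data such as the library the abstract describes.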