Pitch-based feature extraction for audio classification

The 2nd IEEE International Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings. Pub Date: 2003-11-10. DOI: 10.1109/HAVE.2003.1244723

A.R. Abu-El-Quran, R. Goubran


Citations: 18

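Abstract

This paper proposes a new algorithm to discriminate between speech and non-speech audio segments. It is intended for security applications and for talker location identification in audio conferencing systems equipped with microphone arrays. The method splits each audio segment into short frames and detects whether pitch is present in each frame. The ratio of frames in which pitch is detected to the total number of frames is defined as the pitch ratio and serves as the main feature for classifying segments as speech or non-speech. Performance is evaluated on a library of audio segments containing female and male speech and non-speech sounds such as computer fan noise, cocktail noise, footsteps, and traffic noise. For 0.5-second segments, the algorithm reaches a correct-decision rate of 97% for speech and 98% for non-speech.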
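The frame-and-ratio procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how pitch is detected, so the normalized-autocorrelation test in `has_pitch` is an assumed stand-in, and the frame length, pitch range, and both thresholds are hypothetical values chosen only for the example.

```python
import math


def has_pitch(frame, sample_rate, f_min=60.0, f_max=400.0, threshold=0.5):
    """Assumed pitch test (not from the paper): report pitch when the
    normalized autocorrelation exceeds `threshold` at some lag whose
    frequency lies between f_min and f_max."""
    n = len(frame)
    if n < 2:
        return False
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)         # autocorrelation at lag 0
    if energy == 0.0:
        return False                       # silent frame: no pitch
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(n - 1, int(sample_rate / f_min))
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > threshold:
            return True
    return False


def pitch_ratio(segment, sample_rate, frame_len):
    """Frames with pitch detected divided by the total number of frames.
    Frames are non-overlapping; a trailing partial frame is dropped."""
    starts = range(0, len(segment) - frame_len + 1, frame_len)
    frames = [segment[s:s + frame_len] for s in starts]
    if not frames:
        return 0.0
    voiced = sum(1 for f in frames if has_pitch(f, sample_rate))
    return voiced / len(frames)


def classify(segment, sample_rate, frame_len=240, ratio_threshold=0.3):
    """Label a segment from its pitch ratio; the threshold is hypothetical."""
    ratio = pitch_ratio(segment, sample_rate, frame_len)
    return "speech" if ratio >= ratio_threshold else "non-speech"
```

With the assumed defaults, a 0.5-second segment sampled at 8 kHz contains 4000 samples and yields 16 frames of 240 samples (30 ms each). A steady tone also passes this simple pitch test, so the sketch only illustrates how the pitch ratio is computed and does not reproduce the reported accuracy.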



