Statistical discrimination and identification of some acoustic sounds
H. Sayoud, S. Ouamour
2006 IEEE GCC Conference (GCC), 2006-03-20
DOI: 10.1109/IEEEGCC.2006.5686246 (https://doi.org/10.1109/IEEEGCC.2006.5686246)
Citations: 1
Abstract
Most speech recordings are mixed with other sounds such as music, songs, or noise, and any speech processing task becomes easier once the speech regions are separated from the non-speech regions. We therefore propose a preprocessing method for speech/non-speech discrimination that can also identify some acoustic sounds, using statistical observations (mean, standard deviation) of a statistical similarity measure (μGc). Since speakers can be discriminated thanks to the small within-speaker variability and the large between-speaker variability of their acoustic features, we extend this property to the discrimination of acoustic sounds. We investigated different types of sounds: noise, music, and speech (the speech signals are taken from the TIMIT database). The purpose of this investigation is to define a separate class for each type of sound according to the similarity measure μGc. Experiments showed that the range of the similarity distance between speech and other acoustic signals has a mean and standard deviation that are specific to each sound. It is thus possible to state whether a particular audio signal is speech or non-speech simply by observing the statistical range of μGc, which is chosen as the similarity distance. For instance, the value of μGc indicates whether an audio frame is pure speech or music: if μGc lies within [2.5, 4.9], the sound under consideration should be music.
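The decision rule stated at the end of the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the μGc value is taken as an already computed input (its computation is not described here), the interval endpoints are treated as inclusive, and the helper `class_range` with its width factor `k` is hypothetical, since the abstract does not say how the per-class ranges are derived from the mean and standard deviation.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

# Interval reported in the abstract for music.
MUSIC_RANGE: Tuple[float, float] = (2.5, 4.9)


def is_music(mu_gc: float, bounds: Tuple[float, float] = MUSIC_RANGE) -> bool:
    """Return True when a precomputed muGc value falls inside the music interval.

    Assumption: the endpoints are inclusive; the abstract does not specify this.
    """
    low, high = bounds
    return low <= mu_gc <= high


def class_range(samples: Sequence[float], k: float = 1.0) -> Tuple[float, float]:
    """Hypothetical helper: build the range mean +/- k * std from muGc values
    measured for one sound class. The factor k is an assumption."""
    m = mean(samples)
    s = stdev(samples)  # sample standard deviation, needs at least two values
    return (m - k * s, m + k * s)
```

A caller would compute μGc between a frame and a speech reference with its own routine, then pass the value to `is_music`. Ranges for the other sound classes could be estimated with `class_range` from labeled measurements, which are not provided in the abstract.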