Characterization of infant cries using spectral and prosodic features
R. R. Vempada, B. Kumar, K. S. Rao
2012 National Conference on Communications (NCC), published 2012-04-03
DOI: 10.1109/NCC.2012.6176851 (https://doi.org/10.1109/NCC.2012.6176851)
This paper explores spectral and prosodic features for recognizing infant cries. Three types of cry are considered: wet-diaper, hunger and pain. Mel-frequency cepstral coefficients (MFCC) represent the spectral information, while short-time frame energies (STE) and pause duration represent the prosodic information. Support Vector Machines (SVM) are used to capture the information in these features that discriminates among the three cry types, and separate SVM models are developed for the spectral and the prosodic features. The studies use the infant cry database collected under the Telemedicine project at IIT-KGP. The SVM models built on spectral and prosodic features achieve recognition rates of 61.11% and 57.41%, respectively. We also examine recognition performance when the spectral and prosodic information is combined at the feature level and at the score level; feature-level and score-level fusion yield 74.07% and 80.56%, respectively.
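The prosodic features and the two fusion schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give frame lengths, energy thresholds, the exact definition of pause duration, or the fusion rule, so everything below is an assumption: STE is taken as the sum of squared samples per frame, pause duration as the time covered by low-energy frames, feature-level fusion as concatenation of the two feature vectors, and score-level fusion as a weighted sum of per-class classifier scores. The function names and the `weight` parameter are hypothetical, and the SVM scores are supplied by hand rather than produced by a trained model.

```python
from typing import List, Sequence


def short_time_energies(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Sum of squared samples in each frame; a trailing partial frame is dropped."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def pause_duration(energies: Sequence[float], threshold: float, hop_seconds: float) -> float:
    """Assumed definition: total time covered by frames whose energy is below threshold."""
    return sum(1 for e in energies if e < threshold) * hop_seconds


def feature_level_fusion(spectral: Sequence[float], prosodic: Sequence[float]) -> List[float]:
    """Concatenate both feature vectors so that a single classifier sees them together."""
    return list(spectral) + list(prosodic)


def score_level_fusion(spectral_scores: Sequence[float],
                       prosodic_scores: Sequence[float],
                       weight: float) -> List[float]:
    """Assumed rule: weighted sum of the per-class scores of two separate classifiers."""
    return [weight * s + (1.0 - weight) * p
            for s, p in zip(spectral_scores, prosodic_scores)]


def predict(scores: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label whose score is highest."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


if __name__ == "__main__":
    labels = ["wet-diaper", "hunger", "pain"]
    toy_signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 2.0]  # made-up samples
    energies = short_time_energies(toy_signal, frame_len=2, hop=2)
    print(energies, pause_duration(energies, threshold=0.5, hop_seconds=0.01))

    spectral_scores = [0.2, 0.5, 0.3]  # made-up per-class scores
    prosodic_scores = [0.1, 0.3, 0.6]
    fused = score_level_fusion(spectral_scores, prosodic_scores, weight=0.5)
    print(predict(spectral_scores, labels), predict(fused, labels))
```

In this toy run the spectral scores alone favour one class while the fused scores favour another, which is the kind of correction score-level fusion is meant to allow. The numbers are invented and say nothing about the reported recognition rates.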