К. Koshekov, А. Savostin, B. Seidakhmetov, R. Anayatova, I. Fedorov
{"title":"基于深度学习技术的语音信号情感识别航空剖面方法","authors":"К. Koshekov, А. Savostin, B. Seidakhmetov, R. Anayatova, I. Fedorov","doi":"10.2478/ttj-2021-0037","DOIUrl":null,"url":null,"abstract":"Abstract This paper proposes a method of automatic speaker-independent recognition of human psycho-emotional states by analyzing the speech signal based on Deep Learning technology to solve the problems of aviation profiling. For this purpose, an algorithm to classify seven human psycho-emotional states, including anger, joy, fear, surprise, disgust, sadness, and neutral state was developed. The algorithm is based on the use of Mel-frequency cepstral coefficients and Mel spectrograms as informative features of speech signals audio recordings. These informative features are used to train two deep convolutional neural networks on the generated dataset. The developed classifier testing on a delayed verification dataset showed that the metric for the multiclass fraction of correct answers’ accuracy is 0.93. The solution proposed in the paper can be in demand in human-machine interfaces creation, medicine, marketing, and in the field of air transportation.","PeriodicalId":44110,"journal":{"name":"Transport and Telecommunication Journal","volume":"52 1","pages":"471 - 481"},"PeriodicalIF":1.1000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Aviation Profiling Method Based on Deep Learning Technology for Emotion Recognition by Speech Signal\",\"authors\":\"К. Koshekov, А. Savostin, B. Seidakhmetov, R. Anayatova, I. Fedorov\",\"doi\":\"10.2478/ttj-2021-0037\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract This paper proposes a method of automatic speaker-independent recognition of human psycho-emotional states by analyzing the speech signal based on Deep Learning technology to solve the problems of aviation profiling. For this purpose, an algorithm to classify seven human psycho-emotional states, including anger, joy, fear, surprise, disgust, sadness, and neutral state was developed. The algorithm is based on the use of Mel-frequency cepstral coefficients and Mel spectrograms as informative features of speech signals audio recordings. These informative features are used to train two deep convolutional neural networks on the generated dataset. The developed classifier testing on a delayed verification dataset showed that the metric for the multiclass fraction of correct answers’ accuracy is 0.93. The solution proposed in the paper can be in demand in human-machine interfaces creation, medicine, marketing, and in the field of air transportation.\",\"PeriodicalId\":44110,\"journal\":{\"name\":\"Transport and Telecommunication Journal\",\"volume\":\"52 1\",\"pages\":\"471 - 481\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2021-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transport and Telecommunication Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2478/ttj-2021-0037\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"TRANSPORTATION SCIENCE & TECHNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transport and Telecommunication Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/ttj-2021-0037","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"TRANSPORTATION SCIENCE & TECHNOLOGY","Score":null,"Total":0}
Aviation Profiling Method Based on Deep Learning Technology for Emotion Recognition by Speech Signal
Abstract This paper proposes a method of automatic speaker-independent recognition of human psycho-emotional states by analyzing the speech signal based on Deep Learning technology to solve the problems of aviation profiling. For this purpose, an algorithm to classify seven human psycho-emotional states, including anger, joy, fear, surprise, disgust, sadness, and neutral state was developed. The algorithm is based on the use of Mel-frequency cepstral coefficients and Mel spectrograms as informative features of speech signals audio recordings. These informative features are used to train two deep convolutional neural networks on the generated dataset. The developed classifier testing on a delayed verification dataset showed that the metric for the multiclass fraction of correct answers’ accuracy is 0.93. The solution proposed in the paper can be in demand in human-machine interfaces creation, medicine, marketing, and in the field of air transportation.