Speech Recognition and Separation System using Deep Learning
Meet Singh Chauhan, R. Mishra, Manish I. Patel
2021 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), pp. 1-5. Published 2021-09-24.
DOI: 10.1109/ICSES52305.2021.9633779
Citations: 3
Abstract
The human voice is considered one of the most important human features, and speech is how humans communicate with each other. Speech features are analyzed in order to recognize and separate the target speech. Speech signals are continuous and generally contain overlapping regions, which makes conventional methods such as signal-based matrices inefficient. There is therefore a need for an advanced architecture that can handle both speech recognition and speech separation efficiently. This paper gives a brief overview of work on speech recognition and separation with deep learning, using mel-frequency cepstral coefficients (MFCCs) as the feature representation. The speech recognition model is implemented with an MFCC-DNN based approach, and the speech separation model is based on a DNN architecture. Methods such as MFCC extraction and DNN tuning were used to obtain better performance and higher accuracy than conventional methods such as single-channel speech separation and hidden Markov models (HMMs).
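The MFCC extraction step named in the abstract can be illustrated with a minimal sketch in pure Python. This is not the authors' implementation: the abstract gives no parameters, so the frame length, hop size, number of mel filters, number of coefficients, and all function names below are assumptions chosen for illustration. The sketch follows the textbook pipeline of framing, Hamming windowing, power spectrum, triangular mel filterbank, logarithm, and DCT-II.

```python
import math

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs MFCCs per frame (all defaults are assumptions)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Floor the filter energies so the logarithm is always defined.
        log_energies = [math.log(max(sum(w * p for w, p in zip(filt, spec)), 1e-10))
                        for filt in bank]
        features.append(dct2(log_energies, n_coeffs))
    return features

# Usage on a synthetic 440 Hz tone sampled at 8 kHz (not real speech).
tone = [math.sin(2 * math.pi * 440.0 * t / 8000.0) for t in range(1024)]
features = mfcc(tone, 8000)
print(len(features), "frames,", len(features[0]), "coefficients per frame")
```

In an MFCC-DNN setup of the kind the abstract describes, per-frame vectors like these would typically be fed to the network as input features. The naive DFT keeps the sketch dependency-free; a practical system would use an FFT routine or an audio library instead.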