A Robust Pipeline based Deep Learning Approach to Detect Speech Attribution

2023 IEEE 8th International Conference for Convergence in Technology (I2CT), Pub Date: 2023-04-07, DOI: 10.1109/I2CT57861.2023.10126219

Shreya Chakravarty, R. Khandelwal



Abstract

Today's "thinking machines" bring the benefit of reducing human effort along with the drawback of being easy to misuse. Automation has an enormous range of applications, and speech recognition is among the most popular. Automated systems can now be controlled by voice commands and can respond in a human-like way, in appearance as well as in communication media such as speech. Audio is not always recorded in ideal surroundings, however. This increases the likelihood that human-system interaction involves audio aberrations, and it raises serious forensic concerns about the authenticity and source of a given recording, a challenge that remains to be resolved. This paper illustrates thorough augmentation of audio data as the basis for a robust solution that removes anomalies in audio using a pipeline approach. We propose analysing the spectrogram representation of an audio signal to determine a mask that separates noise from the clean signal, yielding a signal that can be processed for speech recognition. The pipeline then extends to a deep neural network that achieves an accuracy of 95.87%.
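
The abstract does not say how the spectrogram mask is computed, so the following is only a minimal sketch of one common way to build such a mask, simple spectral gating, and should not be read as the authors' method. It assumes the recording begins with a few noise-only frames, estimates a per-frequency noise floor from them, and keeps only the time-frequency cells that rise well above that floor. The function names (`stft`, `istft`, `noise_mask`, `denoise`) and the parameters `frame_len`, `noise_frames`, and `threshold` are hypothetical choices made for illustration.

```python
import numpy as np


def stft(x, frame_len):
    """Short-time Fourier transform with a Hann window and 50% overlap."""
    hop = frame_len // 2
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i of idx holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * window, axis=1)  # shape: (frames, bins)


def istft(spec, frame_len):
    """Weighted overlap-add inverse of stft()."""
    hop = frame_len // 2
    window = np.hanning(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(frame_len + hop * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame * window
        norm[i * hop:i * hop + frame_len] += window ** 2
    # Divide by the summed squared window; the floor avoids division by zero.
    return out / np.maximum(norm, 1e-8)


def noise_mask(spec, noise_frames, threshold):
    """Boolean mask that is True where a cell exceeds the scaled noise floor.

    Assumption: the first `noise_frames` frames contain noise only.
    """
    magnitude = np.abs(spec)
    noise_floor = magnitude[:noise_frames].mean(axis=0)  # one value per bin
    return magnitude > threshold * noise_floor


def denoise(x, frame_len=256, noise_frames=5, threshold=2.0):
    """Apply the mask to the spectrogram and resynthesise the signal.

    Assumes an even `frame_len` and a signal longer than one frame.
    """
    hop = frame_len // 2
    # Reflect-pad so each original sample is covered by two windows and the
    # frames tile the padded signal exactly.
    extra = (-len(x)) % hop
    padded = np.pad(x, (hop, hop + extra), mode="reflect")
    spec = stft(padded, frame_len)
    mask = noise_mask(spec, noise_frames, threshold)
    cleaned = istft(spec * mask, frame_len)
    return cleaned[hop:hop + len(x)], mask


# Synthetic example: a 440 Hz tone that starts after 0.25 s of noise only.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = np.where(t >= 0.25, np.sin(2 * np.pi * 440 * t), 0.0)
noisy = clean + 0.3 * rng.standard_normal(fs)
cleaned, mask = denoise(noisy)
```

A hard binary mask like this is easy to reason about but can leave isolated noisy cells behind. A soft mask with values between 0 and 1, or a mask predicted by a learned model, is a common alternative when those artifacts matter.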
