A Robust Pipeline based Deep Learning Approach to Detect Speech Attribution

Shreya Chakravarty, R. Khandelwal
2023 IEEE 8th International Conference for Convergence in Technology (I2CT), published 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126219 (https://doi.org/10.1109/I2CT57861.2023.10126219)

Abstract

Today's "thinking machines" bring the benefit of reducing human effort along with the drawback of being easily misused. Automation has a vast range of applications, one of the most popular being speech recognition. Automated systems can now be controlled by voice commands and can also provide human-like responses, whether in appearance or in communication media such as speech. The audio source will not always be in ideal surroundings. This increases the likelihood that human-system interaction involves audio aberrations, and hence raises serious concern over forensic issues such as the authenticity and the source of a given audio recording, which remain a challenge to resolve. This paper illustrates thorough augmentation of audio data for a robust solution that removes anomalies in audio using a pipeline approach. We propose analysing the spectrogram representation of an audio signal to determine a mask that separates noise from the clean signal and yields a signal that can be processed for speech recognition, and we extend this to building a deep neural network that achieves an accuracy of 95.87%.
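
The abstract does not say how the mask is computed, so the following is only a minimal sketch of one common way to mask a spectrogram, namely spectral gating against a noise-only reference clip. The function names (`stft`, `istft`, `spectral_gate`), the parameters `n_fft`, `hop`, and `k`, and the use of a separate noise clip are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Short-time Fourier transform; returns an array of shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def spectral_gate(noisy, noise_clip, n_fft=256, hop=64, k=1.5):
    """Hypothetical spectral gate (an assumption, not the paper's method).

    A per-frequency threshold is estimated from a noise-only clip as
    mean + k * std of its magnitude spectrogram. Time-frequency cells of
    the noisy signal below that threshold are zeroed by a binary mask.
    """
    # Zero-pad so every original sample is covered by fully overlapping frames.
    padded = np.pad(noisy, n_fft)
    spec = stft(padded, n_fft, hop)
    noise_mag = np.abs(stft(noise_clip, n_fft, hop))
    threshold = noise_mag.mean(axis=0) + k * noise_mag.std(axis=0)
    mask = (np.abs(spec) > threshold).astype(float)
    cleaned = istft(spec * mask, n_fft, hop)
    # Trim the padding to return a signal of the original length.
    return cleaned[n_fft:n_fft + len(noisy)], mask
```

Calling `spectral_gate(noisy, noise_clip)` returns the gated signal together with the binary mask, and the gated signal could then be passed to a downstream recognition model. A hard binary mask is the simplest choice; a soft mask or a learned mask would be alternative designs, and the abstract does not indicate which one the authors use.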