Cross-domain synergy: Leveraging image processing techniques for enhanced sound classification through spectrogram analysis using CNNs

Valentina Franzoni
DOI: 10.32629/jai.v6i3.678
Journal: 自主智能(英文) (Journal of Autonomous Intelligence)
Published: 2023-08-28 (Journal Article)
Citations: 0

Abstract

This paper reviews an innovative approach to sound classification that exploits image processing techniques applied to spectrogram representations of audio signals. The study shows the effectiveness of incorporating well-established image processing methodologies, such as filtering, segmentation, and pattern recognition, to enhance feature extraction and classification performance once audio signals are transformed into spectrograms. An overview is provided of the mathematical methods shared by image processing and spectrogram-based audio processing, focusing on the commonalities between the two domains in terms of underlying principles, techniques, and algorithms. The proposed methodology leverages, in particular, convolutional neural networks (CNNs) to extract and classify time-frequency features from spectrograms, capitalizing on their hierarchical feature learning and robustness to translation and scale variations. Other deep-learning architectures and advanced techniques are also suggested in the course of the analysis. We discuss the benefits and limitations of transforming audio signals into spectrograms, including human interpretability, compatibility with image processing techniques, and flexibility in time-frequency resolution. By bridging the gap between image processing and audio processing, spectrogram-based audio deep learning gives a deeper perspective on sound classification, offering fundamental insights that serve as a foundation for interdisciplinary research and applications in both domains.
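To make the spectrogram representation concrete, the sketch below computes a log-magnitude short-time Fourier transform (STFT) spectrogram with NumPy: the audio is framed, windowed, and transformed, yielding a 2-D time-frequency array that can be treated as an image and fed to a CNN. This is a minimal illustration, not the paper's method; the `spectrogram` helper, the Hann window, and the frame/hop sizes are illustrative choices.

```python
import numpy as np

def spectrogram(signal, win_len=256, hop=128):
    """Log-magnitude STFT spectrogram: a 2-D time-frequency 'image'.

    Frames the signal with a Hann window and takes the FFT magnitude
    of each frame; rows are frequency bins, columns are time frames.
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # shape: (frames, freq bins)
    return np.log1p(mag).T                     # (freq bins, frames), image-like

# A 440 Hz tone sampled at 8 kHz: its energy should concentrate
# in the frequency row nearest 440 Hz.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)           # (129, 61): 129 frequency bins x 61 time frames
peak_bin = spec.mean(axis=1).argmax()
print(peak_bin * sr / 256)  # 437.5 Hz, the FFT bin closest to 440 Hz
```

Because the result is an ordinary 2-D array, standard image pipelines apply directly: filtering and segmentation operate on it unchanged, and stacking such arrays gives the batched image tensors a CNN expects.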