Classification and Separation of Audio and Music Signals

International Journal of Multimedia Information Retrieval (IF 2.9; Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence) Pub Date: 2020-12-15 DOI: 10.5772/intechopen.94940

A. Al-Shoshan



Abstract

This chapter addresses the classification and separation of audio and music signals, an important and challenging research area. Classifying a stream of sounds matters when building two separate libraries: a speech library and a music library. Separation, in contrast, is sometimes needed in a cocktail-party problem to separate speech from music and remove the undesired component. Some existing algorithms for classification and for separation are presented and discussed thoroughly. The classification algorithms are divided into three categories: the first covers most of the real-time (time-domain) approaches, the second covers most of the frequency-domain approaches, and the third introduces some approaches based on time-frequency distributions. The time-domain approaches discussed are the short-time energy (STE), the zero-crossing rate (ZCR), a modified version of the ZCR and the STE with positive derivative, neural networks, and the roll-off variance. The frequency-domain approaches are the spectral roll-off, the spectral centroid and its variance, the spectral flux and its variance, the cepstral residual, and the delta pitch. The time-frequency approaches have not yet been tested thoroughly for the classification and separation of audio and music signals; therefore, the spectrogram and the evolutionary spectrum are introduced and discussed. In addition, some algorithms for the separation and segregation of music and audio signals are introduced, such as independent component analysis (ICA), pitch cancellation, and artificial neural networks.
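As an illustrative sketch only (not the chapter's own implementation), several of the frame-level features named in the abstract can be computed as follows. The function names, the frame length, the 0.85 roll-off fraction, and the specific definitions used here (mean-square energy, fraction of sign changes, magnitude-weighted centroid, squared-difference flux) are assumptions drawn from common conventions; the chapter may define these features differently.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split samples into frames of frame_len, advancing by hop (partial tail dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """STE, taken here as the mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """ZCR, taken here as the fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, using a naive O(N^2) DFT (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mag, sample_rate, frame_len):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mag)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / frame_len * m for k, m in enumerate(mag)) / total


def spectral_rolloff(mag, sample_rate, frame_len, fraction=0.85):
    """Lowest bin frequency (Hz) below which `fraction` of the total magnitude lies.

    The 0.85 default is an assumed, commonly used value.
    """
    target = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= target:
            return k * sample_rate / frame_len
    return (len(mag) - 1) * sample_rate / frame_len


def spectral_flux(prev_mag, mag):
    """Squared Euclidean distance between two successive magnitude spectra."""
    return sum((b - a) ** 2 for a, b in zip(prev_mag, mag))


# Hypothetical demo signal: a 500 Hz tone sampled at 8 kHz, two 64-sample frames.
sample_rate, frame_len = 8000, 64
tone = [math.sin(2 * math.pi * 500 * t / sample_rate + math.pi / 16)
        for t in range(2 * frame_len)]
first, second = frames(tone, frame_len, frame_len)
mag_first, mag_second = magnitude_spectrum(first), magnitude_spectrum(second)
print("STE:", short_time_energy(first))
print("ZCR:", zero_crossing_rate(first))
print("centroid (Hz):", spectral_centroid(mag_first, sample_rate, frame_len))
print("roll-off (Hz):", spectral_rolloff(mag_first, sample_rate, frame_len))
print("flux:", spectral_flux(mag_first, mag_second))
```

The variance-based features mentioned in the abstract (for example, the variance of the spectral centroid or of the spectral flux) would then be obtained by computing the per-frame value over a longer window and taking the variance of that sequence. A practical system would normally replace the naive DFT with an FFT.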




