
Eurasip Journal on Audio Speech and Music Processing: latest publications

Articulation constrained learning with application to speech emotion recognition.
IF 2.4 · CAS Zone 3, Computer Science · Q2 in ACOUSTICS · Pub Date: 2019-01-01 · Epub Date: 2019-08-20 · DOI: 10.1186/s13636-019-0157-9
Mohit Shah, Ming Tu, Visar Berisha, Chaitali Chakrabarti, Andreas Spanias

Speech emotion recognition methods that combine articulatory information with acoustic features have previously been shown to improve recognition performance. However, collecting articulatory data on a large scale is infeasible in many scenarios, which restricts the scope and applicability of such methods. In this paper, a discriminative learning method for emotion recognition using both articulatory and acoustic information is proposed. A traditional ℓ1-regularized logistic regression cost function is extended with additional constraints that force the model to reconstruct articulatory data, yielding sparse, interpretable representations jointly optimized for both tasks. The model requires articulatory features only during training; only speech features are needed for inference on out-of-sample data. Experiments evaluate emotion recognition performance over the vowels /AA/, /AE/, /IY/, /UW/ and over complete utterances. Incorporating articulatory information significantly improves valence-based classification. Results for within-corpus and cross-corpus categorical emotion recognition indicate that the proposed method is more effective at distinguishing happiness from other emotions.
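As a rough sketch of the kind of objective the abstract describes (an illustration only: the trade-off weights $\lambda_1$, $\lambda_2$ and the linear reconstruction map $\mathbf{V}$ are assumptions, not the authors' exact formulation), the joint cost might take the form

$$\min_{\mathbf{w},\,\mathbf{V}} \;\sum_{i=1}^{N} \log\!\left(1 + e^{-y_i \mathbf{w}^{\top}\mathbf{x}_i}\right) \;+\; \lambda_1 \lVert \mathbf{w} \rVert_1 \;+\; \lambda_2 \sum_{i=1}^{N} \bigl\lVert \mathbf{a}_i - \mathbf{V}\mathbf{x}_i \bigr\rVert_2^2,$$

where $\mathbf{x}_i$ are acoustic features, $y_i \in \{-1,+1\}$ are emotion labels, and $\mathbf{a}_i$ are articulatory measurements. Because the articulatory terms appear only in the training objective, prediction on new data reduces to $\operatorname{sign}(\mathbf{w}^{\top}\mathbf{x})$ and needs no articulatory input, consistent with the abstract's claim.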

Citations: 9
From raw audio to a seamless mix: creating an automated DJ system for Drum and Bass
IF 2.4 · CAS Zone 3, Computer Science · Q2 in ACOUSTICS · Pub Date: 2018-09-24 · DOI: 10.1186/s13636-018-0134-8
Len Vande Veire, Tijl De Bie
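No abstract is reproduced for this entry. As generic background on what a "seamless mix" entails, the sketch below shows an equal-power crossfade between two beat-aligned tracks, a standard building block of automated DJ transitions; the function name, fade length, and toy signals are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of an equal-power crossfade, one common building
# block of automated DJ transitions. sin/cos gain curves keep the
# combined power roughly constant through the overlap region.
import numpy as np

def equal_power_crossfade(a: np.ndarray, b: np.ndarray, fs: int, fade_s: float = 8.0) -> np.ndarray:
    """Fade out `a` while fading in `b`, overlapping them for `fade_s` seconds."""
    n = int(fade_s * fs)
    assert len(a) >= n and len(b) >= n, "tracks shorter than the fade"
    t = np.linspace(0.0, np.pi / 2.0, n)
    overlap = a[-n:] * np.cos(t) + b[:n] * np.sin(t)   # equal-power gains
    return np.concatenate([a[:-n], overlap, b[n:]])

# Toy usage with two pure-tone "tracks":
fs = 44100
t = np.arange(20 * fs) / fs
mix = equal_power_crossfade(np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 330 * t), fs)
```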
{"title":"From raw audio to a seamless mix: creating an automated DJ system for Drum and Bass","authors":"Len Vande Veire, Tijl De Bie","doi":"10.1186/s13636-018-0134-8","DOIUrl":"https://doi.org/10.1186/s13636-018-0134-8","url":null,"abstract":"","PeriodicalId":49202,"journal":{"name":"Eurasip Journal on Audio Speech and Music Processing","volume":"98 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2018-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73628910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Biomimetic spectro-temporal features for music instrument recognition in isolated notes and solo phrases.
IF 2.4 · CAS Zone 3, Computer Science · Q2 in ACOUSTICS · Pub Date: 2015-01-01 · DOI: 10.1186/s13636-015-0070-9
Kailash Patil, Mounya Elhilali

The identity of a musical instrument is reflected in the acoustic attributes of the notes played on it. Recently, it has been argued that these characteristics of musical identity (or timbre) are best captured through an analysis that encompasses both time and frequency domains, with a focus on the modulations, or changes, of the signal in spectrotemporal space. This representation mimics the spectrotemporal receptive field (STRF) analysis believed to underlie processing in the central mammalian auditory system, particularly at the level of primary auditory cortex. How well this STRF representation captures the timbral identity of musical instruments in continuous solo recordings remains unclear. The current work investigates the applicability of the STRF feature space to instrument recognition in solo musical phrases and explores the best approaches to leveraging knowledge from isolated musical notes for instrument recognition in solo recordings. The study presents an approach that parses solo performances into their individual note constituents and adapts back-end support vector machine classifiers to generalize instrument recognition to off-the-shelf, commercially available solo music.
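As a loose, simplified illustration of this kind of pipeline (not the authors' STRF model: the 2-D FFT of a log spectrogram below is only a crude proxy for spectro-temporal modulation analysis, and all parameter values and toy signals are assumptions), modulation-style features could feed an SVM back-end as follows:

```python
# Hypothetical sketch: crude spectro-temporal modulation features
# (2-D FFT of a log spectrogram) feeding an SVM back-end classifier.
import numpy as np
from scipy.signal import spectrogram
from sklearn.svm import SVC

def modulation_features(x, fs, n_keep=16):
    """Log-spectrogram -> 2-D modulation spectrum -> low-order coefficients."""
    _, _, S = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
    logS = np.log(S + 1e-10)
    M = np.abs(np.fft.fft2(logS))        # joint spectral/temporal modulations
    return M[:n_keep, :n_keep].ravel()   # keep only the slow modulations

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs

def toy_note(f0):
    """Toy 'instrument' note: five harmonics with random amplitudes."""
    amps = rng.uniform(0.2, 1.0, size=5)
    return sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t) for k, a in enumerate(amps))

X = np.array([modulation_features(toy_note(f0), fs)
              for f0 in [220, 230, 440, 450] * 10])
y = np.array([0, 0, 1, 1] * 10)          # two pseudo-instrument classes

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print("train accuracy:", clf.score(X, y))
```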

Citations: 18
Biomimetic multi-resolution analysis for robust speaker recognition.
IF 2.4 · CAS Zone 3, Computer Science · Q2 in ACOUSTICS · Pub Date: 2012-01-01 · Epub Date: 2012-09-07 · DOI: 10.1186/1687-4722-2012-22
Sridhar Krishna Nemala, Dmitry N Zotkin, Ramani Duraiswami, Mounya Elhilali

Humans exhibit a remarkable ability to reliably classify sound sources in the environment, even in the presence of high levels of noise. In contrast, most engineering systems suffer a drastic drop in performance when speech signals are corrupted by channel or background distortions. Our brains are equipped with elaborate machinery for speech analysis and feature extraction, which holds great lessons for improving the performance of automatic speech processing systems under adverse conditions. The work presented here explores a biologically motivated multi-resolution representation of speaker information, obtained through an intricate yet computationally efficient analysis of the information-rich spectro-temporal attributes of the speech signal. We evaluate the proposed features on a speaker verification task performed on NIST SRE 2010 data. The biomimetic approach yields significant robustness in the presence of non-stationary noise and reverberation, offering a new framework for deriving reliable features for speaker recognition and speech processing.
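A crude sketch of what such a multi-resolution front-end might look like (an assumption-laden stand-in, not the authors' cortical model: the modulation rates, the first-order low-pass filters, and the cosine verification score are all illustrative choices):

```python
# Hypothetical multi-resolution front-end: the same log-spectrogram
# low-pass filtered at several temporal modulation rates and stacked
# into one embedding, then compared with a cosine score.
import numpy as np
from scipy.signal import spectrogram, lfilter

def multires_embedding(x, fs, rates_hz=(2.0, 8.0, 32.0)):
    _, _, S = spectrogram(x, fs=fs, nperseg=400, noverlap=240)  # 160-sample hop
    logS = np.log(S + 1e-10)
    frame_rate = fs / 160.0
    feats = []
    for r in rates_hz:                      # one first-order low-pass per rate
        alpha = np.exp(-2.0 * np.pi * r / frame_rate)
        smooth = lfilter([1.0 - alpha], [1.0, -alpha], logS, axis=1)
        feats.append(smooth.mean(axis=1))   # time-average each channel
    return np.concatenate(feats)

def cosine_score(enroll, test):
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))

# Toy verification trial: two noisy renditions of the same "voice".
fs = 16000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(1)

def voice():
    return (np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 450 * t)
            + 0.1 * rng.standard_normal(t.size))

print("score:", cosine_score(multires_embedding(voice(), fs),
                             multires_embedding(voice(), fs)))
```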

Citations: 5