Journal on Audio Speech and Music Processing: Latest Publications

Direction-of-arrival and power spectral density estimation using a single directional microphone and group-sparse optimization
Zone 3, Computer Science · Pub Date: 2023-10-04 · DOI: 10.1186/s13636-023-00304-8
Elisa Tengan, Thomas Dietzen, Filip Elvander, Toon van Waterschoot
Abstract In this paper, two approaches are proposed for estimating the direction of arrival (DOA) and power spectral density (PSD) of stationary point sources by using a single, rotating, directional microphone. These approaches are based on a method previously presented by the authors, in which point source DOAs were estimated by using a broadband signal model and solving a group-sparse optimization problem, where the number of observations made by the rotating directional microphone can be lower than the number of candidate DOAs in an angular grid. The DOA estimation is followed by the estimation of the sources’ PSDs through the solution of an overdetermined least squares problem. The first approach proposed in this paper adds a nonnegativity constraint on the residual noise term when solving the group-sparse optimization problem and is referred to as the Group Lasso Least Squares (GL-LS) approach. The second proposed approach, in addition to the new nonnegativity constraint, employs a narrowband signal model when building the linear system of equations used for formulating the group-sparse optimization problem, such that the DOAs and PSDs can be jointly estimated by iterative, group-wise reweighting. This is referred to as the Group Lasso with $$l_1$$-reweighting (GL-L1) approach. Both proposed approaches are implemented using the alternating direction method of multipliers (ADMM), and their performance is evaluated through simulations covering different setup conditions, ranging from different types of model mismatch to variations in the acoustic scene and microphone directivity pattern. The results show that in a scenario involving a microphone response mismatch between the observed data and the signal model used, the additional nonnegativity constraint on the residual noise can improve the DOA estimation for GL-LS and the PSD estimation for GL-L1. Moreover, GL-L1 can offer an advantage over GL-LS in terms of DOA estimation performance in scenarios with low SNR or where multiple sources are located close to one another. Finally, it is shown that the least squares PSD re-estimation step is beneficial in most scenarios, with GL-LS outperforming GL-L1 in terms of PSD estimation error.
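The group-sparse selection step at the core of both approaches can be illustrated with a small stand-alone sketch. The snippet below solves a generic group-lasso problem by proximal gradient descent; it is a simplified stand-in for the paper's ADMM solver, and the dictionary, grouping, and shapes are illustrative assumptions rather than the authors' actual signal model.

```python
# Minimal sketch: group-lasso DOA selection via proximal gradient descent.
# Illustrative stand-in for the paper's ADMM solver; all shapes and names
# are assumptions, not the authors' implementation.
import numpy as np

def group_soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_2 (shrinks a whole group toward zero)."""
    norm = np.linalg.norm(x)
    if norm <= tau:
        return np.zeros_like(x)
    return (1.0 - tau / norm) * x

def group_lasso_doa(A, y, groups, lam, n_iter=500):
    """Solve min_x 0.5 * ||y - A x||^2 + lam * sum_g ||x_g||_2.

    A      : (n_obs, n_coef) dictionary; each column block maps one candidate
             DOA on the angular grid to the rotating-microphone observations
    y      : (n_obs,) stacked observations
    groups : list of index arrays, one per candidate DOA
    """
    x = np.zeros(A.shape[1])
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz const. of gradient
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - y))    # gradient step on the data fit
        for g in groups:                      # group-wise shrinkage
            x[g] = group_soft_threshold(z[g], step * lam)
    return x

# Groups with nonzero norm indicate estimated DOAs; re-fitting only their
# columns by ordinary least squares mimics the PSD re-estimation step.
```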
Citations: 0
Cascade algorithms for combined acoustic feedback cancelation and noise reduction
Zone 3, Computer Science · Pub Date: 2023-09-21 · DOI: 10.1186/s13636-023-00296-5
Santiago Ruiz, Toon van Waterschoot, Marc Moonen
Abstract This paper presents three cascade algorithms for combined acoustic feedback cancelation (AFC) and noise reduction (NR) in speech applications. A prediction error method (PEM)-based adaptive feedback cancelation (PEM-based AFC) algorithm is used for the AFC stage, while a multichannel Wiener filter (MWF) is applied for the NR stage. A scenario with M microphones and one loudspeaker is considered, without loss of generality. The first algorithm is the baseline, namely the cascade M-channel rank-1 MWF and PEM-AFC, where an NR stage is performed first using a rank-1 MWF, followed by a single-channel AFC stage using a PEM-based AFC algorithm. The second algorithm is the cascade $$(M+1)$$-channel rank-2 MWF and PEM-AFC, where again an NR stage is applied first, followed by a single-channel AFC stage. The novelty of this algorithm is to consider an $$(M+1)$$-channel data model in the MWF formulation with two different desired signals, i.e., the speech component in the reference microphone signal and in the loudspeaker signal, both defined by the speech source signal but not equal to each other. The two desired signal estimates are later used in a single-channel PEM-based AFC stage. The third algorithm is the cascade M-channel PEM-AFC and rank-1 MWF, where an M-channel AFC stage is performed first, followed by an M-channel NR stage. Although in cascade algorithms where NR is performed first and then AFC, the estimation of the feedback path is usually affected by the NR stage, it is shown here that by performing a rank-2 approximation of the speech correlation matrix this issue can be avoided and the feedback path can be correctly estimated. The performance of the algorithms is assessed by means of closed-loop simulations, where it is shown that for the considered input signal-to-noise ratios (iSNRs), the cascade $$(M+1)$$-channel rank-2 MWF and PEM-AFC and the cascade M-channel PEM-AFC and rank-1 MWF algorithms outperform the cascade M-channel rank-1 MWF and PEM-AFC algorithm in terms of the added stable gain (ASG) and misadjustment (Mis), as well as in terms of perceptual metrics such as the short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and signal distortion (SD).
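To make the NR building block concrete, here is a per-frequency-bin rank-1 MWF sketch under the standard signal model x = s*h + n. The paper's cascade structure, PEM-based AFC stage, and rank-2 variant are not reproduced; variable names and shapes are illustrative assumptions.

```python
# Minimal sketch: rank-1 multichannel Wiener filter for a single frequency
# bin. Hedged illustration only; not the paper's cascade implementation.
import numpy as np

def rank1_mwf(h, phi_s, Rn, ref=0):
    """Return w such that w^H x estimates the speech component at mic `ref`.

    h     : (M,) complex relative transfer function of the target speech
    phi_s : scalar speech PSD in this bin
    Rn    : (M, M) noise covariance matrix in this bin
    """
    Rn_inv_h = np.linalg.solve(Rn, h)                    # Rn^{-1} h
    denom = 1.0 + phi_s * np.real(np.vdot(h, Rn_inv_h))  # 1 + phi_s h^H Rn^{-1} h
    return phi_s * Rn_inv_h * np.conj(h[ref]) / denom

# Usage per bin: speech_estimate = np.vdot(w, x) for a microphone snapshot x.
```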
Citations: 0
Learning-based robust speaker counting and separation with the aid of spatial coherence
Zone 3, Computer Science · Pub Date: 2023-09-20 · DOI: 10.1186/s13636-023-00298-3
Yicheng Hsu, Mingsian R. Bai
Abstract A three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction, a spatial coherence matrix (SCM) is computed using whitened relative transfer functions (wRTFs) across time frames. The global activity functions of each speaker are estimated from a simplex constructed using the eigenvectors of the SCM, while the local coherence functions are computed from the coherence between the wRTFs of a time-frequency bin and the global activity function-weighted RTF of the target speaker. In speaker counting, we use the eigenvalues of the SCM and the maximum similarity of the interframe global activity distributions between two speakers as the input features to the speaker counting network (SCnet). In speaker separation, a global and local activity-driven network (GLADnet) is used to extract each independent speaker signal, which is particularly useful for highly overlapping speech signals. Experimental results obtained from real meeting recordings show that the proposed system achieves superior speaker counting and speaker separation performance compared to previous publications, without prior knowledge of the array configuration.
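As a rough, rule-based illustration of how SCM eigenvalues relate to the number of active speakers, the toy snippet below thresholds the normalized eigenvalue spectrum. The paper instead feeds such features (together with activity-similarity features) into the learned SCnet, so the feature layout and threshold here are assumptions for illustration only.

```python
# Toy sketch: eigenvalue-based speaker count from a spatial coherence matrix.
# The paper uses a learned network (SCnet); this threshold rule and the
# feature layout are illustrative assumptions, not the authors' method.
import numpy as np

def estimate_speaker_count(wrtf, eig_floor=0.1):
    """wrtf : (n_frames, d) complex whitened-RTF feature, one row per frame."""
    v = wrtf / (np.linalg.norm(wrtf, axis=1, keepdims=True) + 1e-12)
    scm = np.abs(v @ v.conj().T)              # frame-pair spatial coherence
    eigvals = np.linalg.eigvalsh(scm)[::-1]   # eigenvalues, descending
    eigvals = eigvals / eigvals.sum()         # normalize by total energy
    return int(np.sum(eigvals > eig_floor))   # dominant directions ~ speakers
```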
Citations: 1
Acoustic object canceller: removing a known signal from monaural recording using blind synchronization
Zone 3, Computer Science · Pub Date: 2023-09-11 · DOI: 10.1186/s13636-023-00300-y
Takao Kawamura, Kouei Yamaoka, Yukoh Wakabayashi, Nobutaka Ono, Ryoichi Miyazaki
Abstract In this paper, we propose a technique for removing a specific type of interference from a monaural recording. Nonstationary interferences are generally challenging to eliminate from such recordings. However, if the interference is a known sound like a cell phone ringtone, music from a CD or streaming service, or a radio or TV broadcast, its source signal can be easily obtained. In our method, we define such interference as an acoustic object. Even if the sampling frequencies of the recording and the acoustic object do not match, we compensate for the mismatch and use the maximum likelihood estimation technique with the auxiliary function to remove the interference from the recording. We compare several probabilistic models for representing the object-canceled signal. Experimental evaluations confirm the effectiveness of our proposed method.
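A heavily simplified version of the idea, ignoring sampling-frequency mismatch and room acoustics, is to align the known reference by cross-correlation and subtract a least-squares-scaled copy. The sketch below is that toy baseline, not the authors' maximum likelihood estimator, and all names are illustrative.

```python
# Toy sketch: subtract a known reference ("acoustic object") from a monaural
# mix after integer-delay alignment. The paper additionally compensates
# sampling-frequency mismatch and fits the path by maximum likelihood;
# none of that is reproduced here.
import numpy as np

def cancel_known_object(mix, ref):
    n = len(mix)
    corr = np.correlate(mix, ref, mode="full")
    delay = int(np.argmax(np.abs(corr))) - (len(ref) - 1)  # lag of ref in mix
    aligned = np.zeros(n)
    src = ref[max(-delay, 0):]                # part of ref inside the mix
    dst = max(delay, 0)
    m = min(n - dst, len(src))
    aligned[dst:dst + m] = src[:m]
    gain = aligned @ mix / (aligned @ aligned + 1e-12)     # LS amplitude fit
    return mix - gain * aligned
```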
Citations: 0
The power of humorous audio: exploring emotion regulation in traffic congestion through EEG-based study
IF 2.4 · Zone 3, Computer Science · Pub Date: 2023-09-07 · DOI: 10.1186/s13636-023-00302-w
Lekai Zhang, Yingfan Wang, Kailun He, Hailong Zhang, Baixi Xing, Xiaofeng Liu, Fo Hu
{"title":"The power of humorous audio: exploring emotion regulation in traffic congestion through EEG-based study","authors":"Lekai Zhang, Yingfan Wang, Kailun He, Hailong Zhang, Baixi Xing, Xiaofeng Liu, Fo Hu","doi":"10.1186/s13636-023-00302-w","DOIUrl":"https://doi.org/10.1186/s13636-023-00302-w","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43382828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning
IF 2.4 · Zone 3, Computer Science · Pub Date: 2023-09-05 · DOI: 10.1186/s13636-023-00299-2
Zhiyong Chen, Shugong Xu
{"title":"Learning domain-heterogeneous speaker recognition systems with personalized continual federated learning","authors":"Zhiyong Chen, Shugong Xu","doi":"10.1186/s13636-023-00299-2","DOIUrl":"https://doi.org/10.1186/s13636-023-00299-2","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49629565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Training audio transformers for cover song identification
IF 2.4 · Zone 3, Computer Science · Pub Date: 2023-08-25 · DOI: 10.1186/s13636-023-00297-4
Te Zeng, F. Lau
{"title":"Training audio transformers for cover song identification","authors":"Te Zeng, F. Lau","doi":"10.1186/s13636-023-00297-4","DOIUrl":"https://doi.org/10.1186/s13636-023-00297-4","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44510561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Channel and temporal-frequency attention UNet for monaural speech enhancement
IF 2.4 · Zone 3, Computer Science · Pub Date: 2023-08-14 · DOI: 10.1186/s13636-023-00295-6
Shibiao Xu, Zehua Zhang, Mingjiang Wang
{"title":"Channel and temporal-frequency attention UNet for monaural speech enhancement","authors":"Shibiao Xu, Zehua Zhang, Mingjiang Wang","doi":"10.1186/s13636-023-00295-6","DOIUrl":"https://doi.org/10.1186/s13636-023-00295-6","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45794956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dual input neural networks for positional sound source localization
IF 2.4 · Zone 3, Computer Science · Pub Date: 2023-08-08 · DOI: 10.1186/s13636-023-00301-x
Eric Grinstein, Vincent W. Neo, P. Naylor
{"title":"Dual input neural networks for positional sound source localization","authors":"Eric Grinstein, Vincent W. Neo, P. Naylor","doi":"10.1186/s13636-023-00301-x","DOIUrl":"https://doi.org/10.1186/s13636-023-00301-x","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44921492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting
IF 2.4 · Zone 3, Computer Science · Pub Date: 2023-07-01 · DOI: 10.1186/s13636-023-00293-8
Xingwei Liang, Zehua Zhang, Ruifeng Xu
{"title":"Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting","authors":"Xingwei Liang, Zehua Zhang, Ruifeng Xu","doi":"10.1186/s13636-023-00293-8","DOIUrl":"https://doi.org/10.1186/s13636-023-00293-8","url":null,"abstract":"","PeriodicalId":49309,"journal":{"name":"Journal on Audio Speech and Music Processing","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44852720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0