2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
V. Mitra, H. Franco, M. Graciarena, Arindam Mandal
Background noise and channel degradations seriously constrain the performance of state-of-the-art speech recognition systems. Studies comparing human speech recognition performance with automatic speech recognition systems indicate that the human auditory system is highly robust against background noise and channel variabilities compared to automated systems. A traditional way to add robustness to a speech recognition system is to construct a robust feature set for the speech recognition model. In this work, we present an amplitude modulation feature derived from Teager's nonlinear energy operator that is power normalized and cosine transformed to produce normalized modulation cepstral coefficient (NMCC) features. The proposed NMCC features are compared with respect to state-of-the-art noise-robust features in Aurora-2 and a renoised Wall Street Journal (WSJ) corpus. The WSJ word-recognition experiments were performed on both a clean and artificially renoised WSJ corpus using SRI's DECIPHER large vocabulary speech recognition system. The experiments were performed under three train-test conditions: (a) matched, (b) mismatched, and (c) multi-conditioned. The Aurora-2 digit recognition task was performed using the standard HTK recognizer distributed with Aurora-2. Our results indicate that the proposed NMCC features demonstrated noise robustness in almost all the training-test conditions of renoised WSJ data and also improved digit recognition accuracies for Aurora-2 compared to the MFCCs and state-of-the-art noise-robust features.
DOI: https://doi.org/10.1109/ICASSP.2012.6288824
Citations: 104
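Illustrative note: the abstract above derives its features from Teager's nonlinear energy operator. The following is a minimal sketch of the standard discrete form of that operator only, psi[n] = x[n]^2 - x[n-1] * x[n+1]; it is not the authors' NMCC pipeline, and the function name, test tone, amplitude, and frequency are assumptions chosen for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test tone: amplitude and digital frequency are arbitrary choices.
amplitude = 0.5
omega = 0.3  # radians per sample
tone = [amplitude * math.cos(omega * n) for n in range(64)]

energy = teager_energy(tone)
# For a pure tone the operator output is constant: (A * sin(omega))^2.
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the output tracks both amplitude and frequency, which is why the operator is commonly used to estimate amplitude modulation; the power normalization and cosine transform described in the abstract are not shown here.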
A family of Bounded Component Analysis algorithms
A. Erdogan
Bounded Component Analysis (BCA) has recently been introduced as an alternative method for the Blind Source Separation problem. Under the generic assumption on source boundedness, BCA provides a flexible framework for the separation of dependent (even correlated) as well as independent sources. This article provides a family of algorithms derived based on the geometric picture implied by the founding assumptions of the BCA approach. We also provide a numerical example demonstrating the ability of the proposed algorithms to separate mixtures of some dependent sources.
DOI: https://doi.org/10.1109/ICASSP.2012.6288270
Citations: 19
Automatic generation of synthesizable hardware implementation from high level RVC-cal description
Khaled Jerbi, M. Raulet, O. Déforges, M. Abid
Data processing algorithms are increasing in complexity, especially for image and video coding. Therefore, hardware development directly in hardware description languages (HDL) such as VHDL or Verilog is a difficult task. Current research in this context is introducing new methodologies to automate the generation of such descriptions. In our work we adopted a high-level and target-independent language called CAL (Caltrop Actor Language). This language is associated with a set of tools to easily design dataflow applications and also a hardware compiler to automatically generate the implementation. Before the modifications presented in this paper, the existing CAL hardware back-end did not support some high-level features of the CAL language. Consequently, actors designed at a high level had to be manually transformed to be synthesizable. In this paper, we introduce a general automatic transformation of CAL descriptions to make these structures compliant and synthesizable. This transformation analyses the CAL code, detects the target features and makes the required changes to obtain synthesizable code while keeping the same application behavior. This work resolves the main bottleneck of the hardware generation flow from CAL designs.
DOI: https://doi.org/10.1109/ICASSP.2012.6288199
Citations: 13
Graph spectral compressed sensing for sensor networks
Xiaofan Zhu, M. Rabbat
Consider a wireless sensor network with N sensor nodes measuring data which are correlated temporally or spatially. We consider the problem of reconstructing the original data by only transmitting M ≪ N sensor readings while guaranteeing that the reconstruction error is small. Assuming the original signal is “smooth” with respect to the network topology, our approach is to gather measurements from a random subset of nodes and then interpolate with respect to the graph Laplacian eigenbasis, leveraging ideas from compressed sensing. We propose algorithms for both temporally and spatially correlated signals, and the performance of these algorithms is verified using both synthesized data and real world data. Significant savings are made in terms of energy resources, bandwidth, and query latency.
DOI: https://doi.org/10.1109/ICASSP.2012.6288515
Citations: 56
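Illustrative note: the abstract above assumes the sensor signal is "smooth" with respect to the network topology. One common way to quantify that notion is the graph Laplacian quadratic form x^T L x, which equals the sum of squared differences across edges. The sketch below shows only this smoothness measure, not the paper's sampling or reconstruction algorithms; the 5-node line network and the readings are hypothetical.

```python
def laplacian(num_nodes, edges):
    """Combinatorial graph Laplacian L = D - A as a list of rows."""
    lap = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        lap[i][i] += 1
        lap[j][j] += 1
        lap[i][j] -= 1
        lap[j][i] -= 1
    return lap


def smoothness(lap, signal):
    """Quadratic form x^T L x, equal to the sum of (x_i - x_j)^2 over edges."""
    n = len(signal)
    return sum(signal[i] * lap[i][j] * signal[j] for i in range(n) for j in range(n))


# Hypothetical 5-node line network: 0 - 1 - 2 - 3 - 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
lap = laplacian(5, edges)

smooth_readings = [1.0, 1.5, 2.0, 2.5, 3.0]  # changes slowly from node to node
rough_readings = [1.0, 3.0, 1.0, 3.0, 1.0]   # jumps at every hop

smooth_score = smoothness(lap, smooth_readings)  # 4 edges * 0.5^2 = 1.0
rough_score = smoothness(lap, rough_readings)    # 4 edges * 2.0^2 = 16.0
```

A small value means the signal's energy sits in the low-frequency Laplacian eigenvectors, which is the property that makes interpolation from a random subset of nodes plausible.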
Noncoherent misbehavior detection in space-time coded cooperative networks
Li-Chung Lo, Zhao-Jie Wang, Wan-Jen Huang
Consider a two-relay decode-and-forward (DF) cooperative network where Alamouti coding is adopted among relays to exploit spatial diversity. However, the spatial diversity gain is diminished when relays misbehave. Most existing work on detecting malicious relays requires knowledge of the instantaneous channel state, which is usually unavailable if the relays deliberately garble retransmitted signals. We therefore propose a noncoherent misbehavior detection method that uses the second-order statistics of channel estimates for the relay-destination links. Simulation results show that increasing the number of received blocks provides significant improvement even in the low-SNR regime.
DOI: https://doi.org/10.1109/ICASSP.2012.6288561
Citations: 11
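Illustrative note: the abstract above relies on Alamouti coding across two relays. The sketch below shows the textbook Alamouti scheme in a noise-free setting where both relays behave and the destination knows the channel gains; it is not the paper's noncoherent detector, and the symbols, channel gains, and function names are assumptions for illustration.

```python
def alamouti_encode(s1, s2):
    """Return the symbols sent by relay 1 and relay 2 over two time slots."""
    relay1 = (s1, -s2.conjugate())
    relay2 = (s2, s1.conjugate())
    return relay1, relay2


def alamouti_combine(r1, r2, h1, h2):
    """Linear combining at the destination, assuming h1 and h2 are known."""
    s1_hat = h1.conjugate() * r1 + h2 * r2.conjugate()
    s2_hat = h2.conjugate() * r1 - h1 * r2.conjugate()
    gain = abs(h1) ** 2 + abs(h2) ** 2
    return s1_hat / gain, s2_hat / gain


# Hypothetical QPSK symbols and flat-fading relay-destination gains.
s1, s2 = complex(1, 1), complex(-1, 1)
h1, h2 = complex(0.8, -0.3), complex(-0.2, 0.6)

relay1, relay2 = alamouti_encode(s1, s2)
# Noise-free received samples in slot 1 and slot 2.
r1 = h1 * relay1[0] + h2 * relay2[0]
r2 = h1 * relay1[1] + h2 * relay2[1]

s1_hat, s2_hat = alamouti_combine(r1, r2, h1, h2)
```

The combining step scales each symbol by |h1|^2 + |h2|^2, which is the diversity gain the abstract refers to; if one relay garbles its slot, the cross terms no longer cancel and that gain is lost.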
Detecting passive eavesdroppers in the MIMO wiretap channel
A. Mukherjee, A. L. Swindlehurst
The MIMO wiretap channel comprises a passive eavesdropper that attempts to intercept communications between an authorized transmitter-receiver pair, with each node being equipped with multiple antennas. In a dynamic network, it is imperative that the presence of a passive eavesdropper be determined before the transmitter can deploy robust secrecy-encoding schemes as a countermeasure. This is a difficult task in general, since by definition the eavesdropper is passive and never transmits. In this work we adopt a method that allows the legitimate nodes to detect the passive eavesdropper from the local oscillator power that is inadvertently leaked from its RF front end. We examine the performance of non-coherent energy detection as well as optimal coherent detection schemes. We then show how the proposed detectors allow the legitimate nodes to increase the MIMO secrecy rate of the channel.
DOI: https://doi.org/10.1109/ICASSP.2012.6288501
Citations: 147
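Illustrative note: the abstract above mentions non-coherent energy detection of leaked local oscillator power. A minimal sketch of a generic energy detector follows: it averages the received sample energy and compares it with a threshold. The threshold margin, the sample values, and the function names are hypothetical, and the paper's actual detectors and MIMO signal model are not reproduced here.

```python
def energy_statistic(samples):
    """Average received energy, (1/N) * sum of |y[n]|^2."""
    return sum(abs(y) ** 2 for y in samples) / len(samples)


def detect(samples, noise_power, margin=2.0):
    """Declare a leak present when the energy exceeds margin * noise_power.

    The margin is a hypothetical design parameter; in practice it would be
    set from a target false-alarm probability.
    """
    return energy_statistic(samples) > margin * noise_power


# Hypothetical complex baseband samples (deterministic for illustration).
noise_only = [complex(0.1, -0.1), complex(-0.1, 0.1),
              complex(0.1, 0.1), complex(-0.1, -0.1)]
leak_tone = [complex(0.3, 0.0), complex(0.0, 0.3),
             complex(-0.3, 0.0), complex(0.0, -0.3)]
with_leak = [n + s for n, s in zip(noise_only, leak_tone)]

noise_power = energy_statistic(noise_only)
leak_detected = detect(with_leak, noise_power)
false_alarm = detect(noise_only, noise_power)
```

The detector needs no phase or waveform knowledge, only an estimate of the noise power, which is what makes it non-coherent.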
MLLR transforms of self-organized units as features in speaker recognition
M. Siu, Omer Lang, H. Gish, S. Lowe, Arthur Chan, O. Kimball
Using speaker adaptation parameters, such as maximum likelihood linear regression (MLLR) adaptation matrices, as features for speaker recognition (SR) has been shown to perform well and can also provide complementary information for fusion with other acoustic-based SR systems, such as GMM-based systems. In order to estimate the adaptation parameters, a speech recognizer in the SR domain is required, which in turn requires transcribed training data for recognizer training. This limits the approach to domains where training transcriptions are available. To generalize the adaptation parameter approach to domains without transcriptions, we propose the use of self-organized unit (SOU) recognizers that can be trained without supervision (or transcribed data). We report results on the 2002 NIST speaker recognition evaluation (SRE2002) extended data set and show that using MLLR parameters estimated from SOU recognizers gives performance comparable to systems using matched recognizers. SOU recognizers also outperform those using cross-lingual recognizers. When the SOU and word recognizers are fused, the SR equal error rate (EER) can be reduced by another 15%. This suggests SOU recognizers can be useful whether or not transcribed data for recognition training are available.
DOI: https://doi.org/10.1109/ICASSP.2012.6288891
Citations: 2
Minimax design of sparse FIR digital filters
A. Jiang, H. Kwan, Yanping Zhu, Xiaofeng Liu
In this paper, we present a novel algorithm to design sparse FIR digital filters in the minimax sense. To tackle the nonconvexity of the design problem, an efficient iterative procedure is developed to find a potential sparsity pattern. In each iteration, a subproblem in a simpler form is constructed. Instead of directly solving these nonconvex subproblems, we resort to their respective dual problems. It can be proved that under a weak condition, globally optimal solutions of these subproblems can be attained by solving their dual problems. In this case, the overall iterative procedure can converge to a locally optimal solution of the original design problem. The real minimax design can then be achieved by refining the FIR filter obtained by the iterative procedure. The design procedure described above can be repeated several times to further improve the sparsity of design results. The output of the previous stage can be used as the initial point of the subsequent design. Simulation results demonstrate the effectiveness of our proposed algorithm.
DOI: https://doi.org/10.1109/ICASSP.2012.6288670
Citations: 3
Connexions and the SPEN fellows program
T. Welch, M. Morrow, C. Wright
Texas Instruments (TI) has created the Signal Processing Education Network (SPEN) Fellows program to help identify and fill content gaps within the signal processing content library hosted on the Connexions ecosystem. This paper gives an overview of Connexions, SPEN, and the SPEN Fellows program, and reviews this year's SPEN Fellows project, which involves creating Connexions content on real-time DSP (RT-DSP).
DOI: https://doi.org/10.1109/ICASSP.2012.6288495
Citations: 3
Shift-variant non-negative matrix deconvolution for music transcription
Holger Kirchhoff, S. Dixon, Anssi Klapuri
In this paper, we address the task of semi-automatic music transcription in which the user provides prior information about the polyphonic mixture under analysis. We propose a non-negative matrix deconvolution framework for this task that allows instruments to be represented by a different basis function for each fundamental frequency (“shift variance”). Two different types of user input are studied: information about the types of instruments, which enables the use of basis functions from an instrument database, and a manual transcription of a number of notes which enables the template estimation from the data under analysis itself. Experiments are performed on a data set of mixtures of acoustical instruments up to a polyphony of five. The results confirm a significant loss in accuracy when database templates are used and show the superiority of the Kullback-Leibler divergence over the least squares error cost function.
DOI: https://doi.org/10.1109/ICASSP.2012.6287833
Citations: 23
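Illustrative note: the abstract above compares the Kullback-Leibler divergence with the least squares error as cost functions for non-negative factorization. The sketch below evaluates both costs, using the generalized KL form commonly used in non-negative matrix factorization, on a small made-up spectrum. It is not the paper's deconvolution model; the data values and function names are assumptions.

```python
import math


def squared_error(v, approx):
    """Least-squares cost: sum of (v - approx)^2 over all entries."""
    return sum((a - b) ** 2 for a, b in zip(v, approx))


def generalized_kl(v, approx):
    """Generalized KL divergence: sum of v * log(v / approx) - v + approx.

    Entries with v == 0 contribute only approx (0 * log 0 is taken as 0).
    """
    total = 0.0
    for a, b in zip(v, approx):
        if a > 0:
            total += a * math.log(a / b) - a + b
        else:
            total += b
    return total


# Hypothetical magnitude spectrum and a model that underestimates the quiet bin.
observed = [10.0, 1.0, 0.0]
model = [10.0, 0.5, 0.1]

ls_cost = squared_error(observed, model)   # 0.25 + 0.01
kl_cost = generalized_kl(observed, model)  # log(2) - 0.5 + 0.1
```

Unlike the squared error, the KL cost depends on the ratio between observation and model, so errors in low-energy bins are not swamped by the loud ones.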