
Latest publications: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition
Runnan Li, Zhiyong Wu, Jia Jia, Sheng Zhao, H. Meng
Speech emotion recognition (SER) plays an important role in intelligent speech interaction. One vital challenge in SER is to extract emotion-relevant features from speech signals. In state-of-the-art SER techniques, deep learning methods, e.g., Convolutional Neural Networks (CNNs), are widely employed for feature learning and have achieved significant performance. However, CNN-oriented methods suffer from two performance limitations: 1) the loss of the temporal structure of speech during progressive resolution reduction; 2) the neglect of relative dependencies between elements in the suprasegmental feature sequence. In this paper, we propose the combined use of a Dilated Residual Network (DRN) and Multi-head Self-attention to alleviate these limitations. By employing the DRN, the network retains a high-resolution temporal structure during feature learning, with a receptive field similar in size to that of CNN-based approaches. By employing Multi-head Self-attention, the network models the inner dependencies between elements at different positions in the learned suprasegmental feature sequence, which enhances the capture of emotion-salient information. Experiments on the emotional benchmark dataset IEMOCAP demonstrate the effectiveness of the proposed framework, with 11.7% to 18.6% relative improvement over state-of-the-art approaches.
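The dilated-residual front end is framework-specific, but the multi-head self-attention step that models dependencies between positions in a feature sequence can be sketched in plain NumPy. This is a generic illustration, not the authors' network; all names and dimensions here are our own.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (T, d) feature sequence; Wq/Wk/Wv/Wo: (d, d) projections."""
    T, d = X.shape
    dh = d // n_heads                       # per-head dimension
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)   # (T, T) pairwise dependencies
        heads.append(softmax(scores, axis=-1) @ V[:, s])
    return np.concatenate(heads, axis=-1) @ Wo       # (T, d)

rng = np.random.default_rng(0)
d, T, H = 8, 16, 2
X = rng.standard_normal((T, d))
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
Y = multi_head_self_attention(X, Wq, Wk, Wv, Wo, H)
print(Y.shape)  # (16, 8): one context-aware vector per frame
```

Each output frame is a weighted combination of all frames, which is how the attention layer captures dependencies between distant positions in the suprasegmental sequence.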
DOI: 10.1109/ICASSP.2019.8682154, pp. 6675-6679
Citations: 45
Baseline Wander Removal and Isoelectric Correction in Electrocardiograms Using Clustering
Kjell Le, T. Eftestøl, K. Engan, Ø. Kleiven, S. Ørn
Baseline wander is a low-frequency noise that is often removed from electrocardiogram signals by a highpass filter. However, this might not be sufficient to correct the isoelectric level of the signal: an isoelectric bias may remain. The isoelectric level is used as a reference point for amplitude measurements, and it is recommended that this point be at 0 V, i.e. isoelectric adjusted. To correct the isoelectric level, a clustering method is proposed to determine the isoelectric bias, which is thereafter subtracted from a signal-averaged template. Calculation of the mean electrical axis (MEA) is used to evaluate the isoelectric correction. The MEA can be estimated from any lead pair in the frontal plane, and a low variance in the estimates over the different lead pairs would suggest that the calculation of the MEA in each lead pair is consistent. Different methods for calculating the MEA are evaluated, and the variance in the results, as well as other measures, favours the proposed isoelectric-adjusted signals in all MEA methods.
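The clustering-based isoelectric correction can be illustrated on a synthetic beat template. This is a minimal sketch assuming a simple 1-D 2-means clustering and a toy template of our own; the paper's actual clustering and template construction may differ.

```python
import numpy as np

def isoelectric_bias(template, n_iter=20):
    """Estimate the isoelectric bias of a signal-averaged beat template
    with 1-D 2-means clustering: the larger cluster gathers the flat
    (isoelectric) samples, and its centre is taken as the bias."""
    c = np.array([template.min(), template.max()], dtype=float)
    for _ in range(n_iter):
        assign = np.abs(template[:, None] - c[None, :]).argmin(axis=1)
        for k in range(2):
            if np.any(assign == k):
                c[k] = template[assign == k].mean()
    counts = np.bincount(assign, minlength=2)
    return c[counts.argmax()]          # centre of the dominant (baseline) cluster

# Synthetic template: flat baseline shifted by +0.4 mV plus a QRS-like spike.
t = np.linspace(0, 1, 400)
template = 0.4 + np.where(np.abs(t - 0.5) < 0.02, 1.5, 0.0)
bias = isoelectric_bias(template)
corrected = template - bias            # isoelectric level moved to 0 V
print(round(bias, 2))  # 0.4
```

Subtracting the estimated bias from the template is the correction step the abstract describes; evaluating it via the mean electrical axis is outside this sketch.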
DOI: 10.1109/ICASSP.2019.8683084, pp. 1274-1278
Citations: 1
Deep Learning for Super-resolution Vascular Ultrasound Imaging
R. V. Sloun, Oren Solomon, M. Bruce, Zin Z. Khaing, Yonina C. Eldar, M. Mischi
Based on the intravascular infusion of gas microbubbles, which act as ultrasound contrast agents, ultrasound localization microscopy has enabled super-resolution vascular imaging through precise detection of individual microbubbles across numerous imaging frames. However, analysis of high-density regions with significant overlaps among the microbubble point spread functions typically yields high localization errors, constraining the technique to low-concentration conditions. As such, long acquisition times are required for sufficient coverage of the vascular bed. Algorithms based on sparse recovery have been developed specifically to cope with the overlapping point spread functions of multiple microbubbles. While successful localization of densely spaced emitters has been demonstrated, even highly optimized fast sparse recovery techniques involve a time-consuming iterative procedure. In this work, we used deep learning to improve upon standard ultrasound localization microscopy (Deep-ULM) and to obtain super-resolution vascular images from high-density contrast-enhanced ultrasound data. Deep-ULM is suitable for real-time applications, resolving about 1250 high-resolution patches (128×128 pixels) per second using GPU acceleration.
DOI: 10.1109/ICASSP.2019.8683813, pp. 1055-1059
Citations: 52
Towards End-to-end Speech-to-text Translation with Two-pass Decoding
Tzu-Wei Sung, Jun-You Liu, Hung-yi Lee, Lin-Shan Lee
Speech-to-text translation (ST) refers to transforming audio in a source language into text in a target language. The mainstream solution for such tasks is to cascade automatic speech recognition with machine translation, for which transcriptions of the source language are needed in training. End-to-end approaches for ST tasks have been investigated not only out of technical interest, such as achieving a globally optimized solution, but also because many source languages worldwide have no written form. In this paper, we propose a new end-to-end ST framework with two decoders to handle the relatively deep relationships between the source-language audio and the target-language text. The first-pass decoder generates useful latent representations, and the second-pass decoder then integrates the outputs of both the encoder and the first-pass decoder to generate the text translation in the target language. Only paired source-language audio and target-language text are used in training. Preliminary experiments on several language pairs showed improved performance, and offered some initial analysis.
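The dataflow of the two-pass framework (encoder, first-pass decoder, then a second-pass decoder that fuses both outputs) can be sketched with toy linear stand-ins. These modules are purely illustrative placeholders of our own; the real system uses trained sequence models.

```python
import numpy as np

rng = np.random.default_rng(0)
d_audio, d_enc, d_latent, vocab, T = 20, 16, 8, 12, 30

# Toy linear stand-ins for the encoder and the two decoders.
W_enc = rng.standard_normal((d_audio, d_enc)) * 0.1
W_dec1 = rng.standard_normal((d_enc, d_latent)) * 0.1
W_dec2 = rng.standard_normal((d_enc + d_latent, vocab)) * 0.1

audio = rng.standard_normal((T, d_audio))       # source-language audio features
enc = audio @ W_enc                             # encoder states
latent = enc @ W_dec1                           # first-pass latent representations
fused = np.concatenate([enc, latent], axis=-1)  # second pass sees both outputs
logits = fused @ W_dec2
tokens = logits.argmax(axis=-1)                 # greedy target-language tokens
print(tokens.shape)  # (30,)
```

The key structural point is the concatenation step: the second-pass decoder conditions on both the encoder states and the first-pass latent representations, rather than on the first pass alone.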
DOI: 10.1109/ICASSP.2019.8682801, pp. 7175-7179
Citations: 25
Network Adaptation Strategies for Learning New Classes without Forgetting the Original Ones
Hagai Taitelbaum, Gal Chechik, J. Goldberger
We address the problem of adding new classes to an existing classifier without hurting the original classes, when no access is allowed to any sample from the original classes. This problem arises frequently since models are often shared without their training data, due to privacy and data ownership concerns. We propose an easy-to-use approach that modifies the original classifier by retraining a suitable subset of layers using a linearly-tuned, knowledge-distillation regularization. The set of layers that is tuned depends on the number of newly added classes and the number of original classes. We evaluate the proposed method on two standard datasets, first in a language-identification task, then in an image-classification setup. In both cases, the method achieves classification accuracy almost as good as that obtained by a system trained using unrestricted samples from both the original and new classes.
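The prototype-embedding view underlying this setup can be sketched in a few lines: each class is represented by the mean of its embeddings, and classification is nearest-prototype. This toy example uses synthetic 2-D embeddings of our own, not the paper's trained network or its distillation loss.

```python
import numpy as np

def fit_prototypes(emb, labels, n_classes):
    """Represent each class by the mean of its training embeddings."""
    return np.stack([emb[labels == c].mean(axis=0) for c in range(n_classes)])

def predict(emb, prototypes):
    """Assign each embedding to its nearest prototype."""
    d = ((emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 50)
emb = centres[labels] + 0.3 * rng.standard_normal((150, 2))

protos = fit_prototypes(emb, labels, 3)
acc = (predict(emb, protos) == labels).mean()
print(acc)
```

Adding a new class in this view only requires computing one more prototype from the new class's embeddings, which is why the embedding space, not the output layer, is what must be preserved when retraining.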
DOI: 10.1109/ICASSP.2019.8682848, pp. 3637-3641
Citations: 2
Introducing the Orthogonal Periodic Sequences for the Identification of Functional Link Polynomial Filters
A. Carini, S. Orcioni, S. Cecchi
The paper introduces a novel family of deterministic signals, the orthogonal periodic sequences (OPSs), for the identification of functional link polynomial (FLiP) filters. The novel sequences share many of the characteristics of the perfect periodic sequences (PPSs). Like the PPSs, they allow the perfect identification of a FLiP filter on a finite time interval with the cross-correlation method. In contrast to the PPSs, OPSs can also identify non-orthogonal FLiP filters, such as Volterra filters. With OPSs, the input sequence can have any persistently exciting distribution and can also be a quantized sequence. OPSs can often identify FLiP filters with a sequence period and a computational complexity much smaller than those of PPSs. Several results are reported to show the effectiveness of the proposed sequences in identifying a real nonlinear audio system.
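The cross-correlation identification that PPSs enable (and that OPSs extend to non-orthogonal FLiP filters) can be sketched for the simplest case: a linear FIR filter excited by a periodic sequence whose periodic autocorrelation is an impulse. The construction below (unit-magnitude spectrum with random phases) is our own toy, not the paper's OPS design.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 64                                   # sequence period
# One period with impulse periodic autocorrelation: unit-magnitude
# spectrum, random phases (DC and Nyquist kept real for irfft).
phase = np.exp(1j * rng.uniform(0, 2 * np.pi, N // 2 + 1))
phase[0] = 1.0
phase[-1] = 1.0
x = np.fft.irfft(phase, N)               # excitation period

h = np.array([1.0, -0.5, 0.25, 0.1])     # unknown FIR system to identify
# Steady-state response to a periodic input = circular convolution.
y = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, N)))

# Cross-correlation identification over one period: since the periodic
# autocorrelation is E*delta[k], h[k] = (1/E) * sum_n y[n] x[(n-k) mod N].
E = np.dot(x, x)
h_est = np.array([np.dot(y, np.roll(x, k)) for k in range(len(h))]) / E
print(np.round(h_est, 6))
```

Because the excitation's periodic autocorrelation is exactly an impulse, the cross-correlation recovers the coefficients perfectly on a finite interval, which is the property the PPS/OPS machinery generalizes to the nonlinear FLiP basis functions.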
DOI: 10.1109/ICASSP.2019.8683342, pp. 5486-5490
Citations: 3
Performance Analysis of Convex Data Detection in MIMO
Ehsan Abbasi, Fariborz Salehi, B. Hassibi
We study the performance of a convex data detection method in large multiple-input multiple-output (MIMO) systems. The goal is to recover an n-dimensional complex signal whose entries are from an arbitrary constellation $\mathcal{D} \subset \mathbb{C}$, using m noisy linear measurements. Since Maximum Likelihood (ML) estimation involves minimizing a loss function over the discrete set $\mathcal{D}^n$, it becomes computationally intractable for large n. One approach is to relax $\mathcal{D}$ to a convex set, utilize convex programming to solve the relaxed problem precisely, and then map the answer to the closest point in the set $\mathcal{D}$. We assume an i.i.d. complex Gaussian channel matrix and derive expressions for the symbol error probability of the proposed convex method in the limit m, n → ∞. Prior work was only able to do so for real-valued constellations such as BPSK and PAM. The main contribution of this paper is to extend these results to complex-valued constellations. In particular, we use our main theorem to calculate the performance of the complex algorithm for PSK and QAM constellations. In addition, we introduce a closed-form formula for the symbol error probability in the high-SNR regime and determine the minimum number of measurements m required for consistent signal recovery.
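The relax-solve-round recipe can be sketched for the simplest real-valued case: BPSK with box relaxation to [-1, 1]^n, solved by projected gradient descent and then rounded to the nearest symbol. The paper's analysis targets complex constellations; this toy of ours only illustrates the pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, sigma = 8, 24, 0.05
H = rng.standard_normal((m, n)) / np.sqrt(m)      # i.i.d. Gaussian channel
x_true = rng.choice([-1.0, 1.0], size=n)          # BPSK symbols
y = H @ x_true + sigma * rng.standard_normal(m)

# Convex relaxation: minimize ||y - Hx||^2 over the box [-1, 1]^n
# via projected gradient descent, then map to the nearest symbol.
x = np.zeros(n)
step = 1.0 / np.linalg.norm(H.T @ H, 2)           # safe step size
for _ in range(500):
    x = np.clip(x - step * H.T @ (H @ x - y), -1.0, 1.0)
x_hat = np.sign(x)                                 # nearest constellation point
print(int((x_hat == x_true).sum()))  # 8: all symbols recovered
```

At this SNR and measurement ratio the relaxed solution already sits close to the true vertex of the box, so the rounding step is exact; the paper characterizes precisely when and how often this succeeds as m, n grow.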
DOI: 10.1109/ICASSP.2019.8683890, pp. 4554-4558
Citations: 16
Automatic Transcription of Diatonic Harmonica Recordings
Filipe M. Lins, M. Johann, Emmanouil Benetos, Rodrigo Schramm
This paper presents a method for automatic transcription of the diatonic Harmonica. It estimates multi-pitch activations through a spectrogram factorisation framework. The framework is based on Probabilistic Latent Component Analysis (PLCA) and uses a fixed 4-dimensional dictionary with spectral templates extracted from the Harmonica's timbre. Methods based on spectrogram factorisation may suffer from local-optima issues in the presence of harmonic overlap or considerable timbre variability. To alleviate this issue, we propose a set of harmonic constraints that are inherent to the Harmonica's note layout or are caused by specific diatonic Harmonica playing techniques. These constraints help to guide the factorisation process until convergence into meaningful multi-pitch activations is achieved. This work also builds a new audio dataset containing solo recordings of diatonic Harmonica excerpts and the respective multi-pitch annotations. We compare our proposed approach against multiple baseline techniques for automatic music transcription on this dataset and report results based on frame-based F-measure statistics.
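With the dictionary held fixed, estimating activations in a PLCA-style spectrogram factorisation coincides (up to normalisation) with multiplicative Kullback-Leibler updates, which can be sketched as follows. The dictionary here is random rather than extracted from Harmonica timbres, and none of the paper's harmonic constraints are applied.

```python
import numpy as np

rng = np.random.default_rng(4)
F, K, T = 40, 3, 25
W = rng.random((F, K)) + 0.1            # fixed spectral templates (dictionary)
W /= W.sum(axis=0, keepdims=True)       # each template is a spectral distribution
H_true = rng.random((K, T))             # ground-truth activations
V = W @ H_true                          # observed "spectrogram"

# Estimate activations with multiplicative KL updates; since the columns
# of W sum to 1, the usual denominator W^T 1 is identically 1.
H = np.full((K, T), 0.5)
for _ in range(500):
    H *= W.T @ (V / (W @ H + 1e-12))
err = np.abs(W @ H - V).sum() / V.sum()
print(round(err, 4))
```

The harmonic constraints the paper introduces act on exactly this kind of iteration, steering it away from the local optima that unconstrained updates can fall into when templates overlap harmonically.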
DOI: 10.1109/ICASSP.2019.8682334, pp. 256-260
Citations: 2
Imitation Refinement for X-ray Diffraction Signal Processing
Junwen Bai, Zihang Lai, Runzhe Yang, Yexiang Xue, J. Gregoire, C. Gomes
Many real-world tasks involve identifying signals from data satisfying background or prior knowledge. In domains like materials discovery, due to the flaws and biases in raw experimental data, the identification of X-ray diffraction (XRD) signals often requires significant (manual) expert work to find refined signals that are similar to the ideal theoretical ones. Automatically refining the raw XRD signals utilizing simulated theoretical data is thus desirable. We propose imitation refinement, a novel approach to refine imperfect input signals, guided by a pre-trained classifier incorporating prior knowledge from simulated theoretical data, such that the refined signals imitate the ideal ones. The classifier is trained on the ideal simulated data to classify signals and learns an embedding space where each class is represented by a prototype. The refiner learns to refine the imperfect signals with small modifications, such that their embeddings are closer to the corresponding prototypes. We show that the refiner can be trained in both supervised and unsupervised fashions. We further illustrate the effectiveness of the proposed approach both qualitatively and quantitatively in an X-ray diffraction signal refinement task in materials discovery.
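The core refinement idea, making small modifications to an input so that its embedding moves toward its class prototype, can be sketched with a frozen linear embedding standing in for the pre-trained classifier. All names here are ours, and the sketch uses raw gradient steps rather than the learned refiner the paper trains.

```python
import numpy as np

rng = np.random.default_rng(5)
d_in, d_emb = 10, 4
A = rng.standard_normal((d_emb, d_in)) * 0.5   # frozen linear "embedding" stand-in
proto = rng.standard_normal(d_emb)             # prototype of the signal's class

x = rng.standard_normal(d_in)                  # imperfect input signal
lr = 0.4 / np.linalg.norm(A, 2) ** 2           # safe step size for the quadratic
for _ in range(500):
    # Gradient of ||A x - proto||^2: small modifications that pull the
    # signal's embedding toward its class prototype.
    x = x - lr * 2.0 * A.T @ (A @ x - proto)
dist = np.linalg.norm(A @ x - proto)
print(round(dist, 6))
```

In the paper this pull-toward-prototype objective trains a refiner network instead of being applied per signal, which lets the refinement run in a single forward pass at test time.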
DOI: 10.1109/ICASSP.2019.8683723, pp. 3337-3341
Citations: 1
A Variational Adaptive Population Importance Sampler
Yousef El-Laham, P. Djurić, M. Bugallo
Adaptive importance sampling (AIS) methods are a family of algorithms which can be used to approximate Bayesian posterior distributions. Many AIS algorithms exist in the literature, where the differences arise in the manner by which the proposal distribution is adapted at each iteration. The adaptive population importance sampler (APIS), for example, deterministically samples from a mixture distribution and uses the local information given by the samples and weights to adapt the location parameter of each proposal. The update rules by nature are heuristic, but effective, especially in the case that the target posterior is multimodal. In this work, we introduce a novel AIS scheme which incorporates modern techniques in stochastic optimization to improve the methodology for higher-dimensional posterior inference. More specifically, we derive update rules for the parameters of each proposal by means of deterministic mixture sampling and show that the method outperforms other state-of-the-art approaches in high-dimensional scenarios.
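A minimal 1-D sketch of population AIS with deterministic-mixture weighting may clarify the setup. The location-only adaptation below is the APIS-style heuristic the abstract contrasts with, not the authors' variational update rules, and all constants here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(x):
    # Unnormalized bimodal target: mixture of N(-2, 0.5^2) and N(2, 0.5^2).
    # Its mean is 0 by symmetry.
    return np.logaddexp(-0.5 * ((x + 2) / 0.5) ** 2,
                        -0.5 * ((x - 2) / 0.5) ** 2)

mus = np.array([-5.0, 0.0, 5.0])   # proposal locations, adapted each iteration
sigma = 1.0                        # shared proposal scale (kept fixed here)
N = 200                            # samples drawn per proposal per iteration

for it in range(20):
    samples = mus[:, None] + sigma * rng.standard_normal((len(mus), N))
    x = samples.ravel()
    # Deterministic-mixture weights: the denominator is the full proposal
    # mixture.  Shared Gaussian normalizers and the 1/K mixture factor cancel
    # in the self-normalized estimate, so only the exponents are needed.
    log_mix = np.logaddexp.reduce(
        -0.5 * ((x[None, :] - mus[:, None]) / sigma) ** 2, axis=0)
    logw = log_target(x) - log_mix
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Heuristic location update: move each proposal toward the weighted mean
    # of its own samples (the APIS-style rule).
    for k in range(len(mus)):
        sl = slice(k * N, (k + 1) * N)
        mus[k] = np.sum(w[sl] * x[sl]) / w[sl].sum()

est_mean = np.sum(w * x)           # self-normalized IS estimate of E[x]
print(est_mean)
```

The outer proposals drift toward the two modes at ±2, so the final mixture covers the bimodal target and the self-normalized estimate lands near the true mean of 0. The paper's contribution is replacing the heuristic location update with rules derived from stochastic optimization.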
Title: "A Variational Adaptive Population Importance Sampler"
Authors: Yousef El-Laham, P. Djurić, M. Bugallo
DOI: https://doi.org/10.1109/ICASSP.2019.8683152 | ICASSP 2019, pp. 5052-5056 | Published: 2019-05-12
Citations: 6
Journal
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)