
Latest publications: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Vocal Tract Articulatory Contour Detection in Real-Time Magnetic Resonance Images Using Spatio-Temporal Context
Ashwin Hebbar, Rahul Sharma, Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan
Due to its ability to visualize and measure the dynamics of vocal tract shaping during speech production, real-time magnetic resonance imaging (rtMRI) has emerged as a prominent research tool. The ability to track different articulators such as the tongue, lips, velum, and pharynx is a crucial step toward automating further scientific and clinical analysis. Recently, various researchers have addressed the problem of detecting articulatory boundaries, but these efforts have largely been limited to static-image-based methods. In this work, we propose to use information from temporal dynamics together with the spatial structure to detect articulatory boundaries in rtMRI videos. We train a convolutional LSTM network to detect and label the articulatory contours. We compare the produced contours against reference labels generated by iteratively fitting a manually created subject-specific template. We observe that the proposed method outperforms solely image-based methods, especially for the difficult-to-track articulators involved in airway constriction formation during speech.
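A convolutional LSTM computes its gates with convolutions instead of dense matrix products, so the recurrent state keeps the spatial layout of the frames while accumulating temporal context. A minimal single-channel NumPy sketch (kernel sizes, initialization, and the sigmoid readout are illustrative, not the paper's architecture):

```python
import numpy as np

def conv2d_same(x, k):
    # x: (H, W), k: (3, 3); zero-padded 'same' cross-correlation
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Single-channel ConvLSTM cell: the four gates are computed by 3x3
    convolutions over the input frame and the previous hidden state."""
    def __init__(self, rng):
        self.kx = {g: rng.standard_normal((3, 3)) * 0.1 for g in "ifgo"}
        self.kh = {g: rng.standard_normal((3, 3)) * 0.1 for g in "ifgo"}

    def step(self, x, h, c):
        i = sigmoid(conv2d_same(x, self.kx["i"]) + conv2d_same(h, self.kh["i"]))
        f = sigmoid(conv2d_same(x, self.kx["f"]) + conv2d_same(h, self.kh["f"]))
        g = np.tanh(conv2d_same(x, self.kx["g"]) + conv2d_same(h, self.kh["g"]))
        o = sigmoid(conv2d_same(x, self.kx["o"]) + conv2d_same(h, self.kh["o"]))
        c = f * c + i * g          # memory update keeps spatial layout
        h = o * np.tanh(c)
        return h, c

# run the cell over a short synthetic frame sequence (stand-in for rtMRI)
rng = np.random.default_rng(0)
cell = ConvLSTMCell(rng)
frames = rng.standard_normal((5, 8, 8))        # (time, H, W)
h = np.zeros((8, 8)); c = np.zeros((8, 8))
for x in frames:
    h, c = cell.step(x, h, c)
contour_map = sigmoid(h)                        # per-pixel contour probability
```

In the paper's setting a trained readout on the final hidden state would label pixels as belonging to specific articulator contours; here the untrained map only demonstrates the data flow.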
DOI: 10.1109/ICASSP40776.2020.9053111 · pp. 7354-7358 · Published May 2020
Citations: 5
Interpretability-Guided Convolutional Neural Networks for Seismic Fault Segmentation
Zhining Liu, Cheng Zhou, Guangmin Hu, Chengyun Song
Delineating the seismic fault, an important type of geologic structure in seismic images, is a key step in seismic interpretation. Compared with conventional methods that design a number of hand-crafted features based on the observed characteristics of the seismic fault, convolutional neural networks (CNNs) have proven more powerful at automatically learning effective representations. However, a CNN usually serves as a black box during training and inference, which leads to trust issues. The inability of humans to understand the CNN is especially problematic in critical areas like seismic exploration, medicine, and financial markets. To incorporate domain knowledge and improve the interpretability of the CNN, we propose to jointly optimize prediction accuracy and the consistency between the neural network's explanations and domain knowledge. Taking seismic fault segmentation as an example, we show that the proposed method not only gives reasonable explanations for its predictions but also predicts faults more accurately than the baseline model.
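One way to realize such a joint objective is a standard prediction loss plus a penalty on attribution mass that falls outside the region domain knowledge marks as relevant. A toy sketch with a logistic model and a gradient-times-input saliency (the penalty form and the lambda weighting are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def explanation_penalty(w, x, knowledge_mask):
    # fraction of (gradient * input) attribution outside the known-relevant region
    attribution = np.abs(w * x)
    attribution = attribution / (attribution.sum() + 1e-9)
    return attribution[knowledge_mask == 0].sum()

def joint_loss(w, x, y, knowledge_mask, lam=0.5):
    # cross-entropy prediction loss + weighted explanation-consistency penalty
    p = sigmoid(w @ x)
    ce = -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    return ce + lam * explanation_penalty(w, x, knowledge_mask)

x = np.ones(16)
mask = np.zeros(16); mask[:4] = 1           # domain knowledge: first 4 features matter
w_good = np.zeros(16); w_good[:4] = 1.0     # model relies on sanctioned features
w_bad = np.zeros(16); w_bad[8:] = 1.0       # model relies on spurious features
```

A model that predicts well but attends to the "wrong" features pays the consistency penalty, which is the mechanism the abstract describes.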
DOI: 10.1109/ICASSP40776.2020.9053472 · pp. 4312-4316 · Published May 2020
Citations: 1
Improving Proper Noun Recognition in End-to-End ASR by Customization of the MWER Loss Criterion
Cal Peyser, Tara N. Sainath, G. Pundak
Proper nouns present a challenge for end-to-end (E2E) automatic speech recognition (ASR) systems in that a particular name may appear only rarely during training, and may have a pronunciation similar to that of a more common word. Unlike conventional ASR models, E2E systems lack an explicit pronunciation model that can be specifically trained with proper noun pronunciations and a language model that can be trained on a large text-only corpus. Past work has addressed this issue by incorporating additional training data or additional models. In this paper, we instead build on recent advances in minimum word error rate (MWER) training to develop two new loss criteria that specifically emphasize proper noun recognition. Unlike past work on this problem, this method requires no new data during training or external models during inference. We see improvements ranging from 2% to 7% relative on several relevant benchmarks.
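MWER training minimizes the expected number of word errors over an n-best list, with hypothesis probabilities renormalized over the list; one way to emphasize proper nouns is to up-weight their errors inside that expectation. A toy sketch (the beta up-weighting is an illustrative assumption, not necessarily one of the paper's two criteria; the mean-error baseline subtraction used in practice is omitted):

```python
import numpy as np

def weighted_expected_error(hyp_log_scores, word_errors, proper_noun_errors, beta=2.0):
    """Expected error over an n-best list, with proper-noun errors
    up-weighted by beta."""
    p = np.exp(hyp_log_scores - np.max(hyp_log_scores))
    p = p / p.sum()                                    # renormalize over the n-best list
    errors = word_errors + beta * proper_noun_errors   # emphasize proper-noun mistakes
    return float(np.sum(p * errors))

word_errors = np.array([1.0, 1.0])      # both hypotheses contain one word error
pn_errors = np.array([0.0, 1.0])        # only the second misrecognizes a proper noun
favors_common = np.array([0.0, 2.0])    # model puts more mass on the proper-noun error
favors_proper = np.array([2.0, 0.0])    # model puts more mass on the correct proper noun
loss_bad = weighted_expected_error(favors_common, word_errors, pn_errors)
loss_good = weighted_expected_error(favors_proper, word_errors, pn_errors)
```

With equal plain word-error counts, the weighted criterion still prefers the model that gets the proper noun right, which is exactly the emphasis the abstract describes.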
DOI: 10.1109/ICASSP40776.2020.9054235 · pp. 7789-7793 · Published May 2020
Citations: 11
Coded Illumination and Multiplexing for Lensless Imaging
Yucheng Zheng, Rongjia Zhang, M. Salman Asif
Mask-based lensless cameras offer an alternative option to conventional cameras. Compared to conventional cameras, lensless cameras can be extremely thin, flexible, and lightweight. Despite these advantages, the quality of images recovered from lensless cameras is often poor because of the ill-conditioning of the underlying linear system. In this paper, we propose a new method to address the problem of ill-conditioning by combining coded illumination patterns with mask-based lensless imaging. We assume that the object is illuminated with multiple binary patterns and the camera acquires a sequence of images for different illumination patterns. We propose a low-complexity, recursive algorithm that avoids storing all the images or creating a large system matrix. We present simulation results on standard test images under various extreme conditions and demonstrate that the quality of the image improves significantly with a small number of illumination patterns.
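With illumination pattern s_k, the measurement is y_k = A diag(s_k) x for a fixed mask transfer matrix A, and the patterns can be folded into running normal-equation sums so that neither the images nor a large stacked system matrix need be stored. A toy NumPy sketch (dimensions, patterns, and the small ridge term are invented; the paper's recursive algorithm may differ in form):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 12, 8                               # scene pixels (flattened), sensor pixels
A = rng.standard_normal((m, n))            # fixed mask transfer matrix
x_true = rng.random(n)                     # unknown scene

base = rng.integers(0, 2, size=(3, n))
patterns = np.vstack([base, 1 - base])     # complementary pairs: every pixel gets lit

# accumulate normal equations one pattern at a time,
# never forming the stacked system matrix or storing images
P = np.zeros((n, n))
q = np.zeros(n)
for s in patterns:
    Ak = A * s                             # illumination scales columns: A @ diag(s)
    yk = Ak @ x_true                       # noiseless measurement under this pattern
    P += Ak.T @ Ak
    q += Ak.T @ yk

x_hat = np.linalg.solve(P + 1e-9 * np.eye(n), q)
```

Multiple patterns add independent rows to the combined system, which is how coded illumination counteracts the ill-conditioning of a single-shot measurement.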
DOI: 10.1109/ICASSP40776.2020.9052955 · pp. 9250-9253 · Published May 2020
Citations: 2
Exploiting Channel Locality for Adaptive Massive MIMO Signal Detection
Mehrdad Khani Shirkoohi, Mohammad Alizadeh, J. Hoydis, Phil Fleming
We propose MMNet, a deep learning MIMO detection scheme that significantly outperforms existing approaches on realistic channels with the same or lower computational complexity. MMNet’s design builds on the theory of iterative soft-thresholding algorithms and uses a novel training algorithm that leverages temporal and spectral correlation in real channels to accelerate training. These innovations make it practical to train MMNet online for every realization of the channel. On spatially-correlated channels, MMNet achieves the same error rate as the next-best learning scheme (OAMPNet) at 2.5dB lower signal-to-noise ratio (SNR), and with at least 10× less computational complexity. MMNet is also 4–8dB better overall than the linear minimum mean square error (MMSE) detector.
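Iterative soft-thresholding detectors alternate a linear gradient step on the residual with a nonlinear denoiser matched to the symbol constellation; MMNet learns the step and denoiser parameters online per channel realization. A fixed, non-learned NumPy sketch of just the iteration structure for BPSK symbols, on an invented mildly correlated channel:

```python
import numpy as np

def iterative_detect(H, y, n_iters=100):
    """Gradient step on ||y - Hx||^2 followed by a soft projection
    toward the BPSK symbol set {-1, +1} (tanh as a smooth slicer)."""
    step = 1.0 / np.linalg.norm(H, 2) ** 2   # 1 / (largest singular value)^2
    x = np.zeros(H.shape[1])
    for _ in range(n_iters):
        z = x + step * H.T @ (y - H @ x)     # linear estimation step
        x = np.tanh(3.0 * z)                 # soft symbol projection
    return np.sign(x)

rng = np.random.default_rng(3)
m, n = 64, 8
Q, _ = np.linalg.qr(rng.standard_normal((m, n)))
H = Q + 0.05 * rng.standard_normal((m, n))   # toy, mildly correlated channel
x_true = rng.choice([-1.0, 1.0], size=n)
y = H @ x_true + 0.01 * rng.standard_normal(m)
x_hat = iterative_detect(H, y)
```

MMNet's contribution is replacing the fixed step and tanh denoiser with parameters trained online, exploiting the temporal/spectral locality of consecutive channel realizations to keep that training cheap.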
DOI: 10.1109/ICASSP40776.2020.9052971 · pp. 8565-8568 · Published May 2020
Citations: 1
Hybrid Active Contour Driven by Double-Weighted Signed Pressure Force for Image Segmentation
Xingyu Fu, Bin Fang, Mingliang Zhou, Jiajun Li
In this paper, we propose a novel hybrid active contour model driven by a double-weighted signed pressure force for image segmentation. First, Legendre polynomials and global information are integrated into the signed pressure force (SPF) function, and a coefficient is applied to weight the relative contributions of the Legendre term and the global term. Second, by introducing a weighted factor as the coefficient of the inside and outside region fitting centers, the curve can optimally evolve into the interior and branches of the region of interest (ROI). Third, a new edge stopping function is adopted to robustly capture the edge of the ROI and speed up multi-object image segmentation. Experiments show that the proposed method achieves better accuracy on images with noise, inhomogeneous intensity, blurred edges, and complex branches; meanwhile, it also keeps running time under control and is insensitive to the initial contour position.
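A signed pressure force maps each pixel to a signed value that pushes the contour to expand inside the object and shrink outside it. A minimal NumPy sketch of an SPF with a weighted combination of the inside/outside fitting centers (alpha is a hypothetical weighting factor; the paper additionally mixes Legendre-polynomial local fitting into the SPF, omitted here):

```python
import numpy as np

def weighted_spf(image, phi, alpha=0.6):
    """SPF in [-1, 1] built from a weighted mix of the mean intensities
    inside and outside the current contour (level set phi)."""
    inside = phi > 0
    c_in = image[inside].mean()       # mean intensity inside the contour
    c_out = image[~inside].mean()     # mean intensity outside the contour
    spf = image - (alpha * c_in + (1.0 - alpha) * c_out)
    return spf / (np.abs(spf).max() + 1e-9)

# toy image: bright square (the ROI) on a dark background,
# with the initial contour placed inside the ROI
img = np.zeros((16, 16)); img[4:12, 4:12] = 1.0
phi = -np.ones((16, 16)); phi[6:10, 6:10] = 1.0
spf = weighted_spf(img, phi)
```

Pixels of the bright ROI get positive SPF (expand) and background pixels negative SPF (shrink), which is the driving force the curve evolution integrates.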
DOI: 10.1109/ICASSP40776.2020.9054627 · pp. 2463-2467 · Published May 2020
Citations: 1
Enhanced Action Tubelet Detector for Spatio-Temporal Video Action Detection
Yutang Wu, Hanli Wang, Shuheng Wang, Qinyu Li
Current spatio-temporal action detection methods usually employ a two-stream architecture: an RGB stream for raw images and an auxiliary motion stream for optical flow. Each stream must be trained individually, and further effort is needed to improve the precision of the RGB stream. To this end, a single-stream network named the enhanced action tubelet (EAT) detector is proposed in this work, built on the RGB stream. A modulation layer is designed to modulate RGB features with conditional information from the visual clues of optical flow and human pose. The network is end-to-end, and the proposed layer can easily be applied to other action detectors. Experiments show that the EAT detector outperforms the traditional RGB stream and is competitive with existing two-stream methods, while avoiding the trouble of training streams separately. Embedded in a new three-stream architecture, the resulting three-stream EAT detector achieves performance among the best competitors on UCF-Sports, JHMDB and UCF-101.
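One common way to "modulate features with conditional information" is a FiLM-style layer: the conditioning vector produces a per-channel scale and shift applied to the feature map. A NumPy sketch under that assumption (the form of the layer and all shapes are illustrative, not taken from the paper):

```python
import numpy as np

def modulation_layer(features, condition, Wg, Wb):
    """FiLM-style modulation: the conditioning vector (e.g. a summary of
    optical-flow/pose clues) yields per-channel scale gamma and shift beta
    that are applied to the RGB feature map."""
    gamma = condition @ Wg                    # (C,) per-channel scale
    beta = condition @ Wb                     # (C,) per-channel shift
    return features * gamma[:, None, None] + beta[:, None, None]

rng = np.random.default_rng(4)
C, H, W, D = 4, 6, 6, 3                       # channels, height, width, condition dim
feats = rng.standard_normal((C, H, W))        # RGB-stream feature map
cond = rng.standard_normal(D)                 # conditioning clues (flow/pose summary)
Wg = rng.standard_normal((D, C)) * 0.1
Wb = rng.standard_normal((D, C)) * 0.1
out = modulation_layer(feats, cond, Wg, Wb)
```

Because the conditioning enters only through gamma and beta, the layer can be dropped into another detector's backbone without changing its spatial computation, matching the abstract's claim of easy reuse.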
DOI: 10.1109/ICASSP40776.2020.9054394 · pp. 2388-2392 · Published May 2020
Citations: 1
Rate-Invariant Autoencoding of Time-Series 时间序列的率不变自编码
K. Koneripalli, Suhas Lohit, Rushil Anirudh, P. Turaga
For time-series classification and retrieval applications, an important requirement is to develop representations/metrics that are robust to re-parametrization of the time-axis. Temporal re-parametrization as a model can account for variability in the underlying generative process, sampling rate variations, or plain temporal misalignment. In this paper, we extend prior work on disentangling latent spaces of autoencoding models to design a novel architecture that learns rate-invariant latent codes in a completely unsupervised fashion. Unlike conventional neural network architectures, this method allows to explicitly disentangle temporal parameters in the form of order-preserving diffeomorphisms with respect to a learnable template. This makes the latent space more easily interpretable. We show the efficacy of our approach on a synthetic dataset and a real dataset for hand action-recognition.
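An order-preserving diffeomorphism of the time axis can be parametrized by unconstrained values that are exponentiated, normalized, and accumulated into a strictly increasing map from [0, 1] onto itself. A minimal NumPy sketch of applying such a warp to a template (this parametrization is one common choice, not necessarily the paper's exact one):

```python
import numpy as np

def apply_warp(template, raw_params):
    """Resample a template series through an order-preserving
    reparametrization gamma of [0, 1]: exponentiated increments are
    normalized and accumulated, so gamma is strictly increasing with
    gamma(0) = 0 and gamma(1) = 1."""
    inc = np.exp(raw_params)                             # positive increments
    gamma = np.concatenate(([0.0], np.cumsum(inc) / inc.sum()))
    t = np.linspace(0.0, 1.0, len(template))
    return np.interp(gamma, t, template)                 # template composed with gamma

t = np.linspace(0.0, 1.0, 50)
template = np.sin(2 * np.pi * t)

identity = apply_warp(template, np.zeros(49))            # zero params -> identity warp
warped = apply_warp(template, np.linspace(-1, 1, 49))    # slow start, fast finish
```

Because any raw_params vector yields a valid monotone warp, a decoder can output such parameters freely while the template carries the rate-invariant content, which is the disentanglement the abstract describes.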
DOI: 10.1109/ICASSP40776.2020.9053983 · pp. 3732-3736 · Published May 2020
Citations: 11
SED-MDD: Towards Sentence Dependent End-To-End Mispronunciation Detection and Diagnosis
Yiqing Feng, Guanyu Fu, Qingcai Chen, Kai Chen
A mispronunciation detection and diagnosis (MD&D) system typically consists of multiple stages, such as an acoustic model, a language model and a Viterbi decoder. In order to integrate these stages, we propose SED-MDD, an end-to-end model for sentence-dependent mispronunciation detection and diagnosis (MD&D). Our proposed model takes a mel-spectrogram and a character sequence as inputs and outputs the corresponding phone sequence. Our experiments prove that SED-MDD can implicitly learn the phonological rules in both acoustic and linguistic features directly from the phonological annotation and transcription in the training data. To the best of our knowledge, SED-MDD is the first model of its kind, and it achieves an accuracy of 86.35% and a correctness of 88.61% on L2-ARCTIC, significantly outperforming the existing end-to-end mispronunciation detection and diagnosis (MD&D) model CNN-RNN-CTC.
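The reported figures are the phone-recognition metrics commonly defined from a Levenshtein alignment: with N reference phones and S, D, I substitutions, deletions, and insertions, correctness = (N-S-D)/N and accuracy = (N-S-D-I)/N. A self-contained sketch (the phone sequences are invented):

```python
import numpy as np

def align_counts(ref, hyp):
    """Levenshtein alignment counts (S, D, I) between phone sequences."""
    R, H = len(ref), len(hyp)
    d = np.zeros((R + 1, H + 1), dtype=int)
    d[:, 0] = np.arange(R + 1)
    d[0, :] = np.arange(H + 1)
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i, j] = min(sub, d[i - 1, j] + 1, d[i, j - 1] + 1)
    # backtrack to classify each edit
    i, j, S, D, I = R, H, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i, j] == d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]):
            S += ref[i - 1] != hyp[j - 1]; i -= 1; j -= 1
        elif i > 0 and d[i, j] == d[i - 1, j] + 1:
            D += 1; i -= 1
        else:
            I += 1; j -= 1
    return S, D, I

def correctness_accuracy(ref, hyp):
    S, D, I = align_counts(ref, hyp)
    N = len(ref)
    return (N - S - D) / N, (N - S - D - I) / N

ref = "sil k ae t sil".split()
hyp = "sil k eh t t sil".split()   # one substitution (ae->eh), one insertion (t)
corr, acc = correctness_accuracy(ref, hyp)
```

Correctness ignores insertions while accuracy penalizes them, which is why the two reported numbers differ.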
DOI: 10.1109/ICASSP40776.2020.9052975 · pp. 3492-3496 · Published May 2020
Citations: 40
Theoretical Analysis of Multi-Carrier Agile Phased Array Radar
Tianyao Huang, Nir Shlezinger, Xingyu Xu, Dingyou Ma, Yimin Liu, Yonina C. Eldar
Modern radar systems are expected to operate reliably in congested environments under cost and power constraints. A recent technology for realizing such systems is frequency agile radar (FAR), which transmits narrowband pulses in a frequency hopping manner. To enhance the target recovery performance of FAR in complex electromagnetic environments, and particularly, its range-Doppler recovery performance, multi-Carrier AgilE phaSed Array Radar (CAESAR) was proposed. CAESAR extends FAR to multi-carrier waveforms while introducing the notion of spatial agility. In this paper, we theoretically analyze the range-Doppler recovery capabilities of CAESAR. Particularly, we derive conditions which guarantee accurate reconstruction of these range-Doppler parameters. These conditions indicate that by increasing the number of frequencies transmitted in each pulse, CAESAR improves performance over conventional FAR, especially in complex environments where some radar measurements are severely corrupted by interference.
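In a frequency-agile multi-carrier scheme, each pulse transmits a subset of the available carriers, and the target's range and Doppler are recovered from the resulting phase history. A toy NumPy sketch of the measurement model and a matched-filter grid search (all radar parameters are invented; the paper derives recovery guarantees rather than this simple correlator):

```python
import numpy as np

c = 3e8
N, M, K = 32, 8, 4             # pulses; available carriers; carriers used per pulse
f0, df, PRI = 10e9, 1e6, 1e-4  # base carrier, carrier spacing, pulse interval (made up)

rng = np.random.default_rng(5)
hops = np.array([rng.choice(M, size=K, replace=False) for _ in range(N)])

def echo(r, v):
    """Unit-amplitude phase history of a point target at range r (m) and
    radial velocity v (m/s) across the agile pulse/carrier schedule."""
    samples = []
    for n in range(N):
        for k in hops[n]:
            f = f0 + k * df
            tau = 2.0 * (r + v * n * PRI) / c    # round-trip delay at pulse n
            samples.append(np.exp(-2j * np.pi * f * tau))
    return np.array(samples)

y = echo(r=150.0, v=30.0)      # noiseless received phase history

# correlate against a coarse range-Doppler grid (matched-filter search)
ranges = np.linspace(100.0, 200.0, 21)
vels = np.linspace(0.0, 60.0, 13)
score = np.array([[abs(np.vdot(echo(r, v), y)) for v in vels] for r in ranges])
i, j = np.unravel_index(score.argmax(), score.shape)
```

Transmitting more carriers per pulse adds more rows to this phase history per pulse, which is the mechanism behind the abstract's claim that increasing K improves recovery over single-carrier FAR.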
DOI: 10.1109/ICASSP40776.2020.9054035 · pp. 4702-4706 · Published May 2020
Citations: 1