ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Laplacian Regularized Tensor Low-Rank Minimization for Hyperspectral Snapshot Compressive Imaging
Yi Yang, Fei Jiang, Hongtao Lu
Snapshot Compressive Imaging (SCI) systems, including hyperspectral compressive imaging and video compressive imaging, are designed to depict high-dimensional signals with limited data by mapping multiple images into one. One key module of SCI systems is a high-quality reconstruction algorithm for the original frames. However, most existing decoding algorithms are based on vectorized representations and fail to capture the intrinsic structural information of high-dimensional signals. In this paper, we propose a tensor-based low-rank reconstruction algorithm with a hyper-Laplacian constraint for hyperspectral SCI systems. First, we integrate non-local self-similarity with tensor low-rank minimization to explore the intrinsic structural correlations along the spatial and spectral domains. Then, we introduce a hyper-Laplacian constraint to model the global spectral structures, alleviating ringing artifacts in the spatial domain. Experimental results on a hyperspectral image corpus demonstrate that the proposed algorithm achieves an average PSNR improvement of 0.8 to 2.9 dB over state-of-the-art work.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9413381 (published 2021-06-06)
Citations: 0
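The phrase "mapping multiple images into one" in the abstract above can be illustrated with a toy measurement model. This is a minimal sketch that assumes the common mask-and-sum formulation of SCI, in which each frame is multiplied element-wise by its own coding mask and the results are summed into a single snapshot. The function name `sci_measure`, the frames, and the masks are hypothetical, and the sketch covers only the forward measurement, not the reconstruction algorithm the paper proposes.

```python
def sci_measure(frames, masks):
    """Collapse K frames (each H x W) into one H x W snapshot.

    Assumed model: snapshot = sum over k of (mask_k * frame_k), element-wise.
    """
    height, width = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                snapshot[i][j] += mask[i][j] * frame[i][j]
    return snapshot


# Two 2 x 2 spectral bands with complementary binary masks (illustrative values).
frames = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]
masks = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
]
snapshot = sci_measure(frames, masks)
print(snapshot)
```

Reconstruction is the inverse problem: recovering all K frames from the single snapshot and the known masks, which is underdetermined and therefore needs a prior such as the low-rank structure the abstract describes.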
A High-Frame-Rate Eye-Tracking Framework for Mobile Devices
Yuhu Chang, Changyang He, Yingying Zhao, T. Lu, Ning Gu
Gaze-on-screen tracking, an appearance-based eye-tracking task, has drawn significant interest in recent years. While learning-based high-precision eye-tracking methods have been designed in the past, the complex pre-training and high computational cost of neural network-based deep models restrict their applicability on mobile devices. Moreover, as the display frame rate of mobile devices has steadily increased to 120 fps, high-frame-rate eye tracking becomes increasingly challenging. In this work, we tackle the tracking efficiency challenge and introduce GazeHFR, a biologically inspired eye-tracking model specialized for mobile devices, offering both high accuracy and efficiency. Specifically, GazeHFR classifies eye movement into two distinct phases, i.e., saccade and smooth pursuit, and leverages inter-frame motion information combined with lightweight learning models tailored to each movement phase to deliver highly efficient eye tracking without affecting accuracy. Compared to prior art, GazeHFR achieves approximately 7x speedup and a 15% accuracy improvement on mobile devices.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9414624 (published 2021-06-06)
Citations: 1
Predictive Coding for Lossless Dataset Compression
Madeleine Barowsky, Alexander Mariona, F. Calmon
Lossless compression of datasets is a problem of significant theoretical and practical interest. It appears naturally in the task of storing, sending, or archiving large collections of information for scientific research. We can greatly improve the encoding bitrate if we allow the compressed dataset to decompress to a permutation of the original data. We prove the equivalence of dataset compression to compressing a permutation-invariant structure of the data and implement such a scheme via predictive coding. We benchmark our compression procedure against state-of-the-art compression utilities on the popular machine-learning datasets MNIST and CIFAR-10 and outperform them for multiple parameter sets.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9413447 (published 2021-06-06)
Citations: 2
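The idea of decompressing to a permutation of the data can be sketched with a toy example. If the order of n distinct records does not matter, a coder is free to drop up to log2(n!) bits of ordering information, for instance by sorting the records and then coding each one relative to its predecessor. The sketch below assumes integer records and uses a trivial "previous value" predictor; the function names are hypothetical and this is not the scheme implemented in the paper.

```python
import math


def order_bits(n):
    """Bits needed to single out one ordering of n distinct items: log2(n!)."""
    return math.lgamma(n + 1) / math.log(2)


def delta_encode_sorted(values):
    """Sort the records, then store each as its difference from the previous one."""
    ordered = sorted(values)
    if not ordered:
        return []
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def delta_decode(deltas):
    """Invert delta_encode_sorted; the result is the sorted permutation of the input."""
    decoded, running = [], 0
    for delta in deltas:
        running += delta
        decoded.append(running)
    return decoded


data = [17, 3, 9, 3, 12]
deltas = delta_encode_sorted(data)
restored = delta_decode(deltas)
print(deltas, restored, round(order_bits(len(data)), 3))
```

The decoder returns the records in sorted order rather than the original order, which is exactly the relaxation the abstract allows. The small non-negative differences are typically cheaper to entropy-code than the raw values.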
Real-Time Speech Enhancement for Mobile Communication Based on Dual-Channel Complex Spectral Mapping
Ke Tan, Xueliang Zhang, Deliang Wang
Speech quality and intelligibility can be severely degraded by background noise in mobile communication. In order to attenuate background noise, speech enhancement systems have been integrated into mobile phones, and a microphone array is typically deployed to improve the enhancement performance. This paper proposes a novel approach to real-time speech enhancement for dual-microphone mobile phones. Our approach employs a causal densely-connected convolutional recurrent network to perform dual-channel complex spectral mapping. We apply a structured pruning technique to compress the model without significantly affecting the enhancement performance. This leads to a real-time enhancement system for on-device processing. Evaluation results show that the proposed approach substantially advances the performance of an earlier approach to dual-channel speech enhancement for mobile communication.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9414346 (published 2021-06-06)
Citations: 3
A Short Tutorial on The Weisfeiler-Lehman Test And Its Variants
Ningyuan Huang, Soledad Villar
Graph neural networks are designed to learn functions on graphs. Typically, the relevant target functions are invariant with respect to actions by permutations. Therefore, the design of some graph neural network architectures has been inspired by graph-isomorphism algorithms. The classical Weisfeiler-Lehman algorithm (WL), a graph-isomorphism test based on color refinement, became relevant to the study of graph neural networks. The WL test can be generalized to a hierarchy of higher-order tests, known as k-WL. This hierarchy has been used to characterize the expressive power of graph neural networks and to inspire the design of graph neural network architectures. A few variants of the WL hierarchy appear in the literature. The goal of this short note is pedagogical and practical: we explain the differences between the WL and folklore-WL formulations, with pointers to existing discussions in the literature. We illuminate the differences between the formulations by visualizing an example.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9413523 (published 2021-06-06)
Citations: 25
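The color-refinement procedure behind the classical (1-dimensional) WL test can be sketched in a few lines. The version below is a minimal illustration, not code from the tutorial: it assumes undirected graphs given as adjacency dictionaries, refines both graphs on their disjoint union so that they share one color palette, and compares the final color histograms. The function name `wl_indistinguishable` and the example graphs are hypothetical. It does not cover the higher-order k-WL or folklore-WL variants the note discusses.

```python
from collections import Counter


def wl_indistinguishable(graph_a, graph_b):
    """1-WL test on two graphs given as {node: [neighbours]}.

    False means the graphs are certainly non-isomorphic.
    True means colour refinement cannot tell them apart.
    """
    # Work on the disjoint union so both graphs share one colour palette.
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in graph_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in graph_b.items()})

    colors = {v: 0 for v in union}
    for _ in range(len(union)):
        # Signature: own colour plus the sorted multiset of neighbour colours.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in union[v])))
            for v in union
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        if len(palette) == len(set(colors.values())):
            break  # no class was split, so the partition is stable
        colors = {v: palette[signatures[v]] for v in union}

    hist_a = Counter(c for (tag, _), c in colors.items() if tag == "a")
    hist_b = Counter(c for (tag, _), c in colors.items() if tag == "b")
    return hist_a == hist_b


# A 6-cycle and two disjoint triangles: both 2-regular on 6 nodes.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A 3-node path and a single triangle: different degree sequences.
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

print(wl_indistinguishable(hexagon, triangles))
print(wl_indistinguishable(path, triangle))
```

The hexagon and the pair of triangles are not isomorphic, yet every node in both graphs sees the same multiset of neighbor colors at every round, so the test cannot separate them. Limitations of this kind are what motivate the higher-order hierarchy.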
Leveraging A Multiple-Strain Model with Mutations in Analyzing the Spread of Covid-19
Anirudh Sridhar, Osman Yağan, Rashad M. Eletreby, S. Levin, J. Plotkin, H. Poor
The spread of COVID-19 has been among the most devastating events affecting the health and well-being of humans worldwide since World War II. A key scientific goal concerning COVID-19 is to develop mathematical models that help us to understand and predict its spreading behavior, as well as to provide guidelines on what can be done to limit its spread. In this paper, we discuss how our recent work on a multiple-strain spreading model with mutations can help address some key questions concerning the spread of COVID-19. We highlight the recent reports on a mutation of SARS-CoV-2 that is thought to be more transmissible than the original strain and discuss the importance of incorporating mutation and evolutionary adaptations (together with the network structure) in epidemic models. We also demonstrate how the multiple-strain transmission model can be used to assess the effectiveness of mask-wearing in limiting the spread of COVID-19. Finally, we present simulation results to demonstrate our ideas and the utility of the multiple-strain model in the context of COVID-19.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9414595 (published 2021-06-06)
Citations: 10
Quickest Joint Detection and Classification of Faults in Statistically Periodic Processes
T. Banerjee, Smruti Padhy, A. Taha, E. John
An algorithm is proposed to detect and classify a change in the distribution of a stochastic process that has periodic statistical behavior. The problem is posed in the framework of independent and periodically identically distributed (i.p.i.d.) processes, a recently introduced class of processes to model statistically periodic data. It is shown that the proposed algorithm is asymptotically optimal as the rate of false alarms and the probability of misclassification go to zero. This problem has applications in anomaly detection in traffic data, social network data, ECG data, and neural data, where periodic statistical behavior has been observed. The effectiveness of the algorithm is demonstrated by application to real and simulated data.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9414101 (published 2021-06-06)
Citations: 0
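One way to picture joint detection and classification on periodic data is a bank of CUSUM statistics, one per candidate fault, whose likelihood ratios depend on the phase within the period. The sketch below is an illustration under strong assumptions and should not be read as the algorithm analyzed in the paper: observations are taken to be independent unit-variance Gaussians whose mean depends only on the phase, each fault class replaces the periodic mean profile with another known periodic profile, and the alarm is attributed to the class whose statistic crosses the threshold. All names and numbers are hypothetical.

```python
def periodic_cusum(samples, pre_means, fault_means, threshold):
    """Return (alarm_time, fault_label), or (None, None) if no alarm is raised.

    pre_means:   list of length T with the pre-change mean at each phase.
    fault_means: dict mapping a fault label to its list of T post-change means.
    """
    period = len(pre_means)
    stats = {label: 0.0 for label in fault_means}
    for t, x in enumerate(samples):
        phase = t % period
        for label, means in fault_means.items():
            m0, m1 = pre_means[phase], means[phase]
            # Log-likelihood ratio of N(m1, 1) against N(m0, 1) evaluated at x.
            llr = (m1 - m0) * (x - (m0 + m1) / 2.0)
            # CUSUM recursion: accumulate evidence, reset at zero.
            stats[label] = max(0.0, stats[label] + llr)
        best = max(stats, key=stats.get)
        if stats[best] >= threshold:
            return t, best
    return None, None


pre_means = [0.0, 1.0]  # period T = 2
fault_means = {"shift_up": [2.0, 3.0], "shift_down": [-2.0, -1.0]}
# Noise-free toy data: nominal for four samples, then the "shift_up" profile.
samples = [0.0, 1.0, 0.0, 1.0, 2.0, 3.0, 2.0, 3.0]
result = periodic_cusum(samples, pre_means, fault_means, threshold=5.0)
print(result)
```

Raising the threshold lowers the false-alarm rate at the cost of a longer detection delay, which is the trade-off the asymptotic optimality statement in the abstract refers to.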
Neural Utterance Confidence Measure for RNN-Transducers and Two Pass Models
Ashutosh Gupta, Ankur Kumar, Dhananjaya N. Gowda, Kwangyoun Kim, Sachin Singh, Shatrughan Singh, Chanwoo Kim
In this paper, we propose methods to compute a confidence score for the predictions made by an end-to-end speech recognition model in a 2-pass framework. We use an RNN-Transducer as the streaming model and an attention-based decoder as the second-pass model. We use a neural technique to compute the confidence score, and experiment with various combinations of features from the RNN-Transducer and second-pass models. The neural confidence score model is trained as a binary classification task to accept or reject a prediction made by the speech recognition model. The model is evaluated in a distributed speech recognition environment, and performs significantly better when features from the second-pass model are used than when features from the streaming model are used.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9414467 (published 2021-06-06)
Citations: 0
Low-Latency Polar Decoder Using Overlapped SCL Processing
D. Kam, B. Y. Kong, Youngjoo Lee
In this paper, we present a novel scheduling method that reduces the latency of polar decoders significantly. Unlike the prior pruning-based successive cancellation list (SCL) decoding that suffers from a number of idle cycles, the proposed overlapped SCL scheme immediately begins node operations without waiting for the list to be sorted, being exempt from such unfavorable cycles. All possible candidates for the next node operations are precomputed in parallel with the pruning operations, and are readily selected to minimize the latency. For the 5G New Radio systems, the proposed method shortens the decoding latency of the state-of-the-art approaches by up to 22% without degrading the error-correcting performance.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9414326 (published 2021-06-06)
Citations: 2
Deep Generative Demixing: Error Bounds for Demixing Subgaussian Mixtures of Lipschitz Signals
Aaron Berk
Generative neural networks (GNNs) have gained renown for efficaciously capturing intrinsic low-dimensional structure in natural images. Here, we investigate the subgaussian demixing problem for two Lipschitz signals, with GNN demixing as a special case. In demixing, one seeks identification of two signals given their sum and prior structural information. Here, we assume each signal lies in the range of a Lipschitz function, which includes many popular GNNs as a special case. We prove a sample complexity bound for nearly optimal recovery error that extends a recent result of Bora, et al. (2017) from the compressed sensing setting with gaussian matrices to demixing with subgaussian ones. Under a linear signal model in which the signals lie in convex sets, McCoy & Tropp (2014) have characterized the sample complexity for identification under subgaussian mixing. In the present setting, the signal structure need not be convex. For example, our result applies to a domain that is a non-convex union of convex cones. We support the efficacy of this demixing model with numerical simulations using trained GNNs, suggesting an algorithm that would be an interesting object of further theoretical study.
DOI: https://doi.org/10.1109/ICASSP39728.2021.9413573 (published 2021-06-06)
Citations: 2