ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers
Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition
B. K. Khonglah, S. Madikeri, S. Dey, H. Bourlard, P. Motlícek, J. Billa
In this work, we explore a data scheduling strategy for semi-supervised learning (SSL) for acoustic modeling in automatic speech recognition. The conventional approach uses a seed model trained with supervised data to automatically recognize the entire set of unlabeled (auxiliary) data to generate new labels for subsequent acoustic model training. In this paper, we propose an approach in which the unlabeled set is divided into multiple equal-sized subsets. These subsets are processed in an incremental fashion: in each iteration a new subset is added to the data used for SSL, starting from only one subset in the first iteration. The acoustic model from the previous iteration becomes the seed model for the next one. This scheduling strategy is compared to the approach that uses all unlabeled data in one shot for training. Experiments using lattice-free maximum mutual information based acoustic model training on Fisher English give an 80% word error recovery rate. On the multi-genre evaluation sets for Lithuanian and Bulgarian, relative improvements of up to 17.2% in word error rate are observed.
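The incremental schedule described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `split_equal`, `incremental_ssl`, and the `decode` and `train` callables are hypothetical names, and the sketch assumes that the current seed model relabels the whole accumulated pool in every iteration, which the abstract does not confirm.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")  # an unlabeled utterance
M = TypeVar("M")  # an acoustic model


def split_equal(items: Sequence[U], k: int) -> List[List[U]]:
    """Split items into k equal-sized subsets (any remainder is dropped)."""
    size = len(items) // k
    return [list(items[i * size:(i + 1) * size]) for i in range(k)]


def incremental_ssl(seed: M, unlabeled: Sequence[U], k: int,
                    decode: Callable[[M, List[U]], list],
                    train: Callable[[list], M]) -> M:
    """Run k SSL iterations, adding one new subset per iteration."""
    model = seed
    pool: List[U] = []
    for subset in split_equal(unlabeled, k):
        pool.extend(subset)                   # one more subset each iteration
        pseudo_labeled = decode(model, pool)  # assumed: seed relabels the pool
        model = train(pseudo_labeled)         # result seeds the next iteration
    return model
```

The one-shot baseline mentioned in the abstract corresponds to calling the same function with `k=1`. In practice `train` would also see the supervised data; that detail is left to the hypothetical callable.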
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054309. Pages: 7419-7423. Published: 2020-05-01.
Citations: 17
Automatic Epileptic Seizure Onset-Offset Detection Based On CNN in Scalp EEG
P. Boonyakitanont, Apiwat Lek-uthai, J. Songsiri
We establish a deep learning-based method to automatically detect epileptic seizure onsets and offsets in multi-channel electroencephalography (EEG) signals. A convolutional neural network (CNN) is designed to identify occurrences of seizures in EEG epochs from the EEG signals, and an onset-offset detector is proposed to determine the seizure onsets and offsets. The EEG signals are the inputs and the outputs are the onset and offset. In the CNN, a filter is factorized to separately capture temporal and spatial patterns in EEG epochs. Moreover, we develop an onset-offset detection method based on clinical decision criteria. Verified on the whole CHB-MIT Scalp EEG database, the CNN model correctly detected over 90% of seizure activities. Furthermore, combined with the onset-offset detector, this method achieved an F1 score of 64.40% and determined the seizure onset and offset with absolute onset and offset latencies of 5.83 and 10.12 seconds, respectively.
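To make the two-stage idea concrete, the sketch below reads onsets and offsets off a sequence of epoch-level seizure predictions. It is only an illustration: the abstract says the actual detector follows clinical decision criteria that it does not spell out, so this version simply reports each run of seizure-positive epochs. The function name `find_onsets_offsets` and the 0/1 label encoding are assumptions.

```python
from typing import List, Tuple


def find_onsets_offsets(epoch_labels: List[int]) -> List[Tuple[int, int]]:
    """Return (onset, offset) epoch indices for each run of 1s."""
    events = []
    onset = None
    for i, label in enumerate(epoch_labels):
        if label == 1 and onset is None:
            onset = i                      # a seizure run starts here
        elif label != 1 and onset is not None:
            events.append((onset, i - 1))  # run ended on the previous epoch
            onset = None
    if onset is not None:                  # run continues to the last epoch
        events.append((onset, len(epoch_labels) - 1))
    return events
```

Multiplying the returned indices by the epoch length would convert them to times, from which onset and offset latencies against annotated events could be measured.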
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053143. Pages: 1225-1229. Published: 2020-05-01.
Citations: 12
Signal Sensing and Reconstruction Paradigms for a Novel Multi-Source Static Computed Tomography System
Alankar Kowtal, A. Cramer, Dufan Wu, Kai Yang, Wolfgang Krull, Ioannis Gkioulekas, Rajiv Gupta
Conventional Computed Tomography (CT) systems use a single X-ray source and an arc of detectors mounted on a rotating gantry to acquire a set of projection data. Novel CT systems are now being pioneered in which a complete ring of distributed X-ray sources and detectors are electronically turned on and off, without any mechanical motion, to acquire a set of projections for tomographic reconstruction. This paper discusses new sensing and reconstruction paradigms enabled by this new CT architecture.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054146. Pages: 9274-9278. Published: 2020-05-01.
Citations: 2
Wideband Channel Tracking for Millimeter Wave Massive MIMO Systems with Hybrid Beamforming Reception
G. Alexandropoulos, Evangelos Vlachos, J. Thompson
Millimeter Wave (mmWave) massive Multiple Input Multiple Output (MIMO) channel tracking is a challenging task with Hybrid analog and digital BeamForming (HBF) reception architectures. The wireless channel can only be spatially sampled with directive analog beams, which results in lengthy training periods when beam codebooks are large. In this paper, we capitalize on a recently proposed HBF architecture enabling mmWave massive MIMO channel estimation with short beam training overhead, and present a matrix-completion-based channel tracking technique for time-correlated HBF receivers. The considered channel tracking problem is formulated as a constrained multi-objective optimization problem incorporating the low-rank and group-sparse properties of the mmWave channel as well as a popular model for its time correlation. We present an efficient algorithm for this estimation problem that is based on the alternating direction method of multipliers. Comparisons of the proposed approach with representative state-of-the-art techniques showcase the relation between the channel time correlation coefficient and the amount of beam training needed for acceptable channel estimation performance.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053440. Pages: 8698-8702. Published: 2020-05-01.
Citations: 3
Fixed-Point Optimization of Transformer Neural Network
Yoonho Boo, Wonyong Sung
The Transformer model adopts a self-attention structure and shows very good performance in various natural language processing tasks. However, it is difficult to implement the Transformer in embedded systems because of its very large model size. In this study, we quantize the parameters and hidden signals of the Transformer for complexity reduction. Not only the weight and embedding matrices but also the inputs and the softmax outputs are quantized to utilize low-precision matrix multiplication. The fixed-point optimization steps consist of quantization sensitivity analysis, hardware-conscious word-length assignment, quantization and retraining, and post-training for improved generalization. We achieved a BLEU score of 27.51 on the WMT English-to-German translation task with 4-bit weights and 6-bit hidden signals.
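As a rough illustration of what quantizing weights to a few bits involves, the sketch below implements a uniform symmetric quantizer in plain Python. It is an assumption-laden example rather than the paper's method: the function name `quantize` and the rule that derives the step size from the largest absolute value are hypothetical, and the sensitivity analysis, word-length assignment, and retraining steps listed in the abstract are not shown.

```python
from typing import List


def quantize(values: List[float], bits: int) -> List[float]:
    """Uniform symmetric quantization to a signed grid (requires bits >= 2)."""
    levels = 2 ** (bits - 1) - 1           # e.g. 7 positive levels for 4 bits
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return [0.0 for _ in values]
    step = peak / levels                   # assumed step-size rule
    out = []
    for v in values:
        q = round(v / step)                # nearest integer level
        q = max(-levels, min(levels, q))   # clip to the representable range
        out.append(q * step)
    return out
```

With `bits=4` every output is one of at most 15 values (integer multiples of `step` from -7 to 7), which is what allows the matrix multiplications to run in low-precision integer arithmetic.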
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054724. Pages: 1753-1757. Published: 2020-05-01.
Citations: 12
Fully-Neural Approach to Heavy Vehicle Detection on Bridges Using a Single Strain Sensor
T. Kawakatsu, K. Aihara, A. Takasu, J. Adachi
Bridge weigh-in-motion (BWIM) is a technique for detecting heavy vehicles that may cause serious damage to real bridges. BWIM is realized by analyzing the strain signals observed at places on the bridge in terms of bridge-component responses to the axle loads. In current practice, a BWIM system requires multiple strain sensors to collect vehicle properties including speed and axle positions for accurate load estimation, which may limit the system’s life-span. Furthermore, BWIM should consider a wide variety of waveforms, which may be caused by vehicle acceleration and/or the various traveling positions in lanes. In this paper, we propose a novel BWIM mechanism, which employs a deep convolutional neural network (CNN). The CNN is able to learn actual traffic conditions and achieve accurate load estimation by using only a single strain sensor. The training dataset is collected from a distant load meter, by consulting traffic surveillance cameras and identifying similar vehicles. After the system initialization, the CNN requires no additional sensors (or cameras) for axle detection, which may reduce the costs of both installation and system maintenance.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053137. Pages: 3047-3051. Published: 2020-05-01.
Citations: 5
Enhanced Mixture Population Monte Carlo Via Stochastic Optimization and Markov Chain Monte Carlo Sampling
Yousef El-Laham, P. Djurić, M. Bugallo
The population Monte Carlo (PMC) algorithm is a popular adaptive importance sampling (AIS) method used for approximate computation of intractable integrals. Over the years, many advances have been made in the theory and implementation of PMC schemes. The mixture PMC (M-PMC) algorithm, for instance, optimizes the parameters of a mixture proposal distribution in a way that minimizes the Kullback-Leibler divergence to the target distribution. The parameters in M-PMC are updated using a single step of expectation maximization (EM), which limits its accuracy. In this work, we introduce a novel M-PMC algorithm that optimizes the parameters of a mixture proposal distribution, where parameter updates are resolved via stochastic optimization instead of EM. The stochastic gradients w.r.t. each of the mixture parameters are approximated using a population of Markov chain Monte Carlo samplers. We validate the proposed scheme via numerical simulations on an example where the considered target distribution is multimodal.
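The building block that PMC schemes adapt is plain importance sampling. The sketch below shows a self-normalized importance sampling estimate with a single fixed Gaussian proposal; it is not the M-PMC algorithm or the stochastic-optimization variant proposed in the paper, and the names `normal_pdf` and `snis_estimate` are hypothetical.

```python
import math
import random
from typing import Callable


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a univariate Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def snis_estimate(f: Callable[[float], float],
                  target_pdf: Callable[[float], float],
                  prop_mean: float, prop_std: float,
                  n: int, rng: random.Random) -> float:
    """Self-normalized importance sampling estimate of E_target[f(X)]."""
    total_w = 0.0
    total_wf = 0.0
    for _ in range(n):
        x = rng.gauss(prop_mean, prop_std)                      # draw from the proposal
        w = target_pdf(x) / normal_pdf(x, prop_mean, prop_std)  # importance weight
        total_w += w
        total_wf += w * f(x)
    return total_wf / total_w  # normalization lets target_pdf be unnormalized
```

An adaptive scheme such as PMC would repeat this step while updating the proposal parameters between iterations; here the proposal stays fixed.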
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053410. Pages: 5475-5479. Published: 2020-05-01.
Citations: 0
A Fast Proximal Point Algorithm for Generalized Graph Laplacian Learning
Zengde Deng, A. M. So
Graph learning is one of the most important tasks in machine learning, statistics and signal processing. In this paper, we focus on the problem of learning the generalized graph Laplacian (GGL) and propose an efficient algorithm to solve it. We first fully exploit the sparsity structure hidden in the objective function by utilizing a soft-thresholding technique to transform the GGL problem into an equivalent problem. Moreover, we propose a fast proximal point algorithm (PPA) to solve the transformed GGL problem and establish the linear convergence rate of our algorithm. Extensive numerical experiments on both synthetic data and real data demonstrate that the soft-thresholding technique accelerates our PPA method and that PPA can outperform the current state-of-the-art method in terms of speed.
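For readers unfamiliar with soft-thresholding, the standard operator is S_tau(x) = sign(x) * max(|x| - tau, 0), which shrinks values toward zero and sets small ones exactly to zero. The sketch below implements only this generic operator; how the paper applies it to transform the GGL problem is not stated in the abstract, and the function names are hypothetical.

```python
from typing import List


def soft_threshold(x: float, tau: float) -> float:
    """S_tau(x) = sign(x) * max(|x| - tau, 0), for tau >= 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0  # values within [-tau, tau] are zeroed, which induces sparsity


def soft_threshold_all(values: List[float], tau: float) -> List[float]:
    """Apply the operator elementwise."""
    return [soft_threshold(v, tau) for v in values]
```

For example, with `tau = 1.0` the entries `3.0` and `-2.0` shrink to `2.0` and `-1.0`, while `-0.5` becomes `0.0`.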
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054185. Pages: 5425-5429. Published: 2020-05-01.
Citations: 7
Enhanced Non-Local Cascading Network with Attention Mechanism for Hyperspectral Image Denoising
Hanwen Ma, Ganchao Liu, Yuan Yuan
Because of the complexity of the imaging environment, hyperspectral remote sensing images (HSIs) often suffer from different kinds of noise. Despite the success of natural image denoising, most existing CNN-based HSI denoising methods still suffer from inadequate noise suppression and insufficient feature extraction. In this paper, a novel HSI denoising algorithm based on an enhanced non-local cascading network with attention mechanism (ENCAM) is proposed, which can extract the joint spatial-spectral feature more effectively. The main contributions include: (1) the non-local structure is introduced to enlarge the receptive field to extract the spatial features more effectively; (2) multi-scale convolutions and a channel attention module are applied to enhance the extracted multi-scale features; (3) a cascading residual dense structure is used to extract different frequency features. Both the theoretical analysis and the experiments indicate that the proposed method is superior to other state-of-the-art methods for HSI denoising.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054630. Pages: 2448-2452. Published: 2020-05-01.
Citations: 12
Intra Frame Rate Control for Versatile Video Coding with Quadratic Rate-Distortion Modelling
Yi Chen, S. Kwong, Mingliang Zhou, Shiqi Wang, Guopu Zhu, Yi Wang
Although numerous coding tools have been adopted in the forthcoming Versatile Video Coding (VVC) standard, much less work has been dedicated to studying the corresponding Rate-Distortion (R-D) characteristics. This paper proposes a new quadratic R-D model for Versatile Video Coding. In particular, based on the proposed model, a new R-λ relationship is derived and used for frame-level rate control. The rate control algorithm is implemented on the VTM 2.0 platform for intra coding scenarios. Compared to the default rate control algorithm in VTM 2.0, experimental results show that the proposed rate control algorithm can achieve a 0.77% BD-BR reduction with similar control accuracy.
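To illustrate how an R-λ relationship follows from an R-D model, recall that in rate-distortion optimization the Lagrange multiplier λ is the negative slope of the distortion-rate curve. The sketch below assumes a generic quadratic form D(R) = a*R^2 + b*R + c, which gives λ(R) = -(2*a*R + b). The abstract does not give the paper's actual parameterization, so the coefficients `a`, `b`, `c` and both function names are hypothetical.

```python
def distortion(rate: float, a: float, b: float, c: float) -> float:
    """Assumed quadratic R-D model: D(R) = a*R^2 + b*R + c."""
    return a * rate * rate + b * rate + c


def lambda_from_rate(rate: float, a: float, b: float) -> float:
    """Lagrange multiplier as the negative slope of D(R): lambda = -dD/dR."""
    return -(2.0 * a * rate + b)
```

Given a target rate for a frame, a rate controller built on such a model would evaluate `lambda_from_rate` at that rate and pass the resulting λ to the encoder's mode decision.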
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054633. Pages: 4422-4426. Published: 2020-05-01.
Citations: 13