ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Latest Publications


Time Reversal Based Robust Gesture Recognition Using Wifi

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9053420

Sai Deepika Regani, Beibei Wang, Min Wu, K. Liu

Gesture recognition using wireless sensing has opened up a plethora of applications in human-computer interaction. However, most existing approaches achieve robustness only by requiring wearables or tedious training/calibration. In this work, we propose WiGRep, a time-reversal-based gesture recognition approach using Wi-Fi, which recognizes different gestures by counting the number of repeating gesture segments. Built upon the time reversal phenomenon in RF transmission, the Time Reversal Resonating Strength (TRRS) is used to detect repeating patterns in a gesture. A robust, low-complexity algorithm is proposed to accommodate possible variations in gestures and indoor environments. The main advantages of WiGRep are that it is calibration-free and independent of location and environment. Experiments performed in both line-of-sight and non-line-of-sight scenarios demonstrate detection rates of 99.6% and 99.4%, respectively, at a fixed false alarm rate of 5%.
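To make the idea of "counting repeating segments with TRRS" concrete, here is a minimal sketch, not WiGRep itself. It assumes a simplified TRRS, the normalized squared inner product of two complex channel frequency responses (CSI vectors), without the phase-offset alignment used in full TRRS definitions. The counting rule in `count_repetitions` and the threshold of 0.9 are hypothetical, and the channel states `A` and `B` are synthetic.

```python
import cmath


def trrs(h1, h2):
    """Simplified TRRS between two CSI vectors (complex channel frequency responses).

    Computes |<h1, h2>|^2 / (||h1||^2 * ||h2||^2): 1.0 when the vectors match up to
    a complex scale factor, 0.0 when they are orthogonal. The phase-offset
    alignment used in full TRRS definitions is omitted (simplifying assumption).
    """
    inner = sum(a * b.conjugate() for a, b in zip(h1, h2))
    e1 = sum(abs(a) ** 2 for a in h1)
    e2 = sum(abs(b) ** 2 for b in h2)
    return abs(inner) ** 2 / (e1 * e2)


def count_repetitions(csi_series, threshold=0.9):
    """Hypothetical counting rule: count how often the channel returns to its initial state.

    A repetition is registered each time the TRRS against the first snapshot
    climbs back above `threshold` after having dropped below it.
    """
    ref = csi_series[0]
    count, matched = 0, True  # the first snapshot trivially matches itself
    for h in csi_series[1:]:
        similar = trrs(ref, h) >= threshold
        if similar and not matched:
            count += 1
        matched = similar
    return count


# Toy channel states (synthetic, not measured CSI): gesture cycles A -> B -> A twice.
A = [cmath.exp(1j * 0.3 * k) for k in range(30)]
B = [cmath.exp(1j * 1.7 * k) for k in range(30)]
print(count_repetitions([A, B, A, B, A]))  # 2
```

The useful property is that TRRS is close to 1 only when the multipath channel returns to a previously seen state. Repeating a gesture therefore shows up as periodic TRRS peaks, whatever the absolute location.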


Citations: 7

Matching Pursuit Based Dynamic Phase-Amplitude Coupling Measure

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9054503

T. T. Munia, Selin Aviyente

Long-distance neuronal communication in the brain is enabled by interactions across various oscillatory frequencies. One interaction that is gaining importance in cognitive brain function is phase-amplitude coupling (PAC), in which the phase of a slow oscillation modulates the amplitude of a fast oscillation. Current techniques for calculating PAC provide a numerical index representing an average value across a pre-determined time window. However, there is growing empirical evidence that PAC is dynamic, varying across time. Current approaches to quantifying time-varying PAC rely on computing PAC over short sliding time windows. This approach suffers from the arbitrary selection of the window length and does not adapt to the signal dynamics. In this paper, we introduce a data-driven approach to quantify dynamic PAC. The proposed approach decomposes the signal using matching pursuit (MP) to extract the time- and frequency-localized atoms that best describe the given signal. These atoms are then used to compute PAC across time and frequency. Because the atoms are localized in time and frequency, we compute PAC only over the time-frequency regions determined by the selected atoms rather than over the whole time-frequency range. The proposed approach is evaluated on both simulated and real electroencephalogram (EEG) signals.
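The decomposition step can be illustrated with a generic greedy matching pursuit. This is a sketch under stated assumptions: the dictionary of Gaussian-windowed cosines (`gabor_atom`), its grid of centers and frequencies, and the toy signal are all hypothetical choices. The subsequent PAC computation over the selected atoms' time-frequency regions is not shown.

```python
import math


def gabor_atom(n, center, width, freq):
    """Unit-norm Gaussian-windowed cosine: an atom localized in time and frequency.

    `freq` is in cycles per sample; `center` and `width` are in samples.
    """
    raw = [math.exp(-0.5 * ((t - center) / width) ** 2) * math.cos(2 * math.pi * freq * t)
           for t in range(n)]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


def matching_pursuit(signal, dictionary, n_atoms, tol=1e-12):
    """Greedy matching pursuit over unit-norm atoms.

    Returns the selected (atom_index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_atoms):
        # Correlate the residual with every atom and keep the strongest match.
        best_idx, best_coef = 0, 0.0
        for idx, atom in enumerate(dictionary):
            coef = sum(r * a for r, a in zip(residual, atom))
            if abs(coef) > abs(best_coef):
                best_idx, best_coef = idx, coef
        if abs(best_coef) <= tol:
            break  # residual is (numerically) orthogonal to every atom
        picks.append((best_idx, best_coef))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - best_coef * a for r, a in zip(residual, dictionary[best_idx])]
    return picks, residual


# Hypothetical dictionary: 8 time centers x 4 frequencies on a 128-sample grid.
n = 128
dictionary = [gabor_atom(n, c, 8.0, f) for c in range(0, n, 16) for f in (0.02, 0.05, 0.1, 0.2)]
signal = [3.0 * a + 1.5 * b for a, b in zip(dictionary[5], dictionary[22])]
picks, residual = matching_pursuit(signal, dictionary, n_atoms=2)
print([idx for idx, _ in picks])  # [5, 22]
```

Each selected atom carries a time center and a frequency. That is why PAC can then be evaluated only where atoms were found, rather than over a fixed sliding window.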


Pages: 1279-1283

Citations: 1

TDMF: Task-Driven Multilevel Framework for End-to-End Speaker Verification

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9052957

Chen Chen, Jiqing Han

In this paper, a task-driven multilevel framework (TDMF) is proposed for end-to-end speaker verification. The TDMF has four layers, each acting differently on the speaker models or representations to implement the functions of the universal background model (UBM), Gaussian mixture model (GMM), total variability model (TVM), and probabilistic linear discriminant analysis (PLDA). Unlike the typical i-vector method, the proposed TDMF can supervise the optimization of each phase (layer) toward the direction required by the PLDA classifier. Moreover, unlike most end-to-end neural network approaches, which first extract embeddings and then separately calculate the distance between two embeddings as the verification score, the TDMF directly provides scores via its fourth-layer PLDA. Experimental results show that the TDMF achieves better performance than the typical i-vector framework and a VGG-M convolutional neural network (CNN) framework.


Citations: 1

Subjective Quality Estimation Using PESQ For Hands-Free Terminals

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9053960

S. Kurihara, M. Fukui, Suehiro Shimauchi, N. Harada

Previous reports have mentioned the possibility that subjective quality of the echo-suppressed speech signal can be estimated based on perceptual evaluation of speech quality (PESQ), but there are few experimental results. We propose third-party listening and conversational test procedures to assess whether PESQ can be used for predicting the subjective quality of an acoustic echo canceler. In the proposed third-party listening test procedure, near-end and far-end signals are presented separately in the left and right channels of stereo playback and differential category rating evaluation is applied to those stimuli for obtaining differential mean opinion scores. In the proposed conversational test procedure, impaired and non-impaired reference signals are recorded during a conversation to make PESQ processing possible. Experimental results indicate that there is a strong correlation between PESQ and subjective scores.
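Two mechanical pieces of the procedure can be sketched: building a stereo stimulus with the near-end signal on the left channel and the far-end signal on the right, and checking how strongly PESQ tracks subjective scores. The sketch assumes that PESQ values come from an ITU-T P.862 implementation (not shown) and that Pearson correlation is the agreement measure. The abstract only says "strong correlation", so the choice of coefficient is an assumption, and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists (e.g., PESQ vs. DMOS)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def stereo_stimulus(near_end, far_end):
    """Place the near-end signal on the left channel and the far-end signal on the right.

    Returns (left, right) sample pairs; the shorter signal is zero-padded.
    """
    n = max(len(near_end), len(far_end))
    left = list(near_end) + [0.0] * (n - len(near_end))
    right = list(far_end) + [0.0] * (n - len(far_end))
    return list(zip(left, right))
```

A correlation close to 1 across conditions would indicate that PESQ ranks echo-canceler conditions the way listeners do. It says nothing about absolute calibration between the two scales.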


Citations: 1

Transmit Beamforming Design with Received-Interference Power Constraints: The Zero-Forcing Relaxation

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9053471

E. Lagunas, A. Pérez-Neira, M. Lagunas, M. Vázquez

The use of multi-antenna transmitters is emerging as an essential technology for future wireless communication systems. While Zero-Forcing Beamforming (ZFB) has become the most popular low-complexity transmit beamforming design, it has drawbacks that stem mainly from "trying" to invert the channel coefficients toward the interfered users. In particular, ZFB performs poorly in the low Signal-to-Noise Ratio (SNR) regime and does not work when the interfered users outnumber the transmit antennas. In this paper, we study in detail an alternative transmit beamforming design framework in which some residual received-interference power is allowed instead of trying to null it out completely. We then provide a closed-form, non-iterative optimal solution that avoids sophisticated convex optimization techniques, which would compromise its applicability to practical systems. Numerical simulations show that the proposed transmit beamforming performs close to the optimum with much lower computational complexity.
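The two ZFB failure modes named above can be reproduced with a small sketch. It compares zero-forcing, `W = H^H (H H^H)^{-1}`, with the well-known regularized variant `W = H^H (H H^H + alpha I)^{-1}`, which also tolerates residual interference. Regularized ZF is a swapped-in illustration of relaxing the nulling constraint, not the authors' closed-form solution under received-interference power constraints. The channel matrices and `alpha` are hypothetical. `H` is K x N (users x antennas).

```python
def conj_transpose(m):
    """Hermitian transpose of a matrix given as a list of rows."""
    return [[m[r][c].conjugate() for r in range(len(m))] for c in range(len(m[0]))]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(m):
    """Gauss-Jordan inverse of a square complex matrix with partial pivoting."""
    n = len(m)
    aug = [list(map(complex, row)) + [complex(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def beamformer(h, alpha=0.0):
    """W = H^H (H H^H + alpha I)^{-1}; alpha = 0 gives zero-forcing (exact nulling)."""
    hh = conj_transpose(h)
    gram = matmul(h, hh)
    for i in range(len(gram)):
        gram[i][i] += alpha  # regularization trades residual interference for robustness
    return matmul(hh, inverse(gram))


# 3 interfered users but only 2 transmit antennas: H H^H is rank-deficient.
H_wide = [[1, 0], [0, 1], [1, 1]]
try:
    beamformer(H_wide)
except ValueError as err:
    print("ZF failed:", err)
W = beamformer(H_wide, alpha=0.1)
print(len(W), len(W[0]))  # 2 x 3 precoder still exists
```

With `alpha = 0`, `H W` is the identity (all interference nulled) whenever `H H^H` is invertible. With `alpha > 0`, `H W` keeps small off-diagonal terms, which is the residual interference that such relaxations accept in exchange for a well-posed solution.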


Pages: 4727-4731

Citations: 3

Voice based classification of patients with Amyotrophic Lateral Sclerosis, Parkinson’s Disease and Healthy Controls with CNN-LSTM using transfer learning

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9053682

Jhansi Mallela, Aravind Illa, BN Suhas, Sathvik Udupa, Yamini Belur, A. Nalini, R. Yadav, P. Reddy, D. Gope, P. Ghosh

In this paper, we consider 2-class and 3-class problems for classifying patients with Amyotrophic Lateral Sclerosis (ALS), Parkinson’s Disease (PD), and Healthy Controls (HC) using a CNN-LSTM network. Classification performance is examined for three different tasks, namely Spontaneous speech (SPON), Diadochokinetic rate (DIDK), and Sustained phoneme production (PHON). Experiments are conducted using speech data recorded from 60 ALS, 60 PD, and 60 HC subjects. SVM and DNN classifiers serve as baseline schemes. For ALS vs. HC classification (denoted ALS/HC), the CNN-LSTM improves accuracy by 10.40%, 4.22%, and 0.08% for the PHON, SPON, and DIDK tasks, respectively, over the best baseline scheme. Furthermore, the CNN-LSTM network achieves the highest PD/HC classification accuracy of 88.5% on the SPON task and the highest 3-class (ALS/PD/HC) classification accuracy of 85.24% on the DIDK task. Transfer learning experiments with low-resource training data show that ALS data benefits PD/HC classification and vice versa. Fine-tuning the weights of the 3-class (ALS/PD/HC) classifier for 2-class classification (PD/HC or ALS/HC) gives an absolute improvement of 2% in classification accuracy on the SPON task compared with a randomly initialized 2-class classifier.
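One plausible way to warm-start a 2-class classifier from a trained 3-class one is to reuse the shared layers and copy the output-layer rows of the two retained classes. The abstract does not say how the authors adapted the output layer, so this scheme is a hypothetical illustration. The CNN-LSTM feature extractor is not shown, and the weights below are illustrative placeholders, not trained values.

```python
def logits(weights, biases, features):
    """Linear output layer: one logit per class."""
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, biases)]


def init_two_class_head(weights, biases, classes=("ALS", "PD", "HC"), keep=("PD", "HC")):
    """Hypothetical warm start: copy the output rows of the two retained classes."""
    idx = [classes.index(c) for c in keep]
    return [list(weights[i]) for i in idx], [biases[i] for i in idx]


# Illustrative (untrained) 3-class head over 2-dimensional features.
w3 = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.7]]
b3 = [0.0, 0.1, -0.2]
w2, b2 = init_two_class_head(w3, b3)
# Before fine-tuning, the PD/HC logits are exactly the PD/HC logits of the 3-class model.
print(logits(w2, b2, [1.0, 2.0]) == logits(w3, b3, [1.0, 2.0])[1:])  # True
```

Fine-tuning would then continue from these weights on the 2-class data, rather than starting from a random initialization.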


Pages: 6784-6788

Citations: 12

JPEG Steganography with Side Information from the Processing Pipeline

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9054486

Quentin Giboulot, R. Cogranne, P. Bas

Current schemes for JPEG steganography that use a deflection criterion, such as Mi-POD, either underperform or are merely on par with distortion-based schemes. We attribute this shortfall to poor estimation of the variance of the noise model for the cover image. In this paper, we propose a method to better estimate the variances of DCT coefficients by taking into account the dependencies between pixels that arise from the development pipeline. Using this estimate, we extend statistically-informed steganographic schemes to the JPEG domain while significantly outperforming the current state of the art in JPEG steganography. An extension of Gaussian Embedding to the JPEG domain that uses the quantization error as side information is also formulated and shown to attain state-of-the-art performance.
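To see why variance estimates matter, here is a sketch of how a MiPOD-style, deflection-minimizing scheme turns per-coefficient variances into change probabilities. It assumes ternary (+1/-1) embedding and Fisher information proportional to 1/σ⁴, with constants absorbed into the multiplier, and it solves the stationarity condition β/σ⁴ = λ·ln((1 − 2β)/β) by nested bisection under a payload constraint. This is a simplified illustration, not the paper's estimator or its Gaussian Embedding extension. The variances and payload are hypothetical.

```python
import math


def ternary_entropy(beta):
    """Bits carried by a coefficient changed by +1 or -1, each with probability beta."""
    if beta <= 0.0:
        return 0.0
    return -2 * beta * math.log2(beta) - (1 - 2 * beta) * math.log2(1 - 2 * beta)


def _beta_for(variance, lam, iters=60):
    """Solve beta / variance**2 = lam * ln((1 - 2*beta) / beta) on (0, 1/3) by bisection."""
    lo, hi = 1e-15, 1.0 / 3.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The left side minus the right side increases with beta, so a negative value means the root is above mid.
        if mid / variance ** 2 - lam * math.log((1 - 2 * mid) / mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def change_probabilities(variances, payload_bits, iters=60):
    """Pick the Lagrange multiplier so the total ternary entropy equals the payload."""
    if not 0 < payload_bits < len(variances) * math.log2(3):
        raise ValueError("payload must be positive and below n*log2(3) bits")
    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        lam = math.sqrt(lo * hi)  # bisect in log-space: lam spans many decades
        if sum(ternary_entropy(_beta_for(v, lam)) for v in variances) < payload_bits:
            lo = lam
        else:
            hi = lam
    lam = math.sqrt(lo * hi)
    return [_beta_for(v, lam) for v in variances]


variances = [1.0, 4.0, 9.0, 16.0] * 10  # hypothetical per-coefficient variance estimates
betas = change_probabilities(variances, payload_bits=16.0)
print(round(sum(ternary_entropy(b) for b in betas), 3))  # ~16.0
print(betas[0] < betas[3])  # True: noisier coefficients are changed more often
```

Because the change probabilities depend directly on each σ², a biased variance estimate shifts embedding changes toward the wrong coefficients. The paper targets exactly this weakness with its pipeline-aware DCT variance estimate.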


Citations: 13

Balancing Rates and Variance via Adaptive Batch-Sizes in First-Order Stochastic Optimization

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9054292

Zhan Gao, Alec Koppel, Alejandro Ribeiro

Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance two facts: attenuating step-sizes are required for exact asymptotic convergence, whereas larger constant step-sizes learn faster in finite time but only up to an error. To do so, rather than fixing the mini-batch size and step-size at the outset, we propose a strategy that allows these parameters to evolve adaptively. Specifically, the batch-size is set to a piecewise-constant increasing sequence, where each increase occurs when a suitable error criterion is satisfied. Moreover, the step-size is selected as the one that yields the fastest convergence. The overall algorithm, the two-scale adaptive (TSA) scheme, is shown to inherit the exact asymptotic convergence of the stochastic gradient method. More importantly, the optimal error decrease rate is achieved theoretically, along with an overall reduction in sample computational cost. Experimentally, TSA achieves a favorable tradeoff relative to standard SGD schemes, combining their advantages.
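The core mechanism, a constant step-size combined with a piecewise-constant, increasing batch size, can be sketched on a 1-D quadratic with noisy gradients. The growth criterion below (the mean gradient magnitude over a window stops shrinking) is a hypothetical stand-in for the paper's error criterion. The step-size is simply held fixed rather than chosen for fastest convergence, and the objective, noise level, and parameters are assumptions.

```python
import math
import random


def noisy_grad(theta, batch_size, rng, target=3.0, noise=1.0):
    """Mini-batch gradient of f(theta) = 0.5 * (theta - target)**2.

    The mean of `batch_size` i.i.d. N(0, noise^2) sample errors is N(0, noise^2 / batch_size),
    so it is drawn directly instead of looping over the batch.
    """
    return (theta - target) + rng.gauss(0.0, noise / math.sqrt(batch_size))


def adaptive_batch_sgd(theta0, step, n_iters, batch0=1, growth=2, window=20,
                       max_batch=4096, seed=0):
    """Constant-step SGD with a piecewise-constant, non-decreasing batch size.

    Hypothetical growth criterion: once the mean |gradient| over a window of
    iterations stops shrinking (progress has stalled at the current noise level),
    multiply the batch size by `growth`, up to `max_batch`.
    """
    rng = random.Random(seed)
    theta, batch = theta0, batch0
    history, recent, prev_avg = [], [], float("inf")
    for _ in range(n_iters):
        g = noisy_grad(theta, batch, rng)
        theta -= step * g
        history.append(batch)
        recent.append(abs(g))
        if len(recent) == window:
            avg = sum(recent) / window
            if avg >= prev_avg:
                batch = min(batch * growth, max_batch)
            prev_avg, recent = avg, []
    return theta, history


theta, batches = adaptive_batch_sgd(theta0=0.0, step=0.1, n_iters=1000)
print(round(theta, 3), batches[-1])
```

Early on, small batches let the constant step make fast, cheap progress. Each time the iterate stalls at its noise floor, a larger batch lowers that floor, so the error keeps shrinking without decaying the step-size.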


Pages: 5385-5389

Citations: 1

Multi Image Depth from Defocus Network with Boundary Cue for Dual Aperture Camera

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9054346

Gwangmo Song, Yumee Kim, K. Chun, Kyoung Mu Lee

In this paper, we estimate depth information using two defocused images from a dual-aperture camera. Recent advances in deep learning have increased the accuracy of depth estimation. In addition, methods that use a defocused image, in which objects are blurred according to their distance from the camera, have been widely studied. We further improve the accuracy of depth estimation by training the network on two images with different depths of field. Using images of the same scene taken with different apertures, we can determine the degree of blur in an image more accurately. In this work, we propose a novel deep convolutional network that estimates a depth map from dual-aperture images based on boundary cues. Our proposed method achieves state-of-the-art performance on a synthetically modified NYU-v2 dataset. In addition, we built a new camera with fast variable apertures to create a real-world test environment, and collected a new dataset of real-world vehicle driving scenes. Our proposed method shows excellent performance on the new dataset.
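The geometry behind depth from defocus can be sketched with the textbook thin-lens blur-circle relation. A network like the one described learns to read blur from images, but this relation shows why depth is recoverable and why two apertures help. For a given scene point, the blur diameter scales linearly with the aperture diameter, so the two captures differ in blur by a known factor at every depth. This is standard geometric optics, not the paper's network, and the lens parameters below are hypothetical.

```python
def blur_diameter(depth, focus_dist, focal_len, f_number):
    """Thin-lens blur-circle diameter on the sensor (same length unit as the inputs)."""
    aperture = focal_len / f_number
    return aperture * focal_len * abs(depth - focus_dist) / (depth * (focus_dist - focal_len))


def depth_from_blur(blur, focus_dist, focal_len, f_number, beyond_focus=True):
    """Invert blur_diameter; the caller must say which side of the focal plane the point is on."""
    k = focal_len ** 2 / (f_number * (focus_dist - focal_len))
    if beyond_focus:
        if blur >= k:
            raise ValueError("blur too large for a point beyond the focal plane")
        return focus_dist / (1 - blur / k)
    return focus_dist / (1 + blur / k)


# Hypothetical optics: 50 mm lens focused at 2 m, point at 5 m, two apertures.
wide = blur_diameter(5.0, 2.0, 0.05, 2.8)
narrow = blur_diameter(5.0, 2.0, 0.05, 8.0)
print(wide / narrow)  # aperture-diameter ratio, 8.0 / 2.8, at any depth
print(depth_from_blur(wide, 2.0, 0.05, 2.8))  # recovers ~5.0
```

The inversion is ambiguous about which side of the focal plane a point lies on. That is one reason learned methods lean on extra cues, such as the boundary cues mentioned above.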


Pages: 2293-2297

Citations: 6

Acoustic Scene Classification for Mismatched Recording Devices Using Heated-Up Softmax and Spectrum Correction

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pub Date : 2020-05-01 DOI: 10.1109/ICASSP40776.2020.9053582

Truc The Nguyen, F. Pernkopf, Michal Kosmider

Deep neural networks (DNNs) are successful in applications where the inference and training distributions match. In real-world scenarios, DNNs have to cope with truly new data samples during inference, potentially drawn from a shifted data distribution. This usually causes a drop in performance. Acoustic scene classification (ASC) with different recording devices is one such situation. Furthermore, an imbalance in the quality and amount of data recorded by different devices poses severe challenges. In this paper, we introduce two calibration methods to tackle these challenges. In particular, we apply scaling to the features to deal with the varying frequency responses of the recording devices. Furthermore, to account for the shifted data distribution, a heated-up softmax is embedded to calibrate the model's predictions. We use robust and resource-efficient models and show the efficiency of the heated-up softmax. Our ASC system reaches state-of-the-art performance on the development set of DCASE 2019 challenge Task 1B with only ~70K parameters. It achieves 70.1% average classification accuracy for devices B and C. It performs on par with the best single-model system of the DCASE 2019 challenge and outperforms the baseline system by 28.7% (absolute).
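Both calibration ideas can be sketched in a few lines. The first is a temperature-scaled softmax, the mechanism that heated-up softmax builds on; the authors' temperature schedule and where it enters their model are not reproduced here. The second is per-frequency gains that map a device's average magnitude spectrum onto a reference device's. The spectrum-correction sketch assumes time-aligned parallel recordings from both devices, and the function names and values are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; a larger temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def spectrum_correction_coeffs(reference_frames, device_frames):
    """Per-frequency gains mapping a device's average magnitude spectrum onto a reference.

    Both arguments are lists of spectra (lists of per-bin magnitudes) from parallel recordings.
    """
    n_bins = len(reference_frames[0])
    ref_mean = [sum(f[k] for f in reference_frames) / len(reference_frames) for k in range(n_bins)]
    dev_mean = [sum(f[k] for f in device_frames) / len(device_frames) for k in range(n_bins)]
    return [r / d for r, d in zip(ref_mean, dev_mean)]


def apply_correction(frame, coeffs):
    """Scale each frequency bin of one spectrum frame by its correction gain."""
    return [x * c for x, c in zip(frame, coeffs)]


probs = softmax_with_temperature([2.0, 1.0, 0.1], temperature=1.0)
hot = softmax_with_temperature([2.0, 1.0, 0.1], temperature=5.0)
print(max(hot) < max(probs))  # True: a higher temperature flattens the distribution
```

Spectrum correction compensates for fixed, device-dependent frequency responses before the features reach the network. The temperature acts only on the output probabilities, which is why the two methods can be combined.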


Pages: 126-130

Citations: 27