
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Articles

Raw Waveform Based End-to-end Deep Convolutional Network for Spatial Localization of Multiple Acoustic Sources
Harshavardhan Sundar, Weiran Wang, Ming Sun, Chao Wang
In this paper, we present an end-to-end deep convolutional neural network operating on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space. Previously reported deep learning based approaches work well in localizing a single source directly from multi-channel raw-audio, but are not easily extendable to localize multiple sources due to the well known permutation problem. We propose a novel encoding scheme to represent the spatial coordinates of multiple sources, which facilitates 2D localization of multiple sources in an end-to-end fashion, avoiding the permutation problem and achieving arbitrary spatial resolution. Experiments on a simulated data set and real recordings from the AV16.3 Corpus demonstrate that the proposed method generalizes well to unseen test conditions, and outperforms a recent time difference of arrival (TDOA) based multiple source localization approach reported in the literature.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054090, pp. 4642-4646, published 2020-05-01
Citations: 28
Projection Free Dynamic Online Learning
Deepak S. Kalhan, A. S. Bedi, Alec Koppel, K. Rajawat, Abhishek K. Gupta, Adrish Banerjee
Projection-based algorithms are popular in the literature for online convex optimization with convex constraints, but the projection step is a bottleneck in practical implementations. To avoid this bottleneck, we propose a projection-free scheme based on Frank-Wolfe: instead of online gradient steps, we use steps that are collinear with the gradient but guaranteed to be feasible. We establish performance in terms of dynamic regret, which quantifies cost accumulation as compared with the optimum at each individual time slot. Specifically, for convex losses, we establish $\mathcal{O}\left(T^{1/2}\right)$ dynamic regret up to metrics of non-stationarity. We relax the algorithm’s required information to only noisy gradient estimates, i.e., partial feedback, and derive the corresponding dynamic regret bounds. Experiments on matrix completion and background separation in video demonstrate favorable performance of the proposed scheme.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053771, pp. 3957-3961, published 2020-05-01
Citations: 2
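As a rough illustration of the projection-free idea described in the abstract above, the following Python sketch runs a Frank-Wolfe-style update on a slowly drifting loss. The ℓ1-ball constraint, the quadratic tracking loss, the constant step size, and all function names are assumptions made for this example; none of them are taken from the paper.

```python
def linear_minimization_oracle(grad, radius=1.0):
    """Minimize <grad, s> over the l1 ball: a signed, scaled coordinate vertex."""
    i = max(range(len(grad)), key=lambda k: abs(grad[k]))
    s = [0.0] * len(grad)
    if grad[i] != 0.0:
        s[i] = -radius if grad[i] > 0 else radius
    return s


def frank_wolfe_step(x, grad, step, radius=1.0):
    """Move from x toward the oracle vertex; a convex combination stays feasible."""
    s = linear_minimization_oracle(grad, radius)
    return [(1.0 - step) * xi + step * si for xi, si in zip(x, s)]


# Hypothetical time-varying loss f_t(x) = 0.5 * ||x - target_t||^2 with a drifting target.
x = [0.0, 0.0, 0.0]
for t in range(1, 51):
    target = [0.5, -0.2 + 0.001 * t, 0.1]
    grad = [xi - ti for xi, ti in zip(x, target)]
    x = frank_wolfe_step(x, grad, step=0.3)
    # No projection is ever needed: the iterate never leaves the l1 ball.
    assert sum(abs(xi) for xi in x) <= 1.0 + 1e-9
```

The only per-step work beyond the gradient is the linear minimization oracle, which for the ℓ1 ball amounts to picking the coordinate with the largest gradient magnitude. That is the kind of saving relative to a projection step that the abstract refers to.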
Angular Discriminative Deep Feature Learning for Face Verification
Bowen Wu, Huaming Wu
Thanks to the development of deep Convolutional Neural Networks (CNNs), face verification has rapidly achieved great success. In particular, Deep Distance Metric Learning (DDML), an emerging area, has brought great improvements to the computer vision community. Softmax loss is widely used to supervise the training of most available CNN models, whereas feature normalization is often used to compute pair similarities at test time. To bridge this gap between training and testing, we require that the intra-class cosine similarity of the inner-product layer before the softmax loss be larger than a margin during training, alongside the supervision signal of the softmax loss. To enhance the discriminative power of the deeply learned features, we extend the intra-class constraint to force the intra-class cosine similarity to exceed the mean of the nearest neighboring inter-class similarities by a margin in the normalized exponential feature projection space. Extensive experiments on the Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) datasets demonstrate that the proposed approaches achieve competitive performance on the open-set face verification task.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053675, pp. 2133-2137, published 2020-05-01
Citations: 4
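The intra-class constraint in the abstract above can be pictured as a penalty that vanishes once a feature is close enough, in angle, to its own class weight vector. The Python sketch below uses a simple hinge form for that penalty; the hinge form, the function names, and the toy vectors are assumptions for illustration, since the abstract does not give the exact loss.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def intra_class_margin_penalty(feature, class_weight, margin):
    """Hypothetical hinge penalty: zero once cos(feature, class_weight) >= margin."""
    return max(0.0, margin - cosine_similarity(feature, class_weight))


# A feature orthogonal to its class weight is penalized by the full margin.
far = intra_class_margin_penalty([1.0, 0.0], [0.0, 1.0], margin=0.5)
# A feature aligned with its class weight incurs no penalty.
near = intra_class_margin_penalty([1.0, 0.0], [1.0, 0.0], margin=0.5)
```

In training, a term of this kind would be added to the softmax loss, so that the quantity used at test time (cosine similarity) is also constrained during training.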
SliceNet: Slice-Wise 3D Shapes Reconstruction from Single Image
Yunjie Wu, Zhengxing Sun, Youcheng Song, Yunhan Sun, Jinlong Shi
3D object reconstruction from a single image is a highly ill-posed problem, requiring strong prior knowledge of 3D shapes. Deep learning methods are popular for this task, and most works use 3D deconvolution to generate 3D shapes. However, the resolution of the results is limited by the high resource consumption of 3D deconvolution. In this paper, we propose SliceNet, which sequentially generates 2D slices of 3D shapes with shared 2D deconvolution parameters. To capture relations between slices, an RNN is also introduced. Our model has three main advantages. First, the introduction of the RNN allows the CNN to focus more on local geometric details, improving the fine-grained plausibility of the results. Second, replacing 3D deconvolution with 2D deconvolution greatly reduces memory consumption, enabling higher-resolution final results. Third, a slice-aware attention mechanism is designed to provide dynamic information for the generation of each slice, which helps model the differences between slices and makes the learning process easier. Experiments on both synthesized and real data illustrate the effectiveness of our method.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054674, pp. 1833-1837, published 2020-05-01
Citations: 2
Robust Phase Retrieval with Outliers
Xue Jiang, H. So, Xingzhao Liu
An outlier-resistant phase retrieval algorithm based on the alternating direction method of multipliers (ADMM) is devised in this paper. Instead of the widely used least squares criterion, which is optimal only in Gaussian noise, we adopt the least absolute deviation criterion to enhance robustness against outliers. Considering both intensity- and amplitude-based observation models, the ADMM framework is developed to solve the resulting non-differentiable optimization problems. It is demonstrated that the core subproblem of ADMM is the proximity operator of the ℓ1-norm, which can be computed efficiently by soft-thresholding in each iteration. Simulation results are provided to validate the accuracy and efficiency of the proposed approach compared to existing schemes.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053060, pp. 5320-5324, published 2020-05-01
Citations: 1
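The abstract above notes that the core ADMM subproblem reduces to the proximity operator of the ℓ1-norm, computed by soft-thresholding. A minimal Python sketch of that operator for real-valued inputs follows; the function name, the threshold value, and the sample residual vector are illustrative assumptions, and the rest of the ADMM iteration is not shown.

```python
def soft_threshold(values, tau):
    """Proximity operator of tau * ||.||_1, applied elementwise to real values."""
    shrunk = []
    for v in values:
        if v > tau:
            shrunk.append(v - tau)   # shrink positive entries toward zero
        elif v < -tau:
            shrunk.append(v + tau)   # shrink negative entries toward zero
        else:
            shrunk.append(0.0)       # entries within [-tau, tau] are zeroed
    return shrunk


# Hypothetical residual vector with two large (outlier-like) entries.
residual = [3.0, -0.2, 0.5, -4.0]
shrunk = soft_threshold(residual, 1.0)  # [2.0, 0.0, 0.0, -3.0]
```

Each call costs one pass over the vector, which is why this subproblem can be solved cheaply in every iteration.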
Exploiting Vocal Tract Coordination Using Dilated CNNs For Depression Detection In Naturalistic Environments
Zhaocheng Huang, J. Epps, Dale Joachim
Depression detection from speech continues to attract significant research attention but remains a major challenge, particularly when the speech is acquired from diverse smartphones in natural environments. Analysis methods based on vocal tract coordination have shown great promise in depression and cognitive impairment detection for quantifying relationships between features over time through eigenvalues of multi-scale cross-correlations. Motivated by the success of these methods, this paper proposes a novel way to extract full vocal tract coordination (FVTC) features by use of convolutional neural networks (CNNs), overcoming earlier shortcomings. Evaluations of the proposed FVTC-CNN structure on depressed speech data show improvements in mean F1 scores of at least 16.4% under clean conditions and comparable results under noisy conditions relative to existing VTC baseline systems.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054323, pp. 6549-6553, published 2020-05-01
Citations: 26
RNN-Transducer with Stateless Prediction Network
M. Ghodsi, Xiaofeng Liu, J. Apfel, Rodrigo Cabrera, Eugene Weinstein
The RNN-Transducer (RNNT) outperforms classic Automatic Speech Recognition (ASR) systems when a large amount of supervised training data is available. For low-resource languages, RNNT models overfit and cannot directly take advantage of additional large text corpora as classic ASR systems can. We focus on the prediction network of the RNNT, since it is believed to be analogous to the Language Model (LM) in classic ASR systems. We pre-train the prediction network with text-only data, which is not helpful. Moreover, removing the recurrent layers from the prediction network, which makes the prediction network stateless, performs virtually as well as the original RNNT model when using wordpieces. The stateless prediction network does not depend on the previous output symbols, except the last one. It therefore simplifies the RNNT architecture and inference. Our results suggest that the RNNT prediction network does not function as the LM does in classic ASR. Instead, it merely helps the model align to the input audio, while the RNNT encoder and joint networks capture both the acoustic and the linguistic information.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054419, pp. 7049-7053, published 2020-05-01
Citations: 77
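To make the "stateless" property in the abstract above concrete, the Python sketch below models a prediction network whose output depends only on the last emitted label. Representing it as a plain embedding lookup, the blank-label convention, and the toy table are assumptions for illustration; the abstract does not specify the layer structure.

```python
def stateless_prediction(embedding_table, previous_labels, blank_id=0):
    """Return the embedding of the last emitted label only; no recurrent state."""
    last = previous_labels[-1] if previous_labels else blank_id
    return embedding_table[last]


# Hypothetical 3-label vocabulary (index 0 is the blank/start label).
table = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]

# Two different histories that end in the same label give the same output.
out_a = stateless_prediction(table, [1, 2])
out_b = stateless_prediction(table, [2, 1, 1, 2])
```

Because the output is a function of one label, beam-search hypotheses that end in the same label can share the prediction network result, which is one way the architecture and inference become simpler.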
Favorable Propagation and Linear Multiuser Detection for Distributed Antenna Systems
R. Gholami, L. Cottatellucci, D. Slock
Cell-free MIMO, employing distributed antenna systems (DAS), is a promising approach to deal with the capacity crunch of next-generation wireless communications. In this paper, we consider a wireless network with transmit and receive antennas distributed according to homogeneous point processes. The received signals are jointly processed at a central processing unit. We study whether the favorable propagation properties, which enable almost optimal low-complexity detection via matched filtering in massive MIMO systems, hold for DAS with line-of-sight (LoS) channels and a general attenuation exponent. Making use of Euclidean random matrices (ERM) and their moments, we show that the analytical conditions for favorable propagation are not satisfied. Hence, we propose multistage detectors, of which the matched filter represents the initial stage. We show that polynomial expansion detectors and multistage Wiener filters coincide in DAS and substantially outperform matched filtering. Simulation results are presented that validate the analytical results.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053449, pp. 5190-5194, published 2020-05-01
Citations: 5
High-Accuracy Classification of Attention Deficit Hyperactivity Disorder with L2,1-Norm Linear Discriminant Analysis
Yibin Tang, Xufei Li, Ying Chen, Y. Zhong, A. Jiang, Xiaofeng Liu
Attention Deficit Hyperactivity Disorder (ADHD) is a high-incidence neurobehavioral disease in school-age children, and its neurobiological classification is meaningful for clinicians. Existing ADHD classification methods suffer from two problems: insufficient data and noise disturbance. Here, a high-accuracy classification method is proposed that uses brain Functional Connectivity (FC) as the material for ADHD feature analysis. Specifically, we introduce a binary hypothesis testing framework as the classification outline to cope with the insufficient data of the ADHD database. Under the binary hypotheses, the FCs of test data are allowed to be used for training and thus affect the subspace learning of the training data. To overcome noise disturbance, an ℓ2,1-norm LDA model is adopted to robustly learn ADHD features in subspaces. The subspace energies of the training data under the binary hypotheses are then calculated, and an energy-based comparison is finally performed to identify ADHD individuals. On the ADHD-200 database, experiments show that our method outperforms other state-of-the-art methods, with an average accuracy of 97.6%.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053391, pp. 1170-1174, published 2020-05-01
Citations: 2
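For readers unfamiliar with the ℓ2,1-norm mentioned in the abstract above, the Python sketch below computes it under the common convention of summing the Euclidean norms of the rows of a matrix. The row-wise convention, the function names, and the toy matrix are assumptions for illustration; the paper's LDA model itself is not reproduced here.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean (l2) norms of the rows of a matrix."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def squared_frobenius_norm(matrix):
    """Sum of all squared entries, shown for comparison."""
    return sum(v * v for row in matrix for v in row)


# Hypothetical matrix whose last row plays the role of a noisy sample.
data = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
robust = l21_norm(data)                  # 5 + 0 + 13 = 18.0
quadratic = squared_frobenius_norm(data)  # 25 + 0 + 169 = 194.0
```

A large row contributes linearly to the ℓ2,1-norm but quadratically to the squared Frobenius norm, which is the usual reason ℓ2,1-based objectives are considered less sensitive to noisy samples.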
Redundant Convolutional Network With Attention Mechanism For Monaural Speech Enhancement
Tian Lan, Yilan Lyu, Guoqiang Hui, Refuoe Mokhosi, Sen Li, Qiao Liu
The redundant convolutional encoder-decoder network has proven useful in speech enhancement tasks. It can capture localized time-frequency details of speech signals through both the fully convolutional network structure and the feature selection capability resulting from the encoder-decoder mechanism. However, it does not explicitly consider the signal filtering mechanism, which we regard as important for speech enhancement models. In this study, we introduce an attention mechanism into the convolutional encoder-decoder model. This mechanism adaptively filters channel-wise feature responses by explicitly modeling attentions (on speech versus noise signals) between channels. Experimental results show that the proposed attention model is effective in capturing speech signals from background noise, and performs especially well in unseen noise conditions compared to other state-of-the-art models.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053277, ICASSP 2020, pp. 6654-6658, published 2020-05-01
Citations: 2