
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Preconditioned Ghost Imaging Via Sparsity Constraint
Zhishen Tong, Jian Wang, Shensheng Han
Ghost imaging via sparsity constraint (GISC) can recover objects from the intensity fluctuation of light fields at a sampling rate far below the Nyquist rate. However, its imaging quality may degrade severely when the coherence of sampling matrices is large. To deal with this issue, we propose an efficient recovery algorithm for GISC called the preconditioned multiple orthogonal least squares (PmOLS). Our algorithm consists of two major parts: i) the pseudo-inverse preconditioning (PIP) method refining the coherence of sampling matrices and ii) the multiple orthogonal least squares (mOLS) algorithm recovering the objects. Theoretical analysis shows that PmOLS recovers any n-dimensional K-sparse signal from m random linear samples of the signal with probability exceeding $1 - 3{n^2}{e^{ - cm/{K^2}}}$. Simulations and experiments demonstrate that PmOLS has competitive imaging quality compared to the state-of-the-art approaches.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053414 | Pages: 1484-1488 | Published: 2020-05-01
Citations: 2
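The preconditioning idea in the PmOLS abstract above can be illustrated with a small numerical sketch. It assumes the preconditioner is the Moore-Penrose pseudo-inverse of the sampling matrix, which may differ from the paper's exact PIP construction. The matrix sizes, the uniform non-negative entries standing in for light intensities, and the helper name `mutual_coherence` are illustrative choices, not details from the paper.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct columns of A."""
    cols = A / np.linalg.norm(A, axis=0, keepdims=True)
    gram = np.abs(cols.T @ cols)
    np.fill_diagonal(gram, 0.0)  # ignore each column's product with itself
    return float(gram.max())

rng = np.random.default_rng(0)
m, n = 64, 128                      # hypothetical sizes: m samples, n unknowns
A = rng.uniform(0.0, 1.0, (m, n))   # non-negative entries mimic intensity patterns

P = np.linalg.pinv(A)               # assumed preconditioner, shape (n, m)
A_pre = P @ A                       # preconditioned sampling matrix, shape (n, n)

mu_before = mutual_coherence(A)
mu_after = mutual_coherence(A_pre)
print(f"coherence before: {mu_before:.3f}, after: {mu_after:.3f}")
```

In a recovery pipeline the same matrix `P` would also be applied to the measurement vector, so that a sparse solver such as mOLS works with `P @ y` and `A_pre` instead of `y` and `A`.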
End-end Speech-to-Text Translation with Modality Agnostic Meta-Learning
S. Indurthi, HyoJung Han, Nikhil Kumar Lakumarapu, Beomseok Lee, Insoo Chung, Sangha Kim, Chanwoo Kim
Collecting large amounts of data to train end-to-end Speech Translation (ST) models is more difficult than for the ASR and MT tasks. Previous studies have proposed transfer learning approaches to overcome this difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from the source tasks (ASR and MT) to the target task (ST), for which data is severely lacking. In the meta-learning phase, parameters are updated in such a way that they act as a good initialization point for the target ST task. We evaluate the proposed meta-learning approach for ST tasks on English-German (En-De) and English-French (En-Fr) language pairs from the Multilingual Speech Translation Corpus (MuST-C). Our method outperforms the previous transfer learning approaches and sets new state-of-the-art results for En-De and En-Fr ST tasks by obtaining 9.18 and 11.76 BLEU point improvements, respectively.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054759 | Pages: 7904-7908 | Published: 2020-05-01
Citations: 31
Indoor Altitude Estimation of Unmanned Aerial Vehicles Using a Bank of Kalman Filters
Liu Yang, Hechuan Wang, Yousef El-Laham, J. Fonte, David Trillo Pérez, M. Bugallo
Altitude estimation is important for successful control and navigation of unmanned aerial vehicles (UAVs). UAVs do not have indoor access to GPS signals and can only use on-board sensors for reliable estimation of altitude. Unfortunately, most existing navigation schemes are not robust to the presence of abnormal obstructions above and below the UAV. In this work, we propose a novel strategy for tackling the altitude estimation problem that utilizes multiple model adaptive estimation (MMAE), where the candidate models correspond to four scenarios: no obstacles above and below the UAV; obstacles above the UAV; obstacles below the UAV; and obstacles above and below the UAV. The principle of Occam’s razor ensures that the model that offers the most parsimonious explanation of the sensor data has the most influence in the MMAE algorithm. We validate the proposed scheme on synthetic and real sensor data.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054203 | Pages: 5455-5459 | Published: 2020-05-01
Citations: 3
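The model-weighting step of multiple model adaptive estimation described in the abstract above can be sketched as follows. This is a generic textbook-style MMAE update for scalar measurements, not the authors' implementation. The function names `mmae_update` and `mmae_estimate` and all example numbers are hypothetical, and each Kalman filter in the bank is assumed to report its innovation (measurement residual) and innovation variance.

```python
import numpy as np

def mmae_update(weights, residuals, innov_vars):
    """Bayesian update of model probabilities from per-filter innovations.

    Each filter's Gaussian innovation likelihood rescales its prior weight;
    the result is normalized so the weights sum to one.
    """
    w = np.asarray(weights, dtype=float)
    r = np.asarray(residuals, dtype=float)
    s = np.asarray(innov_vars, dtype=float)
    likelihood = np.exp(-0.5 * r**2 / s) / np.sqrt(2.0 * np.pi * s)
    posterior = w * likelihood
    return posterior / posterior.sum()

def mmae_estimate(weights, estimates):
    """Fused estimate: probability-weighted mean of the filters' estimates."""
    return float(np.dot(weights, estimates))

# Hypothetical bank of four filters (no obstacle, above, below, both).
prior = np.full(4, 0.25)
residuals = np.array([0.1, 1.0, 2.0, 3.0])   # innovation of each filter
innov_vars = np.full(4, 0.5)                 # innovation variance of each filter
altitudes = np.array([1.50, 1.62, 1.38, 1.70])

posterior = mmae_update(prior, residuals, innov_vars)
fused = mmae_estimate(posterior, altitudes)
print(posterior.round(3), round(fused, 3))
```

The filter whose prediction best matches the sensor data receives the largest weight, so it dominates the fused altitude.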
On the Effect of Reflectance on Phasor Field Non-Line-of-Sight Imaging
Ibón Guillén, Xiaochun Liu, A. Velten, D. Gutierrez, A. Jarabo
Non-line-of-sight (NLOS) imaging aims to visualize occluded scenes by exploiting indirect reflections on visible surfaces. Previous methods approach this problem by inverting the light transport on the hidden scene, but are limited to isolated, diffuse objects. The recently introduced phasor fields framework computationally poses NLOS reconstruction as a virtual line-of-sight (LOS) problem, lifting most assumptions about the hidden scene. In this work we complement recent theoretical analysis of phasor field-based reconstruction, by empirically analyzing the effect of reflectance of the hidden scenes on reconstruction. We experimentally study the reconstruction of hidden scenes composed of objects with increasingly specular materials. Then, we evaluate the effect of the virtual aperture size on the reconstruction, and establish connections between the effect of these two different dimensions on the results. We hope our analysis helps to characterize the imaging capabilities of this promising new framework, and foster new NLOS imaging modalities.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9052985 | Pages: 9269-9273 | Published: 2020-05-01
Citations: 5
DGAN: Disentangled Representation Learning for Anisotropic BRDF Reconstruction
Zhongyun Hu, Xue Wang, Qing Wang
Accurate reconstruction of real-world materials’ appearance from a very limited number of samples is still a huge challenge in computer vision and graphics. In this paper, we present a novel deep architecture, Disentangled Generative Adversarial Network (DGAN), which performs anisotropic Bidirectional Reflectance Distribution Function (BRDF) reconstruction from a single BRDF subspace with the maximum entropy. In contrast to previous approaches that directly map known samples to a full BRDF using a CNN, disentangled representation learning is applied to guide the reconstruction process. In order to learn different physical factors of the BRDF, the generator of the DGAN mainly consists of a Fresnel estimator module (FEM) and a directional module (DM). Considering that the entropy of different BRDF subspaces varies, we further divide the BRDF into He-BRDF and Le-BRDF to reconstruct the interior part and the exterior part of the directional factor. Experimental results show that our approach outperforms state-of-the-art methods.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054095 | Pages: 4397-4401 | Published: 2020-05-01
Citations: 0
Robust Global Optimized Affine Registration Method for Microscopic Images of Biological Tissue
Yanan Lv, Xi Chen, Chang Shu, Hua Han
Affine registration can fit the non-rigid deformation of slices effectively and is widely used in volume reconstruction of biological tissue. However, most existing affine registration methods register images in a given sequence, which leads to the accumulation of errors. In this paper, a globally optimized affine registration method is proposed for volume reconstruction. To eliminate the cumulative error, the affine transformations of all images are estimated simultaneously based on an energy function. A soft penalty on the affine transformation is added to restrict the shearing of images. Experiments show that our method provides more reliable registration results than sequential affine registration and solves the problems caused by the accumulation of errors. The registration result fits the deformation of slices well and preserves the rigidity of images.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9054568 | Pages: 1070-1074 | Published: 2020-05-01
Citations: 3
Time-Frequency Loss for CNN Based Speech Super-Resolution
Heming Wang, Deliang Wang
Speech super-resolution (SR), also called speech bandwidth extension (BWE), aims to increase the sampling rate of a given lower resolution speech signal. Recent years have witnessed the successful application of deep neural networks in the time or frequency domain, and deep learning has improved performance considerably compared with conventional approaches. This paper proposes an autoencoder-based fully convolutional neural network (CNN) that merges the information from both time and frequency domains. At training time, we optimize the CNN using a new time-frequency loss (T-F loss), which combines a time domain loss and a frequency domain loss. The experimental results show that our model trained with the T-F loss achieves significantly better results than other state-of-the-art models, and yields balanced performance in terms of time and frequency metrics.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053712 | Pages: 861-865 | Published: 2020-05-01
Citations: 17
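A loss that combines a time domain term and a frequency domain term, as the abstract above describes, might look like the following sketch. It assumes mean squared error in both domains, STFT magnitudes for the frequency term, and a weighting factor `alpha`; the paper's exact loss terms and weights are not given in the abstract, and the names `stft_mag` and `tf_loss` are hypothetical.

```python
import numpy as np

def stft_mag(x, frame_len=256, hop=128):
    """Magnitude spectrogram from Hann-windowed frames (len(x) >= frame_len)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def tf_loss(pred, target, alpha=0.5):
    """Weighted sum of a waveform MSE and a spectrogram-magnitude MSE."""
    time_loss = np.mean((pred - target) ** 2)
    freq_loss = np.mean((stft_mag(pred) - stft_mag(target)) ** 2)
    return float(alpha * time_loss + (1.0 - alpha) * freq_loss)

# Illustrative signals: two sine tones sampled at 16 kHz.
t = np.arange(1024) / 16000.0
x = np.sin(2 * np.pi * 440.0 * t)
y = np.sin(2 * np.pi * 880.0 * t)

loss_same = tf_loss(x, x)
loss_diff = tf_loss(x, y)
print(loss_same, loss_diff)
```

For training a network the same expression would be written in an automatic-differentiation framework so that gradients flow through both terms.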
Generalized Spatial Modulation for Wireless Terabits Systems Under Sub-THz Channel With RF Impairments
Majed Saad, F. Bader, A. Ghouwayel, Hussein Hijazi, Nizar Bouhel, J. Palicot
The Multiple-Input Multiple-Output (MIMO) technique with Index Modulation (IM) over sub-TeraHertz (sub-THz) bands represents a promising solution for designing new wireless ultra-high data rate systems. However, system design over sub-THz bands suffers from many technological limitations and severe RF impairments, such as low output power, limited resolution of high-speed low-power Analog-to-Digital Converters, and significant Phase Noise (PN) introduced by the Local Oscillator (LO). In this paper, different modulation schemes with Generalized Spatial Modulation (GSM) are compared from different perspectives while considering the sub-THz impairments. The effect of PN is investigated for these modulation schemes in sub-THz channels using uniform linear and rectangular antenna arrays. The results reveal that the QPSK-GSM system is the best combination compared to GSM systems with any other M-ary modulation scheme (e.g. PSK, DPSK, QAM, PAM). Compared to DQPSK-GSM and 4PAM-GSM at 12 bpcu with the same number of receive and activated transmit antennas, the QPSK-GSM system offers a gain ranging from 3.4 dB up to 5 dB. The results also reveal that low to medium residual PN in a distributed oscillator architecture can be tolerated when using GSM-QPSK without phase noise mitigation. This makes GSM a promising candidate for ultra-high wireless data rate communication in sub-THz bands.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053208 | Pages: 5135-5139 | Published: 2020-05-01
Citations: 10
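The "12 bpcu" figure in the abstract above counts bits per channel use. A small sketch of the usual GSM spectral-efficiency count follows: spatial bits from the choice of active antenna combination plus symbol bits on each active antenna. The antenna configuration in the example is a hypothetical one that happens to reach 12 bpcu; the abstract does not state the configuration the authors used, and the function name is illustrative.

```python
from math import comb

def gsm_bits_per_channel_use(n_tx, n_active, mod_order):
    """Bits per channel use for GSM with an M-ary constellation.

    spatial bits = floor(log2(C(n_tx, n_active)))
    symbol bits  = n_active * log2(mod_order), with mod_order a power of two
    """
    spatial_bits = comb(n_tx, n_active).bit_length() - 1
    symbol_bits = n_active * (mod_order.bit_length() - 1)
    return spatial_bits + symbol_bits

# Hypothetical setup: 9 transmit antennas, 3 active, QPSK (M = 4).
bpcu = gsm_bits_per_channel_use(n_tx=9, n_active=3, mod_order=4)
print(bpcu)  # 6 spatial bits + 6 symbol bits = 12
```

Because part of the information is carried by the antenna indices, a low-order constellation such as QPSK can reach the same rate that would otherwise require a denser constellation.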
Gaussian LPCNet for Multisample Speech Synthesis
Vadim Popov, M. Kudinov, T. Sadekova
The LPCNet vocoder has recently been presented to the TTS community and is gaining popularity due to its effectiveness and the high quality of the speech it synthesizes. In this work, we present a modification of LPCNet that is 1.5x faster, has half as many non-zero parameters, and synthesizes speech of the same quality. This enhancement is possible mostly due to two features that we introduce into the original architecture: the proposed vocoder is designed to generate a 16-bit signal instead of an 8-bit µ-companded signal, and it predicts two consecutive excitation values at a time independently of each other. To show that these modifications do not lead to quality degradation, we train models for five different languages and perform extensive human evaluation.
DOI: https://doi.org/10.1109/ICASSP40776.2020.9053337 | Pages: 6204-6208 | Published: 2020-05-01
Citations: 11
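For readers unfamiliar with the 8-bit µ-companded signal mentioned in the abstract above, the standard µ-law companding curve with µ = 255 can be sketched as below. This shows only the generic companding formula, not LPCNet's code; the function names and the uniform 256-level quantizer are illustrative assumptions.

```python
import numpy as np

def mulaw_encode(x, mu=255):
    """Compress samples in [-1, 1] and quantize to integer codes 0..mu."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((y + 1.0) / 2.0 * mu).astype(np.int64)

def mulaw_decode(q, mu=255):
    """Map integer codes 0..mu back to samples in [-1, 1]."""
    y = 2.0 * q.astype(float) / mu - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 1001)
codes = mulaw_encode(x)
x_hat = mulaw_decode(codes)
max_err = float(np.abs(x_hat - x).max())
print(codes.min(), codes.max(), max_err)
```

The quantization error is largest for loud samples, which is the trade-off a 16-bit output avoids at the cost of a much larger set of possible output values.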
Robust Hybrid Beamforming for Satellite-Terrestrial Integrated Networks
Zhi Lin, Min Lin, B. Champagne, Wei-Ping Zhu, N. Al-Dhahir
In this paper, we propose a novel robust downlink beamforming (BF) design for satellite-terrestrial integrated networks. Under a realistic assumption that the angular information of eavesdroppers is not perfectly known, we establish an optimization framework for hybrid BF at the terrestrial base station and digital BF at the satellite to maximize the secrecy-energy efficiency of the system, while satisfying the quality-of-service constraints of both earth station and cellular user. Since the formulated optimization problem is mathematically intractable, we present an iterative algorithm based on the Charnes-Cooper approach to optimize the BF weight vectors. The effectiveness and superiority of the proposed robust hybrid BF scheme are validated via computer simulations.
DOI: 10.1109/ICASSP40776.2020.9053756 (https://doi.org/10.1109/ICASSP40776.2020.9053756). Published 2020-05-01 in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8792-8796.
Citations: 4