
Latest publications: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Full-Duplex Multifunction Transceiver with Joint Constant Envelope Transmission and Wideband Reception
Jaakko Marin, Micael Bernhardt, T. Riihonen
This paper introduces and justifies a novel system concept that consists of full-duplex transceivers and uses a multifunction signal for simultaneous two-way communication, jamming, and sensing tasks. The proposed device structure and waveform enable simple-yet-effective interference suppression at the cost of being limited to constant-envelope transmission. This is a weakness only for the communication functionality, which becomes limited to frequency-shift keying (FSK), while frequency-modulated continuous wave (FMCW) waveforms are effective for jamming and sensing purposes. We show how the transmission and reception, as well as different interference and distortion compensation procedures, are implemented in such multifunction transceivers. The system could also be applied for simultaneous spectrum monitoring alongside the above functions. Finally, we showcase the expected performance of such a system through numerical results.
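The constant-envelope FSK-over-FMCW idea can be illustrated with a minimal sketch: each data bit offsets the start frequency of a unit-modulus linear chirp, so communication rides on the sweep without disturbing its envelope. The function name and all parameter values (`fs`, `bw`, `fsk_dev`, ...) are illustrative assumptions, not the paper's actual waveform design.

```python
import numpy as np

def fsk_fmcw_chirp(bits, fs=1e6, sweep_time=1e-3, f0=0.0, bw=100e3, fsk_dev=10e3):
    """One constant-envelope FMCW sweep per bit; each bit shifts the sweep's
    start frequency by +/- fsk_dev (hypothetical parameterization)."""
    n = int(fs * sweep_time)
    t = np.arange(n) / fs
    chirps = []
    for b in bits:
        fstart = f0 + (fsk_dev if b else -fsk_dev)
        # instantaneous phase of a linear sweep from fstart to fstart + bw
        phase = 2 * np.pi * (fstart * t + 0.5 * (bw / sweep_time) * t**2)
        chirps.append(np.exp(1j * phase))   # unit modulus -> constant envelope
    return np.concatenate(chirps)

sig = fsk_fmcw_chirp([1, 0, 1])
```

Because the signal is a complex exponential of a real phase, its envelope is exactly constant, which is the property the transceiver's interference suppression relies on.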
DOI: 10.1109/ICASSP39728.2021.9413725
Citations: 4
UTDN: An Unsupervised Two-Stream Dirichlet-Net for Hyperspectral Unmixing
Qiwen Jin, Yong Ma, Xiaoguang Mei, Hao Li, Jiayi Ma
Recently, learning-based methods have received much attention in unsupervised hyperspectral unmixing, yet their ability to extract physically meaningful endmembers remains limited and their performance has not been satisfactory. In this paper, we propose a novel two-stream Dirichlet-net, termed uTDN, to address these problems. The weight-sharing architecture makes it possible to transfer the intrinsic properties of the endmembers during the unmixing process, which helps steer the network toward a more accurate and interpretable unmixing solution. Besides, the stick-breaking process is adopted to encourage the latent representation to follow a Dirichlet distribution, so that the physical properties of the estimated abundances can be naturally incorporated. Extensive experiments on both synthetic and real hyperspectral data demonstrate that the proposed uTDN can outperform other state-of-the-art approaches.
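The stick-breaking process mentioned above is a standard construction; a minimal numpy sketch (not the authors' network code) shows how it maps latent variables in (0,1) to abundances that are automatically nonnegative and sum to one, which is the physical constraint on abundances in unmixing.

```python
import numpy as np

def stick_breaking(v):
    """Map v_1..v_{K-1} in (0,1) to K abundances on the simplex:
    a_k = v_k * prod_{j<k}(1 - v_j); the last entry takes the remaining stick."""
    v = np.asarray(v, dtype=float)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    a = np.concatenate((v, [1.0])) * remaining
    return a

abund = stick_breaking([0.5, 0.4, 0.3])  # 4 abundances from 3 latent variables
```

In the network this transform is applied to the encoder's latent variables, so the simplex constraint holds by construction rather than by an added penalty term.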
DOI: 10.1109/ICASSP39728.2021.9414810
Citations: 1
Detecting Alzheimer’s Disease from Speech Using Neural Networks with Bottleneck Features and Data Augmentation
Zhaoci Liu, Zhiqiang Guo, Zhenhua Ling, Yunxia Li
This paper presents a method of detecting Alzheimer’s disease (AD) from the spontaneous speech of subjects in a picture description task using neural networks. The method does not rely on manual transcriptions and annotations of a subject’s speech, but instead utilizes bottleneck features extracted from audio using an ASR model. The neural network contains convolutional neural network (CNN) layers for local context modeling, bidirectional long short-term memory (BiLSTM) layers for global context modeling, and an attention pooling layer for classification. Furthermore, a masking-based data augmentation method is designed to deal with the data scarcity problem. Experiments on the DementiaBank dataset show that the detection accuracy of the proposed method is 82.59%, which is better than a baseline based on manually designed acoustic features and support vector machines (SVM), and achieves state-of-the-art performance for detecting AD using only audio data on this dataset.
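The attention pooling layer is the step that turns a variable-length sequence of frame-level features into one fixed-size utterance vector for classification. A minimal numpy sketch of that operation (the attention parameter `w` would be learned; here it is just a random vector, an assumption for illustration):

```python
import numpy as np

def attention_pool(H, w):
    """H: (T, d) frame-level features; w: (d,) attention parameter.
    Softmax over per-frame scores gives weights; output is the weighted mean."""
    scores = H @ w                 # (T,) one relevance score per frame
    scores -= scores.max()         # numerical stability for the softmax
    alpha = np.exp(scores)
    alpha /= alpha.sum()           # attention weights, nonnegative, sum to 1
    return alpha @ H               # (d,) utterance-level embedding

rng = np.random.default_rng(0)
H = rng.normal(size=(50, 8))       # e.g. 50 frames of 8-dim bottleneck features
w = rng.normal(size=8)
utt = attention_pool(H, w)
```

Because the weights form a convex combination, the pooled vector stays inside the range of the frame features while emphasizing the frames the model scores as most informative.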
DOI: 10.1109/ICASSP39728.2021.9413566
Citations: 11
Decomposing Textures using Exponential Analysis
Yuan Hou, A. Cuyt, Wen-shin Lee, Deepayan Bhowmik
Decomposition is integral to most image processing algorithms and often required in texture analysis. We present a new approach using a recent 2-dimensional exponential analysis technique. Exponential analysis offers the advantage of sparsity in the model and continuity in the parameters. This results in a much more compact representation of textures when compared to traditional Fourier or wavelet transform techniques. Our experiments include synthetic as well as real texture images from standard benchmark datasets. The results outperform FFT in representing texture patterns with significantly fewer terms while retaining RMSE values after reconstruction. The underlying periodic complex exponential model works best for texture patterns that are homogeneous. We demonstrate the usefulness of the method in two common vision processing application examples, namely texture classification and defect detection.
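The paper's 2-dimensional technique is not reproduced here, but the core of exponential analysis can be sketched in 1-D with the classic Prony step: a signal that is a sum of K exponentials satisfies a K-term linear recurrence, and the roots of the prediction polynomial are the exponential bases. A minimal sketch under that assumption:

```python
import numpy as np

def prony_bases(x, K):
    """Recover K exponential bases z_k from samples of
    x[n] = sum_k c_k z_k^n via linear prediction (classic Prony step)."""
    N = len(x)
    # recurrence x[n] = a_1 x[n-1] + ... + a_K x[n-K] as a least-squares system
    A = np.column_stack([x[K - i - 1:N - i - 1] for i in range(K)])
    b = x[K:N]
    a = np.linalg.lstsq(A, b, rcond=None)[0]
    # prediction polynomial z^K - a_1 z^{K-1} - ... - a_K
    return np.roots(np.concatenate(([1.0 + 0j], -a)))

n = np.arange(32)
z_true = np.exp(1j * 2 * np.pi * np.array([0.1, 0.27]))
x = z_true[0] ** n + 0.5 * z_true[1] ** n       # two-term exponential model
z_est = prony_bases(x, 2)
```

The sparsity advantage the abstract mentions is visible here: two parameters describe the signal exactly, where an FFT would spread the same content over many bins.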
DOI: 10.1109/ICASSP39728.2021.9413909
Citations: 1
Periodic Signal Denoising: An Analysis-Synthesis Framework Based on Ramanujan Filter Banks and Dictionaries
Pranav Kulkarni, P. Vaidyanathan
Ramanujan filter banks (RFB) have in the past been used to identify periodicities in data. These are analysis filter banks with no synthesis counterpart for perfect reconstruction of the original signal, so they have not been useful for denoising periodic signals. This paper proposes to use a hybrid analysis-synthesis framework for denoising discrete-time periodic signals. The synthesis occurs via a pruned dictionary designed based on the output energies of the RFB analysis filters. A unique property of the framework is that the denoised output signal is guaranteed to be periodic unlike any of the other methods. For a large range of input noise levels, the proposed approach achieves a stable and high SNR gain outperforming many traditional denoising techniques.
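Ramanujan filter banks are built from Ramanujan sums c_q(n), which are integer-valued, period-q sequences. A minimal sketch computing them directly from the definition (the sum over residues coprime with q), not the filter bank itself:

```python
from math import gcd, cos, pi

def ramanujan_sum(q, n):
    """c_q(n) = sum over 1 <= k <= q with gcd(k, q) = 1 of exp(2*pi*i*k*n/q).
    The imaginary parts cancel in conjugate pairs, so a real cosine sum suffices;
    rounding removes floating-point dust from the integer-valued result."""
    return round(sum(cos(2 * pi * k * n / q) for k in range(1, q + 1)
                     if gcd(k, q) == 1), 9)

# one period of c_3: [2, -1, -1], repeating with period 3
c3 = [ramanujan_sum(3, n) for n in range(3)]
```

An analysis filter tuned to period q correlates the input with such a sequence; the energy at its output is what the paper uses to prune the synthesis dictionary.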
DOI: 10.1109/ICASSP39728.2021.9413689
Citations: 3
A Large-Scale Chinese Long-Text Extractive Summarization Corpus
Kai Chen, Guanyu Fu, Qingcai Chen, Baotian Hu
Recently, large-scale datasets have vastly facilitated development in nearly all domains of Natural Language Processing. However, the lack of a large-scale Chinese corpus remains a critical bottleneck for further research on deep text summarization methods. In this paper, we publish a large-scale Chinese Long-text Extractive Summarization corpus named CLES. The CLES contains about 104K pairs, originally collected from Sina Weibo. To verify the quality of the corpus, we also manually tagged the relevance scores of 5,000 pairs. Our benchmark models on the proposed corpus include conventional deep-learning-based extractive models and several pre-trained BERT-based algorithms. Their performances are reported and briefly analyzed to facilitate further research on the corpus. We will release the corpus for further research.
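For readers unfamiliar with the task format, extractive summarization selects a subset of a document's sentences. A deliberately trivial frequency-based scorer, a hypothetical strawman baseline and not one of the paper's benchmark models, illustrates the input/output contract such models share:

```python
from collections import Counter

def extract_summary(sentences, k=2):
    """sentences: list of tokenized sentences (lists of words).
    Score each sentence by its mean document-level word frequency and
    return the top-k sentences in their original order."""
    tf = Counter(w for s in sentences for w in s)
    scored = [(sum(tf[w] for w in s) / max(len(s), 1), i)
              for i, s in enumerate(sentences)]
    top = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return [sentences[i] for _, i in top]

doc = [["a", "b"], ["c"], ["a", "a"]]
summary = extract_summary(doc)
```

The corpus's (document, extracted-sentences) pairs supervise exactly this selection step, replacing the frequency heuristic with a learned scorer.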
DOI: 10.1109/ICASSP39728.2021.9414946
Citations: 3
Drawing Order Recovery from Trajectory Components
Minghao Yang, Xukang Zhou, Yangchang Sun, Jinglong Chen, Baohua Qiang
Despite being widely discussed, drawing order recovery (DOR) from static images remains a challenging task. Based on the idea that drawing trajectories can be recovered by connecting their trajectory components in the correct order, this work proposes a novel DOR method for static images. The method contains two steps: first, we adopt a convolutional neural network (CNN) to predict the next possible drawing components, which converts the components in images into reasonable sequences; we denote this architecture as Im2Seq-CNN. Second, considering that errors may exist in the sequences generated by the first step, we construct a sequence-to-order structure (Seq2Order) to adjust the sequences into the correct orders. The main contributions include: (1) the Im2Seq-CNN step considers DOR over components instead of tracing traditional pixels one by one along trajectories, which maps static images to component sequences; (2) the Seq2Order step adopts image position codes instead of traditional point coordinates in its encoder-decoder gated recurrent neural network (GRU-RNN). The proposed method is evaluated on two well-known open handwriting databases and yields robust, competitive results on handwriting DOR tasks compared with the state of the art.
DOI: 10.1109/ICASSP39728.2021.9413542
Citations: 1
A Structure-Guided and Sparse-Representation-Based 3D Seismic Inversion Method
B. She, Yaojun Wang, Guang Hu
Existing seismic inversion methods are usually 1D, mainly focusing on improving the vertical resolution of inversion results. The few 2D or 3D inversion techniques are either too simple, lacking consideration of stratigraphic structures, or too complicated, requiring the extraction of dip information and the solution of a complex constrained optimization problem. In this work, with the help of gradient structure tensor (GST) and dictionary learning and sparse representation (DLSR) technologies, we propose a 3D inversion approach (GST-DLSR) that considers both vertical and horizontal structural constraints. In the vertical direction, we investigate the vertical structural features of subsurface models from well-log data by DLSR. In the horizontal direction, we obtain stratigraphic structural features from a 3D seismic image by GST. We then apply the acquired structural features to constrain the entire inversion procedure. Experiments show that GST-DLSR takes advantage of both techniques, producing inversion results with high resolution, good lateral continuity, and enhanced structural features.
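The gradient structure tensor is a standard construction: smoothed outer products of the image gradient, whose eigenstructure encodes local stratigraphic orientation. A minimal 2-D numpy sketch under simplifying assumptions (box smoothing rather than the Gaussian typically used, and 2-D rather than the paper's 3-D volumes):

```python
import numpy as np

def structure_tensor(img, win=5):
    """Per-pixel gradient structure tensor with simple box smoothing.
    Returns the smoothed components Jxx, Jxy, Jyy."""
    gy, gx = np.gradient(img.astype(float))   # derivatives along rows, columns
    k = np.ones(win) / win
    def smooth(a):                            # separable box filter
        a = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, a)
        return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, a)
    return smooth(gx * gx), smooth(gx * gy), smooth(gy * gy)

# vertical stripes vary along x only, so the energy concentrates in Jxx
img = np.tile(np.sin(np.linspace(0, 8 * np.pi, 64)), (64, 1))
Jxx, Jxy, Jyy = structure_tensor(img)
```

The eigenvector of [[Jxx, Jxy], [Jxy, Jyy]] with the smaller eigenvalue points along the local layering; it is this orientation field that guides the horizontal constraint in the inversion.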
DOI: 10.1109/ICASSP39728.2021.9415071
Citations: 0
Evolving Quantized Neural Networks for Image Classification Using A Multi-Objective Genetic Algorithm
Yong Wang, Xiaojing Wang, Xiaoyu He
Recently, many model quantization approaches have been investigated to reduce the model size and improve the inference speed of convolutional neural networks (CNNs). However, these approaches usually lead to a decrease in classification accuracy. To address this problem, this paper proposes a mixed-precision quantization method combined with channel expansion of CNNs using a multi-objective genetic algorithm, called MOGAQNN. In MOGAQNN, each individual in the population encodes a mixed-precision quantization policy and a channel expansion policy. During the evolution process, the two policies are optimized simultaneously by the non-dominated sorting genetic algorithm II (NSGA-II). Finally, we choose the best individual in the last population and evaluate its performance on the test set as the final performance. Experimental results for five popular CNNs on two benchmark datasets demonstrate that MOGAQNN can greatly reduce the model size while improving the classification accuracy.
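The core of NSGA-II's selection is non-dominated sorting: ranking candidates by Pareto dominance over the competing objectives (here, model size versus classification error). A minimal sketch extracting the first Pareto front, with hypothetical objective values, not the paper's actual search:

```python
def pareto_front(points):
    """Return the points not dominated by any other, minimizing every objective.
    p dominates q if p <= q in all objectives and p < q in at least one."""
    def dominates(p, q):
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# hypothetical (model size in MB, error rate) candidates from one generation
front = pareto_front([(4.0, 0.30), (2.0, 0.35), (3.0, 0.32),
                      (2.5, 0.40), (3.5, 0.31)])
```

In the full algorithm, each front is assigned a rank and crowding distances break ties within a front, so the population converges toward a spread of size/accuracy trade-offs rather than a single compromise.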
DOI: 10.1109/ICASSP39728.2021.9413519
Citations: 0
An Investigation of Using Hybrid Modeling Units for Improving End-to-End Speech Recognition System
Shunfei Chen, Xinhui Hu, Sheng Li, Xinkang Xu
The acoustic modeling unit is crucial for an end-to-end speech recognition system, especially for Mandarin. Until now, most studies on Mandarin speech recognition have focused on individual units, and few have paid attention to using a combination of these units. This paper uses a hybrid of syllables, Chinese characters, and subwords as the modeling units for an end-to-end speech recognition system based on CTC/attention multi-task learning. In this approach, the character-subword unit is assigned to train the transformer model in the main task learning stage, while the syllable unit is assigned to enhance the transformer’s shared encoder in the auxiliary task stage with the Connectionist Temporal Classification (CTC) loss function. Recognition experiments were conducted on AISHELL-1 and on an open 1200-hour Mandarin speech corpus collected from OpenSLR. The experimental results demonstrate that the syllable-char-subword hybrid modeling unit achieves better performance than the conventional char-subword units, with a 6.6% relative CER reduction on the 1200-hour data. The substitution error is also considerably reduced.
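The auxiliary CTC objective scores a label sequence by summing over all frame-level alignments that collapse to it. A minimal numpy sketch of the standard CTC forward (alpha) recursion, kept to probabilities rather than log space for readability; this is the textbook algorithm, not the authors' training code:

```python
import numpy as np

def ctc_forward(probs, labels, blank=0):
    """probs: (T, C) per-frame posteriors; labels: target id sequence.
    Standard CTC alpha recursion over the blank-interleaved label string."""
    ext = [blank]
    for l in labels:
        ext += [l, blank]                 # e.g. [_, a, _, b, _]
    T, S = probs.shape[0], len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]
    alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s > 0:
                a += alpha[t - 1, s - 1]
            # skip transition over a blank between two different labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    return alpha[T - 1, S - 1] + alpha[T - 1, S - 2]

# toy check: T=2 frames, classes {blank, a}, target "a";
# paths (a,a), (a,_), (_,a) give 0.2 + 0.2 + 0.3 = 0.7
p = np.array([[0.6, 0.4], [0.5, 0.5]])
prob_a = ctc_forward(p, [1])
```

In the multi-task setup, the negative log of this probability over syllable targets is weighted against the character-subword attention loss when updating the shared encoder.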
DOI: 10.1109/ICASSP39728.2021.9414598
Citations: 7
Journal: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)