
Latest publications: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Fire Detection in H.264 Compressed Video
Murat Muhammet Savci, Yasin Yildirim, Gorkem Saygili, B. U. Töreyin
In this paper, we propose a compressed-domain fire detection algorithm for H.264 video using macroblock types and a Markov model. The compressed-domain method does not require decoding to the pixel domain; instead, a syntax parser extracts syntax elements that are only available in the compressed domain. Our method extracts only the macroblock type and the corresponding macroblock address information. Fire and non-fire Markov models are evaluated using offline-trained data. Our experiments show that the algorithm successfully detects and identifies fire events in the compressed domain, even though only a small fraction of the data is used in the process.
DOI: 10.1109/ICASSP.2019.8683666 · pp. 8310-8314 · Published 2019-05-12
Citations: 7
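The detection scheme above can be sketched as a likelihood-ratio test between two first-order Markov chains over macroblock types. The 3-state alphabet and the transition matrices below are hypothetical stand-ins; the paper trains its fire and non-fire models offline from real H.264 streams.

```python
import numpy as np

def sequence_log_likelihood(seq, init, trans):
    """Log-likelihood of a state sequence under a first-order Markov chain."""
    ll = np.log(init[seq[0]])
    for a, b in zip(seq[:-1], seq[1:]):
        ll += np.log(trans[a, b])
    return ll

# Hypothetical 3-state macroblock-type alphabet (0: intra, 1: inter, 2: skip).
# Fire regions flicker, so coded (intra/inter) blocks dominate over skipped ones.
fire_trans = np.array([[0.60, 0.30, 0.10],
                       [0.40, 0.40, 0.20],
                       [0.30, 0.30, 0.40]])
calm_trans = np.array([[0.10, 0.20, 0.70],
                       [0.10, 0.30, 0.60],
                       [0.05, 0.15, 0.80]])
init = np.full(3, 1 / 3)

def classify(seq):
    lf = sequence_log_likelihood(seq, init, fire_trans)
    ln = sequence_log_likelihood(seq, init, calm_trans)
    return "fire" if lf > ln else "non-fire"

print(classify([0, 1, 0, 0, 1, 2, 0, 1]))   # frequently changing blocks -> fire
print(classify([2, 2, 2, 1, 2, 2, 2, 2]))   # mostly skipped blocks -> non-fire
```

The decision uses only macroblock types, mirroring the abstract's point that no pixel-domain decoding is needed.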
Passive Detection and Discrimination of Body Movements in the sub-THz Band: A Case Study
S. Kianoush, S. Savazzi, V. Rampa
Passive radio sensing is a well-established research topic in which radio-frequency (RF) devices are used as real-time virtual probes able to detect the presence and movement(s) of one or more (non-instrumented) subjects. However, radio sensing methods usually employ frequencies in the unlicensed 2.4-5.0 GHz bands, where multipath effects strongly limit their accuracy, thus reducing their wide acceptance. On the contrary, sub-terahertz (sub-THz) radiation, due to its very short wavelength and reduced multipath effects, is well suited for high-resolution body occupancy detection and vision applications. In this paper, for the first time, we adopt radio devices emitting in the 100 GHz band to process an image of the environment for body motion discrimination inside a workspace area. Movement detection is based on the real-time analysis of body-induced signatures that are estimated from sub-THz measurements and then processed by specific neural-network-based classifiers. Experimental trials validate the proposed methods and compare their performance in an industrial safety monitoring application.
DOI: 10.1109/ICASSP.2019.8682165 · pp. 1597-1601 · Published 2019-05-12
Citations: 4
Learning the Spiral Sharing Network with Minimum Salient Region Regression for Saliency Detection
Zukai Chen, Xin Tan, Hengliang Zhu, Shouhong Ding, Lizhuang Ma, Haichuan Song
With the development of convolutional neural networks (CNNs), saliency detection methods have made great progress in recent years. However, previous methods sometimes mistakenly highlight non-salient regions, especially against complex backgrounds. To solve this problem, a two-stage method for saliency detection is proposed in this paper. In the first stage, a network is used to regress the minimum salient region (RMSR) containing all salient objects. In the second stage, the spiral sharing network (SSN) is proposed to fuse multi-level features for pixel-level detection on the result of RMSR. Experimental results on four public datasets show that our model outperforms state-of-the-art approaches.
DOI: 10.1109/ICASSP.2019.8682531 · pp. 1667-1671 · Published 2019-05-12
Citations: 0
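The two-stage idea can be illustrated with a toy stand-in for the RMSR stage: threshold a coarse saliency map and take the smallest axis-aligned box covering all salient pixels, whose crop then becomes the input to the pixel-level second stage. The map, blob, and threshold below are hypothetical.

```python
import numpy as np

def minimum_salient_region(saliency, thresh=0.5):
    """Smallest box covering all pixels above `thresh`
    (a crude stand-in for the learned RMSR regression stage)."""
    ys, xs = np.nonzero(saliency > thresh)
    return int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max())

coarse = np.zeros((8, 8))
coarse[2:5, 3:6] = 0.9                       # one salient blob
y0, y1, x0, x1 = minimum_salient_region(coarse)
crop = coarse[y0:y1 + 1, x0:x1 + 1]          # stage-two input, here 3x3
print((y0, y1, x0, x1), crop.shape)          # (2, 4, 3, 5) (3, 3)
```

Restricting the second stage to this crop is what lets the method avoid highlighting background far from any salient object.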
Autoencoding HRTFS for DNN Based HRTF Personalization Using Anthropometric Features
Tzu-Yu Chen, Tzu-Hsuan Kuo, T. Chi
We propose a deep neural network (DNN) based approach to synthesize the magnitudes of personalized head-related transfer functions (HRTFs) using anthropometric features of the user. To mitigate the over-fitting problem when the training dataset is not very large, we built an autoencoder for dimensionality reduction and to establish a crucial feature set representing the raw HRTFs. We then combined the decoder part of the autoencoder with a smaller DNN to synthesize the magnitude HRTFs. In this way, the complexity of the neural networks is greatly reduced, preventing unstable, high-variance results due to overfitting. The proposed approach was compared with a baseline DNN model without an autoencoder, using the log-spectral distortion (LSD) metric to evaluate performance. Experimental results show that the proposed approach reduces the LSD of estimated HRTFs with greater stability.
DOI: 10.1109/ICASSP.2019.8683814 · pp. 271-275 · Published 2019-05-12
Citations: 21
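As a rough stand-in for the paper's autoencoder, a linear encoder/decoder built from the top singular vectors shows how magnitude HRTFs can be compressed to a small code and reconstructed. The HRTF matrix below is synthetic and the latent size (5) is an assumption, not a figure from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic magnitude HRTFs: 200 subjects x 128 frequency bins,
# generated from a 5-dimensional latent structure plus small noise.
latent = rng.normal(size=(200, 5))
basis = rng.normal(size=(5, 128))
hrtfs = latent @ basis + 0.01 * rng.normal(size=(200, 128))

# Linear "autoencoder": encode/decode with the top-5 right singular vectors.
mean = hrtfs.mean(axis=0)
_, _, vt = np.linalg.svd(hrtfs - mean, full_matrices=False)
encode = lambda x: (x - mean) @ vt[:5].T      # 128 bins -> 5-dim code
decode = lambda z: z @ vt[:5] + mean          # 5-dim code -> 128 bins

recon = decode(encode(hrtfs))
rel_err = np.linalg.norm(recon - hrtfs) / np.linalg.norm(hrtfs)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The paper's nonlinear autoencoder plays the same role; a smaller DNN then only has to predict the low-dimensional code from anthropometric features, which is the overfitting-mitigation argument in the abstract.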
Multicast Beamforming Using Semidefinite Relaxation and Bounded Perturbation Resilience
Jochen Fink, R. Cavalcante, S. Stańczak
Semidefinite relaxation followed by randomization is a well-known approach for approximating a solution to the NP-hard max-min fair multicast beamforming problem. While providing a good approximation to the optimal solution, this approach commonly involves the use of computationally demanding interior point methods. In this study, we propose a solution based on superiorization of bounded perturbation resilient iterative operators that scales to systems with a large number of antennas. We show that this method outperforms the randomization techniques in many cases, while using only computationally simple operations.
DOI: 10.1109/ICASSP.2019.8682325 · pp. 4749-4753 · Published 2019-05-12
Citations: 2
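The Gaussian randomization step that follows semidefinite relaxation (the baseline the paper compares against) can be sketched as follows. Here `X` is a hypothetical PSD matrix standing in for the SDP solution that an interior-point solver would produce; the channels are random and unit noise power is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ant, n_users = 4, 3
# Hypothetical user channels and a hypothetical relaxed solution X.
H = rng.normal(size=(n_users, n_ant)) + 1j * rng.normal(size=(n_users, n_ant))
A = rng.normal(size=(n_ant, n_ant)) + 1j * rng.normal(size=(n_ant, n_ant))
X = A @ A.conj().T                            # stand-in PSD "SDP solution"

def min_snr(w):
    """Worst received power across users, unit power budget and noise."""
    w = w / np.linalg.norm(w)
    return min(abs(h @ w) ** 2 for h in H)

# Gaussian randomization: sample w ~ CN(0, X), keep the best candidate
# under the max-min fair objective.
L = np.linalg.cholesky(X + 1e-9 * np.eye(n_ant))
candidates = [L @ (rng.normal(size=n_ant) + 1j * rng.normal(size=n_ant)) / np.sqrt(2)
              for _ in range(200)]
best = max(candidates, key=min_snr)
print(f"best min-SNR over 200 samples: {min_snr(best):.3f}")
```

The paper's contribution replaces this pipeline with superiorized bounded-perturbation-resilient iterations that avoid the interior-point solve altogether; the sketch only shows the baseline being improved upon.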
1-D Convolutional Neural Networks for Signal Processing Applications
S. Kiranyaz, T. Ince, Osama Abdeljaber, Onur Avcı, M. Gabbouj
1D Convolutional Neural Networks (CNNs) have recently become the state-of-the-art technique for crucial signal processing applications such as patient-specific ECG classification, structural health monitoring, anomaly detection in power electronics circuitry, and motor-fault detection. This is an expected outcome, as there are numerous advantages to using an adaptive and compact 1D CNN instead of a conventional (2D) deep counterpart. First of all, compact 1D CNNs can be efficiently trained with a limited dataset of 1D signals, while 2D deep CNNs, besides requiring a 1D-to-2D data transformation, usually need datasets of massive size, e.g., at the "Big Data" scale, to prevent the well-known "overfitting" problem. 1D CNNs can be applied directly to the raw signal (e.g., current, voltage, vibration, etc.) without any pre- or post-processing such as feature extraction, selection, dimension reduction, or denoising. Furthermore, due to the simple and compact configuration of such adaptive 1D CNNs, which perform only linear 1D convolutions (scalar multiplications and additions), a real-time and low-cost hardware implementation is feasible. This paper reviews the major signal processing applications of compact 1D CNNs with a brief theoretical background. We present their state-of-the-art performance and conclude by focusing on some major properties. Keywords – 1-D CNNs, Biomedical Signal Processing, SHM
DOI: 10.1109/ICASSP.2019.8682194 · pp. 8360-8364 · Published 2019-05-12
Citations: 145
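The core building block described above — a 1D convolution applied directly to a raw signal, followed by ReLU and pooling, with no feature extraction — can be sketched in NumPy. The filter count and kernel width are arbitrary illustrative choices.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid-mode 1D convolution: x has length L, kernels is (n_filt, k)."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (L-k+1, k)
    return windows @ kernels.T + bias                          # (L-k+1, n_filt)

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 8 * np.pi, 256))   # raw 1D input, no hand-crafted features
kernels = rng.normal(size=(4, 9))                 # 4 filters of width 9
bias = np.zeros(4)

feat = np.maximum(conv1d(signal, kernels, bias), 0.0)  # ReLU feature maps
pooled = feat.max(axis=0)                              # global max pool -> 4 values
print(feat.shape, pooled.shape)                        # (248, 4) (4,)
```

Each output value is just scalar multiplications and additions, which is the abstract's argument for cheap real-time hardware implementations.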
An Empirical Study of Speech Processing in the Brain by Analyzing the Temporal Syllable Structure in Speech-input Induced EEG
Rini A. Sharon, Shrikanth S. Narayanan, M. Sur, H. Murthy
The clinical applicability of electroencephalography (EEG) is well established; however, the use of EEG for constructing brain-computer interfaces to develop communication platforms is relatively recent. To provide a more natural means of communication, there is an increasing focus on bringing together speech and EEG signal processing. Quantifying the way our brain processes speech is one way of approaching the problem of speech recognition using brain waves. This paper analyzes the feasibility of recognizing syllable-level units by studying the temporal structure of speech reflected in EEG signals. The slowly varying component of the delta-band EEG (0.3-3 Hz) is present in all other EEG frequency bands. Analysis shows that removing the delta trend from EEG signals yields signals that reveal a syllable-like structure. Using a 25-syllable framework, classification of EEG data obtained from 13 subjects yields promising results, underscoring the potential of revealing speech-related temporal structure in EEG.
DOI: 10.1109/ICASSP.2019.8683572 · pp. 4090-4094 · Published 2019-05-12
Citations: 11
CRF-based Single-stage Acoustic Modeling with CTC Topology
Hongyu Xiang, Zhijian Ou
In this paper, we develop conditional random field (CRF) based single-stage (SS) acoustic modeling with a connectionist temporal classification (CTC) inspired state topology, called CTC-CRF for short. CTC-CRF is conceptually simple: it basically implements a CRF layer, with the special state topology, on top of features generated by the bottom neural network. Like SS-LF-MMI (lattice-free maximum-mutual-information), CTC-CRFs can be trained from scratch (flat-start), eliminating GMM-HMM pre-training and tree-building. Evaluation experiments are conducted on the WSJ, Switchboard and Librispeech datasets. In a head-to-head comparison, the CTC-CRF model using simple bidirectional LSTMs consistently outperforms the strong SS-LF-MMI, across all three benchmarking datasets and for both mono-phones and mono-chars. Additionally, CTC-CRFs avoid some ad-hoc operations in SS-LF-MMI.
DOI: 10.1109/ICASSP.2019.8682256 · pp. 5676-5680 · Published 2019-05-12
Citations: 23
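The CTC state topology referred to here maps a frame-level state path to a label sequence by merging repeated states and then dropping blanks; a minimal sketch (integer labels, 0 as the blank):

```python
def ctc_collapse(path, blank=0):
    """Map a frame-level CTC path to a label sequence:
    merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out

# A repeated label survives only if a blank separates the two occurrences.
print(ctc_collapse([1, 1, 0, 1, 2, 2, 0]))  # -> [1, 1, 2]
```

The CRF layer in the paper defines a globally normalized distribution over such paths; this many-to-one collapsing map is what the "CTC topology" contributes.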
Enhanced Virtual Singers Generation by Incorporating Singing Dynamics to Personalized Text-to-speech-to-singing
Kantapon Kaewtip, F. Villavicencio, Fang-Yu Kuo, Mark Harvilla, I. Ouyang, P. Lanchantin
In this work we present a strategy to enhance the quality of Text-to-Speech (TTS) based singing voice generation. Speech-to-singing refers to techniques that transform a spoken voice into singing, mainly by manipulating the duration and pitch of a spoken version of a song's lyrics. While this strategy efficiently preserves the speaker identity, the generated singing is not always perceived as fully natural, since vocal conditions generally differ between spoken and singing voice. By incorporating speaker-independent natural singing information into TTS-based speech-to-singing (STS), we positively impact the sound quality (e.g., reducing hoarseness), as shown in the subjective evaluation reported at the end of this paper.
DOI: 10.1109/ICASSP.2019.8682968 · pp. 6960-6964 · Published 2019-05-12
Citations: 5
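The duration-manipulation half of speech-to-singing can be sketched as naive linear-interpolation time stretching; real systems use pitch-synchronous or vocoder-based methods that also retune pitch without artifacts, which this toy ignores entirely.

```python
import numpy as np

def stretch(x, factor):
    """Naive duration change by linear interpolation (factor > 1 lengthens).
    Note this also shifts pitch; proper STS decouples the two."""
    n_out = int(len(x) * factor)
    src = np.linspace(0, len(x) - 1, n_out)
    return np.interp(src, np.arange(len(x)), x)

fs = 16000
t = np.arange(fs) / fs
spoken = np.sin(2 * np.pi * 220 * t)   # a 1 s "spoken" tone (220 Hz)
sung = stretch(spoken, 2.5)            # hold the syllable for 2.5 s
print(len(spoken), len(sung))          # 16000 40000
```

Stretching spoken syllables to note durations is the baseline the paper improves by adding natural singing dynamics.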
Tensor Super-resolution for Seismic Data
Songjie Liao, Xiao-Yang Liu, Feng Qian, Miao Yin, Guangmin Hu
In this paper, we propose a novel method for generating high-granularity three-dimensional (3D) seismic data from low-granularity data based on tensor sparse coding, which jointly trains a high-granularity dictionary and a low-granularity dictionary. First, considering the high-dimensional nature of seismic data, we introduce tensor sparse coding for seismic data interpolation. Second, we propose that the dictionary pairs trained on low-granularity and high-granularity seismic data share the same sparse representation, which is used to recover high-granularity data with the high-granularity dictionary. Finally, experiments on seismic data from an actual field show that the proposed method effectively performs seismic trace interpolation and can improve the resolution of seismic data imaging.
DOI: 10.1109/ICASSP.2019.8683419 · pp. 8598-8602 · Published 2019-05-12
Citations: 3
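The coupled-dictionary idea — a single sparse code shared by a low-granularity and a high-granularity dictionary — can be sketched in the plain matrix (non-tensor) case. The dictionaries, dimensions, and sparse code below are synthetic illustrations; the paper works with tensor dictionaries trained on real seismic volumes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, d_low, d_high = 6, 8, 32
D_low = rng.normal(size=(d_low, n_atoms))    # low-granularity dictionary
D_high = rng.normal(size=(d_high, n_atoms))  # coupled high-granularity dictionary

# A signal whose sparse code is, by construction, shared by both dictionaries.
code = np.zeros(n_atoms)
code[[1, 4]] = [1.5, -0.7]
x_low = D_low @ code
x_high_true = D_high @ code

# Recover the shared code from the low-granularity observation,
# then decode it with the high-granularity dictionary.
code_hat, *_ = np.linalg.lstsq(D_low, x_low, rcond=None)
x_high = D_high @ code_hat
print(np.allclose(x_high, x_high_true))      # True
```

A sparsity-enforcing solver (e.g. OMP) would replace the least-squares step in practice; the point is that one code, estimated from coarse data, drives the fine-scale reconstruction.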