
Latest publications: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Image super-resolution based on error compensation with convolutional neural network
Wei-Ting Lu, Chien-Wei Lin, Chih-Hung Kuo, Ying-Chan Tung
Convolutional neural networks have been widely studied for super-resolution (SR) and other image restoration tasks. In this paper, we propose an additional error-compensation convolutional neural network (EC-CNN) that is trained based on the concept of iterative back projection (IBP). The residuals between interpolated images and ground-truth images are used to train the network. This CNN model can compensate the residual projection in IBP more accurately. The CNN-based IBP can be further combined with the super-resolution CNN (SRCNN). Experimental results show that our method, applied as a post-processing step, can significantly enhance the quality of upscaled images. At a scaling factor of 3, the approach outperforms SRCNN by 0.14 dB and SRCNN-EX by 0.08 dB in PSNR on average.
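The EC-CNN described above replaces the fixed back-projection step of classical IBP with a learned residual predictor. The plain IBP loop it builds on can be sketched as follows; the average-pooling downsampler and nearest-neighbour upsampler here are illustrative stand-ins for the paper's (unspecified) degradation and back-projection kernels:

```python
import numpy as np

def downsample(img, s):
    """Average-pool by factor s (stand-in for the blur + decimation model)."""
    h, w = img.shape
    return img[:h - h % s, :w - w % s].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(img, s):
    """Nearest-neighbour upsampling (stand-in for the back-projection kernel)."""
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

def ibp(lr, s, n_iter=10, step=1.0):
    """Iterative back projection: refine an SR estimate until its
    downsampled version matches the observed low-resolution image."""
    sr = upsample(lr, s)                        # initial estimate (interpolation)
    for _ in range(n_iter):
        residual = lr - downsample(sr, s)       # reconstruction error in LR space
        sr = sr + step * upsample(residual, s)  # back-project the error to HR space
    return sr
```

In the paper's formulation, the learned EC-CNN would take the place of the fixed `upsample(residual, s)` back-projection of the residual.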
DOI: 10.1109/APSIPA.2017.8282203
Citations: 4
Importance of non-uniform prosody modification for speech recognition in emotion conditions
Vishnu Vidyadhara Raju Vegesna, Hari Krishna Vydana, S. Gangashetty, A. Vuppala
A mismatch between training and operating environments causes performance degradation in automatic speech recognition (ASR) systems. One major reason for this mismatch is the presence of expressive (emotive) speech in operational environments. Emotions in speech mainly manifest as changes in the prosody parameters of pitch, duration, and energy. This work aims to improve the performance of speech recognition systems in the presence of emotive speech, without disturbing the existing ASR system. The prosody modification of pitch, duration, and energy is achieved by tuning modification factor values according to the relative differences between the neutral and emotional data sets. A neutral version of the emotive speech is then generated, using uniform and non-uniform prosody modification methods, and passed to the recognizer. The IITKGP-SESC corpus is used for building the ASR system, and recognition is evaluated for three emotions (anger, happiness, and compassion). An improvement in ASR performance is observed when the prosody-modified emotive utterance is used for recognition in place of the original emotive utterance, with an average accuracy gain of about 5% from the non-uniform prosody modification methods.
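The core idea above — derive modification factors from the relative differences between neutral and emotional statistics, then scale the emotive prosody back toward neutral — can be sketched minimally. The uniform (single-factor) variant is shown; the dictionary keys and the simple mean-ratio factor are illustrative assumptions, not the paper's exact estimator:

```python
import numpy as np

def modification_factors(neutral, emotional):
    """One factor per prosody parameter, from the ratio of neutral to
    emotional means (pitch, duration, energy)."""
    return {k: np.mean(neutral[k]) / np.mean(emotional[k]) for k in neutral}

def neutralize_pitch(f0, factor):
    """Uniform prosody modification: scale the whole F0 contour by one factor.
    The paper's non-uniform variant would vary the factor across segments."""
    return f0 * factor
```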
DOI: 10.1109/APSIPA.2017.8282109
Citations: 8
A deep learning architecture for classifying medical images of anatomy object
S. Khan, S. Yong
Deep learning architectures, particularly the convolutional neural network (CNN), have shown an intrinsic ability to automatically extract high-level representations from big data. CNNs have produced impressive results in natural image classification, but a major hurdle to their deployment in the medical domain is the relative lack of training data compared to general imaging benchmarks such as ImageNet. In this paper we present a comparative evaluation of three milestone architectures, LeNet, AlexNet, and GoogLeNet, and propose our own CNN architecture for classifying medical anatomy images. The experiments show that the proposed CNN architecture outperforms the three milestone architectures in classifying medical images of anatomy objects.
DOI: 10.1109/APSIPA.2017.8282299
Citations: 41
MSE-optimized CP-based CFO estimation in OFDM systems over multipath channels
Tzu-Chiao Lin, See-May Phoong
Carrier frequency offset (CFO) is an important issue in the study of orthogonal frequency division multiplexing (OFDM) systems. It is well known that CFO destroys the orthogonality of the subcarriers and significantly degrades the bit error rate (BER) performance of OFDM systems. In this paper, an algorithm based on the cyclic prefix (CP) is proposed for blind CFO estimation in OFDM transmission over multipath channels. The proposed method minimizes the theoretical mean square error (MSE), and a closed-form formula is derived. Simulation results show that the proposed method performs very well.
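The classical CP-based estimator that this line of work refines exploits the fact that, within one OFDM symbol, each CP sample is a copy of the sample one FFT length later, differing only by the phase rotation the CFO accumulates over that span. A minimal sketch of that baseline correlator (not the paper's MSE-optimized weighting) is:

```python
import numpy as np

def cp_cfo_estimate(r, n_fft, n_cp):
    """Blind normalized-CFO estimate from the cyclic prefix of one OFDM symbol.
    Samples n and n + n_fft carry the same data, rotated by 2*pi*eps,
    so the angle of their correlation reveals eps (valid for |eps| < 0.5)."""
    corr = np.sum(np.conj(r[:n_cp]) * r[n_fft:n_fft + n_cp])
    return np.angle(corr) / (2 * np.pi)
```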
DOI: 10.1109/APSIPA.2017.8282146
Citations: 2
Electrolaryngeal speech modification towards singing aid system for laryngectomees
Kazuho Morikawa, T. Toda
Towards the development of a singing aid system for laryngectomees, we propose a method for converting electrolaryngeal (EL) speech, produced using an electrolarynx, into more natural-sounding singing voices. Singing with the electrolarynx is inflexible because the pitch of EL speech is determined by the source excitation signal mechanically produced by the device; the melodies of songs to be sung must therefore be embedded in the electrolarynx in advance. In addition, the sound quality of singing voices produced with the electrolarynx is severely degraded by its mechanical excitation sounds, which are emitted externally as noise. To address these problems, the proposed conversion method uses 1) pitch control by playing a musical instrument and 2) noise suppression. In the pitch control, the pitch patterns of the music played simultaneously while singing with the electrolarynx are modified to exhibit characteristics usually observed in singing voices, and the modified pitch patterns are then used as the target pitch patterns in the conversion from EL speech into singing voices. In the noise suppression, spectral subtraction is used to suppress the leaked excitation sounds. The experimental results demonstrate that 1) the naturalness of the singing voices is significantly improved by the noise suppression and 2) the pitch pattern modification is not necessarily effective in the conversion from EL speech into singing voices.
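The noise-suppression step named above, spectral subtraction, is a standard technique: subtract an estimated noise magnitude spectrum from each frame's magnitude spectrum and keep the noisy phase. A minimal per-frame sketch (the spectral floor and its value are common practice, not taken from the paper):

```python
import numpy as np

def spectral_subtraction(frame_spec, noise_spec, floor=0.01):
    """Subtract the estimated noise magnitude from a frame's magnitude
    spectrum, clamp to a small spectral floor, and reuse the noisy phase."""
    mag = np.abs(frame_spec) - np.abs(noise_spec)
    mag = np.maximum(mag, floor * np.abs(frame_spec))
    return mag * np.exp(1j * np.angle(frame_spec))
```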
DOI: 10.1109/APSIPA.2017.8282097
Citations: 2
Sliced voxel representations with LSTM and CNN for 3D shape recognition
R. Miyagi, Masaki Aono
We propose a sliced voxel representation, which we call Sliced Square Voxels (SSV), based on LSTM (long short-term memory) and CNN (convolutional neural network) networks for three-dimensional shape recognition. Given an arbitrary 3D model, we first convert it into a binary voxel grid of size 32×32×32. Then, after a view position is fixed, we slice the binary voxel data vertically along the depth direction. A CNN is applied to exploit the 2D projected shape information of the sliced voxels, and its output is fed into an LSTM; this is our main idea, as the LSTM is expected to capture the spatial topology across slices. Our experiments show that the proposed method is superior to a baseline we prepared using a 3D CNN. We further compared it with related previous methods on large-scale 3D model datasets (ModelNet10 and ModelNet40), and our proposed method outperformed them.
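The preprocessing the abstract describes — turning one voxel grid into an ordered sequence of 2D slices, one per LSTM time step — amounts to indexing along the depth axis. A minimal sketch, with the axis order (depth, height, width) assumed for illustration:

```python
import numpy as np

def slice_voxels(voxel):
    """Split a (D, H, W) binary voxel grid into D slices along the depth
    axis; each (H, W) slice is one time step of the CNN->LSTM sequence."""
    return [voxel[d] for d in range(voxel.shape[0])]

# A 32x32x32 binary grid yields a sequence of 32 slices of shape (32, 32).
voxel = (np.random.rand(32, 32, 32) > 0.5).astype(np.uint8)
seq = slice_voxels(voxel)
```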
DOI: 10.1109/APSIPA.2017.8282044
Citations: 4
Speech emotion recognition using convolutional long short-term memory neural network and support vector machines
Nattapong Kurpukdee, Tomoki Koriyama, Takao Kobayashi, S. Kasuriya, C. Wutiwiwatchai, P. Lamsrichan
In this paper, we propose a speech emotion recognition technique that uses a convolutional long short-term memory (LSTM) recurrent neural network (ConvLSTM-RNN) as a phoneme-based feature extractor from the raw input speech signal. In the proposed technique, the ConvLSTM-RNN outputs phoneme-based emotion probabilities for every frame of an input utterance. These probabilities are then converted into statistical features of the input utterance and used as input features for a support vector machine (SVM) or linear discriminant analysis (LDA) system that classifies the utterance-level emotion. To assess the effectiveness of the proposed technique, we conducted experiments on the classification of four emotions (anger, happiness, sadness, and neutral) on the IEMOCAP database. The results show that the proposed technique with either the SVM or the LDA classifier outperforms the conventional ConvLSTM-based one.
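The pooling step described above — converting a variable-length sequence of per-frame emotion probabilities into a fixed-length statistical feature vector for the SVM/LDA back end — can be sketched as follows. The particular statistics chosen here (mean, std, min, max) are illustrative; the paper does not specify them in this abstract:

```python
import numpy as np

def utterance_features(frame_probs):
    """Map per-frame emotion probabilities of shape (T, n_emotions) to a
    fixed-length utterance vector of shape (4 * n_emotions,)."""
    return np.concatenate([
        frame_probs.mean(axis=0),
        frame_probs.std(axis=0),
        frame_probs.min(axis=0),
        frame_probs.max(axis=0),
    ])
```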
DOI: 10.1109/APSIPA.2017.8282315
Citations: 21
Nonuniform sampling theorems for random signals in the offset linear canonical transform domain
Y. Bao, Yan-Na Zhang, Yu-E. Song, Bingzhao Li, P. Dang
With the rapid development of the offset linear canonical transform (OLCT) in the fields of optics and signal processing, it is necessary to consider nonuniform sampling associated with the OLCT. The analysis and applications of nonuniform sampling for deterministic signals in the OLCT domain have been well studied and published; however, no results on the reconstruction of random signals from nonuniform samples in the OLCT domain have been reported so far. In this paper, the nonuniform sampling and reconstruction of random signals in the OLCT domain are investigated. First, a brief introduction to the fundamentals of the OLCT and some special nonuniform sampling models is given. Then, reconstruction theorems for random signals from nonuniform samples in the OLCT domain are derived for the different nonuniform sampling models. Finally, simulation results are given to verify the accuracy of the theoretical results.
DOI: 10.1109/APSIPA.2017.8282008
Citations: 3
A new pool control method for Boolean compressed sensing based adaptive group testing
Yujia Lu, K. Hayashi
In adaptive group testing, the pool (a set of items to be tested) used in the next test is determined from past test results, and performance depends heavily on the method used to control the pool. This paper proposes a new pool control method for Boolean compressed sensing based adaptive group testing. The proposed method first selects the pool size for the next test by minimizing the expected (approximate) number of tests required after the next test, based on the estimated number of remaining positive items. When the selected pool size is one, the item with the highest probability of being positive is selected as the pool; otherwise, a pool of the selected size is constructed by selecting items at random. In addition, a new method for estimating the cardinality of the positive items, which can be implemented alongside the proposed pool control method, is also proposed.
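The pool-formation rule stated above (a size-1 pool takes the most likely positive item; larger pools are drawn at random) can be sketched in a few lines. The probability table and its source are assumptions; the abstract does not say how the per-item probabilities are maintained:

```python
import random

def next_pool(remaining, probs, pool_size):
    """Form the next pool: for pool_size 1, pick the item most likely to be
    positive; otherwise draw pool_size distinct items uniformly at random."""
    if pool_size == 1:
        return [max(remaining, key=lambda i: probs[i])]
    return random.sample(remaining, pool_size)
```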
DOI: 10.1109/APSIPA.2017.8282168
Citations: 1
Improving N-gram language modeling for code-switching speech recognition
Zhiping Zeng, Haihua Xu, Tze Yuang Chong, Chng Eng Siong, Haizhou Li
Code-switching language modeling is challenging because the statistics of each individual language, as well as cross-lingual statistics, are insufficient. To compensate for this statistical insufficiency, in this paper we propose a word-class n-gram language modeling approach in which only infrequent words are clustered while the most frequent words are treated as singleton classes. We first demonstrate the effectiveness of the proposed method, in terms of perplexity, on our English-Mandarin code-switching SEAME data. Compared with conventional word n-gram language models, as well as word-class n-gram language models in which the entire vocabulary is clustered, the proposed word-class n-gram language modeling approach yields lower perplexity on our SEAME dev sets. Additionally, we observe a further perplexity reduction when interpolating the word n-gram language models with the proposed word-class n-gram language models. We also built word-class n-gram language models from third-party text data with the proposed method, and obtained a similar perplexity improvement on our SEAME dev sets when they are interpolated with the word n-gram language models. Finally, to examine the contribution of the proposed language modeling approach to code-switching speech recognition, we conducted lattice-based n-best rescoring.
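The vocabulary mapping described above — frequent words keep singleton classes, infrequent words share a small set of classes — can be sketched as follows. Hashing rare words into classes is an illustrative stand-in for the paper's (unspecified) clustering, and the threshold and class count are placeholder values:

```python
from collections import Counter

def word_class_map(corpus_tokens, min_count=5, n_classes=10):
    """Map each vocabulary word to its class: frequent words map to
    themselves (singleton classes); rare words share hashed classes."""
    counts = Counter(corpus_tokens)
    mapping = {}
    for word, c in counts.items():
        if c >= min_count:
            mapping[word] = word                       # singleton class
        else:
            mapping[word] = f"CLASS_{hash(word) % n_classes}"
    return mapping
```

An n-gram model would then be trained over the mapped token stream, shrinking the effective vocabulary where data is sparsest.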
DOI: 10.1109/APSIPA.2017.8282279
Citations: 13