This paper proposes multiscale directional transforms (MDTs) based on cosine-sine modulated filter banks (CSMFBs). Sparse image representation by directional transforms is necessary for image analysis and processing tasks and has been studied extensively. CSMFBs were previously proposed as one class of separable directional transforms (SepDTs). Their computational cost is much lower than that of non-SepDTs, and they can outperform other SepDTs, e.g., dual-tree complex wavelet transforms (DTCWTs), in image processing applications. One drawback of CSMFBs is their lack of multiscale directional selectivity: they cannot provide directional atoms at multiple scales, as the DTCWT frame does, and thus cannot achieve a flexible image representation. In this work, we present a design method for multiscale CSMFBs that extends modulated lapped transforms, a subclass of CSMFBs. We confirm its effectiveness in nonlinear approximation and, as a practical application, in image denoising.
Title: Multiscale directional transforms based on cosine-sine modulated filter banks for sparse directional image representation
Authors: Yusuke Nomura, Ryutaro Ogawa, Seisuke Kyochi, Taizo Suzuki
DOI: 10.1109/APSIPA.2017.8282331
Venue: 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Pub Date: 2017-12-01
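The nonlinear approximation experiment mentioned in the abstract above can be illustrated with a minimal sketch: keep only the K largest-magnitude transform coefficients and zero the rest. The function name `keep_largest` and the sample coefficients are hypothetical, and the sketch is transform-agnostic; it does not implement the CSMFB-based transform itself.

```python
def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (K-term approximation)."""
    # Rank coefficient indices by magnitude, largest first.
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


# Hypothetical coefficient vector, for illustration only.
example = [0.1, -5.0, 2.0, 0.0, -3.0]
approx = keep_largest(example, 2)
```

A sparser transform concentrates signal energy in fewer coefficients, so the same K retains more of the image.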
Pub Date: 2017-12-01 DOI: 10.1109/APSIPA.2017.8282007
Xiao-Zhi Zhang, Ya Li, B. Ling, Chao Song, K. Teo
Compressed sensing (CS) has shown great potential for accelerating data acquisition in magnetic resonance imaging (MRI). In compressed sensing magnetic resonance imaging (CS-MRI), the incoherence between the sensing and sparsity matrices plays a key role in performance. However, in conventional MRI the sensing matrix is a Fourier matrix and the sparsifying transform is a wavelet matrix, and these are not optimally incoherent. Moreover, Fourier encoding spreads energy only weakly and concentrates it in the center of k-space, which further reduces the randomness of the under-sampling pattern. Consequently, in CS-MRI the incoherence between the sensing and sparsity matrices is weak, which degrades image reconstruction quality at high under-sampling factors. In this paper, we investigate spread spectrum incoherent sampling for compressed sensing MRI using the fractional Fourier transform. Simulation results show that fractional Fourier transform encoding spreads the energy more uniformly than conventional Fourier encoding, which is beneficial for designing incoherent sampling patterns that satisfy the incoherence requirements of CS-MRI.
Title: Spread spectrum compressed sensing magnetic resonance imaging via fractional Fourier transform
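The incoherence argument in the abstract above can be made concrete with the standard mutual coherence measure, mu = sqrt(N) * max |<phi_k, psi_j>|, which ranges from 1 (maximally incoherent) to sqrt(N). The sketch below is a 1D toy under stated assumptions: a Haar basis stands in for the unspecified wavelet matrix, all function names are hypothetical, and the fractional Fourier transform from the paper is not implemented.

```python
import cmath
import math


def dft_matrix(n):
    """Rows of the unitary n-point DFT matrix."""
    s = 1.0 / math.sqrt(n)
    return [[s * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]
            for k in range(n)]


def identity_matrix(n):
    """Rows of the spike (identity) basis."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def haar_matrix(n):
    """Rows of the orthonormal Haar wavelet basis; n must be a power of two."""
    rows = [[1.0 / math.sqrt(n)] * n]  # constant scaling vector
    length = n
    while length >= 2:
        half = length // 2
        amp = 1.0 / math.sqrt(length)
        for start in range(0, n, length):
            row = [0.0] * n
            for t in range(start, start + half):
                row[t] = amp
            for t in range(start + half, start + length):
                row[t] = -amp
            rows.append(row)
        length = half
    return rows


def mutual_coherence(phi, psi):
    """sqrt(N) times the largest absolute inner product between basis vectors."""
    n = len(phi)
    best = 0.0
    for p in phi:
        for q in psi:
            inner = sum(a.conjugate() * b for a, b in zip(p, q))
            best = max(best, abs(inner))
    return math.sqrt(n) * best


mu_spike = mutual_coherence(dft_matrix(8), identity_matrix(8))
mu_haar = mutual_coherence(dft_matrix(8), haar_matrix(8))
```

In this toy setting the Fourier/spike pair reaches the minimum coherence of 1, while the Fourier/Haar pair reaches the maximum sqrt(8), because the constant Haar scaling vector coincides with the DC Fourier vector. This matches the abstract's point that Fourier sensing and wavelet sparsity are not optimally incoherent.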
Pub Date: 2017-12-01 DOI: 10.1109/APSIPA.2017.8282253
K. Chee, Zhe Jin, W. Yap, B. Goi
Biometrics has been widely deployed for identity verification and/or identification over the last decade. Lately, multi-biometric systems have been gaining attention due to their universality and higher recognition accuracy. However, when templates are stored in a database as separate entities, their compromise poses major security and privacy threats because of the strong binding between identity and biometric data. In this paper, we propose to fuse fingerprint and voice modalities at the feature level to obtain an integrated template. We then propose a two-dimensional Winner-Takes-All hashing method to protect the fused template. The proposed method is inspired by Winner-Takes-All hashing and adapted to this multi-biometric system. Specifically, it transforms the continuous fused biometric feature into discrete values. This transformation is strongly non-linear and is thus resilient to feature variation to a certain degree. We show that the resulting hashed code can withstand major attacks (e.g., template invertibility attack, attack via multiplicity) while yielding reasonable recognition performance. A low equal error rate of 0.94% is obtained using the proposed hashing method on fingerprint images from the FVC2002 DB1 and FVC2002 DB2 datasets and voice features from the NIST Speaker Recognition Evaluation (SRE) 2004-2010. More importantly, the proposed two-dimensional Winner-Takes-All hashing method can be extended and applied to other biometric modalities with real-valued representations.
Title: Two-dimensional winner-takes-all hashing in template protection based on fingerprint and voice feature level fusion
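The abstract above builds on Winner-Takes-All (WTA) hashing. As a minimal sketch, the original one-dimensional WTA hash works as follows: for each random permutation, look at the first few permuted features and record the position of the largest. This is the baseline technique only, not the two-dimensional variant the paper proposes; the function names, parameters, and feature values are hypothetical.

```python
import random


def make_permutations(dim, num_hashes, seed=0):
    """Draw num_hashes random permutations of range(dim)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def wta_hash(features, perms, window):
    """For each permutation, return the index of the max among the first `window` permuted features."""
    code = []
    for perm in perms:
        first = [features[i] for i in perm[:window]]
        code.append(max(range(window), key=first.__getitem__))
    return code


# Hypothetical real-valued feature vector, for illustration only.
features = [0.3, -1.2, 2.5, 0.9, -0.4, 1.7, 3.1, -2.0,
            0.1, 1.1, -0.8, 2.2, 0.6, -1.5, 1.9, 0.05]
perms = make_permutations(len(features), num_hashes=8, seed=1)
code = wta_hash(features, perms, window=4)
# The code depends only on rank order, so a monotonic rescaling leaves it unchanged.
rescaled = wta_hash([3.0 * v + 2.0 for v in features], perms, window=4)
```

Because the code stores only which position won, the continuous feature values are discretized, and the code is unchanged by any feature variation that preserves the ordering within each window.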
Pub Date: 2017-12-01 DOI: 10.1109/APSIPA.2017.8282280
J. Yu, Xiong Xiao, Lei Xie, Chng Eng Siong
In this paper, we propose to embed sentences into fixed-dimensional vectors that carry topic information for story segmentation. Because a sentence comprises a sequence of words and sentences vary in length, we use a long short-term memory recurrent neural network (LSTM-RNN) to summarize the information of the whole sentence and predict the topic class only at the last word of the sentence. The output of the network at the last word can be used as an embedding of the sentence in the topic space. We used the resulting sentence embeddings in an HMM-based story segmentation framework and obtained promising results. On the TDT2 corpus, the F1 measure improves to 0.789 from 0.765, which is obtained by a competitive system using DNN and bag-of-words features.
Title: Topic embedding of sentences for story segmentation
Pub Date: 2017-12-01 DOI: 10.1109/APSIPA.2017.8282011
Aykut Koç, Haldun M. Özaktas, Burak Bartan, Erhan Gundogdu, T. Çukur
Fast and accurate digital computation of the fractional Fourier transform (FRT) and linear canonical transforms (LCTs) is of utmost importance for deploying them in real-world applications and systems. Algorithms that compute the samples of the transform from the samples of the input function in O(N log N) time are presented for several types of FRTs and LCTs, in both 1D and 2D forms. To apply them in image processing, we consider the problem of obtaining sparse transform domains for images. Sparse recovery aims to reconstruct images that are sparse in a linear transform domain from an underdetermined measurement set. Its success relies on knowledge of domains in which compressible representations of the image can be obtained. In this work, we consider two- and three-dimensional images and investigate the effects of the FRT and LCTs in obtaining sparser transform domains. For 2D images, we compare direct transformation with several patching strategies. For the 3D case, we consider biomedical images and compare several strategies, such as taking 2D slices and optimizing for each slice, and direct 3D transformation.
Title: Digital computation of fractional Fourier and linear canonical transforms and sparse image representation
Pub Date: 2017-12-01 DOI: 10.1109/APSIPA.2017.8282198
S. Ito
The use of compressive sensing (CS) in applications with rapid spatial phase variations is difficult, since regularization of not only the magnitude but also the phase is required in the CS framework. In this article, we propose a novel reconstruction scheme for phase-varied MR images in which no phase regularizer is required, so the CS reconstruction scheme remains rather simple. To improve the incoherence between the sampling matrix and the basis of the sparsifying transform, thresholding in the multi-scale eFREBAS transform domain is used. Reconstruction experiments show that CS reconstruction using the 8-scale eFREBAS transform restores the magnitude and phase of images much better than the conventional method, especially in regions where the phase changes rapidly.
Title: Compressed sensing reconstruction of MR phase-varied images using multi-scale complex sparsifying transform
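Transform-domain thresholding of complex coefficients, as mentioned in the abstract above, is commonly done by shrinking each coefficient's magnitude while leaving its phase untouched. The sketch below shows complex soft thresholding under that assumption. The abstract does not say whether soft or hard thresholding is used, the eFREBAS transform itself is not implemented, and the function name and sample values are hypothetical.

```python
def soft_threshold(coeffs, t):
    """Shrink each complex coefficient's magnitude by t, preserving its phase."""
    out = []
    for c in coeffs:
        mag = abs(c)
        # Coefficients at or below the threshold are set to zero.
        out.append(c * (1.0 - t / mag) if mag > t else 0j)
    return out


# Hypothetical complex coefficients, for illustration only.
shrunk = soft_threshold([3 + 4j, 0.1j], 1.0)
```

Because the operation scales each coefficient by a non-negative real factor, the phase of every surviving coefficient is unchanged, which is relevant when the image phase carries information.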
Pub Date: 2017-12-01 DOI: 10.1109/APSIPA.2017.8282015
Tomoki Murata, Y. Kajikawa, S. Miyoshi
We analyze the behaviors of the Filtered-X LMS (FXLMS) algorithm for active noise control (ANC). Correlations between the impulse response of an adaptive filter and a primary path are treated as macroscopic variables. To obtain the correlations, we analytically solve the equations and finally compute the MSE. In particular, we analyze the behaviors of multiple-channel ANC. We theoretically show that the MSE is affected by the secondary paths that are not directly connected.
Title: Statistical-mechanical analysis of the FXLMS algorithm for multiple-channel active noise control
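For readers unfamiliar with the algorithm analyzed in the abstract above, the following is a minimal single-channel FXLMS simulation. It is a sketch under stated assumptions, not the paper's multiple-channel setting or its statistical-mechanical analysis: the path coefficients, step size, and function names are hypothetical, the reference signal is white Gaussian noise, and the secondary-path estimate is assumed to be exact.

```python
import random


def fir(coeffs, history):
    """Dot product of FIR coefficients with a newest-first sample history."""
    return sum(c * h for c, h in zip(coeffs, history))


def fxlms(primary, secondary, secondary_est, num_taps, step, num_samples, seed=0):
    """Simulate single-channel FXLMS; return final weights and the error signal."""
    rng = random.Random(seed)
    w = [0.0] * num_taps
    hist_len = max(len(primary), len(secondary_est), num_taps)
    x_hist = [0.0] * hist_len        # reference signal x(n), newest first
    fx_hist = [0.0] * num_taps       # filtered reference x'(n), newest first
    y_hist = [0.0] * len(secondary)  # anti-noise y(n), newest first
    errors = []
    for _ in range(num_samples):
        x_hist = [rng.gauss(0.0, 1.0)] + x_hist[:-1]
        d = fir(primary, x_hist)            # noise arriving via the primary path
        y = fir(w, x_hist)                  # adaptive filter output
        y_hist = [y] + y_hist[:-1]
        e = d - fir(secondary, y_hist)      # residual at the error microphone
        fx = fir(secondary_est, x_hist)     # reference filtered by the path estimate
        fx_hist = [fx] + fx_hist[:-1]
        # LMS update driven by the filtered reference.
        w = [wi + step * e * fxi for wi, fxi in zip(w, fx_hist)]
        errors.append(e)
    return w, errors


# Hypothetical paths chosen so that an exact 3-tap solution [0.4, -0.3, 0.2] exists:
# primary = secondary convolved with [0.4, -0.3, 0.2].
secondary = [1.0, 0.5]
primary = [0.4, -0.1, 0.05, 0.1]
w, errors = fxlms(primary, secondary, secondary, num_taps=3,
                  step=0.01, num_samples=5000)
head_mse = sum(e * e for e in errors[:200]) / 200
tail_mse = sum(e * e for e in errors[-200:]) / 200
```

The mean squared error (MSE) over the last samples should be far below that of the first samples, and `w` should approach the exact solution. In a multiple-channel system each error signal sums contributions through several secondary paths, which is the setting the abstract analyzes.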
Pub Date: 2017-12-01 DOI: 10.1109/APSIPA.2017.8282268
M. Kuribayashi, Takahiro Ueda, N. Funabiki
The management of sensitive data in an organization is not limited to the use of authentication and encryption systems. Malicious users inside an organization can leak sensitive data to adversaries if they are privileged to access the data. In this study, we enable a manager to identify the traitor(s) inside an organization from the leaked data. The essential technique is fingerprinting for encrypted data: when a user decrypts a ciphertext using the secret key assigned to that user, the decrypted data carries information associated with the user. We propose such an access control system by combining an attribute-based encryption scheme with a fingerprinting scheme. The proposed method prevents a dishonest manager from framing innocent users by realizing an asymmetric protocol using a fingerprinting scheme based on key management.
Title: Secure data management system with traceability against internal leakage
Pub Date: 2017-12-01 DOI: 10.1109/APSIPA.2017.8282325
Takumi Takahashi, S. Ibi, S. Sampei
This paper proposes a new design criterion for adaptively scaled belief (ASB) in Gaussian belief propagation (GaBP), especially for large multi-user multi-input multi-output (MU-MIMO) detection with higher-order modulation. The most vital issue in improving the convergence of GaBP iterative detection is how to deal with soft symbol outliers, which are induced by modeling errors of prior beliefs due to a lack of channel hardening effects. Unfortunately, the modeling errors become more severe in the presence of higher correlation among typical bit-wise prior beliefs when higher-order quadrature amplitude modulation (QAM) schemes are used. To avoid impairments from the inter-bit correlation, symbol-wise beliefs are defined for GaBP self-iterative detection. Moreover, as the simplest way to mitigate the harmful impact of soft symbol outliers, a novel adaptive belief scaling is proposed while stabilizing the dynamics of random MIMO channels. Finally, the validity of ASB for symbol-wise iterative detection is confirmed in terms of suppressing the bit error rate (BER) floor.
Title: Design of adaptively scaled belief in large MIMO detection for higher-order modulation
Pub Date: 2017-12-01 DOI: 10.1109/APSIPA.2017.8282285
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
The conventional frame-level Gaussian process regression (GPR)-based F0 generation can produce natural-sounding pitch contours. However, a frame-level model is insufficient to represent pitch patterns over longer units, especially syllable-level tone contours in tonal languages. This paper proposes a multi-level modeling technique for improving GPR-based F0 generation in which a syllable-level model is considered alongside the frame-level model. In the syllable-level model, we use the discrete cosine transform (DCT) coefficients extracted from the log F0 contour of each syllable as the output variables of the Gaussian process. F0 contours are generated by jointly maximizing the predictive distributions of the frame- and syllable-level models. Objective evaluation results show improved F0 generation when a small amount of training data, around 30 minutes, is used.
Title: Enhanced F0 generation for GPR-based speech synthesis considering syllable-based prosodic features
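The syllable-level features described in the abstract above, DCT coefficients of a log F0 contour within one syllable, can be sketched as follows. The contour is synthetic, and the choice of four coefficients, the orthonormal DCT-II normalization, and all function names are assumptions made for illustration. The Gaussian process regression itself is not shown.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, v in enumerate(x)))
    return out


def idct2(coeffs, n):
    """Inverse of dct2 for length n; missing higher-order coefficients count as zero."""
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


# Synthetic rising contour over 20 frames (120 Hz to 158 Hz), for illustration only.
f0_hz = [120.0 + 2.0 * t for t in range(20)]
log_f0 = [math.log(v) for v in f0_hz]

coeffs = dct2(log_f0)
syllable_features = coeffs[:4]  # low-order coefficients describe the overall contour shape
approx = idct2(syllable_features, len(log_f0))
max_err = max(abs(a - b) for a, b in zip(approx, log_f0))
```

A handful of low-order coefficients gives a fixed-length description of a syllable's pitch shape regardless of how many frames the syllable spans, which is what makes them usable as regression targets.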