Given the increasing attention paid to speech emotion classification in recent years, this work presents a novel speech emotion classification approach based on the multiple kernel Gaussian process. Two major aspects of a classification problem that play an important role in classification accuracy are addressed, i.e., feature extraction and classification. Prosodic features and other features widely used in sound-effect classification are selected. A semi-nonnegative matrix factorization algorithm is then applied to the proposed features to extract further information from them. Following feature extraction, a multiple kernel Gaussian process (GP) is used for classification, in which two notions of similarity in the data are captured by combining the linear kernel and the radial basis function (RBF) kernel. According to our results, the proposed speech emotion classification approach achieves an accuracy of 77.74%. Moreover, a comparison of different approaches reveals that the proposed system performs better than the others.
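As a minimal illustration of the multiple kernel idea, the sketch below combines a linear (dot-product) kernel with an RBF kernel in a Gaussian process classifier; the random features and labels are hypothetical stand-ins, not the authors' data or hyper-parameters.

```python
# Minimal sketch: a "multiple kernel" GP classifier combining a linear
# (dot-product) kernel with an RBF kernel. Data are hypothetical stand-ins.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import DotProduct, RBF

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))   # stand-in for semi-NMF emotion features
y = rng.integers(0, 4, size=100)     # stand-in for four emotion classes

# Summing kernels yields a single valid kernel that captures both a global
# linear similarity and a local RBF similarity between utterances.
kernel = DotProduct(sigma_0=1.0) + RBF(length_scale=1.0)
gpc = GaussianProcessClassifier(kernel=kernel).fit(X, y)
print(gpc.predict(X[:5]))
```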
{"title":"Speech emotion classification using multiple kernel Gaussian process","authors":"Sih-Huei Chen, Jia-Ching Wang, Wen-Chi Hsieh, Yu-Hao Chin, Chin-Wen Ho, Chung-Hsien Wu","doi":"10.1109/APSIPA.2016.7820708","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820708","url":null,"abstract":"Given the increasing attention paid to speech emotion classification in recent years, this work presents a novel speech emotion classification approach based on the multiple kernel Gaussian process. Two major aspects of a classification problem that play an important role in classification accuracy are addressed, i.e. feature extraction and classification. Prosodic features and other features widely used in sound effect classification are selected. A semi-nonnegative matrix factorization algorithm is then applied to the proposed features in order to obtain more information about the features. Following feature extraction, a multiple kernel Gaussian process (GP) is used for classification, in which two similarity notions from our data in the learning algorithm are presented by combining the linear kernel and radial basis function (RBF) kernel. According to our results, the proposed speech emotion classification apporach achieve an accuracy of 77.74%. Moreover, comparing different apporaches reveals that the proposed system performs best than other apporaches.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115423683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820688
Chen-Wei Chang, Min-Hung Chen, Kuan-Chang Chen, Chi-Ming Yeh, Yi-Chang Lu
Pinhole-array-based hand-held light field cameras can be used to capture 4-dimensional light field data for applications such as digital refocusing and depth estimation. Our previous experience suggests that the design of the pinhole array mask is critical to the performance of the camera, and that the choice of mask parameters can differ greatly between applications. In this paper, we derive equations for determining the parameters of pinhole masks. The proposed physically-based model can be applied to cameras with different pixel sizes. Experimental results that match the proposed model are provided at the end of this paper.
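The paper's own mask equations are not reproduced here; as background, a common starting point for mask-based light field cameras is the f-number matching condition, under which each pinhole image spans exactly the pixels allotted to it without overlapping its neighbors. The sketch below computes pinhole pitch and mask-to-sensor distance under that assumption alone; the function name and example numbers are hypothetical.

```python
# Hypothetical back-of-the-envelope sketch (NOT the paper's derivation):
# under f-number matching, each pinhole image should cover exactly n_views
# sensor pixels without overlapping the neighboring pinhole's image.
def pinhole_mask_params(pixel_pitch_um, n_views, main_lens_f_number):
    """Return (pinhole pitch, mask-to-sensor distance) in micrometers."""
    pitch = n_views * pixel_pitch_um        # one pinhole per n_views pixels
    distance = pitch * main_lens_f_number   # matches the main-lens cone angle
    return pitch, distance

# e.g. a 4.0 um sensor with 7x7 angular views behind an f/4 main lens:
print(pinhole_mask_params(4.0, 7, 4.0))     # -> (28.0, 112.0)
```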
{"title":"Mask design for pinhole-array-based hand-held light field cameras with applications in depth estimation","authors":"Chen-Wei Chang, Min-Hung Chen, Kuan-Chang Chen, Chi-Ming Yeh, Yi-Chang Lu","doi":"10.1109/APSIPA.2016.7820688","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820688","url":null,"abstract":"Pinhole-array-based hand-held light field cameras can be used to capture 4-dimensional light field data for different applications such as digital refocusing and depth estimation. Our previous experiences suggest the design of the pinhole array mask is very critical to the performance of the camera, and the selection of mask parameters could be very different between applications. In this paper, we derive equations for determining the parameters of pinhole masks. The proposed physically-based model can be applied to cameras of different pixel sizes. The experimental results which match the proposed model are also provided at the end of this paper.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121208538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820703
Zhen Wei, Zhizheng Wu, Lei Xie
Using speech or text to predict articulatory movements can benefit many speech-related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which has been explored far more than predicting articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. We also combine full-context features, state and phone information with stacked bottleneck features, which provide wide linguistic context, as network input to improve prediction performance. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of stacked bottleneck features, which can encode important contextual information.
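As a rough stand-in for such a network, the sketch below trains a multi-output MLP with a narrow middle layer, echoing the bottleneck idea, and evaluates the RMSE metric quoted above; the layer sizes, feature dimensions, and data are illustrative assumptions, not the paper's architecture.

```python
# Illustrative stand-in for a text-to-articulatory DNN: an MLP regressor
# mapping hypothetical linguistic context vectors to EMA coordinates.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 300))  # stand-in for full-context + bottleneck features
Y = rng.standard_normal((500, 12))   # stand-in for 12-D articulator trajectories (mm)

# The narrow middle layer plays the role of a bottleneck representation.
dnn = MLPRegressor(hidden_layer_sizes=(256, 64, 256), max_iter=200).fit(X, Y)
rmse = np.sqrt(np.mean((dnn.predict(X) - Y) ** 2))
print(f"RMSE: {rmse:.4f} mm")
```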
{"title":"Predicting articulatory movement from text using deep architecture with stacked bottleneck features","authors":"Zhen Wei, Zhizheng Wu, Lei Xie","doi":"10.1109/APSIPA.2016.7820703","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820703","url":null,"abstract":"Using speech or text to predict articulatory movements can have potential benefits for speech related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which is much more than the exploration for predicting articulatory movements from text. In this paper, we investigate the feasibility of using deep neural network (DNN) for articulartory movement prediction from text. We also combine full-context features, state and phone information with stacked bottleneck features which provide wide linguistic context as network input, to improve the performance of articulatory movements' prediction. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirmed the effectiveness of stacked bottleneck features, which could include important contextual information.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129708615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820710
Jaemoon Lim, Minhyeok Heo, Chulwoo Lee, Chang-Su Kim
We propose a novel noisy low-light image enhancement algorithm via structure-texture-noise (STN) decomposition. We split an input image into structure, texture, and noise components, and enhance the structure and texture components separately. Specifically, we first enhance the contrast of the structure image by extending a 2D histogram-based image enhancement scheme according to the characteristics of low-light images. Then, we reconstruct the texture image by retrieving texture components from the noise image, and enhance it by exploiting the perceptual response of the human visual system. Experimental results demonstrate that the proposed STN algorithm sharpens texture and enhances contrast more effectively than conventional algorithms, while removing noise without artifacts.
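A crude filtering analogue of the three-way split is sketched below; the paper's actual decomposition is more principled, and this sketch only conveys how an image can be separated into structure, texture, and noise components that sum back to the input.

```python
# Crude filtering analogue of structure-texture-noise splitting (NOT the
# paper's decomposition), shown on a synthetic image.
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter

rng = np.random.default_rng(0)
img = rng.random((64, 64))

structure = gaussian_filter(img, sigma=3)   # smooth, large-scale component
residual = img - structure                  # texture + noise
texture = median_filter(residual, size=3)   # keep locally coherent detail
noise = residual - texture                  # what remains is treated as noise

# The three components reconstruct the input (up to floating-point error).
assert np.allclose(structure + texture + noise, img)
```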
{"title":"Enhancement of noisy low-light images via structure-texture-noise decomposition","authors":"Jaemoon Lim, Minhyeok Heo, Chulwoo Lee, Chang-Su Kim","doi":"10.1109/APSIPA.2016.7820710","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820710","url":null,"abstract":"We propose a novel noisy low-light image enhancement algorithm via structure-texture-noise (STN) decomposition. We split an input image into structure, texture, and noise components, and enhance the structure and texture components separately. Specifically, we first enhance the contrast of the structure image, by extending a 2D histogram-based image enhancement scheme based on the characteristics of low-light images. Then, we reconstruct the texture image by retrieving texture components from the noise image, and enhance it by exploiting the perceptual response of the human visual system. Experimental results demonstrate that the proposed STN algorithm sharpens the texture and enhances the contrast more effectively than conventional algorithms, while removing noise without artifacts.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129972809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820677
Yuan-sheng Luo, Hong Cheng, Lu Yang
Vessel segmentation of digital retinal images plays an important role in the diagnosis of diseases such as diabetes, hypertension, and retinopathy of prematurity, since these diseases affect the retina. In this paper, a novel Size-Invariant Fully Convolutional Neural Network (SIFCN) is proposed to address the automatic retinal vessel segmentation problem. The input to the network consists of image patches and their corresponding pixel-wise labels. Consecutive convolution and pooling layers follow the input, so that the network can learn abstract features for segmenting retinal vessels. The network is designed to preserve the height and width of the data at each layer through padding and the choice of pooling stride, so that spatial information is maintained and up-sampling is not required. Compared with pixel-wise retinal vessel segmentation approaches, our patch-wise segmentation is much more efficient, since each pass predicts all the pixels of a patch. Our overlapped SIFCN achieves an accuracy of 0.9471 with an AUC of 0.9682, and our non-overlapped SIFCN is the most efficient of the deep learning approaches, costing only 3.68 seconds per image, while the overlapped SIFCN costs 31.17 seconds per image.
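The sketch below illustrates the size-preserving trick described above: with "same" padding and unit pooling stride, every layer keeps the input's height and width, so pixel-wise prediction needs no up-sampling. The channel counts and patch size are illustrative, not the paper's configuration.

```python
# Size-preserving conv/pool stack: padding + stride-1 pooling keep H x W,
# so no up-sampling is needed before per-pixel prediction.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=1, padding=1),  # stride 1 keeps H x W
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
    nn.Conv2d(32, 1, kernel_size=1),                   # per-pixel vessel score
)

patch = torch.randn(1, 1, 48, 48)   # hypothetical grayscale retinal patch
print(net(patch).shape)             # -> torch.Size([1, 1, 48, 48])
```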
{"title":"Size-Invariant Fully Convolutional Neural Network for vessel segmentation of digital retinal images","authors":"Yuan-sheng Luo, Hong Cheng, Lu Yang","doi":"10.1109/APSIPA.2016.7820677","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820677","url":null,"abstract":"Vessel segmentation of digital retinal images plays an important role in diagnosis of diseases such as diabetics, hypertension and retinopathy of prematurity due to these diseases impact the retina. In this paper, a novel Size-Invariant Fully Convolutional Neural Network (SIFCN) is proposed to address the automatic retinal vessel segmentation problems. The input data of the network is the patches of images and the corresponding pixel-wise labels. A consecutive convolution layers and pooling layers follow the input data, so that the network can learn the abstract features to segment retinal vessel. Our network is designed to hold the height and width of data of each layer with padding and assign pooling stride so that the spatial information maintain and up-sample is not required. Compared with the pixel-wise retinal vessel segmentation approaches, our patch-wise segmentation is much more efficient since in each cycle it can predict all the pixels of the patch. Our overlapped SIFCN approach achieves accuracy of 0.9471, with the AUC of 0.9682. And our non-overlap SIFCN is the most efficient approach among the deep learning approaches, costing only 3.68 seconds per image, and the overlapped SIFCN costs 31.17 seconds per image.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"05 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130364872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820829
M. S. Shakeel, K. Lam
In this paper, we propose a new approach to the recognition of low-resolution face images using sparse coding of local features. The proposed algorithm extracts Gabor features from a low-resolution gallery image and a query image at different scales and orientations, then projects the features separately into a new low-dimensional feature space using sparse coding that preserves the sparse structure of the local features. To determine the similarity between the projected features, a coefficient vector that relates the projected gallery and query features is estimated by linear regression. On the basis of this coefficient vector, residual values are computed to classify the images. To validate the proposed method, experiments were performed on three databases (ORL, Extended Yale B, and CAS-PEAL-R1), which contain images with different facial expressions and lighting conditions. Experimental results show that our method outperforms various classical and state-of-the-art face recognition methods.
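The residual-based classification step can be sketched as follows: for each class, the query's projected features are regressed onto that class's projected gallery features, and the class with the smallest residual norm wins. The feature matrices here are random stand-ins for the sparse-coded Gabor features.

```python
# Residual-based classification via per-class linear regression.
import numpy as np

def classify_by_residual(gallery, query):
    """gallery: dict class -> (d, n_c) feature matrix; query: (d,) vector."""
    residuals = {}
    for label, X in gallery.items():
        w, *_ = np.linalg.lstsq(X, query, rcond=None)  # regression coefficients
        residuals[label] = np.linalg.norm(query - X @ w)
    return min(residuals, key=residuals.get)           # smallest residual wins

rng = np.random.default_rng(0)
gallery = {c: rng.standard_normal((50, 8)) for c in range(3)}
print(classify_by_residual(gallery, rng.standard_normal(50)))
```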
{"title":"Recognition of low-resolution face images using sparse coding of local features","authors":"M. S. Shakeel, K. Lam","doi":"10.1109/APSIPA.2016.7820829","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820829","url":null,"abstract":"In this paper, we propose a new approach for recognition of low-resolution face images by using sparse coding of local features. The proposed algorithm extracts Gabor features from a low-resolution gallery image and a query image at different scales and orientations, then projects the features separately into a new low-dimensional feature space using sparse coding that preserves the sparse structure of the local features. To determine the similarity between the projected features, a coefficient vector is estimated by using linear regression that determines the relationship between the projected gallery and query features. On the basis of this coefficient vector, residual values will be computed to classify the images. To validate our proposed method, experiments were performed using three databases (ORL, Extended-Yale B, and CAS-PEAL-R1), which contain images with different facial expressions and lighting conditions. Experimental results show that our method outperforms various classical and state-of-the-art face recognition methods.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130683806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820892
Shunsuke Yamaki, Ryo Suzuki, M. Kawamata, M. Yoshizawa
This paper presents a statistical analysis of phase-only correlation functions between two signals with stochastic phase-spectra. We derive the expectation and variance of the phase-only correlation functions, assuming the phase-spectra of the two input signals to be bivariate random variables. As a result, we give expressions for the expectation and variance of phase-only correlation functions in terms of the joint characteristic functions of the bivariate probability density function of the phase-spectra.
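For reference, the phase-only correlation function under analysis is conventionally defined as the inverse DFT of the normalized cross-spectrum; the notation below is the standard one and is assumed rather than copied from the paper.

```latex
% Standard definition of the phase-only correlation (POC) function between
% signals f and g with DFT phase-spectra \theta_F(k) and \theta_G(k).
r_{fg}(n) \;=\; \frac{1}{N} \sum_{k=0}^{N-1}
    e^{\,j\{\theta_F(k)-\theta_G(k)\}} \, e^{\,j 2\pi k n / N}
```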
{"title":"Statistical analysis of phase-only correlation functions between two signals with stochastic bivariate phase-spectra","authors":"Shunsuke Yamaki, Ryo Suzuki, M. Kawamata, M. Yoshizawa","doi":"10.1109/APSIPA.2016.7820892","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820892","url":null,"abstract":"This paper proposes statistical analysis of phase-only correlation functions between two signals with stochastic phase-spectra. We derive the expectation and variance of the phase-only correlation functions assuming phase-spectra of two input signals to be bivariate probability variables. As a result, we give expressions for the expectation and variance of phase-only correlation functions in terms of joint characteristic functions of the bivariate probability density function of the phase-spectra.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"38 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130926653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable transform block (TB) sizes cause high computational complexity at the HEVC encoder. In this paper, a fast residual quad-tree (RQT) structure decision method is proposed to reduce the number of candidate transform sizes. The proposed method uses spatial and temporal correlation information from neighboring blocks to predict the depth of the current RQT. In addition, an efficient all-zero block (AZB) detection approach is designed to accelerate transform and quantization. Finally, a scheme based on the number of nonzero DCT coefficients (NNZ) is integrated into the proposed method to terminate the recursive RQT mode decision process early. Experimental results show that the proposed method reduces the computational complexity of the RQT structure decision by 70% on average, while the BD-BR and BD-PSNR changes are a negligible 1.13% and −0.048 dB, respectively.
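A toy version of the neighbor-based depth prediction is sketched below: the candidate transform depths for the current block are restricted to a window around the depths observed in its spatial and temporal neighbors. The function name and the exact windowing policy are assumptions for illustration, not the paper's rule.

```python
# Toy sketch of neighbor-based RQT depth prediction: search only a window
# around the depths used by spatial/temporal neighbor blocks.
def candidate_rqt_depths(neighbor_depths, max_depth=3):
    """Return the transform depths to search for the current block."""
    if not neighbor_depths:                 # no context: search everything
        return list(range(max_depth + 1))
    lo, hi = min(neighbor_depths), max(neighbor_depths)
    return list(range(max(0, lo - 1), min(max_depth, hi + 1) + 1))

# e.g. left/above/co-located blocks used depths {2, 3}:
print(candidate_rqt_depths([2, 2, 3]))      # -> [1, 2, 3] (depth 0 skipped)
```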
{"title":"Fast RQT structure decision method for HEVC","authors":"Wei Zhou, Chang Yan, Henglu Wei, Guanwen Zhang, Ai Qing, Xin Zhou","doi":"10.1109/APSIPA.2016.7820705","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820705","url":null,"abstract":"Variable transform block (TB) sizes cause high computational complexity at the HEVC encoder. In this paper, a fast residual quad-tree (RQT) structure decision method is proposed to reduce the number of candidate transform sizes. The proposed method uses spatial and temporal correlation information in the neighbor blocks to predict the depth of current RQT. In addition, an efficient all zero block (AZB) detection approach is designed to accelerate transform and quantization. At last, the nonzero DCT coefficient (NNZ) based scheme is also integrated in the proposed method to early terminate the recursive RQT mode decision process. Experimental results show that our proposed method is able to reduce 70% computation complexity on average in RQT structure decision. And the BDBR and BDPR gains are 1.13% and −0.048dB respectively which are negligible.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129606523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820877
Xinyu Shen, Chunyu Lin, Yao Zhao, Hongyun Lin, Meiqin Liu
Saliency detection, as an image preprocessing step, has been widely used in many applications such as image segmentation. Since most images are stored in the DCT domain, we propose an effective saliency detection algorithm based mainly on the DCT and secondary quantization. First, the DC coefficient and the first five AC coefficients are used to obtain the color saliency map. Then, through secondary quantization of a JPEG image, we obtain the difference between the original image and the quantized image, from which we derive the texture saliency map. Next, center bias is considered, since the central region of an image attracts attention more easily. A band-pass filter is then used to simulate how the human visual system detects salient regions. Finally, the final saliency map is generated from these two maps and the two priors. Experimental results on two datasets show that the proposed method accurately detects salient regions and outperforms existing methods.
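The secondary-quantization cue can be sketched on a single 8x8 block: re-quantize its DCT coefficients with a coarse step and use the reconstruction difference as a texture indicator. The block content and step size below are illustrative assumptions.

```python
# Sketch of the secondary-quantization idea on one 8x8 block: coarsely
# re-quantize the DCT coefficients; the reconstruction difference is large
# where fine detail (texture) is lost.
import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):  return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')
def idct2(b): return idct(idct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

rng = np.random.default_rng(0)
block = rng.random((8, 8)) * 255
coeffs = dct2(block)

step = 32.0                                    # coarse secondary quantizer
requant = np.round(coeffs / step) * step
texture_cue = np.abs(block - idct2(requant))   # per-pixel texture indicator
print(texture_cue.mean())
```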
{"title":"Saliency detection using secondary quantization in DCT domain","authors":"Xinyu Shen, Chunyu Lin, Yao Zhao, Hongyun Lin, Meiqin Liu","doi":"10.1109/APSIPA.2016.7820877","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820877","url":null,"abstract":"Saliency detection as an image preprocessing has been widely used in many applications such as image segmentation. Since most images stored in DCT domain, we propose an effective saliency detection algorithm, which is mainly based on DCT and secondary quantization. Firstly, the DC coefficient and the first five AC coefficients are used to get the color saliency map. Then, through secondary quantization of a JPEG image, we can obtain the difference of the original image and the quantified image, from which we can get the texture saliency map. Next, considering the center bias theory, the center region is easier to catch people's attention. And then the band-pass filter is used to simulate the behavior that the human visual system detects the salient region. Finally, the final saliency map is generated based on these two maps and two priorities. Experimental results on two datasets show that the proposed method can accurately detect the saliency regions and outperformed existing methods.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130666618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820804
Tao Long, J. Benesty, Jingdong Chen
Noise reduction has long been an active research topic in signal processing, and many algorithms have been developed over the last four decades. These algorithms have proved successful, to some degree, in improving the signal-to-noise ratio (SNR) and speech quality. However, one problem is common to all of them: the volume of the enhanced signal after noise reduction is often perceived as lower than that of the original signal. This phenomenon is particularly serious when the SNR is low. In this paper, we develop two constrained Wiener gains and filters for noise reduction in the short-time Fourier transform (STFT) domain. These Wiener gains and filters are derived by minimizing the mean-squared error (MSE) between the clean speech and the speech estimate, under the constraint that the sum of the variances of the filtered speech and the residual noise equals the variance of the noisy observation.
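For reference, the standard (unconstrained) single-channel Wiener gain in the STFT domain, which the constrained designs above modify, is shown below; the notation is the conventional one, assumed rather than taken from the paper.

```latex
% Standard STFT-domain Wiener gain; \phi_X(k,n) and \phi_V(k,n) denote the
% speech and noise variances at frequency bin k and time frame n.
H_{\mathrm{W}}(k,n) \;=\; \frac{\phi_X(k,n)}{\phi_X(k,n) + \phi_V(k,n)}
```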
{"title":"Constrained Wiener gains and filters for single-channel and multichannel noise reduction","authors":"Tao Long, J. Benesty, Jingdong Chen","doi":"10.1109/APSIPA.2016.7820804","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820804","url":null,"abstract":"Noise reduction has long been an active research topic in signal processing and many algorithms have been developed over the last four decades. These algorithms were proved to be successful in some degree to improve the signal-to-noise ratio (SNR) and speech quality. However, there is one problem common to all these algorithms: the volume of the enhanced signal after noise reduction is often perceived lower than that of the original signal. This phenomenon is particularly serious when SNR is low. In this paper, we develop two constrained Wiener gains and filters for noise reduction in the short-time Fourier transform (STFT) domain. These Wiener gains and filters are deduced by minimizing the mean-squared error (MSE) between the clean speech and the speech estimate with the constraint that the sum of the variances of the filtered speech and residual noise is equal to the variance of the noisy observation.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128051735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}