Latest publications from the 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Speech emotion classification using multiple kernel Gaussian process
Sih-Huei Chen, Jia-Ching Wang, Wen-Chi Hsieh, Yu-Hao Chin, Chin-Wen Ho, Chung-Hsien Wu
Given the increasing attention paid to speech emotion classification in recent years, this work presents a novel speech emotion classification approach based on the multiple kernel Gaussian process. Two major aspects of a classification problem that play an important role in classification accuracy are addressed, i.e., feature extraction and classification. Prosodic features and other features widely used in sound-effect classification are selected. A semi-nonnegative matrix factorization algorithm is then applied to the proposed features in order to extract more information from them. Following feature extraction, a multiple kernel Gaussian process (GP) is used for classification, in which two notions of similarity in the data are captured by combining the linear kernel and the radial basis function (RBF) kernel in the learning algorithm. According to our results, the proposed speech emotion classification approach achieves an accuracy of 77.74%. Moreover, a comparison of different approaches reveals that the proposed system performs better than the others.
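To make the kernel combination concrete, here is a minimal sketch of a Gaussian process classifier with a linear-plus-RBF sum kernel, assuming scikit-learn; the feature dimensions, class count, and kernel weights are illustrative placeholders, not the authors' configuration.

```python
# A minimal sketch of multiple-kernel GP classification with scikit-learn.
# Features and labels here are random stand-ins for the paper's prosodic
# features after semi-NMF projection.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, DotProduct, ConstantKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))    # 200 utterances, 20-dim features (hypothetical)
y = rng.integers(0, 4, size=200)  # 4 emotion classes (hypothetical)

# Combine a linear kernel (DotProduct) with an RBF kernel; the
# ConstantKernel factors act as learnable weights, letting the GP
# balance the two notions of similarity during hyperparameter tuning.
kernel = ConstantKernel(1.0) * DotProduct() + ConstantKernel(1.0) * RBF(length_scale=1.0)

gpc = GaussianProcessClassifier(kernel=kernel, n_restarts_optimizer=2)
gpc.fit(X, y)
print(gpc.predict(X[:5]))
```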
{"title":"Speech emotion classification using multiple kernel Gaussian process","authors":"Sih-Huei Chen, Jia-Ching Wang, Wen-Chi Hsieh, Yu-Hao Chin, Chin-Wen Ho, Chung-Hsien Wu","doi":"10.1109/APSIPA.2016.7820708","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820708","url":null,"abstract":"Given the increasing attention paid to speech emotion classification in recent years, this work presents a novel speech emotion classification approach based on the multiple kernel Gaussian process. Two major aspects of a classification problem that play an important role in classification accuracy are addressed, i.e. feature extraction and classification. Prosodic features and other features widely used in sound effect classification are selected. A semi-nonnegative matrix factorization algorithm is then applied to the proposed features in order to obtain more information about the features. Following feature extraction, a multiple kernel Gaussian process (GP) is used for classification, in which two similarity notions from our data in the learning algorithm are presented by combining the linear kernel and radial basis function (RBF) kernel. According to our results, the proposed speech emotion classification apporach achieve an accuracy of 77.74%. Moreover, comparing different apporaches reveals that the proposed system performs best than other apporaches.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115423683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Mask design for pinhole-array-based hand-held light field cameras with applications in depth estimation
Chen-Wei Chang, Min-Hung Chen, Kuan-Chang Chen, Chi-Ming Yeh, Yi-Chang Lu
Pinhole-array-based hand-held light field cameras can be used to capture 4-dimensional light field data for different applications such as digital refocusing and depth estimation. Our previous experience suggests that the design of the pinhole-array mask is critical to the performance of the camera, and that the choice of mask parameters can differ greatly between applications. In this paper, we derive equations for determining the parameters of pinhole masks. The proposed physically-based model can be applied to cameras with different pixel sizes. Experimental results that match the proposed model are also provided at the end of this paper.
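The paper's equations are not reproduced here, but the flavor of such a derivation can be sketched with the textbook f-number-matching rule for lenslet- and pinhole-based light field cameras; the function below and all of its parameters are assumptions for illustration, not the authors' model.

```python
# A back-of-the-envelope sketch of pinhole-mask geometry, assuming the
# common f-number-matching rule; the paper's own equations may differ.
def pinhole_mask_parameters(pixel_pitch_um, angular_res, f_number):
    """Return (pinhole pitch, mask-to-sensor distance) in micrometers.

    pixel_pitch_um : sensor pixel pitch
    angular_res    : angular samples per pinhole (k -> k x k pixels)
    f_number       : main-lens f-number
    """
    pitch = angular_res * pixel_pitch_um  # one pinhole covers k x k pixels
    # f-number matching: the light cone from the main lens through a
    # pinhole should just fill the k x k patch, giving distance = pitch * F#.
    distance = pitch * f_number
    return pitch, distance

# Example: 5 um pixels, 7 x 7 angular samples, f/8 main lens.
print(pinhole_mask_parameters(5.0, 7, 8.0))  # (35.0, 280.0)
```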
{"title":"Mask design for pinhole-array-based hand-held light field cameras with applications in depth estimation","authors":"Chen-Wei Chang, Min-Hung Chen, Kuan-Chang Chen, Chi-Ming Yeh, Yi-Chang Lu","doi":"10.1109/APSIPA.2016.7820688","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820688","url":null,"abstract":"Pinhole-array-based hand-held light field cameras can be used to capture 4-dimensional light field data for different applications such as digital refocusing and depth estimation. Our previous experiences suggest the design of the pinhole array mask is very critical to the performance of the camera, and the selection of mask parameters could be very different between applications. In this paper, we derive equations for determining the parameters of pinhole masks. The proposed physically-based model can be applied to cameras of different pixel sizes. The experimental results which match the proposed model are also provided at the end of this paper.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121208538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Predicting articulatory movement from text using deep architecture with stacked bottleneck features
Zhen Wei, Zhizheng Wu, Lei Xie
Using speech or text to predict articulatory movements can have potential benefits for speech-related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which has been explored far more than the prediction of articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. We also combine full-context features, state and phone information with stacked bottleneck features, which provide wide linguistic context, as network input to improve the performance of articulatory movement prediction. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirmed the effectiveness of stacked bottleneck features, which can encode important contextual information.
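As a rough illustration of the architecture described above, the following PyTorch sketch maps concatenated linguistic and stacked bottleneck features to articulator positions; the layer sizes, feature dimensions, and articulator count are hypothetical, not the paper's configuration.

```python
# A feed-forward sketch: linguistic context features plus stacked
# bottleneck features (bottleneck vectors from several neighboring
# frames, concatenated to widen the context) -> articulator positions.
import torch
import torch.nn as nn

class ArticulatoryDNN(nn.Module):
    def __init__(self, n_linguistic=300, n_bottleneck=3 * 32, n_articulators=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_linguistic + n_bottleneck, 1024), nn.Tanh(),
            nn.Linear(1024, 1024), nn.Tanh(),
            nn.Linear(1024, n_articulators),  # e.g., EMA x/y coordinates
        )

    def forward(self, linguistic, bottleneck):
        return self.net(torch.cat([linguistic, bottleneck], dim=-1))

model = ArticulatoryDNN()
lin = torch.randn(8, 300)      # batch of 8 frames of linguistic features
bn = torch.randn(8, 96)        # bottleneck features: 3 frames x 32 dims
print(model(lin, bn).shape)    # torch.Size([8, 12])
```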
{"title":"Predicting articulatory movement from text using deep architecture with stacked bottleneck features","authors":"Zhen Wei, Zhizheng Wu, Lei Xie","doi":"10.1109/APSIPA.2016.7820703","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820703","url":null,"abstract":"Using speech or text to predict articulatory movements can have potential benefits for speech related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which is much more than the exploration for predicting articulatory movements from text. In this paper, we investigate the feasibility of using deep neural network (DNN) for articulartory movement prediction from text. We also combine full-context features, state and phone information with stacked bottleneck features which provide wide linguistic context as network input, to improve the performance of articulatory movements' prediction. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirmed the effectiveness of stacked bottleneck features, which could include important contextual information.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129708615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Enhancement of noisy low-light images via structure-texture-noise decomposition
Jaemoon Lim, Minhyeok Heo, Chulwoo Lee, Chang-Su Kim
We propose a novel noisy low-light image enhancement algorithm via structure-texture-noise (STN) decomposition. We split an input image into structure, texture, and noise components, and enhance the structure and texture components separately. Specifically, we first enhance the contrast of the structure image by extending a 2D histogram-based image enhancement scheme, guided by the characteristics of low-light images. Then, we reconstruct the texture image by retrieving texture components from the noise image, and enhance it by exploiting the perceptual response of the human visual system. Experimental results demonstrate that the proposed STN algorithm sharpens texture and enhances contrast more effectively than conventional algorithms, while removing noise without artifacts.
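One way to realize such a three-way split is sketched below, assuming a total-variation filter for the structure layer and soft-thresholding to separate texture from noise; this stands in for the paper's specific decomposition, which is not spelled out in the abstract.

```python
# Sketch of a structure-texture-noise split: TV denoising yields the
# smooth structure layer; the residual is split into texture (large
# amplitude) and noise (small amplitude) by soft-thresholding.
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def stn_decompose(img, tv_weight=0.1, noise_thresh=0.02):
    structure = denoise_tv_chambolle(img, weight=tv_weight)  # smooth base layer
    residual = img - structure                               # texture + noise
    texture = np.sign(residual) * np.maximum(np.abs(residual) - noise_thresh, 0.0)
    noise = residual - texture
    return structure, texture, noise

img = np.random.rand(64, 64).astype(np.float64)
s, t, n = stn_decompose(img)
print(np.allclose(img, s + t + n))  # True: the three layers sum to the input
```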
{"title":"Enhancement of noisy low-light images via structure-texture-noise decomposition","authors":"Jaemoon Lim, Minhyeok Heo, Chulwoo Lee, Chang-Su Kim","doi":"10.1109/APSIPA.2016.7820710","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820710","url":null,"abstract":"We propose a novel noisy low-light image enhancement algorithm via structure-texture-noise (STN) decomposition. We split an input image into structure, texture, and noise components, and enhance the structure and texture components separately. Specifically, we first enhance the contrast of the structure image, by extending a 2D histogram-based image enhancement scheme based on the characteristics of low-light images. Then, we reconstruct the texture image by retrieving texture components from the noise image, and enhance it by exploiting the perceptual response of the human visual system. Experimental results demonstrate that the proposed STN algorithm sharpens the texture and enhances the contrast more effectively than conventional algorithms, while removing noise without artifacts.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129972809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Size-Invariant Fully Convolutional Neural Network for vessel segmentation of digital retinal images
Yuan-sheng Luo, Hong Cheng, Lu Yang
Vessel segmentation of digital retinal images plays an important role in the diagnosis of diseases such as diabetes, hypertension, and retinopathy of prematurity, since these diseases affect the retina. In this paper, a novel Size-Invariant Fully Convolutional Neural Network (SIFCN) is proposed to address the automatic retinal vessel segmentation problem. The input data of the network are image patches and the corresponding pixel-wise labels. Consecutive convolution and pooling layers follow the input, so that the network can learn abstract features to segment retinal vessels. Our network is designed to preserve the height and width of the data at each layer through padding and an appropriate pooling stride, so that spatial information is maintained and no up-sampling is required. Compared with pixel-wise retinal vessel segmentation approaches, our patch-wise segmentation is much more efficient, since in each cycle it predicts all the pixels of a patch. Our overlapped SIFCN approach achieves an accuracy of 0.9471, with an AUC of 0.9682. Our non-overlapped SIFCN is the most efficient among the deep learning approaches, costing only 3.68 seconds per image, while the overlapped SIFCN costs 31.17 seconds per image.
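The size-preserving idea can be sketched in PyTorch as follows: 'same' padding on convolutions and stride-1 pooling keep the spatial dimensions fixed, so the per-pixel logits align with the labels without any up-sampling. The channel counts and depth are illustrative, not the SIFCN architecture itself.

```python
# A minimal size-preserving fully convolutional block in PyTorch.
import torch
import torch.nn as nn

class SizePreservingFCN(nn.Module):
    def __init__(self, in_ch=3, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),  # stride 1: no shrink
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        )
        self.classifier = nn.Conv2d(64, n_classes, kernel_size=1)  # per-pixel logits

    def forward(self, x):
        return self.classifier(self.features(x))

patch = torch.randn(1, 3, 48, 48)        # one 48 x 48 retinal patch
print(SizePreservingFCN()(patch).shape)  # torch.Size([1, 2, 48, 48])
```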
{"title":"Size-Invariant Fully Convolutional Neural Network for vessel segmentation of digital retinal images","authors":"Yuan-sheng Luo, Hong Cheng, Lu Yang","doi":"10.1109/APSIPA.2016.7820677","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820677","url":null,"abstract":"Vessel segmentation of digital retinal images plays an important role in diagnosis of diseases such as diabetics, hypertension and retinopathy of prematurity due to these diseases impact the retina. In this paper, a novel Size-Invariant Fully Convolutional Neural Network (SIFCN) is proposed to address the automatic retinal vessel segmentation problems. The input data of the network is the patches of images and the corresponding pixel-wise labels. A consecutive convolution layers and pooling layers follow the input data, so that the network can learn the abstract features to segment retinal vessel. Our network is designed to hold the height and width of data of each layer with padding and assign pooling stride so that the spatial information maintain and up-sample is not required. Compared with the pixel-wise retinal vessel segmentation approaches, our patch-wise segmentation is much more efficient since in each cycle it can predict all the pixels of the patch. Our overlapped SIFCN approach achieves accuracy of 0.9471, with the AUC of 0.9682. And our non-overlap SIFCN is the most efficient approach among the deep learning approaches, costing only 3.68 seconds per image, and the overlapped SIFCN costs 31.17 seconds per image.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"05 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130364872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Recognition of low-resolution face images using sparse coding of local features
M. S. Shakeel, K. Lam
In this paper, we propose a new approach for the recognition of low-resolution face images using sparse coding of local features. The proposed algorithm extracts Gabor features from a low-resolution gallery image and a query image at different scales and orientations, then projects the features separately into a new low-dimensional feature space using sparse coding, which preserves the sparse structure of the local features. To determine the similarity between the projected features, a coefficient vector is estimated by linear regression, which captures the relationship between the projected gallery and query features. On the basis of this coefficient vector, residual values are computed to classify the images. To validate the proposed method, experiments were performed on three databases (ORL, Extended-Yale B, and CAS-PEAL-R1), which contain images with different facial expressions and lighting conditions. Experimental results show that our method outperforms various classical and state-of-the-art face recognition methods.
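A toy pipeline in the spirit of this approach is sketched below, assuming scikit-image Gabor filters and a scikit-learn SparseCoder; the random dictionary, atom count, and regularization strength are stand-ins (a real system would learn the dictionary and keep per-class gallery codes).

```python
# Gabor feature extraction + sparse coding + residual-based classification.
import numpy as np
from skimage.filters import gabor
from sklearn.decomposition import SparseCoder

def gabor_features(img, frequencies=(0.1, 0.2), n_orientations=4):
    feats = []
    for f in frequencies:
        for k in range(n_orientations):
            real, _ = gabor(img, frequency=f, theta=k * np.pi / n_orientations)
            feats.append(real.ravel())
    return np.concatenate(feats)

rng = np.random.default_rng(0)
img = rng.random((16, 16))                 # a toy low-resolution face
x = gabor_features(img)[None, :]           # 1 x (16*16*8) feature vector

dictionary = rng.standard_normal((64, x.shape[1]))  # 64 atoms (hypothetical)
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
coder = SparseCoder(dictionary=dictionary, transform_algorithm="lasso_lars",
                    transform_alpha=0.1)
code = coder.transform(x)                  # sparse low-dimensional projection

def class_residual(gallery_codes, query_code):
    # Regress the query code on one class's gallery codes; the class
    # with the smallest reconstruction residual wins.
    beta, *_ = np.linalg.lstsq(gallery_codes.T, query_code.ravel(), rcond=None)
    return np.linalg.norm(query_code.ravel() - gallery_codes.T @ beta)

gallery = coder.transform(rng.standard_normal((5, x.shape[1])))  # toy gallery
print(class_residual(gallery, code))
```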
{"title":"Recognition of low-resolution face images using sparse coding of local features","authors":"M. S. Shakeel, K. Lam","doi":"10.1109/APSIPA.2016.7820829","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820829","url":null,"abstract":"In this paper, we propose a new approach for recognition of low-resolution face images by using sparse coding of local features. The proposed algorithm extracts Gabor features from a low-resolution gallery image and a query image at different scales and orientations, then projects the features separately into a new low-dimensional feature space using sparse coding that preserves the sparse structure of the local features. To determine the similarity between the projected features, a coefficient vector is estimated by using linear regression that determines the relationship between the projected gallery and query features. On the basis of this coefficient vector, residual values will be computed to classify the images. To validate our proposed method, experiments were performed using three databases (ORL, Extended-Yale B, and CAS-PEAL-R1), which contain images with different facial expressions and lighting conditions. Experimental results show that our method outperforms various classical and state-of-the-art face recognition methods.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130683806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Statistical analysis of phase-only correlation functions between two signals with stochastic bivariate phase-spectra
Shunsuke Yamaki, Ryo Suzuki, M. Kawamata, M. Yoshizawa
This paper presents a statistical analysis of phase-only correlation functions between two signals with stochastic phase spectra. We derive the expectation and variance of the phase-only correlation functions, assuming the phase spectra of the two input signals to be bivariate random variables. As a result, we give expressions for the expectation and variance of phase-only correlation functions in terms of joint characteristic functions of the bivariate probability density function of the phase spectra.
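For orientation, the phase-only correlation function and the form its expectation takes can be sketched as follows; the notation is ours, the bins are assumed identically distributed, and the paper's variance expression is analogous but longer.

```latex
% Sketch under stated assumptions, not the paper's exact notation.
% Phase-only correlation of two length-N signals with phase spectra
% \theta_F(k) and \theta_G(k):
r(n) = \frac{1}{N} \sum_{k=0}^{N-1}
       e^{j\{\theta_F(k) - \theta_G(k)\}}\, e^{j 2\pi k n / N} .
% With \Phi(\omega_1, \omega_2) = E\!\left[ e^{j(\omega_1 \theta_F + \omega_2 \theta_G)} \right]
% the joint characteristic function of the bivariate phase density,
% linearity of expectation gives, per bin,
E[r(n)] = \frac{1}{N} \sum_{k=0}^{N-1} \Phi(1, -1)\, e^{j 2\pi k n / N}
        = \Phi(1, -1)\, \delta(n) ,
% where the last step assumes identically distributed phases across bins,
% since \sum_k e^{j 2\pi k n / N} = N \delta(n) for integer n in [0, N).
```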
{"title":"Statistical analysis of phase-only correlation functions between two signals with stochastic bivariate phase-spectra","authors":"Shunsuke Yamaki, Ryo Suzuki, M. Kawamata, M. Yoshizawa","doi":"10.1109/APSIPA.2016.7820892","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820892","url":null,"abstract":"This paper proposes statistical analysis of phase-only correlation functions between two signals with stochastic phase-spectra. We derive the expectation and variance of the phase-only correlation functions assuming phase-spectra of two input signals to be bivariate probability variables. As a result, we give expressions for the expectation and variance of phase-only correlation functions in terms of joint characteristic functions of the bivariate probability density function of the phase-spectra.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"38 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130926653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Fast RQT structure decision method for HEVC
Wei Zhou, Chang Yan, Henglu Wei, Guanwen Zhang, Ai Qing, Xin Zhou
Variable transform block (TB) sizes cause high computational complexity at the HEVC encoder. In this paper, a fast residual quad-tree (RQT) structure decision method is proposed to reduce the number of candidate transform sizes. The proposed method uses spatial and temporal correlation information from neighboring blocks to predict the depth of the current RQT. In addition, an efficient all-zero block (AZB) detection approach is designed to accelerate transform and quantization. Finally, a scheme based on the number of nonzero DCT coefficients (NNZ) is integrated into the proposed method to terminate the recursive RQT mode decision process early. Experimental results show that our proposed method reduces the computational complexity of the RQT structure decision by 70% on average, while the BDBR and BDPR changes of 1.13% and −0.048 dB, respectively, are negligible.
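The decision flow can be summarized in a short schematic sketch; the depth rule and the all-zero-block threshold below are illustrative assumptions, not the paper's exact criteria.

```python
# Schematic sketch: neighbor depths bound the RQT depths to try, and a
# cheap all-zero-block test skips full transform + quantization.
import numpy as np

def candidate_rqt_depths(left_depth, top_depth, colocated_depth):
    # Spatial/temporal correlation: search only around neighbor depths.
    lo = min(left_depth, top_depth, colocated_depth)
    hi = max(left_depth, top_depth, colocated_depth)
    return range(lo, hi + 1)

def is_all_zero_block(residual, qstep, thresh_scale=1.0):
    # If residual energy is below a quantization-dependent bound, every
    # quantized coefficient would round to zero (threshold is illustrative).
    return np.sum(np.abs(residual)) < thresh_scale * qstep * residual.size

residual = np.random.randn(8, 8) * 0.01
for depth in candidate_rqt_depths(1, 2, 1):
    if is_all_zero_block(residual, qstep=1.0):
        break  # skip transform/quantization; early-terminate the search
```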
{"title":"Fast RQT structure decision method for HEVC","authors":"Wei Zhou, Chang Yan, Henglu Wei, Guanwen Zhang, Ai Qing, Xin Zhou","doi":"10.1109/APSIPA.2016.7820705","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820705","url":null,"abstract":"Variable transform block (TB) sizes cause high computational complexity at the HEVC encoder. In this paper, a fast residual quad-tree (RQT) structure decision method is proposed to reduce the number of candidate transform sizes. The proposed method uses spatial and temporal correlation information in the neighbor blocks to predict the depth of current RQT. In addition, an efficient all zero block (AZB) detection approach is designed to accelerate transform and quantization. At last, the nonzero DCT coefficient (NNZ) based scheme is also integrated in the proposed method to early terminate the recursive RQT mode decision process. Experimental results show that our proposed method is able to reduce 70% computation complexity on average in RQT structure decision. And the BDBR and BDPR gains are 1.13% and −0.048dB respectively which are negligible.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129606523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Saliency detection using secondary quantization in DCT domain
Xinyu Shen, Chunyu Lin, Yao Zhao, Hongyun Lin, Meiqin Liu
Saliency detection, as an image preprocessing step, has been widely used in many applications such as image segmentation. Since most images are stored in the DCT domain, we propose an effective saliency detection algorithm based mainly on the DCT and secondary quantization. First, the DC coefficient and the first five AC coefficients are used to obtain the color saliency map. Then, through secondary quantization of a JPEG image, we obtain the difference between the original image and the quantized image, from which we derive the texture saliency map. Next, following the center-bias theory, according to which the central region attracts attention more easily, a band-pass filter is used to simulate how the human visual system detects salient regions. Finally, the final saliency map is generated from these two maps and the two priors. Experimental results on two datasets show that the proposed method accurately detects salient regions and outperforms existing methods.
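The secondary-quantization step can be sketched as follows, assuming an 8x8 block DCT via SciPy and a single uniform quantization step as a stand-in for a JPEG quantization table; block handling is simplified for illustration.

```python
# Re-quantizing DCT coefficients more coarsely and differencing against
# the original exposes texture detail lost to quantization.
import numpy as np
from scipy.fftpack import dct, idct

def block_dct2(block):
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def block_idct2(coeff):
    return idct(idct(coeff, axis=0, norm="ortho"), axis=1, norm="ortho")

def texture_map(img, qstep=8.0):
    out = np.zeros_like(img, dtype=np.float64)
    h, w = img.shape
    for i in range(0, h - h % 8, 8):
        for j in range(0, w - w % 8, 8):
            c = block_dct2(img[i:i+8, j:j+8].astype(np.float64))
            c_q = np.round(c / qstep) * qstep  # secondary quantization
            out[i:i+8, j:j+8] = img[i:i+8, j:j+8] - block_idct2(c_q)
    return np.abs(out)  # large differences mark textured regions

img = np.random.rand(64, 64) * 255
print(texture_map(img).mean())
```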
{"title":"Saliency detection using secondary quantization in DCT domain","authors":"Xinyu Shen, Chunyu Lin, Yao Zhao, Hongyun Lin, Meiqin Liu","doi":"10.1109/APSIPA.2016.7820877","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820877","url":null,"abstract":"Saliency detection as an image preprocessing has been widely used in many applications such as image segmentation. Since most images stored in DCT domain, we propose an effective saliency detection algorithm, which is mainly based on DCT and secondary quantization. Firstly, the DC coefficient and the first five AC coefficients are used to get the color saliency map. Then, through secondary quantization of a JPEG image, we can obtain the difference of the original image and the quantified image, from which we can get the texture saliency map. Next, considering the center bias theory, the center region is easier to catch people's attention. And then the band-pass filter is used to simulate the behavior that the human visual system detects the salient region. Finally, the final saliency map is generated based on these two maps and two priorities. Experimental results on two datasets show that the proposed method can accurately detect the saliency regions and outperformed existing methods.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130666618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Constrained Wiener gains and filters for single-channel and multichannel noise reduction
Tao Long, J. Benesty, Jingdong Chen
Noise reduction has long been an active research topic in signal processing, and many algorithms have been developed over the last four decades. These algorithms have proven successful, to some degree, in improving the signal-to-noise ratio (SNR) and speech quality. However, one problem is common to all of them: the volume of the enhanced signal after noise reduction is often perceived as lower than that of the original signal. This phenomenon is particularly serious when the SNR is low. In this paper, we develop two constrained Wiener gains and filters for noise reduction in the short-time Fourier transform (STFT) domain. These Wiener gains and filters are derived by minimizing the mean-squared error (MSE) between the clean speech and the speech estimate, under the constraint that the sum of the variances of the filtered speech and the residual noise equals the variance of the noisy observation.
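For context, the standard STFT-domain signal model and the classical (unconstrained) Wiener gain are sketched below in common textbook notation; the gain's value below one is what shrinks the output power, and the paper's constrained gains are obtained by additionally enforcing the output-power condition quoted above.

```latex
% Background sketch, not the paper's derivation. Per STFT bin (k, m),
% with uncorrelated speech X and noise V:
Y(k,m) = X(k,m) + V(k,m), \qquad \phi_Y = \phi_X + \phi_V .
% The classical Wiener gain, with input SNR \xi = \phi_X / \phi_V:
H_W = \frac{\phi_X}{\phi_X + \phi_V} = \frac{\xi}{1 + \xi}, \qquad 0 < H_W < 1 ,
% so the enhanced output power satisfies
% H_W^2 \phi_Y = \phi_X^2 / (\phi_X + \phi_V) < \phi_X \le \phi_Y ,
% which explains the perceived loss of volume, especially at low SNR.
```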
{"title":"Constrained Wiener gains and filters for single-channel and multichannel noise reduction","authors":"Tao Long, J. Benesty, Jingdong Chen","doi":"10.1109/APSIPA.2016.7820804","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820804","url":null,"abstract":"Noise reduction has long been an active research topic in signal processing and many algorithms have been developed over the last four decades. These algorithms were proved to be successful in some degree to improve the signal-to-noise ratio (SNR) and speech quality. However, there is one problem common to all these algorithms: the volume of the enhanced signal after noise reduction is often perceived lower than that of the original signal. This phenomenon is particularly serious when SNR is low. In this paper, we develop two constrained Wiener gains and filters for noise reduction in the short-time Fourier transform (STFT) domain. These Wiener gains and filters are deduced by minimizing the mean-squared error (MSE) between the clean speech and the speech estimate with the constraint that the sum of the variances of the filtered speech and residual noise is equal to the variance of the noisy observation.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128051735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0