
2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA): Latest Publications

Locality sensitive discriminant analysis for speaker verification
Danwei Cai, Weicheng Cai, Zhidong Ni, Ming Li
In this paper, we apply Locality Sensitive Discriminant Analysis (LSDA) to a speaker verification system for intersession variability compensation. Unlike LDA, which fails to discover the local geometrical structure of the data manifold, LSDA finds a projection that maximizes the margin between i-vectors from different speakers in each local area. Since the number of samples varies widely across classes, we improve LSDA by using an adaptive number of nearest neighbors in each class and modifying the corresponding within- and between-class weight matrices, so that each class carries equal weight in LSDA's objective function. Experiments were carried out on the NIST 2010 speaker recognition evaluation (SRE) extended condition 5 female task; the results show that the proposed adaptive-k-nearest-neighbor LSDA method significantly improves on the conventional i-vector/PLDA baseline, with an 18% relative reduction in detection cost and a 28% relative reduction in equal error rate.
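As a rough illustration of the adaptive-k idea, the sketch below builds within- and between-speaker neighbor graphs whose neighbor count is tied to each speaker's sample count, then solves the standard LSDA generalized eigenproblem. It is a minimal sketch, not the authors' exact formulation; the weighting rule, the 50% neighbor fraction, the regularization, and the toy i-vector dimensions are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def adaptive_knn_weights(X, labels, frac=0.5):
    """Within-/between-class affinities with an adaptive k per class.

    k is tied to each speaker's sample count so that small and large
    classes contribute comparably (illustrative simplification)."""
    n = X.shape[0]
    Ww = np.zeros((n, n))                               # same-speaker graph
    Wb = np.zeros((n, n))                               # different-speaker graph
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) # pairwise squared distances
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]
        diff = np.where(labels != labels[i])[0]
        k = max(1, int(frac * (len(same) + 1)))         # adaptive neighbor count
        nn_w = same[np.argsort(d2[i, same])[:k]]
        nn_b = diff[np.argsort(d2[i, diff])[:k]]
        Ww[i, nn_w] = Ww[nn_w, i] = 1.0
        Wb[i, nn_b] = Wb[nn_b, i] = 1.0
    return Ww, Wb

def lsda_projection(X, labels, dim=3, alpha=0.5):
    """Projection from the standard LSDA generalized eigenproblem."""
    Ww, Wb = adaptive_knn_weights(X, labels)
    Dw = np.diag(Ww.sum(1))
    Lb = np.diag(Wb.sum(1)) - Wb                        # between-class Laplacian
    A = X.T @ (alpha * Lb + (1.0 - alpha) * Ww) @ X
    B = X.T @ Dw @ X + 1e-6 * np.eye(X.shape[1])        # regularized constraint
    _, vecs = eigh(A, B)
    return vecs[:, ::-1][:, :dim]                       # leading eigenvectors

# Toy i-vectors: 4 "speakers" with unequal numbers of sessions.
X = np.random.randn(45, 10)
y = np.repeat(np.arange(4), [5, 10, 15, 15])
X_proj = X @ lsda_projection(X, y, dim=3)
```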
{"title":"Locality sensitive discriminant analysis for speaker verification","authors":"Danwei Cai, Weicheng Cai, Zhidong Ni, Ming Li","doi":"10.1109/APSIPA.2016.7820799","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820799","url":null,"abstract":"In this paper, we apply Locality Sensitive Discriminant Analysis (LSDA) to speaker verification system for intersession variability compensation. As opposed to LDA which fails to discover the local geometrical structure of the data manifold, LSDA finds a projection which maximizes the margin between i-vectors from different speakers at each local area. Since the number of samples varies in a wide range in each class, we improve LSDA by using adaptive k nearest neighbors in each class and modifying the corresponding within- and between-class weight matrix. In that way, each class has equal importance in LSDA's objective function. Experiments were carried out on the NIST 2010 speaker recognition evaluation (SRE) extended condition 5 female task, results show that our proposed adaptive k nearest neighbors based LSDA method significantly improves the conventional i-vector/PLDA baseline by 18% relative cost reduction and 28% relative equal error rate reduction.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114078379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Color-distribution similarity by information theoretic divergence for color images
Mizuki Murayama, Daisuke Oguro, H. Kikuchi, H. Huttunen, Yo-Sung Ho, Jaeho Shin
A divergence similarity between two color images, based on the Jensen-Shannon divergence, is presented to measure color-distribution similarity. Subjective assessment experiments were conducted to obtain mean opinion scores (MOS) for the test images. The divergence similarity and the MOS values were found to be statistically significantly correlated.
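A minimal numerical sketch of the underlying measure: the Jensen-Shannon divergence between joint RGB histograms of two images. The 8-bins-per-channel histogram and the base-2 logarithm are illustrative choices, not necessarily those used in the paper.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Joint RGB histogram of an HxWx3 uint8 image, flattened to a vector."""
    h, _ = np.histogramdd(img.reshape(-1, 3).astype(float),
                          bins=(bins, bins, bins), range=((0, 256),) * 3)
    return h.ravel()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two histograms (base-2 log, bounded by 1)."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Example: divergence similarity of two random test images.
img_a = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
img_b = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(js_divergence(color_histogram(img_a), color_histogram(img_b)))
```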
{"title":"Color-distribution similarity by information theoretic divergence for color images","authors":"Mizuki Murayama, Daisuke Oguro, H. Kikuchi, H. Huttunen, Yo-Sung Ho, Jaeho Shin","doi":"10.1109/APSIPA.2016.7820681","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820681","url":null,"abstract":"The divergence similarity between two color images is presented based on the Jensen-Shannon divergence to measure the color-distribution similarity. Subjective assessment experiments were developed to obtain mean opinion scores (MOS) of test images. It was found that the divergence similarity and MOS values showed statistically significant correlations.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122625817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Highlighting root notes in chord recognition using cepstral features and multi-task learning
Mu Yang, Li Su, Yi-Hsuan Yang
A musical chord is usually described by its root note and the chord type. While a substantial amount of work has been done in the field of music information retrieval (MIR) to automate chord recognition, the role of root notes in this task has seldom received specific attention. In this paper, we present a new approach and empirical studies demonstrating improved accuracy in chord recognition by properly highlighting the information carried by the root notes. At the signal level, we propose to combine spectral features with features derived from the cepstrum to improve the identification of low pitches, which usually correspond to the root notes. At the model level, we propose a multi-task learning framework based on neural networks to jointly consider chord recognition and root note recognition in training. We found that the improved accuracy can be attributed to better information about the sub-harmonics of the notes and to the emphasis on root notes in recognizing chords.
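The multi-task idea can be sketched as a shared trunk with separate chord and root-note output heads trained with a weighted joint loss. The layer sizes, the input feature dimension (a spectral-plus-cepstral frame), the label-set sizes, and the 0.5 loss weight below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiTaskChordNet(nn.Module):
    """Toy shared-trunk network with separate chord and root-note heads."""
    def __init__(self, in_dim=144, hidden=256, n_chords=25, n_roots=13):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.chord_head = nn.Linear(hidden, n_chords)   # chord classes (+ "no chord")
        self.root_head = nn.Linear(hidden, n_roots)     # root pitch classes (+ "none")

    def forward(self, x):
        h = self.trunk(x)
        return self.chord_head(h), self.root_head(h)

model = MultiTaskChordNet()
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 144)                                 # a batch of frame-level features
chord_y = torch.randint(0, 25, (8,))
root_y = torch.randint(0, 13, (8,))
chord_logits, root_logits = model(x)
# Joint objective: chord loss plus a weighted root-note auxiliary loss.
loss = criterion(chord_logits, chord_y) + 0.5 * criterion(root_logits, root_y)
loss.backward()
```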
{"title":"Highlighting root notes in chord recognition using cepstral features and multi-task learning","authors":"Mu Yang, Li Su, Yi-Hsuan Yang","doi":"10.1109/APSIPA.2016.7820865","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820865","url":null,"abstract":"A musical chord is usually described by its root note and the chord type. While a substantial amount of work has been done in the field of music information retrieval (MIR) to automate chord recognition, the role of root notes in this task has seldom received specific attention. In this paper, we present a new approach and empirical studies demonstrating improved accuracy in chord recognition by properly highlighting the information of the root notes. In the signal level, we propose to combine spectral features with features derived from the cepstrum to improve the identification of low pitches, which usually correspond to the root notes. In the model level, we propose a multi-task learning framework based on the neural nets to jointly consider chord recognition and root note recognition in training. We found that the improved accuracy can be attributed to better information about the sub-harmonics of the notes, and the emphasis of root notes in recognizing chords.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126461814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Enhancement of noisy low-light images via structure-texture-noise decomposition
Jaemoon Lim, Minhyeok Heo, Chulwoo Lee, Chang-Su Kim
We propose a novel noisy low-light image enhancement algorithm via structure-texture-noise (STN) decomposition. We split an input image into structure, texture, and noise components, and enhance the structure and texture components separately. Specifically, we first enhance the contrast of the structure image by extending a 2D histogram-based image enhancement scheme to the characteristics of low-light images. Then, we reconstruct the texture image by retrieving texture components from the noise image, and enhance it by exploiting the perceptual response of the human visual system. Experimental results demonstrate that the proposed STN algorithm sharpens the texture and enhances the contrast more effectively than conventional algorithms, while removing noise without artifacts.
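A crude stand-in for the decomposition step, assuming a grayscale image in [0, 1]: a smoothed base layer as structure, the median-filtered residual as texture, and the remainder as noise, followed by separate enhancement of structure and texture. The filters and the toy gamma-plus-boost recombination are placeholders, not the paper's algorithm.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter

def stn_decompose(img, sigma=3.0):
    """Crude structure/texture/noise split of a grayscale image in [0, 1].

    Stand-in filters only: structure = Gaussian-smoothed base layer,
    texture = correlated part of the residual (median-filtered),
    noise = what is left over."""
    structure = gaussian_filter(img, sigma)
    residual = img - structure
    texture = median_filter(residual, size=3)
    noise = residual - texture
    return structure, texture, noise

# Enhance the components separately, discard the noise, and recombine.
img = np.clip(np.random.rand(128, 128) * 0.3, 0.0, 1.0)   # dark test image
s, t, _ = stn_decompose(img)
enhanced = np.clip(s ** 0.5 + 1.5 * t, 0.0, 1.0)           # toy gamma stretch + texture boost
```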
{"title":"Enhancement of noisy low-light images via structure-texture-noise decomposition","authors":"Jaemoon Lim, Minhyeok Heo, Chulwoo Lee, Chang-Su Kim","doi":"10.1109/APSIPA.2016.7820710","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820710","url":null,"abstract":"We propose a novel noisy low-light image enhancement algorithm via structure-texture-noise (STN) decomposition. We split an input image into structure, texture, and noise components, and enhance the structure and texture components separately. Specifically, we first enhance the contrast of the structure image, by extending a 2D histogram-based image enhancement scheme based on the characteristics of low-light images. Then, we reconstruct the texture image by retrieving texture components from the noise image, and enhance it by exploiting the perceptual response of the human visual system. Experimental results demonstrate that the proposed STN algorithm sharpens the texture and enhances the contrast more effectively than conventional algorithms, while removing noise without artifacts.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129972809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Predicting articulatory movement from text using deep architecture with stacked bottleneck features
Zhen Wei, Zhizheng Wu, Lei Xie
Using speech or text to predict articulatory movements has potential benefits for speech-related applications. Many approaches have been proposed for the acoustic-to-articulatory inversion problem, far more than for predicting articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement prediction from text. We also combine full-context features and state and phone information with stacked bottleneck features, which provide wide linguistic context, as the network input to improve the prediction of articulatory movements. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirm the effectiveness of the stacked bottleneck features, which capture important contextual information.
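One way to picture the stacked bottleneck features is to append the bottleneck outputs of neighboring frames to each frame, widening the linguistic context seen by the second network. The context width and hop below are hypothetical; only the stacking pattern is the point.

```python
import numpy as np

def stack_bottleneck_context(bn, context=5, hop=5):
    """Stack bottleneck features of neighboring frames onto each frame.

    bn: (T, D) frame-level bottleneck features from a first-pass network.
    Returns (T, (2*context+1)*D) features carrying wide linguistic context.
    The offsets and hop are illustrative choices."""
    T, D = bn.shape
    offsets = np.arange(-context, context + 1) * hop
    out = np.zeros((T, len(offsets) * D))
    for i, off in enumerate(offsets):
        idx = np.clip(np.arange(T) + off, 0, T - 1)   # edge frames repeated
        out[:, i * D:(i + 1) * D] = bn[idx]
    return out

# The second-pass network input would concatenate the original full-context
# linguistic features with these stacked bottleneck features.
bn = np.random.randn(200, 32)
stacked = stack_bottleneck_context(bn)
print(stacked.shape)   # (200, 11 * 32)
```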
{"title":"Predicting articulatory movement from text using deep architecture with stacked bottleneck features","authors":"Zhen Wei, Zhizheng Wu, Lei Xie","doi":"10.1109/APSIPA.2016.7820703","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820703","url":null,"abstract":"Using speech or text to predict articulatory movements can have potential benefits for speech related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which is much more than the exploration for predicting articulatory movements from text. In this paper, we investigate the feasibility of using deep neural network (DNN) for articulartory movement prediction from text. We also combine full-context features, state and phone information with stacked bottleneck features which provide wide linguistic context as network input, to improve the performance of articulatory movements' prediction. We show on the MNGU0 data set that our DNN approach achieves a root mean-squared error (RMSE) of 0.7370 mm, the lowest RMSE reported in the literature. We also confirmed the effectiveness of stacked bottleneck features, which could include important contextual information.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129708615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Fast RQT structure decision method for HEVC
Wei Zhou, Chang Yan, Henglu Wei, Guanwen Zhang, Ai Qing, Xin Zhou
Variable transform block (TB) sizes cause high computational complexity at the HEVC encoder. In this paper, a fast residual quad-tree (RQT) structure decision method is proposed to reduce the number of candidate transform sizes. The proposed method uses spatial and temporal correlation information from neighboring blocks to predict the depth of the current RQT. In addition, an efficient all-zero block (AZB) detection approach is designed to accelerate transform and quantization. Finally, a scheme based on the number of nonzero DCT coefficients (NNZ) is integrated into the proposed method to terminate the recursive RQT mode decision process early. Experimental results show that the proposed method reduces the computational complexity of the RQT structure decision by 70% on average, while the BDBR and BDPR changes of 1.13% and −0.048 dB, respectively, are negligible.
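The spirit of the depth prediction and the AZB test can be sketched with two small helpers: restrict the RQT depth search range around the depths chosen by neighboring and co-located blocks, and skip the transform when the residual energy is below a threshold. The margins and the threshold are illustrative, not the values derived in the paper.

```python
def predict_rqt_depth_range(left, up, colocated, max_depth=3):
    """Restrict candidate RQT depths using spatio-temporal neighbors.

    Toy heuristic: search only around the depths chosen by the left,
    upper, and temporally co-located blocks."""
    neighbours = [d for d in (left, up, colocated) if d is not None]
    if not neighbours:
        return 0, max_depth                      # no prediction available: full search
    return max(0, min(neighbours) - 1), min(max_depth, max(neighbours) + 1)

def is_all_zero_block(residual_coeffs, threshold):
    """Crude AZB test: skip transform/quantization when the residual energy
    falls below a QP-dependent threshold (threshold value is illustrative)."""
    return sum(c * c for c in residual_coeffs) < threshold

print(predict_rqt_depth_range(1, 2, 1))                          # -> (0, 3)
print(is_all_zero_block([0.2, -0.1, 0.0, 0.1], threshold=1.0))   # -> True
```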
{"title":"Fast RQT structure decision method for HEVC","authors":"Wei Zhou, Chang Yan, Henglu Wei, Guanwen Zhang, Ai Qing, Xin Zhou","doi":"10.1109/APSIPA.2016.7820705","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820705","url":null,"abstract":"Variable transform block (TB) sizes cause high computational complexity at the HEVC encoder. In this paper, a fast residual quad-tree (RQT) structure decision method is proposed to reduce the number of candidate transform sizes. The proposed method uses spatial and temporal correlation information in the neighbor blocks to predict the depth of current RQT. In addition, an efficient all zero block (AZB) detection approach is designed to accelerate transform and quantization. At last, the nonzero DCT coefficient (NNZ) based scheme is also integrated in the proposed method to early terminate the recursive RQT mode decision process. Experimental results show that our proposed method is able to reduce 70% computation complexity on average in RQT structure decision. And the BDBR and BDPR gains are 1.13% and −0.048dB respectively which are negligible.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129606523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Constrained Wiener gains and filters for single-channel and multichannel noise reduction
Tao Long, J. Benesty, Jingdong Chen
Noise reduction has long been an active research topic in signal processing, and many algorithms have been developed over the last four decades. These algorithms have proved successful to some degree in improving the signal-to-noise ratio (SNR) and speech quality. However, one problem is common to all of them: the volume of the enhanced signal after noise reduction is often perceived as lower than that of the original signal. This phenomenon is particularly serious when the SNR is low. In this paper, we develop two constrained Wiener gains and filters for noise reduction in the short-time Fourier transform (STFT) domain. These Wiener gains and filters are derived by minimizing the mean-squared error (MSE) between the clean speech and the speech estimate, with the constraint that the sum of the variances of the filtered speech and the residual noise equals the variance of the noisy observation.
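For orientation, the block below recalls the classical unconstrained Wiener gain in the STFT domain; it is background material only. The paper's constrained gains and filters minimize the same MSE under the variance constraint stated above, and their exact forms are derived in the paper.

```latex
% Reference point only: the classical, unconstrained STFT-domain Wiener gain
% for the model Y(k,n) = X(k,n) + V(k,n), where \phi_X(k,n) and \phi_V(k,n)
% are the speech and noise variances (spectral powers) in bin k and frame n.
\begin{align}
  J(H) &= \mathrm{E}\!\left[ \left| X(k,n) - H(k,n)\, Y(k,n) \right|^2 \right], \\
  H_{\mathrm{W}}(k,n) &= \arg\min_{H} J(H)
      = \frac{\phi_X(k,n)}{\phi_X(k,n) + \phi_V(k,n)} .
\end{align}
% The paper's constrained gains minimize the same MSE subject to the additional
% requirement that the variances of the filtered speech and the residual noise
% sum to the variance of the noisy observation, which is what counteracts the
% perceived loudness drop discussed above.
```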
{"title":"Constrained Wiener gains and filters for single-channel and multichannel noise reduction","authors":"Tao Long, J. Benesty, Jingdong Chen","doi":"10.1109/APSIPA.2016.7820804","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820804","url":null,"abstract":"Noise reduction has long been an active research topic in signal processing and many algorithms have been developed over the last four decades. These algorithms were proved to be successful in some degree to improve the signal-to-noise ratio (SNR) and speech quality. However, there is one problem common to all these algorithms: the volume of the enhanced signal after noise reduction is often perceived lower than that of the original signal. This phenomenon is particularly serious when SNR is low. In this paper, we develop two constrained Wiener gains and filters for noise reduction in the short-time Fourier transform (STFT) domain. These Wiener gains and filters are deduced by minimizing the mean-squared error (MSE) between the clean speech and the speech estimate with the constraint that the sum of the variances of the filtered speech and residual noise is equal to the variance of the noisy observation.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128051735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Statistical analysis of phase-only correlation functions between two signals with stochastic bivariate phase-spectra
Shunsuke Yamaki, Ryo Suzuki, M. Kawamata, M. Yoshizawa
This paper presents a statistical analysis of phase-only correlation functions between two signals with stochastic phase spectra. We derive the expectation and variance of the phase-only correlation function, assuming the phase spectra of the two input signals to be bivariate random variables. As a result, we express the expectation and variance of the phase-only correlation function in terms of the joint characteristic function of the bivariate probability density function of the phase spectra.
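For reference, the block below recalls the standard definition of the phase-only correlation function that the analysis concerns; the notation (N-point DFT, phase spectra) is generic, and the closing comment only indicates where the joint characteristic function enters.

```latex
% Standard definition of the phase-only correlation (POC) function between two
% length-N signals x(n) and y(n), whose DFTs are written with magnitude and
% phase as X(k) = |X(k)| e^{j \theta_X(k)} and Y(k) = |Y(k)| e^{j \theta_Y(k)}.
\begin{align}
  r_{xy}(n) = \frac{1}{N} \sum_{k=0}^{N-1}
      e^{\, j \left( \theta_X(k) - \theta_Y(k) \right)} \,
      e^{\, j 2 \pi k n / N} .
\end{align}
% With \theta_X(k) and \theta_Y(k) modeled as bivariate random variables, the
% expectation of each spectral term is the joint characteristic function
% \Phi_k(\omega_1, \omega_2) = \mathrm{E}\!\left[ e^{\, j (\omega_1 \theta_X(k)
% + \omega_2 \theta_Y(k))} \right] evaluated at (\omega_1, \omega_2) = (1, -1),
% which is how the joint characteristic function enters the moment expressions.
```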
{"title":"Statistical analysis of phase-only correlation functions between two signals with stochastic bivariate phase-spectra","authors":"Shunsuke Yamaki, Ryo Suzuki, M. Kawamata, M. Yoshizawa","doi":"10.1109/APSIPA.2016.7820892","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820892","url":null,"abstract":"This paper proposes statistical analysis of phase-only correlation functions between two signals with stochastic phase-spectra. We derive the expectation and variance of the phase-only correlation functions assuming phase-spectra of two input signals to be bivariate probability variables. As a result, we give expressions for the expectation and variance of phase-only correlation functions in terms of joint characteristic functions of the bivariate probability density function of the phase-spectra.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"38 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130926653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Mesh-based image retargeting with spectral graph filtering
Yuichi Tanaka, S. Yagyu, Akie Sakiyama, Masaki Onuki
We propose a calculation method for the deformed pixel positions in mesh-based image retargeting. Image retargeting is a sophisticated image resizing method that yields resized images of acceptable quality even when the image is resized to an aspect ratio different from the original. It often employs a mesh-based approach, in which pixels are nodes of a graph and relationships between pixels are represented by its edges. In this paper, we reformulate the pixel-position deformation of image retargeting as spectral graph filtering, using a graph signal processing approach. We validate our method through image retargeting examples with appropriately designed filter kernels in the graph spectral domain.
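The core operation, filtering a graph signal (here, mesh-node positions) in the graph spectral domain, can be sketched with a dense Laplacian eigendecomposition. The path-graph example and the exponential low-pass kernel are illustrative stand-ins for the mesh graph and the retargeting kernel designed in the paper.

```python
import numpy as np

def spectral_graph_filter(W, signal, kernel):
    """Filter a graph signal in the graph spectral domain.

    W:      symmetric adjacency matrix (node affinities), shape (N, N)
    signal: graph signal on the nodes, e.g. target node coordinates, (N, d)
    kernel: function mapping Laplacian eigenvalues to filter gains
    Minimal dense-eigendecomposition sketch for small graphs."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                         # combinatorial graph Laplacian
    lam, U = np.linalg.eigh(L)                 # graph Fourier basis
    coeffs = U.T @ signal                      # graph Fourier transform
    return U @ (kernel(lam)[:, None] * coeffs) # filter and transform back

# Example: smooth deformed mesh-node positions with a low-pass kernel.
N = 50
W = np.zeros((N, N))
idx = np.arange(N - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0        # path graph over mesh nodes
positions = np.cumsum(np.random.rand(N, 2), axis=0)
smoothed = spectral_graph_filter(W, positions, kernel=lambda lam: np.exp(-2.0 * lam))
```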
{"title":"Mesh-based image retargeting with spectral graph filtering","authors":"Yuichi Tanaka, S. Yagyu, Akie Sakiyama, Masaki Onuki","doi":"10.1109/APSIPA.2016.7820728","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820728","url":null,"abstract":"We propose a calculation method of deformed image pixel positions for mesh-based image retargeting. Image retargeting is a sophisticated image resizing method which yields resized images with acceptable quality even if we resize the image into different aspect ratio from the original one. It often employs a mesh-based approach, where pixels are nodes of a graph and relationships between pixels are represented as its edges. In this paper, we reformulate a pixel position deformation of image retargeting as a spectral graph filtering with a graph signal processing-based approach. We validate our method through some image retargeting examples with an appropriately designed filter kernels in the graph spectral domain.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127271640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient deep neural networks for speech synthesis using bottleneck features
Young-Sun Joo, Won-Suk Jun, Hong-Goo Kang
This paper proposes a cascading deep neural network (DNN) structure for a speech synthesis system that consists of text-to-bottleneck (TTB) and bottleneck-to-speech (BTS) models. Unlike the conventional single-network structure, which requires a large database to find complicated mapping rules between linguistic and acoustic features, the proposed structure is effective even when the available training database is inadequate. The bottleneck feature utilized in the proposed approach represents the characteristics of the linguistic features and their average acoustic features over several speakers. Therefore, it is more efficient to learn a mapping rule between bottleneck and acoustic features than to learn a mapping rule between linguistic and acoustic features directly. Experimental results show that the learning capability of the proposed structure is much higher than that of conventional structures. Objective and subjective listening test results also verify the superiority of the proposed structure.
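A minimal sketch of the cascade: a text-to-bottleneck (TTB) network mapping linguistic features to bottleneck features, followed by a bottleneck-to-speech (BTS) network mapping bottleneck to acoustic features. All dimensions and layer choices below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TTB(nn.Module):
    """Text-to-bottleneck: linguistic features -> bottleneck features."""
    def __init__(self, ling_dim=300, bottleneck_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ling_dim, 512), nn.Tanh(),
            nn.Linear(512, bottleneck_dim),      # speaker-averaged bottleneck layer
        )
    def forward(self, x):
        return self.net(x)

class BTS(nn.Module):
    """Bottleneck-to-speech: bottleneck features -> acoustic features."""
    def __init__(self, bottleneck_dim=64, acoustic_dim=187):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(bottleneck_dim, 512), nn.Tanh(),
            nn.Linear(512, acoustic_dim),        # spectral + excitation parameters
        )
    def forward(self, b):
        return self.net(b)

ttb, bts = TTB(), BTS()
ling = torch.randn(16, 300)                      # frame-level linguistic features
acoustic = bts(ttb(ling))                        # synthesis-time cascade
print(acoustic.shape)                            # torch.Size([16, 187])
```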
{"title":"Efficient deep neural networks for speech synthesis using bottleneck features","authors":"Young-Sun Joo, Won-Suk Jun, Hong-Goo Kang","doi":"10.1109/APSIPA.2016.7820721","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820721","url":null,"abstract":"This paper proposes a cascading deep neural network (DNN) structure for speech synthesis system that consists of text-to-bottleneck (TTB) and bottleneck-to-speech (BTS) models. Unlike conventional single structure that requires a large database to find complicated mapping rules between linguistic and acoustic features, the proposed structure is very effective even if the available training database is inadequate. The bottle-neck feature utilized in the proposed approach represents the characteristics of linguistic features and its average acoustic features of several speakers. Therefore, it is more efficient to learn a mapping rule between bottleneck and acoustic features than to learn directly a mapping rule between linguistic and acoustic features. Experimental results show that the learning capability of the proposed structure is much higher than that of the conventional structures. Objective and subjective listening test results also verify the superiority of the proposed structure.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122412625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2