
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific — Latest Publications

Robust emotion recognition in live music using noise suppression and a hierarchical sparse representation classifier
Yu-Hao Chin, Chang-Hong Lin, Jia-Ching Wang
Recognition of emotional content in music has attracted attention recently. Music captured by live applications is often corrupted by noise, which degrades the recognition rate. The solution proposed in this study is a robust music emotion recognition system for live applications. The proposed system consists of two major parts: subspace-based noise suppression and a hierarchical sparse representation classifier, which is based on sparse coding and a sparse representation classifier (SRC). The music signal is first enhanced by fast subspace-based noise suppression. Nine classes of emotion are then used to construct a dictionary, and a vector of coefficients is obtained by sparse coding. The vector can be divided into nine parts, each of which models a specific emotional class of a signal. Since the proposed descriptor can provide emotional content analysis at different resolutions for emotional music recognition, this work regards the vectors of coefficients as feature representations. Finally, a sparse-representation-based classification method is employed to classify music into four emotional classes. The experimental results confirm the highly robust performance of the proposed system for emotion recognition in live music.
DOI: 10.1109/APSIPA.2014.7041629 · Published: 2014-12-01
Citations: 0
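The sparse representation classification (SRC) step in the abstract above — sparse-code a sample over a class-labeled dictionary, then assign it to the class whose atoms reconstruct it best — can be sketched as follows. This is a minimal illustrative version, not the authors' system: `src_classify` is a made-up name, and a dense least-squares code stands in for true sparse coding.

```python
import numpy as np

def src_classify(D, labels, y):
    """Toy SRC sketch.

    D      : (d, n) dictionary, columns are training atoms
    labels : (n,) class id of each atom
    y      : (d,) test sample

    Codes y over the whole dictionary (dense least squares here, as a
    stand-in for sparse coding), then assigns y to the class whose
    atoms yield the smallest reconstruction residual.
    """
    x, *_ = np.linalg.lstsq(D, y, rcond=None)
    best_cls, best_res = None, np.inf
    for c in np.unique(labels):
        # keep only the coefficients belonging to class c
        xc = np.where(labels == c, x, 0.0)
        res = np.linalg.norm(y - D @ xc)
        if res < best_res:
            best_cls, best_res = c, res
    return best_cls
```

The per-class residual is what makes the coefficient vector separable into class-specific parts, as the abstract describes for its nine emotion classes.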
Text localization in natural scene images with stroke width histogram and superpixel
Yu Zhou, Shuang Liu, Yongzheng Zhang, Yipeng Wang, Weiyao Lin
A novel stroke-based method to localize text in unconstrained natural scene images is proposed. First, to improve edge detection in difficult situations where the text is partially occluded or noisy, we use a stroke width histogram to guide the generation of a series of superpixels. Second, we present a novel way of using the distance transform and the Sobel operator to extract the character skeleton, and then use the skeleton to improve stroke-width accuracy. Our method was evaluated on two standard datasets, ICDAR 2005 and ICDAR 2011, and the experimental results show that it achieves state-of-the-art performance.
DOI: 10.1109/APSIPA.2014.7041656 · Published: 2014-12-01
Citations: 7
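The stroke-width idea underlying the histogram in the abstract above can be illustrated very simply: measure the length of foreground runs in a binary text mask and histogram them. This toy (`horizontal_stroke_widths` is a made-up name) only scans horizontally, a crude 1-D stand-in for shooting rays along the gradient direction as stroke-width methods do.

```python
def horizontal_stroke_widths(mask):
    """Length of each horizontal foreground run in a binary mask
    (list of rows of 0/1). Text strokes of constant thickness produce
    a peaked width histogram, which can then guide superpixel
    generation as in the paper."""
    widths = []
    for row in mask:
        run = 0
        for v in row:
            if v:
                run += 1
            elif run:
                widths.append(run)
                run = 0
        if run:          # flush a run that touches the right edge
            widths.append(run)
    return widths
```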
High dynamic range imaging technology for micro camera array
Po-Hsiang Huang, Yuan-Hsiang Miao, Jiun-In Guo
A micro lens captures less light than a normal lens does, yielding low-quality, noisy images, and current image sensors cannot preserve the full dynamic range of real-world scenes. HDR imaging from multiple exposures overcomes these problems. Choosing a good exposure time is a seldom-discussed but important issue in HDR imaging. In this paper we propose a Histogram Based Exposure Time Selection (HBETS) method to automatically adjust the exposure time of each lens for different scenes. The proposed weighting function suppresses randomly distributed noise caused by the micro lenses and produces a high-quality HDR image. An integrated tone-mapping method is also proposed that preserves detail in both bright and dark regions when compressing the HDR image to an LDR image for display. The resulting image has an extended dynamic range and thus conveys more complete scene information. Finally, we have implemented the proposed 4-CAM HDR system on the Adlink MXC-6300 platform, where it reaches VGA video at 10 fps.
DOI: 10.1109/APSIPA.2014.7041726 · Published: 2014-12-01
Citations: 3
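The multi-exposure merge with a noise-distrusting weighting function that the abstract above relies on can be sketched as below. This is a generic Debevec-style weighted average under an assumed linear camera response, not the paper's exact pipeline; `merge_exposures` and the hat weight are illustrative choices.

```python
import numpy as np

def merge_exposures(images, times, eps=1e-6):
    """Merge linear images in [0, 1] taken at different exposure
    times into one radiance map. Each pixel's radiance estimate is a
    weighted average of (pixel / exposure_time); the hat weight peaks
    at mid-gray and goes to zero for under- or over-exposed values,
    so unreliable pixels contribute little."""
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for img, t in zip(images, times):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # hat weight, 0 at 0 and 1
        num += w * img / t
        den += w
    return num / (den + eps)
```

A saturated pixel (value 1.0) gets weight 0 and is effectively replaced by its well-exposed counterpart from another frame.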
Range reduction of HDR images for backward compatibility with LDR image processing
M. Iwahashi, Taichi Yoshida, H. Kiya
This paper proposes a new range reduction method with the minimum quantization error in the L2 norm under an L-infinity norm constraint. The dynamic range of pixel values in high dynamic range (HDR) images must be reduced for backward compatibility with low dynamic range (LDR) image processing systems. The simplest approach is to truncate the lower bit planes of the binary representation of pixel values. However, this offers only coarse granularity for the reduced range and does not exploit histogram sparseness; furthermore, it generates a significant amount of quantization error. In this paper, we propose a new range reduction method which 1) exploits histogram sparseness and 2) minimizes the error variance 3) under a specified maximum absolute error.
DOI: 10.1109/APSIPA.2014.7041617 · Published: 2014-12-01
Citations: 4
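The baseline that the abstract above improves on — truncating the k lowest bit planes — is a two-line operation, shown here as a sketch (`truncate_bit_planes` is a made-up name). It makes the coarse granularity and the worst-case error explicit: reconstruction can be off by up to 2**k − 1.

```python
def truncate_bit_planes(pixel, k):
    """Drop the k least-significant bit planes of an integer pixel
    value. Returns (reduced, reconstructed): the reduced value lives
    in a range 2**k times smaller, and shifting it back loses the k
    low bits, so |pixel - reconstructed| <= 2**k - 1."""
    reduced = pixel >> k          # range reduction
    reconstructed = reduced << k  # inverse mapping for LDR -> HDR
    return reduced, reconstructed
```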
Unnecessary utterance detection for avoiding digressions in discussion
Riki Yoshida, Takuya Hiraoka, Graham Neubig, S. Sakti, T. Toda, Satoshi Nakamura
In this paper, we propose a method for avoiding digressions in discussion by detecting unnecessary utterances and having a dialogue system intervene. The detector is based on features derived from word frequency and topic shifts. The performance (accuracy, recall, precision, and F-measure) of the unnecessary utterance detector is evaluated through leave-one-dialogue-out cross-validation. In the evaluation, we find that the proposed detector outperforms a typical automatic summarization method.
DOI: 10.1109/APSIPA.2014.7041572 · Published: 2014-12-01
Citations: 0
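The metrics reported in the abstract above (precision, recall, F-measure) follow directly from detection counts; a small sketch (`f_measure` is an illustrative name, not from the paper):

```python
def f_measure(tp, fp, fn):
    """Precision, recall, and F1 from true positives, false
    positives, and false negatives, with zero-division guards."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```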
A method for emotional speech synthesis based on the position of emotional state in Valence-Activation space
Yasuhiro Hamada, Reda Elbarougy, M. Akagi
Speech-to-speech translation (S2ST) systems convert a spoken utterance in one language into spoken output in another language. So far, S2ST techniques have mainly used linguistic information while ignoring para- and non-linguistic information (emotion, individuality, gender, etc.). Such systems are therefore limited to synthesizing neutral rather than affective speech, e.g. emotional speech. To deal with affective speech, a system that can recognize and synthesize emotional speech is required. Although most studies have treated emotions categorically, emotional styles are not categorical but spread continuously in an emotion space spanned by two dimensions, Valence and Activation. This paper proposes a method for synthesizing emotional speech based on positions in Valence-Activation (V-A) space. To model the relationships between acoustic features and V-A space, Fuzzy Inference Systems (FISs) were constructed, and twenty-one acoustic features were morphed using the FISs. To verify whether synthesized speech is perceived at the intended position in V-A space, listening tests were carried out. The results indicate that the synthesized speech gives the same impression in V-A space as the intended speech does.
DOI: 10.1109/APSIPA.2014.7041729 · Published: 2014-12-01
Citations: 7
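The fuzzy inference systems mentioned in the abstract above are built from membership functions; a triangular membership function is the most common building block. The sketch below is generic FIS machinery, not the paper's trained system, and the parameters are made up.

```python
def tri_membership(x, a, b, c):
    """Triangular fuzzy membership: 0 outside [a, c], rising linearly
    from a to a peak of 1 at b, then falling linearly to c. An FIS
    combines many such functions to map acoustic feature values to
    degrees of membership in fuzzy sets (here, toward a position in
    Valence-Activation space)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)
```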
Forensics of image blurring and sharpening history based on NSCT domain
Yahui Liu, Yao Zhao, R. Ni
Detection of multiply manipulated images has always been a realistic direction for digital image forensics, and it has attracted considerable research interest. However, the mutual effects of manipulations make it difficult to identify the processing chain using existing single-manipulation detection methods. In this paper, a novel algorithm for detecting an image's blurring and sharpening history is proposed based on the non-subsampled contourlet transform (NSCT) domain. Two main sets of features are extracted from the NSCT domain: an extremum feature and a local directional similarity vector. The extremum feature comprises multiple maxima and minima of NSCT coefficients at every scale; under blurring or sharpening manipulation, it provides good discrimination. The directional similarity feature represents the correlation between a pixel and its neighbors, which can also be altered by blurring or sharpening. For each pixel, a directional vector is composed of the coefficients from every directional subband at a certain scale, and the local directional similarity vector is obtained by computing the similarity between the directional vector of a randomly selected pixel and those of its 8-neighborhood pixels. With the proposed features, we are able to detect the two operations and determine their processing order at the same time. Experimental results show that the proposed algorithm is effective and accurate.
DOI: 10.1109/APSIPA.2014.7041728 · Published: 2014-12-01
Citations: 1
A novel speech enhancement method using power spectra smooth in Wiener filtering
Feng Bao, Hui-jing Dou, Mao-shen Jia, C. Bao
In this paper, we propose a novel speech enhancement method that smooths the power spectra of speech and noise in Wiener filtering, based on the fact that the a priori SNR in standard Wiener filtering reflects the power ratio of speech and noise in each frequency bin. This power ratio can also be approximated by the smoothed spectra of speech and noise. We estimate the power spectra of noise and speech by means of the minima-controlled recursive averaging method and the spectral-subtraction principle, respectively. Then, linear prediction analysis is used to smooth the power spectra of the speech and noise in the frequency domain. Finally, we use the cross-correlation between the power spectra of the noisy speech and the noise to modify the spectral gains, further reducing noise in silence and unvoiced segments. Objective test results show that the proposed method outperforms conventional Wiener filtering and codebook-based methods.
DOI: 10.1109/APSIPA.2014.7041526 · Published: 2014-12-01
Citations: 8
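The standard Wiener gain that the abstract above builds on is G = ξ/(1 + ξ), where the a priori SNR ξ is the per-bin power ratio Ps/Pn — equivalently Ps/(Ps + Pn). A minimal sketch of that textbook gain (the paper's contribution, the spectral smoothing, is not included here):

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, eps=1e-12):
    """Per-bin Wiener filter gain G = xi / (1 + xi), with a priori
    SNR xi = Ps / Pn. Applied multiplicatively to the noisy spectrum,
    it passes bins dominated by speech and attenuates bins dominated
    by noise."""
    xi = speech_psd / (noise_psd + eps)
    return xi / (1.0 + xi)
```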
Fast image matching using multi-level texture descriptor
Hui-Fuang Ng, Chih-Yang Lin, Tatenda Muindisi
Image and video descriptors are now widely used in many computer vision applications. In this paper, a new hierarchical multiscale texture-based image descriptor for efficient image matching is introduced. The proposed descriptor uses mean values at multiple scales of an image region to convert the region into binary bitmaps, and then applies binary operations to reduce computational time and improve noise robustness, achieving stable and fast image matching. Experimental results show that the proposed method outperforms existing descriptors in image matching under varying illumination conditions and noise.
DOI: 10.1109/APSIPA.2014.7041672 · Published: 2014-12-01
Citations: 1
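The core trick in the abstract above — thresholding a region at its own mean to get a binary bitmap — is cheap and illumination-tolerant, since adding a constant brightness offset shifts the mean by the same amount. A one-level sketch (`binary_bitmap` is an illustrative name; the paper stacks such bitmaps over multiple scales):

```python
import numpy as np

def binary_bitmap(region):
    """Threshold an image region at its own mean, producing a 0/1
    bitmap. Matching two regions then reduces to bitwise operations
    (e.g. XOR + popcount) instead of floating-point arithmetic."""
    return (region >= region.mean()).astype(np.uint8)
```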
Investigation and analysis on the effect of filtering mechanisms for 3D depth map coding
Xiaozhen Zheng, Weiran Li, Jianhua Zheng, Xu Chen
A depth map is a video signal that carries the depth information of 3D objects. It is an important coding feature in recent 3D video coding standards and has been applied in the latest 3D coding approaches, e.g. MV-HEVC and 3D-HEVC. It has been shown that support for depth map coding can significantly improve the coding performance for 3D videos and provide more flexibility for 3D applications. Previous work shows that depth maps have coding properties different from those of traditional 2D sequences, and many coding tools behave differently on the two kinds of content. This paper investigates and analyzes these phenomena for depth map coding.
DOI: 10.1109/APSIPA.2014.7041669 · Published: 2014-12-01
Citations: 0