
2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA): Latest Publications

Measure of image focus using image segmentation and SML for light field images
Wisarut Chantara, Yo-Sung Ho
In this paper, a method for detecting in-focus regions in light field stack images is proposed. Its main motivation is that a region-based focus measure can be more meaningful than pixel-based measures, which consider only individual pixels or their local neighborhoods during the focus measurement process. After employing the normalized cut method to segment the light field stack images, we apply the sum-modified-Laplacian operation to the segmented regions. This provides a focus measure for selecting the in-focus areas of the stack images. Since only sharply focused regions produce high responses, the in-focus regions can be detected. In addition, the all-in-focus image can be reconstructed by combining all in-focus image regions.
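A minimal sketch of the region-wise sum-modified-Laplacian (SML) idea, assuming segmentation labels are already available (the normalized-cut step is not shown) and aggregating the modified Laplacian per region rather than over a sliding window; all function and variable names are illustrative.

```python
import numpy as np

def modified_laplacian(img, step=1):
    """|2I(x,y)-I(x-s,y)-I(x+s,y)| + |2I(x,y)-I(x,y-s)-I(x,y+s)| at every pixel."""
    p = np.pad(img.astype(np.float64), step, mode='edge')
    c = p[step:-step, step:-step]
    ml_x = np.abs(2 * c - p[step:-step, :-2 * step] - p[step:-step, 2 * step:])
    ml_y = np.abs(2 * c - p[:-2 * step, step:-step] - p[2 * step:, step:-step])
    return ml_x + ml_y

def region_sml(img, labels):
    """Average modified-Laplacian response of each segmented region (higher = sharper)."""
    ml = modified_laplacian(img)
    return {r: ml[labels == r].mean() for r in np.unique(labels)}

def all_in_focus(stack, labels):
    """For each region, copy pixels from the stack image with the highest focus response."""
    fused = np.zeros_like(stack[0])
    scores = [region_sml(img, labels) for img in stack]
    for r in np.unique(labels):
        best = int(np.argmax([s[r] for s in scores]))
        fused[labels == r] = stack[best][labels == r]
    return fused
```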
DOI: 10.1109/APSIPA.2016.7820731
Citations: 1
Distributed compression sensing oriented soft video transmission with 1D-DCT over wireless
Kai Zhang, Anhong Wang, Haidong Wang
Recently, DCS-cast has shown an advantage in accommodating heterogeneous users over wireless networks by combining SoftCast and Distributed Compressed Sensing (DCS). However, the efficiency of DCS-cast is limited because it ignores the temporal correlation within each packet at the encoder. Based on the observation that strong temporal redundancy exists in each packet of measurements, this paper proposes an improved scheme named DCT-DCS-cast that removes this temporal correlation through a one-dimensional Discrete Cosine Transform (1D-DCT). A power allocation is also proposed to minimize the reconstruction error in each packet. Compared with the benchmark DCS-cast scheme and SoftCast, our DCT-DCS-cast scheme provides better performance when packets are lost during transmission, in both the unicast and multicast cases.
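A sketch of the core decorrelation step only, assuming each packet row holds the same measurement tracked across consecutive frames; the compressed-sensing measurement, power allocation, and channel stages are omitted, and the array layout is an assumption.

```python
import numpy as np
from scipy.fft import dct, idct

def encode_packets(packets):
    # packets: (num_packets, frames_per_packet); rows are temporally correlated.
    # A 1D-DCT along the temporal axis compacts each packet's energy into a few
    # coefficients, removing temporal redundancy before transmission.
    return dct(packets, type=2, norm='ortho', axis=1)

def decode_packets(coeffs, lost_mask=None):
    # Lost packets (rows) are zeroed before the inverse transform, mimicking
    # graceful degradation when some packets are dropped on the channel.
    if lost_mask is not None:
        coeffs = np.where(lost_mask[:, None], 0.0, coeffs)
    return idct(coeffs, type=2, norm='ortho', axis=1)
```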
DOI: 10.1109/APSIPA.2016.7820733
Citations: 0
A low power lossy frame memory recompression algorithm
Xin Zhou, Xiaocong Lian, Wei Zhou, Zhenyu Liu, Xiu Zhang
With the development of Ultra-High-Definition video, the power consumed by accessing reference frames in external DRAM has become the bottleneck in portable video encoding system design. To reduce the dynamic power of DRAM, a lossy frame memory recompression algorithm is proposed. The compression algorithm is composed of content-aware adaptive quantization, multi-mode directional prediction, dynamic kth-order unary/Exp-Golomb coding, and a partition-group-table-based storage space reduction scheme. Experimental results show that an average data reduction ratio of 71.1% is obtained, while 41% of memory space can be saved. In total, our strategies reduce DRAM dynamic power by 59.6%. The algorithm causes only a controllable degradation in video quality, with a BD-PSNR of −0.04 dB, or equivalently BD-BR = 1.42%.
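A small sketch of one ingredient the abstract names, k-th order Exp-Golomb coding of prediction residuals; the zig-zag mapping of signed residuals and the choice of k are illustrative assumptions, not the paper's exact entropy coder.

```python
def exp_golomb_k(n, k):
    """k-th order Exp-Golomb codeword (as a bit string) for a non-negative integer n."""
    q, r = n >> k, n & ((1 << k) - 1)            # quotient / remainder split
    b = (q + 1).bit_length()
    prefix = '0' * (b - 1) + format(q + 1, 'b')  # 0-th order Exp-Golomb code of q
    suffix = format(r, '0{}b'.format(k)) if k else ''
    return prefix + suffix

def signed_to_unsigned(v):
    """Zig-zag map so signed residuals become non-negative: 0,1,-1,2,-2 -> 0,1,2,3,4."""
    return (v << 1) - 1 if v > 0 else (-v) << 1

# Example use: encode a block of prediction residuals into a single bitstream.
def encode_residuals(residuals, k=1):
    return ''.join(exp_golomb_k(signed_to_unsigned(v), k) for v in residuals)
```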
DOI: 10.1109/APSIPA.2016.7820747
Citations: 1
Performance estimation of spontaneous speech recognition using non-reference acoustic features
Ling Guo, Takeshi Yamada, S. Makino
To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method that can efficiently investigate recognition performance for spontaneous speech. Such a method makes it possible to monitor recognition performance while providing speech recognition services, and it can also serve as a reliability measure in speech dialogue systems. Previously, methods for estimating the performance of noisy speech recognition based on spectral distortion measures have been proposed. Although they estimate recognition performance without actually performing speech recognition, these methods cannot be applied to spontaneous speech because they require the reference speech to obtain the distortion values. To solve this problem, we propose a novel method for estimating the recognition performance of spontaneous speech with various speaking styles. Its main feature is the use of non-reference acoustic features that do not require the reference speech. The proposed method extracts non-reference features with openSMILE (open-Source Media Interpretation by Large feature-space Extraction) and then estimates the recognition performance using SVR (Support Vector Regression). We confirmed the effectiveness of the proposed method through experiments using spontaneous speech data from the OGVC (On-line Gaming Voice Chat) corpus.
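A sketch of the regression stage only, assuming openSMILE feature vectors have already been extracted per utterance and paired with offline-measured recognition accuracies; the openSMILE extraction itself, corpus handling, and hyperparameters are not from the paper.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def train_performance_estimator(features, word_accuracies):
    """features: (n_utterances, n_acoustic_features) non-reference descriptors.
    word_accuracies: (n_utterances,) recognition accuracy measured offline."""
    model = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=10.0, epsilon=0.05))
    model.fit(features, word_accuracies)
    return model

# At service time the estimator predicts recognition performance for new
# spontaneous speech directly from its acoustic features, without a reference:
# estimated_accuracy = model.predict(new_features)
```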
DOI: 10.1109/APSIPA.2016.7820792
Citations: 0
Deep neural network based voice conversion with a large synthesized parallel corpus
Zhengqi Wen, Kehuang Li, J. Tao, Chin-Hui Lee
We propose a voice conversion framework that maps the speech features of a source speaker to those of a target speaker based on deep neural networks (DNNs). Because the parallel data needed for a pair of source and target speakers is of limited availability, speech synthesis and dynamic time warping are utilized to construct a large parallel corpus for DNN training. Even with a small corpus for training the DNNs, a lower log spectral distortion is still observed compared with the conventional Gaussian mixture model (GMM) approach trained on the same data. With the synthesized parallel corpus, a speech naturalness preference score of about 54.5% vs. 32.8% and a speech similarity preference score of about 52.5% vs. 23.6% are observed for the DNN-converted speech trained on the large parallel corpus, compared with the DNN-converted speech trained on the small parallel corpus.
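A sketch of the frame-wise mapping idea, using scikit-learn's MLPRegressor as a stand-in for the paper's DNN and assuming the source and target spectral features have already been time-aligned (e.g., by dynamic time warping); the layer sizes are illustrative.

```python
from sklearn.neural_network import MLPRegressor

def train_conversion_dnn(source_feats, target_feats):
    """source_feats, target_feats: (n_frames, dim) spectral features that are
    already aligned frame-by-frame between the source and target speakers."""
    dnn = MLPRegressor(hidden_layer_sizes=(512, 512, 512),
                       activation='relu', max_iter=200)
    dnn.fit(source_feats, target_feats)   # learns the source-to-target mapping
    return dnn

# converted = dnn.predict(new_source_feats)  # frame-wise converted spectra,
# which a vocoder would then turn back into a waveform.
```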
DOI: 10.1109/APSIPA.2016.7820716
Citations: 3
Spoofing speech detection using temporal convolutional neural network
Xiaohai Tian, Xiong Xiao, Chng Eng Siong, Haizhou Li
Spoofing speech detection aims to differentiate spoofed speech from natural speech. Frame-based features are used in most previous work. Although multiple frames or dynamic features are used to form a supervector representing temporal information, the time span covered by these features is not sufficient, and most systems fail to detect non-vocoder or unit-selection-based spoofing attacks. In this work, we propose a temporal convolutional neural network (CNN) based classifier for spoofing speech detection. The temporal CNN first convolves the feature trajectories with a set of filters, then extracts the maximum response of each filter within a time window using a max-pooling layer. Thanks to max-pooling, useful information can be extracted from a long temporal span without concatenating a large number of neighbouring frames, as in a feedforward deep neural network (DNN). Five types of features are employed to assess the performance of the proposed classifier. Experimental results on the ASVspoof 2015 corpus show that the temporal CNN based classifier is effective for synthetic speech detection. In particular, the proposed method brings a significant performance boost for unit-selection-based spoofing speech detection.
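A minimal PyTorch sketch of a temporal CNN that convolves feature trajectories and max-pools the filter responses over time; the filter count, kernel size, and two-class output are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TemporalCNN(nn.Module):
    def __init__(self, feat_dim=60, n_filters=128, kernel_size=5):
        super().__init__()
        # Treat each utterance's feature trajectory as a multi-channel 1D signal
        # and convolve it with a bank of temporal filters.
        self.conv = nn.Conv1d(feat_dim, n_filters, kernel_size)
        self.pool = nn.AdaptiveMaxPool1d(1)   # max response of each filter over time
        self.out = nn.Linear(n_filters, 2)    # natural vs. spoofed logits

    def forward(self, x):                      # x: (batch, feat_dim, n_frames)
        h = torch.relu(self.conv(x))
        h = self.pool(h).squeeze(-1)           # (batch, n_filters)
        return self.out(h)
```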
DOI: 10.1109/APSIPA.2016.7820738
Citations: 14
Pattern learning in closed-form
K. Toh
“The more relevant patterns at your disposal, the better your decisions will be.” — Herbert Simon. We shall begin this overview session with a conceptual recall of state-of-the-art learning algorithms for pattern classification, such as Linear Regression (LR), Linear Discriminant Analysis (LDA), k-Nearest Neighbors (kNN), and Support Vector Machines (SVM). Next, several closed-form learning formulations for classification are introduced. In particular, the classification total error rate (TER) and the receiver operating characteristics (ROC) are shown to be optimized in closed form. Such results not only facilitate efficient batch learning, but can also be extended to online applications where the learning converges as data arrive. These learning formulations are subsequently shown to be inter-related from the data transformation perspective. Some numerical examples are included to compare the performance of these formulations.
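A small numpy illustration of the flavor of closed-form learning the talk alludes to: ridge-regularized least squares fit to one-hot class targets, solved in a single linear solve; this is a generic example, not the specific TER or ROC formulations of the session.

```python
import numpy as np

def closed_form_classifier(X, y, n_classes, reg=1e-3):
    """X: (n_samples, n_features); y: integer labels in [0, n_classes)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])       # append a bias column
    Y = np.eye(n_classes)[y]                             # one-hot targets
    # Ridge-regularized least squares: W = (Xb^T Xb + reg*I)^(-1) Xb^T Y
    W = np.linalg.solve(Xb.T @ Xb + reg * np.eye(Xb.shape[1]), Xb.T @ Y)
    return W

def predict(W, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)                     # highest regression output wins
```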
DOI: 10.1109/APSIPA.2016.7820862
Citations: 0
Advanced query by humming system using diffused hidden Markov model and tempo based dynamic programming
Chiao-Wei Lin, Jian-Jiun Ding, Che-Ming Hu
Query by humming (QBH) is a content-based system for identifying which song a person sang. In this paper, we propose a note-based QBH system that applies the hidden Markov model and dynamic programming to find the most probable song. We also propose several techniques to improve QBH system performance. First, we propose a modified method for onset detection that also uses frequency information: by time-frequency analysis, we can find onset points that are difficult to pick up in the time domain. Besides the pitch feature, beat information and possible pitch and humming errors are also considered for melody matching. Tempo is another important feature of a song: even if the pitch sequences of two songs are the same, clearly different tempos make them completely different songs. Possible singing errors are considered as well. Simulations show that the proposed methods substantially improve performance.
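A sketch of the dynamic-programming melody matching step between a hummed pitch sequence and a database song; the HMM note segmentation, tempo weighting, and error modeling described in the abstract are not reproduced, and the absolute-difference cost is an assumption.

```python
import numpy as np

def dtw_distance(query_pitch, song_pitch):
    """Dynamic-programming alignment cost between two pitch sequences
    (e.g., MIDI note numbers); lower means a better melodic match."""
    n, m = len(query_pitch), len(song_pitch)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query_pitch[i - 1] - song_pitch[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Ranking a humming query: return the database song with the smallest dtw_distance.
```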
DOI: 10.1109/APSIPA.2016.7820765
Citations: 2
Acoustic probing to estimate freshness of tomato
Hidetomo Kataoka, Takashi Ijiri, Jeremy White, A. Hirabayashi
The freshness of vegetables attracts significant interest, because consumers decide how to cook a vegetable based on its maturity or select better vegetables in supermarkets based on freshness information. This paper focuses on tomatoes and reports our preliminary study on acoustic probing techniques for estimating their storage term. We apply an acoustic probe that sweeps the audible band to a sample and capture the transmitted acoustic signal with a microphone. We collect transmitted signals for samples with various storage terms, and the obtained signals are used to train a classifier. In our study, twelve sample tomatoes were measured over fourteen days. We found that the amplitude of the transmitted signal clearly decreases as the tomato matures.
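A sketch of the classification stage, assuming each recorded transmitted signal is summarized by band-wise spectral amplitudes before training; the feature choice and the SVM classifier are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.svm import SVC

def band_amplitude_features(signal, n_bands=32):
    """Mean spectral magnitude in n_bands equal-width bands of the recorded sweep."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    bands = np.array_split(spectrum, n_bands)
    return np.array([b.mean() for b in bands])

def train_freshness_classifier(signals, storage_days):
    """signals: list of transmitted-signal recordings; storage_days: storage-term labels."""
    X = np.vstack([band_amplitude_features(s) for s in signals])
    clf = SVC(kernel='rbf')
    clf.fit(X, storage_days)
    return clf
```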
DOI: 10.1109/APSIPA.2016.7820777
Citations: 8
Appearance dependent inter-part relationship for human pose estimation
Yumin Suh, Kyoung Mu Lee
We propose a new method for human pose estimation from a single image. Since the appearances and locations of different body parts strongly depend on each other in an image, considering their relationship helps identify the underlying pose. However, most existing methods cannot fully utilize this contextual information because they use simplified models to keep inference tractable. The proposed method models general relationships between body parts with convolutional neural networks, while keeping inference tractable by effectively reducing the search space to a subset of poses, pruning unreliable candidates based on strong unary part detectors. Experimental results demonstrate that the proposed method improves accuracy over baselines on the FLIC and LSP datasets while keeping inference and learning tractable.
DOI: 10.1109/APSIPA.2016.7820757
Citations: 0