Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820731
Wisarut Chantara, Yo-Sung Ho
In this paper, we propose a method for detecting in-focus regions in light field focal stack images. The main motivation is that a region-based focus measure can be more meaningful than pixel-based ones, which consider only individual pixels or their local neighborhoods. We first segment the light field stack images with the normalized cut method, and then apply the sum-modified-Laplacian (SML) operator to the segmented regions. This provides a focus measurement for selecting the in-focus areas of the stack images: since only sharply focused regions yield high responses, the in-focus regions can be detected. In addition, an all-in-focus image can be reconstructed by combining all in-focus image regions.
{"title":"Measure of image focus using image segmentation and SML for light field images","authors":"Wisarut Chantara, Yo-Sung Ho","doi":"10.1109/APSIPA.2016.7820731","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820731","url":null,"abstract":"In this paper, a detection method of in-focused regions in the light field stack images is proposed. Its main motivation is that the focus measure with region-based image algorithms can be more meaningful than the focus measure with pixel-based algorithms which just consider individual pixels or associated local neighborhoods of pixels in the focus measure process. After we employ the normalized cut method to segment the light field stack images, we apply the sum-modified-Laplacian operation to the corresponding segmented regions. This process provides a focus measurement to select suitable in-focused areas of the stack images. Since only sharply focused regions have high responses, the in-focused regions can be detected. In addition, the all-focused image can be reconstructed by combining all in-focused image regions.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114741497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
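The region-based SML measure described in the abstract can be sketched as follows. This is a minimal illustration: the label map stands in for a normalized-cut segmentation (not implemented here), and the function names are our own.

```python
import numpy as np

def sum_modified_laplacian(img, step=1):
    """Modified-Laplacian response per pixel:
    |2I(x,y)-I(x-s,y)-I(x+s,y)| + |2I(x,y)-I(x,y-s)-I(x,y+s)|."""
    img = img.astype(np.float64)
    ml = np.zeros_like(img)
    s = step
    ml[s:-s, s:-s] = (
        np.abs(2 * img[s:-s, s:-s] - img[:-2 * s, s:-s] - img[2 * s:, s:-s])
        + np.abs(2 * img[s:-s, s:-s] - img[s:-s, :-2 * s] - img[s:-s, 2 * s:])
    )
    return ml

def region_focus_measure(img, labels):
    """Sum the ML response over each segmented region -> {label: SML score}.
    The region with the highest score is treated as in-focus."""
    ml = sum_modified_laplacian(img)
    return {lab: float(ml[labels == lab].sum()) for lab in np.unique(labels)}
```

A textured (sharply focused) region produces a much larger SML score than a flat (defocused) one, which is the selection rule the paper exploits across the focal stack.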
Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820733
Kai Zhang, Anhong Wang, Haidong Wang
Recently, DCS-cast has shown an advantage in accommodating heterogeneous users over wireless networks by combining SoftCast with Distributed Compressed Sensing (DCS). However, DCS-cast's efficiency suffers because the encoder ignores the temporal correlation within each packet. Based on the observation that strong temporal redundancy exists in each packet of measurements, this paper proposes an improved scheme named DCT-DCS-cast that removes this correlation with a one-dimensional Discrete Cosine Transform (1D-DCT). A power allocation scheme is also proposed to minimize the reconstruction error of each packet. Compared with the benchmark DCS-cast scheme and SoftCast, our DCT-DCS-cast scheme provides better performance when packets are lost during transmission, in both the unicast and multicast cases.
{"title":"Distributed compression sensing oriented soft video transmission with 1D-DCT over wireless","authors":"Kai Zhang, Anhong Wang, Haidong Wang","doi":"10.1109/APSIPA.2016.7820733","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820733","url":null,"abstract":"Recently, DCS-cast has shown an advantage in accommodating heterogeneous users over wireless networks by combining SoftCast and Distributed Compressed Sensing (DCS). However, DCS-cast's efficiency is not actually high due to its ignorance of the temporal correlation in each packet at the encoder. This paper proposes an improved scheme named DCT-DCS-cast that removes such temporal correlation through a one-dimensional Discrete Cosine Transform (1D-DCT) based on the observation that there exists strong temporal redundancies in each packet of measurements. The power allocation is proposed to minimize the reconstruction errors in each packet. When compared to the benchmark DCS-cast scheme and SoftCast, our DCT-DCS-cast scheme is able to provide better performance when some packets are lost during the transmission in both cases of unicast and multicast.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"2011 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125988932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820747
Xin Zhou, Xiaocong Lian, Wei Zhou, Zhenyu Liu, Xiu Zhang
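The temporal decorrelation step can be illustrated with an orthonormal 1D-DCT applied along the frame axis of a packet of measurements. This sketches the transform itself, not the full DCS-cast pipeline; the packet shape is an assumption.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def decorrelate_packet(packet):
    """Apply a 1D-DCT along the temporal axis of one packet of CS measurements.
    packet: (n_frames, n_measurements); returns coefficients of the same shape."""
    D = dct_matrix(packet.shape[0])
    return D @ packet
```

When consecutive frames are strongly correlated, the energy concentrates in the low-order (DC) coefficients, which is what makes subsequent power allocation over coefficients worthwhile.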
With the development of Ultra-High-Definition video, the power consumed by accessing reference frames in external DRAM has become a bottleneck in portable video encoder design. To reduce the dynamic power of DRAM, a lossy frame memory recompression algorithm is proposed. The compression algorithm combines content-aware adaptive quantization, multi-mode directional prediction, dynamic kth-order unary/Exp-Golomb coding, and a partition-group-table-based storage space reduction scheme. Experimental results show that an average data reduction ratio of 71.1% is obtained, while 41% of the memory space can be saved. In total, our strategies reduce the dynamic power of the DRAM by 59.6%. The algorithm causes a controllable video quality degradation: the BD-PSNR is only −0.04 dB, or equivalently BD-BR = 1.42%.
{"title":"A low power lossy frame memory recompression algorithm","authors":"Xin Zhou, Xiaocong Lian, Wei Zhou, Zhenyu Liu, Xiu Zhang","doi":"10.1109/APSIPA.2016.7820747","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820747","url":null,"abstract":"With the development of Ultra-High-Definition video, the power consumed by accessing reference frames in the external DRAM has become the bottleneck for the portable video encoding system design. To reduce the dynamic power of DRAM, a lossy frame memory recompression algorithm is proposed. The compression algorithm is composed of a content-aware adaptive quantization, a multi-mode directional prediction, a dynamic kth-order unary/Exp-Golomb coding and a partition group table based storage space reduction scheme. Experimental results show that, an average data reduction ratio of 71.1% is obtained, while 41% memory space can be saved. 59.6% dynamic power of the DRAM is reduced by our strategies in total. The algorithm causes a controllable video quality degradation, and the BD-PSNR is only −0.04db, or equivalently BD-BR=1.42%.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129883108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
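The kth-order Exp-Golomb component of the entropy coder can be sketched in isolation (the unary mode switch and the rest of the recompression pipeline are omitted):

```python
def exp_golomb_encode(value, k=0):
    """kth-order Exp-Golomb code for a non-negative integer -> bit string.
    Write value + 2^k in binary, prefixed by (len - k - 1) zeros."""
    v = value + (1 << k)
    bits = v.bit_length()
    return "0" * (bits - k - 1) + format(v, "b")

def exp_golomb_decode(code, k=0):
    """Inverse of exp_golomb_encode for a single codeword."""
    zeros = 0
    while code[zeros] == "0":
        zeros += 1
    n = zeros + k + 1                 # length of the binary part
    v = int(code[zeros:zeros + n], 2)
    return v - (1 << k)
```

With k = 0 this reproduces the familiar H.264-style codes ("1", "010", "011", ...); larger k shortens codes for residuals with larger typical magnitude, which is why the paper adapts k dynamically.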
Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820792
Ling Guo, Takeshi Yamada, S. Makino
To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method for efficiently investigating recognition performance on spontaneous speech. Such a method makes it possible to monitor recognition performance while providing speech recognition services, and can also serve as a reliability measure in spoken dialogue systems. Previously, methods for estimating the performance of noisy speech recognition based on spectral distortion measures have been proposed. Although they estimate recognition performance without actually performing speech recognition, these methods cannot be applied to spontaneous speech because they require the reference speech to compute the distortion values. To solve this problem, we propose a novel method for estimating the recognition performance of spontaneous speech with various speaking styles. Its main feature is the use of non-reference acoustic features that do not require the reference speech. The proposed method extracts non-reference features with openSMILE (open-Source Media Interpretation by Large feature-space Extraction) and then estimates recognition performance using SVR (Support Vector Regression). We confirmed the effectiveness of the proposed method in experiments on spontaneous speech data from the OGVC (On-line Gaming Voice Chat) corpus.
{"title":"Performance estimation of spontaneous speech recognition using non-reference acoustic features","authors":"Ling Guo, Takeshi Yamada, S. Makino","doi":"10.1109/APSIPA.2016.7820792","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820792","url":null,"abstract":"To ensure a satisfactory QoE (Quality of Experience), it is essential to establish a method that can be used to efficiently investigate recognition performance for spontaneous speech. By using this method, it is allowed to monitor the recognition performance in providing speech recognition services. It can be also used as a reliability measure in speech dialogue systems. Previously, methods for estimating the performance of noisy speech recognition based on spectral distortion measures have been proposed. Although they give an estimate of recognition performance without actually performing speech recognition, the methods cannot be applied to spontaneous speech because they require the reference speech to obtain the distortion values. To solve this problem, we propose a novel method for estimating the recognition performance of spontaneous speech with various speaking styles. The main feature is to use non-reference acoustic features that do not require the reference speech. The proposed method extracts non-reference features by openSMILE (open-Source Media Interpretation by Large feature-space Extraction) and then estimates the recognition performance by using SVR (Support Vector Regression). We confirmed the effectiveness of the proposed method by experiments using spontaneous speech data from the OGVC (On-line Gaming Voice Chat) corpus.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128568300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
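The estimation stage (features in, predicted accuracy out) can be sketched as follows. Since openSMILE and an ASR backend are external, this illustration uses hypothetical hand-rolled non-reference features and plain least squares as a dependency-free stand-in for SVR, with a synthetic "word accuracy" target:

```python
import numpy as np

def extract_features(signal, frame=256):
    """Hypothetical non-reference features summarising one utterance:
    frame log-energy statistics and zero-crossing-rate statistics."""
    frames = signal[: len(signal) // frame * frame].reshape(-1, frame)
    energy = np.log((frames ** 2).mean(axis=1) + 1e-9)
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    return np.array([energy.mean(), energy.std(), zcr.mean(), zcr.std()])

def fit(X, y):
    """Ordinary least squares (with bias), standing in for SVR."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ w
```

Because noisier signals shift both energy and zero-crossing statistics, even this crude regressor tracks a noise-driven "accuracy" target on held-out utterances, which is the shape of the paper's feature-to-performance mapping.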
Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820716
Zhengqi Wen, Kehuang Li, J. Tao, Chin-Hui Lee
We propose a voice conversion framework that maps the speech features of a source speaker to a target speaker using deep neural networks (DNNs). Because the parallel data available for a given source-target speaker pair is limited, speech synthesis and dynamic time warping are used to construct a large parallel corpus for DNN training. Even with a small training corpus, the DNN yields a lower log spectral distortion than the conventional Gaussian mixture model (GMM) approach trained on the same data. With the synthesized parallel corpus, the DNN-converted speech from the large corpus achieves a speech naturalness preference score of about 54.5% vs. 32.8% and a speech similarity preference score of about 52.5% vs. 23.6% when compared with the DNN-converted speech from the small parallel corpus.
{"title":"Deep neural network based voice conversion with a large synthesized parallel corpus","authors":"Zhengqi Wen, Kehuang Li, J. Tao, Chin-Hui Lee","doi":"10.1109/APSIPA.2016.7820716","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820716","url":null,"abstract":"We propose a voice conversion framework to map the speech features of a source speaker to a target speaker based on deep neural networks (DNNs). Due to a limited availability of the parallel data needed for a pair of source and target speakers, speech synthesis and dynamic time warping are utilized to construct a large parallel corpus for DNN training. With a small corpus to train DNNs, a lower log spectral distortion can still be seen over the conventional Gaussian mixture model (GMM) approach, trained with the same data. With the synthesized parallel corpus, a speech naturalness preference score of about 54.5% vs. 32.8% and a speech similarity preference score of about 52.5% vs. 23.6% are observed for the DNN-converted speech from the large parallel corpus when compared with the DNN-converted speech from the small parallel corpus.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130704502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
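The dynamic-time-warping alignment used to pair source and target frames when building the parallel corpus can be sketched with a minimal DTW over feature sequences (symmetric step pattern, Euclidean frame distance; a simplification of what a full system would use):

```python
import numpy as np

def dtw_path(a, b):
    """Align two feature sequences (frames x dims) with dynamic time warping;
    returns the optimal list of (i, j) frame pairs."""
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrace from the end, always stepping to the cheapest predecessor.
    path, (i, j) = [], (n, m)
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        moves = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(moves, key=lambda p: cost[p])
    return path[::-1]
```

The resulting frame pairs are what a DNN regressor would then be trained on, source frame as input, aligned target frame as output.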
Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820738
Xiaohai Tian, Xiong Xiao, Chng Eng Siong, Haizhou Li
Spoofing speech detection aims to differentiate spoofing speech from natural speech. Most previous work uses frame-based features. Although multiple frames or dynamic features can be concatenated into a super-vector to represent temporal information, the time span covered by such features is not sufficient, and most systems fail to detect non-vocoder or unit-selection-based spoofing attacks. In this work, we propose a temporal convolutional neural network (CNN) based classifier for spoofing speech detection. The temporal CNN first convolves the feature trajectories with a set of filters, then extracts the maximum response of each filter within a time window using a max-pooling layer. Thanks to max-pooling, we can extract useful information from a long temporal span without concatenating a large number of neighbouring frames, as is done in a feedforward deep neural network (DNN). Five types of features are employed to assess the performance of the proposed classifier. Experimental results on the ASVspoof 2015 corpus show that the temporal CNN based classifier is effective for synthetic speech detection. In particular, the proposed method brings a significant performance boost for unit-selection-based spoofing speech detection.
{"title":"Spoofing speech detection using temporal convolutional neural network","authors":"Xiaohai Tian, Xiong Xiao, Chng Eng Siong, Haizhou Li","doi":"10.1109/APSIPA.2016.7820738","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820738","url":null,"abstract":"Spoofing speech detection aims to differentiate spoofing speech from natural speech. Frame-based features are usually used in most of previous works. Although multiple frames or dynamic features are used to form a super-vector to represent the temporal information, the time span covered by these features are not sufficient. Most of the systems failed to detect the non-vocoder or unit selection based spoofing attacks. In this work, we propose to use a temporal convolutional neural network (CNN) based classifier for spoofing speech detection. The temporal CNN first convolves the feature trajectories with a set of filters, then extract the maximum responses of these filters within a time window using a max-pooling layer. Due to the use of max-pooling, we can extract useful information from a long temporal span without concatenating a large number of neighbouring frames, as in feedforward deep neural network (DNN). Five types of feature are employed to access the performance of proposed classifier. Experimental results on ASVspoof 2015 corpus show that the temporal CNN based classifier is effective for synthetic speech detection. Specifically, the proposed method brings a significant performance boost for the unit selection based spoofing speech detection.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132776144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
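The convolve-then-max-pool idea can be illustrated directly. This toy version uses explicit loops instead of a deep-learning framework, and the filter shapes are assumptions:

```python
import numpy as np

def temporal_conv_maxpool(feats, filters):
    """Convolve feature trajectories with temporal filters, then max-pool
    over time: one response per filter for the whole window.
    feats: (T, D) feature trajectory; filters: (K, w, D)."""
    T, D = feats.shape
    K, w, _ = filters.shape
    resp = np.empty((T - w + 1, K))
    for t in range(T - w + 1):
        window = feats[t:t + w]                              # (w, D)
        resp[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return resp.max(axis=0)                                   # max over time
```

Max-pooling makes the pooled response invariant to where in the window the matching pattern occurs, which is how the classifier covers a long temporal span without stacking neighbouring frames.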
Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820862
K. Toh
“The more relevant patterns at your disposal, the better your decisions will be.” — Herbert Simon. We begin this overview session with a conceptual recall of state-of-the-art learning algorithms for pattern classification, such as Linear Regression (LR), Linear Discriminant Analysis (LDA), k-Nearest Neighbors (kNN), and Support Vector Machines (SVM). Next, several closed-form learning formulations for classification are introduced. In particular, the classification total error rate (TER) and the receiver operating characteristics (ROC) are shown to be optimized in closed form. Such results not only facilitate efficient batch learning, but can also be extended to online applications in which learning converges as data arrive. These learning formulations are then shown to be inter-related from the data transformation perspective. Numerical examples are included to compare the performance of these formulations.
{"title":"Pattern learning in closed-form","authors":"K. Toh","doi":"10.1109/APSIPA.2016.7820862","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820862","url":null,"abstract":"“The more relevant patterns at your disposal, the better your decisions will be.” — Herbert Simon. We shall begin this overview session by a conceptual recall of state-of-the-art learning algorithms such as Linear Regression (LR), Linear Discriminant Analysis (LDA), k-Nearest Neighbors (kNN) and Support Vector Machines (SVM) for pattern classification. Next, several closed-form learning formulations for classification are introduced. In particular, the classification total error rate (TER) and the receiver operating characteristics (ROC) are shown to be optimized in closed-form. Such results not only facilitate efficient batch learning, but also they can be extended to online applications where the learning is convergent according to data arrival. These learning formulations are subsequently shown to be inter-related from the data transformation perspective. Some numerical examples are included to compare the performances of these learning formulations.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133006754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
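As a concrete instance of closed-form learning, a least-squares classifier with one-hot targets can be fit from the ridge-regularized normal equations; the TER- and ROC-optimized variants covered in the talk are not reproduced here.

```python
import numpy as np

def fit_ls_classifier(X, y, n_classes, lam=1e-3):
    """Closed-form least-squares classifier: one-hot targets, solved via
    ridge-regularised normal equations W = (X'X + lam*I)^(-1) X'Y."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias column
    Y = np.eye(n_classes)[y]                    # one-hot target matrix
    d = Xb.shape[1]
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(d), Xb.T @ Y)

def predict_ls(W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ W).argmax(axis=1)
```

Because the solution is a single linear solve, training is non-iterative; the online extension mentioned in the abstract amounts to updating X'X and X'Y recursively as data arrive.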
Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820765
Chiao-Wei Lin, Jian-Jiun Ding, Che-Ming Hu
Query by humming (QBH) is a content-based system for identifying which song a person sang. In this paper, we propose a note-based QBH system that applies a hidden Markov model and dynamic programming to find the most probable song. We also propose several techniques to improve QBH performance. First, we propose a modified onset detection method that also exploits frequency information: by time-frequency analysis, we can find onset points that are difficult to pick up in the time domain. Besides the pitch feature, beat information and possible pitch and humming errors are also considered for melody matching. Tempo is likewise an important feature of a song: even if the pitch sequences of two songs are the same, clearly different tempos make them completely different songs. Possible singing errors are considered as well. Simulations show that the proposed methods considerably improve performance.
{"title":"Advanced query by humming system using diffused hidden Markov model and tempo based dynamic programming","authors":"Chiao-Wei Lin, Jian-Jiun Ding, Che-Ming Hu","doi":"10.1109/APSIPA.2016.7820765","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820765","url":null,"abstract":"Query by humming (QBH) is a content-based system to identify which song a person sang. In this paper, we proposed a note-based QBH system which apply the hidden Markov model and dynamic programming to find the most possible song. Also, we proposed several techniques to improve the QBH system performance. First, we propose a modified method for onset detection. The frequency information is also used in this part By time-frequency analysis, we can find out the onset points which are difficult to be picked up in the time domain. Besides the pitch feature, the beat information and possible pitch and humming errors are also considered for melody matching. The tempo feature is also an important part for a song. Even though the pitch sequences of two songs are the same, if the tempo is clearly different, then they are complete different songs. Also the possible singing errors are considered. Simulations show that the performance can be much improved by our proposed methods.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127955688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
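A time-frequency onset detector in the spirit described above can be sketched with spectral flux (summed positive magnitude change between consecutive frames) followed by simple peak picking; the frame sizes and threshold are illustrative, not the paper's values.

```python
import numpy as np

def onset_strength(signal, frame=512, hop=256):
    """Spectral flux: summed positive magnitude increase between frames."""
    n = 1 + (len(signal) - frame) // hop
    mags = np.array([
        np.abs(np.fft.rfft(signal[i * hop:i * hop + frame] * np.hanning(frame)))
        for i in range(n)
    ])
    diff = np.diff(mags, axis=0)
    return np.maximum(diff, 0.0).sum(axis=1)

def pick_onsets(flux, thresh_ratio=0.5):
    """Frames where the flux is a local peak above a fraction of its maximum."""
    t = thresh_ratio * flux.max()
    return [i for i in range(1, len(flux) - 1)
            if flux[i] > t and flux[i] >= flux[i - 1] and flux[i] >= flux[i + 1]]
```

A note onset that is hard to see in the waveform still produces a burst of new spectral energy, which is exactly what the flux measures.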
Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820777
Hidetomo Kataoka, Takashi Ijiri, Jeremy White, A. Hirabayashi
The freshness of vegetables attracts significant interest, because consumers decide how to cook a vegetable based on its maturity, or select better vegetables in supermarkets based on freshness information. This paper focuses on tomatoes and reports our preliminary study on acoustic probing techniques for estimating their storage term. We apply an acoustic probe that sweeps the audible band to a sample and capture the transmitted acoustic signal with a microphone. We collect transmitted signals from samples with various storage terms and use the obtained signals to train a classifier. In our study, twelve sample tomatoes were measured over fourteen days. We found that the amplitude of the transmitted signal clearly decreases as the tomato matures.
{"title":"Acoustic probing to estimate freshness of tomato","authors":"Hidetomo Kataoka, Takashi Ijiri, Jeremy White, A. Hirabayashi","doi":"10.1109/APSIPA.2016.7820777","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820777","url":null,"abstract":"The freshness of vegetables attracts significant interest, because consumers will determine the way of cooking based on the maturity of the vegetable or select better vegetables in supermarkets based on the freshness information. This paper focuses on tomatoes, and reports our preliminary studies on acoustic probing techniques to estimate their storage term. We hit an acoustic probe that sweeps audible band to a sample and capture an transmitted acoustic signal by using a microphone. We collect transmitted signals for samples with various storage terms and the obtained signals are used to train a classifier. In our study, twelve sample tomatoes were measured during fourteen days. We found the amplitude of the transmitted signal obviously decreases as the tomato matures.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131324623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
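The amplitude-based classification can be sketched as follows. The band limits, day labels, and nearest-centroid rule are our own illustrative assumptions, not the paper's classifier; the only borrowed fact is that transmitted amplitude falls with storage time.

```python
import numpy as np

def band_rms(signal, sr, lo=200.0, hi=2000.0):
    """RMS amplitude of the transmitted signal within an audible band."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    band = spec[(freqs >= lo) & (freqs <= hi)]
    return np.sqrt((np.abs(band) ** 2).sum()) / len(signal)

def nearest_day(rms_value, day_centroids):
    """Classify storage term by the nearest per-day mean RMS (hypothetical)."""
    return min(sorted(day_centroids),
               key=lambda d: abs(day_centroids[d] - rms_value))
```

Since amplitude decays monotonically with storage time, a one-dimensional nearest-centroid rule on band RMS already separates fresh from stored samples in this toy setting.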
Pub Date: 2016-12-01, DOI: 10.1109/APSIPA.2016.7820757
Yumin Suh, Kyoung Mu Lee
We propose a new method for human pose estimation from a single image. Since the appearances and locations of different body parts strongly depend on each other in an image, considering their relationship helps identify the underlying pose. However, most existing methods cannot fully utilize this contextual information because they use simplified models to keep inference tractable. The proposed method models general relationships between body parts with convolutional neural networks, while keeping inference tractable by effectively reducing the search space to a subset of poses, pruning unreliable candidates based on strong unary part detectors. Experimental results demonstrate that the proposed method improves accuracy over baselines on the FLIC and LSP datasets, while keeping inference and learning tractable.
{"title":"Appearance dependent inter-part relationship for human pose estimation","authors":"Yumin Suh, Kyoung Mu Lee","doi":"10.1109/APSIPA.2016.7820757","DOIUrl":"https://doi.org/10.1109/APSIPA.2016.7820757","url":null,"abstract":"We propose a new method for human pose estimation from a single image. Since both appearance and locations of different body parts strongly depends on each other in an image, considering their relationship helps identifying the underlying poses. However, most of the existing methods cannot fully utilize this contextual information by using simplified model to make inference tractable. The proposed method models general relationship between body parts based on the convolutional neural networks, while keeping inference tractableble by effectively reducing the search space to a subset of poses by pruning unreliable ones based on the strong unary part detectors. Experimental results demonstrate that the proposed method improves the accuracy than baselines, on FLIC and LSP dataset, while keeping inference and learning tractable.","PeriodicalId":409448,"journal":{"name":"2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125374807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
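The pruning step, keeping only pose candidates with strong unary detector support so the richer joint model stays tractable, can be sketched as (the candidate representation and score shapes are assumptions):

```python
import numpy as np

def prune_poses(candidates, unary_scores, keep=10):
    """Keep the top-`keep` candidate poses ranked by the sum of their per-part
    unary detector scores, shrinking the search space for the joint model.
    candidates: list of pose hypotheses; unary_scores: (n_candidates, n_parts)."""
    totals = unary_scores.sum(axis=1)           # one score per candidate pose
    order = np.argsort(totals)[::-1][:keep]     # best-first
    return [candidates[i] for i in order]
```

The expensive pairwise/contextual model then only has to score the surviving `keep` hypotheses instead of the full candidate set.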