Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820865
Mu Yang, Li Su, Yi-Hsuan Yang
A musical chord is usually described by its root note and its chord type. Although a substantial amount of work in music information retrieval (MIR) has addressed automatic chord recognition, the role of root notes in this task has seldom received specific attention. In this paper, we present a new approach, together with empirical studies, showing that chord recognition accuracy improves when the information carried by the root notes is properly highlighted. At the signal level, we propose combining spectral features with features derived from the cepstrum to improve the identification of low pitches, which usually correspond to the root notes. At the model level, we propose a multi-task learning framework based on neural networks that jointly considers chord recognition and root note recognition during training. We found that the improved accuracy can be attributed to better information about the sub-harmonics of the notes and to the emphasis on root notes when recognizing chords.
Title: Highlighting root notes in chord recognition using cepstral features and multi-task learning | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
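The cepstral cue mentioned in this abstract can be illustrated with the textbook real cepstrum, the inverse DFT of the log-magnitude spectrum: a harmonic tone whose fundamental period is P samples produces a cepstral peak at quefrency P, which is why cepstral features can expose low pitches such as root notes. This is a minimal, dependency-free sketch under our own assumptions (a naive DFT, a 1e-6 floor inside the logarithm, and a caller-supplied lag search range, all hypothetical choices); it is not the authors' feature pipeline, which combines such cues with spectral features and a multi-task network.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (kept dependency-free on purpose)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(x, eps=1e-6):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum.

    Assumption: eps is a small floor that keeps log() finite at empty bins.
    """
    n = len(x)
    log_mag = [math.log(abs(v) + eps) for v in dft(x)]
    # The log-magnitude spectrum of a real signal is real and even, so the
    # inverse DFT reduces to a cosine sum.
    return [
        sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        for q in range(n)
    ]


def cepstral_pitch_lag(x, min_lag, max_lag):
    """Quefrency (in samples) with the largest cepstral value inside [min_lag, max_lag].

    Assumption: the search range encodes a plausible pitch range chosen by the caller.
    """
    c = real_cepstrum(x)
    return max(range(min_lag, max_lag + 1), key=lambda q: c[q])


# Synthetic tone: 256-sample frame, harmonics at multiples of 16 bins
# (fundamental period = 16 samples), amplitudes decaying as 1/h.
tone = [
    sum(math.cos(2 * math.pi * 16 * h * t / 256) / h for h in range(1, 8))
    for t in range(256)
]
lag = cepstral_pitch_lag(tone, 8, 24)
```

For this synthetic tone, the search over lags 8 to 24 returns 16, the fundamental period. A practical system would use an FFT on windowed frames; the quadratic DFT here only keeps the example self-contained.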
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820728
Yuichi Tanaka, S. Yagyu, Akie Sakiyama, Masaki Onuki
We propose a method for calculating deformed image pixel positions for mesh-based image retargeting. Image retargeting is a sophisticated image resizing technique that yields resized images of acceptable quality even when the image is resized to an aspect ratio different from the original. It often employs a mesh-based approach, in which pixels are the nodes of a graph and relationships between pixels are represented as its edges. In this paper, we reformulate the pixel position deformation in image retargeting as spectral graph filtering, using a graph signal processing-based approach. We validate our method through image retargeting examples with appropriately designed filter kernels in the graph spectral domain.
Title: Mesh-based image retargeting with spectral graph filtering | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
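Spectral graph filtering means transforming a graph signal into the eigenbasis of the graph Laplacian, scaling each coefficient by a kernel h(λ), and transforming back. The sketch below applies this to node positions on a path graph (think of one row of mesh nodes), where the Laplacian eigenvectors have a closed form (the orthonormal DCT-II basis), so no eigensolver is needed. The path-graph simplification, the heat kernel exp(-0.5·λ), and the example positions are illustrative assumptions, not the filter kernels designed in the paper.

```python
import math


def path_laplacian_eigs(n):
    """Eigenpairs of the combinatorial Laplacian L = D - A of an unweighted path graph.

    Eigenvalues are 2 - 2*cos(pi*k/n); orthonormal eigenvectors are the
    DCT-II basis vectors, k = 0..n-1.
    """
    eigvals = [2.0 - 2.0 * math.cos(math.pi * k / n) for k in range(n)]
    eigvecs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        eigvecs.append([scale * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)])
    return eigvals, eigvecs


def spectral_graph_filter(signal, kernel):
    """Compute U diag(kernel(lambda)) U^T signal on a path graph."""
    n = len(signal)
    eigvals, eigvecs = path_laplacian_eigs(n)
    # Graph Fourier transform: inner product with each eigenvector.
    coeffs = [sum(u[i] * signal[i] for i in range(n)) for u in eigvecs]
    # Scale each spectral coefficient by the kernel and synthesize back.
    return [
        sum(kernel(lam) * c * u[i] for lam, c, u in zip(eigvals, coeffs, eigvecs))
        for i in range(n)
    ]


def dirichlet_energy(signal):
    """x^T L x on a path graph: sum of squared differences between neighbours."""
    return sum((a - b) ** 2 for a, b in zip(signal, signal[1:]))


# Hypothetical example: x-coordinates of one row of mesh nodes with an abrupt
# jump, smoothed by a heat kernel (an assumed kernel choice).
positions = [0.0, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 10.0]
smoothed = spectral_graph_filter(positions, lambda lam: math.exp(-0.5 * lam))
```

Because the heat kernel equals 1 at λ = 0, the mean position is preserved, while higher-frequency components (abrupt jumps between neighbouring nodes) are attenuated, so the deformation is spread more evenly along the row.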
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820852
Jonghee Kim, Changick Kim
Dictionary-based super-resolution has been actively studied with considerable success. However, previous dictionary-based super-resolution methods rely on optimization or nearest-neighbor search, both of which have high complexity. In this paper, we propose a low-complexity super-resolution method, called the discrete feature transform, which performs feature extraction and nearest-neighbor search in a single step. As a result, the proposed method achieves the lowest complexity among dictionary-based super-resolution methods while delivering comparable performance.
Title: Discrete feature transform for low-complexity single-image super-resolution | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820714
Yoonmo Yang, Dohoon Lee, Byung Tae Oh
In this paper, we propose a new frame rate up-conversion method for multiview video. The proposed method uses the depth map and neighboring-view information to improve the accuracy of motion estimation and compensation. Specifically, it decomposes each block into multiple layers using the depth map. It then estimates the occluded regions in the lower layer from the neighboring-view information, which leads to more accurate motion estimation and compensation. Experimental results show that the proposed method substantially improves the quality of the interpolated frames compared with conventional methods.
Title: Frame rate up conversion for multiview video | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
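The method above refines block-based motion estimation and compensation. For reference, the conventional baseline step it builds on, full-search block matching with the sum of absolute differences (SAD), can be sketched as follows. This omits the paper's depth-based layer decomposition and inter-view occlusion handling; the function names and the motion-vector sign convention are our assumptions.

```python
def sad(prev, curr, by, bx, dy, dx, size):
    """SAD between the block at (by, bx) in curr and its candidate source in prev.

    Assumed convention: curr[y][x] is predicted from prev[y - dy][x - dx].
    """
    return sum(
        abs(curr[y][x] - prev[y - dy][x - dx])
        for y in range(by, by + size)
        for x in range(bx, bx + size)
    )


def block_match(prev, curr, by, bx, size, radius):
    """Full-search block matching; returns the (dy, dx) with the smallest SAD."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            top, left = by - dy, bx - dx
            if top < 0 or left < 0 or top + size > h or left + size > w:
                continue  # candidate block would fall outside the previous frame
            cost = sad(prev, curr, by, bx, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


# Toy frames: curr is prev shifted down by 1 row and right by 2 columns.
prev = [[y * 100 + x for x in range(16)] for y in range(16)]
curr = [[prev[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(16)] for y in range(16)]
vector = block_match(prev, curr, 4, 4, 4, 3)
```

Here the block at (4, 4) recovers the vector (1, 2). In a typical bidirectional up-conversion scheme, the frame halfway between prev and curr is then assembled from half of such vectors; occlusions, which this baseline cannot handle, are what the paper's depth layers and neighboring views address.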
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820799
Danwei Cai, Weicheng Cai, Zhidong Ni, Ming Li
In this paper, we apply Locality Sensitive Discriminant Analysis (LSDA) to a speaker verification system for intersession variability compensation. Unlike LDA, which fails to discover the local geometrical structure of the data manifold, LSDA finds a projection that maximizes the margin between i-vectors from different speakers in each local area. Because the number of samples per class varies widely, we improve LSDA by using an adaptive number k of nearest neighbors in each class and modifying the corresponding within-class and between-class weight matrices. In this way, each class has equal importance in LSDA's objective function. Experiments were carried out on the extended condition 5 female task of the NIST 2010 speaker recognition evaluation (SRE). The results show that the proposed adaptive k-nearest-neighbor LSDA method significantly improves on the conventional i-vector/PLDA baseline, with an 18% relative reduction in cost and a 28% relative reduction in equal error rate.
Title: Locality sensitive discriminant analysis for speaker verification | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
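LSDA starts by building two neighbour graphs: a within-class graph linking nearby samples of the same speaker and a between-class graph linking nearby samples of different speakers. The sketch below shows only that graph-construction step with a per-class k. The adaptive rule k_c = max(1, round(ratio·(n_c − 1))), the 1/n_c scaling of within-class edges, and the fixed k_between are hypothetical choices for illustration, not the weighting derived in the paper; the subsequent generalized eigenproblem that yields the LSDA projection is omitted.

```python
import math
from collections import Counter


def lsda_weight_matrices(points, labels, ratio=0.5, k_between=2):
    """Sketch of LSDA-style within-class and between-class neighbour graphs.

    Hypothetical choices: class c uses k_c = max(1, round(ratio * (n_c - 1)))
    within-class neighbours, and its within-class edges are scaled by 1 / n_c
    so that large classes do not dominate the objective.
    """
    n = len(points)
    counts = Counter(labels)
    w_within = [[0.0] * n for _ in range(n)]
    w_between = [[0.0] * n for _ in range(n)]
    for i in range(n):
        c = labels[i]
        k_c = max(1, round(ratio * (counts[c] - 1)))
        same = sorted(
            (math.dist(points[i], points[j]), j)
            for j in range(n) if j != i and labels[j] == c
        )
        diff = sorted(
            (math.dist(points[i], points[j]), j)
            for j in range(n) if labels[j] != c
        )
        # Symmetric edges: i and j are linked if either is among the other's neighbours.
        for _, j in same[:k_c]:
            w_within[i][j] = w_within[j][i] = 1.0 / counts[c]
        for _, j in diff[:k_between]:
            w_between[i][j] = w_between[j][i] = 1.0
    return w_within, w_between


# Toy data: speaker "a" has four i-vector-like points, speaker "b" has two.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5)]
labels = ["a", "a", "a", "a", "b", "b"]
ww, wb = lsda_weight_matrices(points, labels)
```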
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820721
Young-Sun Joo, Won-Suk Jun, Hong-Goo Kang
This paper proposes a cascaded deep neural network (DNN) structure for a speech synthesis system, consisting of text-to-bottleneck (TTB) and bottleneck-to-speech (BTS) models. Unlike the conventional single structure, which requires a large database to learn the complicated mapping between linguistic and acoustic features, the proposed structure remains very effective even when the available training database is insufficient. The bottleneck feature used in the proposed approach represents the characteristics of the linguistic features together with the average acoustic features of several speakers. Therefore, learning a mapping between bottleneck and acoustic features is more efficient than directly learning a mapping between linguistic and acoustic features. Experimental results show that the learning capability of the proposed structure is much higher than that of conventional structures. Objective evaluations and subjective listening tests also confirm the superiority of the proposed structure.
Title: Efficient deep neural networks for speech synthesis using bottleneck features | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820683
Y. Ji, Xiangeng Bu, Jinwei Sun, Zhiyong Liu
To establish a more reliable and robust EEG model for sleep staging, a reasonable choice of modeling parameters is necessary. Feature selection serves this purpose: it selects a subset of d features from a set of D features according to some optimization criterion, providing the best input features for classification. In the present study, an improved simulated annealing genetic algorithm (ISAGA) was proposed. Twenty-five feature parameters were extracted from the sleep EEG in the MIT-BIH polysomnography database. The feature selection results demonstrated that ISAGA achieves higher classification accuracy with fewer features than the correlation coefficient algorithm (CCA), the genetic algorithm (GA), the adaptive genetic algorithm (AGA), and the simulated annealing genetic algorithm (SAGA). With its optimal feature subset, ISAGA reaches a classification accuracy of about 92.00%, an improvement of about 4.83% over using all the features for sleep staging.
Title: An improved simulated annealing genetic algorithm of EEG feature selection in sleep stage | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
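As background for the SAGA family that ISAGA extends, here is a generic simulated-annealing genetic algorithm over feature bitmasks: children from one-point crossover and bit-flip mutation replace their parents under the Metropolis acceptance rule, and the temperature decays geometrically. This is a plain SAGA sketch, not the paper's improved variant; the hyperparameters are arbitrary, and the toy fitness function (rewarding three "relevant" features and penalizing the rest) is a hypothetical stand-in for the classification accuracy used in the study.

```python
import math
import random


def random_population(n_features, size, rng):
    """Random feature bitmasks (tuples of 0/1 flags)."""
    return [tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(size)]


def sa_ga_select(fitness, population, generations=60, t0=1.0, cooling=0.95,
                 mutation_rate=0.05, seed=0):
    """Generic simulated-annealing genetic algorithm for feature-subset selection.

    A worse child replaces its parent with probability exp(delta / T); the
    temperature T decays geometrically, and the best mask seen so far is
    always kept (elitism). Hyperparameter defaults are arbitrary assumptions.
    """
    rng = random.Random(seed)
    pop = [tuple(ind) for ind in population]
    n_features = len(pop[0])  # must be at least 2 for one-point crossover
    best = max(pop, key=fitness)
    temp = t0
    for _ in range(generations):
        next_pop = []
        for parent in pop:
            mate = rng.choice(pop)
            cut = rng.randrange(1, n_features)
            child = parent[:cut] + mate[cut:]  # one-point crossover
            child = tuple(1 - bit if rng.random() < mutation_rate else bit for bit in child)
            delta = fitness(child) - fitness(parent)
            accept = delta >= 0 or rng.random() < math.exp(delta / temp)
            next_pop.append(child if accept else parent)
        pop = next_pop
        best = max([best] + pop, key=fitness)
        temp *= cooling
    return best


# Hypothetical toy fitness: features 0, 3 and 5 are "relevant"; every other
# selected feature costs 0.1, mimicking a preference for smaller subsets.
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i, b in enumerate(mask) if b and i in RELEVANT)
    extra = sum(1 for i, b in enumerate(mask) if b and i not in RELEVANT)
    return hits - 0.1 * extra


best_mask = sa_ga_select(toy_fitness, random_population(8, 20, random.Random(1)))
```

The Metropolis step lets the search escape local optima early on, while elitism guarantees the returned mask is never worse than the best one in the starting population.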
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820681
Mizuki Murayama, Daisuke Oguro, H. Kikuchi, H. Huttunen, Yo-Sung Ho, Jaeho Shin
A divergence similarity between two color images, based on the Jensen-Shannon divergence, is presented to measure how similar their color distributions are. Subjective assessment experiments were conducted to obtain mean opinion scores (MOS) for the test images. The divergence similarity and the MOS values were found to have a statistically significant correlation.
Title: Color-distribution similarity by information theoretic divergence for color images | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
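The Jensen-Shannon divergence between two distributions P and Q is JSD = ½·KL(P‖M) + ½·KL(Q‖M) with M = (P + Q)/2; with base-2 logarithms it lies in [0, 1]. A minimal sketch of a color-distribution similarity built on it follows. The joint RGB histogram with 4 bins per channel and the score 1 − JSD are our assumptions; the paper's exact definition of the divergence similarity may differ.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result lies in [0, 1]."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b = m >= a/2 > 0 whenever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of 8-bit (r, g, b) tuples (assumed binning)."""
    step = 256 // bins_per_channel
    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = (r // step) * bins_per_channel ** 2 + (g // step) * bins_per_channel + b // step
        hist[idx] += 1
    return [h / len(pixels) for h in hist]


def divergence_similarity(pixels_a, pixels_b):
    """Hypothetical similarity score: 1 - JSD between the two color histograms."""
    return 1.0 - js_divergence(color_histogram(pixels_a), color_histogram(pixels_b))


red = [(255, 0, 0)] * 10
blue = [(0, 0, 255)] * 10
mixed = red[:5] + blue[:5]
```

Identical color distributions score 1, color distributions with no overlapping bins score 0, and the measure is symmetric, unlike the plain Kullback-Leibler divergence.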
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820827
Jaeryun Ko, Yo-Sung Ho
The census transform, used to compute the matching cost in stereo matching, is simple and robust to luminance variations between stereo image pairs; however, the resulting disparity map depends on the shape and size of the census transform window. In this paper, we propose a stereo matching method that varies the size of the census transform window according to the gradients of the stereo images. Our experiments show higher disparity accuracy in regions of depth discontinuity.
Title: Stereo matching using census transform of adaptive window sizes with gradient images | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
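The census transform encodes each pixel as a bit string recording which neighbours in its window are darker than the centre, and the matching cost between a left pixel and a candidate right pixel is the Hamming distance between their strings. Since only the ordering of intensities matters, any monotonic brightness change leaves the code unchanged, which is the robustness the abstract mentions. The sketch below also includes a window-size rule driven by the gradient magnitude; that rule (small window where the gradient exceeds a threshold) and its parameter values are hypothetical stand-ins for the paper's method.

```python
def census(img, y, x, radius):
    """Census bit string of pixel (y, x): 1 where a neighbour is darker than the centre."""
    centre = img[y][x]
    return [
        1 if img[y + dy][x + dx] < centre else 0
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if (dy, dx) != (0, 0)
    ]


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(p != q for p, q in zip(a, b))


def gradient_magnitude(img, y, x):
    """L1 norm of the central-difference gradient."""
    return abs(img[y][x + 1] - img[y][x - 1]) + abs(img[y + 1][x] - img[y - 1][x])


def adaptive_radius(img, y, x, threshold, small=1, large=2):
    """Hypothetical rule: use a smaller census window where the gradient is strong."""
    return small if gradient_magnitude(img, y, x) >= threshold else large


def matching_cost(left, right, y, x, d, radius):
    """Census cost of disparity d: left pixel (y, x) against right pixel (y, x - d)."""
    return hamming(census(left, y, x, radius), census(right, y, x - d, radius))


patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
right = [[(3 * x * x + 7 * y + x * y) % 11 for x in range(12)] for y in range(8)]
left = [[right[y][x - 2] if x >= 2 else 0 for x in range(12)] for y in range(8)]
```

In a full matcher, each pixel would pick its window radius first and then take the disparity with the lowest cost; a smaller window near strong gradients reduces the chance of mixing pixels from both sides of a depth edge.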
Pub Date: 2016-12-01 | DOI: 10.1109/APSIPA.2016.7820704
Yueh-Ting Tsai, B. Su, Yu Tsao, Syu-Siang Wang
Recently, a subspace-constrained diagonal loading (SSC-DL) method was proposed for beamforming that is robust to direction-of-arrival (DoA) mismatch. Although SSC-DL achieves outstanding output SINR performance, it is not clear how to choose the DL factor and the subspace dimension in practice. The aim of the present study is to further investigate the conditions for optimal SSC-DL parameters and to develop algorithms that determine them under realistic test conditions. First, we propose using the Capon power spectral density to estimate the desired signal power, which is then used to compute the optimal DL factor for SSC-DL. Next, we propose a novel adaptive SSC-DL approach (adaptive-SSC-DL) that dynamically optimizes the subspace dimension based on the test conditions. Simulation results show that adaptive-SSC-DL provides higher output SINR than several existing methods and achieves performance comparable to SSC-DL with ideal parameter settings.
Title: Adaptive subspace-constrained diagonal loading | Venue: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
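The desired-signal power estimate mentioned above comes from the Capon (MVDR) spectrum, P(θ) = 1 / (a(θ)ᴴ R⁻¹ a(θ)), where R is the array covariance and a(θ) the steering vector; diagonal loading replaces R with R + γI. A dependency-free sketch follows. The uniform linear array with half-wavelength spacing is an assumption, and the paper's formula mapping this power to the optimal DL factor, as well as the subspace constraint itself, are not reproduced here.

```python
import cmath
import math


def steering_vector(m, theta_deg, spacing=0.5):
    """Steering vector of an assumed uniform linear array (spacing in wavelengths)."""
    phase = 2 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * k) for k in range(m)]


def solve(a, b):
    """Solve a x = b (complex) by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x


def capon_power(cov, a, loading=0.0):
    """Capon power 1 / (a^H (R + loading * I)^{-1} a) in the look direction of a."""
    m = len(a)
    r = [[cov[i][j] + (loading if i == j else 0.0) for j in range(m)] for i in range(m)]
    x = solve(r, a)
    return 1.0 / sum(ai.conjugate() * xi for ai, xi in zip(a, x)).real


# One source of power p at 20 degrees plus white noise on a 4-element array.
m, p, noise = 4, 1.0, 0.1
a = steering_vector(m, 20.0)
cov = [[p * a[i] * a[j].conjugate() + (noise if i == j else 0.0) for j in range(m)]
       for i in range(m)]
```

For this single-source model the Capon estimate in the source direction works out to p + σ²/M (here 1.025), so it tracks the desired signal power closely when the steering vector is correct; with loading γ the noise term becomes (σ² + γ)/M.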