HMM-based Thai speech synthesis using unsupervised stress context labeling
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041599
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
This paper describes an approach to HMM-based Thai speech synthesis using stress context. It has been shown that context related to stressed/unstressed syllable information (stress context) significantly improves the tone correctness of synthetic speech, but tone modeling with this context requires a manual labeling process. To reduce the cost of stress context labeling, we propose an unsupervised technique for automatic labeling based on two characteristics of Thai stressed syllables, namely high F0 movement and long duration. In the proposed technique, we use the log F0 variance and the duration of each syllable to classify it into one of the stress-related context classes. Objective and subjective evaluation results show that the proposed context labeling performs comparably to careful manual labeling in terms of the tone naturalness of synthetic speech.
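As a rough illustration of the labeling idea in the abstract above, the following sketch computes the log F0 variance and the duration of each syllable and assigns a class by comparing both values against their medians over the corpus. The median thresholds, the three class names, and the function names are assumptions made for this example; the abstract does not state the paper's actual classification rule.

```python
import math
from statistics import median, pvariance


def syllable_features(f0_hz, duration_s):
    """Return (log-F0 variance, duration) for one syllable.

    Unvoiced frames (F0 <= 0) are skipped before taking the logarithm.
    """
    log_f0 = [math.log(f) for f in f0_hz if f > 0]
    variance = pvariance(log_f0) if len(log_f0) > 1 else 0.0
    return variance, duration_s


def label_stress(syllables):
    """Label each (f0_track, duration) pair without any manual annotation.

    Assumption for this sketch: thresholds are the corpus medians, and the
    classes are 'stressed', 'unstressed', and 'intermediate'.
    """
    feats = [syllable_features(f0, dur) for f0, dur in syllables]
    var_thr = median(v for v, _ in feats)
    dur_thr = median(d for _, d in feats)
    labels = []
    for v, d in feats:
        if v > var_thr and d > dur_thr:
            labels.append("stressed")      # high F0 movement and long
        elif v <= var_thr and d <= dur_thr:
            labels.append("unstressed")    # flat F0 and short
        else:
            labels.append("intermediate")  # only one cue is present
    return labels
```

A real system would take the F0 tracks and syllable boundaries from a pitch extractor and a forced alignment, and could replace the median split with clustering.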
Phase-correction-system (PCS) design utilizing successively linearized optimization
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041513
N. Ito
In this paper, we first derive a new phase-error function for designing an all-pass phase-correction system (PCS), which is needed in digital communication systems and other signal-processing systems. Based on the new phase-error function, we propose a scheme that recasts the nonlinear design problem as a sequence of successively linearized optimization problems. An illustrative example is given to validate the proposed successively linearized optimization scheme.
A comparative study of preprocessing methods in the parametric loudspeaker
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041678
Chuang Shi, Y. Kajikawa
The parametric loudspeaker is a directional sound reproduction device that makes use of parametric sound generation. A sound beam is formed as a result of nonlinear interactions between ultrasonic beams. Compared with a conventional loudspeaker, the parametric loudspeaker has the advantage of transmitting an equally narrow sound beam from a smaller emitter. Owing to this advantage, parametric loudspeakers are readily applied in a variety of sound field control applications, such as the creation of personal listening spots, spatial audio reproduction, and active noise control. However, the parametric loudspeaker has a long-standing drawback: harmonic and intermodulation distortions are byproducts of parametric sound generation. Hence, a comparative study of six preprocessing methods, including two methods proposed in this paper, is carried out. Harmonic and intermodulation distortions are demonstrated by experiments.
Computational cost analysis and implementation of accelerated iterative shrinkage smoothing
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041808
Dabwitso Kasauka, Hiroshi Tsutsui, H. Okuhata, Y. Miyanaga
In this paper, we present a computational cost analysis of the accelerated iterative shrinkage smoothing algorithm, a promising image smoothing algorithm that offers sufficient smoothing quality with reduced processing time. The main motivation of this cost analysis is to provide a basis for efficient hardware implementation. We implemented the algorithm in a lower-level programming language with the OpenCV library, as opposed to the MATLAB implementation. The resolution dependency of the processing time is also illustrated.
Chaotic encoder-decoder on FPGA for crypto system
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041740
Chanathip Roeksukrungrueang, Xaysamone Dittaphong, K. Khongsomboon, Nounchan Panyanouyong, S. Chivapreecha
An implementation of a chaotic encoder-decoder on an FPGA is proposed in this paper. The overflow non-linearity that arises from 2's complement arithmetic in a digital filter causes the phenomenon called "chaos" in the filter. An IIR filter can be used as the chaotic encoder, while an FIR filter is used as the chaotic decoder. The filter coefficients of both the encoder and the decoder can be regarded as the secret key in a private-key cryptosystem: if the filter coefficients of the chaotic decoder are not the same as those of the chaotic encoder, the ciphertext cannot be decrypted to recover the original plaintext. Both the chaotic encoder and decoder are implemented on an FPGA to demonstrate a hardware prototype of the chaotic cryptosystem.
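The encoder/decoder relationship described in the abstract above can be sketched in software as follows. This is a minimal sketch, not the authors' FPGA design: the second-order filter structure, the 16-bit word length, the integer coefficients, and all function names are assumptions chosen so that the arithmetic is exact. The encoder is a recursive (IIR) filter whose sum is wrapped by 2's complement overflow, and the decoder is the matching non-recursive (FIR) filter.

```python
BITS = 16  # assumed word length of the filter arithmetic


def wrap(value, bits=BITS):
    """Two's-complement overflow: fold value into [-2**(bits-1), 2**(bits-1))."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def encode(plain, key):
    """IIR encoder: y[n] = wrap(x[n] + a1*y[n-1] + a2*y[n-2])."""
    a1, a2 = key
    y1 = y2 = 0
    cipher = []
    for x in plain:
        y = wrap(x + a1 * y1 + a2 * y2)
        cipher.append(y)
        y2, y1 = y1, y
    return cipher


def decode(cipher, key):
    """FIR decoder: x[n] = wrap(y[n] - a1*y[n-1] - a2*y[n-2])."""
    a1, a2 = key
    y1 = y2 = 0
    plain = []
    for y in cipher:
        plain.append(wrap(y - a1 * y1 - a2 * y2))
        y2, y1 = y1, y
    return plain
```

Because both filters work modulo 2**BITS, decoding with the same coefficients returns the plaintext exactly, while a decoder with different coefficients produces different samples as soon as the delayed ciphertext values become nonzero.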
2D semi-NMF of scale-frequency map for environmental sound classification
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041681
Wen-Chi Hsieh, Chin-Wen Ho, Viet-Hang Duong, Yuan-Shan Lee, Jia-Ching Wang
This paper introduces a novel two-dimensional feature extraction method for environmental sound classification, based on two-dimensional semi-nonnegative matrix factorization (2D Semi-NMF) of scale-frequency maps. We first extract scale-frequency maps (SFMs) from the input signals; this feature is considered to preserve the scale and frequency characteristics of the signals. Second, a 2D Semi-NMF method is applied to the SFMs to obtain more information about the input signals. We use the combinational coefficients extracted by 2D Semi-NMF for classification. Experimental results on an 8-class environmental sound database show that 2D Semi-NMF has better classification accuracy than traditional 1D NMF and 2D NMF. Also, applying 2D Semi-NMF to SFMs yields a slight improvement over using SFM features alone.
Physical layer security using multi-band transmission considering channel selection for cognitive radio networks
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041730
Akinari Ida, T. Fujii
In this paper, we consider a physical-layer secrecy transmission scheme that employs multi-band transmitters with dynamic power allocation and channel selection in spectrum-sharing cognitive radio networks. We apply physical layer security with a multi-band transmitter to distribute the confidential message of each receiver over multiple frequency channels and thereby decrease leakage. Moreover, we aim to improve the secrecy capacity by using a channel selection method based on the channel condition of each user and on primary usage. Computer simulations verify that, in an environment where primary users coexist, the proposed method improves the secrecy capacity against eavesdropping compared with methods using single-band transmission.
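To make the notion of secrecy capacity over several bands concrete, the sketch below uses the standard per-band secrecy rate, max(0, log2(1 + SNR_receiver) - log2(1 + SNR_eavesdropper)), and sums it over the selected bands. The selection rule (pick the idle bands with the highest secrecy rate), the equal power split, the assumption that the eavesdropper's channel gains are known, and all names are assumptions for illustration; the abstract does not specify the paper's allocation or selection algorithm.

```python
import math


def secrecy_rate(snr_rx, snr_eve):
    """Per-band secrecy rate in bit/s/Hz; never negative."""
    return max(0.0, math.log2(1 + snr_rx) - math.log2(1 + snr_eve))


def select_bands(gains_rx, gains_eve, primary_busy, total_power, k):
    """Choose up to k bands not used by the primary user.

    gains_* are linear gain-to-noise ratios per band. Assumption for this
    sketch: the transmit power is split equally over k bands.
    Returns (chosen band indices, summed secrecy rate).
    """
    power = total_power / k
    candidates = []
    for band, (g_rx, g_eve, busy) in enumerate(
            zip(gains_rx, gains_eve, primary_busy)):
        if busy:
            continue  # leave bands occupied by the primary user alone
        candidates.append((secrecy_rate(power * g_rx, power * g_eve), band))
    candidates.sort(reverse=True)
    chosen = candidates[:k]
    return [band for _, band in chosen], sum(rate for rate, _ in chosen)
```

Calling `select_bands` with `k=1` gives a single-band baseline under the same assumptions, which allows a simple comparison against the multi-band result.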
A reversible data hiding based on adaptive prediction technique and histogram shifting
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041698
R. Liu, R. Ni, Yao Zhao
Reversible data hiding recovers the original image from the stego-image without distortion after data extraction. In this paper, we propose a novel reversible data hiding method based on adaptive prediction techniques and histogram shifting. Because most natural images contain edges, existing prediction methods are not well suited to predicting edge pixels. For more precise prediction, two prediction methods are used adaptively to calculate the prediction error according to the characteristics of each pixel. As a result, two prediction error histograms are built: one for pixels located at edges and the other for the remaining pixels. Data are embedded in the image by histogram shifting. In addition, a new sorting method is applied to histogram shifting; it considers the differences of all pixel pairs in the neighborhood and better reflects the correlation among pixels. Through the sorting method, prediction errors with small absolute values are placed at the front, so that more embeddable pixels are processed preferentially. Therefore, the number of shifted pixels decreases once the peaks in the histograms have all been dealt with or the capacity is satisfied, which helps reduce distortion. Experimental results demonstrate that the proposed method achieves greater capacity and higher quality than other state-of-the-art schemes.
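The basic mechanism of prediction-error histogram shifting, on which the abstract above builds, can be sketched on a single row of pixels. This is a deliberately simplified sketch and not the proposed method: it uses one fixed predictor (the left neighbor), assumes the histogram peak is at error 0, omits the adaptive edge/non-edge prediction and the sorting step, and ignores overflow at pixel value 255 (practical schemes handle that with a location map). Function names are hypothetical.

```python
def embed(pixels, bits):
    """Embed bits into a row of pixels; returns the stego row.

    Error e = pixel - left neighbor (original value).
    e == 0 carries one bit, e > 0 is shifted by one, e < 0 is untouched.
    Shifting stops once the payload has been embedded.
    """
    stego = [pixels[0]]  # first pixel has no predictor and is kept
    used = 0
    for i in range(1, len(pixels)):
        p = pixels[i]
        if used < len(bits):
            e = p - pixels[i - 1]
            if e == 0:
                p += bits[used]  # error becomes 0 or 1
                used += 1
            elif e > 0:
                p += 1           # make room next to the peak bin
        stego.append(p)
    if used < len(bits):
        raise ValueError("payload exceeds embedding capacity")
    return stego


def extract(stego, n_bits):
    """Recover (original row, payload) from the stego row."""
    pixels = [stego[0]]
    bits = []
    for i in range(1, len(stego)):
        p = stego[i]
        if len(bits) < n_bits:
            e = p - pixels[i - 1]  # predictor uses the restored neighbor
            if e in (0, 1):
                bits.append(e)
                p -= e
            elif e > 1:
                p -= 1
        pixels.append(p)
    return pixels, bits
```

The early stop in `embed` mirrors the remark in the abstract that fewer pixels need to be shifted once the capacity requirement is satisfied.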
Upsampling of low-resolution depth map with enhancing depth discontinuity regions
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041739
Y. Kang, Yo-Sung Ho
In this paper, we present a method for upsampling low-resolution depth maps that enhances depth discontinuities using color segment information. After we supply the initial depth measurement considering the corresponding color segment information, we define an energy function for depth map upsampling based on the depth measurement, color values, and color segments. Then, we obtain high-resolution depth maps by belief propagation optimization. Experimental results show that the proposed method outperforms other approaches for depth map upsampling in terms of the bad pixel rate and mean absolute error.
Multi frame size feature extraction for acoustic event detection
Pub Date: 2014-12-01 DOI: 10.1109/APSIPA.2014.7041574
Liqun Peng, Deshun Yang, Xiaoou Chen
This paper addresses the problem of detecting and recognizing impulsive sounds, such as door slams, footsteps, glass breaks, gunshots, and human screams, in surveillance systems. We build an acoustic event dataset of about 1k sound clips and a ground truth dataset of a surveillance system. We investigate the influence of frame size in audio feature extraction on acoustic event classification, and our results show that the classification accuracy varies with the audio frame size. Based on this result, we propose an approach that integrates features extracted with multiple frame sizes into a new feature set, which achieves better performance. We build an abnormal acoustic event detection system for surveillance using this feature set and adopt a smoothing post-process. The experiments show the effectiveness of the proposed approach.
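The idea of combining features computed with several frame sizes, as described in the abstract above, might look like the sketch below. The specific features (log energy and zero-crossing rate), the frame sizes, the 50% hop, the mean/standard-deviation summary, and the function names are all assumptions for illustration; the abstract does not list the features or frame sizes used in the paper.

```python
import math
from statistics import mean, pstdev


def split_frames(signal, size):
    """Cut the signal into frames of `size` samples with 50% overlap."""
    hop = size // 2
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]


def frame_features(frame):
    """Return (log energy, zero-crossing rate) of one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return math.log(energy + 1e-12), crossings / (len(frame) - 1)


def multi_frame_size_features(signal, frame_sizes=(256, 512, 1024)):
    """Concatenate clip-level statistics obtained with each frame size."""
    vector = []
    for size in frame_sizes:
        frames = split_frames(signal, size)
        if not frames:
            raise ValueError("signal is shorter than frame size %d" % size)
        per_frame = [frame_features(f) for f in frames]
        for column in zip(*per_frame):  # one column per feature type
            vector.extend([mean(column), pstdev(column)])
    return vector
```

With three frame sizes, two features, and two statistics each, the resulting vector has 12 entries, which could then be passed to any classifier.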