Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282273
An investigation of spectral feature partitioning for replay attacks detection
Zhi Hao Lim, Xiaohai Tian, Wei Rao, Chng Eng Siong
Replay attacks from unseen utterances pose a significant challenge in anti-spoofing detection. In this paper, we propose a statistical measure based on the Rayleigh Quotient to identify a feature partition capable of discerning genuine and playback speech under unseen conditions. The Log-Magnitude Spectrum (LMS) of the utterances is used in this study. Using the proposed measure, we analyze the frequency bands of the LMS according to the amount of discriminative information between the scatter matrices of the genuine and spoofed utterances. This allows us to determine the optimal frequency bands for replay attack detection. In addition, we investigate the effect of training our models on the voiced and unvoiced portions of the utterances. We conducted our experiments on the ASVspoof 2017 database. On the development set, our partitioned LMS feature based on the whole utterance yields a 3.8% EER. Using only the unvoiced portions of the utterances, the EER is further reduced to 3.27%, while our baseline using Constant Q Cepstral Coefficients (CQCC) as the feature stands at 10.21%. The evaluation results also confirm the effectiveness of our approach.
{"title":"An investigation of spectral feature partitioning for replay attacks detection","authors":"Zhi Hao Lim, Xiaohai Tian, Wei Rao, Chng Eng Siong","doi":"10.1109/APSIPA.2017.8282273","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282273","url":null,"abstract":"Replay attacks from unseen utterances poses a significant challenge in Anti-Spoofing Detection. In this paper, we propose a statistical measure based on the Rayleigh Quotient in order to investigate a feature partition capable of discerning genuine and playback speech under unseen conditions. The Log- Magnitude Spectrum (LMS) of the utterances is used in this study. Using the proposed measure, we analyze the frequency bands of the LMS based on the amount of discriminative information between the scatter matrices of the genuine and spoof utterances. This allows us to determine the optimal frequency bands required for replay attacks detection. In addition, we further investigate the effects of training our models using voiced and unvoiced portions of the utterances. We conducted our experiments based on the ASVspoof 2017 database. On the development set, our partitioned LMS feature based on the whole utterance yields a 3.8% EER. After utilizing just the unvoiced portions of the utterances, the EER is further decreased to 3.27% while our baseline using the Constant Q Cepstral Coefficients (CQCC) as a feature is at 10.21%. The evaluation results also confirms the effectiveness of our approach.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126801053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282274
I2R-NUS submission to oriental language recognition AP16-OL7 challenge
Hanwu Sun, Kong-Aik Lee, Trung Hieu Nguyen, B. Ma, Haizhou Li
This paper presents a detailed description and analysis of a joint submission by the Institute for Infocomm Research (I2R) and the National University of Singapore (NUS), which was the top-performing system in the AP16-OL7 Challenge. The submitted system was a fusion of two sub-systems, an i-vector system and a GMM-SVM system, both based on state-of-the-art bottleneck features. Central to the work presented in this paper are a language-dependent UBM GMM-SVM system and traditional i-vector polynomial expansion with an SVM classifier. The FoCal toolkit was used for sub-system fusion. Experimental results show that the proposed approach achieves significant improvement over the baseline system on the development and evaluation sets. Our final submission achieves EERs of 0.440% and 1.09% and identification rates of 98.9% and 97.6% on the development and evaluation sets, respectively.
{"title":"I2R-NUS submission to oriental language recognition AP16-OL7 challenge","authors":"Hanwu Sun, Kong-Aik Lee, Trung Hieu Nguyen, B. Ma, Haizhou Li","doi":"10.1109/APSIPA.2017.8282274","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282274","url":null,"abstract":"This paper presents a detailed description and analysis of a joint submission of Institute for Infocomm Research (I2R) and National University of Singapore (NUS), which is the top performing system to AP16-OL7 Challenge. The submitted system was a fusion of two sub-systems: the i-vector system and GMM-SVM system, both based on state-of-the-art bottleneck feature. Central to our work presented in this paper is a language-dependent UBM GMM-SVM system and traditional i- vector polynomials expansion with SVM classifier. The FoCal toolkit was used for sub-system fusion. Experimental results show that the proposed approach achieves significant improvement over the baseline system on the development and evaluation sets. Our final submission achieve EER 0.440%, 1.09% and identification rates 98.9%, 97.6% on the development set and evaluation set, respectively.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114062635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282055
Panchromatic and multi-spectral image fusion method based on two-step sparse representation and wavelet transform
G. He, Siyuan Xing, Dandan Dong, Ximei Zhao
Based on the characteristics of two-step sparse coding and the multi-scale analysis of the wavelet transform, a novel fusion algorithm based on two-step sparse coding (Two-Step Sparse Representation, TSSR) and the wavelet transform is proposed. The two-step sparse strategy constructs the corresponding dictionaries from the low-frequency component and the down-sampled low-frequency component respectively, which avoids the training process of traditional sparse representation and improves the computing speed. At the same time, the sparse coefficient solution from two-step sparse coding is closer to the original signal than the one-step solution of traditional sparse representation, giving the algorithm higher precision. Experimental results and analysis show that the proposed method not only preserves the spectral characteristics but also effectively integrates the spatial detail information of the panchromatic images. Its computing time is much shorter than that of the traditional sparse method, and it offers better fusion quality than both the wavelet transform and traditional sparse representation.
{"title":"Panchromatic and multi-spectral image fusion method based on two-step sparse representation and wavelet transform","authors":"G. He, Siyuan Xing, Dandan Dong, Ximei Zhao","doi":"10.1109/APSIPA.2017.8282055","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282055","url":null,"abstract":"Based on the characteristics of two-step sparse coding and multi-scale analysis of wavelet transform, a novel fusion algorithm based on two-step sparse coding (Two Step Sparse Representation, TSSR) and wavelet transform is proposed. The two-step sparse strategy is used to construct the corresponding dictionary for the low-frequency component and the down- sampled low-frequency component respectively, which avoids the training process of the traditional sparse representation and improves the computing speed. At the same time, the sparse coefficient solution based on two-step sparse coding is closer to the original signal than the one-step sparse solution in traditional sparse representation, and the precision of the algorithm is higher. Experimental results and analysis show that the proposed method can not only keep the spectral characteristics, but also can effectively integrate the spatial detail information of panchromatic images. The computing time is much faster than the traditional sparse method, and it has more advantages than wavelet transform and traditional sparse representation with excellent fusion effect.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120905260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282095
On the convergence of INCA algorithm
Nirmesh J. Shah, H. Patil
The development of text-independent Voice Conversion (VC) has gained increasing research interest over the last decade. Aligning the source and target speakers' spectral features before learning the mapping function is the challenging step in text-independent VC, since the two speakers utter different sentences, possibly in different languages. The state-of-the-art alignment technique is the Iterative combination of a Nearest neighbor search step and a Conversion step Alignment (INCA) algorithm, which iteratively learns the mapping function from nearest-neighbor-aligned feature pairs between the intermediate converted spectral features and the target spectral features. To the best of the authors' knowledge, this algorithm has only been shown to converge empirically; a theoretical proof has not been discussed in detail in the VC literature. In this paper, we show that the INCA algorithm converges monotonically to a local minimum in the mean square error (MSE) sense, and we explain the reason for this convergence in the context of the VC task.
{"title":"On the convergence of INCA algorithm","authors":"Nirmesh J. Shah, H. Patil","doi":"10.1109/APSIPA.2017.8282095","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282095","url":null,"abstract":"Development of text-independent Voice Conversion (VC) has gained more research interest for last one decade. Alignment of the source and target speakers' spectral features before learning the mapping function is the challenging step for the development of the text-independent VC as both the speakers have uttered different utterances from the same or different languages. State-of-the-art alignment technique is an Iterative combination of a Nearest Neighbor search step and a Conversion step Alignment (INCA) algorithm that iteratively learns the mapping function after getting the nearest neighbor aligned feature pairs from intermediate converted spectral features and target spectral features. To the best of authors' knowledge, this algorithm was shown to converge empirically, however, its theoretical proof has not been discussed in detail in the VC literature. In this paper, we have presented that the INCA algorithm will converge monotonically to a local minimum in mean square error (MSE) sense. In addition, we also present the reason of convergence in MSE sense in the context of VC task.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123825645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282061
A study on enhanced educational platform with adaptive sensing devices using IoT features
Y. Tew, Tiong Yew Tang, Yoonku Lee
There are plenty of digital education tools that provide additional assistance for conducting lectures at university. For instance, online video sources (e.g., YouTube) provide practical coding exercises for web application development, and interactive communication channels (e.g., Google Hangouts) provide a platform for distance learning. However, these tools are rarely connected with real-life environmental conditions. An advanced education system should consider students' attendance, activities, and attentiveness as part of assessment and provide appropriate education tools to improve education quality. Therefore, there is a need to adopt recent Internet of Things (IoT) technology to sense environmental conditions (e.g., room temperature, student activities) and produce the necessary reactions (e.g., air-conditioning control, waking up sleeping students). In this paper, we propose an integrated platform that utilizes advanced IoT devices to improve the quality of education. The capabilities and features of several IoT controller boards are described and compared for realizing the IoT solution in an educational platform.
{"title":"A study on enhanced educational platform with adaptive sensing devices using IoT features","authors":"Y. Tew, Tiong Yew Tang, Yoonku Lee","doi":"10.1109/APSIPA.2017.8282061","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282061","url":null,"abstract":"There are plenty of digital education tools to provide additional assistance for conducting lecture class in university. For instance, online video source (e.g., YouTube) provides practical coding exercise for web application development, interactive communication channel (e.g., Google Hangout) provides platform for distance learning. However, these tools are rarely to be connected with a real-life environmental conditions. An advanced education system shall consider students attendance, activities and intention to pay attention as a part of assessment and provide appropriate education tools to improve the education quality. Therefore, there is an urge to adopt recent Internet of Technology (IoT) to detect and sense the environmental condition (e.g., room temperature, student activities) and produce necessary reaction (e.g, air condition control, awake overslept students). In this paper, we propose an integrated platform by utilizing the advanced IoT devices to improve the quality of education. Several IoT controller boards capabilities and features are described and compared for realizing the IoT solution in educational platform.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124935623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282086
Random aliasing modulation with decision-directed demodulation
Cairong Xing, Anhong Wang, Suyue Li, Peihao Li, Jing Zhang
The recently proposed compressive modulation (CM) offers much higher bandwidth efficiency than conventional modulation schemes such as binary phase shift keying (BPSK) and M-ary phase shift keying (M-PSK), owing to its use of the compressive sensing (CS) principle. However, the CS-driven reconstruction currently used in the CM scheme cannot guarantee the desired performance because it ignores the characteristics of the aliased waveforms. In this paper, we propose to use the decision feedback equalization (DFE) technique in the reconstruction process and extend the idea of CM to the framework of random aliasing modulation, leading to random aliasing modulation with decision-directed demodulation (abbreviated as RAM-DDD). Our experimental results show that the proposed RAM-DDD scheme, measured by either the bandwidth efficiency or the bit error rate (BER) at the same SNR, outperforms the alternatives.
{"title":"Random aliasing modulation with decision-directed demodulation","authors":"Cairong Xing, Anhong Wang, Suyue Li, Peihao Li, Jing Zhang","doi":"10.1109/APSIPA.2017.8282086","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282086","url":null,"abstract":"The recently proposed compressive modulation (CM) offers a much higher bandwidth efficiency than the conventional modulation schemes such as binary phase shift keying (BPSK) and M-ary phase shift keying (M-PSK), due to the employment of the compressive sensing (CS) principle. However, the CS-driven reconstruction currently used in the CM scheme cannot guarantee the highly desirable performance because it ignores the characteristics of the aliased waveforms. In this paper, we propose to use the decision feedback equalization (DFE) technique in the reconstruction process and extend the idea of CM to the framework of random aliasing modulation, leading to a random aliasing modulation with decision directed demodulation (abbreviated as RAM-DDD). Our experimental results show that the performance of the proposed RAM-DDD scheme, measured by either the bandwidth efficiency or bit error rate (BER) at the same SNR, has outperformed the alternatives.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122724585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282193
The longitudinal development of focus duration of Korean Chinese learners
Ai-jun Li, Gongping Wang
Linguistic focus conveys semantic meaning and speakers' intentions. However, the perception and production patterns of focal speech by L2 learners are affected by their mother tongues. The present paper concerns the longitudinal developmental patterns of focus duration for learners of Chinese whose mother tongue is Korean (hereafter Korean Chinese learners). The results show that (i) the development trajectory of focus duration follows a non-linear pattern; (ii) the tone of the focal syllable significantly affects the longitudinal development, with tone 3 (the low tone) showing a larger deviation than the other tones; and (iii) focus position also has a clear effect, especially in the sentence-initial and sentence-final positions.
{"title":"The longitudinal development of focus duration of Korean Chinese learners","authors":"Ai-jun Li, Gongping Wang","doi":"10.1109/APSIPA.2017.8282193","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282193","url":null,"abstract":"Linguistic focus conveys semantic meanings and speakers' intentions. However, the perceptual and production patterns of focal speech for L2 learners are always affected by their mother tongues. The present paper concerns the longitudinal developmental patterns of focus duration for Chinese learners whose mother tongue is Korean (hereafter Korean Chinese Learners). The results show that (i) The development trajectory of focus duration follows a non-linear pattern. (ii) Tone of the focal syllable significantly affects the longitudinal development, that tone 3 (the low tone) shows a larger deviation than other tones. (iii) Focus position also has an obvious effect, especially in the initial and final positions of the sentence.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131605384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282296
Mood disorder identification using deep bottleneck features of elicited speech
Kun-Yi Huang, Chung-Hsien Wu, Ming-Hsiang Su, Chia-Hui Chou
In the diagnosis of mental health disorders, a large portion of Bipolar Disorder (BD) patients are likely to be misdiagnosed with Unipolar Depression (UD) on initial presentation. As speech is the most natural way to express emotion, this work focuses on tracking the emotion profile of elicited speech for short-term mood disorder identification. The Deep Scattering Spectrum (DSS) and Low-Level Descriptors (LLDs) of the elicited speech signals are extracted as speech features. The hierarchical spectral clustering (HSC) algorithm is employed to adapt the emotion database to the mood disorder database, alleviating the data bias problem. A denoising autoencoder is then used to extract bottleneck features from the DSS and LLDs for a better representation. Based on the bottleneck features, a long short-term memory (LSTM) network is applied to generate the time-varying emotion profile sequence. Finally, given the emotion profile sequence, an HMM-based identification and verification model is used to determine the mood disorder. Elicited emotional speech data were collected from 15 BD patients, 15 UD patients, and 15 healthy controls for system training and evaluation, using five-fold cross-validation. Experimental results show that the system using the bottleneck features achieved an identification accuracy of 73.33%, an improvement of 8.89% over the system without bottleneck features. Furthermore, the system with the verification mechanism outperformed the one without it by 4.44%.
{"title":"Mood disorder identification using deep bottleneck features of elicited speech","authors":"Kun-Yi Huang, Chung-Hsien Wu, Ming-Hsiang Su, Chia-Hui Chou","doi":"10.1109/APSIPA.2017.8282296","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282296","url":null,"abstract":"In the diagnosis of mental health disorder, a large portion of the Bipolar Disorder (BD) patients is likely to be misdiagnosed as Unipolar Depression (UD) on initial presentation. As speech is the most natural way to express emotion, this work focuses on tracking emotion profile of elicited speech for short-term mood disorder identification. In this work, the Deep Scattering Spectrum (DSS) and Low Level Descriptors (LLDs) of the elicited speech signals are extracted as the speech features. The hierarchical spectral clustering (HSC) algorithm is employed to adapt the emotion database to the mood disorder database to alleviate the data bias problem. The denoising autoencoder is then used to extract the bottleneck features of DSS and LLDs for better representation. Based on the bottleneck features, a long short term memory (LSTM) is applied to generate the time-varying emotion profile sequence. Finally, given the emotion profile sequence, the HMM-based identification and verification model is used to determine mood disorder. This work collected the elicited emotional speech data from 15 BDs, 15 UDs and 15 healthy controls for system training and evaluation. Five-fold cross validation was employed for evaluation. Experimental results show that the system using the bottleneck feature achieved an identification accuracy of 73.33%, improving by 8.89%, compared to that without bottleneck features. Furthermore, the system with verification mechanism, improving by 4.44%, outperformed that without verification.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127568989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282183
Accelerating deep learning by binarized hardware
Shinya Takamaeda-Yamazaki, Kodai Ueyoshi, Kota Ando, Ryota Uematsu, Kazutoshi Hirose, M. Ikebe, T. Asai, M. Motomura
Hardware-oriented approaches to accelerating deep neural network processing are very important for various embedded intelligent applications. This paper summarizes our recent achievements in efficient neural network processing, focusing on the binarization approach for energy- and area-efficient neural network processors. We first present an energy-efficient binarized processor for deep neural networks that employs an in-memory processing architecture; the fabricated processor LSI achieves high performance and energy efficiency compared to prior work. We then present an architecture exploration technique for binarized neural network processors on an FPGA. The exploration results indicate that the binarized hardware achieves very high performance by exploiting multiple forms of parallelism at the same time.
{"title":"Accelerating deep learning by binarized hardware","authors":"Shinya Takamaeda-Yamazaki, Kodai Ueyoshi, Kota Ando, Ryota Uematsu, Kazutoshi Hirose, M. Ikebe, T. Asai, M. Motomura","doi":"10.1109/APSIPA.2017.8282183","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282183","url":null,"abstract":"Hardware-oriented approaches to accelerate deep neural network processing are very important for various embedded intelligent applications. This paper is a summary of our recent achievements for efficient neural network processing. We focus on the binarization approach for energy- and area-efficient neural network processor. We first present an energy-efficient binarized processor for deep neural networks by employing inmemory processing architecture. The real processor LSI achieves high performance and energy-efficiency compared to prior works. We then present an architecture exploration technique for binarized neural network processor on an FPGA. The exploration result indicates that the binarized hardware achieves very high performance by exploiting multiple different parallelisms at the same time.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132762076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-01 | DOI: 10.1109/APSIPA.2017.8282205
A parallel computation algorithm for super-resolution methods using convolutional neural networks
Y. Sugawara, Sayaka Shiota, H. Kiya
An acceleration method for interpolation-based super-resolution (SR) methods using convolutional neural networks (CNNs), represented by SRCNN and VDSR, is proposed. In this paper, the estimated pixels are classified into a number of types according to the upscaling factor, and SR images are then generated by CNNs optimized for each type. This allows smaller filter sizes than conventional CNNs, so the computational complexity is reduced in both the inference and training phases. In addition, it is shown that the CNNs optimized for one type are closely related to those of the other types, and this relation provides a further way to reduce the training complexity. A number of experiments are carried out to demonstrate the effectiveness of the proposed method, which outperforms conventional ones in terms of processing speed while maintaining the quality of the SR images.
{"title":"A parallel computation algorithm for super-resolution methods using convolutional neural networks","authors":"Y. Sugawara, Sayaka Shiota, H. Kiya","doi":"10.1109/APSIPA.2017.8282205","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282205","url":null,"abstract":"An acceleration method for interpolation-based super-resolution (SR) methods using convolutional neural networks (CNNs), represented by SRCNN and VDSR, is proposed. In this paper, estimated pixels are classified into a number of types according to upscaling factors, and then SR images are generated by using CNNs optimized for each type. It allows us to adapt smaller filter sizes to CNNs than conventional ones, so that the computational complexity can be reduced for both running phase and training one. In addition, it is shown that the optimized CNNs for some type are closely related to those of other types, and the relation provides a method to reduce the computational complexity for training phase. A number of experiments are carried out to demonstrate that the effectiveness of the proposed method. The proposed method outperforms conventional ones in terms of the processing speed, while keeping the quality of SR images.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"18 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113938799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}