Skipped-Hierarchical Feature Pyramid Networks for Nuclei Instance Segmentation
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659795
Hyekyoung Hwang, T. Bui, Sang-il Ahn, Jitae Shin
Dealing with multiple scales of objects is a central problem in computer vision. Feature Pyramid Networks (FPN) have been widely used in instance segmentation to exploit features at multiple scales. By using feature maps of different scales, the method can capture objects of various sizes in a scene. However, FPN still cannot propagate the semantic information of deeper layers into the shallow layers, which carry strong spatial information. In this paper, we propose a novel network that adds stage residual connections and aggregation between $C_i$ and $P_{i-1}$ on top of the FPN to address this shortcoming of the original FPN for instance segmentation. The proposed network, called Skipped-Hierarchical Feature Pyramid Networks (SH-FPN), is integrated into Mask R-CNN. Experimental results show that SH-FPN achieves a significant improvement over FPN on the Data Science Bowl 2018 nuclei segmentation benchmark.
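To make the $C_i \rightarrow P_{i-1}$ aggregation concrete, the PyTorch-style sketch below adds a hypothetical 1x1-projected skip path from each backbone stage $C_i$ into the next-finer pyramid level $P_{i-1}$ on top of a standard FPN top-down pass. The exact form of the aggregation and all layer sizes are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SkipFPN(nn.Module):
        def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
            super().__init__()
            # 1x1 lateral convolutions for C2..C5
            self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
            # 3x3 smoothing convolutions producing P2..P5
            self.output = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                         for _ in in_channels])
            # assumed extra 1x1 projections realising the C_i -> P_{i-1} skip path
            self.skip = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels[1:]])

        def forward(self, feats):                      # feats = [C2, C3, C4, C5]
            laterals = [l(c) for l, c in zip(self.lateral, feats)]
            for i in range(len(laterals) - 1, 0, -1):  # standard top-down pathway
                size = laterals[i - 1].shape[-2:]
                laterals[i - 1] = laterals[i - 1] + F.interpolate(laterals[i], size=size,
                                                                  mode="nearest")
                # assumed stage residual: aggregate C_i directly into P_{i-1}
                laterals[i - 1] = laterals[i - 1] + F.interpolate(self.skip[i - 1](feats[i]),
                                                                  size=size, mode="nearest")
            return [conv(p) for conv, p in zip(self.output, laterals)]

    # toy forward pass with ResNet-50-like channel counts
    feats = [torch.randn(1, c, s, s) for c, s in zip((256, 512, 1024, 2048), (64, 32, 16, 8))]
    print([p.shape for p in SkipFPN()(feats)])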
{"title":"Skipped-Hierarchical Feature Pyramid Networks for Nuclei Instance Segmentation","authors":"Hyekyoung Hwang, T. Bui, Sang-il Ahn, Jitae Shin","doi":"10.23919/APSIPA.2018.8659795","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659795","url":null,"abstract":"Dealing with multiple scale of object is main problem in computer vision. Feature Pyramid Networks (FPN) has widely used in instance segmentation area to utilize multiple scales of features. Using different scale of feature maps, the method enables to capture a various sizes of objects in a scene. However, FPN still cannot propagate semantic information of deeper layer into the shallow layer which contains spatial information strongly. In this paper, we propose a novel network which consists of stage residual connection and aggregation between $boldsymbol{C_{i}}$ and $boldsymbol{P}_{boldsymbol{i}-1}$ above the FPN to improve the imperfectness of original FPNs for the instance segmentation. Our proposed network is called Skipped-Hierarchical Feature Pyramid Networks (SH-FPN), integrated on Mask R-CNN. Experimental results of SH-FPN show that it has significant improvement on Data Science Bowl 2018 benchmark dataset on nuclei segmentation, compared to FPN.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115404405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Signal Separation Method for Physical Wireless Parameter Conversion Sensor Networks Using K-Shortest Path
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659631
Shuhei Yamasaki, Minato Oriuchi, O. Takyu, K. Shirai, T. Fujii, M. Ohta, F. Sasamori, S. Handa
Achieving low delay and high traffic performance is essential for wireless sensor networks (WSNs). Although physical wireless parameter conversion sensor networks (PhyC-SN) enable simultaneous information gathering from multiple sensors, separating the gathered, mixed sensing results becomes a difficult problem. The proposed method adopts an approach used in multi-target tracking (MTT) to separate the mixed data points into sequential sets. In particular, we regard the data separation problem as a path-planning problem: we form paths by connecting data points observed at adjacent times and find a set of continuous paths, each consisting of the data points of a single sensor. Solving this problem yields as many paths as there are sensors, so all sensing results can be correctly discriminated and labeled over all times in the WSN. To this end, we employ the $k$-shortest path method from MTT. In this paper, we show the accuracy of signal separation through simulation experiments and evaluate it quantitatively in terms of the precision rate.
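As a rough illustration of how path extraction can separate mixed readings, the toy Python sketch below builds a trellis whose layers are the time instants and repeatedly extracts the minimum-cost path, removing its points until one track per sensor remains. This is a successive-shortest-path simplification assumed for illustration, not the paper's k-shortest path search.

    def extract_tracks(observations, num_sensors):
        """observations[t] is the list of values measured at time t (order unknown)."""
        # work on index sets so extracted points can be removed between passes
        remaining = [list(range(len(obs))) for obs in observations]
        tracks = []
        T = len(observations)
        for _ in range(num_sensors):
            # dynamic programming over the trellis: cheapest path ending at each
            # candidate j of layer t, with back-pointers for reconstruction
            cost = [{j: 0.0 for j in remaining[0]}]
            back = [{}]
            for t in range(1, T):
                cost.append({})
                back.append({})
                for j in remaining[t]:
                    best_i = min(remaining[t - 1],
                                 key=lambda i: cost[t - 1][i]
                                 + abs(observations[t][j] - observations[t - 1][i]))
                    cost[t][j] = cost[t - 1][best_i] + abs(
                        observations[t][j] - observations[t - 1][best_i])
                    back[t][j] = best_i
            # backtrack the cheapest complete path and remove its points
            j = min(remaining[T - 1], key=lambda j: cost[T - 1][j])
            path = [j]
            for t in range(T - 1, 0, -1):
                path.append(back[t][path[-1]])
            path.reverse()
            tracks.append([observations[t][path[t]] for t in range(T)])
            for t in range(T):
                remaining[t].remove(path[t])
        return tracks

    # two sensors whose readings cross over time; values at each instant are mixed
    mixed = [[1.0, 9.0], [2.0, 8.5], [8.0, 3.1], [4.0, 7.6]]
    print(extract_tracks(mixed, num_sensors=2))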
{"title":"A Signal Separation Method for Physical Wireless Parameter Conversion Sensor Networks Using K-Shortest Path","authors":"Shuhei Yamasaki, Minato Oriuchi, O. Takyu, K. Shirai, T. Fujii, M. Ohta, F. Sasamori, S. Handa","doi":"10.23919/APSIPA.2018.8659631","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659631","url":null,"abstract":"Addressing low delay and high traffic performance is a technique necessary for wireless sensor networks (WSN). Although physical wireless parameter conversion sensor networks (PhyC-SN) achieve simultaneous information gathering from multiple sensors, separating the gathered mixed sensing results becomes a difficult problem. The proposed method utilizes an approach used in multi target tracking (MTT) in order to separate the mixed data points into a set of sequential ones. Particularly, we regard the data separation problem as path planning problems. In short, we consider paths by connecting data points observed at the adjacent time, and find a set of continuous paths consisting of data points of the same sensor. Following the problem, the same number of paths as sensors are obtained, so all sensing results can be correctly discriminated and labeled over all times in WSN. Therefore, we focus on a $k$-shortest pass method of MTT. In this paper, we show the accuracy of signal separation through simulation experiments and evaluate it in terms of the precision rate quantitatively.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115495603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discriminative Feature Extraction Based on Sequential Variational Autoencoder for Speaker Recognition
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659722
Takenori Yoshimura, Natsumi Koike, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda
This paper presents an extended version of the variational autoencoder (VAE) for sequence modeling. In contrast to the original VAE, the proposed model can directly handle variable-length observation sequences. Furthermore, the discriminative model and the generative model are learned simultaneously in a unified framework. The network architecture of the proposed model is inspired by the i-vector/PLDA framework, whose effectiveness has been proven in sequence modeling tasks such as speaker recognition. Experimental results on the TIMIT database show that the proposed model outperforms the traditional i-vector/PLDA system.
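The PyTorch sketch below is one plausible reading of such a model, not the authors' architecture: an LSTM encoder summarizes a variable-length utterance into an utterance-level latent, a decoder reconstructs the frames from that latent, and a speaker classifier on the latent supplies the discriminative objective trained jointly with the generative one. All layer sizes and the speaker count are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SequentialVAE(nn.Module):
        def __init__(self, feat_dim=40, hidden=128, latent=64, num_speakers=630):
            super().__init__()
            self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.to_mu = nn.Linear(hidden, latent)
            self.to_logvar = nn.Linear(hidden, latent)
            self.decoder = nn.Sequential(nn.Linear(latent + feat_dim, hidden),
                                         nn.Tanh(), nn.Linear(hidden, feat_dim))
            self.classifier = nn.Linear(latent, num_speakers)

        def forward(self, x):                     # x: (1, T, feat_dim), variable T
            _, (h, _) = self.encoder(x)
            mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            # predict each frame from the utterance-level latent plus the previous frame
            prev = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
            z_rep = z.unsqueeze(1).expand(-1, x.size(1), -1)
            recon = self.decoder(torch.cat([z_rep, prev], dim=-1))
            return recon, mu, logvar, self.classifier(mu)

    def loss_fn(x, recon, mu, logvar, logits, speaker_id):
        # generative terms (reconstruction + KL) plus the discriminative term
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return F.mse_loss(recon, x, reduction="sum") + kl + F.cross_entropy(logits, speaker_id)

    x = torch.randn(1, 200, 40)                   # one 200-frame utterance
    recon, mu, logvar, logits = SequentialVAE()(x)
    print(loss_fn(x, recon, mu, logvar, logits, torch.tensor([3])))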
{"title":"Discriminative Feature Extraction Based on Sequential Variational Autoencoder for Speaker Recognition","authors":"Takenori Yoshimura, Natsumi Koike, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda","doi":"10.23919/APSIPA.2018.8659722","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659722","url":null,"abstract":"This paper presents an extended version of the variational autoencoder (VAE) for sequence modeling. In contrast to the original VAE, the proposed model can directly handle variable-length observation sequences. Furthermore, the discriminative model and the generative model are simultaneously learned in a unified framework. The network architecture of the proposed model is inspired by the i-vector/PLDA framework, whose effectiveness has been proven in sequence modeling tasks such as speaker recognition. Experimental results on the TIMIT database show that the proposed model outperforms the traditional i-vector/PLDA system.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117193966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Implication of speech level control in noise to sound quality judgement
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659672
Sara Akbarzadeh, Sungmin Lee, Satnam Singh, Chin-Tuan Tan
The relative level of speech and noise, i.e., the signal-to-noise ratio (SNR), alone may not fully account for how humans perceive speech in noise or judge the sound quality of the speech component. To date, the most common rationale in front-end processing of noisy speech in assistive hearing devices is to reduce the (estimated) noise with the sole objective of improving the overall SNR. The absolute sound pressure level of speech in the remaining noise, which listeners need in order to anchor their perceptual judgement, is assumed to be restored by the subsequent dynamic range compression stage intended to compensate for loudness recruitment in hearing-impaired (HI) listeners. However, uncoordinated settings of the thresholds that trigger the nonlinear processing in these two separate stages may instead amplify the remaining noise and/or distortion. This confuses listeners' judgement of sound quality and deviates from the usual perceptual trend one would expect when more noise is present. In this study, both normal-hearing (NH) and HI listeners were asked to rate the sound quality of noisy speech and noise-reduced speech as they perceived it. The results show that speech processed by noise reduction algorithms was rated lower in quality than the original unprocessed speech in noise. The outcomes also show that sound quality judgement depended on both the input SNR and the absolute level of speech, with greater weight on the latter, for both NH and HI listeners. These findings suggest that integrating the two separate processing stages into one would better match the underlying mechanism of auditory reception of sound. Further work will attempt to identify settings of these two processing stages that yield better speech reception for assistive hearing device users.
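A small numerical example helps separate the two quantities the study manipulates: scaling an entire noisy mixture leaves the SNR unchanged while shifting the absolute presentation level. The snippet below demonstrates this with white-noise stand-ins; it is illustrative only and does not reflect the study's stimuli or calibration.

    import numpy as np

    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)       # white-noise stand-in for 1 s of speech
    noise = rng.standard_normal(16000)        # stand-in for 1 s of noise (16 kHz)

    def mix_at_snr(speech, noise, snr_db):
        """Scale the noise so the speech-to-noise power ratio equals snr_db."""
        gain = np.sqrt(np.mean(speech**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
        return speech + gain * noise

    def measured_snr_db(speech, mixture):
        residual = mixture - speech
        return 10 * np.log10(np.mean(speech**2) / np.mean(residual**2))

    mix = mix_at_snr(speech, noise, snr_db=5.0)
    quiet = 0.25 * mix                        # same mixture presented about 12 dB softer
    print(measured_snr_db(speech, mix))                        # ~5 dB
    print(measured_snr_db(0.25 * speech, quiet))               # still ~5 dB: SNR unchanged
    print(10 * np.log10(np.mean(mix**2) / np.mean(quiet**2)))  # ~12 dB level difference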
{"title":"Implication of speech level control in noise to sound quality judgement","authors":"Sara Akbarzadeh, Sungmin Lee, Satnam Singh, Chin-Tuan Tan","doi":"10.23919/APSIPA.2018.8659672","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659672","url":null,"abstract":"Relative levels of speech and noise, which is signal-to-noise ratio (SNR), alone as a metric may not fully account how human perceives speech in noise or making judgement on the sound quality of the speech component. To date, the most common rationale in front-end processing of noisy speech in assistive hearing devices is to reduce “noise” (estimated) with a sole objective to improve the overall SNR. Absolute sound pressure level of speech in the remaining noise, which is necessary for listeners to anchor their perceptual judgement, is assumed to be restored by the subsequent dynamic range compression stage intended to compensate for the loudness recruitment in hearing impaired (HI). However, un-coordinated setting of thresholds that trigger the nonlinear processing in these two separate stages, amplify the remaining “noise” and/or distortion instead. This will confuse listener's judgement of sound quality and deviate from the usual perceptual trend as one would expect when more noise was present. In this study, both normal hearing (NH) and HI listeners were asked to rate the sound quality of noisy speech and noise reduced speech as they perceived. The result found that speech processed by noise reduction algorithms were lower in quality compared to original unprocessed speech in noise conditions. The outcomes also showed that sound quality judgement was dependent on both input SNR and absolute level of speech, with a greater weightage on the latter, across both NH and HI listeners. The outcome of this study potentially suggests that integrating the two separate processing stages into one will better match with the underlying mechanism in auditory reception of sound. Further work will attempt to identify settings of these two processing stages for a better speech reception in assistive hearing device users.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127308205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Probabilistic Sequential Patterns for Singing Transcription
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659637
Eita Nakamura, Ryo Nishikimi, S. Dixon, Kazuyoshi Yoshii
Statistical models of musical scores play an important role in various tasks of music information processing. It has been an open problem to construct a score model incorporating global repetitive structure of note sequences, which is expected to be useful for music transcription and other tasks. Since repetitions can be described by a sparse distribution over note patterns (segments of music), a possible solution is to consider a Bayesian score model in which such a sparse distribution is first generated for each individual piece and then musical notes are generated in units of note patterns according to the distribution. However, straightforward construction is impractical due to the enormous number of possible note patterns. We propose a probabilistic model that represents a cluster of note patterns, instead of explicitly dealing with the set of all possible note patterns, to attain computational tractability. A score model is constructed as a mixture or a Markov model of such clusters, which is compatible with the above framework for describing repetitive structure. As a practical test to evaluate the potential of the model, we consider the problem of singing transcription from vocal f0 trajectories. Evaluation results show that our model achieves better predictive ability and transcription accuracies compared to the conventional Markov model, nearly reaching state-of-the-art performance.
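The following toy sketch illustrates the generative idea described above under strong simplifying assumptions (scalar MIDI-like pitches, hand-picked pattern clusters, the mixture variant with no Markov dependency between clusters): each piece draws a sparse distribution over note-pattern clusters and then emits notes in units of patterns, producing the repetitive structure the model is designed to capture.

    import numpy as np

    rng = np.random.default_rng(1)
    NUM_CLUSTERS = 4
    # each hypothetical cluster is summarised by a few representative pitch patterns
    cluster_patterns = [
        [[60, 62, 64], [60, 64, 67]],        # cluster 0: rising figures
        [[67, 65, 64], [67, 64, 60]],        # cluster 1: falling figures
        [[60, 60, 60], [62, 62, 62]],        # cluster 2: repeated notes
        [[60, 67, 60], [64, 60, 64]],        # cluster 3: leaps
    ]

    def generate_piece(num_patterns=8, sparsity=0.3):
        # sparse piece-specific weights over clusters give repetitive structure
        weights = rng.dirichlet(sparsity * np.ones(NUM_CLUSTERS))
        notes = []
        for _ in range(num_patterns):
            c = rng.choice(NUM_CLUSTERS, p=weights)
            pattern = cluster_patterns[c][rng.integers(len(cluster_patterns[c]))]
            notes.extend(pattern)
        return weights.round(2), notes

    weights, notes = generate_piece()
    print("cluster weights:", weights)      # typically dominated by one or two clusters
    print("generated notes:", notes)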
{"title":"Probabilistic Sequential Patterns for Singing Transcription","authors":"Eita Nakamura, Ryo Nishikimi, S. Dixon, Kazuyoshi Yoshii","doi":"10.23919/APSIPA.2018.8659637","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659637","url":null,"abstract":"Statistical models of musical scores play an important role in various tasks of music information processing. It has been an open problem to construct a score model incorporating global repetitive structure of note sequences, which is expected to be useful for music transcription and other tasks. Since repetitions can be described by a sparse distribution over note patterns (segments of music), a possible solution is to consider a Bayesian score model in which such a sparse distribution is first generated for each individual piece and then musical notes are generated in units of note patterns according to the distribution. However, straightforward construction is impractical due to the enormous number of possible note patterns. We propose a probabilistic model that represents a cluster of note patterns, instead of explicitly dealing with the set of all possible note patterns, to attain computational tractability. A score model is constructed as a mixture or a Markov model of such clusters, which is compatible with the above framework for describing repetitive structure. As a practical test to evaluate the potential of the model, we consider the problem of singing transcription from vocal f0 trajectories. Evaluation results show that our model achieves better predictive ability and transcription accuracies compared to the conventional Markov model, nearly reaching state-of-the-art performance.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126085145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of glottal source waveforms and vocal tract shape for singing voices with wide frequency range
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659480
K. Takahashi, M. Akagi
Estimation of the glottal vibration and the vocal tract for singing voices is necessary for clarifying the mechanism of singing voice production. However, accurately estimating the glottal vibration and the vocal tract shape in singing voices with a high fundamental frequency (f0) is difficult with models such as the auto-regressive with exogenous input (ARX) model and the Liljencrants-Fant (LF) model. This is caused by two problems: inaccurate estimation of the glottal closure instant (GCI) and inappropriate estimation of the ARX model parameter values in singing voices with high f0. The proposed method therefore aims to accurately estimate glottal source waveforms and vocal tract shape for singing voices over a wide frequency range. To achieve this, we propose two solutions: estimating the GCI using an electroglottogram (EGG) signal, and estimating the ARX model parameter values using multi-stage optimization and an evaluation function that includes the leaking effect from forwarded periods. Experiments on simulated and real singing voices indicate that the proposed method achieves accurate GCI estimation, reliable estimation of the ARX model parameters for singing voices with high f0, and estimation of the glottal vibration and vocal tract shape over a wide frequency range.
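As a minimal illustration of the ARX part of such a model (not the authors' multi-stage optimization), the sketch below fits the vocal-tract AR coefficients and source gain by least squares given a candidate glottal-source input, and checks the fit on a synthetic two-pole "vocal tract". The notation and the toy source signal are assumptions made for the example.

    import numpy as np

    def fit_arx(speech, source, order=12):
        """Fit s[n] = -sum_k a_k * s[n-k] + b * u[n] by ordinary least squares."""
        s, u = np.asarray(speech, float), np.asarray(source, float)
        rows = []
        for n in range(order, len(s)):
            rows.append(np.concatenate([-s[n - order:n][::-1], [u[n]]]))
        A, y = np.array(rows), s[order:]
        theta, *_ = np.linalg.lstsq(A, y, rcond=None)
        a, b = theta[:order], theta[order]
        residual = y - A @ theta            # the modelling error e[n]
        return a, b, residual

    # toy check: synthesise "speech" from a known 2nd-order tract and recover it
    rng = np.random.default_rng(0)
    u = rng.standard_normal(2000)           # stand-in for an LF glottal derivative
    a_true = np.array([-1.6, 0.8])          # resonant two-pole "vocal tract"
    s = np.zeros_like(u)
    for n in range(2, len(u)):
        s[n] = -a_true[0] * s[n - 1] - a_true[1] * s[n - 2] + 0.5 * u[n]
    a_hat, b_hat, _ = fit_arx(s, u, order=2)
    print(a_hat, b_hat)                     # ~[-1.6, 0.8] and ~0.5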
{"title":"Estimation of glottal source waveforms and vocal tract shape for singing voices with wide frequency range","authors":"K. Takahashi, M. Akagi","doi":"10.23919/APSIPA.2018.8659480","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659480","url":null,"abstract":"Estimation of glottal vibration and vocal tract for singing voices is necessary for clarifying the mechanism of singing voice production. However, accurate estimation of glottal vibration and vocal tract shape in singing voices with a high fundamental frequency (f0) is difficult using simulated models such as the auto-regressive with exogenous input (ARX) model and LiljencrantsFant (LF) model. This is caused by two problems: the inaccurate estimation method of the glottal closure instant (GCI) and the inappropriate estimation method of ARX model parameter values in singing voices with high f0. Therefore, this proposed method aims to accurately estimate glottal source waveforms and vocal tract shape for singing voices with wide frequency range. To achieve this objective, we propose two solutions: estimation of GCI using an electroglottogram (EGG) signal and estimation of ARX model parameter values using multi-stage optimization and an evaluation function including the leaking effect from forwarded periods. In experiments using simulated singing voices and real singing voices, it was indicated that the accurate estimation of GCI, the reliable estimation of the parameter values of the ARX model for singing voices with high f0, and the estimation of glottal vibration and vocal tract shape in singing voices with wide frequency range were achieved by the proposed method.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126116038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chatting Application Monitoring on Android System and its Detection based on the Correlation Test
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659583
Yafei Li, Jiageng Chen, A. Ho
Mobile phones play an important role in our modern digital society and have already replaced the traditional computer in many situations. At the same time, the amount of malicious software has started to grow and has shown a significant impact on legitimate use. Among mobile systems, the Android platform is currently the most widely used and most open, which also makes it a very attractive target for malicious applications. User privacy is of great interest to many different agents and has become one of the most valuable targets for malware, and chatting software is naturally one of the richest sources of such information. In this paper, we first investigate the core techniques used by most monitoring software. We then propose several correlation experiments to efficiently detect such software. We developed a monitoring prototype as well as a detection system, covering both the mobile phone side and the remote web server side, to simulate a real-world environment. The experiments confirm the efficiency of our approach.
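The abstract does not spell out the correlation test, so the sketch below is only a guess at its flavor: chat events and a suspect process's outbound traffic are binned on a common timeline and compared with a Pearson correlation, with a high score flagging likely monitoring. The event streams and the 5-second bin width are made up for illustration.

    import numpy as np

    def correlation_score(chat_times, upload_times, duration_s, bin_s=5):
        """Correlate per-bin counts of chat events and suspect uploads."""
        bins = np.arange(0, duration_s + bin_s, bin_s)
        chat_hist, _ = np.histogram(chat_times, bins=bins)
        up_hist, _ = np.histogram(upload_times, bins=bins)
        return np.corrcoef(chat_hist, up_hist)[0, 1]

    rng = np.random.default_rng(0)
    chat = np.sort(rng.uniform(0, 600, 80))             # chat activity over 10 minutes
    spy = chat + rng.uniform(0.5, 2.0, chat.size)       # monitor uploads shortly after each event
    benign = np.sort(rng.uniform(0, 600, 80))           # unrelated background traffic
    print(correlation_score(chat, spy, 600))            # close to 1: suspicious
    print(correlation_score(chat, benign, 600))         # near 0: benign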
{"title":"Chatting Application Monitoring on Android System and its Detection based on the Correlation Test","authors":"Yafei Li, Jiageng Chen, A. Ho","doi":"10.23919/APSIPA.2018.8659583","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659583","url":null,"abstract":"Mobile phones are playing an important roles in our modern digital society, which have already replaced the traditional computer in many situations. Nevertheless, the number of malicious software also starts to grow and showed significant impact on our legal use. Among several mobile systems, the Android platform is currently the most widely used and open system, which also makes it a very attractive target for the malicious applications. User privacy is of great interest to many different agents, which becomes of the most valuable target for the malware, and the chatting software naturally become one of the richest information resource target. In this paper, we first investigate the core techniques that are used by the most monitoring softwares. Then we propose several correlation experiments to efficiently detect the those softwares. We developed a monitoring prototype as well as the detecting system, including the mobile phone side and the remote web server side, to simulate the scenario in the real-world environment. The experiment confirmed the efficiency of our approach.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116136326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Survey on Replay Attack Detection for Automatic Speaker Verification (ASV) System
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659666
H. Patil, Madhu R. Kamble
In this paper, we present a brief survey of approaches used to detect replay attacks on Automatic Speaker Verification (ASV) systems. The replay spoofing attack is the most challenging to detect, as only a few seconds of audio are required to replay a genuine speaker's voice. Given the wide availability and widespread use of mobile/smart gadgets and recording devices, it is easy to record and replay a genuine speaker's voice. The challenge in detecting a replay spoofing attack is to distinguish the acoustical characteristics of the natural and replayed versions of the speech signal. Speech recorded with a playback device contains convolutional and additive distortions from the intermediate device, and background noise and channel degradations seriously constrain system performance. The goal of this paper is to provide an overview of replay attacks, focusing on the 2nd ASVspoof (2017) challenge, which addresses an emerging research problem in the field of anti-spoofing. The paper presents a critical analysis of state-of-the-art techniques, countermeasures, and databases, and also aims to present current limitations along with a road map ahead, i.e., future research directions for this technologically challenging problem.
{"title":"A Survey on Replay Attack Detection for Automatic Speaker Verification (ASV) System","authors":"H. Patil, Madhu R. Kamble","doi":"10.23919/APSIPA.2018.8659666","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659666","url":null,"abstract":"In this paper, we present a brief survey of various approaches used to detect replay attack for Automatic Speaker Verification (ASV) system. The replay spoofing attack is the most challenging task to detect as only few seconds of audio samples are required to replay genuine speaker's voice. Due to large availability and the widespread usage of the mobile/smart gadgets, recording devices, it is easy and simple to record and replay the genuine speaker's voice. The challenging task, in replay spoof attack is to detect the acoustical characteristics of the speech signal between the natural and replayed version. The speech signal recorded with the playback device contains the convolutional and additive distortions from the intermediate device. Background noise and channel degradations seriously constrain the performance of the system. The goal of this paper is to provide an overview of the replay attack focusing on 2nd ASVspoof 2017 challenge which is an emerging research problem in the field of anti-spoofing. This paper presents critical analysis of state-of-the-art techniques, various countermeasures, databases, and also aims to present current limitations along with road map ahead, i.e., future research directions in this technological challenging problem.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122532496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nonlinear Online Learning — A Kernel SMF Approach
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659670
Kewei Chen, Stefan Werner, A. Kuh, Yih-Fang Huang
Principles of adaptive filtering and signal processing are useful tools in machine learning. Nonlinear adaptive filtering techniques, though often analytically intractable, are more suitable for dealing with complex practical problems. This paper develops a nonlinear online learning algorithm based on a kernel set-membership filtering (SMF) approach. One of the main features of the SMF framework is its data-dependent, selective update of parameter estimates. Accordingly, the kernel SMF algorithm not only selectively updates its parameter estimates by making discerning use of the input data, but also selectively increases the dimension of the kernel expansions according to a model sparsification criterion. This results in sparser kernel expansions and less computation in the update of parameter estimates, making the proposed online learning algorithm more effective. Both analytical and numerical results are presented to corroborate these statements.
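A compact sketch of a generic kernel SMF update is given below. The Gaussian kernel, the error bound used for the selective update, and the coherence test used as the sparsification criterion are common choices assumed here; they are not necessarily the exact design in the paper.

    import numpy as np

    class KernelSMF:
        def __init__(self, delta=0.1, coherence_max=0.95, gamma=1.0):
            self.delta, self.coherence_max, self.gamma = delta, coherence_max, gamma
            self.dictionary, self.alpha = [], []

        def _k(self, x, y):
            return np.exp(-self.gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

        def predict(self, x):
            return sum(a * self._k(x, c) for a, c in zip(self.alpha, self.dictionary))

        def update(self, x, d):
            e = d - self.predict(x)
            if abs(e) <= self.delta:          # inside the membership set: no update needed
                return e
            step = (1.0 - self.delta / abs(e)) * e
            coherences = [self._k(x, c) for c in self.dictionary]
            if not self.dictionary or max(coherences) < self.coherence_max:
                self.dictionary.append(np.asarray(x, float))   # grow the kernel expansion
                self.alpha.append(step)
            else:                             # reuse the most coherent centre instead
                self.alpha[int(np.argmax(coherences))] += step
            return e

    # online identification of a simple nonlinear map y = sin(x0) + 0.5 * x1**2
    rng = np.random.default_rng(0)
    f = KernelSMF(delta=0.05, gamma=2.0)
    for _ in range(500):
        x = rng.uniform(-1, 1, 2)
        f.update(x, np.sin(x[0]) + 0.5 * x[1] ** 2)
    print(len(f.dictionary), f.predict(np.array([0.3, -0.4])), np.sin(0.3) + 0.5 * 0.16)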
{"title":"Nonlinear Online Learning — A Kernel SMF Approach","authors":"Kewei Chen, Stefan Werner, A. Kuh, Yih-Fang Huang","doi":"10.23919/APSIPA.2018.8659670","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659670","url":null,"abstract":"Principles of adaptive filtering and signal processing are useful tools in machine learning. Nonlinear adaptive filtering techniques, though often are analytically intractable, are more suitable for dealing with complex practical problems. This paper develops a nonlinear online learning algorithm with a kernel set-membership filtering (SMF) approach. One of the main features in the SMF framework is its data-dependent selective update of parameter estimates. Accordingly, the kernel SMF algorithm can not only selectively update its parameter estimates by making discerning use of the input data, but also selectively increase the dimension of the kernel expansions with a model sparsification criterion. This results in more sparse kernel expansions and less computation in the update of parameter estimates, making the proposed online learning algorithm more effective. Both analytical and numerical results are presented in this paper to corroborate the above statements.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122865954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SILK Steganography Scheme Based on the Distribution of LSF Parameter
Pub Date: 2018-11-01 | DOI: 10.23919/APSIPA.2018.8659509
Yanzhen Ren, Weiman Zheng, Lina Wang
SILK is a speech codec for real-time packet-based voice communication that is widely used in many popular mobile Internet applications, such as Skype, WeChat, QQ, and WhatsApp, which makes it a novel and ideal carrier for information hiding. In this paper, a secure steganography scheme for SILK is proposed that embeds the secret message by modifying the LSF (Line Spectral Frequency) quantization indices based on the statistical distribution of the LSF codebook. Experimental results show that the auditory concealment of the proposed scheme is excellent: the decrease in PESQ is very small, and the average hiding capacity reaches 129 bps and 223 bps at sampling rates of 8 kHz and 16 kHz, respectively. More importantly, the proposed scheme has good statistical security. In this scheme, the statistical distribution of the LSF codebook is used as a constraint to keep the distribution of the stego codewords close to that of the cover audio. Under a steganalysis scheme adapted from an existing steganalysis method for G.723.1, the average correct detection rate is below 55.4% for both cover and stego audio. To the best of our knowledge, this is the first work to hide information in SILK. Because the underlying principle of speech compression is similar, the method can be extended to other CELP codecs, such as G.723.1, G.729, and AMR.
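The sketch below shows a heavily simplified, QIM-style stand-in for LSF-index embedding: a secret bit selects between the nearest codewords of matching parity, and extraction reads the index parity back. The paper's distinguishing ingredient, constraining the choice with the statistical distribution of the LSF codebook, is only noted in a comment and not implemented; the toy scalar codebook is an assumption.

    import numpy as np

    rng = np.random.default_rng(0)
    codebook = np.sort(rng.uniform(0.0, np.pi, 32))      # toy scalar LSF codebook

    def embed_bit(lsf_value, bit):
        """Return the codebook index nearest to lsf_value whose parity equals bit."""
        # the paper additionally weighs candidates by the codebook's statistical
        # distribution to keep stego codeword statistics close to the cover; omitted here
        order = np.argsort(np.abs(codebook - lsf_value))
        for idx in order:                    # nearest candidates first, so distortion stays small
            if idx % 2 == bit:
                return int(idx)

    def extract_bit(index):
        return index % 2

    message = [1, 0, 1, 1, 0]
    lsf_stream = rng.uniform(0.0, np.pi, len(message))   # stand-in LSF parameters
    stego_indices = [embed_bit(v, b) for v, b in zip(lsf_stream, message)]
    print([extract_bit(i) for i in stego_indices])       # recovers [1, 0, 1, 1, 0]
    print([float(abs(codebook[i] - v)) for i, v in zip(stego_indices, lsf_stream)])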
{"title":"SILK Steganography Scheme Based on the Distribution of LSF Parameter","authors":"Yanzhen Ren, Weiman Zheng, Lina Wang","doi":"10.23919/APSIPA.2018.8659509","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659509","url":null,"abstract":"SILK, as a speech codec for real-time packet-based voice communications, which is widely used in many popular mobile Internet application, such as Skype, WeChat, QQ, WhatsApp, etc. It will be a novel and ideal carrier for information hiding. In this paper, a secure steganography scheme for SILK is proposed, which embeds secret message by modifying the LSF (Line Spectral Frequency) quantization indices based on the statistical distribution of LSF Codebook. The experimental results show that the auditory concealment of the proposed scheme is excellent, the decrease in PESQ is very small. The average hiding capacity can achieve 129 bps and 223 bps under the sampling rate of 8 kHz and 16 kHz respectively. More importantly, the proposed scheme has good statistical security. In this scheme, the statistical distribution of LSF Codebook is considered as a constraint condition to make the distribution of stego's codeword close to that of the cover audio. Under the steganlysis scheme which is referenced from the existing steganlysis scheme for G.723.1, the average correct detection rate is under 55.4% for both cover and stego audio. To the best of our knowledge, this is the first work to hide information in SILK. Based on the similar principle of speech compression, the method can be extended to other CELP codec, such as G.723.1, G.729, AMR, etc.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121868719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}