2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Publications

Skipped-Hierarchical Feature Pyramid Networks for Nuclei Instance Segmentation
Hyekyoung Hwang, T. Bui, Sang-il Ahn, Jitae Shin
Handling objects at multiple scales is a central problem in computer vision. Feature Pyramid Networks (FPN) have been widely used in instance segmentation to exploit features at multiple scales. By using feature maps of different scales, the method can capture objects of various sizes in a scene. However, FPN still cannot propagate the semantic information of deeper layers into the shallow layers, which carry strong spatial information. In this paper, we propose a novel network that adds stage residual connections and aggregation between $\boldsymbol{C_{i}}$ and $\boldsymbol{P}_{\boldsymbol{i}-1}$ on top of the FPN to address this shortcoming of the original FPN for instance segmentation. The proposed network, called Skipped-Hierarchical Feature Pyramid Networks (SH-FPN), is integrated into Mask R-CNN. Experimental results show that SH-FPN significantly improves on FPN for nuclei segmentation on the Data Science Bowl 2018 benchmark dataset.
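For readers unfamiliar with the baseline, the following is a minimal sketch of the top-down pathway of a standard FPN, not of SH-FPN itself, whose connections the abstract does not specify in detail. It assumes 1-D feature maps for brevity and uses a scalar multiply as a stand-in for the 1x1 lateral convolution; the names `upsample2`, `fpn_top_down`, and the sample values are hypothetical.

```python
def upsample2(feature):
    """Nearest-neighbour upsampling of a 1-D feature map by a factor of 2."""
    return [value for value in feature for _ in range(2)]


def fpn_top_down(backbone, lateral_scale=1.0):
    """Build pyramid features P from backbone features C.

    backbone: list [C_fine, ..., C_coarse]; each level is a list of floats
    half as long as the previous one. A scalar multiply stands in for the
    1x1 lateral convolution of a real FPN.
    """
    laterals = [[lateral_scale * v for v in level] for level in backbone]
    pyramid = [laterals[-1]]  # the coarsest level starts the top-down pathway
    for lateral in reversed(laterals[:-1]):
        upsampled = upsample2(pyramid[0])
        # Merge the lateral feature with the upsampled coarser pyramid level.
        pyramid.insert(0, [a + b for a, b in zip(lateral, upsampled)])
    return pyramid


# Hypothetical three-level backbone output (fine to coarse).
c2 = [1.0, 2.0, 3.0, 4.0]
c3 = [10.0, 20.0]
c4 = [100.0]
p2, p3, p4 = fpn_top_down([c2, c3, c4])
print(p2, p3, p4)
```

In this toy setting each pyramid level receives information only from coarser levels, which is the one-directional flow that the abstract's additional connections are meant to complement.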
DOI: https://doi.org/10.23919/APSIPA.2018.8659795; published 2018-11-01
Citations: 1
A Signal Separation Method for Physical Wireless Parameter Conversion Sensor Networks Using K-Shortest Path
Shuhei Yamasaki, Minato Oriuchi, O. Takyu, K. Shirai, T. Fujii, M. Ohta, F. Sasamori, S. Handa
Achieving low delay and high traffic performance is essential for wireless sensor networks (WSN). Although physical wireless parameter conversion sensor networks (PhyC-SN) can gather information from multiple sensors simultaneously, separating the resulting mixture of sensing results is a difficult problem. The proposed method borrows an approach from multi-target tracking (MTT) to separate the mixed data points into a set of per-sensor sequences. Specifically, we treat the data separation problem as a path planning problem: we form paths by connecting data points observed at adjacent times and find a set of continuous paths, each consisting of data points from the same sensor. Solving this problem yields as many paths as there are sensors, so all sensing results can be correctly discriminated and labeled at all times in the WSN. To this end, we focus on the $k$-shortest path method of MTT. In this paper, we demonstrate the accuracy of signal separation through simulation experiments and evaluate it quantitatively in terms of the precision rate.
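As a rough illustration of the path view described above, the following Python sketch links unlabeled readings at adjacent time steps into one path per sensor. It is a simplified stand-in and not the paper's $k$-shortest path method: at each step it simply picks the assignment of readings to paths with the smallest total jump from the previous step. The function name `link_paths` and the sample values are hypothetical.

```python
from itertools import permutations


def link_paths(frames):
    """Link unlabeled readings into one path per sensor.

    frames: list of lists; frames[t] holds the mixed readings observed at
    time t, in arbitrary order, with the same count at every step.
    Returns a list of paths, one per sensor, each a list of readings.
    """
    paths = [[value] for value in frames[0]]
    for readings in frames[1:]:
        last = [path[-1] for path in paths]
        # Try every assignment of readings to paths; keep the cheapest one.
        best = min(
            permutations(readings),
            key=lambda perm: sum(abs(a - b) for a, b in zip(last, perm)),
        )
        for path, value in zip(paths, best):
            path.append(value)
    return paths


# Two hypothetical sensors whose readings arrive shuffled at each time step.
mixed = [[1.0, 5.0], [5.2, 1.1], [1.3, 5.1], [4.9, 1.2]]
separated = link_paths(mixed)
print(separated)
```

The exhaustive search over permutations grows factorially with the number of sensors, so this form is only practical for a handful of sensors; it is meant to show the linking idea, not an efficient solver.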
DOI: https://doi.org/10.23919/APSIPA.2018.8659631; published 2018-11-01
Citations: 0
Discriminative Feature Extraction Based on Sequential Variational Autoencoder for Speaker Recognition
Takenori Yoshimura, Natsumi Koike, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, K. Tokuda
This paper presents an extended version of the variational autoencoder (VAE) for sequence modeling. In contrast to the original VAE, the proposed model can directly handle variable-length observation sequences. Furthermore, the discriminative model and the generative model are simultaneously learned in a unified framework. The network architecture of the proposed model is inspired by the i-vector/PLDA framework, whose effectiveness has been proven in sequence modeling tasks such as speaker recognition. Experimental results on the TIMIT database show that the proposed model outperforms the traditional i-vector/PLDA system.
DOI: https://doi.org/10.23919/APSIPA.2018.8659722; published 2018-11-01
Citations: 1
Implication of speech level control in noise to sound quality judgement
Sara Akbarzadeh, Sungmin Lee, Satnam Singh, Chin-Tuan Tan
The relative level of speech and noise, i.e., the signal-to-noise ratio (SNR), may not on its own fully account for how humans perceive speech in noise or judge the sound quality of the speech component. To date, the most common rationale in front-end processing of noisy speech in assistive hearing devices is to reduce the (estimated) “noise” with the sole objective of improving the overall SNR. The absolute sound pressure level of speech in the remaining noise, which listeners need in order to anchor their perceptual judgement, is assumed to be restored by the subsequent dynamic range compression stage, which is intended to compensate for loudness recruitment in hearing-impaired (HI) listeners. However, uncoordinated settings of the thresholds that trigger the nonlinear processing in these two separate stages amplify the remaining “noise” and/or distortion instead. This can confuse the listener's judgement of sound quality and make it deviate from the perceptual trend one would expect as more noise is present. In this study, both normal-hearing (NH) and HI listeners were asked to rate the perceived sound quality of noisy speech and noise-reduced speech. The results showed that speech processed by noise reduction algorithms was rated lower in quality than the original unprocessed speech in noise. The results also showed that sound quality judgement depended on both the input SNR and the absolute level of speech, with greater weight on the latter, for both NH and HI listeners. These findings suggest that integrating the two separate processing stages into one may better match the underlying mechanism of auditory sound reception. Further work will attempt to identify settings of these two processing stages that give better speech reception for users of assistive hearing devices.
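To make the distinction between relative and absolute level concrete, the sketch below computes the SNR and the speech level of a toy signal before and after the same gain is applied to both speech and noise. The signals, the gain, and the helper names `power`, `level_db`, and `snr_db` are illustrative assumptions; the levels are relative to unit power, not calibrated sound pressure levels.

```python
import math


def power(signal):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def level_db(signal):
    """Level in dB relative to unit power (not calibrated dB SPL)."""
    return 10.0 * math.log10(power(signal))


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB."""
    return 10.0 * math.log10(power(speech) / power(noise))


# Hypothetical toy signals: a 'speech' tone and a weaker alternating 'noise'.
speech = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
noise = [0.1 * (-1) ** n for n in range(100)]

gain = 0.5  # the same gain applied to both components
quiet_speech = [gain * s for s in speech]
quiet_noise = [gain * s for s in noise]

print(snr_db(speech, noise), snr_db(quiet_speech, quiet_noise))
print(level_db(speech), level_db(quiet_speech))
```

The SNR is the same in both cases, while the speech level drops by about 6 dB, so a metric based on SNR alone cannot tell the two conditions apart.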
DOI: https://doi.org/10.23919/APSIPA.2018.8659672; published 2018-11-01
Citations: 3
Probabilistic Sequential Patterns for Singing Transcription
Eita Nakamura, Ryo Nishikimi, S. Dixon, Kazuyoshi Yoshii
Statistical models of musical scores play an important role in various tasks of music information processing. It has been an open problem to construct a score model incorporating global repetitive structure of note sequences, which is expected to be useful for music transcription and other tasks. Since repetitions can be described by a sparse distribution over note patterns (segments of music), a possible solution is to consider a Bayesian score model in which such a sparse distribution is first generated for each individual piece and then musical notes are generated in units of note patterns according to the distribution. However, straightforward construction is impractical due to the enormous number of possible note patterns. We propose a probabilistic model that represents a cluster of note patterns, instead of explicitly dealing with the set of all possible note patterns, to attain computational tractability. A score model is constructed as a mixture or a Markov model of such clusters, which is compatible with the above framework for describing repetitive structure. As a practical test to evaluate the potential of the model, we consider the problem of singing transcription from vocal f0 trajectories. Evaluation results show that our model achieves better predictive ability and transcription accuracies compared to the conventional Markov model, nearly reaching state-of-the-art performance.
DOI: https://doi.org/10.23919/APSIPA.2018.8659637; published 2018-11-01
Citations: 4
Estimation of glottal source waveforms and vocal tract shape for singing voices with wide frequency range
K. Takahashi, M. Akagi
Estimating glottal vibration and vocal tract shape for singing voices is necessary for clarifying the mechanism of singing voice production. However, accurate estimation of glottal vibration and vocal tract shape in singing voices with a high fundamental frequency (f0) is difficult with simulated models such as the auto-regressive with exogenous input (ARX) model and the Liljencrants-Fant (LF) model. This is caused by two problems: inaccurate estimation of the glottal closure instant (GCI) and an inappropriate method for estimating ARX model parameter values in singing voices with high f0. Therefore, the proposed method aims to accurately estimate glottal source waveforms and vocal tract shape for singing voices over a wide frequency range. To achieve this objective, we propose two solutions: estimating the GCI from an electroglottogram (EGG) signal, and estimating the ARX model parameter values using multi-stage optimization and an evaluation function that includes the leaking effect from forwarded periods. Experiments with simulated and real singing voices indicated that the proposed method achieved accurate estimation of the GCI, reliable estimation of the ARX model parameter values for singing voices with high f0, and estimation of glottal vibration and vocal tract shape in singing voices over a wide frequency range.
DOI: https://doi.org/10.23919/APSIPA.2018.8659480; published 2018-11-01
Citations: 3
Chatting Application Monitoring on Android System and its Detection based on the Correlation Test
Yafei Li, Jiageng Chen, A. Ho
Mobile phones play an important role in modern digital society and have already replaced the traditional computer in many situations. At the same time, the amount of malicious software has started to grow and has had a significant impact on legitimate use. Among mobile systems, Android is currently the most widely used and most open platform, which also makes it a very attractive target for malicious applications. User privacy is of great interest to many different parties, making it one of the most valuable targets for malware, and chat software is naturally one of the richest sources of such information. In this paper, we first investigate the core techniques used by most monitoring software. We then propose several correlation experiments to detect such software efficiently. We developed a monitoring prototype and a detection system, comprising a mobile phone side and a remote web server side, to simulate the scenario in a real-world environment. The experiment confirmed the efficiency of our approach.
DOI: https://doi.org/10.23919/APSIPA.2018.8659583; published 2018-11-01
Citations: 2
A Survey on Replay Attack Detection for Automatic Speaker Verification (ASV) System
H. Patil, Madhu R. Kamble
In this paper, we present a brief survey of approaches used to detect replay attacks on Automatic Speaker Verification (ASV) systems. The replay spoofing attack is the most challenging to detect, as only a few seconds of audio are required to replay a genuine speaker's voice. Owing to the wide availability and widespread use of mobile/smart gadgets and recording devices, it is easy to record and replay a genuine speaker's voice. The challenge in replay spoof detection is to detect the differences in acoustical characteristics between the natural and replayed versions of the speech signal. A speech signal recorded from a playback device contains convolutional and additive distortions introduced by the intermediate device. Background noise and channel degradations seriously constrain the performance of the system. The goal of this paper is to provide an overview of the replay attack, focusing on the second ASVspoof challenge (ASVspoof 2017), an emerging research problem in the field of anti-spoofing. The paper presents a critical analysis of state-of-the-art techniques, countermeasures, and databases, and also presents current limitations along with a road map ahead, i.e., future research directions for this technologically challenging problem.
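The convolutional and additive distortions mentioned above can be pictured with a toy signal model in which the replayed signal is the genuine signal convolved with a device impulse response, plus noise. The sketch below is only an illustration under that assumption; the impulse response, noise level, and the names `convolve` and `simulate_replay` are hypothetical and do not come from the survey.

```python
import random


def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def simulate_replay(x, h, noise_std, seed=0):
    """Return x convolved with h plus Gaussian noise (toy replay model)."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in convolve(x, h)]


genuine = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0]  # hypothetical speech samples
device = [0.6, 0.3, 0.1]                     # hypothetical device impulse response
replayed = simulate_replay(genuine, device, noise_std=0.01)
print(len(genuine), len(replayed))
```

A countermeasure, in this picture, has to tell `replayed` from `genuine` using only the traces left by the device response and the added noise.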
DOI: https://doi.org/10.23919/APSIPA.2018.8659666; published 2018-11-01
Citations: 30
Nonlinear Online Learning — A Kernel SMF Approach
Kewei Chen, Stefan Werner, A. Kuh, Yih-Fang Huang
Principles of adaptive filtering and signal processing are useful tools in machine learning. Nonlinear adaptive filtering techniques, though often analytically intractable, are more suitable for dealing with complex practical problems. This paper develops a nonlinear online learning algorithm based on a kernel set-membership filtering (SMF) approach. One of the main features of the SMF framework is its data-dependent selective update of parameter estimates. Accordingly, the kernel SMF algorithm can not only selectively update its parameter estimates by making discerning use of the input data, but also selectively increase the dimension of the kernel expansion according to a model sparsification criterion. This results in sparser kernel expansions and less computation in the update of parameter estimates, making the proposed online learning algorithm more effective. Both analytical and numerical results are presented in this paper to corroborate the above statements.
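The two selective mechanisms described above can be sketched as follows. This is a toy illustration loosely modeled on set-membership NLMS with a coherence-style novelty rule, and is not the paper's algorithm; the class `KernelSMF`, the Gaussian kernel, and the parameters `gamma`, `coherence`, and `sigma` are assumptions made for the example.

```python
import math


def gaussian_kernel(a, b, sigma=0.5):
    """Gaussian kernel between two scalars; k(a, a) == 1."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


class KernelSMF:
    """Toy kernel set-membership filter for scalar inputs (illustrative only)."""

    def __init__(self, gamma=0.1, coherence=0.9):
        self.gamma = gamma          # error bound: no update while |e| <= gamma
        self.coherence = coherence  # add a center only if the input is novel
        self.centers = []
        self.coeffs = []
        self.updates = 0

    def predict(self, x):
        return sum(a * gaussian_kernel(c, x) for a, c in zip(self.coeffs, self.centers))

    def step(self, x, d):
        """Process one sample (x, d); return True if the model was updated."""
        e = d - self.predict(x)
        if abs(e) <= self.gamma:
            return False  # sample is consistent with the current estimate
        mu = 1.0 - self.gamma / abs(e)  # data-dependent step size
        sims = [gaussian_kernel(c, x) for c in self.centers]
        if not sims or max(sims) <= self.coherence:
            # Novel input: grow the kernel expansion by one term.
            self.centers.append(x)
            self.coeffs.append(mu * e)
        else:
            # Similar to an existing center: adjust that coefficient instead.
            j = sims.index(max(sims))
            self.coeffs[j] += mu * e / sims[j]
        self.updates += 1
        return True


model = KernelSMF(gamma=0.1, coherence=0.9)
for i in range(200):
    x = i / 50.0
    model.step(x, math.sin(2.0 * math.pi * x))
print(model.updates, len(model.centers))
```

In this sketch, each update moves the prediction at the current input just far enough that the error magnitude equals `gamma`, and samples whose error is already within the bound cost only a prediction.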
DOI: https://doi.org/10.23919/APSIPA.2018.8659670 (published 2018-11-01)
Citations: 2
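The selective-update idea described in the kernel SMF abstract above can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic set-membership normalized LMS filter run in a Gaussian-kernel feature space, with a simple coherence threshold standing in for the unspecified model sparsification criterion. All names (`KernelSMNLMS`, `error_bound`, `coherence_max`) and all parameter values are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, centers, width):
    """Gaussian kernel between one input x and every row of centers."""
    diff = centers - x
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * width ** 2))


class KernelSMNLMS:
    """Illustrative set-membership kernel NLMS (assumed form, not the paper's)."""

    def __init__(self, error_bound, width=1.0, coherence_max=0.9, eps=1e-8):
        self.error_bound = error_bound      # tolerated output error magnitude
        self.width = width                  # Gaussian kernel width
        self.coherence_max = coherence_max  # assumed sparsification threshold
        self.eps = eps                      # regularizer for the normalization
        self.centers = None                 # dictionary of stored inputs
        self.weights = np.zeros(0)          # one weight per dictionary entry
        self.num_updates = 0

    def predict(self, x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        if self.centers is None:
            return 0.0
        return float(self.weights @ gaussian_kernel(x, self.centers, self.width))

    def step(self, x, d):
        """Process one (input, desired output) pair; return the a priori error."""
        x = np.atleast_1d(np.asarray(x, dtype=float))
        error = d - self.predict(x)
        # Selective update: do nothing when the error is already within the bound.
        if abs(error) <= self.error_bound:
            return error
        # Selective growth: store x only if it is not too similar to the dictionary.
        if self.centers is None:
            self.centers = x[None, :]
            self.weights = np.zeros(1)
        else:
            similarity = gaussian_kernel(x, self.centers, self.width)
            if np.max(similarity) <= self.coherence_max:
                self.centers = np.vstack([self.centers, x])
                self.weights = np.append(self.weights, 0.0)
        k = gaussian_kernel(x, self.centers, self.width)
        # Data-dependent step size that moves the a posteriori error onto the bound.
        mu = 1.0 - self.error_bound / abs(error)
        self.weights = self.weights + mu * error * k / (self.eps + k @ k)
        self.num_updates += 1
        return error


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = KernelSMNLMS(error_bound=0.1, width=0.5)
    inputs = rng.uniform(-3.0, 3.0, size=500)
    for value in inputs:
        model.step([value], np.sin(value) + 0.01 * rng.standard_normal())
    print("updates:", model.num_updates, "dictionary size:", len(model.centers))
```

In this sketch the error bound controls how often the weights change, and the coherence threshold controls how large the kernel dictionary grows; both save computation in the way the abstract describes, although the paper's actual criteria may differ.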
SILK Steganography Scheme Based on the Distribution of LSF Parameter
Yanzhen Ren, Weiman Zheng, Lina Wang
SILK is a speech codec for real-time packet-based voice communications that is widely used in many popular mobile Internet applications, such as Skype, WeChat, QQ, and WhatsApp, which makes it a novel and ideal carrier for information hiding. In this paper, a secure steganography scheme for SILK is proposed, which embeds the secret message by modifying the LSF (Line Spectral Frequency) quantization indices based on the statistical distribution of the LSF codebook. The experimental results show that the auditory concealment of the proposed scheme is excellent: the decrease in PESQ is very small. The average hiding capacity reaches 129 bps and 223 bps at sampling rates of 8 kHz and 16 kHz, respectively. More importantly, the proposed scheme has good statistical security. The statistical distribution of the LSF codebook is used as a constraint so that the distribution of the stego audio's codewords stays close to that of the cover audio. Under a steganalysis scheme adapted from an existing steganalysis scheme for G.723.1, the average correct detection rate is under 55.4% for both cover and stego audio. To the best of our knowledge, this is the first work to hide information in SILK. Since other CELP codecs share a similar speech compression principle, the method can be extended to them, for example G.723.1, G.729, and AMR.
DOI: https://doi.org/10.23919/APSIPA.2018.8659509 (published 2018-11-01)
Citations: 5