
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Articles

An investigation of spectral feature partitioning for replay attacks detection
Zhi Hao Lim, Xiaohai Tian, Wei Rao, Chng Eng Siong
Replay attacks from unseen utterances pose a significant challenge in anti-spoofing detection. In this paper, we propose a statistical measure based on the Rayleigh quotient in order to investigate a feature partition capable of discerning genuine and playback speech under unseen conditions. The Log-Magnitude Spectrum (LMS) of the utterances is used in this study. Using the proposed measure, we analyze the frequency bands of the LMS based on the amount of discriminative information between the scatter matrices of the genuine and spoof utterances. This allows us to determine the optimal frequency bands required for replay attack detection. In addition, we investigate the effects of training our models using the voiced and unvoiced portions of the utterances. We conducted our experiments on the ASVspoof 2017 database. On the development set, our partitioned LMS feature based on the whole utterance yields a 3.8% EER. Using just the unvoiced portions of the utterances further decreases the EER to 3.27%, while our baseline using Constant Q Cepstral Coefficients (CQCC) as a feature is at 10.21%. The evaluation results also confirm the effectiveness of our approach.
DOI: https://doi.org/10.1109/APSIPA.2017.8282273 (published 2017-12-01)
Citations: 1
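The abstract above names the Rayleigh quotient but does not give the exact form of the proposed measure. As a rough illustration only, the sketch below evaluates the textbook generalized Rayleigh quotient R(w) = (w^T S_a w) / (w^T S_b w) for two small matrices. The function names, the choice of which scatter matrix goes in the numerator, and the 2x2 example values are all assumptions made here, not details taken from the paper.

```python
def quad_form(matrix, w):
    """Return w^T M w for a square matrix given as a list of rows."""
    n = len(w)
    return sum(w[i] * matrix[i][j] * w[j] for i in range(n) for j in range(n))


def generalized_rayleigh_quotient(s_a, s_b, w):
    """Textbook generalized Rayleigh quotient (w^T S_a w) / (w^T S_b w)."""
    denom = quad_form(s_b, w)
    if denom <= 0:
        raise ValueError("w^T S_b w must be positive")
    return quad_form(s_a, w) / denom


# Hypothetical scatter matrices over two frequency bins (illustrative values only).
S_GENUINE = [[4.0, 0.0], [0.0, 1.0]]
S_SPOOF = [[1.0, 0.0], [0.0, 1.0]]

# A direction concentrated on the first bin scores higher than one on the second,
# which is the kind of per-band comparison such a measure could support.
score_bin0 = generalized_rayleigh_quotient(S_GENUINE, S_SPOOF, [1.0, 0.0])
score_bin1 = generalized_rayleigh_quotient(S_GENUINE, S_SPOOF, [0.0, 1.0])
```

With real data the matrices would be estimated from feature vectors, and a linear-algebra library would normally replace the hand-written quadratic form.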
I2R-NUS submission to oriental language recognition AP16-OL7 challenge
Hanwu Sun, Kong-Aik Lee, Trung Hieu Nguyen, B. Ma, Haizhou Li
This paper presents a detailed description and analysis of the joint submission of the Institute for Infocomm Research (I2R) and the National University of Singapore (NUS), which was the top-performing system in the AP16-OL7 Challenge. The submitted system was a fusion of two sub-systems, an i-vector system and a GMM-SVM system, both based on state-of-the-art bottleneck features. Central to the work presented in this paper are a language-dependent UBM GMM-SVM system and a traditional i-vector polynomial expansion with an SVM classifier. The FoCal toolkit was used for sub-system fusion. Experimental results show that the proposed approach achieves significant improvement over the baseline system on the development and evaluation sets. Our final submission achieves EERs of 0.440% and 1.09% and identification rates of 98.9% and 97.6% on the development set and evaluation set, respectively.
DOI: https://doi.org/10.1109/APSIPA.2017.8282274 (published 2017-12-01)
Citations: 1
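Both abstracts above report results as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be estimated from two lists of scores by sweeping a threshold. The function name and the example scores are hypothetical, and challenge evaluations typically rely on their own scoring tools rather than code like this.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The function
    returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))  # threshold that rejects every trial
    best_gap, best_eer = None, None
    for t in thresholds:
        # False rejection rate: targets scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: non-targets scoring at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical scores: one target trial (0.4) falls below one non-target trial (0.5).
eer = equal_error_rate([0.9, 0.8, 0.4, 0.7], [0.1, 0.5, 0.2, 0.3])
```

With only a handful of trials the estimate is coarse; on real score sets, interpolation between neighbouring thresholds is commonly used.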
Panchromatic and multi-spectral image fusion method based on two-step sparse representation and wavelet transform
G. He, Siyuan Xing, Dandan Dong, Ximei Zhao
Based on the characteristics of two-step sparse coding and the multi-scale analysis of the wavelet transform, a novel fusion algorithm combining two-step sparse coding (Two Step Sparse Representation, TSSR) and the wavelet transform is proposed. The two-step sparse strategy is used to construct the corresponding dictionaries for the low-frequency component and the down-sampled low-frequency component respectively, which avoids the training process of traditional sparse representation and improves the computing speed. At the same time, the sparse coefficient solution based on two-step sparse coding is closer to the original signal than the one-step sparse solution in traditional sparse representation, so the precision of the algorithm is higher. Experimental results and analysis show that the proposed method not only preserves the spectral characteristics but also effectively integrates the spatial detail information of panchromatic images. The method is much faster than the traditional sparse method, and it compares favorably with both the wavelet transform and traditional sparse representation, producing an excellent fusion result.
DOI: https://doi.org/10.1109/APSIPA.2017.8282055 (published 2017-12-01)
Citations: 1
On the convergence of INCA algorithm
Nirmesh J. Shah, H. Patil
The development of text-independent Voice Conversion (VC) has attracted growing research interest over the last decade. Aligning the source and target speakers' spectral features before learning the mapping function is a challenging step in developing text-independent VC, because the two speakers have uttered different utterances in the same or different languages. The state-of-the-art alignment technique is the Iterative combination of a Nearest Neighbor search step and a Conversion step Alignment (INCA) algorithm, which iteratively learns the mapping function after obtaining nearest-neighbor-aligned feature pairs from the intermediate converted spectral features and the target spectral features. To the best of the authors' knowledge, this algorithm has been shown to converge empirically; however, its theoretical proof has not been discussed in detail in the VC literature. In this paper, we show that the INCA algorithm converges monotonically to a local minimum in the mean square error (MSE) sense. In addition, we present the reason for convergence in the MSE sense in the context of the VC task.
DOI: https://doi.org/10.1109/APSIPA.2017.8282095 (published 2017-12-01)
Citations: 5
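The alternation described in the INCA abstract (a nearest-neighbor search step followed by a conversion step) can be illustrated with a deliberately simplified sketch. The assumptions here are mine, not the paper's: features are scalars rather than spectral vectors, the mapping is an affine function fitted by least squares, and the nearest-neighbor search runs in one direction only (converted source to target). All function names are hypothetical.

```python
def fit_affine(xs, ys):
    """Least-squares fit of y = a * x + b for scalar features."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x if var_x > 0 else 0.0
    return a, mean_y - a * mean_x


def nearest(value, candidates):
    """Return the candidate closest to value in squared distance."""
    return min(candidates, key=lambda c: (c - value) ** 2)


def inca_sketch(source, target, iterations=10):
    """Alternate nearest-neighbor alignment and mapping re-estimation.

    Returns the final (a, b) and the MSE measured after each alignment step.
    """
    a, b = 1.0, 0.0  # start from the identity mapping
    mse_history = []
    for _ in range(iterations):
        converted = [a * x + b for x in source]
        # Nearest-neighbor step: pair each converted frame with a target frame.
        paired = [nearest(c, target) for c in converted]
        mse = sum((c - p) ** 2 for c, p in zip(converted, paired)) / len(source)
        mse_history.append(mse)
        # Conversion step: refit the mapping on the aligned pairs.
        a, b = fit_affine(source, paired)
    return (a, b), mse_history


# Hypothetical unaligned scalar features from two speakers.
mapping, history = inca_sketch([0.0, 1.0, 2.0, 3.0, 4.0],
                               [2.1, 3.9, 6.2, 8.0, 9.8, 12.1])
```

In this simplified setting neither step can increase the MSE: re-pairing with a fixed mapping picks the closest target for every frame, and refitting with fixed pairs is a least-squares minimization. The recorded history is therefore non-increasing, which mirrors the kind of monotonic behaviour the abstract refers to.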
A study on enhanced educational platform with adaptive sensing devices using IoT features
Y. Tew, Tiong Yew Tang, Yoonku Lee
Plenty of digital education tools provide additional assistance for conducting lectures at universities. For instance, online video sources (e.g., YouTube) provide practical coding exercises for web application development, and interactive communication channels (e.g., Google Hangout) provide a platform for distance learning. However, these tools are rarely connected with real-life environmental conditions. An advanced education system should consider students' attendance, activities and intention to pay attention as part of assessment, and provide appropriate education tools to improve education quality. Therefore, there is a need to adopt recent Internet of Things (IoT) technology to detect and sense environmental conditions (e.g., room temperature, student activities) and produce the necessary reaction (e.g., air-conditioning control, waking sleeping students). In this paper, we propose an integrated platform that uses advanced IoT devices to improve the quality of education. The capabilities and features of several IoT controller boards are described and compared for realizing the IoT solution in an educational platform.
DOI: https://doi.org/10.1109/APSIPA.2017.8282061 (published 2017-12-01)
Citations: 6
Random aliasing modulation with decision-directed demodulation
Cairong Xing, Anhong Wang, Suyue Li, Peihao Li, Jing Zhang
The recently proposed compressive modulation (CM) offers a much higher bandwidth efficiency than conventional modulation schemes such as binary phase shift keying (BPSK) and M-ary phase shift keying (M-PSK), owing to its use of the compressive sensing (CS) principle. However, the CS-driven reconstruction currently used in the CM scheme cannot guarantee the desired performance because it ignores the characteristics of the aliased waveforms. In this paper, we propose to use the decision feedback equalization (DFE) technique in the reconstruction process and extend the idea of CM to the framework of random aliasing modulation, leading to random aliasing modulation with decision-directed demodulation (abbreviated as RAM-DDD). Our experimental results show that the proposed RAM-DDD scheme, measured by either bandwidth efficiency or bit error rate (BER) at the same SNR, outperforms the alternatives.
DOI: https://doi.org/10.1109/APSIPA.2017.8282086 (published 2017-12-01)
Citations: 0
The longitudinal development of focus duration of Korean Chinese learners
Ai-jun Li, Gongping Wang
Linguistic focus conveys semantic meanings and speakers' intentions. However, the perception and production patterns of focal speech for L2 learners are always affected by their mother tongues. The present paper concerns the longitudinal developmental patterns of focus duration for Chinese learners whose mother tongue is Korean (hereafter Korean Chinese Learners). The results show that (i) the development trajectory of focus duration follows a non-linear pattern; (ii) the tone of the focal syllable significantly affects the longitudinal development, with tone 3 (the low tone) showing a larger deviation than the other tones; and (iii) focus position also has an obvious effect, especially in the initial and final positions of the sentence.
DOI: https://doi.org/10.1109/APSIPA.2017.8282193 (published 2017-12-01)
Citations: 0
Mood disorder identification using deep bottleneck features of elicited speech
Kun-Yi Huang, Chung-Hsien Wu, Ming-Hsiang Su, Chia-Hui Chou
In the diagnosis of mental health disorders, a large proportion of Bipolar Disorder (BD) patients are likely to be misdiagnosed with Unipolar Depression (UD) on initial presentation. As speech is the most natural way to express emotion, this work focuses on tracking the emotion profile of elicited speech for short-term mood disorder identification. In this work, the Deep Scattering Spectrum (DSS) and Low Level Descriptors (LLDs) of the elicited speech signals are extracted as the speech features. The hierarchical spectral clustering (HSC) algorithm is employed to adapt the emotion database to the mood disorder database to alleviate the data bias problem. A denoising autoencoder is then used to extract bottleneck features of the DSS and LLDs for better representation. Based on the bottleneck features, a long short-term memory (LSTM) network is applied to generate the time-varying emotion profile sequence. Finally, given the emotion profile sequence, an HMM-based identification and verification model is used to determine the mood disorder. This work collected elicited emotional speech data from 15 BD patients, 15 UD patients and 15 healthy controls for system training and evaluation. Five-fold cross validation was employed for evaluation. Experimental results show that the system using the bottleneck features achieved an identification accuracy of 73.33%, an improvement of 8.89% over the system without bottleneck features. Furthermore, the system with the verification mechanism outperformed the system without verification by 4.44%.
DOI: https://doi.org/10.1109/APSIPA.2017.8282296 (published 2017-12-01)
Citations: 3
Accelerating deep learning by binarized hardware
Shinya Takamaeda-Yamazaki, Kodai Ueyoshi, Kota Ando, Ryota Uematsu, Kazutoshi Hirose, M. Ikebe, T. Asai, M. Motomura
Hardware-oriented approaches to accelerating deep neural network processing are very important for various embedded intelligent applications. This paper summarizes our recent achievements in efficient neural network processing. We focus on the binarization approach for energy- and area-efficient neural network processors. We first present an energy-efficient binarized processor for deep neural networks that employs an in-memory processing architecture. The real processor LSI achieves high performance and energy efficiency compared to prior work. We then present an architecture exploration technique for a binarized neural network processor on an FPGA. The exploration result indicates that the binarized hardware achieves very high performance by exploiting multiple different parallelisms at the same time.
DOI: https://doi.org/10.1109/APSIPA.2017.8282183 (published 2017-12-01)
Citations: 4
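One reason binarization suits hardware is that a dot product between two vectors of +1/-1 values reduces to an XOR (or XNOR) followed by a bit count. The sketch below shows this generic, widely used identity in software. It is not a description of the authors' processor architecture, and the function names and bit encoding (bit set for +1, clear for -1) are choices made here for illustration.

```python
def pack_bits(signs):
    """Pack a list of +1/-1 values into an int; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s not in (1, -1):
            raise ValueError("binarized values must be +1 or -1")
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors over {-1, +1}, given packed bits.

    Positions where the bits differ contribute -1 and matching positions
    contribute +1, so the result is n - 2 * (number of mismatches).
    """
    mask = (1 << n) - 1
    mismatches = bin((a_bits ^ b_bits) & mask).count("1")
    return n - 2 * mismatches


# Hypothetical binarized weights and activations.
weights = [1, -1, 1, 1]
activations = [1, 1, -1, 1]
result = binary_dot(pack_bits(weights), pack_bits(activations), len(weights))
```

In hardware the XOR and bit count operate on whole words at once, which is one source of the parallelism that binarized designs can exploit.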
A parallel computation algorithm for super-resolution methods using convolutional neural networks
Y. Sugawara, Sayaka Shiota, H. Kiya
An acceleration method for interpolation-based super-resolution (SR) methods using convolutional neural networks (CNNs), represented by SRCNN and VDSR, is proposed. In this paper, estimated pixels are classified into a number of types according to the upscaling factor, and SR images are then generated using CNNs optimized for each type. This allows the CNNs to use smaller filter sizes than conventional ones, so that the computational complexity can be reduced for both the running phase and the training phase. In addition, it is shown that the CNNs optimized for one type are closely related to those of the other types, and this relation provides a way to reduce the computational complexity of the training phase. A number of experiments are carried out to demonstrate the effectiveness of the proposed method. The proposed method outperforms conventional ones in terms of processing speed while maintaining the quality of the SR images.
DOI: 10.1109/APSIPA.2017.8282205 (published 2017-12-01)
Citations: 3