
2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

Time-Frequency Mask-based Speech Enhancement using Convolutional Generative Adversarial Network
Neil Shah, H. Patil, Meet H. Soni
A Speech Enhancement (SE) system aims to improve the perceptual quality of a noisy mixture while preserving its speech intelligibility. Time-Frequency (T-F) masking-based SE using a supervised learning algorithm, such as a Deep Neural Network (DNN), has outperformed traditional SE techniques. However, the notable difference observed between the oracle mask and the predicted mask motivates us to explore different deep learning architectures. In this paper, we propose to use a Convolutional Neural Network (CNN)-based Generative Adversarial Network (GAN) for inherent mask estimation. The GAN takes advantage of adversarial optimization, an alternative to other Maximum Likelihood (ML) optimization-based architectures. We also show the need for supervised T-F mask estimation for effective noise suppression. Experimental results demonstrate that the proposed T-F mask-based SE significantly outperforms the recently proposed end-to-end SEGAN and a GAN-based Pix2Pix architecture. Performance evaluation in terms of both the predicted mask and objective measures indicates an improvement in speech quality, together with a reduction in the speech distortion observed in the noisy mixture.
DOI: https://doi.org/10.23919/APSIPA.2018.8659692 (published 2018-11-01)
Citations: 32
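The speech-enhancement abstract above contrasts an oracle mask with a predicted mask. As a minimal sketch of what an oracle T-F mask looks like, the following assumes an ideal ratio mask computed per bin from known clean and noise magnitudes; the abstract does not say which mask type the authors use, and the function names here are hypothetical.

```python
def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-12):
    """Oracle ratio mask per T-F bin: |S|^2 / (|S|^2 + |N|^2).

    Assumption: an ideal ratio mask; the paper may use a different mask.
    Inputs are lists of frames, each frame a list of magnitudes.
    """
    return [
        [s * s / (s * s + n * n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(clean_mag, noise_mag)
    ]


def apply_mask(mask, noisy_mag):
    """Scale each noisy magnitude bin by its mask value (between 0 and 1)."""
    return [
        [m * y for m, y in zip(m_row, y_row)]
        for m_row, y_row in zip(mask, noisy_mag)
    ]


# One frame with two frequency bins: the first is speech plus noise,
# the second is noise only and is suppressed entirely.
mask = ideal_ratio_mask([[3.0, 0.0]], [[4.0, 1.0]])
enhanced = apply_mask(mask, [[5.0, 1.0]])
```

A learned model would replace `ideal_ratio_mask` with a network that predicts the mask from the noisy input alone; the oracle version is only available when the clean signal is known.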
Block Tensor Train Decomposition for Missing Value Imputation
Namgil Lee
We propose a new method for imputing missing values in large-scale matrix data based on a low-rank tensor approximation technique called the block tensor train (TT) decomposition. Given sparsely observed data points, the proposed method iteratively computes the soft-thresholded singular value decomposition (SVD) of the underlying data matrix with missing values. The SVD is computed using a low-rank block TT decomposition, which suits large-scale data matrices with a low-rank tensor structure. Experimental results on simulated data demonstrate that the proposed method can accurately estimate a large number of missing values compared with a standard matrix-based method.
DOI: https://doi.org/10.23919/APSIPA.2018.8659560 (published 2018-11-01)
Citations: 0
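The iteration described in the imputation abstract above can be written in its plain matrix form as follows. This is a sketch of the standard soft-thresholded SVD imputation step, not the block TT variant the paper proposes, and every symbol is an assumption introduced here: Y is the partially observed matrix, P_Ω keeps the observed entries and zeroes the rest, P_Ω^⊥ keeps the unobserved entries, λ is the threshold, and (x)_+ = max(x, 0).

```latex
\begin{aligned}
Z^{(k)} &= P_\Omega(Y) + P_\Omega^{\perp}\bigl(X^{(k)}\bigr), \\
Z^{(k)} &= U \,\mathrm{diag}(\sigma_1, \dots, \sigma_r)\, V^{\top}, \\
X^{(k+1)} &= U \,\mathrm{diag}\bigl((\sigma_1 - \lambda)_+, \dots, (\sigma_r - \lambda)_+\bigr)\, V^{\top}.
\end{aligned}
```

Each pass fills the missing entries with the current estimate, takes the SVD of the filled matrix, and shrinks its singular values; per the abstract, the paper's contribution lies in carrying out the SVD step through a block TT decomposition for large matrices.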
On the Comparative Effect of Snowfall, Accumulation, and Density on Speech Intelligibility
Shuto Shibata, K. Kondo
Sound is known to be altered in some manner by the acoustic characteristics of snow. However, the specific characteristics of snow that actually affect the acoustical transfer characteristics are not clearly understood. These transfer characteristics are crucial in disaster prevention radio broadcasting systems that warn citizens working outdoors of potential natural disasters during the winter in regions with heavy snow. These systems use extremely high-output horn speakers to convey warning messages over a large area. Accordingly, the purpose of this research is to clarify how speech intelligibility is influenced by the amount of snowfall, its accumulation, and the snow density. In this research, outdoor impulse response measurements were carried out during actual snowfall. We measured and compiled the transfer characteristics under several snow conditions and convolved them with test speech to simulate the transmitted speech quality during snow. We conducted a Japanese speech intelligibility test using these speech samples and clarified the effect of each snow quality measure using multivariate analysis. As a result, it was found that although there is some influence of the amount of snowfall and density, the influence of the amount of snowfall becomes dominant as the distance between the loudspeaker and the listener (microphone) becomes large.
DOI: https://doi.org/10.23919/APSIPA.2018.8659782 (published 2018-11-01)
Citations: 0
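The snow study above simulates transmitted speech by convolving measured impulse responses with test speech. A minimal sketch of that step, using a direct time-domain convolution on toy data, might look like this; the sample values are made up for illustration and the function name is hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct linear convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


# Toy data: three speech samples and a two-tap impulse response
# (a direct path plus one attenuated reflection).
speech = [1.0, 2.0, 3.0]
measured_ir = [1.0, 0.5]
simulated = convolve(speech, measured_ir)
```

For real recordings an FFT-based convolution would be far faster; the direct form is used here only because it makes the operation explicit.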
Multi-View and Multi-Modal Action Recognition with Learned Fusion
Sandy Ardianto, H. Hang
In this paper, we study a multi-modal and multi-view action recognition system based on deep-learning techniques. We extend the Temporal Segment Network with an additional data fusion stage to combine information from different sources. In this research, we use multiple types of information from different modalities, such as RGB, depth, and infrared data, to detect predefined human actions. We tested various combinations of these data sources to examine their impact on the final detection accuracy. We designed 3 information fusion methods to generate the final decision. The most interesting one is the Learned Fusion Net that we designed. It turns out that the Learned Fusion structure gives the best results but requires more training.
DOI: https://doi.org/10.23919/APSIPA.2018.8659539 (published 2018-11-01)
Citations: 9
Cocoa bean quality assessment using closed range hyperspectral images
Oswaldo Bayona, Daniel Ochoa, Ronald Criollo, J. Cevallos-Cevallos, Wenzi Liao
Farmers mix high- and low-quality cocoa beans to increase their income at the expense of chocolate flavor. We use closed range hyperspectral images to recognize two common varieties of cocoa beans at various fermentation stages. Several image calibration issues are addressed in this paper to reduce the effect of the bean's shape on the reflectance image estimation and the effect of specular patches on the bean's surface. Fusion and feature extraction techniques were exploited for bean classification. From our experimental results, we noticed that the biochemical processes during fermentation of each bean type influence their spectral signatures, enabling increasingly better discrimination. We found that spectral indexes related to the anthocyanin reflectance index yield a high discriminant rate, particularly at later fermentation stages. These findings suggest that bean classification is possible and could be adopted as the standard method for fast bean quality assessment.
DOI: https://doi.org/10.23919/APSIPA.2018.8659490 (published 2018-11-01)
Citations: 2
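The cocoa bean abstract above reports that indexes related to the anthocyanin reflectance index discriminate well. As a sketch only, the following computes the commonly cited form of that index, 1/R550 - 1/R700, from a sampled spectrum. The abstract does not state which bands or variants the authors used, so the band choice and the helper names are assumptions.

```python
def nearest_band(wavelengths_nm, target_nm):
    """Index of the sampled band closest to the target wavelength."""
    return min(
        range(len(wavelengths_nm)),
        key=lambda i: abs(wavelengths_nm[i] - target_nm),
    )


def anthocyanin_reflectance_index(wavelengths_nm, reflectance):
    """ARI = 1/R550 - 1/R700 (assumed band choice, see lead-in)."""
    r550 = reflectance[nearest_band(wavelengths_nm, 550.0)]
    r700 = reflectance[nearest_band(wavelengths_nm, 700.0)]
    return 1.0 / r550 - 1.0 / r700


# Toy spectrum for a single pixel, reflectance in the range 0..1.
bands = [500.0, 550.0, 600.0, 700.0]
pixel = [0.1, 0.2, 0.3, 0.5]
ari = anthocyanin_reflectance_index(bands, pixel)
```

In a hyperspectral image this would be evaluated per pixel after reflectance calibration, which is the step the abstract says needs care because of bean shape and specular patches.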
A Block-Permutation-Based Image Encryption Allowing Hierarchical Decryption
Yusuke Izawa, Shoko Imaizumi, H. Kiya
This paper proposes a block-permutation-based encryption (BPBE) scheme that allows decryption of only particular regions in the encrypted image. Partial decryption is difficult in the conventional scheme because it encrypts the entire image at once. By composing regions in the original image, the proposed scheme performs hierarchical encryption and thereby achieves partial decryption. Additionally, the proposed scheme maintains the JPEG-LS compression efficiency of the encrypted images compared with the conventional scheme. Moreover, resilience against jigsaw puzzle solving can be enhanced by applying the proposed scheme to combined images. We further consider efficient key management using hash chains.
DOI: https://doi.org/10.23919/APSIPA.2018.8659479 (published 2018-11-01)
Citations: 0
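The encryption abstract above combines keyed block permutation with hash-chain key management. The sketch below illustrates both ideas in a simplified form and is not the authors' scheme: it assumes SHA-256 for the chain, uses Python's non-cryptographic `random` module as a stand-in for a keyed permutation generator, and covers block scrambling only. All function names are hypothetical.

```python
import hashlib
import random


def hash_chain(root_key, levels):
    """Derive one key per level: key[i+1] = SHA-256(key[i]).

    Holding key[i] lets you derive every later key but no earlier one.
    """
    keys = [root_key]
    for _ in range(levels - 1):
        keys.append(hashlib.sha256(keys[-1]).digest())
    return keys


def _order(n_blocks, key):
    # Illustration only: random.Random is NOT cryptographically secure.
    order = list(range(n_blocks))
    random.Random(key).shuffle(order)
    return order


def permute_blocks(blocks, key):
    """Scramble block positions with a key-dependent permutation."""
    return [blocks[i] for i in _order(len(blocks), key)]


def unpermute_blocks(scrambled, key):
    """Invert permute_blocks given the same key."""
    restored = [None] * len(scrambled)
    for pos, src in enumerate(_order(len(scrambled), key)):
        restored[src] = scrambled[pos]
    return restored


keys = hash_chain(b"root", 3)
region = ["b0", "b1", "b2", "b3", "b4", "b5"]
scrambled = permute_blocks(region, keys[2])
restored = unpermute_blocks(scrambled, keys[2])
```

Giving a user only `keys[2]` would let them restore the region scrambled with that key, while regions protected by `keys[0]` or `keys[1]` stay scrambled, which is the hierarchical access idea in outline.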
A Speech Processing Strategy based on Sinusoidal Speech Model for Cochlear Implant Users
Sungmin Lee, Sara Akbarzadeh, Satnam Singh, Chin-Tuan Tan
In sinusoidal modeling (SM), a speech signal, which is pseudo-periodic in structure, can be approximated by sinusoids and noise without losing significant speech information. A speech processing strategy based on this sinusoidal speech model is relevant for encoding electric pulse streams in cochlear implant (CI) processing, where the number of available channels is limited. In this study, 5 normal hearing (NH) listeners and 2 CI users performed speech recognition and perceived sound quality rating tasks on sentences processed in 12 different test conditions. For the test conditions, the sinusoidal analysis/synthesis algorithm was limited to 1, 3, or 6 sinusoids extracted from sentences low-pass filtered at 1 kHz, 1.5 kHz, 3 kHz, or 6 kHz, which were then re-synthesized. Each of 12 lists of AzBio sentences was randomly chosen and processed with one of the 12 test conditions before being presented to each participant at 65 dB SPL (Sound Pressure Level). Participants were instructed to repeat each sentence as they perceived it, and the number of words correctly recognized was scored. They were also asked to rate the perceived sound quality of the sentences, including the original speech sentence, on a scale of 1 (distorted) to 10 (clean). Both the speech recognition score and the perceived sound quality rating across all participants increased as the number of sinusoids increased and the low-pass filter bandwidth broadened. Our current findings show that three sinusoids may be sufficient to elicit nearly the maximum speech intelligibility and quality necessary for both NH and CI listeners. The sinusoidal speech model has the potential to serve as the basis for a speech processing strategy in CI.
DOI: https://doi.org/10.23919/APSIPA.2018.8659620 (published 2018-11-01)
Citations: 0
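The test conditions in the cochlear implant abstract above limit the number of sinusoids and the bandwidth. A single-frame sketch of that idea follows. It assumes the partials (amplitude, frequency, phase) have already been estimated, and it approximates the low-pass filter by discarding partials above the cutoff, which is a simplification of the study's procedure. The names are hypothetical.

```python
import math


def select_partials(partials, k, cutoff_hz):
    """Keep the k strongest partials at or below the cutoff frequency.

    Each partial is a tuple (amplitude, frequency_hz, phase_rad).
    """
    in_band = [p for p in partials if p[1] <= cutoff_hz]
    return sorted(in_band, key=lambda p: p[0], reverse=True)[:k]


def synthesize(partials, sample_rate_hz, n_samples):
    """Re-synthesize one frame as a sum of cosines."""
    return [
        sum(
            a * math.cos(2.0 * math.pi * f * n / sample_rate_hz + phase)
            for a, f, phase in partials
        )
        for n in range(n_samples)
    ]


frame = [(0.2, 300.0, 0.0), (1.0, 100.0, 0.0), (0.5, 2000.0, 0.0)]
kept = select_partials(frame, 1, 1000.0)  # 1 sinusoid, 1 kHz condition
samples = synthesize(kept, 8000.0, 4)
```

A full analysis/synthesis system would repeat this per frame and interpolate parameters between frames; that part is omitted here.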
Weakly Labeled Learning Using BLSTM-CTC for Sound Event Detection
Taiki Matsuyoshi, Tatsuya Komatsu, Reishi Kondo, Takeshi Yamada, S. Makino
In this paper, we propose a method of weakly labeled learning of bidirectional long short-term memory (BLSTM) using connectionist temporal classification (BLSTM-CTC) to reduce the hand-labeling cost of learning samples. BLSTM-CTC enables us to update the parameters of the BLSTM by loss calculation using CTC, instead of the exact error calculation, which cannot be conducted with weakly labeled samples because they carry only the event class of each individual sound event. In the proposed method, we first conduct strongly labeled learning of the BLSTM as initial learning, using a small number of strongly labeled samples, which carry the timestamps of the beginning and end of each individual sound event together with its event class. We then conduct weakly labeled learning based on BLSTM-CTC as additional learning, using a large number of weakly labeled samples. To evaluate the performance of the proposed method, we conducted a sound event detection experiment using the dataset provided by Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 2. As a result, the proposed method improved the segment-based F1 score by 1.9% compared with the initial learning mentioned above. Furthermore, it succeeded in reducing the labeling cost by 95%, although the F1 score was degraded by 1.3%, compared with additional learning using a large number of strongly labeled samples. This result confirms that our weakly labeled learning is effective for learning a BLSTM with a low hand-labeling cost.
DOI: https://doi.org/10.23919/APSIPA.2018.8659528 (published 2018-11-01)
Citations: 4
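The abstract above relies on CTC to train from labels that have no timestamps. The piece of CTC that makes this possible is the collapse mapping from a frame-level label path to an event sequence, sketched below. The integer class encoding and the blank index are assumptions, and the loss itself (a sum over all paths that collapse to the target) is not shown.

```python
def ctc_collapse(frame_labels, blank=0):
    """Map a frame-level path to a label sequence.

    Standard CTC rule: merge consecutive repeats, then drop blanks.
    """
    collapsed = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            collapsed.append(label)
        previous = label
    return collapsed


# Two different alignments of the same weak label sequence [1, 2].
path_a = [0, 1, 1, 0, 2, 2, 0]
path_b = [1, 1, 1, 2, 0, 0, 0]
```

Both `path_a` and `path_b` collapse to `[1, 2]`, so a weak label that only says "event 1, then event 2" is consistent with either timing; CTC training rewards all such paths rather than requiring one exact alignment.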
A Study on Indoor Dimming Method Utilizing Outside Light for Power Saving
Kengo Sasaki, E. Okamoto
In next-generation power networks, more energy-saving and energy-efficient networks are required. One solution is a location-aware energy distribution scheme, in which each person's location is accurately estimated by a centimeter-order indoor localization scheme and energy is preferentially allocated to the electric equipment near that person. One application is an energy-saving indoor lighting control scheme that exploits the person's location information and the estimated illumination intensity, which yields large energy savings. In previous studies, we proposed an indoor dimming scheme that takes external light into account. However, that scheme required intensity measurements in advance at many reference points. Therefore, in this paper, we propose an energy-saving indoor lighting control method that uses estimated external light to reduce the number of measurement points. Numerical results show the superior performance of the proposed method.
在下一代电网中,需要更多的节能和高能效的网络。其中一种解决方案是位置感知的能量分配方案,通过厘米级室内定位方案精确估计人员的位置,并优先将能量分配给人员附近的电气设备。作为其应用之一,提出了一种利用人的位置信息和估计照度的室内照明节能控制方案,取得了较大的节能效果。在之前的研究中,我们提出了一种考虑外部光线的室内调光方案。然而,在先前的研究中,需要在许多参考点进行先进的强度测量。因此,在本文中,我们提出了一种节能的室内照明控制方法,使用估计的外部光来减少测量点。数值结果表明了该方法的优越性。
{"title":"A Study on Indoor Dimming Method Utilizing Outside Light for Power Saving","authors":"Kengo Sasaki, E. Okamoto","doi":"10.23919/APSIPA.2018.8659602","DOIUrl":"https://doi.org/10.23919/APSIPA.2018.8659602","url":null,"abstract":"In the next generation power networks, more energy saving and energy-efficient network are required. One of the solutions is a location-aware energy distribution scheme, where persons' location is accurately estimated by a centimeter-order indoor localization scheme and the energy is preferentially allocated to the electric equipment near the persons. As one of its applications, there is an energy-saving indoor lighting control scheme exploiting person's location information and the estimated illumination intensity, and large energy saving effects are obtained. We have proposed an indoor diming scheme that considers an external light in previous studies. However, in the previous study, advanced intensity measurements at many reference points were required. Therefore, in this paper, we propose an energy-saving indoor lighting control method that uses an estimated external light to reduce the measurement points. Numerical results show the advanced performance of the proposed method.","PeriodicalId":287799,"journal":{"name":"2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"323 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124295125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Diversification Strategy for IIR Filter Design Using PSO
Y. Takase, K. Suyama
Designing an IIR (Infinite Impulse Response) filter is a non-linear optimization problem. PSO (Particle Swarm Optimization) is known to be effective for such problems because it can enumerate candidate solutions quickly. However, PSO is prone to premature convergence because of its strong directivity. In this paper, PSS (Problem Space Stretch)-PSO is verified to avoid stagnation at local minima. Several design examples demonstrate the effectiveness of the method.
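To make the optimization problem concrete, here is a minimal sketch of a plain global-best PSO fitting a second-order IIR section to an ideal low-pass magnitude response. It is not the PSS-PSO variant named in the abstract, and every specific in it is an assumption chosen for illustration: the biquad structure, the 0.4π cutoff, the 64-point frequency grid, the penalty value for unstable filters, and the PSO parameters `w`, `c1`, `c2`.

```python
import numpy as np


def freq_response(coeffs, omega):
    """H(e^{jw}) of (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2)."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = np.exp(-1j * omega)
    return (b0 + b1 * z1 + b2 * z1**2) / (1.0 + a1 * z1 + a2 * z1**2)


def cost(coeffs, omega, desired):
    """Mean squared magnitude error, with a large penalty for unstable filters."""
    a1, a2 = coeffs[3], coeffs[4]
    # Stability triangle of a second-order denominator: both poles lie
    # inside the unit circle iff |a2| < 1 and |a1| < 1 + a2.
    if not (abs(a2) < 1.0 and abs(a1) < 1.0 + a2):
        return 1e6
    err = np.abs(freq_response(coeffs, omega)) - desired
    return float(np.mean(err**2))


def pso(cost_fn, dim, n_particles=30, n_iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO with a constant inertia weight."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([cost_fn(p) for p in pos])
    g = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
    history = [gbest_cost]
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Each particle is pulled toward its own best and the swarm's best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        costs = np.array([cost_fn(p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
        history.append(gbest_cost)
    return gbest, gbest_cost, history


omega = np.linspace(0.0, np.pi, 64)
desired = (omega <= 0.4 * np.pi).astype(float)  # ideal low-pass target
best, best_cost, history = pso(lambda c: cost(c, omega, desired), dim=5)
```

The pull toward a single global best is the "strong directivity" the abstract refers to: once every particle has collapsed onto `gbest`, the velocities shrink and `history` stops improving, whether or not the swarm has found the global minimum. Diversification strategies aim to break that stagnation.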
DOI: https://doi.org/10.23919/APSIPA.2018.8659771 | Published: 2018-11-01
Citations: 2