2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

Investigating Co-Prime Microphone Arrays for Speech Direction of Arrival Estimation
Jiahong Zhao, C. Ritz
This paper investigates applying the steered response power phase transform (SRP-PHAT) method to coprime microphone array (CPMA) recordings to estimate the direction of arrival (DOA) of speech sources. Existing CPMA approaches for acoustic applications are limited, especially under reverberant conditions. The proposed algorithm uses SRP-PHAT to estimate the DOA of speech sources and then employs a histogram-based stochastic algorithm using steered response power (SRP) adjustment and kernel density estimation (KDE) to improve the DOA estimation accuracy. Experiments are conducted for up to three simultaneous far-field speech sources in both anechoic and reverberant scenarios. Results suggest that the proposed approach achieves more accurate DOA estimates than a uniform linear array (ULA) with the same number of microphones under both anechoic and low-reverberation conditions, and that it needs significantly fewer microphones than an equivalent ULA while maintaining similar performance. Moreover, the operating frequency of the microphone array is substantially increased without changing the number of microphones, making it possible to accurately record higher-frequency components of the source signals.
DOI: https://doi.org/10.23919/APSIPA.2018.8659626 (published 2018-11-01)
Citations: 7
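To illustrate the claim in the entry above that a coprime array raises the operating frequency without adding microphones, here is a minimal Python sketch. It assumes one common coprime layout (two uniform subarrays with M and N sensors, spaced N*d and M*d, sharing the first sensor) and the textbook spatial-aliasing limit c / (2 * spacing). The function names, the 2 cm unit spacing, and the speed of sound are illustrative assumptions, not values taken from the paper.

```python
from math import gcd

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def coprime_indices(m, n):
    """Sensor positions, in multiples of the unit spacing d, for an assumed
    coprime layout: m sensors spaced n*d plus n sensors spaced m*d."""
    if gcd(m, n) != 1:
        raise ValueError("m and n must be coprime")
    subarray_a = {i * n for i in range(m)}
    subarray_b = {j * m for j in range(n)}
    return sorted(subarray_a | subarray_b)  # the sensor at 0 is shared


def alias_free_limit(spacing_m):
    """Highest frequency (Hz) without spatial aliasing for a given spacing."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)


if __name__ == "__main__":
    d = 0.02  # hypothetical unit spacing in metres
    idx = coprime_indices(4, 5)
    aperture = idx[-1] * d
    ula_spacing = aperture / (len(idx) - 1)  # same mic count, same aperture
    print("coprime positions (units of d):", idx)
    print("limit set by unit spacing d (Hz):", alias_free_limit(d))
    print("limit of the equal-aperture ULA (Hz):", alias_free_limit(ula_spacing))
```

With M = 4 and N = 5 the sketch places 8 microphones over an aperture of 16*d. A ULA with the same 8 microphones over that aperture needs a spacing of 16*d/7, so its aliasing limit is lower by that factor, whereas coprime array theory ties the limit of the combined subarrays to the unit spacing d.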
Semi-Supervised NMF in the chroma Domain Applied to Music Harmony Estimation
Takuya Takahashi, T. Hori, Christoph M. Wilk, S. Sagayama
In this paper, we discuss non-negative matrix factorization (NMF) applied to chroma feature sequences to reduce chroma-specific noise in chord estimation from music signals using a hidden Markov model (HMM). Even for single-pitch sounds, the raw 12-dimensional chroma vectors, obtained from the music signal by summing the spectrum across octaves and normalizing, often contain irrelevant components such as non-octave overtones that fall into different pitch classes and cause inaccurate harmony estimates. NMF applied in the chroma domain is expected to suppress such overtone-induced components in the NMF activation matrix and thus “purify” the noisy chroma vectors. Because the data are reduced to 12 dimensions, as opposed to NMF applied to the raw spectrum, we expect advantages in statistical robustness and computational cost for pitch class estimation of single and multiple tones. We use the “purified” chroma vectors in combination with an HMM-based harmony progression model in which the NMF activation distributions are modeled as observations associated with hidden harmonies, whose transition probabilities have been obtained statistically. We attempt to improve harmony estimation accuracy by combining the suppression of irrelevant components with the HMM-based harmony model. In the experimental evaluation, we demonstrate the reduction of irrelevant components in raw chroma vectors computed from recordings of musical instruments. In addition, using music audio data with harmony annotations from the RWC database, we compare the harmony estimation accuracy of our method with that of conventional chroma.
DOI: https://doi.org/10.23919/APSIPA.2018.8659645 (published 2018-11-01)
Citations: 0
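The idea in the entry above of suppressing overtone leakage with NMF in the chroma domain can be sketched as follows. This is a toy example, not the authors' implementation: it fixes a hypothetical basis of 12 templates (each a pitch class plus an assumed leak of 0.3 into the pitch class seven semitones above, standing in for the third harmonic) and estimates only the activations, using the standard multiplicative update for the Euclidean cost. The paper's semi-supervised scheme, templates, and cost function may differ.

```python
def make_template(pitch_class, overtone_weight=0.3):
    """Toy 12-bin chroma template: the fundamental's pitch class plus an
    assumed leak into the pitch class seven semitones above it."""
    template = [0.0] * 12
    template[pitch_class] = 1.0
    template[(pitch_class + 7) % 12] = overtone_weight
    return template


def activations(chroma, templates, iterations=200, eps=1e-12):
    """Non-negative activations h with chroma ~ sum_j h[j] * templates[j],
    found by multiplicative updates while the templates stay fixed."""
    k = len(templates)
    h = [1.0] * k
    for _ in range(iterations):
        # Current reconstruction of the 12 chroma bins.
        approx = [sum(h[j] * templates[j][b] for j in range(k)) for b in range(12)]
        for j in range(k):
            num = sum(templates[j][b] * chroma[b] for b in range(12))
            den = sum(templates[j][b] * approx[b] for b in range(12)) + eps
            h[j] *= num / den
    return h


if __name__ == "__main__":
    basis = [make_template(p) for p in range(12)]
    raw = make_template(0)  # a single C whose overtone leaks into the G bin
    h = activations(raw, basis)
    print("raw G bin:", raw[7], "estimated G activation:", round(h[7], 4))
```

For a chroma vector built from a single C with leakage into G, the raw G bin is 0.3, while the estimated activation for G shrinks toward zero and the activation for C approaches one.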
Privacy-Preserving SVM Computing in the Encrypted Domain
Takahiro Maekawa, Ayana Kawamura, Yuma Kinoshita, H. Kiya
This paper proposes a privacy-preserving Support Vector Machine (SVM) computing scheme. Cloud computing has spread to many fields. However, it raises serious issues for end users, such as unauthorized use and leakage of data, and privacy compromise. We focus on templates protected by a block scrambling-based encryption scheme, where templates are features extracted from data, and consider properties of the protected templates that matter for secure SVM computing. The proposed scheme not only protects templates but also achieves the same performance as unprotected templates under some useful kernel functions. Moreover, it can be carried out directly with well-known SVM algorithms, without preparing any algorithms specialized for secure SVM computing. In an experiment, the proposed scheme is applied to a face-based authentication algorithm with SVM classifiers to confirm its effectiveness.
DOI: https://doi.org/10.23919/APSIPA.2018.8659529 (published 2018-11-01)
Citations: 14
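One way to see why protected templates can give the same SVM performance under some kernels is that a fixed reordering of feature elements leaves Euclidean distances unchanged. The sketch below models block scrambling as a key-driven permutation of the feature vector and checks that an RBF kernel value is preserved. This is a simplified assumption for illustration: the paper's encryption scheme may include further steps, and `scramble` and `rbf_kernel` are hypothetical helper names.

```python
import math
import random


def scramble(vector, key):
    """Permute the elements of a template with a key-seeded shuffle
    (a simplified stand-in for block scrambling)."""
    order = list(range(len(vector)))
    random.Random(key).shuffle(order)
    return [vector[i] for i in order]


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    x = [0.1, 0.7, 0.3, 0.9, 0.5, 0.2]
    y = [0.4, 0.6, 0.1, 0.8, 0.0, 0.3]
    key = 1234  # hypothetical secret key shared by all templates
    plain = rbf_kernel(x, y)
    protected = rbf_kernel(scramble(x, key), scramble(y, key))
    print(abs(plain - protected) < 1e-12)
```

Because every template is scrambled with the same key, the kernel matrix seen by an off-the-shelf SVM solver is the same (up to rounding) as for the unprotected templates, which matches the abstract's point that no specialized SVM algorithm is needed.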
Visual Saliency Detection Algorithm in Compressed HEVC Domain
Rui Bai, Wei Zhou, Guanwen Zhang, Henglu Wei
Saliency detection has been widely used to predict human fixation. This paper proposes a visual saliency detection algorithm in the compressed HEVC domain that consists of three parts: static saliency detection, dynamic saliency detection, and competitive fusion. First, a Gaussian model is used to filter out the background of the static features, which are extracted by down-sampling and DCT. Second, motion vectors are used to represent the dynamic feature, and the dynamic saliency is calculated by filtering out the background of the dynamic feature. Finally, a competitive fusion model adaptively combines the characteristics of the static and dynamic saliency maps. Experimental results show that the proposed method outperforms classic state-of-the-art saliency detection methods, increasing AUC by 0.05 and decreasing KL divergence by 0.17 on average. The average detection time for one frame is 2.3 seconds.
DOI: https://doi.org/10.23919/APSIPA.2018.8659565 (published 2018-11-01)
Citations: 4
Measuring Infant's Length with an Image
Maolong Tang, Ming-Ting Sun, Leonardo Seda, J. Swanson, Zhengyou Zhang
It is important to measure an infant's length regularly to estimate the growth velocity to make sure that the infant is growing normally. Traditionally, measuring an infant's length is performed with an infantometer. However, the infant struggles and cries in the measuring process, and it often needs three persons to position the infant's head, legs, and the boards of the infantometer during the process. Thus, it is not practical for a parent to perform this measurement at home regularly. In this paper, we propose a new approach which allows the measurement of an infant's length using a cellphone picture without the need to position the infant. Our algorithm automatically calculates the 3D positions of the body parts and the total length of the infant with the help of round stickers. The round stickers can be put on the infant's body easily in a few seconds, before the picture is taken. This new technology would make frequent measurements of the infant's length and the tracking of the growth velocity possible.
DOI: https://doi.org/10.23919/APSIPA.2018.8659482 (published 2018-11-01)
Citations: 1
Microphone Position Realignment by Extrapolation of Virtual Microphone
R. Jinzai, K. Yamaoka, Mitsuo Matsumoto, Takeshi Yamada, S. Makino
This paper proposes microphone realignment by phase extrapolation using the virtual microphone technique, in order to reproduce binaural signals with adequate interaural time differences (ITDs) for a listener. For a sound source in the horizontal plane, ITDs are a major cue for localizing a sound image. Since conventional amplitude panning in multichannel recording does not consider ITDs for headphone listening, sound images are localized inside the head (lateralization). A microphone array can record signals with time differences corresponding to the directions of sound sources. However, since the microphones in such an array are closely positioned, the time differences are too small to serve as ITDs for localizing sound images of the sources. In this paper, phase extrapolation using the virtual microphone technique is applied to virtually realign a microphone in such an array and thereby restore the ITD. The experiments use two speech signals as sound sources located at the leftmost and rightmost positions as seen from two real microphones positioned 2.83 cm apart. The phase of the signal of the virtually realigned microphone is extrapolated to eight times the phase difference between the two real microphones. The time differences between the signal of one of the real microphones and that of the realigned one are observed to be $-500\,\mu\mathrm{s}$ for the source on the left and $500\,\mu\mathrm{s}$ for the source on the right. Furthermore, the interaural cross-correlations of the two signals suggest that sound images will be perceived on both the left and right of a listener. The method is expected to require no prior information on the number of sources or their directions of arrival, and to allow easy adjustment for individual differences.
DOI: https://doi.org/10.23919/APSIPA.2018.8659728 (published 2018-11-01)
Citations: 7
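The phase extrapolation described in the entry above can be sketched for a single time-frequency bin. The example assumes a single plane-wave component per bin and keeps the magnitude of the first microphone unchanged; both are simplifying assumptions, and the paper's virtual microphone technique may treat amplitude and multiple sources differently. The helper names and the 1 kHz / 50 microsecond test values are hypothetical.

```python
import cmath
import math


def extrapolate_bin(x1, x2, factor):
    """Virtual-microphone value for one frequency bin: keep the magnitude of
    x1 (an assumption) and scale the inter-microphone phase difference.
    x1 must be nonzero."""
    phase_diff = cmath.phase(x2 / x1)  # wrapped to [-pi, pi]
    return abs(x1) * cmath.exp(1j * (cmath.phase(x1) + factor * phase_diff))


def implied_delay(x_ref, x_other, freq_hz):
    """Delay (s) of x_other relative to x_ref implied by their phase difference."""
    return -cmath.phase(x_other / x_ref) / (2.0 * math.pi * freq_hz)


if __name__ == "__main__":
    freq = 1000.0   # Hz, hypothetical bin centre
    delay = 50e-6   # s, hypothetical delay between the two real microphones
    x1 = 1 + 0j
    x2 = cmath.exp(-1j * 2.0 * math.pi * freq * delay)
    virtual = extrapolate_bin(x1, x2, 8)
    print(implied_delay(x1, virtual, freq))
```

Scaling the phase difference by a factor of eight scales the implied time difference by the same factor, as long as the extrapolated phase stays within one period at that frequency. Beyond that, phase wrapping makes the implied delay ambiguous.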
Feature-Based Learning Hidden Unit Contributions for Domain Adaptation of RNN-LMs
Michael Hentschel, Marc Delcroix, A. Ogawa, T. Nakatani
In recent years, many approaches have been proposed for domain adaptation of neural network language models. These methods can be separated into two categories. The first is model-based adaptation, which creates a domain-specific language model by re-training the weights in the network on in-domain data. This requires domain annotation in the training and test data. The second is feature-based adaptation, which uses topic features to perform mainly bias adaptation of network input or output layers in an unsupervised manner. Recently, a scheme called learning hidden unit contributions (LHUC) was proposed for acoustic model adaptation. We propose applying this scheme to feature-based domain adaptation of recurrent neural network language models (RNN-LMs). In addition, we investigate the combination of this approach with bias-based domain adaptation. For the experiments, we use a corpus based on TED talks and the CSJ lecture corpus to report perplexity and speech recognition results. Our proposed method consistently outperforms a non-adapted baseline, and the combined approach can improve on pure bias adaptation.
DOI: https://doi.org/10.23919/APSIPA.2018.8659468 (published 2018-11-01)
Citations: 3
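As a rough illustration of the entry above, learning hidden unit contributions (LHUC) rescales each hidden unit by a factor in (0, 2), commonly written as 2 * sigmoid(a). The sketch below assumes that, for feature-based adaptation, the parameter a of each unit is a linear function of a topic feature vector. That mapping, the function names, and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
import math


def lhuc_scalers(params):
    """Map unconstrained parameters to scalers in (0, 2) via 2 * sigmoid."""
    return [2.0 / (1.0 + math.exp(-a)) for a in params]


def feature_to_params(topic_features, weights):
    """Hypothetical linear map from a topic feature vector to one LHUC
    parameter per hidden unit (weights holds one row per hidden unit)."""
    return [sum(w * f for w, f in zip(row, topic_features)) for row in weights]


def adapt_hidden(hidden, topic_features, weights):
    """Scale each hidden activation by its feature-dependent LHUC scaler."""
    scalers = lhuc_scalers(feature_to_params(topic_features, weights))
    return [h * s for h, s in zip(hidden, scalers)]


if __name__ == "__main__":
    hidden = [0.2, -0.5, 0.9]
    topic = [0.7, 0.3]                                # hypothetical topic posterior
    weights = [[1.0, -1.0], [0.0, 0.0], [-2.0, 0.5]]  # hypothetical learned map
    print(adapt_hidden(hidden, topic, weights))
```

With all-zero weights every scaler equals 1, so the adapted network reduces to the unadapted one. Because the scalers depend only on features of the input text, no domain labels are needed at test time.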
DSP Implementation of Adaptive Notch Filters With Overflow Avoidance in Fixed-Point Arithmetic
Satoru Ishibashi, S. Koshita, M. Abe, M. Kawamata
In this paper, we implement adaptive notch filters with constrained poles and zeros (CPZ-ANFs) on a fixed-point DSP. Since CPZ-ANFs are IIR filters with a narrow notch width, a signal can be amplified significantly in their feedback loops. Therefore, the direct-form II structure has a high probability of overflow in its internal state. When an overflow occurs in the internal state of a filter, the inaccurate values caused by the overflow are used repeatedly to calculate the filter's output signal. As a result, the filter does not operate correctly, so such overflow must be prevented. To avoid it, we use the direct-form I structure to implement the CPZ-ANFs. Experimental results show that our method allows the CPZ-ANFs to operate properly on the fixed-point DSP.
DOI: https://doi.org/10.23919/APSIPA.2018.8659673 (published 2018-11-01)
Citations: 0
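The overflow argument in the entry above can be checked numerically. The sketch below assumes the usual constrained pole-zero notch H(z) = (1 + a z^-1 + z^-2) / (1 + rho a z^-1 + rho^2 z^-2), runs it in floating point (no actual fixed-point arithmetic and no adaptation of a), and records the largest magnitude each structure has to store. The parameter values and function names are illustrative assumptions.

```python
import math


def peak_df2(x, a, rho):
    """Largest internal-state magnitude of the direct-form II structure."""
    w1 = w2 = 0.0
    peak = 0.0
    for s in x:
        w0 = s - rho * a * w1 - rho * rho * w2  # recursive (all-pole) part first
        peak = max(peak, abs(w0))
        w2, w1 = w1, w0
    return peak


def peak_df1(x, a, rho):
    """Largest stored magnitude (input or output sample) in direct form I."""
    x1 = x2 = y1 = y2 = 0.0
    peak = 0.0
    for s in x:
        y = s + a * x1 + x2 - rho * a * y1 - rho * rho * y2
        peak = max(peak, abs(s), abs(y))
        x2, x1 = x1, s
        y2, y1 = y1, y
    return peak


def notch_test_signal(w0, amplitude, n):
    """Sinusoid at the notch frequency w0 (radians per sample)."""
    return [amplitude * math.sin(w0 * k) for k in range(n)]


if __name__ == "__main__":
    w0, rho = math.pi / 2, 0.99          # hypothetical notch frequency and pole radius
    x = notch_test_signal(w0, 0.5, 2000)
    a = -2.0 * math.cos(w0)
    print("direct form II peak state:", peak_df2(x, a, rho))
    print("direct form I peak stored value:", peak_df1(x, a, rho))
```

For a 0.5-amplitude sinusoid at the notch frequency with rho = 0.99, the direct-form II state grows to roughly 0.5 / (1 - rho^2), about 25, far outside a [-1, 1) fixed-point range. The direct-form I structure only stores input and output samples, which stay below 1 in this run.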
How do people construct mutual beliefs in task-oriented dialogues?
Yoshiko Kawabata, Toshihiko Matsuka
The present study investigates how mutual beliefs are achieved by examining the relationship between actual behaviors and utterances in task-oriented dialogues. According to a widely accepted model, mutual belief about a task is considered to be achieved when a listener accepts utterances about the task given by another agent and gives the agent some sign of task completion. However, by analyzing the Japanese Map Task Dialogue Corpus (JMTDC), we found that the vast majority of conversations (94%) did not follow what the model suggests. We categorized those non-standard dialogues into six categories, namely, delayed acceptance, premature sign of completion, execution postponement, silent adjustment, unconfirmed, and indirection. We further analyzed those six categories carefully to see how and when participants were able to achieve mutual belief in the dialogues.
DOI: https://doi.org/10.23919/APSIPA.2018.8659453 (published 2018-11-01)
Citations: 1
A Rate Control Algorithm for HEVC Considering Visual Saliency
Henglu Wei, Wei Zhou, Rui Bai, Zhemin Duan
In this paper, visual saliency is used to guide the coding tree unit (CTU) level bit allocation process in High Efficiency Video Coding (HEVC) to improve visual quality. First, a saliency detection algorithm is proposed. Using the detected saliency map, the distortion of each CTU is weighted by its corresponding saliency, so that distortion in salient areas counts for more. Then, an optimal bit allocation problem is formulated, constrained by the picture-level target bits and by minimum quality fluctuation. A numerical method is used to solve the bit allocation problem. Experimental results show a quality gain in salient areas of up to 0.8658 dB and a saliency-weighted PSNR gain of up to 1.0318 dB.
DOI: https://doi.org/10.23919/APSIPA.2018.8659729 · Authors: Henglu Wei, Wei Zhou, Rui Bai, Zhemin Duan · Published: 2018-11-01 · Venue: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Citations: 6
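The saliency weighting described in the HEVC rate-control abstract above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give their saliency detector, their exact definition of saliency-weighted PSNR, or their constrained optimisation. The functions `weighted_psnr` and `allocate_bits`, the `floor_share` parameter, and the simple proportional split are all assumptions made for illustration only.

```python
import math


def weighted_psnr(ctu_mse, ctu_saliency, peak=255.0):
    """Saliency-weighted PSNR over CTUs (assumed definition, not from the paper).

    Each CTU's mean squared error is weighted by its saliency, so errors in
    salient CTUs pull the score down more than errors elsewhere.
    """
    total = sum(ctu_saliency)
    if total <= 0:
        raise ValueError("saliency weights must sum to a positive value")
    wmse = sum(s * m for s, m in zip(ctu_saliency, ctu_mse)) / total
    if wmse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / wmse)


def allocate_bits(picture_bits, ctu_saliency, floor_share=0.5):
    """Split a picture-level bit budget across CTUs (hypothetical heuristic).

    A fraction floor_share of the budget is divided evenly, which limits
    quality fluctuation between CTUs; the remainder is divided in
    proportion to saliency. The paper instead solves a constrained
    optimisation numerically.
    """
    n = len(ctu_saliency)
    total = sum(ctu_saliency)
    if total <= 0:
        return [picture_bits / n] * n
    even = picture_bits * floor_share / n
    rest = picture_bits * (1.0 - floor_share)
    return [even + rest * s / total for s in ctu_saliency]


if __name__ == "__main__":
    saliency = [1.0, 3.0]  # second CTU is three times as salient
    print(allocate_bits(1000, saliency))
    print(weighted_psnr([9.0, 1.0], saliency))
```

With uniform distortion across CTUs the weighted score reduces to ordinary PSNR; the two differ only when distortion is distributed unevenly relative to saliency.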