
Latest publications from the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Efficient edge-oriented based image interpolation algorithm for non-integer scaling factor
Chia-Chun Hsu, Jian-Jiun Ding, Yih-Cherng Lee
Though image interpolation has been developed for many years, most state-of-the-art methods, including machine learning based ones, can only zoom an image by a scaling factor of 2, 3, 2^k, or another integer value. Hence, bicubic interpolation is still a popular choice for the non-integer scaling problem. In this paper, we propose a novel interpolation algorithm for image zooming with non-integer scaling factors based on the gradient direction. The proposed method first estimates the gradient direction for each pixel in the low-resolution image. Then, the gradient map for the high-resolution image is constructed by spline interpolation. Finally, the intensity of each missing pixel is computed as a weighted sum of the pixels in a pre-defined window. To preserve edge information during interpolation, each weight is determined by the inner product of the estimated gradient vector and the vector from the missing pixel to the known data point. Simulations show that the proposed method outperforms other non-integer scaling methods and is helpful for super-resolution.
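The gradient-based weighting described in this abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: it substitutes a simple `np.gradient` estimate for the paper's spline-interpolated gradient map, and the exact weight function (here an assumed inverse of the gradient/displacement inner product plus distance) is a guess that merely reproduces the stated edge-preserving behavior.

```python
import numpy as np

def edge_weighted_resize(img, scale, window=2, eps=1e-3):
    """Sketch of gradient-weighted interpolation for a non-integer scale.

    Weights favor known pixels whose displacement from the missing pixel
    is nearly perpendicular to the local gradient, i.e. lies along an edge.
    """
    h, w = img.shape
    gy, gx = np.gradient(img.astype(float))          # per-pixel gradient estimate
    H, W = int(round(h * scale)), int(round(w * scale))
    out = np.zeros((H, W))
    for Y in range(H):
        for X in range(W):
            y, x = Y / scale, X / scale              # position on the low-res grid
            y0, x0 = int(y), int(x)
            num = den = 0.0
            for j in range(max(0, y0 - window + 1), min(h, y0 + window + 1)):
                for i in range(max(0, x0 - window + 1), min(w, x0 + window + 1)):
                    d = np.array([j - y, i - x])
                    g = np.array([gy[j, i], gx[j, i]])
                    # small |<g, d>| -> displacement along the edge -> large weight
                    wgt = 1.0 / (eps + abs(g @ d) + d @ d)
                    num += wgt * img[j, i]
                    den += wgt
            out[Y, X] = num / den
    return out
```

For a scale of 1.5, an 8x8 input yields a 12x12 output; flat regions are left unchanged while edges dominate the weighting.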
Citations: 0
Multiscale directional transforms based on cosine-sine modulated filter banks for sparse directional image representation
Yusuke Nomura, Ryutaro Ogawa, Seisuke Kyochi, Taizo Suzuki
This paper proposes multiscale directional transforms (MDTs) based on cosine-sine modulated filter banks (CSMFBs). Sparse image representation by directional transforms is necessary for image analysis and processing tasks and has been studied extensively. Conventionally, CSMFBs have been proposed as a class of separable directional transforms (SepDTs). Their computational cost is much lower than that of non-SepDTs, and they can outperform other SepDTs, e.g., dual-tree complex wavelet transforms (DTCWTs), in image processing applications. One drawback of CSMFBs is their lack of multiscale directional selectivity: they cannot provide directional atoms at multiple scales as the DTCWT frame does, and thus flexible image representation cannot be achieved. In this work, we present a design method for multiscale CSMFBs by extending modulated lapped transforms, a subclass of CSMFBs. We confirm its effectiveness in nonlinear approximation and, as a practical application, image denoising.
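The basic cosine-modulation step underlying such filter banks can be illustrated as below: a single lowpass prototype p[n] is modulated into K subband filters. The phase convention shown is the common MLT-style one and may differ from the paper's CSMFB design; this is only meant to show the mechanics of modulated filter banks, not the proposed MDT.

```python
import numpy as np

def cosine_modulated_bank(prototype, K):
    """Modulate a lowpass prototype p[n] into K cosine subband filters.

    h_k[n] = p[n] * sqrt(2/K) * cos((pi/K)(k + 1/2)(n + 1/2 + K/2)),
    an MLT-style form; exact phase terms vary across CSMFB designs.
    """
    N = len(prototype)
    n = np.arange(N)
    bank = np.empty((K, N))
    for k in range(K):
        bank[k] = prototype * np.sqrt(2.0 / K) * np.cos(
            (np.pi / K) * (k + 0.5) * (n + 0.5 + K / 2.0))
    return bank
```

With a length-2K sine window as prototype, the K rows are the analysis filters of a modulated lapped transform; the design cost is that of one prototype, which is why CSMFBs are cheap compared with non-separable directional transforms.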
Citations: 0
Classifying road surface conditions using vibration signals
Lounell B. Gueta, Akiko Sato
This paper aims to classify road surface types and conditions by characterizing the temporal and spectral features of vibration signals gathered from land roads. Past work on road surfaces has focused on detecting anomalies such as bumps and potholes. This study extends the analysis to anomalies such as patches and road gaps, which share temporal features such as magnitude peaks and variance with bumps and potholes. Therefore, a classification method based on a support vector classifier is proposed that takes into account both the temporal and spectral features of the road vibrations as well as factors such as vehicle speed. It is tested on real data gathered through a smartphone-based data collection campaign between Thailand and Cambodia and is shown to be effective in differentiating road segments with and without anomalies. The method is applicable to planning appropriate road maintenance work.
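A minimal sketch of the kind of temporal and spectral features the abstract mentions is shown below. The exact feature set, windowing, and the SVM stage are not specified in this listing, so the three features here (peak magnitude, variance, spectral centroid) are assumptions chosen to match the wording.

```python
import numpy as np

def vibration_features(segment, fs):
    """Temporal/spectral features of one vibration segment (sketch).

    Returns [peak magnitude, variance, spectral centroid]; in the paper's
    setting these would be fed, together with vehicle speed, to a
    support vector classifier.
    """
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()                                  # remove gravity/DC offset
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = (freqs * spec).sum() / (spec.sum() + 1e-12)
    return np.array([np.abs(x).max(), x.var(), centroid])
```

A bump or patch shows up as a larger peak and variance than a smooth segment, while surface texture shifts the spectral centroid, which is why both feature families are needed to separate anomaly types.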
Citations: 10
Binaural beamforming with spatial cues preservation for hearing aids in real-life complex acoustic environments
Hala As’ad, M. Bouchard, A. H. Kamkar-Parsi
This work introduces novel binaural beamforming algorithms for hearing aids with a good trade-off between noise reduction and the preservation of binaural cues for different types of sources (directional interfering talker sources, diffuse-like background noise). The proposed methods require no knowledge of the interfering talkers' directions or of the second-order statistics of the noise-only components. Different classification decisions are considered in the time-frequency domain based on the power, the power difference, and the complex coherence of the available signals. Simulations are performed using signals recorded from multichannel binaural hearing aids to validate the performance of the proposed algorithms under different acoustic scenarios and microphone configurations. The simulations in this paper assume good knowledge of the target direction and propagation model; for hearing aids, this assumption is typically more realistic than assuming knowledge of the direction and propagation model of the interfering talkers. Performance is compared with other algorithms that do not require information on the directions or statistics of the interfering talker sources and the background noise. The results indicate that the proposed algorithms can either provide nearly the same noise reduction as classical beamformers with improved preservation of noise binaural cues, or produce a good trade-off between noise reduction and noise binaural cue preservation.
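One of the cues mentioned in the abstract, the complex coherence between the left and right signals, can be computed per time-frequency bin as sketched below. This is a generic short-time coherence estimate, not the paper's specific classification rule; the power and power-difference cues would be derived analogously from |L|^2 and |R|^2.

```python
import numpy as np

def complex_coherence(L, R, eps=1e-12):
    """Short-time complex coherence between left/right STFT coefficients.

    L, R: complex arrays of shape (frames, bins). |coherence| near 1
    suggests a directional source; lower magnitudes suggest diffuse-like
    background noise, which is the basis for per-bin classification.
    """
    num = (L * np.conj(R)).mean(axis=0)
    den = np.sqrt((np.abs(L) ** 2).mean(axis=0) * (np.abs(R) ** 2).mean(axis=0))
    return num / (den + eps)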
Citations: 4
Online sound structure analysis based on generative model of acoustic feature sequences
Keisuke Imoto, Nobutaka Ono, M. Niitsuma, Y. Yamashita
We propose a method for online sound structure analysis based on a Bayesian generative model of acoustic feature sequences, in which a hierarchical generative process of sound clips, acoustic topics, acoustic words, and acoustic features is assumed. In this model, sound clips are assumed to be organized as combinations of latent acoustic topics, and each acoustic topic is represented by a Gaussian mixture model (GMM) over an acoustic feature space, where the GMM components correspond to acoustic words. Since the conventional batch algorithm for learning this model requires a huge amount of computation, it is difficult to apply to massive amounts of sound data; moreover, the batch algorithm cannot analyze sequentially obtained data. Our variational-Bayes-based online algorithm for this generative model can analyze the structure of sounds clip by clip. Experimental results show that the proposed online algorithm reduces the computation cost by about 90% while estimating the posterior distributions as efficiently as the conventional batch algorithm.
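The per-frame building block of such a model, the posterior responsibility of each acoustic word (diagonal-Gaussian component) for a feature vector, can be sketched as below. This is only the GMM layer; the acoustic-topic layer and the variational-Bayes online updates of the paper sit on top of these responsibilities and are not reproduced here.

```python
import numpy as np

def word_responsibilities(x, means, variances, weights):
    """Posterior over acoustic words (diagonal-GMM components) for one
    feature vector x, computed in the log domain for stability.

    means, variances: (K, D); weights: (K,); x: (D,).
    """
    log_lik = -0.5 * (((x - means) ** 2) / variances
                      + np.log(2 * np.pi * variances)).sum(axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()            # avoid underflow before exp
    post = np.exp(log_post)
    return post / post.sum()
```

An online learner would accumulate these responsibilities clip by clip as sufficient statistics instead of re-scanning the whole corpus, which is where the reported cost reduction comes from.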
Citations: 0
Personality trait perception from speech signals using multiresolution analysis and convolutional neural networks
Ming-Hsiang Su, Chung-Hsien Wu, Kun-Yi Huang, Qian-Bei Hong, H. Wang
This study presents an approach to personality trait (PT) perception from speech signals using wavelet-based multiresolution analysis and convolutional neural networks (CNNs). First, the wavelet transform is employed to decompose the speech signals into signals at different levels of resolution. Then, the acoustic features of the speech signals at each resolution are extracted. Given the acoustic features, a CNN is adopted to generate profiles of the Big Five Inventory-10 (BFI-10), which provide a quantitative measure of the degree of presence or absence of each of the 10 basic BFI items. The BFI-10 profiles are further fed into five artificial neural networks (ANNs), one for each of the five personality dimensions: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. To evaluate the proposed method, experiments were conducted on the SSPNet Speaker Personality Corpus (SPC), comprising 640 clips randomly extracted from French news bulletins in the INTERSPEECH 2012 speaker trait sub-challenge. An average PT perception accuracy of 71.97% was obtained, outperforming both the ANN-based method and the baseline method of the INTERSPEECH 2012 speaker trait sub-challenge.
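The multiresolution decomposition step can be illustrated with a multilevel Haar DWT, shown below. The paper does not name its wavelet family in this listing, so Haar is an assumed stand-in used only to show how a signal is split into per-level signals from which acoustic features are then extracted.

```python
import numpy as np

def haar_multires(x, levels):
    """Multilevel Haar DWT of a 1-D signal (length divisible by 2**levels).

    Returns the final lowpass approximation and a list of highpass
    detail bands, one per level (coarser bands last).
    """
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))  # highpass band at this level
        approx = (even + odd) / np.sqrt(2)         # lowpass band, half the length
    return approx, details
```

Because the Haar transform is orthonormal, the total energy of the approximation and detail bands equals that of the input, so per-band feature statistics partition the signal's information by scale.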
Citations: 10
Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling
Patrick Lumban Tobing, H. Kameoka, T. Toda
This paper presents a novel implementation of latent trajectory modeling in a deep acoustic-to-articulatory inversion mapping framework. In conventional methods, i.e., Gaussian mixture model (GMM)-based and deep neural network (DNN)-based inversion mappings, frame interdependency can be considered while generating articulatory parameter trajectories through an explicit constraint between static and dynamic features. However, this constraint is not considered in training these models, so the trained model is not optimal for the mapping procedure. In this paper, we address this problem by introducing latent trajectory modeling into the DNN-based inversion mapping. In the latent trajectory model, frame interdependency is well considered in both training and mapping through a soft constraint between static and dynamic features. Experimental results demonstrate that the proposed latent trajectory DNN (LTDNN)-based inversion mapping outperforms the conventional and state-of-the-art inversion mapping systems.
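The static/dynamic construction the constraint refers to can be sketched as below: first-order delta features are computed from the static articulatory sequence and appended to it. This is the standard half-difference delta with edge replication, not the paper's full LTDNN; it only illustrates what links consecutive frames in such models.

```python
import numpy as np

def append_deltas(static):
    """Append first-order dynamic (delta) features to a static sequence.

    static: (T, D) articulatory parameters. The delta at frame t is
    0.5 * (static[t+1] - static[t-1]), with the edges replicated.
    Returns a (T, 2D) array of static plus dynamic features.
    """
    padded = np.vstack([static[:1], static, static[-1:]])  # replicate edges
    deltas = 0.5 * (padded[2:] - padded[:-2])
    return np.hstack([static, deltas])
```

Generating a trajectory that is consistent under this linear static-to-dynamic relation is exactly the per-utterance constraint that trajectory models enforce at mapping time, and that the latent trajectory formulation also brings into training.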
Citations: 15
Understanding multiple-input multiple-output active noise control from a perspective of sampling and reconstruction
Chuang Shi, Huiyong Li, Dongyuan Shi, Bhan Lam, W. Gan
This paper formulates multiple-input multiple-output active noise control as a spatial sampling and reconstruction problem. In the proposed formulation, the inputs from the reference microphones and the outputs of the anti-noise sources are regarded as spatial samples. We show that the proposed formulation is general and can unify the existing control strategies. For instance, three control strategies are derived from the proposed formulation and linked to different cost functions in practical implementations. Finally, simulation results are presented to verify the effectiveness of our analysis.
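The sampling-and-reconstruction view can be illustrated with a narrowband least-squares toy problem, sketched below. All names and the two-step pseudoinverse solution are assumptions for illustration; a practical ANC controller would use adaptive filtering (e.g., FxLMS) with causality and secondary-path modeling, none of which is shown here.

```python
import numpy as np

def optimal_control_gains(X_ref, D_err, S):
    """Least-squares control gains for a single-frequency MIMO ANC sketch.

    X_ref: (T, n_ref) reference-microphone samples (spatial samples in)
    D_err: (T, n_err) disturbance observed at the error microphones
    S:     (n_err, n_src) known secondary-path gains
    Finds G (n_src, n_ref) minimizing ||D_err + X_ref @ G.T @ S.T||_F,
    i.e., the anti-noise field reconstructed from the reference samples.
    """
    Y = -D_err @ np.linalg.pinv(S.T)   # desired anti-noise source signals
    G_T = np.linalg.pinv(X_ref) @ Y    # map reference samples to sources
    return G_T.T
```

When the disturbance truly lies in the span of the reference samples and S is invertible, the residual at the error microphones vanishes; different cost functions in the paper's three strategies would replace this plain Frobenius objective.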
Citations: 13
A drag-and-drop type human computer interaction technique based on electrooculogram
S. Ogai, Toshihisa Tanaka
A fundamental limitation of human-computer interaction using the electrooculogram (EOG) is the low accuracy of eye tracking and the head movements that invalidate the calibration of on-monitor gaze coordinates. In this paper, we develop a drag-and-drop type interface based on the EOG that avoids direct estimation of the gaze location and frees users from restrictions on head movement. To drag a cursor on the screen, the proposed system models the relationship between the amount of eye movement and the EOG amplitude with linear regression. Five subjects participated in an experiment comparing the proposed drag-and-drop type interface with a conventional direct gaze type interface. Performance measures such as efficiency and satisfaction showed a significant advantage for the proposed method (p < 0.05).
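The linear-regression calibration step can be sketched in a few lines. The per-axis model with an intercept is an assumption consistent with the abstract; the paper's exact preprocessing of the EOG amplitude is not specified here.

```python
import numpy as np

def fit_eog_gain(eog_amplitude, displacement_px):
    """Fit the linear map from EOG amplitude to cursor displacement
    (one axis, least squares with intercept). Dragging then moves the
    cursor by gain * amplitude + offset per detected eye movement.
    """
    A = np.vstack([eog_amplitude, np.ones_like(eog_amplitude)]).T
    gain, offset = np.linalg.lstsq(A, displacement_px, rcond=None)[0]
    return gain, offset
```

Because only relative displacement is modeled, no absolute gaze coordinate is ever estimated, which is what makes the method robust to head movement.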
Citations: 1
Hybrid EEG-NIRS brain-computer interface under eyes-closed condition
Jaeyoung Shin, K. Müller, Han-Jeong Hwang
In this study, we propose a hybrid BCI combining electroencephalography (EEG) and near-infrared spectroscopy (NIRS) that can potentially be operated in an eyes-closed condition by paralyzed patients with oculomotor dysfunctions. In the experiment, seven healthy participants performed mental subtraction or stayed relaxed (baseline state) while EEG and NIRS data were measured simultaneously. To evaluate the feasibility of the hybrid BCI, we classified the frontal brain activities induced by mental subtraction against the baseline state, and compared the classification accuracies of the unimodal EEG BCI, the unimodal NIRS BCI, and the hybrid BCI. The hybrid BCI (85.54 % ± 8.59) showed significantly higher classification accuracy than the unimodal EEG BCI (80.77 % ± 11.15) and NIRS BCI (77.12 % ± 7.63) (Wilcoxon signed-rank test, Bonferroni corrected, p < 0.05). This result demonstrates that our eyes-closed hybrid BCI approach could potentially be applied to neurodegenerative patients with impaired motor function accompanied by a decline of visual function.
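The statistical comparison in the abstract uses a Wilcoxon signed-rank test with Bonferroni correction across the pairwise accuracy comparisons. The correction step itself is simple arithmetic; the sketch below uses hypothetical uncorrected p-values (the values are illustrative, not taken from the paper):

```python
# Hypothetical uncorrected p-values for three pairwise comparisons
# (hybrid vs. EEG, hybrid vs. NIRS, EEG vs. NIRS) -- illustrative only
p_uncorrected = [0.012, 0.008, 0.210]

# Bonferroni correction: multiply each p-value by the number of
# comparisons, capping the result at 1.0
m = len(p_uncorrected)
p_corrected = [min(p * m, 1.0) for p in p_uncorrected]

# Declare significance at the alpha = 0.05 level after correction
significant = [p < 0.05 for p in p_corrected]
```

Equivalently, one can compare the uncorrected p-values against α/m; both formulations reject exactly the same hypotheses.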
{"title":"Hybrid EEG-NIRS brain-computer interface under eyes-closed condition","authors":"Jaeyoung Shin, K. Müller, Han-Jeong Hwang","doi":"10.1109/APSIPA.2017.8282127","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282127","url":null,"abstract":"In this study, we propose a hybrid BCI combining electroencephalography (EEG) and near-infrared spectroscopy (NIRS) that can be potentially operated in eyes-closed condition for paralyzed patients with oculomotor dysfunctions. In the experiment, seven healthy participants performed mental subtraction and stayed relaxed (baseline state), during which EEG and NIRS data were simultaneously measured. To evaluate the feasibility of the hybrid BCI, we classified frontal brain activities inducted by mental subtraction and baseline state, and compared classification accuracies obtained using unimodal EEG and NIRS BCI and the hybrid BCI. As a result, the hybrid BCI (85.54 % ± 8.59) showed significantly higher classification accuracy than those of unimodal EEG (80.77 % ± 11.15) and NIRS BCI (77.12 % ± 7.63) (Wilcoxon signed rank test, Bonferroni corrected p < 0.05). The result demonstrated that our eyes-closed hybrid BCI approach could be potentially applied to neurodegenerative patients with impaired motor functions accompanied by a decline of visual functions.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116137384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1