
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC): Latest Papers

Deep acoustic-to-articulatory inversion mapping with latent trajectory modeling
Patrick Lumban Tobing, H. Kameoka, T. Toda
This paper presents a novel implementation of latent trajectory modeling in a deep acoustic-to-articulatory inversion mapping framework. In conventional methods, i.e., Gaussian mixture model (GMM)-based and deep neural network (DNN)-based inversion mappings, frame interdependency can be considered while generating articulatory parameter trajectories by using an explicit constraint between static and dynamic features. However, this constraint is not considered when training these models, so the trained model is not optimal for the mapping procedure. In this paper, we address this problem by introducing latent trajectory modeling into the DNN-based inversion mapping. In the latent trajectory model, frame interdependency can be considered in both training and mapping by using a soft constraint between static and dynamic features. The experimental results demonstrate that the proposed latent trajectory DNN (LTDNN)-based inversion mapping outperforms the conventional and the state-of-the-art inversion mapping systems.
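As a rough illustration of the relationship between static and dynamic features mentioned in the abstract, the sketch below appends a delta (dynamic) value to each frame of a one-dimensional static trajectory. The window coefficients (-0.5, 0, 0.5), the edge handling, and the name `append_deltas` are assumptions chosen for illustration; they are not taken from the paper.

```python
def append_deltas(static):
    """Pair each static value with a delta computed from its neighbours.

    Assumed delta window: delta[t] = 0.5 * (static[t + 1] - static[t - 1]),
    with the first and last frames repeated at the edges.
    """
    n_frames = len(static)
    features = []
    for t in range(n_frames):
        prev_value = static[max(t - 1, 0)]
        next_value = static[min(t + 1, n_frames - 1)]
        features.append((static[t], 0.5 * (next_value - prev_value)))
    return features


trajectory = [0.0, 1.0, 2.0]
print(append_deltas(trajectory))
```

Because each delta is a fixed linear function of neighbouring static values, a generated trajectory that respects this relationship is tied together across frames, which is the frame interdependency the abstract refers to.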
DOI: https://doi.org/10.1109/APSIPA.2017.8282219
Citations: 15
Hybrid EEG-NIRS brain-computer interface under eyes-closed condition
Jaeyoung Shin, K. Müller, Han-Jeong Hwang
In this study, we propose a hybrid brain-computer interface (BCI) combining electroencephalography (EEG) and near-infrared spectroscopy (NIRS) that could be operated in an eyes-closed condition by paralyzed patients with oculomotor dysfunctions. In the experiment, seven healthy participants performed mental subtraction and stayed relaxed (baseline state) while EEG and NIRS data were measured simultaneously. To evaluate the feasibility of the hybrid BCI, we classified frontal brain activities induced by mental subtraction and the baseline state, and compared the classification accuracies obtained using the unimodal EEG BCI, the unimodal NIRS BCI, and the hybrid BCI. The hybrid BCI (85.54 ± 8.59 %) showed significantly higher classification accuracy than the unimodal EEG BCI (80.77 ± 11.15 %) and the unimodal NIRS BCI (77.12 ± 7.63 %) (Wilcoxon signed rank test, Bonferroni corrected p < 0.05). The results demonstrate that our eyes-closed hybrid BCI approach could potentially be applied to neurodegenerative patients with impaired motor functions accompanied by a decline of visual functions.
DOI: https://doi.org/10.1109/APSIPA.2017.8282127
Citations: 1
Four-dimensional image compression with region of interest based on non-separable double lifting integer wavelet transform
Fairoza Amira Hamzah, Taichi Yoshida, M. Iwahashi
This paper increases the coding performance for four-dimensional (4D) images by implementing region of interest (ROI) coding in the non-separable double lifting structure of the 4D integer wavelet transform (WT). The WT, which succeeded the discrete cosine transform (DCT), has been used in the JPEG 2000 international image compression standard for more than a decade. The conventional lifting structure, known as the separable structure, has many rounding operators, which increase the rounding noise inside the transform. The higher the rounding noise inside the transform, the lower the coding performance. Thus, a non-separable structure of the double lifting WT is introduced to reduce the rounding noise. The non-separable structure is compatible with the conventional wavelet-based JPEG 2000. Furthermore, an ROI coding scheme based on the non-separable integer WT is proposed that uses both lossy and lossless compression, and it was observed that the proposed method increased the coding performance for 4D images.
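To show where rounding operators enter an integer lifting transform, the sketch below implements one level of the standard one-dimensional reversible 5/3 lifting transform. It is only an illustration of rounding inside lifting steps: it is not the non-separable 4D double lifting structure proposed in the paper, and the function names and the symmetric boundary handling are assumptions.

```python
def forward_53(x):
    """One level of 1-D reversible 5/3 lifting on an even-length integer list."""
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: the right shift is a floor division, i.e. a rounding operator.
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update step: a second rounding operator.
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by reversing the lifting steps."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
            for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


signal = [12, 15, 9, 7, 30, 28, 27, 3]
low, high = forward_53(signal)
print(low, high, inverse_53(low, high) == signal)
```

Each lifting step rounds once, and the inverse subtracts the same rounded quantity, so the integer transform stays lossless while the rounding still perturbs the coefficients relative to the real-valued transform.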
DOI: https://doi.org/10.1109/APSIPA.2017.8282329
Citations: 1
Investigating the use of scattering coefficients for replay attack detection
Kaavya Sriskandaraja, Gajan Suthokumar, V. Sethu, E. Ambikairajah
Widespread adoption of speaker verification for security relies on the existence of effective anti-spoofing countermeasures. This paper presents a countermeasure based on spectral features to detect replay spoofing attacks on automatic speaker verification systems. In particular, the use of hierarchical scattering decomposition coefficients and inverse-mel frequency cepstral coefficients is explored. Compared to the baseline on the ASVspoof 2017 database, our best system achieved a relative improvement in equal error rate of around 70% on the development set and 20% on the evaluation set. In addition, we show that features with a shorter window can be beneficial for detecting replayed speech, in contrast to speech synthesis and voice conversion attacks.
DOI: https://doi.org/10.1109/APSIPA.2017.8282211
Citations: 19
Efficient edge-oriented based image interpolation algorithm for non-integer scaling factor
Chia-Chun Hsu, Jian-Jiun Ding, Yih-Cherng Lee
Though image interpolation has been developed for many years, most state-of-the-art methods, including machine learning based methods, can only zoom an image with a scaling factor of 2, 3, 2^k, or other integer values. Hence, bicubic interpolation is still a popular method for the non-integer scaling problem. In this paper, we propose a novel interpolation algorithm for image zooming with non-integer scaling factors based on the gradient direction. The proposed method first estimates the gradient direction for each pixel in the low resolution image. Then, we construct the gradient map for the high resolution image by spline interpolation. Finally, the intensity of each missing pixel is computed as the weighted sum of the pixels in a pre-defined window. To preserve edge information during the interpolation process, the weight is determined by the inner product of the estimated gradient vector and the vector from the missing pixel to the known data point. Simulations show that the proposed method outperforms other non-integer scaling methods and is helpful for super-resolution.
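The weighting idea in the abstract can be sketched as follows. The abstract states only that the weight depends on the inner product of the gradient vector and the offset to a known pixel; the specific weighting function `1 / (1 + dot^2)`, the names `edge_weight` and `interpolate`, and the sample values are hypothetical choices for illustration, not the authors' formula.

```python
def edge_weight(gradient, offset):
    """Hypothetical weight: largest when the offset is perpendicular to the
    gradient, i.e. when the known pixel lies along the edge direction."""
    gx, gy = gradient
    dx, dy = offset
    dot = gx * dx + gy * dy
    return 1.0 / (1.0 + dot * dot)


def interpolate(gradient, neighbors):
    """Weighted average of known pixels.

    neighbors: list of ((dx, dy), intensity), with offsets measured from the
    missing pixel to each known pixel in the window.
    """
    weights = [edge_weight(gradient, offset) for offset, _ in neighbors]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, neighbors)) / total


# Gradient along x means the edge runs along y, so the pixel at (0, 1)
# lies on the same side of the edge and should dominate the estimate.
value = interpolate((1.0, 0.0), [((0.0, 1.0), 100.0), ((1.0, 0.0), 0.0)])
print(value)
```

With these assumed weights the estimate is pulled toward the pixel along the edge rather than the one across it, which is the edge-preserving behaviour the abstract describes.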
DOI: https://doi.org/10.1109/APSIPA.2017.8282202
Citations: 0
Exploiting imbalanced textual and acoustic data for training prosodically-enhanced RNNLMs
Michael Hentschel, A. Ogawa, Marc Delcroix, T. Nakatani, Yuji Matsumoto
There have been many attempts in the past to exploit various sources of information in language modelling besides words, for instance prosody or topic information. With neural network based language models, it became easier to make use of this continuous valued information, because the neural network transforms the discrete valued space into a continuous valued space. So far, models incorporating prosodic information were jointly trained on the auxiliary and the textual information from the beginning. However, in practice the auxiliary information is usually only available for a small amount of the training data. In order to fully exploit text and acoustic data, we propose to re-train a recurrent neural network language model, rather than training a language model from scratch. Using this method we achieved perplexity and word error rate reductions for N-best rescoring on the MIT-OCW lecture corpus.
DOI: https://doi.org/10.1109/APSIPA.2017.8282099
Citations: 1
A real time micro-expression detection system with LBP-TOP on a many-core processor
X. Soh, Vishnu Monn Baskaran, Adamu Muhammad Buhari, R. Phan
Implementing a micro-expression detection system that sustains real-time recognition is challenging. To surmount these problems, this paper examines a serial Local Binary Pattern from Three Orthogonal Planes (LBP-TOP) algorithm in order to identify its performance limitations for real-time systems. Videos from the SMIC and CASMEII databases were upsampled to higher resolutions (280×340, 560×680 and 1120×1360) to cater to the needs of real-life implementation. Then, a parallel multicore-based LBP-TOP algorithm is studied as a benchmark. Experimental results show that, at the highest tested video resolution on a multi-core architecture with 24 logical processors, the parallel LBP-TOP algorithm achieves 7× and 8× speedups over serial LBP-TOP for the SMIC and CASMEII databases, respectively. To further reduce the computational time, this paper also proposes a many-core parallel LBP-TOP algorithm using Compute Unified Device Architecture (CUDA). In addition, a method is designed to calculate the threads and blocks required to launch the kernel when processing videos of different resolutions. The proposed algorithm increases the speedup to 117× and 130× over the serial algorithm for the highest tested resolution videos.
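The abstract does not describe how the threads and blocks are calculated, so the sketch below shows only the common ceiling-division rule for covering a frame with a 2-D grid of thread blocks, one thread per pixel. The 16×16 block size, the one-thread-per-pixel mapping, and the name `launch_dims` are assumptions for illustration and may differ from the paper's method.

```python
def launch_dims(width, height, block=(16, 16)):
    """Return (grid, block) so that grid * block covers width x height pixels.

    Uses ceiling division so that partially filled blocks at the right and
    bottom edges are still launched; a kernel would then need a bounds check.
    """
    block_x, block_y = block
    grid_x = (width + block_x - 1) // block_x
    grid_y = (height + block_y - 1) // block_y
    return (grid_x, grid_y), block


for width, height in [(280, 340), (560, 680), (1120, 1360)]:
    print((width, height), launch_dims(width, height))
```

Computing the grid from the frame size, rather than hard-coding it, is what lets one kernel handle all three tested resolutions.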
DOI: https://doi.org/10.1109/APSIPA.2017.8282041
Citations: 4
Prediction techniques for wavelet based 1-D signal compression
I-Hsiang Wang, Jian-Jiun Ding, H. Hsu
This paper proposes a novel one-dimensional (1-D) signal compression technique. We first perform beat alignment to transform a 1-D signal into a 2-D signal, then use the 2-D discrete wavelet transform (DWT) to decompose the 2-D signal into multiple subbands. The coefficients in certain subbands are then coded using simple differential pulse code modulation (DPCM). Next, we construct one neural network for each subband (except the LL subband) to perform prediction. Based on the prediction results, we construct a type of pixel-wise context A to determine the activity of a given pixel. Finally, the DWT coefficients and the residues from DPCM are bit-plane coded using the Embedded Block Coding with Optimized Truncation (EBCOT) from JPEG2000. We evaluated the method on well-known 1-D signals, the ECG signals in the MIT-BIH database, and it demonstrated significant improvement over existing methods.
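The DPCM step named in the abstract can be sketched in its simplest form as a first-order, lossless predictor that codes each coefficient as its difference from the previous one. The previous-sample predictor, the zero initial value, and the function names are assumptions for illustration; the paper's exact predictor is not given in the abstract.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0  # assumed initial prediction
    residuals = []
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def dpcm_decode(residuals):
    """Rebuild the samples by accumulating the residuals."""
    previous = 0
    samples = []
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples


coefficients = [10, 12, 11, 11]
residuals = dpcm_encode(coefficients)
print(residuals, dpcm_decode(residuals) == coefficients)
```

When neighbouring coefficients are similar, the residuals are small in magnitude, which is what makes them cheaper to entropy code than the original values.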
DOI: https://doi.org/10.1109/APSIPA.2017.8281996
Citations: 1
A rail detection algorithm based on pair particles filtering
Ji-Sang Bae, Jong-Ok Kim
The safety of mass transportation such as trains cannot be emphasized enough, and accurate detection of the rails in the direction of travel can contribute to the safe operation of a train. In this paper, we propose a new rail detection algorithm based on pair particles filtering that simultaneously predicts the positions of the left and right rails as a pair. Multiple pairs of particles are first generated from the previously detected rails, and three features (the position of the particle pair, the rail gauge, and the gradient magnitude) are used to detect the positions of the rail pair. The proposed method robustly detects both straight and curved rails. Experiments with various actual rail images show plausible detection results for the proposed method.
DOI: https://doi.org/10.1109/APSIPA.2017.8282258
Citations: 0
Epileptic focus localization based on bivariate empirical mode decomposition and entropy
Tatsunori Itakura, Toshihisa Tanaka
Epilepsy is a neurological disorder which causes abnormal discharges in the brain. Epileptic focus localization is an important factor for successful epilepsy surgery. The intracranial electroencephalogram (iEEG) is the most commonly used signal for detecting the epileptic focus. The iEEG signals are obtained from a publicly available database that consists of 7,500 signal pairs. Empirical mode decomposition (EMD) has been successfully applied to this dataset to detect the epileptic focus. However, the EMD method is not suitable for iEEG signal pairs. In this paper, a method for the classification of focal and non-focal iEEG signals using bivariate EMD (BEMD) is presented. The bivariate iEEG signals are decomposed into signal components of the same frequency band. Various entropy measures are calculated from the intrinsic mode functions (IMFs) of the iEEG signals. Then, some or all of the entropies are chosen as features, which are classified as focal or non-focal iEEG by a support vector machine (SVM). Experimental results show that the proposed method is able to differentiate focal from non-focal iEEG signals with an average classification accuracy of 86.89%.
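The abstract does not say which entropy measures are used, so the sketch below shows one common choice, the Shannon entropy of an amplitude histogram, as an example of turning a signal component into a scalar feature. The histogram binning, the value range arguments, and the name `shannon_entropy` are assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values, bins, low, high):
    """Shannon entropy (in bits) of the amplitude histogram of `values`.

    Values are assumed to lie in [low, high]; the top edge falls in the last bin.
    """
    width = (high - low) / bins
    counts = Counter(min(int((v - low) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


component = [0.1, 0.3, 0.6, 0.9]  # hypothetical samples of one component
print(shannon_entropy(component, bins=4, low=0.0, high=1.0))
```

One such value per component could then be stacked into a feature vector for a classifier such as an SVM, in the spirit of the pipeline the abstract outlines.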
癫痫是一种神经系统疾病,会导致大脑异常放电。癫痫病灶定位是癫痫手术成功的重要因素。颅内脑电图(iEEG)是检测癫痫病灶最常用的信号。iEEG信号是从一个由7500个信号对组成的公开数据库中获得的。对于该数据集,经验模态分解(EMD)已成功应用于癫痫病灶检测。然而,EMD方法并不适用于iEEG信号对。本文提出了一种利用二元EMD (BEMD)对震源和非震源iEEG信号进行分类的方法。将二元iEEG信号分解为同频段的信号分量。根据iEEG信号的imf计算出各种熵测度。然后,选取部分或全部熵作为特征,利用支持向量机(SVM)将其区分为焦点或非焦点iEEG。实验结果表明,该方法能够区分出焦点和非焦点的iEEG信号,平均分类准确率为86.89%。
{"title":"Epileptic focus localization based on bivariate empirical mode decomposition and entropy","authors":"Tatsunori Itakura, Toshihisa Tanaka","doi":"10.1109/APSIPA.2017.8282255","DOIUrl":"https://doi.org/10.1109/APSIPA.2017.8282255","url":null,"abstract":"Epilepsy is a neurological disorder which causes abnormal discharges in the brain. Epileptic focus localization is a important factor for successful epilepsy surgery. The intracranial electroencephalogram (iEEG) is the most used signal for detecting epileptic focus. The iEEG signals are obtained from a publicly available database that consists of 7,500 signal pairs. To this dataset, empirical mode decomposition (EMD) has been successfully applied to detect the epileptic focus. However, the EMD method is not suitable for iEEG signal pairs. In this paper, a method for the classification of focal and non-focal iEEG signals using bivariate EMD (BEMD) is presented. The bivariate iEEG signals are decomposed the into signal components of the same frequency band. Various entropy measures calculated from the IMFs of the iEEG signals. Then, some or all of the entropies are chosen as features, which are discriminated into focal or non-focal iEEG by using the support vector machine (SVM). Experimental results show that the proposed method is able to differentiate the focal from non-focal iEEG signals with an average classification accuracy of 86.89%.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129410428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
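The entropy-feature step mentioned in the epileptic focus localization abstract can be illustrated with a minimal sketch. The abstract does not say which entropy measures are used, so the histogram-based Shannon entropy below is an assumption chosen for simplicity. The function names `shannon_entropy` and `entropy_features` and the `n_bins` parameter are hypothetical, and the BEMD decomposition and SVM classification steps are assumed to happen elsewhere.

```python
import math
from collections import Counter


def shannon_entropy(component, n_bins=16):
    """Shannon entropy (in bits) of the amplitude histogram of one signal component.

    Assumption: a simple equal-width histogram estimate, not necessarily
    one of the entropy measures used in the paper.
    """
    lo, hi = min(component), max(component)
    if hi == lo:
        return 0.0  # a constant signal has no amplitude uncertainty
    width = (hi - lo) / n_bins
    # Map each sample to a bin index; clamp the maximum sample into the last bin.
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in component)
    n = len(component)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_features(imfs_x, imfs_y, n_bins=16):
    """Build one feature vector from the components of both channels of a signal pair.

    imfs_x and imfs_y are lists of components (each a list of samples),
    assumed to come from a bivariate decomposition computed beforehand.
    """
    features = [shannon_entropy(c, n_bins) for c in imfs_x]
    features += [shannon_entropy(c, n_bins) for c in imfs_y]
    return features
```

Under these assumptions, each signal pair yields one feature vector, which could then be passed to any off-the-shelf classifier together with its focal or non-focal label.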