OAI-based End-to-End Network Slicing
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631616
Ting Li, Liqiang Zhao, Fengfei Song, Chengkang Pan
Venue: 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP)
Network slicing is a key 5G technology that builds on Network Function Virtualization (NFV) and Software Defined Networking (SDN) to enable flexible customization for diverse services. In this paper, we discuss end-to-end network slicing under the non-standalone (NSA) 5G standard, in which the eMBB and uRLLC scenarios are supported by the 4G core network. First, we present the eMBB and uRLLC slices at the user plane. To reduce end-to-end delay in the uRLLC slice, Mobile Edge Computing (MEC) is introduced. Second, the eMBB and uRLLC slices share the same control plane in the core network. Finally, we build a testbed based on the open-source OpenAirInterface (OAI) software. Experimental results show that the proposed scheme increases the downlink rate of the eMBB slice and reduces the delay of the uRLLC slice.
Pattern Recognition Based on Multidimensional Nonlinear Schur Parametrization
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631800
Urszula Libal
Feature extraction is one of the most important stages of pattern recognition. In this paper, a second-degree nonlinear Schur parametrization is proposed as a method for extracting features from non-Gaussian and non-stationary time series. The nonlinear algorithm is derived from the linear Schur parametrization. Pattern recognition experiments using several well-known classifiers are performed on a benchmark from the UCI Machine Learning Repository: the 60-dimensional sonar data set. The classification accuracy obtained with the nonlinear Schur parametrization as a feature extractor is compared with the results for the linear Schur parametrization and other popular feature extraction methods. Compared with the linear case, the nonlinear parametrization significantly increases classification accuracy, while the number of features grows only moderately for a multidimensional nonlinear algorithm.
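The nonlinear method is derived from the linear Schur parametrization. A minimal Python sketch of the linear case may therefore help. The classic Schur recursion maps an autocorrelation sequence r[0..p] to reflection coefficients k_1..k_p, which can then be used as a feature vector. The function name `schur_reflection` is hypothetical. The sign convention is an assumption, chosen to match the Levinson-Durbin recursion with A(z) = 1 + a_1 z^-1 + .... The paper's second-degree nonlinear extension is not reproduced here.

```python
import numpy as np

def schur_reflection(r):
    """Linear Schur recursion (illustrative sketch, not the paper's nonlinear method).

    r: autocorrelation values r[0], ..., r[p] of a stationary sequence.
    Returns reflection coefficients k_1, ..., k_p (Levinson-Durbin sign convention).
    """
    r = np.asarray(r, dtype=float)
    p = len(r) - 1
    ef = r.copy()  # forward generator row; entries at lags >= m stay valid after step m
    eb = r.copy()  # backward generator row
    k = np.zeros(p)
    for m in range(1, p + 1):
        k[m - 1] = -ef[m] / eb[m - 1]
        eb_shift = np.concatenate(([0.0], eb[:-1]))  # eb_shift[i] = eb[i - 1]
        # Update both rows from the previous ones; lags below m are no longer used.
        ef, eb = ef + k[m - 1] * eb_shift, eb_shift + k[m - 1] * ef
    return k

# Autocorrelation of an AR(1) process with coefficient 0.8: only k_1 should be nonzero,
# so k is approximately [-0.8, 0, 0].
k = schur_reflection(0.8 ** np.arange(4))
```

The Schur form works directly on the autocorrelation "generator" rows rather than on predictor coefficients. This avoids the inner products of Levinson-Durbin and is one reason the recursion parallelizes well.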
Indications of Neural Disorder through Automated Assessment of the Box and Block Test
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631815
T. Lee, J. G. Lim, K. Leo, S. Sanei
The needs of an ever-growing aging global population are a cause of worldwide concern. The consequent aging of the human nervous system is a major risk factor for stroke and many other neurological disorders. These pathological conditions affect activities of daily living and impose a support and resource burden on society. Rehabilitation is long-term and resource-intensive, and even so it can be subjective and inconsistent in execution. We propose a novel system that indicates the level of neurological disorder by electronically scoring a widely used rehabilitative assessment of the upper limb, the Box and Block Test. This is done by embedding widely available sensors into the objects used in the assessment. We enhance this with two new features derived from these sensors, and we process one of them using a data-driven approach. A set of pilot trials was conducted to demonstrate the effectiveness of the approach, with promising results.
An Energy-efficient Reconfigurable Hybrid DNN Architecture for Speech Recognition with Approximate Computing
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631826
Bo Liu, Shisheng Guo, Hai Qin, Yu Gong, Jinjiang Yang, Wei-qi Ge, Jun Yang
This paper proposes a hybrid deep neural network (DNN) for speech recognition and an energy-efficient reconfigurable architecture that uses approximate computing to accelerate it. The hybrid DNN consists of two network models: a binary weight network (BWN) for recognizing twenty keywords, and a recurrent neural network (RNN) that processes the acoustic model for high-precision recognition of common words. To accelerate the hybrid DNN and reduce its energy cost, we propose a digital-analog mixed reconfigurable architecture with approximate computing units. It includes a BWN accelerator with analog multi-chain delay-addition units for bit-wise approximate computing, and an RNN accelerator with approximate multiplication units that serve different calculation accuracy requirements. Implemented and simulated in TSMC 28 nm HPC+ process technology, the proposed architecture achieves an energy efficiency of 163.8 TOPS/W for twenty-keyword recognition and 3.3 TOPS/W for common-word recognition. Compared with state-of-the-art architectures, this work achieves over 1.7× better energy efficiency with approximate computing.
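The idea behind a binary weight network can be sketched in a few lines of Python. Each real weight vector is approximated by a scale alpha times a vector of ±1 signs, so a dot product needs only additions, subtractions and one final multiply. The rule alpha = mean(|w|) follows the common BWN formulation and is an assumption here. The hypothetical `truncated_mul` illustrates one generic kind of approximate multiplier; it is not the approximate multiplication unit designed in the paper.

```python
import numpy as np

def binarize_weights(w):
    """Approximate w by alpha * b, where b holds signs in {-1, +1} and alpha = mean(|w|)."""
    w = np.asarray(w, dtype=float)
    b = np.where(w >= 0, 1.0, -1.0)
    return np.abs(w).mean(), b

def bwn_dot(x, alpha, b):
    """Multiplication-free dot product: add inputs with +1 weights, subtract those with -1."""
    x = np.asarray(x, dtype=float)
    return alpha * (x[b > 0].sum() - x[b < 0].sum())

def truncated_mul(a, c, drop_bits):
    """Hypothetical approximate multiplier for non-negative integers:
    discard the low `drop_bits` bits of each operand, multiply, then rescale."""
    return ((a >> drop_bits) * (c >> drop_bits)) << (2 * drop_bits)

alpha, b = binarize_weights([0.5, -0.25, 1.0, -0.25])  # alpha = 0.5, b = [1, -1, 1, -1]
y = bwn_dot([1.0, 2.0, 3.0, 4.0], alpha, b)            # 0.5 * ((1 + 3) - (2 + 4)) = -1.0
approx = truncated_mul(203, 101, 2)                    # 20000 versus the exact 20503
```

Dropping low-order bits trades a bounded relative error (here about 2.5%) for a smaller multiplier array. This is the kind of accuracy/energy trade-off that approximate computing exploits.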
Storage-Computational Complexity Efficient Light Field Reconstruction
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631611
Chuanpu Li, Xin Jin, Yanqin Chen, Qionghai Dai
The point spread function (PSF) of a plenoptic camera has been shown theoretically to be spatially varying. Consequently, reconstructing a large-scale light field at the object plane, which requires inverting the PSF matrix, consumes large amounts of memory and time. This directly limits the spatial resolution of the objects that can be handled. In this paper, layered LU decomposition, partitioned Gaussian elimination and a memory-reuse method are proposed to reconstruct the light field for a plenoptic camera. Layered LU decomposition together with partitioned Gaussian elimination makes better use of the computer's memory hierarchy and increases computational efficiency. The intra-layer memory-reuse method further reduces memory consumption by updating in place. Compared with existing methods, the proposed algorithm reduces memory consumption by a factor of up to 1.85. It also provides the best trade-off between computational complexity and memory consumption.
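Partitioned elimination with in-place storage can be illustrated with a generic right-looking blocked LU factorization in Python. The scheme works panel by panel:

* a panel of `bs` columns is factored;
* the matching block row of U is formed by forward substitution;
* the trailing submatrix is updated with one matrix product.

L and U overwrite the input array throughout. This is a textbook scheme, not the authors' layered algorithm. The block size `bs` and the helper names are hypothetical. Pivoting is omitted, which assumes a matrix that does not need it, such as a strictly diagonally dominant one.

```python
import numpy as np

def blocked_lu_inplace(A, bs=2):
    """Right-looking blocked LU without pivoting; A is overwritten by unit-lower L and U."""
    n = A.shape[0]
    for k0 in range(0, n, bs):
        k1 = min(k0 + bs, n)
        # 1) Unblocked elimination on the panel A[k0:, k0:k1].
        for j in range(k0, k1):
            A[j + 1:, j] /= A[j, j]
            A[j + 1:, j + 1:k1] -= np.outer(A[j + 1:, j], A[j, j + 1:k1])
        # 2) Block row of U: forward substitution with the unit-lower panel block.
        for j in range(k0, k1):
            A[j + 1:k1, k1:] -= np.outer(A[j + 1:k1, j], A[j, k1:])
        # 3) Trailing update with a single matrix product (cache-friendly).
        A[k1:, k1:] -= A[k1:, k0:k1] @ A[k0:k1, k1:]
    return A

def lu_solve(LU, rhs):
    """Solve A x = rhs from the packed in-place factors."""
    n = LU.shape[0]
    y = np.array(rhs, dtype=float)
    for i in range(n):             # forward substitution, unit diagonal
        y[i] -= LU[i, :i] @ y[:i]
    for i in reversed(range(n)):   # back substitution with U
        y[i] = (y[i] - LU[i, i + 1:] @ y[i + 1:]) / LU[i, i]
    return y

rng = np.random.default_rng(0)
n = 6
A = rng.uniform(-1.0, 1.0, (n, n)) + 2 * n * np.eye(n)  # strictly diagonally dominant
LU = blocked_lu_inplace(A.copy(), bs=4)                  # 6 is not a multiple of 4
L = np.tril(LU, -1) + np.eye(n)
U = np.triu(LU)
rhs = np.ones(n)
x = lu_solve(LU, rhs)
```

Most of the arithmetic lands in step 3's matrix product, which operates on contiguous blocks. This is why blocked variants use the memory hierarchy better than column-by-column elimination.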
RAN Slicing-based Handover Scheme in HetNets
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631578
Xudong Dong, Liqiang Zhao, H Zhao, Chengkang Pan
In this paper, we propose a radio access network (RAN) slicing-based handover scheme for heterogeneous networks (HetNets) that provides users with better QoS for customized services during handover. First, we present a hierarchical control model, composed of a global controller and multiple local controllers, for heterogeneous networks including LTE, WLAN and NR. Second, we build each RAN slice on control/user plane separation to provide users with customized services. Third, we implement the RAN slicing-based handover scheme. Finally, we develop a testbed based on open-source software, and the experimental results verify the feasibility of the proposed handover scheme.
Gaussian Mixture Model and Gaussian Supervector for Image Classification
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631558
Yuechi Jiang, F. H. F. Leung
The Gaussian Mixture Model (GMM) has been widely used in speech and image signal classification tasks. It can be used directly as a classifier, or as a representation of speech or image signals. Another important use of the GMM is as the Universal Background Model (UBM) for generating speech representations such as the Gaussian Supervector (GSV) and the i-vector. In this paper, we borrow the GSV from speech signal classification and apply it as an image representation for image classification. The GSV is calculated based on a UBM. Besides employing the conventional GMM as the UBM to calculate the GSV, we also propose the Equal-Variance GMM (EV-GMM), in which all variables in all Gaussian mixture components share the same variance. Moreover, we derive a kernel version of the EV-GMM, which generalizes it by introducing a kernel. We then compare the GSV with the raw image feature and with other popular image representations such as Sparse Representation (SR) and Collaborative Representation (CR). Experiments on a handwritten digit recognition task indicate that the GSV works very well and can even outperform other popular image representations. In addition, when used as the UBM, the proposed EV-GMM can work better than the conventional GMM.
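The abstract does not spell out how the GSV is computed. The Python sketch below follows a common recipe from speaker recognition, which is an assumption here: MAP-adapt the UBM means to the local features of one image and concatenate the adapted means. The UBM is an equal-variance GMM in which every dimension of every component shares one scalar variance `sigma2`, matching the EV-GMM description. The following are conventional choices, not values taken from the paper:

* the relevance factor of 16;
* the sqrt(w_k)/sigma scaling;
* the function name.

The kernel EV-GMM is not shown.

```python
import numpy as np

def ev_gmm_supervector(X, weights, means, sigma2, relevance=16.0):
    """GSV sketch: relevance-MAP adaptation of EV-GMM means, then concatenation."""
    X = np.atleast_2d(np.asarray(X, dtype=float))              # (T, D) descriptors of one image
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (T, K) squared distances
    # Shared scalar variance: the normalising constant is identical for all components,
    # so it cancels in the posteriors and is omitted.
    logp = np.log(weights)[None, :] - d2 / (2.0 * sigma2)
    logp -= logp.max(axis=1, keepdims=True)                    # stabilise before exponentiating
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)                  # component posteriors (T, K)
    n = gamma.sum(axis=0)                                      # soft counts (K,)
    F = gamma.T @ X                                            # first-order statistics (K, D)
    adapted = (F + relevance * means) / (n + relevance)[:, None]
    sv = np.sqrt(weights)[:, None] * adapted / np.sqrt(sigma2)
    return sv.ravel()                                          # length K * D

weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.full((100, 2), 1.0)   # 100 descriptors, all close to the first component
sv = ev_gmm_supervector(X, weights, means, sigma2=1.0)
```

The first component's mean moves toward the data in proportion to its soft count (100 / 116 of the way here). The second component, which receives almost no posterior mass, stays at its UBM mean.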
Modelling Attack Analysis of Configurable Ring Oscillator (CRO) PUF Designs
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631638
Jack Miskelly, Chongyan Gu, Qingqing Ma, Yijun Cui, Weiqiang Liu, Máire O’Neill
Physical Unclonable Functions (PUFs) have emerged as a lightweight security primitive for resource-constrained devices. However, conventional delay-based PUFs are vulnerable to machine learning (ML) based modelling attacks. Although ML-resistant PUF designs have been proposed, they often suffer from large overheads and are difficult to implement on FPGAs. Lightweight, ML-resistant, FPGA-compatible designs have been proposed that combine multiple PUFs, using a set of weak PUFs to obscure the challenge to a strong PUF and thereby make model building more difficult. In such designs, any unreliability in the main PUF is amplified by unreliability in the masking PUFs. For this reason, strong PUFs that are suitable for FPGAs and can achieve high reliability, such as the Configurable Ring Oscillator (CRO) PUF, are a promising option. In this paper, a mathematical model of the CRO PUF is presented. We show that models of traditional CRO PUFs can be trained to a prediction rate above 99% using the Linear Regression and CMA-ES strategies. A proposed multi-PUF design based on the previously proposed arbiter MPUF is evaluated with the same methods. We show that even with challenge obfuscation the CRO PUF can be predicted with greater than 90% accuracy. We also show that adding a second XORed PUF increases ML resistance further, with a maximum prediction rate of 86%.
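Why a linear regression attack works can be illustrated with a toy additive-delay model in Python. The model is a simplification and an assumption; it is not the paper's mathematical model of the CRO PUF. Each challenge bit selects one of two inverter delays in the corresponding stage of two ring oscillators, and the response records which ring is faster. The delay difference is then linear in the challenge bits, so an ordinary least-squares fit to the ±1 responses recovers the decision direction. The threshold is set so that the predicted share of +1 responses matches the training data. The following are all hypothetical:

* the stage count, delay spread and sample sizes;
* the seed;
* the helper names.

The CMA-ES attack is not shown.

```python
import numpy as np

def make_toy_cro_puf(n_stages, rng, spread=0.05):
    """Toy CRO PUF: delays[ring, stage, choice] = nominal 1.0 plus random process variation."""
    delays = 1.0 + spread * rng.standard_normal((2, n_stages, 2))
    stage = np.arange(n_stages)

    def respond(C):
        C = np.asarray(C, dtype=int)                   # (N, n_stages) challenge bits in {0, 1}
        delay_a = delays[0, stage, C].sum(axis=1)      # total delay of ring A per challenge
        delay_b = delays[1, stage, C].sum(axis=1)      # total delay of ring B per challenge
        return np.where(delay_a < delay_b, 1.0, -1.0)  # +1 when ring A is faster
    return respond

def linear_regression_attack(C, r):
    """Fit a linear model to (challenge, response) pairs and return a response predictor."""
    X = 2.0 * np.asarray(C) - 1.0                      # map bits {0, 1} to features {-1, +1}
    A = np.column_stack([np.ones(len(X)), X])
    w = np.linalg.lstsq(A, r, rcond=None)[0][1:]       # least-squares weights, intercept dropped
    t = np.quantile(X @ w, 1.0 - np.mean(r > 0))       # match the observed share of +1 responses
    return lambda C_new: np.where((2.0 * np.asarray(C_new) - 1.0) @ w > t, 1.0, -1.0)

rng = np.random.default_rng(1)
puf = make_toy_cro_puf(32, rng)
C_train = rng.integers(0, 2, size=(20000, 32))
C_test = rng.integers(0, 2, size=(2000, 32))
model = linear_regression_attack(C_train, puf(C_train))
accuracy = np.mean(model(C_test) == puf(C_test))      # share of unseen challenges predicted
```

In this noise-free toy the responses are a linear threshold function of the challenge bits. That is exactly the structure a linear model can learn. Masking the challenge, as in the multi-PUF designs above, aims to break this direct linear relationship.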
Real-Time Data Inversion Methods for Low-Field Nuclear Magnetic Resonance (NMR)
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631631
Cheng Chen, Arvind Srivastav, D. Ariando, S. Mandal, Yiqiao Tang, Yi-Qiao Song
Multidimensional inverse Laplace transforms (ILTs) are important for obtaining sample properties from nuclear magnetic resonance (NMR) relaxation and diffusion measurements. This paper describes computationally efficient implementations of the one-dimensional ILT on embedded processors that enable adaptive “smart” data acquisition approaches for portable low-field NMR devices. Experimental results from a low-cost NMR device based on an Altera system-on-chip (SoC) that integrates an embedded ARM core with an FPGA fabric are also presented.
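As a rough illustration of what a one-dimensional ILT computes, the Python sketch below discretizes s(t) = Σ_k f_k exp(-t / T2_k) on a logarithmic T2 grid. It then solves a Tikhonov-regularized non-negative least-squares problem by projected gradient descent. The following are all assumptions for illustration, not the embedded implementations described in the paper:

* the solver;
* the regularization weight `lam` and the iteration count;
* the synthetic echo times;
* the function name.

```python
import numpy as np

def ilt_1d(t, s, T2_grid, lam=1e-3, n_iter=20000):
    """Estimate f >= 0 with s(t) ~ sum_k f_k * exp(-t / T2_k).

    Minimises 0.5*||K f - s||^2 + 0.5*lam*||f||^2 subject to f >= 0
    using projected gradient descent with step 1/L.
    """
    K = np.exp(-np.outer(t, 1.0 / T2_grid))   # kernel matrix, shape (len(t), len(T2_grid))
    L = np.linalg.norm(K, 2) ** 2 + lam         # Lipschitz constant of the gradient
    f = np.zeros(len(T2_grid))
    for _ in range(n_iter):
        grad = K.T @ (K @ f - s) + lam * f
        f = np.maximum(f - grad / L, 0.0)       # gradient step, then clip to f >= 0
    return f, K

t = 0.005 * np.arange(1, 201)                   # hypothetical echo times: 5 ms to 1 s
T2_grid = np.logspace(-3, 1, 41)                # candidate T2 values: 1 ms to 10 s
s = np.exp(-t / 0.1)                            # synthetic decay with a single T2 of 100 ms
f, K = ilt_1d(t, s, T2_grid)
rel_residual = np.linalg.norm(K @ f - s) / np.linalg.norm(s)
T2_peak = T2_grid[np.argmax(f)]                 # location of the dominant recovered component
```

The inversion is ill-posed, so the recovered distribution is generally broadened around the true T2. `lam` trades resolution against sensitivity to noise. The fixed iteration loop is also the part an embedded implementation would need to make much cheaper.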
Information Distance Based Self-Attention-BGRU Layer for End-to-End Speech Recognition
Pub Date: 2018-11-01, DOI: 10.1109/ICDSP.2018.8631855
Yunhao Yan, Qinmengying Yan, Guang Hua, Haijian Zhang
The bidirectional gated recurrent unit (BGRU) architectures commonly used for end-to-end speech recognition suffer from long-term dependence and information redundancy. This is because BGRU architectures model speech data according to time distance, which implicitly assumes that speech data are continuous. In this paper, we propose a new hypothesis: speech data are locally continuous and globally discrete. Based on this hypothesis, we propose to model speech data according to information distance, and to support the hypothesis we design an information-distance-based modeling architecture. Because it incorporates a self-attention mechanism, the proposed architecture is termed the self-attention bidirectional gated recurrent unit (SABGRU). Experimental results show that SABGRU improves speech recognition accuracy by more than 10% over the conventional BGRU.
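The abstract does not give the form of the self-attention mechanism. The Python sketch below assumes standard scaled dot-product attention applied to a sequence of BGRU output frames. The projection matrices `Wq`, `Wk`, `Wv`, their sizes and the random stand-in frames are also assumptions. The sketch shows how each frame's weights over the other frames depend on content similarity rather than on how far apart the frames are in time. Two identical frames at opposite ends of the sequence receive the same attention weight.

```python
import numpy as np

def self_attention(H, Wq, Wk, Wv):
    """Scaled dot-product self-attention over frames H of shape (T, d)."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # (T, T) content-based similarity
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability for the softmax
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)              # each row is a distribution over frames
    return A @ V, A

rng = np.random.default_rng(0)
T, d, d_att = 6, 8, 4                              # hypothetical sizes
H = rng.standard_normal((T, d))                    # stand-in for BGRU output frames
H[5] = H[0]                                        # first and last frames carry identical content
Wq, Wk, Wv = (rng.standard_normal((d, d_att)) for _ in range(3))
out, A = self_attention(H, Wq, Wk, Wv)
```

Unlike a recurrent state, which must carry information step by step across the time gap, the attention weight between two frames here depends only on their projected contents. This is one way to read "information distance" as opposed to time distance.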