Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659593
Paul Tune, M. Roughan, Chris Wiren
The traffic matrix of a network is useful in a variety of applications: network planning and forecasting, traffic engineering, and anomaly detection. Much work has focused on estimating traffic matrices, but methods are often tested on limited data. The datasets may therefore be unrepresentative, and the subsequent results may not generalize. Synthesis can help alleviate this problem. In this paper, we examine a fundamental question: what constitutes a good class of statistical models for traffic matrix synthesis? The result of our study is a set of axioms specifying structure on traffic matrix models, including the incorporation of organizational structure (hierarchies) in network traffic. We introduce the Hierarchical Traffic Matrix (HTM), which satisfies these requirements. We then study the hierarchical structure of the GEANT network, a research network based in Europe, to validate our ideas. Finally, we illustrate how structure in traffic matrices can affect network topology design.
Title: Hierarchical Traffic Matrices: Axiomatic Foundations to Practical Traffic Matrix Synthesis
Published in: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659738
Sunghan Lee, Sangjun Han, S. Jun
A multi-user electroencephalogram (EEG) system is necessary to study concurrent activity among many persons. It is difficult to find a system that measures EEG signals from even more than three people simultaneously. We therefore propose a framework that can acquire EEG signals from more than eight persons at the same time, and we investigate its feasibility. Acquisition was performed using the OpenViBE software developed by INRIA. The wireless EEG devices for the proposed framework were manufactured by BioBrain, Corp. in Korea. Each device has eight channels that measure frontal EEG at a sampling rate of 1 kHz. Brain signals were acquired while participants wore the system and watched emotional videos as a group audience. To show the feasibility and efficacy of the framework, we analyze our preliminary results using a deep learning technique.
Title: EEG Hyperscanning for Eight or more Persons - Feasibility Study for Emotion Recognition using Deep Learning Technique
Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659636
Yukio Matsuyoshi, T. Takiguchi, Y. Ariki
A rule-based question-answering system is limited in its ability to understand a user's intention because its rules are inevitably incomplete. To address this problem, we propose a method that estimates the question type and the question keyword class from a user's question by using an attention-based LSTM (Long Short-Term Memory) model. We also propose a joint model that estimates the question type and the question keyword class simultaneously. We evaluate the effectiveness of the proposed method experimentally, based on estimation rates. In addition, the proposed method for question type estimation is compared with a rule-based system, a support vector machine (SVM), and a random forest. The method for question keyword class estimation is also compared with a non-attention LSTM model and a conventional model.
Title: User's Intention Understanding in Question-Answering System Using Attention-based LSTM
Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659776
Kazuhiro Tsujimura, K. Umebayashi, J. Kokkoniemi, Janne J. Lehtomäki
An impulse-response-based channel model is vital for wireless communication analysis and modeling. This paper considers the impulse response of the terahertz band (THz band: 0.1–10 THz) for a reflected path in short-range (1–100 cm) wireless communication. In indoor applications, the multipath channel must be considered. In the analysis of the reflected path, the rough surface of the reflector is accounted for with the Rayleigh roughness factor. The validity of the model is investigated with experimental THz band measurements (up to 2 THz).
Title: A study on impulse response model of reflected path for THz band
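The abstract above names the Rayleigh roughness factor without giving its form. As a minimal sketch, assuming the commonly used expression rho = exp(-g/2) with g = (4*pi*sigma*cos(theta)/lambda)^2, the following shows how the factor attenuates a specular reflection as frequency rises. Whether the paper uses exactly this form is an assumption, and the function name, surface roughness (50 micrometres) and incidence angle (30 degrees) are hypothetical illustration values, not measurements from the paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def rayleigh_roughness_factor(freq_hz, sigma_m, incidence_rad):
    """Return rho = exp(-g / 2), with g = (4*pi*sigma*cos(theta) / lambda)**2.

    freq_hz       -- carrier frequency in Hz
    sigma_m       -- standard deviation of the surface height in metres
    incidence_rad -- angle of incidence, measured from the surface normal
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    g = (4.0 * math.pi * sigma_m * math.cos(incidence_rad) / wavelength) ** 2
    return math.exp(-g / 2.0)


# Hypothetical example: the same surface looks rougher as frequency increases,
# so the factor applied to the smooth-surface reflection coefficient shrinks.
for f_thz in (0.1, 0.5, 1.0, 2.0):
    rho = rayleigh_roughness_factor(f_thz * 1e12, 50e-6, math.radians(30.0))
    print(f"{f_thz:4.1f} THz -> rho = {rho:.4f}")
```

In this formulation the factor multiplies the reflection coefficient of an ideally smooth surface, so a perfectly smooth reflector (sigma = 0) leaves the reflected path unchanged.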
Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659484
Kazuki Matsumoto, H. Torikai, H. Sekiya
A Spiking Neural Network (SNN), which expresses information by spike trains, can process information with low energy, like a human brain. Hardware implementation of an SNN is an important research problem. If the neurons are linked by wireless communications, SNNs gain a spatial degree of freedom, which may dramatically extend their application area. Additionally, such SNNs can process information with low energy, because the wireless communication is carried by spike trains. Such a system can therefore be regarded as a low-power wireless sensor network (WSN) in which the functions of SNN neurons are added to the wireless sensor nodes. These “Wireless Neural Sensor Networks” can distribute brain-like information processing across the WSN nodes. This paper presents an SNN with infrared (IR) communications as the first step toward this concept. Neurons are implemented on field-programmable gate array (FPGA) hardware and linked by IR communications. The implemented SNN succeeded in acquiring the XOR function through reinforcement learning.
Title: XOR learning by spiking neural network with infrared communications
Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659515
Sebastian Milde, Annika Liebgott, Ziwei Wu, Wenyi Feng, Jiahuan Yang, Lukas Mauch, P. Martirosian, F. Bamberg, K. Nikolaou, S. Gatidis, F. Schick, Bin Yang, Thomas Kustner
In clinical diagnostics, magnetic resonance imaging (MRI) is a valuable and versatile tool. The acquisition process is, however, susceptible to image distortions (artifacts), which may degrade image quality. Automated, reference-free localization and quantification of artifacts with convolutional neural networks (CNNs) is a promising way to detect artifacts early. Training relies on a large amount of expert-labeled data, and producing these labels is time-consuming. Previous studies were based on global labels, i.e. a whole volume was automatically labeled as artifact-free or artifact-affected. However, artifact appearance is rather localized. We propose local labeling conducted via a graphical user interface (GUI). Moreover, the GUI provides easy handling of data viewing, preprocessing (labeling, patching, data augmentation), network parametrization and training, data and network evaluation, as well as deep visualization of the learned network content. The GUI is not limited to these features and will be extended in the future. The developed GUI is publicly available and has a modular structure to target different applications of machine learning and deep learning, such as artifact detection, classification and segmentation.
Title: Graphical User Interface for Medical Deep Learning - Application to Magnetic Resonance Imaging
Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659655
Koichi Ito, Hiroya Kawai, Takehisa Okano, T. Aoki
Attribute information such as age and gender improves the performance of face recognition. This paper proposes a method for predicting age and gender from face images using a convolutional neural network. Through a set of experiments using public face databases, we demonstrate that the proposed method exhibits efficient performance on age and gender prediction compared with conventional methods.
Title: Age and Gender Prediction from Face Images Using Convolutional Neural Network
Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659599
Hongwu Yang, Weizhao Zhang, Pengpeng Zhi
The paper proposes a deep neural network (DNN)-based emotional speech synthesis method that improves the quality of synthesized emotional speech through speaker adaptation with a multi-speaker, multi-emotion speech corpus. First, a text analyzer is employed to obtain contextual labels from sentences, while the WORLD vocoder is used to extract acoustic features from the corresponding speech. Then a set of speaker-independent DNN average voice models is trained with the contextual labels and acoustic features of the multi-emotion speech corpus. Finally, speaker adaptation is applied to train a set of speaker-dependent DNN voice models of the target emotion with the target emotional training speech. The target emotional speech is synthesized by the speaker-dependent DNN voice models. Subjective evaluations show that, compared with the traditional hidden Markov model (HMM)-based method, the proposed method achieves higher opinion scores. Objective tests demonstrate that the spectrum of the emotional speech synthesized by the proposed method is also closer to the original speech than that of the emotional speech synthesized by the HMM-based method. Therefore, the proposed method can improve the emotional expressiveness and naturalness of synthesized emotional speech.
Title: A DNN-based emotional speech synthesis by speaker adaptation
Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659747
Xuyang Zhao, Toshihisa Tanaka, Wanzeng Kong, Qibin Zhao, Jianting Cao, H. Sugano, Noboru Yoshida
Epilepsy is a chronic disorder of the brain. The intracranial electroencephalogram (iEEG) recorded from the cortex is the most popular measurement not only for diagnosing epilepsy but also for localizing the focus, which is crucial for surgery. In recent years, machine learning methods have developed rapidly and have been applied successfully to various real-world problems. Given a sufficient number of samples, powerful deep learning methods can achieve high performance for epileptic focus localization. However, it is challenging to obtain a large amount of iEEG labeled as focal or non-focal channels, since the annotations must be performed by multiple clinical experts through visual judgment of long-term iEEG signals. To reduce the necessary number of labeled training samples, we introduce the positive unlabeled (PU) learning method for classification of focal and non-focal epileptic iEEG signals. The proposed method enables us to learn a binary classifier from a small amount of labeled data containing only one class (i.e., focal signals) and unlabeled data containing two classes (i.e., focal and non-focal signals), which greatly reduces the annotation workload of clinical experts. Experimental results on the Bern dataset and on iEEG recorded at Juntendo University Hospital demonstrate the effectiveness of our method.
Title: Epileptic Focus Localization Based on iEEG by Using Positive Unlabeled (PU) Learning
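The abstract above describes learning a binary classifier from positive (focal) and unlabeled samples only. As a rough illustration of how such a training objective can be formed, the sketch below evaluates a widely used non-negative PU risk estimate from classifier scores. This is not necessarily the estimator used in the paper: the sigmoid loss, the known class prior `prior`, and all function names are assumptions made for illustration.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)), computed without overflow."""
    z = label * score
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def mean(values):
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk from classifier scores.

    pos_scores -- scores for labeled positive (e.g. focal) samples
    unl_scores -- scores for unlabeled samples (mixture of both classes)
    prior      -- assumed fraction of positives among the unlabeled data
    """
    pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # Estimated risk on the (unobserved) negative class; clamped at zero
    # because the subtraction can go negative on finite samples.
    negative_part = unl_as_neg - prior * pos_as_neg
    return prior * pos_as_pos + max(0.0, negative_part)


# Hypothetical scores: a classifier that ranks positives high gets a lower risk
# than one with the signs flipped.
good = nn_pu_risk([2.0, 3.0], [2.0, -2.0, -3.0, -1.0], prior=0.25)
bad = nn_pu_risk([-2.0, -3.0], [-2.0, 2.0, 3.0, 1.0], prior=0.25)
print(good < bad)
```

The key design point is that no labeled negative samples appear anywhere: the negative-class risk is inferred from the unlabeled set by subtracting the contribution expected from the positives it contains.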
Pub Date: 2018-11-01
DOI: 10.23919/APSIPA.2018.8659664
Prasad A. Tapkir, H. Patil
The increased use of voice biometrics for various security applications has motivated the authors to investigate countermeasures against spoofing attacks, in which an attacker tries to imitate a genuine speaker. Replay is the most accessible spoofing attack. Past studies have ignored phase information for various speech processing applications. In this paper, we explore an excitation source-like feature set, namely the Teager Energy Operator (TEO) phase, and its significance in the replay spoof detection task. This feature set is further fused at score level with magnitude spectrum-based features, such as Constant Q Cepstral Coefficients (CQCC), Mel Frequency Cepstral Coefficients (MFCC), and Linear Frequency Cepstral Coefficients (LFCC). The improvement in the results shows that the TEO phase feature set contains information complementary to the magnitude spectrum-based features. The experiments are performed on the ASVspoof 2017 Challenge database. The systems are implemented with a Gaussian Mixture Model (GMM) as the classifier. Our best system using TEO phase achieves an Equal Error Rate (EER) of 6.57% and 15.39% on the development and evaluation sets, respectively.
Title: Significance of Teager Energy Operator Phase for Replay Spoof Detection
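The feature set in the abstract above builds on the Teager Energy Operator. As a minimal sketch, the standard discrete-time operator psi[n] = x[n]^2 - x[n-1]*x[n+1] can be computed as follows. Only the operator itself is shown; the abstract does not describe how the TEO phase feature is derived from it, so that step is not sketched here. The function name and the test signal are hypothetical.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure sinusoid A*cos(omega*n), the operator is constant and equals
# A**2 * sin(omega)**2, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(64)]
psi = teager_energy(signal)
expected = amplitude ** 2 * math.sin(omega) ** 2
print(max(abs(p - expected) for p in psi) < 1e-9)
```

Because the operator needs only three neighbouring samples, it responds to fast changes in the signal, which is one reason it is often used for excitation-related speech features.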