Stabilizing Techniques for Secure On-chip Key Generation Based on RO-PUF
Pub Date: 2022-12-16. DOI: 10.25073/2588-1086/vnucsce.306
Van-Phuc Hoang, Van-Toan Tran, Quang-Kien Trinh
Based on the intrinsic physical characteristics of devices, Physically Unclonable Functions (PUFs) provide high reliability while maintaining sufficient uniqueness. In practical PUF implementations, however, the extracted bit-string normally exhibits small, unavoidable fluctuations. Hence, PUFs can be used for chip identification but are not suitable for applications that strictly require an exactly reproducible value. In this work, we propose several techniques to stabilize the value generated by an existing Ring Oscillator (RO)-PUF circuit so that the resulting stable, unique number can be used directly in high-profile hardware security applications. In detail, we design a specialized on-chip key generation circuit that repeatedly samples the RO frequency values for statistical analysis and dynamically phases out the unstable bits, resulting in a unique and stable output bit-string. The experiments are conducted on actual data measured from Xilinx Artix-7 FPGA devices. The generated key is shown to be relatively stable and can be readily used for emerging security applications.
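To make the bit-stabilization idea concrete, the sketch below simulates the statistical filtering step offline: each RO pair is measured repeatedly, and a bit is kept only when the mean frequency gap clears a noise margin. The paper does not publish its exact on-chip algorithm, so the thresholding rule and all names here are illustrative assumptions.

```python
import numpy as np

def select_stable_bits(freq_samples, margin=2.0):
    """Offline sketch of unstable-bit filtering for an RO-PUF.

    freq_samples: array of shape (n_samples, n_ro_pairs, 2) holding repeated
    frequency measurements of each RO pair; margin is the number of standard
    deviations the mean frequency gap must exceed for a bit to be kept.
    """
    # Frequency difference of each RO pair across repeated measurements
    diff = freq_samples[:, :, 0] - freq_samples[:, :, 1]
    mean, std = diff.mean(axis=0), diff.std(axis=0)

    # A bit is kept only if its sign is unlikely to flip under noise
    stable_mask = np.abs(mean) > margin * std
    key_bits = (mean > 0).astype(int)
    return key_bits[stable_mask], stable_mask

# Toy usage with simulated measurements (3 repeated samples, 8 RO pairs)
rng = np.random.default_rng(0)
nominal = rng.normal(100.0, 1.0, size=(8, 2))               # per-pair nominal frequencies
samples = nominal + rng.normal(0.0, 0.05, size=(3, 8, 2))   # measurement noise
key, mask = select_stable_bits(samples)
print("stable positions:", np.flatnonzero(mask), "key bits:", key)
```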
{"title":"Stabilizing Techniques for Secure On-chip Key Generation Based on RO-PUF","authors":"Van‐Phuc Hoang, Van-Toan Tran, Quang-Kien Trinh","doi":"10.25073/2588-1086/vnucsce.306","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.306","url":null,"abstract":"Based on intrinsic physical characteristics of devices, Physically Unclonable Functions (PUFs) provide the high reliability while maintaining the sufficient uniqueness. However, in the practical implementation based on PUFs, the extracted bit-string normally exhibits the unavoidable small fluctuation. Hence, PUFs can be used for the application of chip identification, but not suitable for the application that strictly requires an exact generated number. In this work, we propose several techniques to stabilize the generated value based on the existing Ring Oscillator (RO)-PUF circuit so that the stable unique number can be directly used for high-profile hardware security applications. In detail, we design a specialized on-chip key generation circuit that repeatedly samples the RO frequency values for statistical analysis and dynamically phases out the unstable bits, resulting in a unique and stable output bit-string. The experiments are conducted for the actual data measured from Xilinx Artix-7 FPGA devices. The generated key is proven to be relatively stable and can be readily used for the emerging security applications.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132520356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ViMRC - VLSP 2021: An empirical study of Vietnamese Machine Reading Comprehension with Unsupervised Context Selector and Adversarial Learning
Pub Date: 2022-12-16. DOI: 10.25073/2588-1086/vnucsce.344
Minh Le Nguyen
Machine Reading Comprehension (MRC) is a challenging NLP task that requires a machine to read and scan documents and extract meaning from text, just as a human reader would. One challenge for an MRC system is not only understanding the context to extract the answer but also being aware of whether the given question is answerable at all. Although pre-trained language models (PTMs) have shown strong performance on many NLP downstream tasks, they are still limited by a fixed-length input. We propose an unsupervised context selector that shortens the given context while still retaining the answer within the related context. On the VLSP2021-MRC shared task dataset, we also empirically study several training strategies, including unanswerable-question sample selection and different adversarial training approaches, which boost performance by 2.5% in EM score and 1% in F1 score.
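The abstract does not describe how the unsupervised context selector scores text; a plausible minimal sketch is to rank context sentences by TF-IDF similarity to the question and keep the most relevant ones under a token budget, as below (the function names and the budget are assumptions, not the authors' implementation).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_context(question, sentences, max_tokens=384):
    """Rank context sentences by TF-IDF similarity to the question and keep
    the most relevant ones (in original order) under a token budget."""
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform([question] + sentences)
    scores = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()

    kept, budget = [], max_tokens
    for idx in scores.argsort()[::-1]:          # most similar sentences first
        cost = len(sentences[idx].split())
        if cost <= budget:
            kept.append(idx)
            budget -= cost
    return " ".join(sentences[i] for i in sorted(kept))

# Toy usage
ctx = ["Hanoi is the capital of Vietnam.",
       "The city hosts many universities.",
       "Pho is a popular noodle soup."]
print(select_context("What is the capital of Vietnam?", ctx, max_tokens=12))
```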
{"title":"ViMRC - VLSP 2021: An empirical study of Vietnamese Machine Reading Comprehension with Unsupervised Context Selector and Adversarial Learning","authors":"Minh Le Nguyen","doi":"10.25073/2588-1086/vnucsce.344","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.344","url":null,"abstract":"Machine Reading Comprehension (MRC) is a great NLP task that requires concentration on making the machine read, scan documents, and extract meaning from the text, just like a human reader.One of the MRC system challenges is not only having to understand the context to extract the answer but also being aware of the trust-worthy of the given question is possible or not.Thought pre-trained language models (PTMs) have shown their performance on many NLP downstream tasks, but it still has a limitation in the fixed-length input. We propose an unsupervised context selector that shortens the given context but still contains the answers within related contexts.In VLSP2021-MRC shared task dataset, we also empirical several training strategies consisting of unanswerable question sample selection and different adversarial training approaches, which slightly boost the performance 2.5% in EM score and 1% in F1 score.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131782706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VLSP 2021 - VieCap4H Challenge: Automatic Image Caption Generation for Healthcare Domain in Vietnamese
Pub Date: 2022-12-16. DOI: 10.25073/2588-1086/vnucsce.364
P. Phan
Machine reading comprehension (MRC) is a challenging Natural Language Processing (NLP) research field with wide real-world applications. The great progress of this field in recent years is mainly due to the emergence of large datasets for machine reading comprehension tasks and deep learning. For the Vietnamese language, several datasets exist, such as UIT-ViQuAD [1] and UIT-ViNewsQA [2], and most recently UIT-ViQuAD 2.0 [3], the dataset of the competitive VLSP 2021-MRC Shared Task. MRC systems must not only answer questions when necessary but also tactfully abstain from answering when no answer is available in the given passage. In this paper, we propose two types of joint models for answerability prediction and pure-MRC prediction, with and without a dependency mechanism to learn the correlation between the start position and end position in the pure-MRC output prediction. Besides, we use ensemble models and a verification strategy that votes for the best answer among the top K answers of different models. Our proposed approach is evaluated on the benchmark VLSP 2021-MRC Shared Task dataset, UIT-ViQuAD 2.0 [3], and is shown to be significantly better than the baseline.
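As an illustration of the verification strategy, the sketch below aggregates the top-K answers of several models by score-weighted voting; the exact voting rule used by the authors is not specified, so this scoring scheme is an assumption.

```python
from collections import defaultdict

def vote_best_answer(model_predictions):
    """Pick a final answer by score-weighted voting over the top-K answers
    of several models; each prediction is a list of (answer_text, score)."""
    tally = defaultdict(float)
    for top_k in model_predictions:
        for answer, score in top_k:
            # An empty string denotes "unanswerable" and competes like any answer
            tally[answer.strip().lower()] += score
    return max(tally.items(), key=lambda kv: kv[1])[0]

# Toy usage with three models' top-2 answers
preds = [[("Hà Nội", 0.9), ("", 0.1)],
         [("Hà Nội", 0.6), ("Hồ Chí Minh", 0.4)],
         [("", 0.7), ("Hà Nội", 0.3)]]
print(vote_best_answer(preds))   # -> "hà nội"
```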
{"title":"VLSP 2021 - VieCap4H Challenge: Automatic Image Caption Generation for Healthcare Domain in Vietnamese","authors":"P. Phan","doi":"10.25073/2588-1086/vnucsce.364","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.364","url":null,"abstract":"Machine reading comprehension (MRC) is a challenging Natural Language Processing (NLP) research fieldand wide real-world applications. The great progress of this field in recents is mainly due to the emergence offew datasets for machine reading comprehension tasks with large sizes and deep learning. For the Vietnameselanguage, some datasets, such as UIT-ViQuAD [1] and UIT-ViNewsQA [2], most recently, UIT-ViQuAD 2.0 [3] - adataset of the competitive VLSP 2021-MRC Shared Task 1 . MRC systems must not only answer questions whennecessary but also tactfully abstain from answering when no answer is available according to the given passage.In this paper, we proposed two types of joint models for answerability prediction and pure-MRC prediction with/without a dependency mechanism to learn the correlation between a start position and end position in pure-MRCoutput prediction. Besides, we use ensemble models and a verification strategy by voting the best answer from thetop K answers of different models. Our proposed approach is evaluated on the benchmark VLSP 2021-MRC SharedTask challenge dataset UIT-ViQuAD 2.0 [3] shows that our approach is significantly better than the baseline.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134316037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modern Approaches in Natural Language Processing
Pub Date: 2022-12-16. DOI: 10.25073/2588-1086/vnucsce.302
T. Quan
Natural Language Processing (NLP) is one of the major branches of the emerging field of Artificial Intelligence (AI). Classical approaches in this area were mostly based on parsing and information extraction techniques, which struggled when dealing with the very large textual datasets available in practical applications. This issue can potentially be addressed with the recent advancement of Deep Learning (DL) techniques, which naturally assume very large training datasets. In fact, NLP research has witnessed remarkable achievements with the introduction of word embedding techniques, which allow a document to be represented meaningfully as a matrix on which major DL models like CNNs or RNNs can be deployed effectively to accomplish common NLP tasks. Gradually, NLP scholars have kept developing specific models for their areas, notably attention-enhanced BiLSTM, the Transformer, and BERT. These models have introduced a new wave of modern approaches that frequently report breakthrough results and open up many novel research directions. The aim of this paper is to give readers a roadmap of those modern approaches in NLP, including their ideas, theories, and applications. This will hopefully offer a solid background for further research in this area.
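For readers new to the idea, the toy sketch below shows how word embeddings turn a token sequence into a (length x dimension) matrix that a CNN or RNN can consume; the vocabulary and random embedding table are purely illustrative stand-ins for pretrained vectors or a learned embedding layer.

```python
import numpy as np

# Illustrative only: a tiny vocabulary and random embedding table.
vocab = {"nlp": 0, "is": 1, "fun": 2, "<unk>": 3}
emb = np.random.default_rng(0).normal(size=(len(vocab), 4))  # 4-dim embeddings

def document_matrix(text):
    """Map each token to its embedding vector, yielding a (length x dim)
    matrix that a CNN or RNN can consume directly."""
    tokens = text.lower().split()
    rows = [emb[vocab.get(t, vocab["<unk>"])] for t in tokens]
    return np.stack(rows)

print(document_matrix("NLP is fun").shape)   # (3, 4)
```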
{"title":"N/A Modern Approaches in Natural Language Processing","authors":"T. Quan","doi":"10.25073/2588-1086/vnucsce.302","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.302","url":null,"abstract":"Natural Language Processing (NLP) is one of the major branches in the emerging field of Artificial Intelligence (AI). Classical approaches in this area were mostly based on parsing and information extraction techniques, which suffered from great difficulty when dealing with very large textual datasets available in practical applications. This issue can potentially be addressed with the recent advancement of the Deep Learning (DL) techniques, which are naturally assuming very large datasets for training. In fact, NLP research has witnessed a remarkable achievement with the introduction of Word Embedding techniques, which allows a document to be represented meaningfully as a matrix, on which major DL models like CNN or RNN can be deployed effectively to accomplish common NLP tasks. Gradually, NLP scholars keep developing specific models for their areas, notably attention-enhanced BiLSTM, Transformer and BERT. The births of those models have introduced a new wave of modern approaches which frequently report new breaking results and open much novel research directions. The aim of this paper is to give readers a roadmap of those modern approaches in NLP, including their ideas, theories and applications. This would hopefully offer a solid background for further research in this area.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130790134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
gPartition: An Efficient Alignment Partitioning Program for Genome Datasets
Pub Date: 2022-12-16. DOI: 10.25073/2588-1086/vnucsce.353
Le Kim Thu, Do Duc Dong, Bui Ngoc Thang, Hoang Thi Diep, Nguyen Phuong Thao, L. Vinh
Phylogenomics, or evolutionary inference based on genome alignment, is becoming prominent thanks to next-generation sequencing technologies. In model-based phylogenomics, the partition scheme has a significant impact on inference performance, both in terms of log-likelihoods and computation time. Therefore, finding an optimal partition scheme, or partitioning, is critical in a phylogenomic inference pipeline. To accomplish this, one needs to divide the alignment sites into disjoint partitions so that sites with similar evolutionary models are in the same partition. Computational partitioning is a recent approach of increasing interest due to its capability of modeling the site-rate heterogeneity within a single gene. State-of-the-art computational partitioning methods, such as mPartition or RatePartition, are, however, ineffective on long alignments of millions of sites. In this paper, we introduce gPartition, a new computational partitioning method leveraging both the site rate and the best-fit substitution model. We conducted experiments on recently published alignments to compare gPartition with mPartition and RatePartition. gPartition was orders of magnitude faster than the other methods. The AIC score demonstrated that gPartition produced partition schemes that were better than or comparable to those of mPartition. gPartition outperformed RatePartition on all examined alignments. We implemented our proposed method in the gPartition program to help researchers partition genome alignments with millions of sites more efficiently.
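The paper's exact partitioning algorithm is not reproduced here, but the sketch below illustrates the general idea of rate-based partitioning: sites are binned into disjoint partitions by quantiles of their estimated evolutionary rates, so that similarly evolving sites share a substitution model. The quantile binning and parameter names are assumptions for illustration only.

```python
import numpy as np

def partition_by_rate(site_rates, n_partitions=4):
    """Group alignment sites into disjoint partitions by quantiles of their
    estimated evolutionary rates, so similarly evolving sites share a model."""
    edges = np.quantile(site_rates, np.linspace(0, 1, n_partitions + 1))
    # digitize assigns each site to a rate bin using the interior quantile edges
    bins = np.clip(np.digitize(site_rates, edges[1:-1]), 0, n_partitions - 1)
    return [np.flatnonzero(bins == b) for b in range(n_partitions)]

# Toy usage: 12 sites with per-site rates estimated elsewhere (e.g., by a tree program)
rates = np.array([0.1, 0.2, 2.5, 0.15, 1.0, 3.0, 0.9, 0.05, 1.1, 2.8, 0.3, 0.95])
for i, sites in enumerate(partition_by_rate(rates)):
    print(f"partition {i}: sites {sites.tolist()}")
```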
{"title":"gPartition: An Efficient Alignment Partitioning Program for Genome Datasets","authors":"Le Kim Thu, Do Duc Dong, Bui Ngoc Thang, Hoang Thi Diep, Nguyen Phuong Thao, L. Vinh","doi":"10.25073/2588-1086/vnucsce.353","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.353","url":null,"abstract":"Phylogenomics, or evolutionary inference based on genome alignment, is becoming prominent thanks to next-generation sequencing technologies. In model-based phylogenomics, the partition scheme has a significant impact on inference performance, both in terms of log-likelihoods and computation time. Therefore, finding an optimal partition scheme, or partitioning, is critical in a phylogenomic inference pipeline. To accomplish this, one needs to divide the alignment sites into disjoint partitions so that the sites of similar evolutionary models are in the same partition. Computational partitioning is a recent approach of increasing interest due to its capability of modeling the site-rate heterogeneity within a single gene. State-of-the-art computational partitioning methods, such as mPartition or RatePartition, are, however, ineffective on long alignments of millions of sites. In this paper, we introduce gPartition, a new computational partitioning method leveraging both the site rate and the best-fit substitution model. We conducted experiments on recently published alignments to compare gPartition with mPartition and RatePartition. gPartition was orders of magnitude faster than other methods. The AIC score demonstrated that gPartition produced partition schemes that were better or comparable to mPartition. gPartition outperformed RatePartition on all examined alignments. We implemented our proposed method in the gPartition program to help researchers partition genome alignments with millions of sites more efficiently.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115075615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NER - VLSP 2021: Two Stage Model for Nested Named Entity Recognition
Pub Date: 2022-06-30. DOI: 10.25073/2588-1086/vnucsce.368
Quan Chu Quoc, Viola Van
Named entity recognition (NER) is a widely studied task in natural language processing. Recently, a growing number of studies have focused on nested NER. Span-based methods treat named entity recognition as a span classification task and can deal with nested entities naturally, but they suffer from a class imbalance problem because non-entity spans account for the majority of all spans. To address this issue, we propose a two-stage model for nested NER. We utilize an entity proposal module to filter out easy non-entity spans for efficient training. In addition, we combine all variants of the model to improve the overall accuracy of our system. Our method achieves first place on the Vietnamese NER shared task at the 8th International Workshop on Vietnamese Language and Speech Processing (VLSP) with an F1-score of 62.71 on the private test dataset. For research purposes, our source code is available at https://github.com/quancq/VLSP2021_NER
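A minimal sketch of the two-stage idea follows: enumerate candidate spans, let a cheap proposal scorer discard easy non-entity spans, then run the full classifier only on the survivors. The stand-in scoring functions are placeholders for the neural components, not the authors' actual models.

```python
def enumerate_spans(tokens, max_len=8):
    """All candidate spans up to max_len tokens; nested spans appear naturally."""
    return [(i, j) for i in range(len(tokens))
            for j in range(i + 1, min(len(tokens), i + max_len) + 1)]

def two_stage_ner(tokens, proposal_score, classify, threshold=0.5):
    """Stage 1: a cheap proposal scorer discards easy non-entity spans.
    Stage 2: a full classifier labels only the surviving candidates."""
    spans = enumerate_spans(tokens)
    proposals = [s for s in spans if proposal_score(tokens, s) >= threshold]
    return [(s, classify(tokens, s)) for s in proposals]

# Toy usage with stand-in scoring functions (real ones would be neural models)
toks = ["Đại", "học", "Quốc", "gia", "Hà", "Nội"]
score = lambda t, s: 1.0 if s in [(0, 6), (4, 6)] else 0.0   # pretend proposals
label = lambda t, s: "ORGANIZATION" if s == (0, 6) else "LOCATION"
print(two_stage_ner(toks, score, label))   # nested: whole phrase + "Hà Nội"
```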
{"title":"NER - VLSP 2021: Two Stage Model for Nested Named Entity Recognition","authors":"Quan Chu Quoc, Viola Van","doi":"10.25073/2588-1086/vnucsce.368","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.368","url":null,"abstract":"Named entity recognition (NER) is a widely studied task in natural language processing. Recently, a growing number of studies have focused on the nested NER. The span-based methods consider the named entity recognition as span classification task, can deal with nested entities naturally. But they suffer from class imbalance problem because the number of non-entity spans accounts for the majority of total spans. To address this issue, we propose a two stage model for nested NER. We utilize an entity proposal module to filter an easy non-entity spans for efficient training. In addition, we combine all variants of the model to improve overall accuracy of our system. Our method achieves 1st place on the Vietnamese NER shared task at the 8th International Workshop on Vietnamese Language and Speech Processing (VLSP) with F1-score of 62.71 on the private test dataset. For research purposes, our source code is available at https://github.com/quancq/VLSP2021_NER","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129671565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SV - VLSP 2021: Combine Attentive Statistical Pooling-based Xvector and Pretrained ECAPA-TDNN for Vietnamese Text-Independent Speaker Verification
Pub Date: 2022-06-30. DOI: 10.25073/2588-1086/vnucsce.320
T. Thang, Huynh Thi Thanh Binh
Recently, Xvectors and ECAPA-TDNN have been considered state-of-the-art models for designing speaker verification systems. This paper proposes a novel approach that combines an attentive statistics pooling-based Xvector and a pre-trained ECAPA-TDNN for Vietnamese speaker verification. Experiments are conducted on various recent Vietnamese speech datasets. The results show that our proposed combination outperformed all constituent models, with 4% to 37% relative EER improvement, and ranked second in Task 2 of the 2021 VLSP Speaker Verification competition.
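For reference, attentive statistics pooling computes an attention-weighted mean and standard deviation over frame-level features (Okabe et al., 2018); the PyTorch sketch below follows that standard formulation and is not necessarily identical to the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentiveStatsPooling(nn.Module):
    """Attention-weighted mean and standard deviation over time, the pooling
    used on top of frame-level x-vector/ECAPA features."""
    def __init__(self, channels, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv1d(channels, hidden, kernel_size=1), nn.Tanh(),
            nn.Conv1d(hidden, channels, kernel_size=1))

    def forward(self, x):                        # x: (batch, channels, time)
        w = torch.softmax(self.attn(x), dim=2)   # attention weights per frame
        mean = (w * x).sum(dim=2)
        var = (w * x ** 2).sum(dim=2) - mean ** 2
        std = torch.sqrt(var.clamp(min=1e-9))
        return torch.cat([mean, std], dim=1)     # (batch, 2 * channels)

# Toy usage: 80-dim frame features over 200 frames
feats = torch.randn(4, 80, 200)
print(AttentiveStatsPooling(80)(feats).shape)    # torch.Size([4, 160])
```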
{"title":"SV - VLSP 2021: Combine Attentive Statistical Pooling-based Xvector and Pretrained ECAPA-TDNN for Vietnamese Text-Independent Speaker Verification","authors":"T. Thang, Huynh Thi Thanh Binh","doi":"10.25073/2588-1086/vnucsce.320","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.320","url":null,"abstract":"Recently, Xvectors and ECAPA-TDNN have been considered state-of-the-art models in designing speaker verification systems. This paper proposes a novel approach that combines Attentive statistic pooling-based Xvector and pre-trained ECAPA-TDNN for Vietnamese speaker verification. Experiments are conducted on various recent Vietnamese speech datasets. The results portrayed that our proposed combination outperformed all constitutive models with 4% to 37% relative EER improvement and ranked second place in Task 2 of the 2021 VLSP Speaker Verification competition.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133550297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TTS - VLSP 2021: The Thunder Text-To-Speech System
Pub Date: 2022-06-30. DOI: 10.25073/2588-1086/vnucsce.342
N. Ngoc Anh, Nguyen Tien Thanh, Le Dang Linh
This paper describes our speech synthesis system participating in the Vietnamese Text-To-Speech track of the 2021 VLSP evaluation campaign. The goal of this challenge is to build a synthetic voice from a provided spontaneous speech corpus in Vietnamese. We present our implementation of the FastSpeech2 model on spontaneous speech, using a dedicated strategy for handling the spontaneous dataset in the TTS system. We describe how we generate mel-spectrograms from the given texts and then synthesize speech from the generated mel-spectrograms using a separately trained vocoder. In the evaluation, our team achieved a mean opinion score (MOS) of 3.943 on the in-domain test, 3.3 on the out-of-domain test, and 85.00% on the SUS test, which indicates the effectiveness of the proposed system.
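The two-stage pipeline the abstract describes (an acoustic model producing mel-spectrograms, then a separately trained vocoder producing audio) can be sketched as below; the class and method names are placeholders rather than a real library API.

```python
import numpy as np

class AcousticModel:                         # stands in for a trained FastSpeech2
    def text_to_mel(self, text: str) -> np.ndarray:
        n_frames = max(1, 10 * len(text.split()))
        return np.zeros((80, n_frames))      # 80 mel bins x frames (dummy output)

class Vocoder:                               # stands in for a trained neural vocoder
    def mel_to_wave(self, mel: np.ndarray, hop=256) -> np.ndarray:
        return np.zeros(mel.shape[1] * hop)  # one hop of samples per frame

def synthesize(text, acoustic=AcousticModel(), vocoder=Vocoder()):
    mel = acoustic.text_to_mel(text)         # stage 1: text -> mel-spectrogram
    return vocoder.mel_to_wave(mel)          # stage 2: mel-spectrogram -> audio

print(synthesize("xin chào").shape)          # number of output audio samples
```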
{"title":"TTS - VLSP 2021: The Thunder Text-To-Speech System","authors":"N. Ngoc Anh, Nguyen Tien Thanh, Le Dang Linh","doi":"10.25073/2588-1086/vnucsce.342","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.342","url":null,"abstract":"This paper describes our speech synthesis system participating in the Vietnamese Text-To-Speech track of the 2021 VLSP evaluation campaign. The goal of this challenge is to build a synthetic voice from a provided spontaneous speech corpus in Vietnamese. In this paper, we propose our implementation of FastSpeech2 model on spontaneous speech. We used a special strategy with spontaneous datasets using the TTS system. We present our utilization in generating mel-spectrograms from given texts and then synthesize speech from generated mel-spectrograms using a separately trained vocoder. In evaluation, our team achieved 3.943 mean score in MOS in-domain test, 3.3 in MOS out-domain test, and 85.00% SUS, which indicates the effectiveness of the proposed system.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125176794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VLSP 2021 - NER Challenge: Named Entity Recognition for Vietnamese
Pub Date: 2022-06-30. DOI: 10.25073/2588-1086/vnucsce.362
Ha My Linh, Do Duy Dao, Nguyen Thi Minh Huyen, Ngo The Quyen, Doan Xuan Dung
Named entities (NEs) are phrases that contain the names of persons, organizations, locations, times, quantities, email addresses, phone numbers, etc., in a document. Named Entity Recognition (NER) is a fundamental task that is useful in many applications, especially information extraction and question answering. Shared tasks on NER provide several reference datasets in many languages. In the 2016 and 2018 editions of the VLSP workshop series, reference NER datasets were published with only three main entity categories: person, organization, and location. At the VLSP 2021 workshop, another NER challenge was organized to deal with an extended set of 14 main entity types and 26 sub-entity types. This paper describes the published datasets and the evaluated systems in the framework of the VLSP 2021 evaluation campaign.
{"title":"VLSP 2021 - NER Challenge: Named Entity Recognition for Vietnamese","authors":"Ha My Linh, Do Duy Dao, Nguyen Thi Minh Huyen, Ngo The Quyen, Doan Xuan Dung","doi":"10.25073/2588-1086/vnucsce.362","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.362","url":null,"abstract":"Named entities (NE) are phrases that contain the names of persons, organizations, locations, times, quantities, email, phone number, etc., in a document. Named Entity Recognition (NER) is a fundamental task that is useful in many applications, especially in information extraction and question answering. Shared tasks on NER provides several reference datasets in many languages. In the 2016 and 2018 editions of the VLSP workshop series, reference NER datasets have been published with only three main entity categories: person, organization and location. At the VLSP 2021 workshop, another challenge on NER is organized for dealing with an extended set of 14 main entity types and 26 sub-entity types. This paper describes the published datasets and the evaluated systems in the framework of the VLSP 2021 evaluation campaign.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129025056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TTS - VLSP 2021: The NAVI’s Text-To-Speech System for Vietnamese
Pub Date: 2022-06-30. DOI: 10.25073/2588-1086/vnucsce.347
Nguyen Le Minh, An Quoc Do, Viet Q. Vu, Huyen Thuc Khanh Vo
The Association for Vietnamese Language and Speech Processing (VLSP) has organized a series of workshops intended to bring together researchers and professionals working in NLP and to attempt a synthesis of research on the Vietnamese language. One of the shared tasks held at the eighth workshop is TTS [14], using a dataset that consists only of spontaneous audio. This poses a challenge for current TTS models, since they only perform well when constructing reading-style speech (e.g., audiobooks). Moreover, the quality of the audio provided by the dataset has a huge impact on the performance of the model. Specifically, samples with noisy backgrounds or with multiple voices speaking at the same time will deteriorate the performance of our model. In this paper, we describe our approach to tackling this problem: we first preprocess the training data, then use it to train a FastSpeech2 [10] acoustic model with some replacements in the external aligner model, and finally use a HiFiGAN [4] vocoder to construct the waveform. According to the official evaluation of the VLSP 2021 competition in the TTS task, our approach achieves a 3.729 in-domain MOS, a 3.557 out-of-domain MOS, and a 79.70% SUS score. Audio samples are available at https://navi-tts.github.io/.
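The paper does not detail its preprocessing, but a typical cleanup for spontaneous recordings looks like the sketch below: resample, trim silence, and filter clips by duration; every step and threshold here is an illustrative assumption rather than the authors' pipeline.

```python
import librosa
import soundfile as sf

def preprocess_clip(in_path, out_path, sr=22050, top_db=30):
    wav, _ = librosa.load(in_path, sr=sr)               # resample to the training rate
    wav, _ = librosa.effects.trim(wav, top_db=top_db)   # strip leading/trailing silence
    sf.write(out_path, wav, sr)
    return len(wav) / sr                                 # duration in seconds, for filtering

# Keep only clips in a sane duration range; very short or very long clips often
# come from segmentation errors or overlapping speakers in spontaneous recordings.
# dur = preprocess_clip("raw/clip_001.wav", "clean/clip_001.wav")
# keep = 1.0 <= dur <= 15.0
```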
{"title":"TTS - VLSP 2021: The NAVI’s Text-To-Speech System for Vietnamese","authors":"Nguyen Le Minh, An Quoc Do, Viet Q. Vu, Huyen Thuc Khanh Vo","doi":"10.25073/2588-1086/vnucsce.347","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.347","url":null,"abstract":"The Association for Vietnamese Language and Speech Processing (VLSP) has organized a series of workshops intending to bring together researchers and professionals working in NLP and attempt a synthesis of research in the Vietnamese language. One of the shared tasks held at the eighth workshop is TTS [14] using a dataset that only consists of spontaneous audio. This poses a challenge for current TTS models since they only perform well constructing reading-style speech (e.g, audiobook). Not only that, the quality of the audio provided by the dataset has a huge impact on the performance of the model. Specifically, samples with noisy backgrounds or with multiple voices speaking at the same time will deteriorate the performance of our model. In this paper, we describe our approach to tackle this problem: we first preprocess the training data then use it to train a FastSpeech2 [10] acoustic model with some replacements in the external aligner model, finally we use HiFiGAN [4] vocoder to construct the waveform. According to the official evaluation of VLSP 2021 competition in the TTS task, our approach achieves 3.729 in-domain MOS, 3.557 out-of-domain MOS, and 79.70% SUS score. Audio samples are available at https://navi-tts.github.io/.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131040199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}