
Latest publications in VNU Journal of Science: Computer Science and Communication Engineering

ASR - VLSP 2021: Semi-supervised Ensemble Model for Vietnamese Automatic Speech Recognition
Pub Date : 2022-06-30 DOI: 10.25073/2588-1086/vnucsce.332
Phạm Việt Thành, Le Duc Cuong, Dao Dang Huy, Luu Duc Thanh, Nguyen Duc Tan, Dang Trung Duc Anh, Nguyen Thi Thu Trang
Automatic speech recognition (ASR) is making huge advances with the arrival of end-to-end architectures. Semi-supervised learning methods, which can exploit unlabeled data, have contributed greatly to the success of ASR systems, giving them the ability to surpass human performance. However, most research focuses on developing these techniques for English speech recognition, which raises concerns about their performance in other languages, especially in low-resource scenarios. In this paper, we propose a Vietnamese ASR system for the VLSP 2021 Automatic Speech Recognition Shared Task. The system is based on the Wav2vec 2.0 framework, combined with self-training and several data augmentation techniques. Experimental results show that, on the ASR-T1 test set of the shared task, our proposed model achieved second place with a Syllable Error Rate (SyER) of 11.08%.
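The semi-supervised component described above follows the generic self-training (pseudo-labeling) recipe. A minimal sketch of that loop, where `train`, `transcribe`, and `confidence` are hypothetical stand-ins for a real Wav2vec 2.0 training/inference stack (only the control flow is illustrated, not the authors' exact setup):

```python
# Generic self-training loop: transcribe unlabeled audio, keep only confident
# hypotheses as pseudo-labels, and retrain on the enlarged set.
# train / transcribe / confidence are hypothetical callables, not a real API.

def self_train(labeled, unlabeled, train, transcribe, confidence,
               rounds=3, threshold=0.9):
    """Iteratively grow the training set with confident pseudo-labels."""
    data = list(labeled)
    model = train(data)
    for _ in range(rounds):
        pseudo = []
        for audio in unlabeled:
            hyp = transcribe(model, audio)
            if confidence(model, audio, hyp) >= threshold:
                pseudo.append((audio, hyp))      # keep confident hypotheses only
        model = train(data + pseudo)             # retrain on labeled + pseudo-labeled
    return model
```

The confidence threshold is the key knob: too low and label noise accumulates, too high and little unlabeled data is used.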
Citations: 0
NER - VLSP 2021: A Span-Based Model for Named Entity Recognition Task with Co-teaching+ Training Strategy
Pub Date : 2022-06-30 DOI: 10.25073/2588-1086/vnucsce.328
Pham Hoai Phu Thinh, Vu Tran Duy, Do Tran Anh Duc
Named entities that contain other named entities are referred to as nested entities, and they commonly occur in news articles and other documents. However, most studies in Vietnamese named entity recognition ignore nested entities entirely. In this report, we describe our system for the VLSP 2021 evaluation campaign, which adapts a technique from dependency parsing to tackle the problem of nested entities. We also apply the Co-teaching+ technique to enhance overall performance and propose an ensemble algorithm to combine predictions. Experimental results show that the ensemble method achieves the best F1 score on the VLSP 2021 test set.
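Span-based models handle nesting because they score every candidate span independently, so two overlapping spans can both be accepted. A toy sketch of that enumeration, with `score_span` as a hypothetical stand-in for the model's span classifier:

```python
# Span-based NER idea: enumerate all candidate spans up to a maximum length and
# classify each one independently, so nested (overlapping) entities can both be
# predicted. score_span is a hypothetical classifier, not the paper's model.

def extract_entities(tokens, score_span, max_len=8):
    """Return all (start, end, label) spans the classifier accepts."""
    entities = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + 1 + max_len, len(tokens) + 1)):
            label = score_span(tokens[start:end])   # None means "not an entity"
            if label is not None:
                entities.append((start, end, label))
    return entities
```

For a sentence like "Hanoi University of Science", both the inner LOC span "Hanoi" and the outer ORG span can be returned, which a flat BIO tagger cannot do.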
Citations: 0
VLSP 2021 - ASR Challenge for Vietnamese Automatic Speech Recognition
Pub Date : 2022-06-30 DOI: 10.25073/2588-1086/vnucsce.356
Van Hai Do
Recently, Vietnamese speech recognition has attracted various research groups in both academia and industry. This paper presents the Vietnamese automatic speech recognition challenge of the eighth annual workshop on Vietnamese Language and Speech Processing (VLSP 2021). The challenge comprises two sub-tasks. The first, ASR-Task1, focuses on developing a full ASR pipeline from scratch using both labeled and unlabeled training data provided by the organizers. The second, ASR-Task2, focuses on spontaneous speech in real scenarios, e.g., meeting conversations and lecture speech. In ASR-Task2, participants can use all available data sources to develop their models without any limitations. Model quality is evaluated with the Syllable Error Rate (SyER) metric.
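SyER follows the standard error-rate definition: the edit distance between the reference and hypothesis syllable sequences, divided by the reference length. A sketch of that usual definition (the official VLSP scoring script may differ in normalization details):

```python
# Syllable Error Rate: Levenshtein distance between reference and hypothesis
# syllable sequences, normalized by the reference length. Vietnamese syllables
# are whitespace-separated, so a plain split() suffices for tokenization.

def syllable_error_rate(ref, hyp):
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)] / len(r)
```

For example, recognizing "tôi đi học" as "tôi học" is one deletion out of three reference syllables, a SyER of about 33%.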
Citations: 0
VLSP 2021 - TTS Challenge: Vietnamese Spontaneous Speech Synthesis
Pub Date : 2022-06-30 DOI: 10.25073/2588-1086/vnucsce.358
Nguyen Thi Thu Trang, H. Nguyen
Text-To-Speech (TTS) was one of nine shared tasks in the eighth annual international VLSP 2021 workshop. All three previous TTS shared tasks were conducted on read-speech datasets. However, the resulting synthetic voices were not natural enough for spoken dialog systems, where the computer must talk to a human in conversation. Speech datasets recorded in a spontaneous setting help a TTS system produce voices that are more natural in speaking style, speaking rate, intonation, and so on. Therefore, in this shared task, participants were asked to build a TTS system from a spontaneous speech dataset. This 7.5-hour dataset was collected from the channel of a famous YouTuber, "Giang ơi", and then pre-processed to build utterances and their corresponding texts. The main challenges this year were: (i) inconsistency in speaking rate, intensity, stress, and prosody across the dataset; (ii) background noise or speech mixed with other voices; and (iii) inaccurate transcripts. A total of 43 teams registered for this shared task, and 8 submissions were ultimately evaluated online with perceptual tests. Two types of perceptual tests were conducted: (i) a MOS test for naturalness and (ii) a SUS (Semantically Unpredictable Sentences) test for intelligibility. The best TTS system on the SUS intelligibility test had a syllable error rate of 15%, while the best MOS score on dialog utterances was 3.98, against 4.54 for natural speech, on a 5-point MOS scale. The prosody and speaking rate of the synthetic voices were similar to the natural ones. However, most TTS systems still produced some distorted segments and background noise, and half of them had a syllable error rate of at least 30%.
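MOS scores like the 3.98 reported above are per-system means of 1-5 listener ratings, usually reported with a confidence interval. A generic aggregation sketch using only the standard library (not the organizers' exact analysis; the normal-approximation 95% interval is a common convention):

```python
import statistics

# MOS aggregation: mean of 1-5 listener ratings for one system, plus a
# normal-approximation 95% confidence half-width. A generic sketch, not the
# VLSP organizers' exact scoring procedure.

def mos(ratings):
    mean = statistics.fmean(ratings)
    if len(ratings) > 1:
        half = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
    else:
        half = 0.0
    return mean, half

score, ci = mos([4, 5, 4, 3, 4, 5, 4, 4])   # toy ratings for one system
print(f"MOS = {score:.2f} ± {ci:.2f}")
```

With real tests, each system collects hundreds of such ratings, so the interval shrinks enough to separate systems whose means differ by a few tenths of a point.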
Citations: 2
TTS - VLSP 2021: Development of Smartcall Vietnamese Text-to-Speech
Pub Date : 2022-06-30 DOI: 10.25073/2588-1086/vnucsce.348
Nguyen Quoc Bao, Le Ba Hoai, N. Hoc, Dam Ba Quyen, Nguyen Thu Phuong
Recent advances in deep learning facilitate the development of end-to-end Vietnamese text-to-speech (TTS) systems with high intelligibility and naturalness, given a clean training corpus. With the rich supply of audio recordings on the Internet, TTS has excellent growth potential if it can exploit this data source. However, the quality of such data is often insufficient for training TTS systems, e.g., due to noisy audio. In this paper, we propose an approach that preprocesses noisy found data from the Internet and trains a high-quality TTS model on the processed data. The VLSP-provided training data was thoroughly preprocessed using 1) voice activity detection, 2) automatic speech recognition-based prosodic punctuation insertion, and 3) Spleeter, a source separation tool, to separate voice from background music. Moreover, we utilize a state-of-the-art TTS system based on the Conditional Variational Autoencoder with Adversarial Learning model. Our experiments showed that the proposed TTS system, trained on the preprocessed data, achieved good results on the provided noisy dataset.
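Step 1) of the pipeline above, voice activity detection, can be illustrated with a toy frame-energy detector. A production system would use a trained VAD; this sketch only shows the segmentation idea of keeping consecutive frames whose energy exceeds a threshold:

```python
# Toy energy-based VAD: split audio into fixed-length frames, mark a frame as
# voiced when its mean squared amplitude exceeds a threshold, and merge runs of
# voiced frames into (start, end) sample ranges. Illustrative only; real
# pipelines use trained VAD models.

def vad_segments(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample ranges of consecutive voiced frames."""
    segments, start = [], None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= threshold and start is None:
            start = i * frame_len                       # voiced run begins
        elif energy < threshold and start is not None:
            segments.append((start, i * frame_len))     # voiced run ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments
```

At a 16 kHz sampling rate, `frame_len=160` corresponds to 10 ms frames, a common VAD granularity.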
Citations: 0
ASR - VLSP 2021: An Efficient Transformer-based Approach for Vietnamese ASR Task
Pub Date : 2022-06-30 DOI: 10.25073/2588-1086/vnucsce.325
Toan Truong Tien
Various techniques have been applied to enhance automatic speech recognition over the last few years. Its strong performance in natural language processing has made the Transformer architecture the de facto standard in numerous domains. This paper first presents our effort to collect a 3,000-hour Vietnamese speech corpus. We then introduce the system used for VLSP 2021 ASR Task 2, which is based on the Transformer. Our simple method achieves a favorable syllable error rate of 6.72% and placed second on the private test. Experimental results indicate that the proposed approach outperforms traditional methods, with lower syllable error rates on general-domain evaluation sets. Finally, we show that applying Vietnamese word segmentation to the labels does not improve the efficiency of the ASR system.
Citations: 0
ASR - VLSP 2021: Automatic Speech Recognition with Blank Label Re-weighting
Pub Date : 2022-06-30 DOI: 10.25073/2588-1086/vnucsce.321
T. Thang, Dang Dinh Son, Le Dang Linh, Dang Xuan Vuong, Duong Quang Tien
End-to-end models have significant potential in most languages and have recently proved robust in ASR tasks. Among the many architectures proposed, the Recurrent Neural Network Transducer (RNN-T) has shown remarkable success. However, with background noise or reverberation in spontaneous speech, this architecture generally suffers from high deletion-error rates. For this reason, we propose a blank label re-weighting technique to improve the state-of-the-art Conformer transducer model. Our system also adopts the Stochastic Weight Averaging approach, stabilizing the training process. Our work achieved first rank with a word error rate of 4.17% in Task 2 of the VLSP 2021 competition.
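One simple way to realize blank re-weighting at decode time is to subtract a fixed penalty from the blank label's log-probability before choosing the output symbol, making the decoder less eager to emit blank and hence less prone to deletions. The paper's exact formulation may differ; this sketch only illustrates the mechanism:

```python
# Blank re-weighting at decode time: down-weight the blank token's
# log-probability by a fixed penalty before taking the argmax. A mechanism
# sketch with an assumed fixed penalty, not the paper's exact formulation.

def reweighted_argmax(log_probs, blank_id=0, blank_penalty=1.0):
    """Pick the best output symbol after down-weighting the blank label."""
    adjusted = list(log_probs)
    adjusted[blank_id] -= blank_penalty     # make blank less attractive
    return max(range(len(adjusted)), key=adjusted.__getitem__)
```

With a penalty of zero this reduces to the ordinary greedy choice; increasing the penalty trades deletion errors for insertion errors, so the value is typically tuned on a development set.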
Citations: 3
SV - VLSP2021: The Smartcall - ITS’s Systems
Pub Date : 2022-06-30 DOI: 10.25073/2588-1086/vnucsce.339
Hung Van Dinh, Tuan Van Mai, Quyen B. Dam, Bao Quoc Nguyen
This paper presents the Smartcall - ITS systems submitted to the Speaker Verification (SV) task of the Vietnamese Language and Speech Processing (VLSP) workshop. The challenge consists of two tasks, focusing on the development of SV models with limited data and on testing the robustness of SV systems. In both tasks, we used various pre-trained speaker embedding models with different architectures: TDNN and ResNet34. After a fine-tuning strategy on data from the organisers, our systems ranked first in both tasks, with Equal Error Rates of 1.755% and 1.95%, respectively. The remainder of this paper describes the systems we developed for both tasks of the VLSP2021 Speaker Verification shared task.
Citations: 0
VLSP 2021 - SV challenge: Vietnamese Speaker Verification in Noisy Environments
Pub Date : 2022-06-30 DOI: 10.25073/2588-1086/vnucsce.333
Vi Thanh Dat, Phạm Việt Thành, Nguyen Thi Thu Trang
VLSP 2021 is the eighth annual international workshop, whose campaign was organized at the University of Information Technology, Vietnam National University, Ho Chi Minh City (UIT-VNU-HCM). This was the first time we organized the Speaker Verification shared task, with two subtasks, SV-T1 and SV-T2. SV-T1 focuses on the development of SV models with limited data, and SV-T2 on testing the capability and robustness of SV systems. To boost the development of robust models, we collected, processed, and published a speaker dataset recorded in noisy environments, containing 50 hours of speech and more than 1,300 speaker identities. A total of 39 teams registered for this shared task, 15 teams received the dataset, and 7 teams ultimately submitted final solutions. The best solution leveraged English pre-trained models and achieved Equal Error Rates of 1.755% and 1.950% for SV-T1 and SV-T2, respectively.
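The Equal Error Rate used above is the operating point where the false-acceptance rate (impostor trials accepted) equals the false-rejection rate (target trials rejected). A sketch of how it is computed from scored verification trials:

```python
# Equal Error Rate: sweep the decision threshold over all trial scores, compute
# the false-rejection rate (FRR) on target trials and the false-acceptance rate
# (FAR) on impostor trials, and return the rate where they cross. Assumes both
# trial lists are non-empty.

def equal_error_rate(target_scores, impostor_scores):
    best = (2.0, None)                      # (smallest |FAR - FRR|, EER)
    for t in sorted(target_scores + impostor_scores):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best[0]:
            best = (gap, (far + frr) / 2)   # EER = midpoint at the crossing
    return best[1]
```

Perfectly separated target and impostor scores give an EER of 0; overlapping score distributions push it upward, so the 1.755% figure above reflects very little overlap.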
Citations: 4
Ultra-High-Throughput Multi-Core AES Encryption Hardware Architecture
Pub Date : 2021-11-11 DOI: 10.25073/2588-1086/vnucsce.290
Pham-Khoi Dong, Hung K. Nguyen, F. Hussin, Xuan-Tu Tran
Securing high-speed data transfer between devices is always a big challenge. Meanwhile, new data transfer standards such as IEEE P802.3bs-2017 specify data rates of up to 400 Gbps. Encryption hardware therefore needs high throughput to keep up with these data rates and low latency to ensure quality of service. In this paper, we propose a multi-core AES encryption hardware architecture that achieves ultra-high-throughput encryption. To reduce area cost and power consumption, the cores share the same KeyExpansion blocks. A fully parallel, outer-round pipeline technique is also applied to achieve low-latency encryption. The design was modelled at the register-transfer level (RTL) in VHDL and then synthesized in a 45 nm CMOS technology using Synopsys Design Compiler. With 10 fully parallel cores and the outer-round pipeline, implementation results show that our architecture achieves a throughput of 1 Tbps at a maximum operating frequency of 800 MHz. These results meet the speed requirements of future communication standards. In addition, our design achieves a power efficiency of 2377 Gbps/W and an area efficiency of 833 Gbps/mm2, which are 2.6x and 4.5x higher, respectively, than the best previously reported single-core AES throughput design.
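The headline 1 Tbps figure follows directly from the architecture, assuming (as an outer-round pipeline allows) that each core absorbs one 128-bit AES block per clock cycle. A quick sanity check of that arithmetic:

```python
# Throughput sanity check: assuming each pipelined core accepts one 128-bit AES
# block per clock, total throughput = cores x block size x clock frequency.

cores, block_bits, f_hz = 10, 128, 800e6
throughput_bps = cores * block_bits * f_hz
print(f"{throughput_bps / 1e12:.3f} Tbps")   # 1.024 Tbps, i.e. ~1 Tbps
```

The same arithmetic explains why the outer-round pipeline matters: an unpipelined AES-128 core needs 10+ cycles per block, which would cut the figure by an order of magnitude.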
设备间高速数据传输的安全问题一直是一个巨大的挑战。另一方面,IEEE P802.3bs 2017等新的数据传输标准规定了最高400gbps的数据传输速率。因此,安全加密需要高吞吐量来满足数据传输速率,同时需要低延迟来保证服务质量。在本文中,我们提出了一个多核AES加密硬件架构,以实现超高吞吐量加密。为了降低面积成本和功耗,这些核心共享相同的KeyExpansion模块。完全并行的外圆管道技术也被应用到该架构中,以实现低延迟加密。该设计已在VHDL的RTL (Register-Transfer-Level)中建模,然后使用Synopsys design Compiler使用CMOS 45nm技术进行合成。采用10核全并行和外圆管道,实现结果表明,我们的架构在800mhz的最高工作频率下实现了1 Tbps的吞吐量。这些结果满足了未来通信标准对速度的要求。此外,我们的设计还实现了2377 Gbps/W的高功率效率和833 Gbps/mm2的面积效率,分别比其他最高吞吐量的单核AES高2.6倍和4.5倍。
{"title":"Ultra-High-Throughput Multi-Core AES Encryption Hardware Architecture","authors":"Pham-Khoi Dong, Hung K. Nguyen, F. Hussin, Xuan-Tu Tran","doi":"10.25073/2588-1086/vnucsce.290","DOIUrl":"https://doi.org/10.25073/2588-1086/vnucsce.290","url":null,"abstract":"Security issues in high-speed data transfer between devices are always a big challenge. On the other hand, new data transfer standards such as IEEE P802.3bs 2017 stipulate the maximum data rate up to 400 Gbps. So, security encryptions need high throughput to meet data transfer rates and low latency to ensure the quality of services. In this paper, we propose a multi-core AES encryption hardware architecture to achieve ultra-high-throughput encryption. To reduce area cost and power consumption, these cores share the same KeyExpansion blocks. Fully parallel, outer round pipeline technique is also applied to the proposed architecture to achieve low latency encryption. The design has been modelled at RTL (Register-Transfer-Level) in VHDL and then synthesized with a CMOS 45nm technology using Synopsys Design Compiler. With 10-cores fully parallel and outer round pipeline, the implementation results show that our architecture achieves a throughput of 1 Tbps at the maximum operating frequency of 800 MHz. These results meet the speed requirements of future communication standards. 
In addition, our design also achieves a high power-efficiency of 2377 Gbps/W and area-efficiency of 833 Gbps/mm2, that is 2.6x and 4.5x higher than those of the other highest throughput of single-core AES, respectively.","PeriodicalId":416488,"journal":{"name":"VNU Journal of Science: Computer Science and Communication Engineering","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114604347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
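The headline figures in this abstract can be cross-checked with simple arithmetic. Assuming each of the 10 AES cores emits one 128-bit block per clock cycle once the outer-round pipeline is full (a common throughput model for fully pipelined AES, not stated explicitly in the abstract), aggregate throughput is cores × block size × frequency. A minimal sketch:

```python
# Sanity check of the reported figures, under the assumed model that
# each core produces one 128-bit AES block per cycle when the
# outer-round pipeline is full.
CORES = 10
BLOCK_BITS = 128          # AES block size in bits
FREQ_HZ = 800e6           # reported maximum operating frequency

throughput_gbps = CORES * BLOCK_BITS * FREQ_HZ / 1e9
print(f"throughput ~ {throughput_gbps:.0f} Gbps")  # ~ 1024 Gbps, i.e. ~1 Tbps

# Power and area implied by the reported efficiency figures.
power_mw = throughput_gbps / 2377 * 1000   # 2377 Gbps/W -> mW
area_mm2 = throughput_gbps / 833           # 833 Gbps/mm^2 -> mm^2
print(f"implied power ~ {power_mw:.0f} mW, implied area ~ {area_mm2:.2f} mm^2")
```

The first line reproduces the claimed 1 Tbps figure; the second back-solves the efficiency numbers into an implied power of roughly 431 mW and an implied core area of roughly 1.23 mm2, which is consistent with a compact multi-core layout at 45 nm.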