2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA): Latest Publications

Spectral analysis of English voiced palato-alveolar fricative /ʒ/ produced by Chinese Wu speakers
Jie Liang, Minyi Yu, Wenjun Chen
In this study, we investigated the production of the English voiced palato-alveolar fricative /ʒ/ by Chinese Wu speakers by statistically analyzing two spectral parameters of the native and accented sounds. Results showed that the voiced accented /ʒ/ was significantly influenced by the voiceless /ɕ/ of the Wu dialect rather than by its voiced /ʒ/; that female Wu speakers tended to over-palatalize the accented sound, suggesting that females may be more influenced by social stereotypes than males in second language acquisition; and that students with a higher level of English were more susceptible to the Wu dialect than those with a lower level, indicating that better phonological awareness does not necessarily lead to higher accuracy in phonetic production.
DOI: 10.1109/ICSDA.2017.8384457 (2017-11-01)
Citations: 0
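The abstract does not name its two spectral parameters; a common pair in fricative studies is the first two spectral moments, centre of gravity and spread. A minimal sketch of computing them, assuming Hann-windowed FFT frames (the paper's actual parameters and windowing are assumptions here):

```python
import numpy as np

def spectral_moments(frame, sr):
    """First two spectral moments of one frame: centre of gravity (Hz)
    and spread (Hz), computed from the normalised power spectrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    power = spectrum ** 2
    p = power / power.sum()                # treat spectrum as a probability mass
    cog = float(np.sum(freqs * p))         # centre of gravity
    spread = float(np.sqrt(np.sum(((freqs - cog) ** 2) * p)))
    return cog, spread

# Sanity check: a pure 2 kHz tone has its centre of gravity at 2 kHz.
sr = 16000
t = np.arange(2048) / sr
cog, spread = spectral_moments(np.sin(2 * np.pi * 2000 * t), sr)
```

For a real fricative, `cog` shifts upward with more anterior (e.g. over-palatalized) constrictions, which is what makes it usable for comparing native and accented /ʒ/.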
Spoken English fluency scoring using convolutional neural networks
Hoon Chung, Y. Lee, Sung Joo Lee, J. Park
In this paper, we propose a spoken English fluency scoring method that uses a Convolutional Neural Network (CNN) to learn the feature extraction and scoring model jointly from raw time-domain signal input. In general, automatic spoken English fluency scoring is composed of feature extraction and a scoring model. Feature extraction computes the feature vectors assumed to represent spoken English fluency, and the scoring model predicts the fluency score of an input feature vector. Although this conventional approach works well, it raises issues in feature extraction and model parameter optimization. First, because the fluency features are computed from human knowledge, some crucial representations present in a raw data corpus can be missed. Second, each component of the model is optimized separately, which can lead to suboptimal performance. To address these issues, we propose a CNN-based approach that extracts fluency features directly from a raw data corpus without hand-crafted engineering and optimizes all model parameters jointly. The effectiveness of the proposed approach is evaluated using the Korean-Spoken English Corpus.
DOI: 10.1109/ICSDA.2017.8384444 (2017-11-01)
Citations: 6
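As a rough illustration of the joint pipeline the abstract describes (features and scorer learned end-to-end from the raw waveform), here is a toy forward pass with random, untrained weights; the paper's actual CNN architecture is not given in the abstract, so every layer size here is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels, stride=4):
    """Valid 1-D convolution of a mono signal with a bank of kernels."""
    k = kernels.shape[1]
    starts = range(0, len(x) - k + 1, stride)
    frames = np.stack([x[s:s + k] for s in starts])   # (T, k)
    return frames @ kernels.T                          # (T, n_kernels)

def fluency_score(waveform, kernels, w, b):
    """Toy end-to-end scorer: conv features -> ReLU -> mean pool -> linear."""
    h = np.maximum(conv1d(waveform, kernels), 0.0)  # learned filterbank stand-in
    pooled = h.mean(axis=0)                         # utterance-level summary
    return float(pooled @ w + b)                    # scalar fluency score

kernels = rng.standard_normal((8, 64)) * 0.1   # 8 kernels of 64 samples each
w, b = rng.standard_normal(8), 0.0
score = fluency_score(rng.standard_normal(16000), kernels, w, b)
```

In the jointly trained setting, the gradient of a fluency loss would flow through `w` and `kernels` together, which is what removes the separate hand-crafted feature step.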
Pitch heights and contours of tones in Weifang Dialect: An experimental investigation
Xiangting Yin, Ying Chen
An experiment was conducted to investigate the pitch heights and contours of tones in Weifang Dialect. Eight middle-aged native speakers of Weifang Dialect participated in the experimental fieldwork. The tone types and values described in Documentation of Weifang Dialect [1] were taken as a reference. Acoustic data, including duration and fundamental frequency (F0), were extracted with a Praat script, ProsodyPro [2], to track the pitch heights and contours of each tone-syllable sequence in the stimuli. F0 values were normalized to Log-Z (LZ) scores to account for speakers' individual differences [3, 4]. Tone values extracted from the LZ scores were determined according to the five-scale annotation of tones [5]. The tones of Weifang Dialect were measured in the experiment as 324, 52, 44, and 41, in contrast to 213, 42, 55, and 21 in Documentation of Weifang Dialect. Syllable durations of the different tones were also examined and compared.
DOI: 10.1109/ICSDA.2017.8384461 (2017-11-01)
Citations: 0
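The Log-Z normalisation and the five-scale tone annotation can be sketched as follows. The equal-width banding over each speaker's LZ range is an assumption; the exact annotation criteria are in the paper's references [3-5]:

```python
import math

def log_z(f0_values):
    """Normalise a speaker's F0 samples (Hz) to Log-Z scores:
    z-score the log-F0 values against that speaker's own mean and SD."""
    logs = [math.log(f) for f in f0_values]
    mean = sum(logs) / len(logs)
    sd = (sum((x - mean) ** 2 for x in logs) / len(logs)) ** 0.5
    return [(x - mean) / sd for x in logs]

def five_scale(lz, lo, hi):
    """Map one LZ score onto the 1-5 tone-letter scale using five
    equal-width bands over the speaker's LZ range (an assumption)."""
    band = (hi - lo) / 5
    return min(5, max(1, int((lz - lo) // band) + 1))

f0 = [120, 150, 180, 220, 260]          # one speaker's F0 samples (Hz)
lz = log_z(f0)
tones = [five_scale(x, min(lz), max(lz)) for x in lz]
```

A tone value like "324" is then just the sequence of scale levels at the start, middle, and end of the syllable's pitch contour.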
Development of distant multi-channel speech and noise databases for speech recognition by in-door conversational robots
Youngjoo Suh, Younggwan Kim, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Yeon-Ji Choi, Hoirin Kim, Dae-Lim Choi, Yong-Ju Lee
In this paper, we present the method and procedure for collecting Korean distant multi-channel speech and noise databases, designed for developing a highly accurate distant speech recognition system for indoor conversational robot applications. The speech database was collected at four distant positions in an indoor room, furnished to acoustically simulate a living room, using a playback-and-recording method: an artificial mouth played the clean source speech data while three kinds of multi-channel microphone arrays recorded the distant speech data. The speech database consists of a read speech dataset and two conversational speech datasets. Additionally, the noise database consists of 12 types of indoor noise, collected at a single distant position with the same approach. These speech and noise databases can be used to create simulated noisy speech data reflecting various indoor acoustic conditions corrupted by room reverberation and additive noise.
DOI: 10.1109/ICSDA.2017.8384419 (2017-11-01)
Citations: 5
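The closing sentence describes the standard simulation recipe: convolve clean speech with a room impulse response, then add noise at a target SNR. A minimal sketch, with a toy exponential RIR and white noise standing in for the recorded databases (the paper's own simulation settings are not specified in the abstract):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_distant(clean, rir, noise, snr_db):
    """Reverberate clean speech with an RIR, then add noise scaled so the
    reverberant-signal-to-noise ratio equals snr_db."""
    reverbed = np.convolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverbed)]
    sig_pow = np.mean(reverbed ** 2)
    noise_pow = np.mean(noise ** 2)
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverbed + scale * noise

clean = rng.standard_normal(16000)       # stand-in for 1 s of clean speech
rir = np.exp(-np.arange(800) / 100.0)    # toy exponentially decaying RIR
noisy = simulate_distant(clean, rir, rng.standard_normal(16000), snr_db=10)
```

With a real database, `rir` would come from the measured room responses and `noise` from one of the 12 recorded indoor noise types.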
Spoken query for Qur'anic verse information retrieval
Taufik Ridwan, D. Lestari
Most information retrieval (IR) systems for the Qur'an use text as their input query, whether the query is represented in a romanized script or the Arabic script. The user is therefore required to know how to write the query. When searching for Qur'anic verses, a user may know how to pronounce the query but lack sufficient knowledge of Arabic script to write it. In this case, speech can serve as an alternative input to the IR system. In this work, we develop a spoken-query IR system whose automatic speech recognition component is based on Hidden Markov Model acoustic models and an n-gram language model. Both models are trained on all verses of the Qur'an. The Inference Network Model and the well-known Vector Space Model are employed for the IR system. For the speech recognition system, the average word error rate is 7.41% for closed speakers and 18.53% for open speakers. For the IR system, the best query formulation for the Inference Network is achieved with input queries consisting of two-word phrases, with an average Mean Reciprocal Rank of 0.922475, while for the Vector Space Model it is achieved with single-word queries, with an average Mean Reciprocal Rank of 0.9308.
DOI: 10.1109/ICSDA.2017.8384422 (2017-11-01)
Citations: 2
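Mean Reciprocal Rank, the retrieval metric reported above, averages the reciprocal of the rank at which the first relevant verse appears for each query:

```python
def mean_reciprocal_rank(ranks):
    """MRR over queries; each rank is the 1-based position of the first
    relevant result, or None when nothing relevant is retrieved."""
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)

# Three queries whose first correct verse appears at ranks 1, 2, and 4:
# (1 + 1/2 + 1/4) / 3 = 0.5833...
mrr = mean_reciprocal_rank([1, 2, 4])
```

An MRR of 0.9308, as reported for single-word Vector Space Model queries, means the correct verse is at or near the top of the list for almost every query.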
Detecting oxymoron in a single statement
Won Ik Cho, Woohyun Kang, Hyeon Seung Lee, N. Kim
This paper proposes a novel evaluation scheme for word vector representations using oxymorons, a special kind of contradiction arising from the semantic discrepancy between a pair of words. A proper word vector representation is expected to yield a strong result under the proposed scheme and evaluation.
DOI: 10.1109/ICSDA.2017.8384447 (2017-11-01)
Citations: 2
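One plausible reading of such a scheme is that under a good representation, an oxymoron-like pair should have strongly opposed vectors. A toy sketch with hand-made vectors and an illustrative threshold; the paper's actual detection criterion is not given in the abstract:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def looks_contradictory(u, v, threshold=-0.3):
    """Flag a word pair as oxymoron-like when their vectors point in
    opposing directions. Threshold and vectors below are illustrative."""
    return cosine(u, v) < threshold

# Hypothetical embeddings for "deafening" and "silence".
deafening = [0.9, 0.1, -0.2]
silence = [-0.8, -0.2, 0.1]
flag = looks_contradictory(deafening, silence)
```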
Construction and analysis of Indonesian-interviews deception corpus
Tifani Warnita, D. Lestari
In this paper, we present the first deception corpus in Indonesian, built to support deception detection based on a statistical machine learning approach, given the importance of data in related studies. We collected speech recordings, along with high-frame-rate video, from 30 subjects to develop the Indonesian Deception Corpus (IDC). Using financial motivation as its basic scenario, IDC consists of 5542 speech segments with a total duration of approximately 16 hours and 34 minutes. The corpus is imbalanced: the majority class consists of truth segments, which outnumber lie segments almost four to one. We also performed experiments using only the speech corpus and its transcriptions. Using a combination of paralinguistic, prosodic, and lexical features, with a Random Forest classifier and random undersampling (RUS), we obtained a best accuracy of 61.26% and an F-measure of 61.30%.
DOI: 10.1109/ICSDA.2017.8384472 (2017-11-01)
Citations: 0
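RUS (random undersampling) balances the roughly 4:1 truth-to-lie skew by discarding randomly chosen majority-class segments until the classes match. A minimal sketch, with a toy 80/20 split mirroring the corpus description:

```python
import random

def random_undersample(samples, labels, seed=0):
    """Drop majority-class items at random until all classes have as many
    items as the smallest class; returns (sample, label) pairs."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n = min(len(v) for v in by_class.values())
    out = []
    for y, items in by_class.items():
        for s in rng.sample(items, n):   # sample without replacement
            out.append((s, y))
    return out

# ~4:1 truth-to-lie imbalance, as in the corpus description.
samples = list(range(100))
labels = ["truth"] * 80 + ["lie"] * 20
balanced = random_undersample(samples, labels)
```

The cost of RUS is discarding data; its benefit is that the classifier no longer wins by always predicting "truth", which is why accuracy and F-measure end up close together here.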
Inclusion of manner of articulation to achieve improved phoneme classification accuracy for Bengali continuous speech
Tanmay Bhowmik, S. Mandal
In this experiment, a phoneme classification model was developed using a Deep Neural Network based framework. The experiment was conducted in two phases. In the first phase, the phoneme classification task was performed. The deep-structured model provided a good overall classification accuracy of 87.8%. Precision and recall values are reported for all phonemes, and a confusion matrix of all the Bengali phonemes was derived. Using the confusion matrix, the phonemes were classified into nine groups. These nine groups provided a better overall classification accuracy of 98.7%, and a new confusion matrix derived for them showed a lower confusion rate. In the second phase of the experiment, the nine groups were reclassified into 15 groups using knowledge of the manner of articulation, and the deep-structured model was retrained. The system then provided an overall classification accuracy of 98.9%, almost equal to that observed for the nine groups. But because the nine groups were redivided into 15, phoneme confusion within a single group decreased, leading to a better phoneme classification model.
DOI: 10.1109/ICSDA.2017.8384455 (2017-11-01)
Citations: 0
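Merging confusable phonemes into groups raises accuracy because within-group confusions no longer count as errors. A small worked example on a toy 4-phoneme confusion matrix (the real matrix covers all Bengali phonemes):

```python
import numpy as np

def grouped_accuracy(confusion, groups):
    """Overall accuracy after merging labels: confusion[i, j] counts true
    phoneme i recognised as phoneme j; groups[i] is phoneme i's group id.
    A prediction is correct whenever it lands in the true phoneme's group."""
    g = np.asarray(groups)
    correct = sum(confusion[i, j]
                  for i in range(confusion.shape[0])
                  for j in range(confusion.shape[1])
                  if g[i] == g[j])
    return correct / confusion.sum()

# Confusions mostly stay inside groups {phonemes 0,1} and {phonemes 2,3}.
conf = np.array([[8, 2, 0, 0],
                 [3, 7, 0, 0],
                 [0, 0, 9, 1],
                 [0, 1, 0, 9]])
phoneme_acc = np.trace(conf) / conf.sum()        # 33/40 = 0.825
group_acc = grouped_accuracy(conf, [0, 0, 1, 1]) # 39/40 = 0.975
```

This is the mechanism behind the jump from 87.8% phoneme accuracy to 98.7% with nine groups: the groups were chosen so that most confusions fall within a group.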
Development of text and speech corpus for an Indonesian speech-to-speech translation system
M. T. Uliniansyah, Hammam Riza, Agung Santosa, Gunarso, Made Gunawan, Elvira Nurfadhilah
This paper describes our natural language resources, especially the text and speech corpora used to develop an Indonesian speech-to-speech translation (S2ST) system. The corpora are used to create models for the Automatic Speech Recognition (ASR), Statistical Machine Translation (SMT), and Text-to-Speech (TTS) systems. They have been collected since 1987 from various sources and projects, such as the Multilingual Machine Translation System (MMTS), PAN Localization, ASEAN MT, and U-STAR. The text corpora were created either by collecting from online resources or by translating manually from textual sources, and the speech corpora come from several recording projects. The availability of these corpora enables us to develop an Indonesian speech-to-speech translation system.
DOI: 10.1109/ICSDA.2017.8384448 (2017-11-01)
Citations: 3
Rediscovering 50 years of discoveries in speech and language processing: A survey
J. Mariani, Gil Francopoulo, P. Paroubek, F. Vernier, Nam Kyun Kim, Moon Ju Jo, H. Kim
We have created the NLP4NLP corpus to study the content of scientific publications in the field of speech and natural language processing. It contains articles published in 34 major conferences and journals in that field over a period of 50 years (1965-2015), comprising 65,000 documents from 50,000 authors, including 325,000 references and representing approximately 270 million words. Most of these publications are in English; some are in French, German, or Russian. Some are open access; others have been provided by the publishers. In order to constitute and analyze this corpus, several tools have been used or developed. Some of them use Natural Language Processing methods that have themselves been published in the corpus, hence its name. Numerous manual corrections were necessary, which demonstrated the importance of establishing standards for uniquely identifying authors, publications, or resources. We have conducted various studies: the evolution over time of the number of articles and authors, collaborations between authors, citations between papers and authors, the evolution of research themes and identification of the authors who introduced them, measures of innovation and detection of epistemological ruptures, the use of language resources, and the reuse of articles and plagiarism, in the context of a global or comparative analysis between sources.
DOI: 10.1109/ICSDA.2017.8384413 (2017-11-01)
Citations: 3
2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA)