
2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE): Latest Publications

Context-dependent grapheme-to-phoneme evaluation corpus using flexible contexts and Categorial Matrix
C. Hansakunbuntheung, Sumonmas Thatphithakkul
Context-dependent pronunciation, e.g. homographs, is a difficult issue in grapheme-to-phoneme conversion (G2P). It degrades accuracy in speech synthesis and speech recognition. However, context-dependent pronunciation is rarely considered when collecting pronunciation corpora for evaluating G2P accuracy. This paper therefore proposes a context-dependent pronunciation corpus of grapheme-phoneme pairs with their context information for G2P assessment. The context information includes: 1) a Categorial Matrix representing the orthographic types and usage domains of orthographic groups (OGs), designed to investigate problem categories in G2P; 2) regular-expression-based flexible contexts representing context variation; and 3) OG classes representing interchangeable OGs within a flexible context. The flexible contexts and OG classes are designed to remove redundant contexts while covering context variation with minimal sets of patterns. Using the proposed corpus, automatic context generation for G2P evaluation can be implemented.
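As a rough illustration of the flexible-context idea described in this abstract, the Python sketch below expands a hypothetical `<CLASS>` placeholder into a regex alternation over interchangeable orthographic groups and uses the resulting context to score a toy G2P function; the OG class, the entry format, and `toy_g2p` are all invented for illustration and are not the authors' implementation.

```python
import re

# Hypothetical OG class: interchangeable orthographic groups that may fill
# one slot of a flexible context (illustrative only).
OG_CLASSES = {"HAVE": ["has", "had", "have"]}

def expand_context(pattern: str) -> str:
    """Replace <CLASS> placeholders with a regex alternation over the class."""
    return re.sub(r"<(\w+)>",
                  lambda m: "(?:" + "|".join(OG_CLASSES[m.group(1)]) + ")",
                  pattern)

# One assumed evaluation entry: target grapheme, expected phonemes, and a
# regular-expression-based flexible context that selects the intended sense.
ENTRY = {
    "grapheme": "read",
    "expected": "r eh d",                 # past-participle pronunciation
    "context": r"<HAVE>\s+(read)\b",
}

def toy_g2p(word: str) -> str:
    """Stand-in for the G2P system under test (deliberately context-blind)."""
    return "r iy d"

def evaluate(entry, sentences):
    pattern = re.compile(expand_context(entry["context"]), re.IGNORECASE)
    hits = correct = 0
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            hits += 1
            correct += toy_g2p(match.group(1)) == entry["expected"]
    return hits, correct

print(evaluate(ENTRY, ["She had read the book.", "I read books daily."]))
# -> (1, 0): one context-matched test case, which the toy G2P gets wrong.
```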
Citations: 2
The recognition of neutral tone across acoustic cues
Shanshan Fan, Ao Chen, Ai-jun Li
In Standard Chinese, both F0 and duration are important acoustic cues for neutral tone perception. The present study focuses on the acoustic cues that contribute to neutral tone perception by examining the interplay between acoustic cues and other factors, including lexical status and the underlying tones. Stimuli were manipulated according to different acoustic cues, yielding three conditions: duration (D), F0 (P), or both duration and F0 (DP). The results show that 1) both duration and F0 are necessary for neutral tone perception; 2) although F0 is the most reliable cue, F0 alone is not sufficient for neutral tone identification; 3) real and pseudo words behave differently, which probably reflects distinct processing mechanisms in language networks; 4) duration serves as a more reliable cue than F0 when the underlying tone is T3; 5) a paradigm effect was found in the P condition: F0 showed more reliability in the ABX paradigm.
Citations: 3
Noise-robust and stress-free visualization of pronunciation diversity of World Englishes using a learner's self-centered viewpoint
Yuichi Sato, Yosuke Kashiwagi, N. Minematsu, D. Saito, K. Hirose
The term “World Englishes” describes the current, real state of English, and one of its main characteristics is a large diversity of pronunciation, i.e., accents. We have developed two techniques: individual-based clustering of this diversity [1, 2] and educationally effective visualization of the diversity [3]. Accent clustering requires a technique to quantify the accent gap between any speaker pair, and visualization requires a technique for stress-free plotting of the speakers. In the above studies, however, we developed and assessed these two techniques independently; in this paper, we assess our technique of automatic accent gap prediction when it is used for our stress-free visualization. Further, since CALL applications today are not always used in a quiet environment, we introduce a feature enhancement (denoising) technique to improve the noise robustness of accent gap prediction. Results show that our accent gap prediction has a correlation of 0.77 with IPA-based, manually defined accent gaps and that, by applying feature enhancement to noisy input utterances, our technique can predict the accent gap that would be obtained in a clean condition when the SNR is larger than 10 dB.
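As a hedged illustration (with made-up numbers, not the paper's data), the sketch below shows how the reported correlation between automatically predicted accent gaps and IPA-based manual gaps could be computed over speaker pairs, together with a plain SNR calculation of the kind used to decide whether a noisy utterance falls in the reliable range (above roughly 10 dB).

```python
import numpy as np

# Hypothetical accent gaps for six speaker pairs (illustrative values only).
manual_gap    = np.array([0.10, 0.35, 0.22, 0.60, 0.48, 0.15])  # IPA-based
predicted_gap = np.array([0.12, 0.30, 0.28, 0.55, 0.52, 0.20])  # automatic

# Pearson correlation between the manual and automatic gap measures.
r = np.corrcoef(manual_gap, predicted_gap)[0, 1]
print(f"correlation: {r:.2f}")

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise ratio in dB from separate signal and noise segments."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 200, 16000))     # toy "speech" segment
noise = 0.1 * rng.standard_normal(16000)       # toy noise segment
print(f"SNR: {snr_db(clean, noise):.1f} dB")
```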
Citations: 1
Stress annotated Urdu speech corpus to build female voice for TTS
B. Mumtaz, Saba Urooj, S. Hussain, Wajiha Habib
This research describes the stress annotation process for a two-hour Urdu speech corpus containing 18,640 words and 28,866 syllables, used to build a natural voice for a text-to-speech (TTS) system. For the stress annotation of the speech corpus, two algorithms, i.e. phonological and acoustic stress marking, were tested against perceptual stress marking. The Urdu phonological stress marking algorithm [1] reports 70% accuracy, whereas the Urdu acoustic stress marking algorithm developed through this research reports 81.2% accuracy. The acoustic stress marking algorithm is then used to annotate the two hours of Urdu speech. It is a semi-automatic algorithm, which annotates 54% of the data automatically using the duration cue, whereas the remaining 46% is marked manually using the acoustic cues of pitch, glottalization and intensity.
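The sketch below illustrates, under assumed duration thresholds (not the authors' published values), the semi-automatic flavor of this approach: syllables with clearly long or clearly short durations are labeled automatically from the duration cue, and ambiguous cases are deferred to manual annotation with pitch, glottalization and intensity.

```python
from dataclasses import dataclass

@dataclass
class Syllable:
    text: str
    duration_ms: float

def mark_stress(word, long_ms=220.0, short_ms=140.0):
    """Label each syllable from the duration cue alone; None means the case
    is ambiguous and is sent to a human annotator (hypothetical thresholds)."""
    labels = []
    for syl in word:
        if syl.duration_ms >= long_ms:
            labels.append((syl.text, "stressed"))
        elif syl.duration_ms <= short_ms:
            labels.append((syl.text, "unstressed"))
        else:
            labels.append((syl.text, None))   # defer to the manual pass
    return labels

word = [Syllable("ki", 130.0), Syllable("taab", 240.0), Syllable("ein", 180.0)]
print(mark_stress(word))
# [('ki', 'unstressed'), ('taab', 'stressed'), ('ein', None)]
```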
Citations: 4
Information content, weighting and distribution in continuous speech prosody - A cross-genre comparison
Helen Kai-Yun Chen, Wei-te Fang, Chiu-yu Tseng
This study explores the composition of information content in continuous speech using data from a diversity of speech genres. Our approach is to measure information weighting, distribution and correlative expressiveness through perceived prosodic prominences in continuous speech across data of four different styles. This alternative perspective differs from reported studies on emotion-related prosodic expressions and is based mainly on the assumption that patterned prominences are also positively correlated with the allocation and weighted loading of information, but only at higher levels of discourse units. Four speech genres, i.e., two styles of read speech vs. two of spontaneous speech, annotated with perceived prominences at four relative degrees, are compared. Information allocation and weighting are calculated using both frequency counts of prominence patterns and weighting scores assigned by prominence level. The most revealing results are found in the data of spontaneous conversation, which feature more varieties of emphasis patterns as a result of constant reduction. Far more significantly, the conversation data also show that while their paragraph-level prosodic units carry the least information content, the discourse-level prosodic units exhibit the highest information weighting scores. In other words, one major but less known distinctive feature of conversational speech is its large amount of information content, which only surfaces when examined at the highest level of discourse-prosodic unit. We believe the results further our understanding of prosodic expression in continuous speech in general and spontaneous conversation in particular, and could readily be utilized in many speech-technology-related implementations.
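As a rough sketch of the two measures mentioned above (the scoring rule and the data are invented, not the paper's annotation scheme), the snippet below counts how often each prominence pattern occurs and assigns each prosodic unit a score derived from its perceived prominence levels.

```python
from collections import Counter

# Hypothetical prosodic units: tuples of per-syllable prominence degrees (0-3).
units = [
    (0, 2, 1, 3),
    (1, 0, 2, 0),
    (0, 2, 1, 3),
]

# Frequency count of prominence patterns across the data.
pattern_counts = Counter(units)

def weighting_score(unit):
    """Assumed scoring rule: mean prominence degree of the unit."""
    return sum(unit) / len(unit)

for unit, count in pattern_counts.most_common():
    print(unit, "count:", count, "score:", round(weighting_score(unit), 2))
```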
Citations: 5
Elicit spoken-style data from social media through a style classifier
A. Chotimongkol, Vataya Chunwijitra, Sumonmas Thatphithakkul, Nattapong Kurpukdee, C. Wutiwiwatchai
We explore the use of social media data to reduce the effort in developing a conversational speech corpus. The LOTUS-SOC corpus is created by recording Twitter messages through a mobile application. In the first phase, which took around one month, 172 hours of speech from 208 speakers were recorded and made ready for use without the need for speech segmentation or transcription. In terms of similarity to spoken language, the perplexity of LOTUS-SOC with respect to known spoken utterances is lower than that of a broadcast news corpus and almost as low as that of a telephone conversation corpus. We also applied a style classifier trained on words and parts of speech using two machine learning approaches, SVM and CRF, to identify spoken-style utterances in LOTUS-SOC. By training a language model on only the utterances classified as “spoken”, the perplexity of LOTUS-SOC was further reduced, as evaluated on three different sets of spoken utterances.
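The perplexity comparison can be pictured with the toy sketch below: a simple add-one-smoothed unigram model (the paper does not state the model order, so this is only an assumption) is trained on reference spoken utterances and evaluated on candidate text; lower perplexity means the candidate text looks more like the spoken reference.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Return an add-one-smoothed unigram probability function."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1                                 # +1 for unseen words
    return lambda w: (counts[w] + 1) / (total + vocab)

def perplexity(prob, sentences):
    """Per-word perplexity of the sentences under the given word model."""
    logp, n = 0.0, 0
    for s in sentences:
        for w in s.split():
            logp += math.log(prob(w))
            n += 1
    return math.exp(-logp / n)

reference = ["i think it is fine", "yes it is", "i think so too"]  # toy "spoken"
candidate = ["it is fine i think", "is it so"]                     # toy corpus text
print(round(perplexity(train_unigram(reference), candidate), 2))
```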
Citations: 5
A comparison study on contextual modeling for estimating functional loads of phonological contrasts
Bin Wu, Yanlu Xie, Jinsong Zhang
Functional load (FL) is a quantitative measure of the importance of phonological contrasts, which stand for the differentiation of communicative linguistic units. Correct estimation of FLs is useful for studies of speech recognition, language evolution, language teaching, etc. Conventional approaches use phonological transcriptions and unigram probabilities for the estimation and are hence weak in contextual modeling. Based on the measurement of mutual information (MI) between a text and its phonological transcription, we previously proposed a novel FL measurement that utilizes n-gram word probabilities and hence has better context-modeling power. In this study, we compare the effects of different contexts on the estimation of FL: syllable, word, n-gram word model, and open data. Experimental results show that the wider the context modeling, the smaller the FL, and that FL based on MI with the trigram model achieves the best performance in modeling context in our experiments. Compared with entropy-based FL, MI-based FL shows smaller values and is applicable to open data.
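For orientation, the sketch below computes the classic entropy-based functional load, FL(x, y) = (H(L) - H(L_xy)) / H(L), from syllable unigram counts; the paper's MI-based, n-gram formulation is different, so this should be read only as a baseline illustration of the contrast-merging idea on a toy Pinyin-like corpus.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def merge(syllable, x, y):
    """Collapse the contrast between phonemes x and y inside a syllable string."""
    return syllable.replace(y, x)

def functional_load(syllables, x, y):
    """Relative drop in entropy when the x/y contrast is neutralized."""
    h_orig = entropy(Counter(syllables))
    h_merged = entropy(Counter(merge(s, x, y) for s in syllables))
    return (h_orig - h_merged) / h_orig

# Toy corpus: merging the n/l contrast collapses 'ni' and 'li'.
corpus = ["ni", "li", "ni", "ma", "ma", "ta", "li", "ni"]
print(round(functional_load(corpus, "n", "l"), 3))
```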
Citations: 1
Tonal alignment in Shanghai Chinese
Bijun Ling, Jie Liang
In this paper, we investigate tonal alignment in open syllables (CV) and closed syllables (CVʔ) starting with the nasal consonant /m/ and carrying rising/falling F0 contours in Shanghai Chinese. Results show that a glottal coda shortens vowel duration significantly and, in order to preserve the isochronism of the syllable, the duration of the nasal consonant /m/ shows a significant compensatory lengthening effect, which makes the consonant longer than the vowel in closed syllables. As the onset of the tone (rise/fall) normally stays around the center of the host syllable [12], the tonal onset in closed syllables (T5) is located within the nasal consonant /m/, which indicates that the implementation of tone starts from the onset of its host syllable rather than from the onset of the rhyme, and verifies that the whole syllable is the tone carrier.
Citations: 2
Construction and analysis of social-affective interaction corpus in English and Indonesian
Nurul Lubis, S. Sakti, Graham Neubig, T. Toda, Satoshi Nakamura
Social-affective aspects of interaction play a vital role in making human communication a rich and dynamic experience. Observation of complex emotional phenomena requires rich sets of labeled data of natural interaction. Although there has been an increase in interest in constructing corpora containing social interactions, there is still a lack of spontaneous and emotionally rich corpora. This paper presents a corpus of social-affective interactions in English and Indonesian, constructed from various television talk shows and containing natural conversations and real emotion occurrences. We carefully annotate the corpus in terms of emotion and discourse structure to allow for the aforementioned observation. The corpus is still in an early stage of development, yielding wide-ranging possibilities for future work.
Citations: 6
On finding word-level break-type formation rules for mandarin read speech
Fu-Ja Kung, Pauline Lee, Yih-Ru Wang, Sin-Horng Chen, Chen-Yu Chiang
This paper presents a study exploring word-level break-type formation rules for Mandarin read speech. A four-layer hierarchical structure with seven break types is adopted to represent utterance prosody. The work is based on the break-type tags labeled on a large read-speech database by the previously proposed prosody labeling and modeling (PLM) algorithm. Occurrence frequencies of the seven break types at the pre- and post-boundaries of several types of function words are calculated and taken as the inferred statistical break-type formation rules. Linguistic interpretations of the most likely break types occurring at the pre- and post-boundaries of each function word are discussed. Some exceptions that deviate from the most likely break types are also examined.
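A toy tally of the kind described might look like the sketch below, where the break type attached after each word is counted at the pre- and post-boundaries of the function words; the B0-B4 labels and the example sentence are invented here, not the PLM label set or the actual corpus.

```python
from collections import Counter, defaultdict

# Toy utterance: (word, is_function_word, break type after the word).
# Break-type labels are assumed for illustration only.
utterance = [
    ("我", False, "B0"), ("的", True, "B1"), ("老师", False, "B3"),
    ("在", True, "B0"), ("学校", False, "B4"),
]

pre_counts = defaultdict(Counter)    # break type just before a function word
post_counts = defaultdict(Counter)   # break type just after a function word
prev_break = "B4"                    # assumed utterance-initial boundary

for word, is_function, post_break in utterance:
    if is_function:
        pre_counts[word][prev_break] += 1
        post_counts[word][post_break] += 1
    prev_break = post_break

print(dict(pre_counts))   # {'的': Counter({'B0': 1}), '在': Counter({'B3': 1})}
print(dict(post_counts))  # {'的': Counter({'B1': 1}), '在': Counter({'B0': 1})}
```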
Citations: 1