
2013 International Conference on Asian Language Processing: Latest Papers

Transfer Grammar in Tamil-Hindi MT System
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.24
S. L. Devi, Sindhuja Gopalan, R. Ram
In this paper, we present work on transfer grammar, one of the most challenging issues in MT, in Sampark, a bidirectional Tamil-Hindi translation system. Transfer grammar between these languages operates at two levels: (1) structure transfer and (2) lexical-level transfer. Tamil and Hindi differ extensively at the clausal construction level and at the verb formation level, since Tamil is an agglutinative language and Hindi is not. The transfer grammar described here takes a hybrid approach: CRF, a machine learning algorithm, combined with linguistic rules for structure transfer, and a rule-based approach for word-level transfer. We tested the approach in the Sampark system using web data, and the results are encouraging.
Citations: 1
The Comparative Research on the Segmentation Strategies of Tibetan Bounded-Variant Forms
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.75
Congjun Long, Caijun Kang, Di Jiang
The segmentation of Tibetan bounded-variant forms (TBVFS) is one of the most fundamental tasks in text processing, and its results directly influence word segmentation, POS tagging, syntactic parsing, named entity extraction, and so on. At present, segmentation results are unsatisfactory and cannot be applied in practice. In this article, the authors first describe the features of TBVFS and their distribution, then test two different segmentation strategies and conclude that statistics-based methods for morpheme position tagging perform better than rule-based methods. If some rules are used in post-processing to correct part of the mistaken segmentations, this kind of segmentation problem can be resolved.
Citations: 3
A Novel Gaussian Filter-Based Automatic Labeling of Speech Data for TTS System in Gujarati Language
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.46
Swati Talesara, H. Patil, T. Patel, Hardik B. Sailor, Nirmesh J. Shah
Text-to-speech (TTS) synthesizers have proved to be a useful aid for many visually challenged people, letting them read through auditory feedback. TTS synthesizers are available in English; however, it has been observed that people feel more comfortable hearing their own native language. With this in mind, a Gujarati TTS synthesizer has been built in the Festival speech synthesis framework. The syllable is taken as the basic unit, since Indian languages are syllabic in nature. Building a unit-selection-based Gujarati TTS system requires a large labeled Gujarati corpus, and labeling is a time-consuming, tedious task that demands considerable manual effort. This work therefore attempts to reduce that effort by automatically generating a labeled corpus at the syllable level. To that end, a Gaussian-based segmentation method is proposed for automatic segmentation of speech at the syllable level. The percentage correctness of the labeled data is around 80% for both male and female voices, compared with 70% for group delay-based labeling. In addition, the system built on the proposed approach shows better intelligibility when evaluated by a visually challenged subject. The word error rate is reduced by 5% for the Gaussian filter-based TTS system compared with the group delay-based TTS system, and a 5% increase is observed in correctly synthesized words. The main focus of this work is to reduce the manual effort required to build a TTS system for Gujarati, which is primarily the effort of labeling speech data.
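The abstract does not say how the Gaussian-based segmentation works internally. As a rough illustration only, and not the authors' method, one common way to use a Gaussian filter for syllable-level segmentation is to smooth a short-time energy contour and treat its local minima as candidate boundaries. The function names, the frame parameters, and the choice of energy as the contour are all assumptions made for this sketch.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Return a normalized 1-D Gaussian kernel (weights sum to 1)."""
    if radius is None:
        radius = max(1, int(3 * sigma))  # cover roughly +/- 3 sigma
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(values, sigma):
    """Convolve values with a Gaussian kernel, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # repeat edge samples
            acc += w * values[j]
        out.append(acc)
    return out

def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples per frame (hypothetical framing parameters)."""
    return [sum(s * s for s in samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]

def candidate_boundaries(energy, sigma=2.0):
    """Frame indices where the smoothed energy has a strict local minimum."""
    e = smooth(energy, sigma)
    return [i for i in range(1, len(e) - 1)
            if e[i] < e[i - 1] and e[i] < e[i + 1]]

# Toy contour with two energy peaks separated by a dip at frame 6.
toy_energy = [0, 1, 4, 9, 4, 1, 0.2, 1, 4, 9, 4, 1, 0]
boundaries = candidate_boundaries(toy_energy, sigma=1.0)
```

In this sketch a larger `sigma` merges nearby dips and yields fewer, coarser boundaries, while a smaller one keeps more detail but is more sensitive to noise; a real system would tune it against hand-labeled data.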
Citations: 18
Improving Chinese Parsing with Special-Case Probability Re-estimation
Pub Date : 2013-06-01 DOI: 10.1109/IALP.2013.54
Yu-Ming Hsieh, Su-Chu Lin, Jason J. S. Chang, Keh-Jiann Chen
Syntactic patterns that are hard to express with binary dependency relations need special treatment, since the structure evaluation of such constructions differs from the general parsing framework. Moreover, these syntactic patterns (special cases) should be handled with a distinct estimation model rather than the general one. In this paper, we present a special-case probability re-estimation model (SCM), which integrates the general model with an adaptable estimation model for special cases. The SCM can estimate evaluation scores for specific syntactic constructions more accurately and can adopt different features in different cases. Experimental results show that our proposed model performs better than the state-of-the-art parser for Chinese.
Citations: 1