Towards Kikamba Computational Grammar

Journal of Data Analysis and Information Processing. Pub Date: 2019-09-12. DOI: 10.4236/jdaip.2019.74015

Benson Kituku, Wanjiku Ng'ang'a, Lawrence Muchemi


Citations: 4

Abstract


Kikamba is an under-resourced language with few language technology tools, because the more efficient and popular data-driven approaches to building them suffer from data sparseness caused by a lack of digitized corpora. To address this challenge, we developed a computational grammar for Kikamba within the multilingual Grammatical Framework (GF) toolkit. GF uses an interlingua, rule-based approach to translation. We followed a morphology-driven strategy: we first developed regular expressions for morphological inflection and then developed the syntax rules. The grammar was evaluated on one hundred sentences in both English and Kikamba. The results were encouraging: a 4-gram BLEU score of 83.05% and a position-independent error rate (PER) of 10.96%. This work contributes language technology resources for Kikamba: multilingual machine translation, a morphological analyzer, and a computational grammar that provides a platform for developing multilingual applications. The grammar can also generate bilingual corpora pairing Kikamba with any language currently defined in GF, which makes it easier to experiment with data-driven approaches.
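For readers unfamiliar with PER, the following is a minimal sketch of one common formulation of the metric, in which words are matched as a bag of words regardless of their position. The abstract does not state which tooling or exact variant the authors used, so the function name, the whitespace tokenization, and the English example sentences are illustrative assumptions only.

```python
from collections import Counter


def position_independent_error_rate(reference: str, hypothesis: str) -> float:
    """Return PER for one sentence pair (0.0 means a perfect bag-of-words match).

    Assumed formulation: PER = 1 - (matches - surplus) / len(reference),
    where surplus = max(0, len(hypothesis) - len(reference)).
    """
    # Assumption: simple whitespace tokenization.
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")

    # Multiset intersection counts words that match regardless of position.
    matches = sum((Counter(ref_tokens) & Counter(hyp_tokens)).values())

    # Extra hypothesis words beyond the reference length are penalised.
    surplus = max(0, len(hyp_tokens) - len(ref_tokens))

    return 1.0 - (matches - surplus) / len(ref_tokens)


if __name__ == "__main__":
    # Illustrative sentences, not taken from the paper's test set.
    reference = "the cat sat on the mat"
    print(position_independent_error_rate(reference, "on the mat the cat sat"))
    print(position_independent_error_rate(reference, "the cat sat on a mat"))
```

Because word order is ignored, a reordered but otherwise identical hypothesis scores 0.0, while substituting one of six reference words yields a PER of 1/6.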
