Automatic acquisition of morphological resources for Melanau language

2014 International Conference on Asian Language Processing (IALP). Pub Date: 2014-10-01. DOI: 10.1109/IALP.2014.6973523

Suhaila Saee, Lay-Ki Soon, T. Lim, Bali Ranaivo-Malançon, J. Juk, E. Tang



Abstract

Computational morphological resources are a crucial component for supplying the morphological information needed to build a morphological analyser. Acquiring these resources manually involves two main components, preprocessing and morphology induction, which raise two issues from the perspective of under-resourced languages: i) the process is time-consuming and ii) managing the resources is ambiguous. To overcome these issues, we propose a tool for the automatic acquisition of morphological resources that extends the manual approach. The tool has three main modules: i) tokenization, which tokenises raw text and generates a wordlist; ii) conversion, which converts a softcopy of morphological resources into the required formats; and iii) integration of segmentation tools, which integrates two established segmentation tools, Linguistica and Morfessor, to obtain morphological information from the generated wordlist. Two testing methods were applied: component testing and integration testing. The results indicate that the tool is effective and allows linguists to obtain their wordlists and segmented data easily. We believe the proposed tool will help other researchers construct computational morphological resources for under-resourced languages in an automated way.
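The tokenization module described above (raw text in, wordlist out) could look roughly like the following Python sketch. The abstract does not give the tool's tokenization rules or output format, so everything here is an assumption for illustration: the token pattern, the lowercasing, the function names `build_wordlist` and `format_wordlist`, and the "count word" line layout are all hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical token pattern (not from the paper): runs of letters,
# optionally joined by an internal apostrophe or hyphen. Digits and
# punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['\-][^\W\d_]+)*")


def build_wordlist(raw_text: str) -> list[tuple[str, int]]:
    """Tokenise raw text and return (word, frequency) pairs.

    Pairs are sorted by descending frequency, then alphabetically.
    Lowercasing is an assumption made for this sketch.
    """
    tokens = TOKEN_RE.findall(raw_text.lower())
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_wordlist(wordlist: list[tuple[str, int]]) -> str:
    """Render the wordlist as one 'count word' entry per line.

    This layout is chosen for illustration only; the formats the
    actual tool emits are not specified in the abstract.
    """
    return "\n".join(f"{count} {word}" for word, count in wordlist)


if __name__ == "__main__":
    # Placeholder English text, not Melanau data.
    sample = "The cat saw the other cat."
    print(format_wordlist(build_wordlist(sample)))
```

A wordlist of this kind is the sort of input that unsupervised segmenters such as Linguistica and Morfessor work from, although the exact file format each tool expects should be checked against its own documentation.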


