Automatic acquisition of morphological resources for Melanau language

Suhaila Saee, Lay-Ki Soon, T. Lim, Bali Ranaivo-Malançon, J. Juk, E. Tang
2014 International Conference on Asian Language Processing (IALP), October 2014. DOI: 10.1109/IALP.2014.6973523

Abstract

Computational morphological resources are a crucial component for providing the morphological information needed to build a morphological analyser. Acquiring these resources manually involves two main components, preprocessing and morphology induction, which raise two issues from the perspective of under-resourced languages: i) the process is time-consuming, and ii) managing the resources is ambiguous. To overcome these issues, we propose a tool for the automatic acquisition of morphological resources that extends the manual approach. The proposed tool has three main modules: i) tokenization, which tokenises raw text and generates a wordlist; ii) conversion, which converts a softcopy of morphological resources into the required formats; and iii) integration of segmentation tools, which integrates two established segmentation tools, Linguistica and Morfessor, to obtain morphological information from the generated wordlist. Two testing methods were conducted: component testing and integration testing. The results demonstrate that the tool is effective and allows linguists to obtain their wordlists and segmented data easily. We believe the proposed tool will help other researchers construct computational morphological resources for under-resourced languages in an automated way.
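The tokenization module's job of turning raw text into a wordlist can be illustrated with a minimal Python sketch. This is not the authors' implementation. The tokenization rule (runs of letters, optionally joined by internal hyphens or apostrophes), the lowercasing, and the one "count word" line per entry output layout are all assumptions, and the function names `tokenize`, `build_wordlist`, and `format_wordlist` are hypothetical. Check the input format that Linguistica or Morfessor actually expects before passing the output to either tool. The sample sentence is English placeholder text, not Melanau data.

```python
import re
from collections import Counter

# Assumption (illustrative only): a word is a run of Unicode letters, optionally
# joined by internal hyphens or apostrophes. Digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def tokenize(raw_text: str) -> list[str]:
    """Split raw text into lowercase word tokens (hypothetical helper)."""
    return [tok.lower() for tok in TOKEN_RE.findall(raw_text)]


def build_wordlist(raw_text: str) -> list[tuple[int, str]]:
    """Return (frequency, word) pairs, most frequent first, ties sorted alphabetically."""
    counts = Counter(tokenize(raw_text))
    return sorted(((n, w) for w, n in counts.items()), key=lambda p: (-p[0], p[1]))


def format_wordlist(wordlist: list[tuple[int, str]]) -> str:
    """Render one 'count word' line per entry; this layout is an assumed target format."""
    return "\n".join(f"{n} {w}" for n, w in wordlist)


if __name__ == "__main__":
    # Placeholder English text standing in for a raw Melanau corpus.
    sample = "The cats saw the cat. Cats!"
    print(format_wordlist(build_wordlist(sample)))
```

Lowercasing merges sentence-initial and mid-sentence forms of the same word, which keeps frequency counts consistent for an unsupervised segmenter. However, it discards any case distinctions the language relies on, such as capitalised proper nouns.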