A comparison of different part-of-speech tagging technique for text in Bahasa Indonesia

2017 7th International Annual Engineering Seminar (InAES), Pub Date: 2017-08-01, DOI: 10.1109/INAES.2017.8068538

Ahmad Zuli Amrullah, Rudy Hartanto, I. Mustika


Citations: 7

Abstract

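Part-of-speech (POS) tagging assigns a part-of-speech tag to each word of a text, and several methods exist for the task. In this paper, we ran POS tagging experiments for Bahasa Indonesia using two statistical approaches (Unigram and Hidden Markov Models) and Brill's tagger. All of these are supervised POS tagging approaches, which require a large annotated training corpus to tag accurately. We applied the techniques to several annotated corpus resources for Bahasa Indonesia, then compared and analyzed the results. We also compared accuracy and highlighted the advantages and disadvantages of each technique. Unigram achieved higher accuracy than HMM and the Brill tagger, at 88.37% on a tagged corpus.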
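To make the simplest of the compared techniques concrete, the following is a minimal sketch of a unigram tagger: each word receives the tag it carried most often in training, and unseen words fall back to the most frequent tag overall. It is an illustration only, not the authors' implementation. The tiny corpus, the tag names (`PRON`, `VERB`, `NOUN`, `ADJ`), the fallback rule, and all function names are assumptions made for this example; the abstract does not describe the corpus, tagset, or tooling in this detail.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sentences):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)   # word -> Counter of tags seen with it
    all_tags = Counter()            # overall tag frequencies
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    model = {word: tags.most_common(1)[0][0] for word, tags in counts.items()}
    # Assumption: unknown words get the most frequent tag overall.
    default_tag = all_tags.most_common(1)[0][0]
    return model, default_tag


def tag(words, model, default_tag):
    """Tag a list of words, falling back to default_tag for unseen words."""
    return [(w, model.get(w.lower(), default_tag)) for w in words]


def accuracy(gold_sentences, model, default_tag):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in gold_sentences:
        predicted = tag([w for w, _ in sentence], model, default_tag)
        for (_, gold_tag), (_, pred_tag) in zip(sentence, predicted):
            correct += gold_tag == pred_tag
            total += 1
    return correct / total if total else 0.0


# Hypothetical toy data, NOT the corpus used in the paper.
TRAIN = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
    [("saya", "PRON"), ("minum", "VERB"), ("kopi", "NOUN")],
    [("kucing", "NOUN"), ("makan", "VERB"), ("ikan", "NOUN")],
]
TEST = [
    [("dia", "PRON"), ("makan", "VERB"), ("kopi", "NOUN"), ("panas", "ADJ")],
]

model, default_tag = train_unigram(TRAIN)
print(tag(["dia", "makan", "panas"], model, default_tag))
print(f"accuracy: {accuracy(TEST, model, default_tag):.2%}")
```

In this toy run the unseen word "panas" is mis-tagged by the fallback rule, which hints at why a supervised tagger of this kind depends on a large annotated corpus.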



