A comparison of different part-of-speech tagging technique for text in Bahasa Indonesia
Ahmad Zuli Amrullah, Rudy Hartanto, I. Mustika
2017 7th International Annual Engineering Seminar (InAES), 2017-08-01. DOI: 10.1109/INAES.2017.8068538 (https://doi.org/10.1109/INAES.2017.8068538)
Part-of-speech (POS) tagging, the task of assigning a part-of-speech tag to each word in a text, can be approached with several different techniques. In this paper, we conducted POS tagging experiments for Bahasa Indonesia using statistical approaches (a unigram tagger and Hidden Markov Models) and Brill's tagger. We used a supervised POS tagging approach, which requires a large annotated training corpus to tag accurately. We applied each technique to several annotated Bahasa Indonesia corpora and then compared and analyzed the results, reporting the accuracy of each technique and highlighting its advantages and disadvantages. The unigram tagger achieved the highest accuracy, 88.37% on a tagged corpus, outperforming both the HMM and Brill taggers.
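The unigram tagger, reported above as the most accurate, assigns each word the tag it received most often in training. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the toy corpus and its tags (PRP, VB, NN, DT, JJ) are hypothetical, unseen words fall back to the most frequent tag overall (one common choice among several), and `accuracy` computes plain token-level accuracy, which is presumably the kind of metric behind the 88.37% figure, though the paper's exact evaluation protocol is not stated here.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, tag) sentences; NOT the paper's corpus or tagset.
TOY_CORPUS = [
    [("saya", "PRP"), ("makan", "VB"), ("nasi", "NN")],
    [("dia", "PRP"), ("membaca", "VB"), ("buku", "NN")],
    [("buku", "NN"), ("itu", "DT"), ("bagus", "JJ")],
]


def train_unigram(corpus):
    """Learn each word's most frequent tag, plus the most frequent tag overall as a fallback."""
    word_tags = defaultdict(Counter)
    tag_totals = Counter()
    for sentence in corpus:
        for word, tag in sentence:
            word_tags[word.lower()][tag] += 1
            tag_totals[tag] += 1
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in word_tags.items()}
    default_tag = tag_totals.most_common(1)[0][0]
    return lexicon, default_tag


def tag_unigram(words, lexicon, default_tag):
    """Tag each word independently of its neighbors; unseen words get the fallback tag."""
    return [lexicon.get(word.lower(), default_tag) for word in words]


def accuracy(gold_corpus, tagger):
    """Token-level accuracy: correctly tagged tokens divided by all tokens."""
    correct = total = 0
    for sentence in gold_corpus:
        predicted = tagger([word for word, _ in sentence])
        correct += sum(p == gold for p, (_, gold) in zip(predicted, sentence))
        total += len(sentence)
    return correct / total


if __name__ == "__main__":
    lexicon, default_tag = train_unigram(TOY_CORPUS)
    # "koran" is unseen, so it falls back to the most frequent training tag (NN).
    print(tag_unigram(["Saya", "membaca", "koran"], lexicon, default_tag))
    # ['PRP', 'VB', 'NN']
```

Because each word is tagged independently, the unigram tagger ignores context entirely; HMM and Brill taggers differ precisely in how they use neighboring tags.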
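A Hidden Markov Model tagger instead chooses the whole tag sequence that maximizes the product of tag-transition and word-emission probabilities, and that sequence is typically found with the Viterbi algorithm. The sketch below assumes a bigram (first-order) HMM with add-one smoothing, trained on a hypothetical toy corpus; the model order, smoothing, and unknown-word handling used in the paper may differ.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the beginning of a sentence

# Hypothetical toy corpus of (word, tag) sentences; NOT the paper's corpus or tagset.
TOY_CORPUS = [
    [("saya", "PRP"), ("makan", "VB"), ("nasi", "NN")],
    [("dia", "PRP"), ("membaca", "VB"), ("buku", "NN")],
    [("buku", "NN"), ("itu", "DT"), ("bagus", "JJ")],
]


def train_hmm(corpus):
    """Count tag-bigram transitions and tag-to-word emissions."""
    transitions = defaultdict(Counter)  # transitions[previous_tag][tag]
    emissions = defaultdict(Counter)    # emissions[tag][word]
    vocab = set()
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word.lower()] += 1
            vocab.add(word.lower())
            prev = tag
    return transitions, emissions, sorted(emissions), vocab


def log_prob(counts, key, n_outcomes):
    """Add-one (Laplace) smoothed log probability, so unseen events are not impossible."""
    return math.log((counts[key] + 1) / (sum(counts.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence under a bigram HMM."""
    transitions, emissions, tags, vocab = model
    if not words:
        return []
    words = [w.lower() for w in words]
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unseen words
    # best[i][t]: log score of the best tag sequence for words[:i+1] ending in tag t
    best = [{t: log_prob(transitions[START], t, n_tags)
             + log_prob(emissions[t], words[0], n_words) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best sequence
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_prob(transitions[p], t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_prob(emissions[t], words[i], n_words)
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

For example, `viterbi(["saya", "membaca", "koran"], train_hmm(TOY_CORPUS))` tags the unseen word koran as NN because the learned VB→NN transition dominates the alternatives.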
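Brill's tagger starts from a simple initial tagging (commonly each word's most frequent tag) and then learns an ordered list of transformation rules, greedily choosing at each step the rule with the largest net reduction in training errors. The sketch below is a deliberately reduced version with a single rule template, "change tag A to B when the previous tag is C"; Brill's original tagger uses many more templates (for example, neighboring words and tags further away). The toy corpus, in which jalan is hypothetically a noun by default but a verb after a pronoun, is illustrative only and is not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; NOT the paper's corpus or tagset.
TOY_CORPUS = [
    [("jalan", "NN"), ("itu", "DT"), ("rusak", "JJ")],
    [("jalan", "NN"), ("ini", "DT"), ("baru", "JJ")],
    [("saya", "PRP"), ("jalan", "VB")],
]


def train_unigram(corpus):
    """Initial-state tagger: each word's most frequent tag, with a global fallback tag."""
    word_tags, tag_totals = defaultdict(Counter), Counter()
    for sentence in corpus:
        for word, tag in sentence:
            word_tags[word.lower()][tag] += 1
            tag_totals[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    return lexicon, tag_totals.most_common(1)[0][0]


def apply_rule(tags, rule):
    """Rule (from_tag, to_tag, prev_tag): change from_tag to to_tag when the previous
    tag is prev_tag. Contexts are read from the tags as they were before the rule fired."""
    from_tag, to_tag, prev_tag = rule
    return [to_tag if i > 0 and t == from_tag and tags[i - 1] == prev_tag else t
            for i, t in enumerate(tags)]


def learn_rules(corpus, lexicon, default_tag, max_rules=10):
    """Greedily pick the rule with the largest net gain (errors fixed minus errors introduced)."""
    gold = [[tag for _, tag in s] for s in corpus]
    current = [[lexicon.get(w.lower(), default_tag) for w, _ in s] for s in corpus]
    rules = []
    for _ in range(max_rules):
        # Candidate rules are proposed from the errors that remain.
        candidates = {(cur[i], ref[i], cur[i - 1])
                      for cur, ref in zip(current, gold)
                      for i in range(1, len(cur)) if cur[i] != ref[i]}
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = 0
            for cur, ref in zip(current, gold):
                new = apply_rule(cur, rule)
                gain += sum(n == r for n, r in zip(new, ref)) - sum(c == r for c, r in zip(cur, ref))
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule improves training accuracy: stop
            break
        rules.append(best_rule)
        current = [apply_rule(cur, best_rule) for cur in current]
    return rules


def tag_brill(words, lexicon, default_tag, rules):
    """Tag with the unigram lexicon, then apply the learned rules in order."""
    tags = [lexicon.get(w.lower(), default_tag) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags
```

On this toy corpus the learner finds the single rule ("NN", "VB", "PRP"), so "saya jalan" is retagged PRP VB while "jalan itu rusak" keeps its initial NN DT JJ tags.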