Construction of Part of Speech Tagger for Malay Language: A Review
Nurulhuda Mohamad Ali, Goh Hui Ngo, Amy Lim Hui Lan
Icon, pp. 253-257, published 2023-03-01. DOI: 10.1109/ICNLP58431.2023.00053
Part-of-speech (POS) tagging is one of the fundamental tasks in natural language processing (NLP) for analyzing human languages. It identifies how each word is used in a sentence by assigning it the appropriate POS tag. To date, POS tagging research has concentrated mostly on European languages, which are considered rich-resource languages because of their abundant linguistic resources, such as prior research studies and large standard corpora. POS tagging is considerably harder for low-resource languages, whose linguistic resources are limited. Malay is considered a low-resource language. Most POS tagging studies for Malay use rule-based or stochastic methods, while exploration of deep learning (DL) for Malay remains limited. This review therefore also covers DL-based POS tagging studies for other low-resource languages in Southeast Asia. Its aim is to identify the state of the art, the open challenges, and directions for future work in Malay POS tagging. It reviews the methods, datasets, and performance measures used in POS tagging studies.
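To make the stochastic approach concrete, below is a minimal sketch of a bigram hidden Markov model (HMM) tagger decoded with the Viterbi algorithm, together with token-level accuracy as one of the simplest performance measures. HMMs are one classic stochastic method; the abstract does not say which models the reviewed studies use. The toy Malay sentences, the tag names (loosely modeled on Universal Dependencies UPOS), the function names `train_hmm`, `viterbi` and `accuracy`, and the choice of add-one smoothing are all hypothetical illustrations, not taken from the review.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training corpus (illustrative only, not from the review).
TRAIN = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
    [("kucing", "NOUN"), ("itu", "DET"), ("tidur", "VERB")],
    [("saya", "PRON"), ("suka", "VERB"), ("kucing", "NOUN"), ("itu", "DET")],
    [("nasi", "NOUN"), ("itu", "DET"), ("sedap", "ADJ")],
]

START = "<s>"  # pseudo-tag marking the sentence start


def train_hmm(corpus):
    """Count tag-bigram transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    vocab = set()
    for sent in corpus:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    return trans, emit, tags, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one (Laplace) smoothed log-probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    v_size = len(vocab) + 1  # +1 reserves probability mass for unseen words
    # best[i][tag] = (log-score of best path ending in tag at i, previous tag)
    best = [{}]
    for t in tags:
        score = log_prob(trans[START], t, len(tags)) + log_prob(emit[t], words[0], v_size)
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(trans[p], t, len(tags))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = (score + log_prob(emit[t], words[i], v_size), prev_tag)
    # Follow backpointers from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def accuracy(gold, predicted):
    """Token-level accuracy: fraction of tokens whose predicted tag matches gold."""
    pairs = [(g, p) for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


if __name__ == "__main__":
    model = train_hmm(TRAIN)
    pred = viterbi(["dia", "makan", "nasi"], *model)
    print(pred)  # ['PRON', 'VERB', 'NOUN']
    print(accuracy([["PRON", "VERB", "NOUN"]], [pred]))  # 1.0
```

A real Malay tagger would be trained on a large annotated corpus and would need better handling of unknown words (for example, affix-based features), since add-one smoothing alone gives every unseen word the same emission probability under a given tag.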