Named Entity Recognition Modeling for the Thai Language from a Disjointedly Labeled Corpus
Kitiya Suriyachay, Virach Sornlertlamvanich
2018 5th International Conference on Advanced Informatics: Concept Theory and Applications (ICAICTA), 2018-08-01
https://doi.org/10.1109/ICAICTA.2018.8541344
In Thai, a named entity can appear with or without a prefix or indicator word, which can cause confusion between named entities and other types of nouns. However, a named entity is likely to appear adjacent to verbs or prepositions, so the verbs or prepositions adjacent to a noun can serve as a good feature for determining the type of named entity. Studies on named entity recognition (NER) in other languages, such as Indonesian, show that combining word embeddings with part-of-speech (POS) tags can improve the performance of an NER model. In this paper, we investigate Thai NER using a Bi-LSTM model with word embeddings and POS embeddings to deal with a relatively small and disjointedly labeled corpus. We compare our model with one without POS tags and with a baseline CRF model that uses a similar feature set. The experimental results show that our proposed model outperforms the other two on all F1-score measures. In particular, for the location file, the F1-score increases by 14 percent.
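The input representation described above, a word embedding combined with a POS embedding for each token, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the two vectors are simply concatenated, and the function names, the embedding sizes (8 and 4), the romanized placeholder tokens, and the tag names are all hypothetical. In a trained model the tables would be learned parameters feeding a Bi-LSTM, whereas here they are random toy vectors so the example runs with the standard library alone.

```python
import random


def make_table(vocab, dim, seed):
    """Build a toy embedding table: one random vector of length `dim` per symbol."""
    rng = random.Random(seed)
    return {sym: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for sym in vocab}


def embed_tokens(tokens, pos_tags, word_table, pos_table):
    """Concatenate each token's word vector with its POS vector.

    Unknown words or tags fall back to the "<UNK>" entry of the table.
    """
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    unk_word = word_table["<UNK>"]
    unk_pos = pos_table["<UNK>"]
    return [
        word_table.get(tok, unk_word) + pos_table.get(tag, unk_pos)
        for tok, tag in zip(tokens, pos_tags)
    ]


# Hypothetical sizes and vocabulary (assumptions, not values from the paper).
WORD_DIM, POS_DIM = 8, 4
word_table = make_table(["<UNK>", "pai", "thi", "Chiang Mai"], WORD_DIM, seed=0)
pos_table = make_table(["<UNK>", "VERB", "PREP", "NOUN"], POS_DIM, seed=1)

# A verb and a preposition precede the candidate entity, so their POS vectors
# become part of the context a sequence model would read around it.
features = embed_tokens(
    ["pai", "thi", "Chiang Mai"], ["VERB", "PREP", "NOUN"], word_table, pos_table
)
print(len(features), len(features[0]))  # 3 12
```

Each token ends up with a 12-dimensional vector (8 word dimensions plus 4 POS dimensions). Dropping the POS half of each vector would correspond to the word-only comparison model mentioned in the abstract.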