Q. Tran, T. Pham, Quoc Hung Ngo, D. Dinh, Nigel Collier
Named Entity Recognition in Vietnamese documents
DOI: 10.2201/NIIPI.2007.4.2 (https://doi.org/10.2201/NIIPI.2007.4.2)
Published: 2007-03-01
Journal (as listed): ... Proceedings of the ... IEEE International Conference on Progress in Informatics and Computing
Named Entity Recognition (NER) aims to classify words in a document into pre-defined target entity classes and is now considered fundamental for many natural language processing tasks such as information retrieval, machine translation, information extraction and question answering. This paper presents the results of an experiment in which a Support Vector Machine (SVM) based NER model is applied to the Vietnamese language. Though this state-of-the-art machine learning method has been widely applied to NER in several well-studied languages, this is the first time it has been applied to Vietnamese. In a comparison against Conditional Random Fields (CRFs), the SVM model was shown to outperform the CRF model by optimizing its feature window size, obtaining an overall F-score of 87.75. The paper also presents a detailed discussion of the characteristics of the Vietnamese language and provides an analysis of the factors which influence performance in this task.
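The abstract mentions a feature window size but does not list the features themselves. As a rough illustration only, the sketch below shows one common way a token-level classifier can be given context: each token is described by the words within a fixed window around it. The function names, the `<PAD>` symbol, the capitalization feature, and the example sentence are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence


def window_features(tokens: List[str], index: int, window: int) -> Dict[str, str]:
    """Describe tokens[index] by the words within +/- `window` positions."""
    features: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        pos = index + offset
        # Lowercase real words; use the padding symbol past either end.
        word = tokens[pos].lower() if 0 <= pos < len(tokens) else PAD
        features[f"w[{offset:+d}]"] = word
    # Assumed orthographic cue: does the current token start with a capital?
    features["is_capitalized"] = str(tokens[index][:1].isupper())
    return features


def sentence_features(tokens: List[str], window: int) -> List[Dict[str, str]]:
    """Build one feature dictionary per token in the sentence."""
    return [window_features(tokens, i, window) for i in range(len(tokens))]


if __name__ == "__main__":
    # Made-up example sentence, already split into tokens.
    sentence = ["Ông", "An", "sống", "ở", "Hà", "Nội"]
    for feats in sentence_features(sentence, window=1):
        print(feats)
```

A larger `window` gives the classifier more context per token at the cost of a larger, sparser feature space, which is the kind of trade-off a window-size search explores. Feature dictionaries like these could then be turned into vectors (for example with scikit-learn's `DictVectorizer`) before training an SVM.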