Fériel Ben Fraj Trabelsi, C. Ben Othmane Zribi, Wiem Kouki
2015 First International Conference on Arabic Computational Linguistics (ACLing), published 17 April 2015. DOI: 10.1109/ACLING.2015.15
Combined Classification for Extracting Named Entities from Arabic Texts
In this paper, we describe an approach for extracting named entities (NEs) from Arabic texts. Arabic is hard to process because of characteristics that affect even NE extraction. In our case, we treat named entity extraction as a typical classification problem: it consists of finding text spans that can be assigned to an NE class (Person, Locality or Organization). We therefore use a supervised learning approach and adopt the BIO tagging format, which addresses the twin problems of segmentation and categorization. In addition, no single classifier gives good results in every type of context, so we use a set of weighted classifiers combined through a voting procedure. To assess the performance of our system properly, we run two types of tests: with and without morphological attributes. We consider the results highly satisfactory, particularly the accuracy, which exceeds 89% for both the Person and Locality classes.
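The abstract does not describe the implementation. As a minimal sketch, assuming that "weighted voting" means each classifier casts its predicted BIO tag for a token with a fixed weight and the tag with the largest summed weight wins, the combination step and the BIO decoding step might look like the following Python. The function names `weighted_vote` and `bio_to_spans`, the classifier outputs and the weights are all hypothetical; the suffixes PER, LOC and ORG stand for the Person, Locality and Organization classes.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token BIO tags from several classifiers by weighted voting.

    predictions: one tag sequence per classifier, all of the same length.
    weights: one non-negative weight per classifier (hypothetical values below).
    Returns, for each token, the tag whose supporters have the largest total weight.
    """
    combined = []
    for i in range(len(predictions[0])):
        scores = defaultdict(float)
        for tags, weight in zip(predictions, weights):
            scores[tags[i]] += weight
        # max() returns the first maximal key, so ties go to the tag
        # proposed by the earliest classifier in the list.
        combined.append(max(scores, key=scores.get))
    return combined


def bio_to_spans(tokens, tags):
    """Decode BIO tags into (class, text) entity spans.

    B-X opens an entity of class X, I-X continues it, O lies outside any entity,
    so one label per token encodes both segmentation and categorization.
    """
    spans, cls, words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == cls:
            words.append(token)  # continue the open entity
            continue
        # Any other tag closes the open entity, if there is one.
        if cls is not None:
            spans.append((cls, " ".join(words)))
        if tag == "O":
            cls, words = None, []
        else:
            # B-X, or an I-X with no matching open entity (treated leniently as B-X).
            cls, words = tag[2:], [token]
    if cls is not None:
        spans.append((cls, " ".join(words)))
    return spans


# Hypothetical sentence: "زار محمد علي مدينة تونس" ("Mohamed Ali visited the city of Tunis").
tokens = ["زار", "محمد", "علي", "مدينة", "تونس"]
predictions = [
    ["O", "B-PER", "I-PER", "O", "B-LOC"],  # classifier A
    ["O", "B-PER", "O", "O", "B-ORG"],      # classifier B
    ["O", "B-LOC", "I-PER", "O", "B-LOC"],  # classifier C
]
weights = [0.5, 0.3, 0.2]  # hypothetical weights, not taken from the paper

combined = weighted_vote(predictions, weights)
print(combined)                        # ['O', 'B-PER', 'I-PER', 'O', 'B-LOC']
print(bio_to_spans(tokens, combined))  # [('PER', 'محمد علي'), ('LOC', 'تونس')]
```

Because this sketch votes token by token, the combined sequence can contain an I- tag that does not follow a matching B- tag; the decoder above treats such a tag as the start of a new entity. Voting over whole tag sequences, or weighting classifiers separately per class, are alternative readings that the abstract does not rule out.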