ParsEL 1.0: Unsupervised Entity Linking in Persian Social Media Texts
Majid Asgari-Bidhendi, Farzane Fakhrian, B. Minaei-Bidgoli
2020 11th International Conference on Information and Knowledge Technology (IKT)
DOI: https://doi.org/10.1109/ikt51791.2020.9345631
Published: 2020-04-22
Citations: 2
Abstract
The number of social media users has grown exponentially in recent years, and social media data has become one of the largest repositories of data in the world. Natural language text makes up a major portion of this data. However, this text contains many entity mentions, which increases its ambiguity. Entity linking is the task of finding entity mentions in text and linking them to their corresponding entities in an external dataset. Recently, FarsBase was introduced as the first Persian knowledge graph, containing almost 750,000 entities. In this study, we propose ParsEL, the first unsupervised end-to-end entity linking system designed specifically for the Persian language, which uses contextual and graph-based features to rank candidate entities. To evaluate the proposed approach, we publish the first entity linking dataset for the Persian language, which was created by crawling social media text from popular Telegram channels and contains 67,595 tokens. The results show that ParsEL achieves an F-score of 86.94% on the introduced dataset, which is comparable with the one other entity linking system that supports the Persian language.
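The abstract does not specify how the contextual and graph-based features are computed or combined, so the following is only a minimal sketch of the general idea of unsupervised candidate ranking, not ParsEL's actual method. Everything in it is an assumption made for illustration: the `Candidate` structure, the Jaccard-based `context_score`, the neighbor-overlap `graph_score`, the mixing weight `alpha`, and the toy English data (a real system would draw candidates from a knowledge graph such as FarsBase).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Hypothetical candidate entity; not the FarsBase schema."""
    entity_id: str
    description_words: set[str]
    neighbors: set[str] = field(default_factory=set)


def context_score(candidate: Candidate, context_words: set[str]) -> float:
    """Jaccard overlap between the entity description and the mention context."""
    union = candidate.description_words | context_words
    if not union:
        return 0.0
    return len(candidate.description_words & context_words) / len(union)


def graph_score(candidate: Candidate, other_entity_ids: set[str]) -> float:
    """Fraction of the other entities in the text that neighbor the candidate."""
    if not other_entity_ids:
        return 0.0
    return len(candidate.neighbors & other_entity_ids) / len(other_entity_ids)


def rank_candidates(
    candidates: list[Candidate],
    context_words: set[str],
    other_entity_ids: set[str],
    alpha: float = 0.5,  # assumed weight; no training data is needed to apply it
) -> list[tuple[float, str]]:
    """Return (score, entity_id) pairs sorted from best to worst."""
    scored = [
        (
            alpha * context_score(c, context_words)
            + (1 - alpha) * graph_score(c, other_entity_ids),
            c.entity_id,
        )
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy example: disambiguating the mention "Milan" in a post about football.
candidates = [
    Candidate("Milan_city", {"city", "italy", "lombardy"}, {"Italy", "Lombardy"}),
    Candidate("AC_Milan", {"football", "club", "league", "italy"}, {"Serie_A", "San_Siro"}),
]
ranked = rank_candidates(
    candidates,
    context_words={"match", "goal", "league"},
    other_entity_ids={"Serie_A"},
)
print(ranked[0][1])  # highest-scoring candidate for the mention
```

Because both scores are computed directly from the text and the graph, a ranker of this shape needs no labeled training examples, which is the sense in which such an approach can be called unsupervised.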