Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study: Power Failure in the Special Region of Yogyakarta)
Authors: Rizka Maulida Yanti, Ibnu Santoso, Lya Hulliyyatus Suadaa
Journal: Indonesian Journal of Information Systems
Published: 2021-08-19
DOI: 10.24002/ijis.v4i1.4677 (https://doi.org/10.24002/ijis.v4i1.4677)
Citations: 1
Abstract
SpaCy is a tool that efficiently handles Natural Language Processing (NLP) tasks, one of which is Named Entity Recognition (NER). NER extracts and identifies named entities in text. However, SpaCy has not yet officially released a pretrained NER model for Indonesian. Meanwhile, according to the 2019 PLN statistical report, D.I. Yogyakarta is a province that frequently experiences power failures, and many public complaints about these failures appear on Twitter. No prior research has extracted information related to electrical disturbances, and research on NER using SpaCy for Indonesian is still rare. This study therefore extracts information related to power failures in the Province of D.I. Yogyakarta from Twitter using SpaCy for Indonesian. The resulting model performs well, with 95.52% precision, 93.27% recall, and a 94.38% F1-score. The location entities found in tweets about electrical disturbances are then mapped. The mapping shows that the most location mentions in power-failure tweets came from Sleman Regency and the fewest from Gunung Kidul Regency. By month, March 2020 had the most power failures and July 2020 the fewest.
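As a quick consistency check on the reported scores, the sketch below recomputes the F1-score as the harmonic mean of precision and recall, F1 = 2PR / (P + R). It assumes the three figures are overall scores from the same evaluation run; the abstract does not say how they were aggregated, and the function name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
precision = 95.52
recall = 93.27

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 94.38%"
```

Under that assumption, the recomputed value agrees with the 94.38% stated in the abstract.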