Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study: Power Failure in the Special Region of Yogyakarta)

Rizka Maulida Yanti, Ibnu Santoso, Lya Hulliyyatus Suadaa
Indonesian Journal of Information Systems. Published 2021-08-19. DOI: 10.24002/ijis.v4i1.4677 (https://doi.org/10.24002/ijis.v4i1.4677)
Citations: 1

Abstract

SpaCy is a tool that efficiently handles Natural Language Processing (NLP) tasks, one of which is Named Entity Recognition (NER). NER extracts and identifies named entities in a text. However, SpaCy has not yet officially released a pre-trained NER model for Indonesian. Meanwhile, according to the 2019 PLN statistical report, the Province of D.I. Yogyakarta often experiences power failures, and many public complaints about power failures in the province are found on Twitter. To date, no research has extracted information related to electrical disturbances, and research on NER using SpaCy for Indonesian is still rare. This study therefore extracts information related to power failures in the Province of D.I. Yogyakarta from Twitter using SpaCy for Indonesian. The resulting model performs well, with 95.52% precision, 93.27% recall, and a 94.38% F1-score. The location entities found in tweets related to electrical disturbances are then mapped. The mapping shows that the most location mentions in power-failure tweets came from Sleman Regency and the fewest from Gunung Kidul Regency. The month with the most power failures was March 2020, while the month with the fewest was July 2020.
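
As a rough illustration of how the three reported metrics relate, the sketch below computes precision, recall, and F1 from sets of entity spans and checks that the reported F1 is the harmonic mean of the reported precision and recall. The span format `(start, end, label)`, the function names, the labels, and the toy data are assumptions made for this example; they are not taken from the study, whose exact evaluation procedure and entity scheme are not given in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_scores(gold, predicted):
    """Exact-match precision, recall, and F1 over entity spans.

    Each span is a hypothetical (start, end, label) tuple; a prediction
    counts as correct only if all three fields match a gold span.
    """
    gold_set, predicted_set = set(gold), set(predicted)
    true_positives = len(gold_set & predicted_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall, f1_score(precision, recall)


# Consistency check on the figures reported in the abstract (in percent).
reported_f1 = f1_score(95.52, 93.27)
print(f"F1 from reported precision and recall: {reported_f1:.2f}%")

# Hypothetical toy data, not from the study: two of three predictions match.
gold = [(0, 6, "LOC"), (20, 27, "LOC"), (30, 34, "DATE")]
predicted = [(0, 6, "LOC"), (30, 34, "DATE"), (40, 45, "LOC")]
p, r, f = span_scores(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact span matching is only one common convention for scoring NER output; a study may instead score per token or allow partial overlaps, which would give different numbers.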