Event information extraction from Indonesian tweets using conditional random field

F. Muhammad, M. L. Khodra
{"title":"Event information extraction from Indonesian tweets using conditional random field","authors":"F. Muhammad, M. L. Khodra","doi":"10.1109/ICAICTA.2015.7335383","DOIUrl":null,"url":null,"abstract":"Information extraction is a process to find structured text from unstructured or semi-structured text. This research has an objective to build an information extraction system specialized for Events in Indonesian tweets. The system consists of two main parts. First part filters relevant tweet from irrelevant tweet. This part is only using a rule based approach with additional bag of words feature and gets the best accuracy of 86%. The second part is doing the extraction process. From our experiments, we get the best combination for extractor module by using multi token tokenization method, all feature set and 1st Order Conditional Random Field. This combination result in average accuracy of 74% per token.","PeriodicalId":319020,"journal":{"name":"2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAICTA.2015.7335383","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Information extraction is a process to find structured text from unstructured or semi-structured text. This research has an objective to build an information extraction system specialized for Events in Indonesian tweets. The system consists of two main parts. First part filters relevant tweet from irrelevant tweet. This part is only using a rule based approach with additional bag of words feature and gets the best accuracy of 86%. The second part is doing the extraction process. From our experiments, we get the best combination for extractor module by using multi token tokenization method, all feature set and 1st Order Conditional Random Field. This combination result in average accuracy of 74% per token.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用条件随机场从印度尼西亚tweets中提取事件信息
信息提取是从非结构化或半结构化文本中寻找结构化文本的过程。本研究的目的是建立一个专门针对印度尼西亚推文事件的信息提取系统。该系统主要由两个部分组成。第一部分从不相关的推文中过滤相关的推文。这部分只使用了基于规则的方法和额外的词包特征,并获得了86%的最佳准确率。第二部分是提取过程。通过实验,我们得到了多标记化方法、全特征集和一阶条件随机场对提取器模块的最佳组合。这种组合导致每个令牌的平均准确率为74%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A framework for laptop review analysis Incorporating text information on presentation slides for spoken lecture retrieval TippyDB: Geographically-aware distributed NoSQL Key-Value store Handling arbitrary polygon query based on the boolean overlay on a geographical information system Relation between EMG signal activation and time lags using feature analysis during dynamic contraction
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1