How much is Wikipedia Lagging Behind News?

B. Fetahu, Abhijith Anand, Avishek Anand
Published in: Proceedings of the ... ACM Web Science Conference, 2015-06-28
DOI: 10.1145/2786451.2786460 (https://doi.org/10.1145/2786451.2786460)
Citations: 26

Abstract

Wikipedia, rich in entities and events, is an invaluable resource for various knowledge harvesting, extraction and mining tasks. Numerous resources, such as DBpedia, YAGO and other knowledge bases, are built by extracting entity- and event-based knowledge from it. Online news, on the other hand, is an authoritative and rich source for emerging entities, events and facts relating to existing entities. In this work, we study the creation of entities in Wikipedia with respect to news by studying how entity- and event-based information flows from news to Wikipedia. We analyze the lag of Wikipedia (based on the revision history of the English Wikipedia) against 20 years of The New York Times (NYT) dataset. We model and analyze the lag of entities and events, namely the difference between their first appearance in Wikipedia and their first appearance in NYT. Our extensive experimental analysis yields three findings. First, almost 20% of the external references in entity pages are news articles, which underscores the importance of news to Wikipedia. Second, the entity-based lag follows a normal distribution with a high standard deviation, whereas the lag for news-based events is typically very low. Finally, events are responsible for the creation of emergent entities: as many as 12% of the entities mentioned in an event page are created after the creation of that event page.
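The abstract does not give the exact lag formula, so the following is only a minimal sketch of one plausible reading: the lag of an entity is the number of days between its first NYT mention and the creation of its Wikipedia page. The function name `lag_days`, the entity names and all dates are hypothetical placeholders for illustration, not data or code from the paper.

```python
from datetime import date
from statistics import mean, stdev


def lag_days(first_in_news: date, first_in_wikipedia: date) -> int:
    """Lag in days; a positive value means Wikipedia came after the news."""
    return (first_in_wikipedia - first_in_news).days


# Hypothetical placeholder records (NOT data from the paper):
# entity -> (first NYT mention, first Wikipedia revision)
first_seen = {
    "entity_a": (date(2005, 3, 1), date(2005, 3, 11)),
    "entity_b": (date(2006, 7, 15), date(2006, 7, 10)),
    "entity_c": (date(2007, 1, 1), date(2007, 1, 31)),
}

# Per-entity lag, then the summary statistics one would inspect
# when checking the shape of the lag distribution.
lags = {name: lag_days(news, wiki) for name, (news, wiki) in first_seen.items()}
lag_mean = mean(lags.values())
lag_std = stdev(lags.values())  # sample standard deviation
```

Under this assumed sign convention, a negative lag (as for `entity_b`) would indicate an entity whose Wikipedia page predates its first news mention.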