Malte Barth, Tibor Bleidt, Martin Büßemeyer, Fabian Heseding, Niklas Köhnecke, Tobias Bleifuß, Leon Bornemann, D. Kalashnikov, Felix Naumann, D. Srivastava
{"title":"Detecting Stale Data in Wikipedia Infoboxes","authors":"Malte Barth, Tibor Bleidt, Martin Büßemeyer, Fabian Heseding, Niklas Köhnecke, Tobias Bleifuß, Leon Bornemann, D. Kalashnikov, Felix Naumann, D. Srivastava","doi":"10.48786/edbt.2023.36","DOIUrl":null,"url":null,"abstract":"Today’s fast-paced society is increasingly reliant on correct and up-to-date data. Wikipedia is the world’s most popular source of knowledge, and its infoboxes contain concise semi-structured data with important facts about a page’s topic. However, these data are not always up-to-date: we do not expect Wikipedia editors to update items at the moment their true values change. Also, many pages might not be well maintained and users might forget to update the data, e.g., when they are on holiday. To detect stale data in Wikipedia infoboxes, we combine cor-relation-based and rule-based approaches trained on different temporal granularities, based on all infobox changes over 15 years of English Wikipedia. We are able to predict 8 . 19% of all changes with a precision of 89 . 69% over a whole year, thus meet-ing our target precision of","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"29 1","pages":"450-456"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in database technology : proceedings. International Conference on Extending Database Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48786/edbt.2023.36","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Today’s fast-paced society is increasingly reliant on correct and up-to-date data. Wikipedia is the world’s most popular source of knowledge, and its infoboxes contain concise semi-structured data with important facts about a page’s topic. However, these data are not always up-to-date: we do not expect Wikipedia editors to update items at the moment their true values change. Also, many pages might not be well maintained and users might forget to update the data, e.g., when they are on holiday. To detect stale data in Wikipedia infoboxes, we combine cor-relation-based and rule-based approaches trained on different temporal granularities, based on all infobox changes over 15 years of English Wikipedia. We are able to predict 8 . 19% of all changes with a precision of 89 . 69% over a whole year, thus meet-ing our target precision of