{"title":"Large-Scale Analysis of Wikipedia’s Link Structure and its Applications in Learning Path Construction","authors":"Yiding Song, Chun Hei Leung","doi":"10.1109/IRI58017.2023.00051","DOIUrl":null,"url":null,"abstract":"As the largest encyclopedia in history, Wikipedia represents an unprecedented unification of the world’s knowledge. Its internal links are an invaluable resource for understanding the relationships between concepts and information organization on the Web. However, such link structures are not thoroughly examined and barely visualized. In this paper, we take a graph-theoretic approach to investigate the link structure of English Wikipedia, providing an up-to-date snapshot of its knowledge organization, including degree distributions, strongly connected components, and disconnected subgraphs. To the best of our knowledge, we also perform the first k-core visualization over all of Wikipedia. Our results suggest Wikipedia is highly connected, with 90.05% of articles reachable from one another. Inbound links are found to be a better measure of an article’s importance than outbound links and demonstrate a more centralized mode of connection. 
Based on our observations, we propose a novel, end-to-end framework for automatically constructing learning paths, using Wikipedia links to recursively shortlist and rank prerequisite concepts for understanding new topics.","PeriodicalId":290818,"journal":{"name":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IRI58017.2023.00051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
As the largest encyclopedia in history, Wikipedia represents an unprecedented unification of the world’s knowledge. Its internal links are an invaluable resource for understanding both the relationships between concepts and the organization of information on the Web. However, these link structures have not been thoroughly examined and have rarely been visualized. In this paper, we take a graph-theoretic approach to investigate the link structure of English Wikipedia, providing an up-to-date snapshot of its knowledge organization, including degree distributions, strongly connected components, and disconnected subgraphs. To the best of our knowledge, we also perform the first k-core visualization over all of Wikipedia. Our results suggest that Wikipedia is highly connected, with 90.05% of articles reachable from one another. We find inbound links to be a better measure of an article’s importance than outbound links, and they exhibit a more centralized mode of connection. Based on these observations, we propose a novel, end-to-end framework for automatically constructing learning paths, using Wikipedia links to recursively shortlist and rank prerequisite concepts for understanding new topics.
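To make the graph-theoretic quantities named in the abstract concrete, the following is a minimal sketch of how in-degree, out-degree, and the share of pages in the largest strongly connected component could be computed on a toy link graph. It is not the authors' implementation: the abstract does not say which algorithm or tooling was used, so Kosaraju's two-pass algorithm is swapped in here as a standard choice. The page names, the `links` dictionary, and the function names are all hypothetical.

```python
from collections import defaultdict


def degree_counts(links):
    """Return (in_degree, out_degree) for a dict mapping page -> linked pages."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, targets in links.items():
        unique_targets = set(targets)  # count each distinct link once
        out_deg[src] += len(unique_targets)
        in_deg[src] += 0  # make sure every page appears in both dicts
        for dst in unique_targets:
            in_deg[dst] += 1
            out_deg[dst] += 0
    return dict(in_deg), dict(out_deg)


def strongly_connected_components(links):
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    forward = {n: sorted(set(links.get(n, ()))) for n in nodes}
    reverse = {n: [] for n in nodes}
    for src, targets in forward.items():
        for dst in targets:
            reverse[dst].append(src)

    # Pass 1: depth-first search on the forward graph, recording finish order.
    visited = set()
    order = []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph in decreasing finish order.
    assigned = set()
    components = []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component = []
        stack = [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical toy graph: A, B, and C link in a cycle; D is only linked to;
# E only links out.
links = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": [], "E": ["A"]}

in_deg, out_deg = degree_counts(links)
components = strongly_connected_components(links)
largest = max(components, key=len)
largest_fraction = len(largest) / len(in_deg)

print("in-degree:", in_deg)
print("out-degree:", out_deg)
print("share of pages in largest SCC:", largest_fraction)
```

In this toy graph the pages in the largest strongly connected component are exactly those mutually reachable from one another, which is the kind of quantity behind the 90.05% figure reported above. At the scale of all of English Wikipedia, a dedicated graph library would likely be a more practical choice than plain dictionaries.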