A. Samoilenko, F. Karimi, Jérôme Kunegis, Daniel Edler, M. Strohmaier
Title: Linguistic influence patterns within the global network of Wikipedia language editions
Venue: Proceedings of the ... ACM Web Science Conference
Publication date: 2015-06-28
DOI: https://doi.org/10.1145/2786451.2786497
Citation count: 1
Abstract
The Internet is highly multilingual, and its content is created, shared, debated and shaped within many different language communities. These communities do not exist in isolation; they communicate and influence each other's interests, just as in the offline world. Quantifying this influence is, however, a non-trivial task, because these communities are usually spread across multiple heterogeneous platforms. In this work, we measure the influence of languages on each other by observing concept overlap between the 110 largest Wikipedia language editions. We describe experiments that test whether language overlap in concept coverage is a random process, and find that edition size is a strong predictor of higher concept overlap, with English-German being the most frequently co-occurring pair (45%). Both small and large editions co-occur more frequently than expected with editions of similar size, whereas co-occurrences across the two size groups fall below what is expected by chance. Additionally, by applying network analysis, we find that the hierarchy of language interconnections depends on the locality of topics: for interlingually popular topics, the dominance of English, German and French is pronounced, while for topics with a local reach, geographical and cultural proximity as well as common heritage better explain co-occurrence.
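To make the idea of comparing observed concept overlap with a chance expectation concrete, the following minimal Python sketch counts how often two editions cover the same concept and divides that count by a simple independence baseline. The baseline (edition sizes multiplied and divided by the number of concepts) is an assumption chosen for illustration and is not necessarily the null model used in the paper; the names `cooccurrence`, `overlap_ratios` and `TOY_CONCEPTS`, as well as the toy data, are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set lists the editions that cover one concept.
TOY_CONCEPTS = [
    {"en", "de", "fr"},
    {"en", "de"},
    {"en", "fr"},
    {"en", "de"},
    {"sv", "no"},
    {"sv", "no", "en"},
]


def cooccurrence(concepts):
    """Return (edition sizes, pair counts) for a list of edition sets."""
    sizes = Counter()   # number of concepts covered by each edition
    pairs = Counter()   # number of concepts shared by each unordered pair
    for editions in concepts:
        members = sorted(set(editions))
        sizes.update(members)
        pairs.update(combinations(members, 2))
    return sizes, pairs


def overlap_ratios(concepts):
    """Observed / expected co-occurrence under an assumed independence baseline.

    Expected count for a pair (a, b) is sizes[a] * sizes[b] / n, where n is
    the number of concepts. A ratio above 1 means the pair co-occurs more
    often than this baseline predicts; below 1 means less often.
    """
    n = len(concepts)
    sizes, pairs = cooccurrence(concepts)
    return {
        (a, b): observed / (sizes[a] * sizes[b] / n)
        for (a, b), observed in pairs.items()
    }


if __name__ == "__main__":
    ratios = overlap_ratios(TOY_CONCEPTS)
    for pair, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
        print(pair, round(ratio, 2))
```

In this toy data, the two small editions co-occur well above the baseline, while a small edition paired with the largest one falls below it, which mirrors the qualitative pattern the abstract reports for editions of similar versus dissimilar size. A permutation-based null model would be a natural alternative to the analytic baseline assumed here.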