Linguistic influence patterns within the global network of Wikipedia language editions

A. Samoilenko, F. Karimi, Jérôme Kunegis, Daniel Edler, M. Strohmaier
Published in: Proceedings of the ... ACM Web Science Conference. ACM Web Science Conference, 2015-06-28
DOI: https://doi.org/10.1145/2786451.2786497

Abstract

The Internet is highly multilingual, and its content is created, shared, debated and shaped within many different language-speaking communities. These communities do not exist in isolation, but communicate and influence each other's interests, just as in the offline world. Quantifying this influence is, however, a non-trivial task, as these communities are usually spread across multiple heterogeneous platforms. In this work, we set out to measure the influence of languages on each other by observing concept overlap between the 110 largest Wikipedia language editions. We describe experiments to test whether language overlap in concept coverage is a random process, and find that edition size is a strong predictor of higher concept overlap, with English–German being the most frequently co-occurring pair (45%). Both small and large editions co-occur more frequently than expected with editions of similar size, but co-occurrences across groups are below what is expected by chance. Additionally, by applying network analysis, we find that the hierarchy of language interconnections differs depending on the locality of topics: for interlingually popular topics, the dominance of English, German and French is pronounced, while for topics with a local reach, geographical and cultural proximity as well as common heritage explain co-occurrence better.
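The comparison of observed co-occurrence against chance can be illustrated with a minimal sketch. This is not the authors' method or data: the toy `concepts` mapping, the function names `pair_shares` and `expected_shares`, and the independence baseline are all assumptions made for illustration. It counts how often each pair of editions covers the same concept and compares that share with what would be expected if each edition covered concepts independently, in proportion to its size.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each concept maps to the set of editions covering it.
concepts = {
    "Q1": {"en", "de", "fr"},
    "Q2": {"en", "de"},
    "Q3": {"en", "fr"},
    "Q4": {"de", "nl"},
    "Q5": {"en", "de", "nl"},
}

def pair_shares(concepts):
    """Share of all concepts in which each pair of editions co-occurs."""
    counts = Counter()
    for editions in concepts.values():
        for pair in combinations(sorted(editions), 2):
            counts[pair] += 1
    total = len(concepts)
    return {pair: n / total for pair, n in counts.items()}

def expected_shares(concepts):
    """Assumed baseline: share expected if editions covered concepts independently."""
    total = len(concepts)
    coverage = Counter(e for editions in concepts.values() for e in editions)
    return {
        (a, b): (coverage[a] / total) * (coverage[b] / total)
        for a, b in combinations(sorted(coverage), 2)
    }

observed = pair_shares(concepts)
expected = expected_shares(concepts)

# List pairs from most to least frequently co-occurring, with the baseline alongside.
for pair in sorted(observed, key=observed.get, reverse=True):
    print(pair, round(observed[pair], 2), round(expected[pair], 2))
```

A pair whose observed share exceeds its baseline co-occurs more often than edition size alone would suggest under this simplified model; a real analysis would need a more careful null model and significance testing.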