Deep Learning meets Knowledge Graphs for Scholarly Data Classification
Fabian Hoppe, D. Dessí, Harald Sack. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3451361

The amount of scientific literature grows continuously, which makes it increasingly challenging for researchers to manage, find, and explore research results. Classification of scientific work is therefore widely applied to enable retrieval, to support the search for suitable reviewers during the reviewing process, and, in general, to organize the existing literature according to a given schema. Automating this classification process not only simplifies the submission process for authors, but also ensures the coherent assignment of classes. However, fine-grained classes and new research fields in particular do not provide sufficient training data to automate the process. Additionally, given the large number of non-mutually-exclusive classes, it is often difficult and computationally expensive to train models that can deal with multi-class, multi-label settings. To overcome these issues, this work presents a preliminary Deep Learning framework for multi-label text classification of scholarly papers in Computer Science. The proposed model addresses the issue of insufficient data by utilizing the semantics of classes, which is explicitly provided by latent representations of class labels. This study uses Knowledge Graphs as a source of these required external class definitions by identifying corresponding entities in DBpedia to improve the overall classification.
What Happens Behind the Scene? Towards Fraud Community Detection in E-Commerce from Online to Offline
Zhao Li, Pengrui Hui, Peng Zhang, Jiaming Huang, Biao Wang, Ling Tian, Ji Zhang, Jianliang Gao, Xing Tang. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3451147

Fraud poses a severe threat to e-commerce platforms, and anti-fraud systems have become indispensable infrastructure for these platforms. Recently, a large number of fraud detection models have been proposed to monitor online purchasing transactions and extract hidden fraud patterns. Thanks to these models, we have observed a significant reduction in committed fraud over the last several years. However, according to our recent statistics, an increasing number of malicious sellers on e-commerce platforms purposely circumvent online fraud detection systems by shifting their fake purchasing behaviors from online to offline. This compromises the effectiveness of existing fraud detection systems built upon online transactions. To solve this problem, we study a new problem, offline fraud community detection, which can greatly strengthen existing fraud detection systems. We propose a new FRaud COmmunity Detection from Online to Offline (FRODO) framework, which combines the strengths of both online and offline data views, especially offline spatial-temporal data, for fraud community discovery. Moreover, a new Multi-view Heterogeneous Graph Neural Network model is proposed within the FRODO framework, which can find anomalous graph patterns such as biclique communities from only a small number of black seeds, i.e., a small number of labeled fraud users. The seeds are processed by a streamlined pipeline of three components: label propagation for high coverage, multi-view heterogeneous graph neural networks for high-risk fraud user recognition, and spatial-temporal network reconstruction and mining for offline fraud community detection. Extensive experimental results on a large real-life Taobao network, with 20 million users, 5 million product items, and 30 million transactions, demonstrate the effectiveness of the proposed methods.
C-Rex: A Comprehensive System for Recommending In-Text Citations with Explanations
Michael Färber, Vinzenz Zinecker, Isabela Bragaglia Cartus, S. Celis, Maria Duma. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3451366

Finding suitable citations for scientific publications can be challenging and time-consuming. To this end, context-aware citation recommendation approaches that recommend publications as candidates for in-text citations have been developed. In this paper, we present C-Rex, a web-based demonstration system available at http://c-rex.org for context-aware citation recommendation based on the Neural Citation Network [5] and millions of publications from the Microsoft Academic Graph. Our system is one of the first online context-aware citation recommendation systems and the first to incorporate not only a deep learning recommendation approach, but also explanation components to help users better understand why papers were recommended. In our offline evaluation, our model performs similarly to the one presented in the original paper and can serve as a basic framework for further implementations. In our online evaluation, we found that the explanations of recommendations increased users’ satisfaction.
Negative Knowledge for Open-world Wikidata
Hiba Arnaout, S. Razniewski, G. Weikum, Jeff Z. Pan. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3452339

The Wikidata knowledge base (KB) is one of the most popular structured data repositories on the web, containing more than 1 billion statements for over 90 million entities. Like most major KBs, it is nonetheless incomplete and therefore operates under the open-world assumption (OWA): statements not contained in Wikidata should be assumed to have an unknown truth value. However, the OWA ignores that a significant part of interesting knowledge is negative and cannot be readily expressed in this data model. In this paper, we review the challenges arising from the OWA, as well as some specific attempts Wikidata has made to overcome them. We review a statistical inference method for negative statements, called peer-based inference, and present Wikinegata, a platform that implements this inference over Wikidata. We discuss lessons learned from the development of this platform, as well as how the platform can be used both for learning about interesting negations and for studying modelling challenges inside Wikidata. Wikinegata is available at https://d5demos.mpi-inf.mpg.de/negation.
A Deep End-to-end Hand Detection Application On Mobile Device Based On Web Of Things
Linjuan Ma, Fuquan Zhang. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3451141

In this paper, a novel end-to-end hand detection method for mobile devices based on the Web of Things (WoT), YOLObile-KCF, is presented, which can also be applied in practice. While hand detection has become a hot topic in recent years, little attention has been paid to its practical use on mobile devices. We demonstrate that our hand detection system can detect and track hands with high accuracy and at high speed, enabling users not only to communicate with each other on mobile devices, but also to assist and guide the person on the other side in real time. Our approach builds on deep-learning-based object detection, and we adopt a lightweight neural network suitable for mobile devices, which has few parameters and is easy to deploy. In addition, the KCF tracking algorithm is integrated into the model. Several experiments were carried out to test the validity of the hand detection system. The experiments show that the WoT-based YOLObile-KCF hand detection system is promising, making smart-life applications more efficient and convenient. Our work on hand detection for smart life proves to be encouraging.
Counterfactual-Augmented Data for Multi-Hop Knowledge Base Question Answering
Yingting Li. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3453706

The rise of the counterfactual concept has promoted the study of reasoning, and we apply it to Knowledge Base Question Answering (KBQA) multi-hop reasoning as a data augmentation method for the first time. We propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme. CSS uses two augmentation methods, Q-CSS and T-CSS, to augment the training set: for each training instance, we create two augmented instances, one per augmentation method. Furthermore, we perform the Dynamic Answer Equipment (DAE) algorithm to dynamically assign ground-truth answers to the expanded questions, constructing counterfactual examples. After training with the supplemented examples, the KBQA model can focus on all key entities and words, which significantly improves the model’s sensitivity. Experiments verified the effectiveness of CSS and showed consistent improvements across settings with different extents of KB incompleteness.
TCS_WITM_2021 @FinSim-2: Transformer based Models for Automatic Classification of Financial Terms
Tushar Goel, Vipul Chauhan, Ishan Verma, Tirthankar Dasgupta, Lipika Dey. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3451386

Recent advances in neural network architectures have provided several opportunities to develop systems that automatically extract and represent information from domain-specific unstructured text sources. The FinSim-2021 shared task, collocated with the FinNLP workshop, offered the challenge of automatically learning effective and precise semantic models of financial domain concepts. Building such semantic representations of domain concepts requires knowledge about the specific domain, which can be obtained from the contextual information available in raw text documents on those domains. In this paper, we propose a transformer-based BERT architecture that captures such contextual information from a set of domain-specific raw documents and then performs a classification task to assign domain terms to a fixed number of class labels. The proposed model not only considers contextual BERT embeddings but also incorporates a TF-IDF vectorizer that provides word-level importance to the model. The performance of the model has been evaluated against several baseline architectures.
Plumber: A Modular Framework to Create Information Extraction Pipelines
M. Y. Jaradeh, Kuldeep Singh, M. Stocker, S. Auer. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3458603

Information Extraction (IE) tasks are commonly studied topics in various research domains. Hence, the community continuously produces multiple techniques, solutions, and tools to perform such tasks. However, running these tools and integrating them within existing infrastructure requires time, expertise, and resources. One pertinent task here is triple extraction and linking, where structured triples are extracted from a text and aligned to an existing Knowledge Graph (KG). In this paper, we present Plumber, the first framework that allows users to manually and automatically create suitable IE pipelines from a community-created pool of tools to perform triple extraction and alignment on unstructured text. Our approach provides an interactive medium for altering the pipelines and performing IE tasks. A short video showing the framework at work on different use cases is available online.
L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers
Nhu Khoa Nguyen, Emanuela Boros, Gaël Lejeune, A. Doucet, Thierry Delahaut. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3451384

In this paper, we present the methods we proposed for the FinSim-2 Shared Task 2021 on Learning Semantic Similarities for the Financial domain. The main focus of this task is to evaluate the classification of financial terms into the corresponding top-level concepts (also known as hypernyms) extracted from an external ontology. We approached the task as a semantic textual similarity problem: relying on a siamese network with pre-trained language model encoders, we derived semantically meaningful term embeddings and computed ranked similarity scores between them. Additionally, we report the results of several baselines in which the task is tackled as a multi-class classification problem. The proposed methods outperformed our baselines and demonstrated the robustness of models based on a siamese textual-similarity network.
WikiShark: An Online Tool for Analyzing Wikipedia Traffic and Trends
Elad Vardi, Lev Muchnik, Alex Conway, Micha Breakstone. In: Companion Proceedings of the Web Conference 2021. DOI: https://doi.org/10.1145/3442442.3452341

Wikipedia is a major source of information utilized by internet users around the globe for fact-checking and access to general, encyclopedic information. For researchers, it offers an unprecedented opportunity to measure how societies respond to events and how our collective perception of the world evolves over time and in response to events. Wikipedia use and the reading patterns of its users reflect our collective interests and the way they are expressed in our search for information – whether as part of fleeting, zeitgeist-fed trends or long-term – on almost every topic, from personal to business, through political, health-related, academic and scientific. In a very real sense, events are defined by how we interpret them and how they affect our perception of the context in which they occurred, rendering Wikipedia invaluable for understanding events and their context. This paper introduces WikiShark (www.wikishark.com), an online tool that allows researchers to analyze Wikipedia traffic and trends quickly and effectively by (1) instantly querying pageview traffic data; (2) comparing traffic across articles; (3) surfacing and analyzing trending topics; and (4) easily leveraging findings for use in their own research.