The advent of Wikidata marked a breakthrough: a collaborative, constantly evolving knowledge base. As originally envisioned, it has simplified linking and data reuse across Wikimedia projects. Catalan Wikipedia is one project whose community has adopted Wikidata heavily, for example by integrating it into article infoboxes and automatically generated lists. In this article we highlight how structured data from Wikidata can be used to evaluate new biographical articles, thereby helping users engage in diversity challenges and track potential vandalism and errors.
Simple Wikidata Analysis for Tracking and Improving Biographies in Catalan Wikipedia. Toni Hermoso Pulido. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3452344
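The abstract does not spell out the analysis itself, so what follows is only a sketch. It assumes each new biography has already been resolved to its Wikidata claims, for instance through a Wikidata Query Service SPARQL query on `wdt:P21`. The property and item IDs are real Wikidata identifiers. The record layout, the `gender_counts` and `warnings_for` helpers, and the sample articles are hypothetical illustrations, not the author's tooling.

```python
from collections import Counter

# Real Wikidata identifiers: P21 = sex or gender, P569 = date of birth,
# P570 = date of death, Q6581097 = male, Q6581072 = female.
MALE, FEMALE = "Q6581097", "Q6581072"


def gender_counts(items):
    """Count the first P21 value of each item; items without P21 count as 'unknown'."""
    counts = Counter()
    for item in items:
        values = item.get("P21") or ["unknown"]
        counts[values[0]] += 1
    return counts


def warnings_for(item):
    """Hypothetical consistency checks that could surface errors or vandalism."""
    warnings = []
    if not item.get("P21"):
        warnings.append("missing P21 (sex or gender)")
    birth, death = item.get("P569"), item.get("P570")
    if birth is not None and death is not None and death < birth:
        warnings.append("date of death precedes date of birth")
    return warnings


# Hypothetical records for newly created biographies (years stand in for full dates).
new_bios = [
    {"article": "Article A", "P21": [FEMALE], "P569": 1950},
    {"article": "Article B", "P21": [MALE], "P569": 1980, "P570": 1970},
    {"article": "Article C", "P569": 1900},
]

counts = gender_counts(new_bios)
female_share = counts[FEMALE] / sum(counts.values())
for bio in new_bios:
    for warning in warnings_for(bio):
        print(f"{bio['article']}: {warning}")
```

With these sample records, `female_share` is one third. The loop prints a warning for Article B (death before birth) and one for Article C (no P21 value). Counting only the first P21 value keeps the sketch short; a fuller version would need to handle items that have several values.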
This paper addresses two problems in mobile positioning: the multiple mapping between Received Signal Strength Indications (RSSIs) and locations, and location estimation. A mobile positioning method based on a Time-distributed Auto Encoder and Gated Recurrent Unit (TAE-GRU) is proposed. To distinguish identical RSSIs observed at different time steps, the paper develops a reconstruction model based on a Time-distributed Auto Encoder (TAE), which facilitates subsequent learning by the estimation model. Time-distributed processing is used to transform the data of each time step separately, accommodating the temporal characteristics of RSSI data. In addition, an estimation model based on a Gated Recurrent Unit (GRU) learns the temporal relationships in RSSI data to estimate the locations of mobile devices. By combining the TAE and GRU models, the proposed method can address both the multiple-mapping problem and the mobile positioning problem. Extensive experimental results demonstrate that the proposed method outperforms the comparison methods on both problems.
Mobile Positioning Based on TAE-GRU. Canyang Guo, Ling Wu, Cheng Shi, Chi-Hua Chen. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3451146
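Here is a minimal structural sketch of the TAE-GRU pipeline in pure Python. It assumes a single tanh encoder layer for the TAE and a bias-free GRU cell. The layer sizes, the RSSI scaling, and all weights are hypothetical. The decoder that the TAE would use for its reconstruction objective is omitted, and so is all training. The random weights therefore give a meaningless location; the sketch only shows how a time-distributed encoder feeds a recurrent estimator.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def time_distributed(fn, sequence):
    """Apply the same function to every time step independently."""
    return [fn(x) for x in sequence]


class Encoder:
    """Encoder half of an autoencoder: one tanh layer (decoder and training omitted)."""

    def __init__(self, n_in, n_code, rng):
        self.W = rand_matrix(n_code, n_in, rng)

    def __call__(self, x):
        return [math.tanh(v) for v in matvec(self.W, x)]


class GRUCell:
    """Bias-free GRU cell using the h = (1 - z) * h_prev + z * candidate convention."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.Wz, self.Uz = rand_matrix(n_hidden, n_in, rng), rand_matrix(n_hidden, n_hidden, rng)
        self.Wr, self.Ur = rand_matrix(n_hidden, n_in, rng), rand_matrix(n_hidden, n_hidden, rng)
        self.Wh, self.Uh = rand_matrix(n_hidden, n_in, rng), rand_matrix(n_hidden, n_hidden, rng)

    def step(self, x, h):
        z = [sigmoid(v) for v in vadd(matvec(self.Wz, x), matvec(self.Uz, h))]  # update gate
        r = [sigmoid(v) for v in vadd(matvec(self.Wr, x), matvec(self.Ur, h))]  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in vadd(matvec(self.Wh, x), matvec(self.Uh, reset_h))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def estimate_location(rssi_sequence, encoder, gru, readout):
    """Encode each step independently, run the GRU over the codes, map the last state to (x, y)."""
    codes = time_distributed(encoder, rssi_sequence)
    h = [0.0] * gru.n_hidden
    for code in codes:
        h = gru.step(code, h)
    return matvec(readout, h)


rng = random.Random(0)
encoder = Encoder(n_in=4, n_code=3, rng=rng)  # 4 hypothetical access points
gru = GRUCell(n_in=3, n_hidden=5, rng=rng)
readout = rand_matrix(2, 5, rng)  # hidden state -> (x, y)

# Hypothetical RSSI readings (dBm) over three time steps, scaled to roughly [0, 1].
raw = [[-60, -72, -85, -90], [-62, -70, -84, -91], [-65, -68, -80, -92]]
scaled = [[(v + 100) / 100 for v in step] for step in raw]
xy = estimate_location(scaled, encoder, gru, readout)
```

`time_distributed` applies one shared encoder to each step without mixing steps. Only the GRU carries information across time. It is therefore the GRU state that lets the same RSSI vector, seen at different steps, lead to different estimates.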
As user-generated content thrives, so does the spread of toxic comments. Detecting toxic comments has therefore become an active research area, and the task is often handled as text classification. Pre-trained language model-based methods, currently popular for text classification, are at the forefront of natural language processing and achieve state-of-the-art performance on various NLP tasks. However, there is a paucity of studies applying such methods to toxic comment classification. In this work, we study how best to use pre-trained language model-based methods for toxic comment classification and compare the performance of different pre-trained language models on this task. Our results show that, of the three most popular language models (BERT, RoBERTa, and XLM), BERT and RoBERTa generally outperform XLM on toxic comment classification. We also show that a basic linear downstream structure outperforms more complex ones such as CNN and BiLSTM. Moreover, we find that further fine-tuning a pre-trained language model with light hyper-parameter settings improves the downstream toxic comment classification task, especially when the task has a relatively small dataset.
A Comparative Study of Using Pre-trained Language Models for Toxic Comment Classification. Zhixue Zhao, Ziqi Zhang, F. Hopfgartner. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3452313
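In the usual setup, the "basic linear downstream structure" is a single linear layer applied to the pooled output of the pre-trained encoder. Below is a pure-Python sketch of such a head. It assumes a binary toxic/non-toxic label and a toy 4-dimensional pooled vector, and the weights are hypothetical. In practice the head would be trained jointly with the encoder during fine-tuning.

```python
import math


def linear_head(pooled, W, b):
    """Logits = W @ pooled + b, with one row of W per class."""
    return [sum(w * x for w, x in zip(row, pooled)) + bias for row, bias in zip(W, b)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Loss minimized during fine-tuning for the gold class index."""
    return -math.log(probs[label])


# Toy 4-dimensional stand-in for the pooled encoder output
# (BERT-base and RoBERTa-base actually produce 768-dimensional vectors).
pooled = [0.2, -1.0, 0.5, 0.3]
W = [[0.1, 0.4, -0.2, 0.0],   # class 0: non-toxic (hypothetical weights)
     [-0.1, -0.4, 0.2, 0.3]]  # class 1: toxic
b = [0.0, 0.0]

logits = linear_head(pooled, W, b)
probs = softmax(logits)
loss = cross_entropy(probs, label=1)
```

Compared with a CNN or BiLSTM head, this head adds only one weight matrix and one bias vector on top of the encoder.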
Lucy McKenna, Junli Liang, Natalia Duda, N. McDonald, Rob Brennan
In this demonstration we present the Access Risk Knowledge (ARK) Platform, a socio-technical risk governance system. Through the ARK Virus Project, the ARK Platform has been extended for risk management of personal protective equipment (PPE) in healthcare settings during the COVID-19 pandemic. ARK demonstrates the benefits of a Semantic Web approach for supporting both the integration and classification of qualitative and quantitative PPE risk data across multiple healthcare organisations, in order to generate a unique, unified evidence base of risk. This evidence base could be used to inform decision-making processes regarding PPE use.
ARK-Virus: An ARK Platform Extension for Mindful Risk Governance of Personal Protective Equipment Use in Healthcare. Lucy McKenna, Junli Liang, Natalia Duda, N. McDonald, Rob Brennan. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3458609
A. Gilgur, Brian Coutinho, Iyswarya Narayanan, Parth Malani
Maintaining efficient utilization of allocated compute resources and controlling their capital and operating expenditures is important for running hyperscale datacenter infrastructure. Power is one of the most constrained and difficult-to-manage resources in datacenters. Accurate accounting of power usage across clients of multi-tenant web services can improve the budgeting, planning, and provisioning of compute resources. In this work, we propose a queuing-theory-based transitive power modeling framework that estimates the total power cost of a client request across the stack of shared services running in Facebook datacenters. By capturing the non-linear relationship between power and load, our model estimates the marginal change in a system's power consumption upon serving a request with a mean error of less than 4% when applied to production services. Because datacenter capacity is planned for peak demand, we test this model at peak load and report up to a 2x improvement in accuracy compared to a mathematical model. We further combine this framework with a distributed tracing system to estimate, to within a fraction of a percent, the power demand shift from serving particular product features, and use these estimates to guide the decision to shift their computation to off-peak times.
Transitive Power Modeling for Improving Resource Efficiency in a Hyperscale Datacenter. A. Gilgur, Brian Coutinho, Iyswarya Narayanan, Parth Malani. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3452057
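The transitive idea can be illustrated with a small sketch. It uses a hypothetical convex power-versus-load curve for each service instead of the paper's queuing-theory model. The marginal power of a client request is the sum, over every shared service it calls, of the call count times that service's marginal power at its current load. Service names, capacities, wattages, and exponents are invented for illustration.

```python
def service_power(load, svc):
    """Watts drawn by one service fleet at `load` requests/s (hypothetical curve).

    Idle power plus a dynamic term growing as utilization ** alpha; alpha > 1
    makes the curve convex, so the same extra request costs more near peak.
    """
    utilization = min(load / svc["capacity"], 1.0)
    return svc["idle_w"] + (svc["peak_w"] - svc["idle_w"]) * utilization ** svc["alpha"]


def marginal_power(svc, load, delta=1.0):
    """Finite-difference watts per extra request/s at the given load."""
    return (service_power(load + delta, svc) - service_power(load, svc)) / delta


def transitive_power(fanout, services, loads):
    """Marginal watts of one extra client request/s, summed over downstream services.

    fanout[name] is the number of calls a single client request makes to `name`.
    """
    return sum(calls * marginal_power(services[name], loads[name])
               for name, calls in fanout.items())


# Hypothetical two-service stack: every client request hits "web" once and "cache" 3 times.
services = {
    "web": {"capacity": 1000.0, "idle_w": 5000.0, "peak_w": 12000.0, "alpha": 1.5},
    "cache": {"capacity": 5000.0, "idle_w": 2000.0, "peak_w": 4000.0, "alpha": 1.2},
}
fanout = {"web": 1, "cache": 3}
off_peak = {"web": 200.0, "cache": 1000.0}
peak = {"web": 800.0, "cache": 4000.0}

cost_off_peak = transitive_power(fanout, services, off_peak)
cost_peak = transitive_power(fanout, services, peak)
```

Because the toy curve is convex, `cost_peak` exceeds `cost_off_peak`: the same request costs more watts when the stack runs near peak. In this toy model, that is why a linear power model would misestimate marginal cost at peak, and why shifting deferrable computation to off-peak hours can save power.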
Much of what we do today is centered around humans, whether it is creating the next generation of smartphones, understanding interactions with social media platforms, or developing new mobility strategies. A better understanding of people can not only answer fundamental questions about "us" as humans, but can also facilitate the development of enhanced, personalized technologies. In this talk, I will give an overview of the main challenges (and opportunities) faced by research on multimodal sensing of human behavior, and illustrate these challenges with projects conducted in the Language and Information Technologies lab at Michigan.
Language, Vision and Action are Better Together. Rada Mihalcea. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3451895
Workforce diversification is essential to increasing productivity in any economy. In the context of the Fourth Industrial Revolution, that need is even more urgent because technological sectors are male-dominated. Despite significant progress towards gender equality in recent decades, we are still far from the ideal scenario: changes towards equality are too slow and uneven across world regions. Monitoring gender parity is essential to understand the priorities and specificities of each region. However, it is challenging because data are scarce and costly to obtain, especially in less developed countries. In this paper we study how the Facebook Advertising Platform (Facebook Ads) can be used to assess gender imbalance in education, focusing on STEM (Science, Technology, Engineering, and Mathematics) areas, which are the main focus of the Fourth Industrial Revolution. As a case study, we apply our methodology to characterize Brazil in terms of gender balance in STEM and correlate the Facebook Ads results with official Brazilian government figures. Our results suggest that, even though the population considered is biased towards women (the majority is female), the proportion of men interested in some majors is higher than the proportion of women. Within STEM areas, we identify two different patterns: Life Sciences and Math/Physical Sciences are female-dominated, while Environmental Science, Technology, and Engineering majors remain concentrated among men. We also assess the impact of educational level and age on interest in majors. The gender gap in STEM increases with women's educational level and age, as confirmed by official data from Brazil.
Using Facebook Ads Data to Assess Gender Balance in STEM: Evidence from Brazil. C. C. Vieira, Marisa Vasconcelos. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3453456
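One way to read the finding about a majority-female population is to compare interest rates rather than raw audience counts. The sketch below assumes per-gender audience-size estimates of the kind the Facebook Ads platform reports for targeting criteria. The `gender_gap_index` helper and all numbers are hypothetical; they are neither the paper's figures nor its exact metric.

```python
def interest_rate(interested, audience):
    """Share of a gender's total audience that is interested in a given major."""
    return interested / audience


def gender_gap_index(women_interested, women_total, men_interested, men_total):
    """Women's interest rate divided by men's: < 1 means men are over-represented.

    Normalizing by each gender's total audience keeps a majority-female user
    base from masking a male-skewed interest pattern.
    """
    return interest_rate(women_interested, women_total) / interest_rate(men_interested, men_total)


# Hypothetical audience-size estimates, NOT figures from the paper.
women_total, men_total = 60_000_000, 50_000_000
majors = {
    "Engineering": (900_000, 800_000),  # (women interested, men interested)
    "Life Sciences": (1_200_000, 600_000),
}

index = {
    major: gender_gap_index(w, women_total, m, men_total)
    for major, (w, m) in majors.items()
}
labels = {major: ("female-leaning" if value > 1 else "male-leaning") for major, value in index.items()}
```

In this toy data, more women than men are interested in Engineering in absolute terms (900,000 versus 800,000). The index is still about 0.94, because women make up a larger share of the audience. Life Sciences comes out female-leaning at about 1.67.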
Machine-driven topic identification of online content is a prevalent task in natural language processing (NLP). Social media deliberation reflects society's opinions, and a structured analysis of this content allows us to decipher them. We employ an NLP-based approach to investigate migration-related Twitter discussions. Besides traditional deep learning-based models, we also consider pre-trained transformer-based models for analyzing our corpus. We successfully classify multiple strands of public opinion related to European migrants. Finally, we use BertViz to visually explore the interpretability of the better-performing transformer-based models.
Analyzing European Migrant-related Twitter Deliberations. A. Khatua, W. Nejdl. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3453459
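BertViz draws the attention weights of each transformer layer and head. The pure-Python sketch below shows what those weights are: one head's scaled dot-product attention matrix, softmax(Q K^T / sqrt(d)). The tokens and the 2-dimensional query and key vectors are hypothetical toy values, and this is not BertViz's own API.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """softmax(Q K^T / sqrt(d)) for a single head.

    Row i holds how strongly token i attends to every token; this is the
    kind of matrix attention-visualization tools draw as token-to-token lines.
    """
    scale = math.sqrt(len(queries[0]))
    return [softmax([sum(q * k for q, k in zip(qv, kv)) / scale for kv in keys])
            for qv in queries]


tokens = ["migrants", "welcome", "here"]  # hypothetical tweet fragment
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d query vectors
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d key vectors
weights = attention_weights(Q, K)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

In a real model, Q and K come from learned projections of each token's hidden state. Visualization tools such as BertViz display the resulting matrices rather than computing them.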
Leonidas G. Anthopoulos, Christos Ziozias, A. Siokis
Planning and establishing digital transformation (DT) is a complex process for any organization. A city's DT is a particularly challenging and complex process. It demands both a leading and dedicated role from the local government and the engagement and commitment of local stakeholders to a commonly agreed vision and plan. The European Commission launched its Digital Cities Challenge (DCC) and Intelligent Cities Challenge (ICC) initiatives to provide cities with guidance and support in designing and implementing corresponding digital transformation strategies. Shaping this strategy became difficult during the ICC due to the COVID-19 pandemic, which changed all local priorities and affected the initial city planning. The aim of this work-in-progress paper is to present the strategic planning process for digital transformation followed by the municipality of Trikala in Greece. Despite already being a well-known smart city, Trikala joined the DCC and ICC initiatives in order to carry out this process methodically. Useful evidence is presented regarding the different stakeholders' perspectives and priorities within the city's digital transformation, and especially whether and how the COVID-19 outbreak rearranged or reshaped them.
Shaping a Digital Transformation Strategy for Smart Cities under the COVID-19 pandemic: Evidence from Greece. Leonidas G. Anthopoulos, Christos Ziozias, A. Siokis. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3453470
A talk in two parts covering three modalities. In the first part, I will talk about NLP Beyond Text. We integrate visual context into a speech recognition model and find that fine-grained visual grounding against detected objects improves the recovery of different types of masked speech inputs [2]. In the second part, I will come Back Again and talk about the benefits of textual supervision in cross-modal speech–vision retrieval models [1].
Beyond Text and Back Again. Desmond Elliott. Companion Proceedings of the Web Conference 2021, 2021-04-19. https://doi.org/10.1145/3442442.3451896