Companion Proceedings of the Web Conference 2021: Latest Articles

Emotion-Aware Event Summarization in Microblogs
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452311
R. Panchendrarajan, W. Hsu, M. Lee
Microblogs have become the preferred means of communication for people to share information and feelings, especially for fast-evolving events. Understanding people's emotional reactions allows decision makers to formulate policies that are likely to be better received by the public and hence better accepted, especially during policy implementation. However, uncovering the topics and emotions related to an event over time is a challenge due to the short and noisy nature of microblogs. This work proposes a weakly supervised learning approach to learn coherent topics and the corresponding emotional reactions as an event unfolds. We summarize the event by giving the representative microblogs and the emotion distributions associated with the topics over time. Experiments on multiple real-world event datasets demonstrate the effectiveness of the proposed approach over existing solutions.
Citations: 4
Cross-city Analysis of Location-based Sentiment in User-generated Text
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451889
Christopher Stelzmüller, Sebastian Tanzer, M. Schedl
Geolocated user-generated content is a promising source of data reflecting how citizens live and feel. Information extracted from this source is increasingly used for urban planning and policy evaluation. While much existing research focuses on the relationship between locations and sentiment in social media postings, we aim to uncover relations between location and sentiment that are consistent across cities around the world. In this paper, we therefore analyze the relationship between multiple categories of points of interest (POIs) in the OpenStreetMap dataset and the sentiment of English microblogging messages sent nearby, using a three-stage processing pipeline: (1) extract sentiment scores from geolocated microblogs posted on Twitter, (2) spatially aggregate sentiment in cities and POIs, (3) analyze relationships in the aggregated sentiment. We identify differences in Twitter users’ sentiments within cities based on POIs, investigate the temporal dynamics of these sentiments, and compare our findings between major cities in multiple countries.
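The three-stage pipeline in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: sentiment scores are taken as already computed (stage 1), a post counts as "nearby" when it lies within a hypothetical radius of a POI (stage 2), and the per-category mean stands in for the analysis step (stage 3). The function names, the 200 m radius, and the tuple layouts are all invented for the example.

```python
import math
from collections import defaultdict
from statistics import mean

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sentiment_by_category(posts, pois, radius_m=200):
    """Average the sentiment of posts sent within radius_m of each POI category.

    posts: iterable of (lat, lon, sentiment_score); scores are assumed precomputed.
    pois:  iterable of (lat, lon, category).
    radius_m is an assumed cut-off for "nearby", not a value from the paper.
    """
    scores = defaultdict(list)
    for lat, lon, sentiment in posts:
        for poi_lat, poi_lon, category in pois:
            if haversine_m(lat, lon, poi_lat, poi_lon) <= radius_m:
                scores[category].append(sentiment)
    # Categories with no nearby posts are simply absent from the result.
    return {category: mean(values) for category, values in scores.items()}
```

Running the same aggregation per city and comparing the resulting category means is one simple way to look for location and sentiment relations that hold across cities.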
Citations: 2
Extraction and Evaluation of Statistical Information from Social and Behavioral Science Papers
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451363
Sree Sai Teja Lanka, S. Rajtmajer, Jian Wu, C. Lee Giles
With substantial and continuing increases in the number of published papers across the scientific literature, development of reliable approaches for automated discovery and assessment of published findings is increasingly urgent. Tools that can extract critical information from scientific papers and metadata can support representation and reasoning over existing findings, and offer insights into the replicability, robustness and generalizability of specific claims. In this work, we present a pipeline for the extraction of statistical information (p-values, sample size, number of hypotheses tested) from full-text scientific documents. We validate our approach on 300 papers selected from the social and behavioral science literature, and suggest directions for next steps.
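To make the idea of extracting statistical information from full text concrete, here is a minimal regular-expression sketch. It is not the pipeline described in the abstract, whose design is not given here; the two patterns and the function name `extract_statistics` are assumptions chosen for illustration, and they cover only simple reporting styles such as "p < .05" and "N = 120".

```python
import re

# Hypothetical pattern for p-values such as "p < 0.05", "p = .003", "P<=0.01".
P_VALUE = re.compile(r"\bp\s*(<=|>=|<|>|=)\s*(0?\.\d+)", re.IGNORECASE)

# Hypothetical pattern for sample sizes reported as "N = 120" or "n = 1,204".
SAMPLE_SIZE = re.compile(r"\bn\s*=\s*(\d[\d,]*)", re.IGNORECASE)


def extract_statistics(text):
    """Return the p-values and sample sizes found in a passage of text."""
    # Each p-value is kept with its comparison operator, e.g. ("<", 0.05).
    p_values = [(op, float(num)) for op, num in P_VALUE.findall(text)]
    # Thousands separators are stripped before converting to int.
    sample_sizes = [int(n.replace(",", "")) for n in SAMPLE_SIZE.findall(text)]
    return {"p_values": p_values, "sample_sizes": sample_sizes}
```

Real papers report statistics in many more forms (tables, "ns", scientific notation), so a rule-based extractor like this would need validation against hand-labelled papers, which is the role the 300-paper evaluation plays in the abstract.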
Citations: 5
FastSNG: The Fastest Social Network Dataset Generator
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3458604
Binbin Wang, Chaokun Wang, Hao Feng
Large-scale social networks have become more and more popular with the rapid progress of social media. A number of social network analysis tasks have been developed to run on real large-scale networks. However, the prohibitive cost of obtaining the underlying large network, including time cost and data privacy, makes it hard to evaluate the performance of analysis algorithms on real-world social networks. In this paper, we present a tool called FastSNG, which generates heterogeneous social network datasets according to a user-defined configuration depicting the rich characteristics of the expected social network, such as community structures, attributes, and node degree distributions. Moreover, the generation algorithm of FastSNG adopts a degree distribution generation (D2G) model that efficiently generates web-scale social network datasets. Finally, the tool provides user-friendly and succinct user interfaces for interaction with general users.
Citations: 1
LocWeb2021 Workshop – Chair’s Welcome: Eleventh International Workshop on Location and the Web at The Web Conference 2021
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451891
Dirk Ahlers, Erik Wilde, R. Schifanella, Jalal S. Alowibdi
LocWeb2021 (Eleventh International Workshop on Location and the Web) is a workshop at The Web Conference 2021, with evolving topics around location-aware information access, Web architecture, spatial social computing, and social good. It is designed as a meeting place for researchers around the location topic at The Web Conference.
Citations: 1
Information flow on COVID-19 over Wikipedia: A case study of 11 languages
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452352
Chang-Ryong Jung, I. Hong, Diego Sáez-Trumper, Damin Lee, Jaehyeon Myung, Danu Kim, Jinhyuk Yun, Woo-Sung Jung, M. Cha
Wikipedia has been a critical information source during the COVID-19 pandemic. Analyzing how information is created, edited, and viewed on this platform can help gain new insights for risk communication strategies for the next pandemic. Here, we study editor and viewer patterns on COVID-19-related documents on Wikipedia using a near-complete dataset gathered for 11 languages over 238 days in 2020. Based on an analysis of the daily access and edit logs of the identified Wikipedia pages, we discuss how regional and cultural closeness factors affect information demand and supply.
Citations: 0
Generating Rich Product Descriptions for Conversational E-commerce Systems
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451893
Shashank Kedia, Aditya Mantha, Sneha R. Gupta, Stephen D. Guo, Kannan Achan
Through recent advancements in speech technologies and the introduction of smart assistants such as Amazon Alexa, Apple Siri and Google Home, an increasing number of users are interacting with various applications through voice commands. E-commerce companies typically display short product titles on their webpages, either human-curated or algorithmically generated, when brevity is required. However, these titles are dissimilar from natural spoken language. For example, “Lucky Charms Gluten Free Breakfast Cereal, 20.5 oz a box Lucky Charms Gluten Free” is acceptable to display on a webpage, while a similar title cannot be used in a voice-based text-to-speech application. In such conversational systems, an easy-to-comprehend sentence, such as “a 20.5 ounce box of lucky charms gluten free cereal”, is preferred. Compared to display devices, where images and detailed product information can be presented to users, short titles that convey the most important product information are necessary when interfacing with voice assistants. We propose eBERT, a sequence-to-sequence approach that further pre-trains the BERT embeddings on an e-commerce product description corpus and then fine-tunes the resulting model to generate short, natural, spoken-language titles from input web titles. Our extensive experiments on a real-world industry dataset, as well as human evaluation of model output, demonstrate that eBERT summarization outperforms comparable baseline models. Owing to the efficacy of the model, a version of it has been deployed in a real-world setting.
Citations: 2
IP Geolocation Using Traceroute Location Propagation and IP Range Location Interpolation
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451888
Ovidiu Dan, Vaibhav Parikh, Brian D. Davison
Many online services, including search engines, content delivery networks, ad networks, and fraud detection, utilize IP geolocation databases to map IP addresses to their physical locations. However, IP geolocation databases are often inaccurate. We present a novel IP geolocation technique that combines the propagation of IP location information through traceroutes with IP range interpolation. Using a large ground truth set, we show that physical locations of IP addresses can be propagated along traceroute paths. We also experiment with and expand upon the concept of IP range location interpolation, where we use the location of individual addresses in an IP range to assign a location to the entire range. The results show that our approach significantly outperforms commercial geolocation by up to 31 percentage points. We open source several components to aid in reproducing our results.
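The range interpolation idea, using the known locations of individual addresses to label a whole IP range, can be sketched as follows. This is a simplified illustration and not the authors' algorithm: the fixed /24 range size, the majority vote, the `min_share` threshold, and both function names are assumptions made for the example.

```python
import ipaddress
from collections import Counter, defaultdict


def interpolate_ranges(known, prefix_len=24, min_share=0.5):
    """Assign each IP range the most common location among its known addresses.

    known: dict mapping IPv4 address string -> location label.
    A range is labelled only if the winning label's share of the range's
    known addresses exceeds min_share (an assumed threshold).
    """
    by_range = defaultdict(Counter)
    for addr, location in known.items():
        # strict=False lets a host address stand in for its enclosing network.
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        by_range[network][location] += 1

    assigned = {}
    for network, counts in by_range.items():
        location, votes = counts.most_common(1)[0]
        if votes / sum(counts.values()) > min_share:
            assigned[network] = location
    return assigned


def lookup(ip, assigned):
    """Return the interpolated location for an address, or None if its range is unlabelled."""
    address = ipaddress.ip_address(ip)
    for network, location in assigned.items():
        if address in network:
            return location
    return None
```

With this sketch, an address with no ground-truth location inherits the label of its range only when the known addresses in that range mostly agree; ranges with conflicting evidence stay unlabelled rather than being guessed.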
Citations: 13
Analysis and Visualisation of Time Series Data on Networks with Pathpy
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3452052
Jürgen Hackl, Ingo Scholtes, L. V. Petrovic, Vincenzo Perri, Luca Verginer, Christoph Gote
The Open Source software package pathpy, available at https://www.pathpy.net, implements statistical techniques to learn optimal graphical models for the causal topology generated by paths in time-series data. Operationalizing Occam’s razor, these models balance model complexity with explanatory power for empirically observed paths in relational time series. Standard network analysis is justified if the inferred optimal model is a first-order network model. Optimal models with orders larger than one indicate higher-order dependencies and can be used to improve the analysis of dynamical processes, node centralities and clusters.
Citations: 6
Predicting Paper Acceptance via Interpretable Decision Sets
Pub Date : 2021-04-19 DOI: 10.1145/3442442.3451370
Peng Bao, Weihui Hong, Xuanya Li
Measuring the quality of research work is an essential component of the scientific process. With the ever-growing number of articles being submitted to top-tier conferences, and the potential consistency and bias issues in the peer review process identified by the scientific community, automatically evaluating submissions is both necessary and challenging. Existing works mainly focus on exploring relevant factors and applying machine learning models simply to predict the acceptance of a given academic paper accurately, while ignoring the interpretability required by a wide range of applications. In this paper, we propose a framework to construct decision sets that consist of unordered if-then rules for predicting paper acceptance. We formalize the decision set learning problem via a joint objective function that simultaneously optimizes the accuracy and interpretability of the rules, rather than organizing them in a hierarchy. We evaluate the effectiveness of the proposed framework by applying it to a public dataset of scientific peer reviews. Experimental results demonstrate that the interpretable decision sets learned by our framework perform on par with state-of-the-art classification algorithms that optimize exclusively for predictive accuracy, and are much more interpretable than rule-based methods.
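A decision set of unordered if-then rules, scored by an objective that trades accuracy against rule complexity, can be illustrated with a short sketch. It is a toy under stated assumptions, not the framework from the paper: the feature names, the majority vote among firing rules, the default label, and the linear `penalty` term are all invented for the example.

```python
from collections import Counter


def rule_fires(conditions, paper):
    """A rule fires when every one of its feature tests holds for the paper."""
    return all(test(paper[feature]) for feature, test in conditions.items())


def predict(decision_set, paper, default="reject"):
    """Predict by majority vote over the rules that fire; rules have no order."""
    votes = Counter(
        label for conditions, label in decision_set if rule_fires(conditions, paper)
    )
    if not votes:
        return default  # assumed fallback when no rule applies
    return votes.most_common(1)[0][0]


def objective(decision_set, papers, labels, penalty=0.01):
    """Joint score: accuracy minus a penalty per condition (a stand-in for interpretability)."""
    correct = sum(predict(decision_set, p) == y for p, y in zip(papers, labels))
    accuracy = correct / len(papers)
    size = sum(len(conditions) for conditions, _ in decision_set)
    return accuracy - penalty * size
```

Because each rule is an independent "if these tests hold, then this label" statement, a reader can inspect any single rule without tracing a path through a tree or an ordered rule list, which is the interpretability argument the abstract makes for decision sets.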
Citations: 9