Proceedings of the Web Conference 2021: Latest Papers

Towards a Better Understanding of Query Reformulation Behavior in Web Search

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450127

Jia Chen, Jiaxin Mao, Yiqun Liu, Fan Zhang, M. Zhang, Shaoping Ma

Because the queries users submit directly affect their search experience, how to organize queries has long been a research focus in Web search studies. As search requests become more complex and exploratory, many search sessions contain more than a single query, so reformulation becomes a necessity. To help users better formulate their queries in these complex search tasks, modern search engines usually provide a series of reformulation entries on search engine result pages (SERPs), i.e., query suggestions and related entities. However, few existing works have thoroughly studied why and how users perform query reformulations in these heterogeneous interfaces. Therefore, whether search engines provide sufficient assistance for users in reformulating queries remains under-investigated. To shed light on this research question, we conducted a field study to analyze fine-grained user reformulation behaviors, including reformulation type, entry, reason, and inspiration source, under various search intents. Unlike existing efforts that rely on external assessors to make judgments, in the field study we collect both implicit behavior signals and explicit user feedback. Analysis results demonstrate that query reformulation behavior in Web search varies with the type of search task. We also found that the query suggestions and related-query recommendations currently provided by search engines do not offer enough help for users in complex search tasks. Based on the findings of our field study, we design a supervised learning framework to predict 1) the reason behind each query reformulation and 2) how users organize the reformulated query, both of which are novel challenges in this domain. This work provides insight into complex query reformulation behavior in Web search as well as guidance for designing better query suggestion techniques in search engines.

Citations: 28

Predicting Customer Value with Social Relationships via Motif-based Graph Attention Networks

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449849

J. Piao, Guozheng Zhang, Fengli Xu, Zhilong Chen, Yong Li

Customer value is essential for successful customer relationship management. Although growing evidence suggests that customers’ purchase decisions can be influenced by social relationships, social influence is largely overlooked in previous research. In this work, we fill this gap with a novel framework — Motif-based Multi-view Graph Attention Networks with Gated Fusion (MAG), which jointly considers customer demographics, past behaviors, and social network structures. Specifically, (1) to make the best use of higher-order information in complex social networks, we design a motif-based multi-view graph attention module, which explicitly captures different higher-order structures, along with the attention mechanism auto-assigning high weights to informative ones. (2) To model the complex effects of customer attributes and social influence, we propose a gated fusion module with two gates: one depicts the susceptibility to social influence and the other depicts the dependency of the two factors. Extensive experiments on two large-scale datasets show superior performance of our model over the state-of-the-art baselines. Further, we discover that the increase of motifs does not guarantee better performances and identify how motifs play different roles. These findings shed light on how to understand socio-economic relationships among customers and find high-value customers.
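
The gated fusion idea in the abstract above can be illustrated with a minimal sketch. This is not the MAG implementation: the abstract does not give the gate equations, so the sketch assumes a single sigmoid gate per dimension that decides how much of a social-influence vector to blend into a customer-attribute vector. The function name `gated_fusion` and all parameters are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(attr_vec, social_vec, gate_weights, gate_bias):
    """Blend two equal-length vectors with one learned gate per dimension.

    Assumption (not from the paper): gate d is a sigmoid over a linear
    function of the concatenated inputs. A gate near 1 lets the social
    signal dominate; a gate near 0 keeps the customer's own attributes.
    gate_weights[d] must have length 2 * len(attr_vec).
    """
    if not (len(attr_vec) == len(social_vec) == len(gate_weights) == len(gate_bias)):
        raise ValueError("all inputs must describe the same number of dimensions")
    concat = list(attr_vec) + list(social_vec)
    fused = []
    for d in range(len(attr_vec)):
        z = sum(w * x for w, x in zip(gate_weights[d], concat)) + gate_bias[d]
        g = sigmoid(z)  # hypothetical "susceptibility to social influence"
        fused.append((1.0 - g) * attr_vec[d] + g * social_vec[d])
    return fused


# With zero weights and bias every gate is exactly 0.5, so the result is
# the element-wise average of the two inputs.
print(gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0]))
```

In a trained model the gate weights would be learned jointly with the rest of the network; the paper's second gate, which models the dependency between the two factors, is omitted here.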

Citations: 18

Search Engines vs. Symptom Checkers: A Comparison of their Effectiveness for Online Health Advice

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450140

Sebastian Cross, Ahmed Mourad, G. Zuccon, B. Koopman

Increasingly, people go online to seek health advice. They commonly use the symptoms they are experiencing to identify the health conditions they may have (self-diagnosis task) as well as to determine an appropriate action to take (triaging task); e.g., should they seek emergent medical attention or attempt to treat themselves at home? This paper investigates the effectiveness of two of the most common methods people use for self-diagnosis and triaging: online symptom checkers and traditional web search engines. To this end, we conducted a user study with 64 real-world users performing 8 simulated self-diagnosis tasks. Participants were exposed to both a representative symptom checker and a search engine. The results of our study provide empirical evidence for whether using a search engine for health information improves people’s understanding of their health condition and their ability to act on it, compared to interacting with a symptom checker, which bases its interaction model on a question-answering process. Additionally, recorded answers to qualitative questionnaires from study participants provide insights into which style of interaction and system they prefer to use for obtaining medical information, and how helpful they thought each system was. These findings can help inform the development of better search engines and symptom checkers that support people seeking health advice online.

Citations: 9

Cross-lingual Language Model Pretraining for Retrieval

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449830

Puxuan Yu, Hongliang Fei, P. Li

Existing research on cross-lingual retrieval cannot take good advantage of large-scale pretrained language models such as multilingual BERT and XLM. We hypothesize that the absence of cross-lingual passage-level relevance data for finetuning and the lack of query-document style pretraining are key factors of this issue. In this paper, we introduce two novel retrieval-oriented pretraining tasks to further pretrain cross-lingual language models for downstream retrieval tasks such as cross-lingual ad-hoc retrieval (CLIR) and cross-lingual question answering (CLQA). We construct distant supervision data from multilingual Wikipedia using section alignment to support retrieval-oriented language model pretraining. We also propose to directly finetune language models on part of the evaluation collection by making Transformers capable of accepting longer sequences. Experiments on multiple benchmark datasets show that our proposed model can significantly improve upon general multilingual language models in both the cross-lingual retrieval setting and the cross-lingual transfer setting.

Citations: 28

Semi-Open Information Extraction

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450029

Yu Bowen, Zhenyu Zhang, Jiawei Sheng, Tingwen Liu, Yubin Wang, Yu-Chih Wang, Bin Wang

Open Information Extraction (OIE), the task aimed at discovering all textual facts organized in the form of (subject, predicate, object) found within a sentence, has gained much attention recently. However, in some knowledge-driven applications such as question answering, we often have a target entity and hope to obtain its structured factual knowledge for better understanding, instead of extracting all possible facts aimlessly from the corpus. In this paper, we define a new task, namely Semi-Open Information Extraction (SOIE), to address this need. The goal of SOIE is to discover domain-independent facts towards a particular entity from general and diverse web text. To facilitate research on this new task, we propose a large-scale human-annotated benchmark called SOIED, consisting of 61,984 facts for 8,013 subject entities annotated on 24,000 Chinese sentences collected from the web search engine. In addition, we propose a novel unified model called USE for this task. First, we introduce subject-guided sequence as input to a pre-trained language model and normalize the hidden representations conditioned on the subject embedding to encode the sentence in a subject-aware manner. Second, we decompose SOIE into three uncoupled subtasks: predicate extraction, object extraction, and boundary alignment. They can all be formulated as the problem of table filling by forming a two-dimensional tag table based on a task-specific tagging scheme. Third, we introduce a collaborative learning strategy that enables the interactive relations among subtasks to be better exploited by explicitly exchanging informative clues. Finally, we evaluate USE and several strong baselines on our new dataset. Experimental results demonstrate the advantages of the proposed method and reveal insight for future improvement.

Citations: 18

Chinese Wall or Swiss Cheese? Keyword filtering in the Great Firewall of China

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450076

Zachary Weinberg, Diogo Barradas, Nicolas Christin

The Great Firewall of China (GFW) prevents Chinese citizens from accessing online content deemed objectionable by the Chinese government. One way it does this is to search for forbidden keywords in unencrypted packet streams. When it detects them, it terminates the offending stream by injecting TCP RST packets, and blocks further traffic between the same two hosts for a few minutes. We report on a detailed investigation of the GFW’s application-layer understanding of HTTP. Forbidden keywords are only detected in certain locations within an HTTP request. Requests that contain the English word “search” are inspected for a longer list of forbidden keywords than requests without this word. The firewall can be evaded by bending the rules of the HTTP specification. We observe censorship based on the cleartext TLS Server Name Indication (SNI), but we find no evidence for bulk decryption of HTTPS. We also report on changes since 2014 in the contents of the forbidden keyword list. Over 85% of the forbidden keywords have been replaced since 2014, with the surviving terms referring to perennially sensitive topics. The new keywords refer to recent events and controversies. The GFW’s keyword list is not kept in sync with the blocklists used by Chinese chat clients.

Citations: 8

Whale Watching in Inland Indonesia: Analyzing a Small, Remote, Internet-Based Community Cellular Network

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449996

Matthew Johnson, Jenny T Liang, Michelle Lin, S. Singanamalla, Kurtis Heimerl

While only generating a minuscule percentage of global traffic, largely lost in the noise of large-scale analyses, remote rural networks are the physical frontier of the Internet today. Through tight integration with a local operator’s infrastructure, we gather a unique dataset to characterize and report a year of interaction between finances, utilization, and performance of a novel, remote, data-only Community LTE Network in Bokondini, Indonesia. With visibility to drill down to individual users, we find use highly unbalanced and the network supported by only a handful of relatively heavy consumers. 45% of users are offline more days than online, and the median user consumes only 77 MB per day online and 36 MB per day on average, limiting consumption by frequently “topping up” in small amounts. Outside video and social media, messaging and IP calling provided by over-the-top services like Facebook Messenger, QQ, and WhatsApp comprise a relatively large percentage of traffic consistently across both heavy and light users. Our analysis shows that Internet-only Community Cellular Networks can be profitable despite most users spending less than $1 USD/day, and offers insights into the unique properties of these networks.

Citations: 5

Information Elicitation from Rowdy Crowds

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449840

G. Schoenebeck, Fang-Yi Yu, Yichi Zhang

We initiate the study of information elicitation mechanisms for a crowd containing both self-interested agents, who respond to incentives, and adversarial agents, who may collude to disrupt the system. Our mechanisms work in the peer prediction setting, where ground truth need not be accessible to the mechanism or even exist. We provide a meta-mechanism that reduces the design of peer prediction mechanisms to a related robust learning problem. The resulting mechanisms are ϵ-informed truthful, which means truth-telling is the highest-paid ϵ-Bayesian Nash equilibrium (up to ϵ-error) and pays strictly more than uninformative equilibria. The value of ϵ depends on the properties of the robust learning algorithm, and typically tends to 0 as the numbers of tasks and agents increase. We show how to use our meta-mechanism to design mechanisms with provable guarantees in two important crowdsourcing settings even when some agents are self-interested and others are adversarial.

Citations: 7

Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449989

Yang Liu, Xiang Ao, Zidi Qin, Jianfeng Chi, Jinghua Feng, Hao Yang, Qing He

Graph-based fraud detection approaches have attracted much attention recently due to the abundant relational information in graph-structured data, which may be beneficial for the detection of fraudsters. However, GNN-based algorithms can fare poorly when the label distribution of nodes is heavily skewed, which is common in sensitive areas such as financial fraud. To remedy the class imbalance problem of graph-based fraud detection, we propose a Pick and Choose Graph Neural Network (PC-GNN for short) for imbalanced supervised learning on graphs. First, nodes and edges are picked with a devised label-balanced sampler to construct sub-graphs for mini-batch training. Next, for each node in the sub-graph, the neighbor candidates are chosen by a proposed neighborhood sampler. Finally, information from the selected neighbors and different relations is aggregated to obtain the final representation of a target node. Experiments on both benchmark and real-world graph-based fraud detection tasks demonstrate that PC-GNN clearly outperforms state-of-the-art baselines.
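
The "pick" step in the abstract above relies on a label-balanced sampler. The sketch below shows one simple way to build such a sampler, assuming each node is drawn with probability inversely proportional to the frequency of its label. The abstract does not specify PC-GNN's actual sampling distribution, so this weighting, the function name `label_balanced_sample`, and its parameters are illustrative assumptions only.

```python
import random
from collections import Counter


def label_balanced_sample(labels, k, seed=0):
    """Draw k node indices (with replacement) so that classes are balanced.

    Assumption (not from the paper): node i gets weight 1 / count(label_i).
    The weights of each class then sum to 1, so every class is equally
    likely to be picked regardless of how rare it is.
    """
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)  # seeded for reproducible mini-batches
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Toy node set: 95 benign nodes (label 0) and 5 fraudulent nodes (label 1).
labels = [0] * 95 + [1] * 5
batch = label_balanced_sample(labels, k=2000)
fraud_share = sum(labels[i] for i in batch) / len(batch)
print(f"share of fraud nodes in the sampled batch: {fraud_share:.2f}")
```

Under uniform sampling only about 5% of a batch would be fraud nodes; with the inverse-frequency weights the expected share is one half, which is the kind of rebalancing that mini-batch training on skewed labels needs.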

Citations: 130

Hashing-Accelerated Graph Neural Networks for Link Prediction

Proceedings of the Web Conference 2021

Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449884

Wei Wu, Bin Li, Chuan Luo, W. Nejdl

Networks are ubiquitous in the real world. Link prediction, one of the key problems for network-structured data, aims to predict whether a link exists between two nodes. Traditional approaches rely on explicit similarity computation between compact node representations obtained by embedding each node into a low-dimensional space. To handle the intensive similarity computation in link prediction efficiently, hashing techniques have been successfully used to produce node representations in the Hamming space. However, hashing-based link prediction algorithms suffer either accuracy loss from randomized hashing techniques or inefficiency from learning-to-hash techniques in the embedding process. Currently, the Graph Neural Network (GNN) framework has been widely applied to graph-related tasks in an end-to-end manner, but it commonly requires substantial computational resources and memory due to massive parameter learning, which makes GNN-based algorithms impractical without the help of a powerful workhorse. In this paper, we propose a simple and effective model called #GNN, which balances the trade-off between accuracy and efficiency. #GNN efficiently acquires node representations in the Hamming space for link prediction by exploiting the randomized hashing technique to implement message passing and capture high-order proximity in the GNN framework. Furthermore, we characterize the discriminative power of #GNN in probability. Extensive experimental results demonstrate that the proposed #GNN algorithm achieves accuracy comparable to the learning-based algorithms and outperforms the randomized algorithm, while running significantly faster than the learning-based algorithms. The proposed algorithm also shows excellent scalability on a large-scale network with limited resources.
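
The idea of using randomized hashing for message passing can be sketched as follows. This is an illustration only, not the #GNN algorithm: the abstract does not name the hashing scheme, so MinHash is assumed here, and every function name is hypothetical. In each round a node merges its own content with that of its neighbors and compresses the result into a fixed-length signature; link candidates are then scored by the fraction of matching signature positions. Every node is assumed to have at least one integer feature id.

```python
import random

PRIME = 2_147_483_647  # 2^31 - 1; feature ids are assumed to be smaller than this


def make_hashes(k, seed):
    """Create k random linear hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash(items, hashes):
    """For each hash function, keep the item with the smallest hash value."""
    return [min(items, key=lambda x: (a * x + b) % PRIME) for a, b in hashes]


def hash_message_passing(adj, features, k=64, rounds=2, seed=0):
    """adj: node -> list of neighbors; features: node -> non-empty set of int ids."""
    current = {v: set(f) for v, f in features.items()}
    sigs = {}
    for r in range(rounds):
        hashes = make_hashes(k, seed + r)
        # Aggregate: a node's own content plus the content of its neighbors.
        merged = {v: current[v].union(*(current[u] for u in adj[v])) for v in adj}
        # Compress the aggregated content into a length-k signature.
        sigs = {v: minhash(merged[v], hashes) for v in adj}
        # The signature becomes the node's content for the next round,
        # so later rounds see information from multi-hop neighborhoods.
        current = {v: set(sigs[v]) for v in adj}
    return sigs


def hamming_similarity(s1, s2):
    """Fraction of signature positions on which two nodes agree."""
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)


adj = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}
features = {0: {1, 2}, 1: {1, 2}, 2: {3}, 3: {4}}
sigs = hash_message_passing(adj, features)
print(hamming_similarity(sigs[0], sigs[1]))  # nodes 0 and 1 share features and neighbors
```

No parameters are learned in this sketch; the only cost is hashing and set union, which is the kind of trade-off between accuracy and efficiency that the abstract describes.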

Citations: 25