
Proceedings of the Web Conference 2021: Latest Publications

HGCF: Hyperbolic Graph Convolution Networks for Collaborative Filtering
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450101
Jianing Sun, Zhaoyue Cheng, S. Zuberi, Felipe Pérez, M. Volkovs
Hyperbolic spaces offer a rich setup to learn embeddings with superior properties that have been leveraged in areas such as computer vision, natural language processing and computational biology. Recently, several hyperbolic approaches have been proposed to learn robust representations for users and items in the recommendation setting. However, these approaches don’t capture the higher order relationships that typically exist in the recommendation domain. Graph convolutional neural networks (GCNs) on the other hand excel at capturing higher order information by applying multiple levels of aggregation to local representations. In this paper we combine these frameworks in a novel way, by proposing a hyperbolic GCN model for collaborative filtering. We demonstrate that our model can be effectively learned with a margin ranking loss, and show that hyperbolic space has desirable properties under the rank margin setting. At test time, inference in our model is done using the hyperbolic distance which preserves the structure of the learned space. We conduct extensive empirical analysis on three public benchmarks and compare against a large set of baselines. Our approach achieves highly competitive results and outperforms leading baselines including the Euclidean GCN counterpart. We further study the properties of the learned hyperbolic embeddings and show that they offer meaningful insights into the data. Full code for this work is available here: https://github.com/layer6ai-labs/HGCF.
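Two ingredients the abstract highlights are the hyperbolic (Poincaré-ball) distance used at inference and the margin ranking loss used for training. The sketch below illustrates both in plain NumPy; it is a minimal illustration under assumed settings (unit ball, placeholder margin, toy embeddings), not the authors' HGCF implementation.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Hyperbolic distance between two points inside the unit Poincare ball."""
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u * u)) * (1.0 - np.sum(v * v))
    return np.arccosh(1.0 + 2.0 * sq_diff / (denom + eps))

def margin_ranking_loss(user, pos_item, neg_item, margin=0.1):
    """Pairwise loss: the positive item should sit closer to the user
    than the negative item by at least `margin` (hinge on the gap)."""
    d_pos = poincare_distance(user, pos_item)
    d_neg = poincare_distance(user, neg_item)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings with small norms so every point lies inside the ball.
user = np.array([0.10, 0.02])
pos = np.array([0.12, 0.01])
neg = np.array([-0.30, 0.25])
print(margin_ranking_loss(user, pos, neg))  # 0.0 -> the ranking is already satisfied
```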
Citations: 76
GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449937
Deepak Saini, A. Jain, Kushal Dave, Jian Jiao, Amit Singh, Ruofei Zhang, M. Varma
This paper develops the GalaXC algorithm for Extreme Classification, where the task is to annotate a document with the most relevant subset of labels from an extremely large label set. Extreme classification has been successfully applied to several real world web-scale applications such as web search, product recommendation, query rewriting, etc. GalaXC identifies two critical deficiencies in leading extreme classification algorithms. First, existing approaches generally assume that documents and labels reside in disjoint sets, even though in several applications, labels and documents cohabit the same space. Second, several approaches, albeit scalable, do not utilize various forms of metadata offered by applications, such as label text and label correlations. To remedy these, GalaXC presents a framework that enables collaborative learning over joint document-label graphs at massive scales, in a way that naturally allows various auxiliary sources of information, including label metadata, to be incorporated. GalaXC also introduces a novel label-wise attention mechanism to meld high-capacity extreme classifiers with its framework. An efficient end-to-end implementation of GalaXC is presented that could be trained on a dataset with 50M labels and 97M training documents in less than 100 hours on 4 × V100 GPUs. This allowed GalaXC to not only scale to applications with several million labels, but also be up to 18% more accurate than leading deep extreme classifiers, while being up to 2-50 × faster to train and 10 × faster to predict on benchmark datasets. GalaXC is particularly well-suited to warm-start scenarios where predictions need to be made on data points with partially revealed label sets, and was found to be up to 25% more accurate than extreme classification algorithms specifically designed for warm start settings. In A/B tests conducted on the Bing search engine, GalaXC could improve the Click Yield (CY) and coverage by 1.52% and 1.11% respectively. Code for GalaXC is available at https://github.com/Extreme-classification/GalaXC
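The label-wise attention named in the abstract gives every label its own attention distribution over a document's token (or graph-node) representations before scoring. The following is an assumed, simplified rendering of that idea with made-up dimensions, not GalaXC's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def labelwise_attention(doc_tokens, label_embs):
    """doc_tokens: (T, d) token/node representations of one document.
    label_embs: (L, d) one query vector per candidate label.
    Returns (L, d): a label-specific pooled document representation."""
    scores = label_embs @ doc_tokens.T      # (L, T) label-token affinities
    weights = softmax(scores, axis=1)       # per-label attention over tokens
    return weights @ doc_tokens             # (L, d) attention-weighted pooling

rng = np.random.default_rng(0)
doc = rng.normal(size=(6, 8))       # 6 tokens, 8-dim embeddings
labels = rng.normal(size=(3, 8))    # 3 candidate labels
pooled = labelwise_attention(doc, labels)
relevance = np.einsum("ld,ld->l", pooled, labels)   # one score per label
print(relevance.round(2))
```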
Citations: 31
Pivot-based Candidate Retrieval for Cross-lingual Entity Linking
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449852
Qian Liu, Xiubo Geng, Jie Lu, Daxin Jiang
Entity candidate retrieval plays a critical role in cross-lingual entity linking (XEL). In XEL, entity candidate retrieval needs to retrieve a list of plausible candidate entities from a large knowledge graph in a target language given a piece of text in a sentence or question, namely a mention, in a source language. Existing works mainly fall into two categories: lexicon-based and semantic-based approaches. The lexicon-based approach usually creates cross-lingual and mention-entity lexicons, which is effective but relies heavily on bilingual resources (e.g. inter-language links in Wikipedia). The semantic-based approach maps mentions and entities in different languages to a unified embedding space, which reduces dependence on large-scale bilingual dictionaries. However, its effectiveness is limited by the representation capacity of fixed-length vectors. In this paper, we propose a pivot-based approach which inherits the advantages of the aforementioned two approaches while avoiding their limitations. It takes an intermediary set of plausible target-language mentions as pivots to bridge the two types of gaps: cross-lingual gap and mention-entity gap. Specifically, it first converts mentions in the source language into an intermediary set of plausible mentions in the target language by cross-lingual semantic retrieval and a selective mechanism, and then retrieves candidate entities based on the generated mentions by lexical retrieval. The proposed approach only relies on a small bilingual word dictionary, and fully exploits the benefits of both lexical and semantic matching. Experimental results on two challenging cross-lingual entity linking datasets spanning over 11 languages show that the pivot-based approach outperforms both the lexicon-based and semantic-based approach by a large margin.
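The two-stage pipeline described above, cross-lingual semantic retrieval of target-language pivot mentions followed by lexical retrieval of candidate entities, can be sketched as below. The stand-in encoder (a tiny hand-made embedding table), the toy lexicon, and the mention strings are all illustrative assumptions, not the paper's models or data.

```python
import numpy as np

# Stand-in "multilingual encoder": a tiny lookup of hand-made vectors.
# In practice this would be a learned cross-lingual mention encoder.
EMB = {
    "纽约":          np.array([0.90, 0.10, 0.00]),
    "New York":      np.array([0.88, 0.12, 0.00]),
    "New York City": np.array([0.85, 0.15, 0.05]),
    "York":          np.array([0.20, 0.70, 0.10]),
}

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve_pivots(src_mention, target_mentions, top_k=2):
    """Stage 1: cross-lingual semantic retrieval of plausible target-language mentions."""
    q = EMB[src_mention]
    return sorted(target_mentions, key=lambda m: cosine(EMB[m], q), reverse=True)[:top_k]

def retrieve_entities(pivots, mention_entity_lexicon):
    """Stage 2: lexical retrieval of candidate entities via the pivot mentions."""
    out = []
    for p in pivots:
        out.extend(mention_entity_lexicon.get(p, []))
    return list(dict.fromkeys(out))  # dedupe, preserve order

lexicon = {"New York": ["Q60"], "New York City": ["Q60"], "York": ["Q42462"]}
pivots = retrieve_pivots("纽约", list(lexicon.keys()))
print(pivots, retrieve_entities(pivots, lexicon))  # ['New York', 'New York City'] ['Q60']
```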
Citations: 2
LChecker: Detecting Loose Comparison Bugs in PHP
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449826
Penghui Li, W. Meng
Weakly-typed languages such as PHP support loosely comparing two operands by implicitly converting their types and values. Such a language feature is widely used but can also pose severe security threats. In certain conditions, loose comparisons can cause unexpected results, leading to authentication bypass and other functionality problems. In this paper, we present the first in-depth study of such loose comparison bugs. We develop LChecker, a system to statically detect PHP loose comparison bugs. It employs a context-sensitive inter-procedural data-flow analysis together with several new techniques. We also enhance the PHP interpreter to help dynamically validate the detected bugs. Our evaluation shows that LChecker can both effectively and efficiently detect PHP loose comparison bugs with a reasonably low false-positive rate. It also successfully detected all previously known bugs in our evaluation dataset with no false negatives. Using LChecker, we discovered 42 new loose comparison bugs and were assigned 9 new CVE IDs.
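To make the bug class concrete, the snippet below models in Python (the listing has no PHP code context) the part of PHP's loose == that coerces numeric-looking strings: two different hash strings of the form 0e<digits> both coerce to 0.0 and wrongly compare equal, a classic authentication-bypass pattern. The coercion rule is deliberately simplified and illustrates the vulnerability class, not LChecker's detection logic.

```python
import re

NUMERIC_RE = re.compile(r"^\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?\s*$")

def php_loose_eq(a, b):
    """Simplified model of PHP's '==' on two strings: if both look numeric,
    compare their float values; otherwise compare the raw strings."""
    if NUMERIC_RE.match(a) and NUMERIC_RE.match(b):
        return float(a) == float(b)
    return a == b

# Two different hex digests that both have the form "0e<digits>":
# under loose comparison they coerce to 0.0 == 0.0 and wrongly compare equal.
h1 = "0e462097431906509019562988736854"
h2 = "0e830400451993494058024219903391"
print(php_loose_eq(h1, h2))   # True  -> potential authentication bypass
print(h1 == h2)               # False -> strict comparison ('===' in PHP) is safe
```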
Citations: 7
What do You Mean? Interpreting Image Classification with Crowdsourced Concept Extraction and Analysis
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450069
Agathe Balayn, Panagiotis Soilis, C. Lofi, Jie Yang, A. Bozzon
Global interpretability is a vital requirement for image classification applications. Existing interpretability methods mainly explain a model behavior by identifying salient image patches, which require manual efforts from users to make sense of, and also do not typically support model validation with questions that investigate multiple visual concepts. In this paper, we introduce a scalable human-in-the-loop approach for global interpretability. Salient image areas identified by local interpretability methods are annotated with semantic concepts, which are then aggregated into a tabular representation of images to facilitate automatic statistical analysis of model behavior. We show that this approach answers interpretability needs for both model validation and exploration, and provides semantically more diverse, informative, and relevant explanations while still allowing for scalable and cost-efficient execution.
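The aggregation step described above, collecting the concepts annotated on salient patches into a table that supports statistical analysis of model behavior, could look roughly like the sketch below. The images, predicted classes, and concept labels are invented for illustration; the paper's actual schema and analyses may differ.

```python
from collections import Counter, defaultdict

# Crowd annotations: for each (image, predicted_class), the semantic concepts
# labelled on the salient patches highlighted by a local explanation method.
annotations = [
    ("img1", "husky", ["snow", "fur", "pointed ears"]),
    ("img2", "husky", ["snow", "sled"]),
    ("img3", "husky", ["fur", "pointed ears"]),
    ("img4", "wolf",  ["forest", "fur"]),
]

# Tabular view: rows = predicted class, columns = concept, values = frequency.
table = defaultdict(Counter)
for _, pred_class, concepts in annotations:
    table[pred_class].update(concepts)

for pred_class, counts in table.items():
    top = ", ".join(f"{c} ({n})" for c, n in counts.most_common(3))
    print(f"{pred_class}: {top}  [{sum(counts.values())} concept mentions]")
# e.g. a high frequency of "snow" for the class "husky" can flag a spurious correlation.
```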
Citations: 21
Attent: Active Attributed Network Alignment
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449886
Qinghai Zhou, Liangyue Li, Xintao Wu, Nan Cao, Lei Ying, Hanghang Tong
Network alignment finds node correspondences across multiple networks, where the alignment accuracy is of crucial importance because of its profound impact on downstream applications. The vast majority of existing works focus on how to best utilize the topology and attribute information of the input networks as well as the anchor links when available. Nonetheless, with a few exceptions, it has not been well studied how to boost the alignment performance by actively obtaining high-quality and informative anchor links. The sparse literature on active network alignment introduces the human in the loop to label some seed node correspondences (i.e., anchor links), which are informative from the perspective of querying the most uncertain node given few potential matchings. However, the direct influence of the intrinsic network attribute information on the alignment results has largely remained unknown. In this paper, we tackle this challenge and propose an active network alignment method (Attent) to identify the best nodes to query. The key idea of the proposed method is to leverage effective and efficient influence functions defined over the alignment solution to evaluate the goodness of the candidate nodes for query. Our proposed query strategy bears three distinct advantages, including (1) effectiveness, being able to accurately quantify the influence of the candidate nodes on the alignment results; (2) efficiency, scaling linearly with 15 − 17 × speed-up over the straightforward implementation without any quality loss; (3) generality, consistently improving alignment performance of a variety of network alignment algorithms.
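Stripped of the paper's specifics, the active-querying loop amounts to scoring each candidate node with an influence function over the current alignment and querying the highest-scoring ones. The sketch below shows only that greedy skeleton; the influence scores are hypothetical placeholders, not the influence functions defined in the paper.

```python
def greedy_active_queries(candidates, influence_fn, budget):
    """Pick `budget` nodes to query, highest estimated influence first.
    influence_fn(node) -> float: how much labeling this node's correspondence
    is expected to improve the alignment solution (placeholder)."""
    return sorted(candidates, key=influence_fn, reverse=True)[:budget]

# Toy influence scores standing in for the paper's influence functions.
toy_influence = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
queries = greedy_active_queries(list(toy_influence), toy_influence.get, budget=2)
print(queries)  # ['a', 'd'] -> ask the oracle for these anchor links, then re-align
```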
Citations: 13
XY-Sketch: on Sketching Data Streams at Web Scale
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449984
Yongqiang Liu, Xike Xie
Conventional sketching methods on counting stream item frequencies use hash functions for mapping data items to a concise structure, e.g., a two-dimensional array, at the expense of overcounting due to hashing collisions. Despite their popularity, however, the errors accumulated from hashing collisions deteriorate sketching accuracy as data grows rapidly, which poses a great challenge to sketching big data streams at web scale. In this paper, we propose a novel structure, called XY-sketch, which estimates the frequency of a data item by estimating the probability of this item appearing in the data stream. The framework associated with XY-sketch consists of two phases, namely decomposition and recomposition phases. A data item is split into a set of compactly stored basic elements, which can be stringed up in a probabilistic manner for query evaluation during the recomposition phase. Throughout, we conduct optimization under space constraints and detailed theoretical analysis. Experiments on both real and synthetic datasets are done to show the superior scalability on sketching large-scale streams. Remarkably, XY-sketch is orders of magnitude more accurate than existing solutions when the space budget is small.
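For context, the conventional structure the abstract contrasts against (a two-dimensional array of counters indexed by hash functions, which can only over-count on collisions) is essentially a count-min sketch; a minimal version follows. It illustrates the baseline being improved upon, not XY-sketch itself.

```python
import hashlib

class CountMinSketch:
    """Classic count-min sketch: d rows of w counters; estimates are upper
    bounds because hash collisions only ever add to a counter."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

cms = CountMinSketch()
for token in ["web", "graph", "web", "sketch", "web"]:
    cms.add(token)
print(cms.estimate("web"), cms.estimate("graph"))  # 3 1 (estimates can only over-count)
```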
Citations: 4
ColChain: Collaborative Linked Data Networks
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450037
Christian Aebeloe, Gabriela Montoya, K. Hose
One of the major obstacles that currently prevents the Semantic Web from exploiting its full potential is that the data it provides access to is sometimes not available or outdated. The reason is rooted deep within its architecture that relies on data providers to keep the data available, queryable, and up-to-date at all times – an expectation that many data providers in reality cannot live up to for an extended (or infinite) period of time. Hence, decentralized architectures have recently been proposed that use replication to keep the data available in case the data provider fails. Although this increases availability, it does not help keeping the data up-to-date or allow users to query and access previous versions of a dataset. In this paper, we therefore propose ColChain (COLlaborative knowledge CHAINs), a novel decentralized architecture based on blockchains that not only lowers the burden for the data providers but at the same time also allows users to propose updates to faulty or outdated data, trace updates back to their origin, and query older versions of the data. Our extensive experiments show that ColChain reaches these goals while achieving query processing performance comparable to the state of the art.
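The properties the abstract emphasizes, tracing updates back to their origin and querying older versions, rest on chaining updates together by hash. The toy ledger below illustrates that general blockchain-style mechanism with assumed fields (author, patch, previous hash); it is not ColChain's actual block format, consensus, or fragment model.

```python
import hashlib
import json
import time

def make_block(prev_hash, author, patch):
    """One update to a data fragment, chained to its predecessor by hash."""
    block = {"prev": prev_hash, "author": author, "patch": patch, "ts": time.time()}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

chain = [make_block("0" * 64, "alice", {"add": ("ex:Bob", "ex:worksAt", "ex:ACME")})]
chain.append(make_block(chain[-1]["hash"], "bob", {"del": ("ex:Bob", "ex:worksAt", "ex:ACME")}))

# Trace every update back to its origin and verify the chain was not tampered with.
for block in chain:
    body = {k: v for k, v in block.items() if k != "hash"}
    assert block["hash"] == hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    print(block["author"], block["patch"])
# Replaying the patches up to any block reconstructs that older version of the fragment.
```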
Citations: 14
WebSocket Adoption and the Landscape of the Real-Time Web
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3450063
Paul Murley, Zane Ma, Joshua Mason, Michael Bailey, Amin Kharraz
Developers are increasingly deploying web applications which require real-time bidirectional updates, a use case which does not naturally align with the traditional client-server architecture of the web. Many solutions have arisen to address this need over the preceding decades, including HTTP polling, Server-Sent Events, and WebSockets. This paper investigates this ecosystem and reports on the prevalence, benefits, and drawbacks of these technologies, with a particular focus on the adoption of WebSockets. We crawl the Tranco Top 1 Million websites to build a dataset for studying real-time updates in the wild. We find that HTTP Polling remains significantly more common than WebSockets, and WebSocket adoption appears to have stagnated in the past two to three years. We investigate some of the possible reasons for this decrease in the rate of adoption, and we contrast the adoption process to that of other web technologies. Our findings further suggest that even when WebSockets are employed, the prescribed best practices for securing them are often disregarded. The dataset is made available in the hopes that it may help inform the development of future real-time solutions for the web.
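The two client patterns at the center of the study are easiest to contrast side by side: repeated HTTP polling versus one long-lived WebSocket connection. The sketch below uses the third-party requests and websockets packages against hypothetical endpoints; it illustrates the patterns rather than the paper's crawler.

```python
import asyncio
import time

import requests        # pip install requests
import websockets      # pip install websockets

POLL_URL = "https://example.com/api/updates"   # hypothetical endpoint
WS_URL = "wss://example.com/live"              # hypothetical endpoint

def poll_for_updates(interval=2.0, rounds=3):
    """HTTP polling: the client repeatedly asks whether anything changed."""
    for _ in range(rounds):
        resp = requests.get(POLL_URL, timeout=5)
        print("poll:", resp.status_code, len(resp.content), "bytes")
        time.sleep(interval)

async def listen_for_updates():
    """WebSocket: one long-lived, bidirectional connection; the server pushes."""
    async with websockets.connect(WS_URL) as ws:
        await ws.send("subscribe:updates")
        async for message in ws:        # messages arrive as the server sends them
            print("push:", message)

# poll_for_updates()
# asyncio.run(listen_for_updates())
```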
Citations: 5
Temporal Analysis of the Entire Ethereum Blockchain Network
Pub Date : 2021-04-19 DOI: 10.1145/3442381.3449916
Lin Zhao, Sourav Sengupta, Arijit Khan, Robby Luo
With over 42 billion USD market capitalization (October 2020), Ethereum is the largest public blockchain that supports smart contracts. Recent works have modeled transactions, tokens, and other interactions in the Ethereum blockchain as static graphs to provide new observations and insights by conducting relevant graph analysis. Surprisingly, there is much less study on the evolution and temporal properties of these networks. In this paper, we investigate the evolutionary nature of Ethereum interaction networks from a temporal graphs perspective. We study the growth rate and growth model of four Ethereum blockchain networks, as well as the active lifespan and update rate of high-degree vertices. We detect anomalies based on temporal changes in global network properties, and forecast the survival of network communities in succeeding months by leveraging the relevant graph features and machine learning models.
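A simple way to treat interactions from a temporal-graph perspective, as the abstract describes, is to keep each transaction edge with its timestamp and then slice or aggregate by time window. The sketch below does this with networkx over invented transaction records; it is not the authors' pipeline.

```python
from datetime import datetime

import networkx as nx   # pip install networkx

# (from_address, to_address, timestamp): toy transaction records.
transactions = [
    ("0xA1", "0xB2", datetime(2020, 1, 3)),
    ("0xA1", "0xC3", datetime(2020, 1, 20)),
    ("0xB2", "0xC3", datetime(2020, 2, 7)),
    ("0xD4", "0xA1", datetime(2020, 2, 25)),
]

# A multigraph keeps parallel transfers between the same pair of addresses.
G = nx.MultiDiGraph()
for src, dst, ts in transactions:
    G.add_edge(src, dst, timestamp=ts)

# Temporal slicing: edges and active addresses contributed per month.
per_month = {}
for src, dst, ts in transactions:
    per_month.setdefault(ts.strftime("%Y-%m"), []).append((src, dst))
for month, edges in sorted(per_month.items()):
    print(month, "edges:", len(edges), "active addresses:", len({n for e in edges for n in e}))

print("total:", G.number_of_nodes(), "addresses,", G.number_of_edges(), "transfers")
```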
Citations: 34