While third-party trackers breach users’ privacy by compiling large amounts of personal data through web tracking techniques, combating these trackers is still left in the hands of individual users. Although network operators may attempt network-wide detection of trackers by inspecting all web traffic inside the network, such methods are not only privacy-intrusive but also of limited accuracy, as they are susceptible to domain changes and ineffective against encrypted traffic. To this end, in this paper, we propose Net-track, a novel approach to managing a secure web environment through platform-independent, encryption-agnostic detection of trackers. Utilizing only side-channel data from network traffic that remain available even when traffic is encrypted, Net-track accurately detects trackers network-wide, irrespective of users’ browsers or devices, without inspecting packet payloads or the resources fetched from web servers. This prevents user data from leaking to tracking servers in a privacy-preserving manner. By measuring statistics from traffic traces and their similarities, we show that benign traffic and tracker traffic differ in their traffic patterns, and we build Net-track on features that fully capture trackers’ distinctive characteristics. Evaluation results show that Net-track detects trackers with 94.02% accuracy and can even discover new trackers not yet recognized by existing filter lists. Furthermore, Net-track shows its potential for real-time detection, maintaining its performance when using only a portion of each traffic trace.
{"title":"Net-track: Generic Web Tracking Detection Using Packet Metadata","authors":"Dongkeun Lee, Minwoo Joo, Wonjun Lee","doi":"10.1145/3543507.3583372","DOIUrl":"https://doi.org/10.1145/3543507.3583372","url":null,"abstract":"While third-party trackers breach users’ privacy by compiling large amounts of personal data through web tracking techniques, combating these trackers is still left at the hand of each user. Although network operators may attempt a network-wide detection of trackers through inspecting all web traffic inside the network, their methods are not only privacy-intrusive but of limited accuracy as these are susceptible to domain changes or ineffective against encrypted traffic. To this end, in this paper, we propose Net-track, a novel approach to managing a secure web environment through platform-independent, encryption-agnostic detection of trackers. Utilizing only side-channel data from network traffic that are still available when encrypted, Net-track accurately detects trackers network-wide, irrespective of user’s browsers or devices without looking into packet payloads or resources fetched from the web server. This prevents user data from leaking to tracking servers in a privacy-preserving manner. By measuring statistics from traffic traces and their similarities, we show distinctions between benign traffic and tracker traffic in their traffic patterns and build Net-track based on the features that fully capture trackers’ distinctive characteristics. Evaluation results show that Net-track is able to detect trackers with 94.02% accuracy and can even discover new trackers yet unrecognized by existing filter lists. Furthermore, Net-track shows its potential for real-time detection, maintaining its performance when using only a portion of each traffic trace.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129385637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Candidate retrieval is a key part of modern search engines, whose goal is to find candidate items that are semantically related to the query from a large item pool. The core difference from the later ranking stage is the requirement of low latency. Hence, a two-tower structure with two parallel yet independent encoders for the query and the item is prevalent in many systems. In these efforts, the semantic information of a query and a candidate item is fed into the corresponding encoder, and their representations are then used for retrieval. With the popularity of pre-trained semantic models, the state of the art for semantic retrieval tasks has achieved significant performance gains. However, the capacity to learn relevance signals is still limited by the isolation between the query and the item. Interaction-based modeling between the query and the item has been widely validated to be useful for the ranking stage, where more computation cost is affordable. Here, we are interested in a demanding question: how to exploit query-item interaction-based learning to enhance candidate retrieval while still maintaining low computation cost. Note that an item usually contains various heterogeneous attributes which can help us understand the item's characteristics more precisely. To this end, we propose a novel attribute-guided representation learning framework (named AGREE) to enhance candidate retrieval by exploiting query-attribute relevance. The key idea is to couple query and item representation learning during the training phase, while also enabling easy decoupling for efficient inference. Specifically, we introduce an attribute fusion layer on the item side to identify the most relevant item features for item representation. On the query side, an attribute-aware learning process is introduced to better infer the search intent from these attributes as well. After model training, we decouple the attribute information from the query encoder, which guarantees low latency in the inference phase. Extensive experiments on two real-world large-scale datasets demonstrate the superiority of the proposed AGREE over several state-of-the-art alternatives. A further online A/B test in AliPay search also shows that AGREE achieves substantial gains on four business metrics. The proposed AGREE has been deployed online in AliPay, serving major traffic.
{"title":"Beyond Two-Tower: Attribute Guided Representation Learning for Candidate Retrieval","authors":"Hongyuan Shan, Qishen Zhang, Zhongyi Liu, Guannan Zhang, Chenliang Li","doi":"10.1145/3543507.3583254","DOIUrl":"https://doi.org/10.1145/3543507.3583254","url":null,"abstract":"Candidate retrieval is a key part of the modern search engines whose goal is to find candidate items that are semantically related to the query from a large item pool. The core difference against the later ranking stage is the requirement of low latency. Hence, two-tower structure with two parallel yet independent encoder for both query and item is prevalent in many systems. In these efforts, the semantic information of a query and a candidate item is fed into the corresponding encoder and then use their representations for retrieval. With the popularity of pre-trained semantic models, the state-of-the-art for semantic retrieval tasks has achieved the significant performance gain. However, the capacity of learning relevance signals is still limited by the isolation between the query and the item. The interaction-based modeling between the query and the item has been widely validated to be useful for the ranking stage, where more computation cost is affordable. Here, we are quite initerested in an demanding question: how to exploiting query-item interaction-based learning to enhance candidate retrieval and still maintain the low computation cost. Note that an item usually contain various heteorgeneous attributes which could help us understand the item characteristics more precisely. To this end, we propose a novel attribute guided representation learning framework (named AGREE) to enhance the candidate retrieval by exploiting query-attribute relevance. The key idea is to couple the query and item representation learning together during the training phase, but also enable easy decoupling for efficient inference. Specifically, we introduce an attribute fusion layer in the item side to identify most relevant item features for item representation. On the query side, an attribute-aware learning process is introduced to better infer the search intent also from these attributes. After model training, we then decouple the attribute information away from the query encoder, which guarantees the low latency for the inference phase. Extensive experiments over two real-world large-scale datasets demonstrate the superiority of the proposed AGREE against several state-of-the-art technical alternatives. Further online A/B test from AliPay search servise also show that AGREE achieves substantial performance gain over four business metrics. Currently, the proposed AGREE has been deployed online in AliPay for serving major traffic.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134117427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, more and more cloud service providers (e.g., Microsoft, Google, and Amazon) have commercialized their well-trained deep learning models by providing limited access via web API interfaces. However, it has been shown that these APIs are susceptible to model inversion attacks, where attackers can recover the training data with high fidelity, which may cause serious privacy leakage. Existing defenses against model inversion attacks, however, hinder model performance and are ineffective against more advanced attacks, e.g., Mirror [4]. In this paper, we propose NetGuard, a novel utility-aware defense methodology against model inversion attacks (MIAs). Unlike previous works that perturb the prediction outputs of the victim model, we propose to mislead the MIA effort by inserting engineered fake samples during the training process. A generative adversarial network (GAN) is carefully built to construct fake training samples that mislead the attack model without degrading the performance of the victim model. Besides, we adopt continual learning to further improve the utility of the victim model. Extensive experiments on the CelebA, VGG-Face, and VGG-Face2 datasets show that NetGuard is superior to existing defenses, including DP [37] and Ad-mi [32], against state-of-the-art model inversion attacks, i.e., DMI [8], Mirror [4], Privacy [12], and Alignment [34].
{"title":"NetGuard: Protecting Commercial Web APIs from Model Inversion Attacks using GAN-generated Fake Samples","authors":"Xueluan Gong, Ziyao Wang, Yanjiao Chen, Qianqian Wang, Cong Wang, Chao Shen","doi":"10.1145/3543507.3583224","DOIUrl":"https://doi.org/10.1145/3543507.3583224","url":null,"abstract":"Recently more and more cloud service providers (e.g., Microsoft, Google, and Amazon) have commercialized their well-trained deep learning models by providing limited access via web API interfaces. However, it is shown that these APIs are susceptible to model inversion attacks, where attackers can recover the training data with high fidelity, which may cause serious privacy leakage.Existing defenses against model inversion attacks, however, hinder the model performance and are ineffective for more advanced attacks, e.g., Mirror [4]. In this paper, we proposed NetGuard, a novel utility-aware defense methodology against model inversion attacks (MIAs). Unlike previous works that perturb prediction outputs of the victim model, we propose to mislead the MIA effort by inserting engineered fake samples during the training process. A generative adversarial network (GAN) is carefully built to construct fake training samples to mislead the attack model without degrading the performance of the victim model. Besides, we adopt continual learning to further improve the utility of the victim model. Extensive experiments on CelebA, VGG-Face, and VGG-Face2 datasets show that NetGuard is superior to existing defenses, including DP [37] and Ad-mi [32] on state-of-the-art model inversion attacks, i.e., DMI [8], Mirror [4], Privacy [12], and Alignment [34].","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122071653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the ubiquitous adoption of machine learning algorithms in web technologies such as recommendation systems and social networks, algorithm fairness has become a trending topic with a great impact on social welfare. Among different fairness definitions, path-specific causal fairness is a widely adopted one with great potential, as it distinguishes the fair and unfair effects that sensitive attributes exert on algorithm predictions. Existing methods based on path-specific causal fairness either require the graph structure as prior knowledge or have high complexity in the calculation of path-specific effects. To tackle these challenges, we propose a novel causal-graph-based fair prediction framework which integrates graph structure learning into fair prediction to ensure that unfair pathways are excluded from the causal graph. Furthermore, we generalize the proposed framework to scenarios where sensitive attributes can be non-root nodes affected by other variables, which is commonly observed in real-world applications such as recommendation systems but hardly addressed by existing works. We provide theoretical analysis of the generalization bound for the proposed fair prediction method, and conduct a series of experiments on real-world datasets to demonstrate that the proposed framework provides a better trade-off between prediction performance and algorithmic fairness.
{"title":"Path-specific Causal Fair Prediction via Auxiliary Graph Structure Learning","authors":"Liuyi Yao, Yaliang Li, Bolin Ding, Jingren Zhou, Jinduo Liu, Mengdi Huai, Jing Gao","doi":"10.1145/3543507.3583280","DOIUrl":"https://doi.org/10.1145/3543507.3583280","url":null,"abstract":"With ubiquitous adoption of machine learning algorithms in web technologies, such as recommendation system and social network, algorithm fairness has become a trending topic, and it has a great impact on social welfare. Among different fairness definitions, path-specific causal fairness is a widely adopted one with great potentials, as it distinguishes the fair and unfair effects that the sensitive attributes exert on algorithm predictions. Existing methods based on path-specific causal fairness either require graph structure as the prior knowledge or have high complexity in the calculation of path-specific effect. To tackle these challenges, we propose a novel casual graph based fair prediction framework which integrates graph structure learning into fair prediction to ensure that unfair pathways are excluded in the causal graph. Furthermore, we generalize the proposed framework to the scenarios where sensitive attributes can be non-root nodes and affected by other variables, which is commonly observed in real-world applications, such as recommendation system, but hardly addressed by existing works. We provide theoretical analysis on the generalization bound for the proposed fair prediction method, and conduct a series of experiments on real-world datasets to demonstrate that the proposed framework can provide better prediction performance and algorithm fairness trade-off.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124955068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Knowledge graphs (KGs) have been widely used to enhance complex question answering (QA). To understand complex questions, existing studies employ language models (LMs) to encode contexts. Despite their simplicity, these approaches neglect the latent relational information among question concepts and answers in KGs. While question concepts ubiquitously exhibit hyponymy at the semantic level, e.g., mammals and animals, this feature is directly reflected in the hierarchical relations in KGs, e.g., a_type_of. Therefore, we are motivated to exploit the hierarchical structures in KGs for comprehensive reasoning to help understand questions. However, reasoning over tree-like structures is non-trivial compared with chained paths. Moreover, identifying appropriate hierarchies relies on expertise. To this end, we propose HamQA, a novel Hierarchy-aware multi-hop Question Answering framework over knowledge graphs, to effectively align the mutual hierarchical information between question contexts and KGs. The entire learning is conducted in hyperbolic space, inspired by its advantages for embedding hierarchical structures. Specifically, (i) we design a context-aware graph attention network to capture context information, and (ii) hierarchical structures are continuously preserved in KGs by minimizing hyperbolic geodesic distances. Comprehensive reasoning is conducted by jointly training both components, and the top-ranked candidate is provided as the optimal answer. We achieve a higher ranking than the state-of-the-art multi-hop baselines on the official OpenBookQA leaderboard, with an accuracy of 85%.
{"title":"Hierarchy-Aware Multi-Hop Question Answering over Knowledge Graphs","authors":"Junnan Dong, Qinggang Zhang, Xiao Huang, Keyu Duan, Qiaoyu Tan, Zhimeng Jiang","doi":"10.1145/3543507.3583376","DOIUrl":"https://doi.org/10.1145/3543507.3583376","url":null,"abstract":"Knowledge graphs (KGs) have been widely used to enhance complex question answering (QA). To understand complex questions, existing studies employ language models (LMs) to encode contexts. Despite the simplicity, they neglect the latent relational information among question concepts and answers in KGs. While question concepts ubiquitously present hyponymy at the semantic level, e.g., mammals and animals, this feature is identically reflected in the hierarchical relations in KGs, e.g., a_type_of. Therefore, we are motivated to explore comprehensive reasoning by the hierarchical structures in KGs to help understand questions. However, it is non-trivial to reason over tree-like structures compared with chained paths. Moreover, identifying appropriate hierarchies relies on expertise. To this end, we propose HamQA, a novel Hierarchy-aware multi-hop Question Answering framework on knowledge graphs, to effectively align the mutual hierarchical information between question contexts and KGs. The entire learning is conducted in Hyperbolic space, inspired by its advantages of embedding hierarchical structures. Specifically, (i) we design a context-aware graph attentive network to capture context information. (ii) Hierarchical structures are continuously preserved in KGs by minimizing the Hyperbolic geodesic distances. The comprehensive reasoning is conducted to jointly train both components and provide a top-ranked candidate as an optimal answer. We achieve a higher ranking than the state-of-the-art multi-hop baselines on the official OpenBookQA leaderboard with an accuracy of 85%.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125083271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present an architecture for authentication and authorization on the Web that is based on the Self-Sovereign Identity (SSI) paradigm. Using our architecture, we aim to achieve semantic interoperability across different approaches to SSI. We build on the underlying RDF data model of the W3C’s recommendation for Verifiable Credentials and specify semantic access control rules using SHACL. Our communication protocol for the authorization process is based on Decentralised Identifiers and extends the Hyperledger Aries Present Proof protocol. We propose a modular architecture that allows for flexible extension, e.g., for supporting more signature schemes or Decentralised Identifier methods. For evaluation, we implemented a proof of concept: we show that a Web-based approach to SSI outperforms a blockchain-based approach to SSI in terms of end-to-end execution time.
{"title":"SISSI: An Architecture for Semantic Interoperable Self-Sovereign Identity-based Access Control on the Web","authors":"Christoph H.-J. Braun, V. Papanchev, Tobias Käfer","doi":"10.1145/3543507.3583409","DOIUrl":"https://doi.org/10.1145/3543507.3583409","url":null,"abstract":"We present an architecture for authentication and authorization on the Web that is based on the Self-Sovereign Identity paradigm. Using our architecture, we aim to achieve semantic interoperability across different approaches to SSI. We build on the underlying RDF data model of the W3C’s recommendation for Verifiable Credentials and specify semantic access control rules using SHACL. Our communication protocol for an authorization process is based on Decentralised Identifiers and extends the Hyperledger Aries Present Proof protocol. We propose a modular architecture that allows for flexible extension, e. g., for supporting more signature schemes or Decentralised Identifier Methods. For evaluation, we implemented a Proof-of-Concept: We show that a Web-based approach to SSI outperfoms a blockchain-based approach to SSI in terms of End-to-End execution time.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130049783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the great popularity of Graph Neural Networks (GNNs), their robustness to adversarial topology attacks has received significant attention. Although many attack methods have been proposed, they mainly focus on fixed-budget attacks, aiming to find the most adversarial perturbations within a fixed budget for a target node. However, considering the varied robustness of each node, a fixed budget creates an inevitable dilemma: if the budget is relatively small, no successful perturbation may be found, while if it is too large, the resulting redundant perturbations hurt invisibility. To break this dilemma, we propose a new type of topology attack, named the minimum-budget topology attack, which aims to adaptively find the minimum perturbation sufficient for a successful attack on each node. To this end, we propose an attack model, named MiBTack, based on a dynamic projected gradient descent algorithm, which can effectively solve the involved non-convex constrained optimization over discrete topology. Extensive results on three GNNs and four real-world datasets show that MiBTack can successfully cause all target nodes to be misclassified with the minimum number of perturbed edges. Moreover, the obtained minimum budget can be used to measure node robustness, allowing us to explore the relationships among robustness, topology, and uncertainty for nodes, which is beyond what current fixed-budget topology attacks can offer.
{"title":"Minimum Topology Attacks for Graph Neural Networks","authors":"Mengmei Zhang, Xiao Wang, Chuan Shi, Lingjuan Lyu, Tianchi Yang, Junping Du","doi":"10.1145/3543507.3583509","DOIUrl":"https://doi.org/10.1145/3543507.3583509","url":null,"abstract":"With the great popularity of Graph Neural Networks (GNNs), their robustness to adversarial topology attacks has received significant attention. Although many attack methods have been proposed, they mainly focus on fixed-budget attacks, aiming at finding the most adversarial perturbations within a fixed budget for target node. However, considering the varied robustness of each node, there is an inevitable dilemma caused by the fixed budget, i.e., no successful perturbation is found when the budget is relatively small, while if it is too large, the yielding redundant perturbations will hurt the invisibility. To break this dilemma, we propose a new type of topology attack, named minimum-budget topology attack, aiming to adaptively find the minimum perturbation sufficient for a successful attack on each node. To this end, we propose an attack model, named MiBTack, based on a dynamic projected gradient descent algorithm, which can effectively solve the involving non-convex constraint optimization on discrete topology. Extensive results on three GNNs and four real-world datasets show that MiBTack can successfully lead all target nodes misclassified with the minimum perturbation edges. Moreover, the obtained minimum budget can be used to measure node robustness, so we can explore the relationships of robustness, topology, and uncertainty for nodes, which is beyond what the current fixed-budget topology attacks can offer.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127726112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning fair and transferable representations of users that can be used for a wide spectrum of downstream tasks (specifically, machine learning models) has great potential in fairness-aware Web services. Existing studies focus on debiasing w.r.t. a small set (one or a handful) of fixed, pre-defined sensitive attributes. In real practice, however, downstream data users can be interested in various protected groups, and these are usually not known a priori. This requires the learned representations to be fair w.r.t. all possible sensitive attributes. We name this task universal fair representation learning, in which an exponential number of sensitive attributes need to be dealt with, bringing the challenges of unreasonable computational cost and unguaranteed fairness constraints. To address these problems, we propose a controllable universal fair representation learning (CUFRL) method. An effective bound is first derived via the lens of mutual information to guarantee parity over the universal set of sensitive attributes while maintaining the accuracy of downstream tasks. We also theoretically establish that the number of sensitive attributes that need to be processed can be reduced from exponential to linear. Experiments on two public real-world datasets demonstrate that CUFRL achieves a significantly better accuracy-fairness trade-off than baseline approaches.
{"title":"Controllable Universal Fair Representation Learning","authors":"Yue Cui, Ma Chen, Kai Zheng, Lei Chen, Xiaofang Zhou","doi":"10.1145/3543507.3583307","DOIUrl":"https://doi.org/10.1145/3543507.3583307","url":null,"abstract":"Learning fair and transferable representations of users that can be used for a wide spectrum of downstream tasks (specifically, machine learning models) has great potential in fairness-aware Web services. Existing studies focus on debiasing w.r.t. a small scale of (one or a handful of) fixed pre-defined sensitive attributes. However, in real practice, downstream data users can be interested in various protected groups and these are usually not known as prior. This requires the learned representations to be fair w.r.t. all possible sensitive attributes. We name this task universal fair representation learning, in which an exponential number of sensitive attributes need to be dealt with, bringing the challenges of unreasonable computational cost and un-guaranteed fairness constraints. To address these problems, we propose a controllable universal fair representation learning (CUFRL) method. An effective bound is first derived via the lens of mutual information to guarantee parity of the universal set of sensitive attributes while maintaining the accuracy of downstream tasks. We also theoretically establish that the number of sensitive attributes that need to be processed can be reduced from exponential to linear. Experiments on two public real-world datasets demonstrate CUFRL can achieve significantly better accuracy-fairness trade-off compared with baseline approaches.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127791996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph Neural Networks (GNNs) have shown superior performance for semi-supervised learning in numerous web applications, such as classification of web services and pages, analysis of online social networks, and recommendation in e-commerce. The state of the art derives representations for all nodes in a graph following the same diffusion (message passing) model, without discriminating their uniqueness. However, (i) labeled nodes involved in model training usually account for a small portion of the graph in the semi-supervised setting, and (ii) different nodes are located in different local graph contexts, and treating them indistinguishably in diffusion inevitably degrades representation quality. To address these issues, we develop NDM, a universal node-wise diffusion model that captures the unique characteristics of each node in diffusion, by which NDM is able to yield high-quality node representations. We then customize NDM for semi-supervised learning and design the NIGCN model. In particular, NIGCN significantly improves efficiency since it (i) produces representations for labeled nodes only and (ii) adopts well-designed neighbor sampling techniques tailored for node representation generation. Extensive experimental results on various types of web datasets, including citation, social, and co-purchasing graphs, not only verify the state-of-the-art effectiveness of NIGCN but also strongly support its remarkable scalability. In particular, NIGCN completes representation generation and training within 10 seconds on a dataset with hundreds of millions of nodes and billions of edges, achieving up to orders-of-magnitude speedups over the baselines while attaining the highest F1-scores on classification.
{"title":"Node-wise Diffusion for Scalable Graph Learning","authors":"Keke Huang, Jing Tang, Juncheng Liu, Renchi Yang, X. Xiao","doi":"10.1145/3543507.3583408","DOIUrl":"https://doi.org/10.1145/3543507.3583408","url":null,"abstract":"Graph Neural Networks (GNNs) have shown superior performance for semi-supervised learning of numerous web applications, such as classification on web services and pages, analysis of online social networks, and recommendation in e-commerce. The state of the art derives representations for all nodes in graphs following the same diffusion (message passing) model without discriminating their uniqueness. However, (i) labeled nodes involved in model training usually account for a small portion of graphs in the semi-supervised setting, and (ii) different nodes locate at different graph local contexts and it inevitably degrades the representation qualities if treating them undistinguishedly in diffusion. To address the above issues, we develop NDM, a universal node-wise diffusion model, to capture the unique characteristics of each node in diffusion, by which NDM is able to yield high-quality node representations. In what follows, we customize NDM for semi-supervised learning and design the NIGCN model. In particular, NIGCN advances the efficiency significantly since it (i) produces representations for labeled nodes only and (ii) adopts well-designed neighbor sampling techniques tailored for node representation generation. Extensive experimental results on various types of web datasets, including citation, social and co-purchasing graphs, not only verify the state-of-the-art effectiveness of NIGCN but also strongly support the remarkable scalability of NIGCN. In particular, NIGCN completes representation generation and training within 10 seconds on the dataset with hundreds of millions of nodes and billions of edges, up to orders of magnitude speedups over the baselines, while achieving the highest F1-scores on classification.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126489090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep neural networks have achieved great success in sequential recommendation systems. While maintaining high competence in user modeling and next-item recommendation, these models have long been plagued by their numerous parameters and heavy computation, which inhibit their deployment on resource-constrained mobile devices. Model quantization, one of the main paradigms of compression techniques, converts float parameters to low-bit values to reduce parameter redundancy and accelerate inference. To avoid drastic performance degradation, it usually requires a fine-tuning phase with the original dataset. However, the training set of user-item interactions is not always available due to transmission limits or privacy concerns. In this paper, we propose a novel framework to quantize sequential recommenders without access to any real private data. A generator is employed in the framework to synthesize fake sequence samples that are fed to the quantized sequential recommendation model in order to minimize its gap with a full-precision sequential recommendation model. The generator and the quantized model are optimized with a min-max game, alternating between discrepancy estimation and knowledge transfer. Moreover, we devise a two-level discrepancy modeling strategy to transfer information between the quantized model and the full-precision model. Extensive experiments with various recommendation networks on three public datasets demonstrate the effectiveness of the proposed framework.
{"title":"Quantize Sequential Recommenders Without Private Data","authors":"Lin-Sheng Shi, Yuang Liu, J. Wang, Wei Zhang","doi":"10.1145/3543507.3583351","DOIUrl":"https://doi.org/10.1145/3543507.3583351","url":null,"abstract":"Deep neural networks have achieved great success in sequential recommendation systems. While maintaining high competence in user modeling and next-item recommendation, these models have long been plagued by the numerous parameters and computation, which inhibit them to be deployed on resource-constrained mobile devices. Model quantization, as one of the main paradigms for compression techniques, converts float parameters to low-bit values to reduce parameter redundancy and accelerate inference. To avoid drastic performance degradation, it usually requests a fine-tuning phase with an original dataset. However, the training set of user-item interactions is not always available due to transmission limits or privacy concerns. In this paper, we propose a novel framework to quantize sequential recommenders without access to any real private data. A generator is employed in the framework to synthesize fake sequence samples to feed the quantized sequential recommendation model and minimize the gap with a full-precision sequential recommendation model. The generator and the quantized model are optimized with a min-max game — alternating discrepancy estimation and knowledge transfer. Moreover, we devise a two-level discrepancy modeling strategy to transfer information between the quantized model and the full-precision model. The extensive experiments of various recommendation networks on three public datasets demonstrate the effectiveness of the proposed framework.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116005239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}