Title: Preserving Missing Data Distribution in Synthetic Data
Authors: Xinyu Wang, H. Asif, Jaideep Vaidya
DOI: 10.1145/3543507.3583297
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: Data from Web artifacts, and from the Web more broadly, is often sensitive and cannot be shared directly for analysis. Synthetic data generated from the real data is therefore increasingly used as a privacy-preserving substitute. In many cases, real data from the Web has missing values, and the missingness itself carries important informational content that domain experts leverage to improve their analyses. However, this information is lost if imputation or deletion is applied before synthetic data generation. In this paper, we propose several methods to generate synthetic data that preserve both the observed and the missing data distributions. An extensive empirical evaluation over a range of carefully fabricated and real-world datasets demonstrates the effectiveness of our approach.
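The idea of treating missingness itself as signal can be illustrated with a minimal baseline (my own sketch, not one of the paper's proposed methods): sample a joint missingness pattern from its empirical distribution, then fill the observed slots from per-column value pools. The function name `synthesize` and its interface are hypothetical.

```python
import random
from collections import Counter

def synthesize(rows, n, seed=0):
    """Generate n synthetic rows that preserve the empirical joint
    missingness pattern as well as per-column observed-value frequencies.
    A row is a tuple whose entries may be None (missing)."""
    rng = random.Random(seed)
    # Empirical distribution over joint missingness patterns.
    patterns = Counter(tuple(v is None for v in row) for row in rows)
    pat_list, weights = zip(*patterns.items())
    # Per-column pools of observed (non-missing) values.
    ncols = len(rows[0])
    pools = [[row[c] for row in rows if row[c] is not None]
             for c in range(ncols)]
    out = []
    for _ in range(n):
        pat = rng.choices(pat_list, weights=weights)[0]
        out.append(tuple(None if miss else rng.choice(pools[c])
                         for c, miss in enumerate(pat)))
    return out
```

A real method would also need to preserve dependencies between values and missingness across columns; this sketch only matches the pattern frequencies and marginal value distributions.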
Title: CL-WSTC: Continual Learning for Weakly Supervised Text Classification on the Internet
Authors: Miao Li, Jiaqi Zhu, Xin Yang, Yi Yang, Qiang Gao, Hongan Wang
DOI: 10.1145/3543507.3583249
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: Continual text classification is an important research direction in Web mining. Existing works are limited to supervised approaches that rely on abundant labeled data, but in the open and dynamic environment of the Internet, with constant semantic change of known topics and the appearance of unknown topics, text annotations are hard to obtain in time for each period. This calls for weakly supervised text classification (WSTC), which requires only seed words for each category and has succeeded in static text classification tasks. However, no studies have yet applied WSTC methods in a continual learning paradigm to actually accommodate the open and evolving Internet. In this paper, we tackle this problem for the first time and propose a framework, Continual Learning for Weakly Supervised Text Classification (CL-WSTC), which can take any WSTC method as its base model. It consists of two modules: classification decision with delay and seed word updating. In the former, the probability threshold for each category in each period is adaptively learned to determine the acceptance or rejection of texts. In the latter, starting from candidate words output by the base model, seed words are added and deleted via reinforcement learning with immediate rewards, according to an empirically certified unsupervised measure. Extensive experiments show that our approach is highly general and achieves a better trade-off between classification accuracy and decision timeliness than non-continual counterparts, with intuitively interpretable seed word updates.
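The "classification decision with delay" module can be sketched as a simple per-period rule (an illustrative assumption on my part, not the paper's actual algorithm): accept the top category if its probability clears that category's adaptively learned threshold, otherwise defer the text, rejecting it after too many deferrals. All names and the `max_delay` parameter are hypothetical.

```python
def decide(prob_by_cat, thresholds, periods_waited=0, max_delay=3):
    """One period's accept/delay/reject decision for a text.
    prob_by_cat and thresholds both map category -> float."""
    best = max(prob_by_cat, key=prob_by_cat.get)
    if prob_by_cat[best] >= thresholds[best]:
        return best, "accept"
    if periods_waited + 1 >= max_delay:
        return None, "reject"
    # Not confident enough yet: revisit this text in the next period.
    return None, "delay"
```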
Title: Fair Graph Representation Learning via Diverse Mixture-of-Experts
Authors: Zheyuan Liu, Chunhui Zhang, Yijun Tian, Erchi Zhang, Chao Huang, Yanfang Ye, Chuxu Zhang
DOI: 10.1145/3543507.3583207
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: Graph Neural Networks (GNNs) have demonstrated great representation learning capability on graph data and have been utilized in various downstream applications. However, real-world data in web-based applications (e.g., recommendation and advertising) often contains bias, preventing GNNs from learning fair representations. Although many works have been proposed to address the fairness issue, they suffer from the significant problem of insufficient learnable knowledge, with only limited attributes remaining after debiasing. To address this problem, we develop Graph-Fairness Mixture of Experts (G-Fame), a novel plug-and-play method that helps any GNN learn distinguishable representations with unbiased attributes. Furthermore, based on G-Fame, we propose G-Fame++, which introduces three novel strategies to improve representation fairness from the node representation, model layer, and parameter redundancy perspectives. In particular, we first present an embedding diversification method to learn distinguishable node representations. Second, we design a layer diversification strategy to maximize the output difference of distinct model layers. Third, we introduce an expert diversification method that minimizes expert parameter similarities so as to learn diverse and complementary representations. Extensive experiments demonstrate the superiority of G-Fame and G-Fame++ in both accuracy and fairness, compared to state-of-the-art methods across multiple graph datasets.
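The expert diversification idea (minimizing expert parameter similarity) can be illustrated with a simple regularizer sketch, assuming experts are represented by flat parameter vectors; this is my own minimal rendering, not the paper's exact loss term.

```python
import math

def expert_diversity_penalty(experts):
    """Mean pairwise cosine similarity between expert parameter vectors.
    Adding this penalty to the training loss pushes experts toward
    diverse, complementary parameters.  experts: list of equal-length
    non-zero float lists."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    pairs = [(i, j) for i in range(len(experts))
             for j in range(i + 1, len(experts))]
    return sum(cos(experts[i], experts[j]) for i, j in pairs) / len(pairs)
```

Identical experts score 1.0 (maximum penalty), mutually orthogonal experts score 0.0.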
Title: Gender Pay Gap in Sports on a Fan-Request Celebrity Video Site
Authors: Nazanin Sabri, Stephen Reysen, Ingmar Weber
DOI: 10.1145/3543507.3583884
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: The internet is often thought of as a democratizer, enabling equality in aspects such as pay, and as a tool that introduces novel communication and monetization opportunities. In this study we examine athletes on Cameo, a website that enables bi-directional fan-celebrity interactions, asking whether the well-documented gender pay gaps in sports persist in this digital setting. Traditional studies of gender pay gaps in sports mostly concern a centralized setting where an organization decides the players' pay, whereas Cameo facilitates grass-roots fan engagement in which fans pay for video messages from their preferred athletes. The results show that gender pay gaps persist even on such a platform, both in cost per message and in the number of requests, proxied by the number of ratings. For instance, we find that female athletes have a median pay of $30 per video, while the corresponding figure for men is $40. The results also contribute to the study of parasocial relationships and personalized fan engagement at a distance, something that has become more relevant during the ongoing COVID-19 pandemic, when in-person fan engagement has often been limited.
Title: Is your digital neighbor a reliable investment advisor?
Authors: Daisuke Kawai, A. Cuevas, Bryan R. Routledge, K. Soska, Ariel Zetlin-Jones, Nicolas Christin
DOI: 10.1145/3543507.3583502
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: The web and social media platforms have drastically changed how investors produce and consume financial advice. Historically, individual investors often relied on newsletters and related prospectuses backed by the reputation and track record of their issuers. Nowadays, financial advice is frequently offered online, by anonymous or pseudonymous parties with little at stake. As such, a natural question is whether these modern financial “influencers” operate in good faith, or whether they might be intentionally misleading their followers. To start answering this question, we obtained data from a very large cryptocurrency derivatives exchange, from which we derived individual trading positions. Some of the investors on that platform elect to link to their Twitter profiles. We were thus able to compare the positions publicly espoused on Twitter with those actually taken in the market. We discovered that 1) staunchly “bullish” investors on Twitter often took much more moderate, if not outright opposite, positions in their own trades when the market was down, 2) their followers tended to align their positions with bullish Twitter outlooks, and 3) moderate voices on Twitter (and their own followers) were, on the other hand, far more consistent with their actual investment strategies. In other words, while social media advice may foster a sense of camaraderie among people of like-minded beliefs, this is merely an illusion, and it may result in financial losses for people blindly following the advice.
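The core measurement here, comparing publicly stated outlooks with actual market positions, reduces to a simple consistency statistic. As an illustrative sketch (my own simplification of the study's methodology), assume each investor has a stated stance (+1 bullish, -1 bearish) and a signed net position:

```python
def consistency_rate(stated, positions):
    """Fraction of investors whose stated social-media stance
    (+1 bullish, -1 bearish) matches the sign of their actual net
    market position (positive = long, negative = short)."""
    matches = sum(1 for s, p in zip(stated, positions)
                  if (s > 0) == (p > 0))
    return matches / len(stated)
```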
Title: Human-in-the-loop Regular Expression Extraction for Single Column Format Inconsistency
Authors: Shaochen Yu, Lei Han, M. Indulska, S. Sadiq, Gianluca Demartini
DOI: 10.1145/3543507.3583515
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: Format inconsistency is one of the most frequent data quality issues encountered during data cleaning. Existing automated approaches commonly lack applicability and generalisability, while approaches with human input typically require specialized skills such as writing regular expressions. This paper proposes a novel hybrid human-machine system, “Data-Scanner-4C”, which leverages crowdsourcing to effectively address syntactic format inconsistencies within a single column. We first ask crowd workers to create examples from single-column data through “data selection” and “result validation” tasks. Then, we propose and use a novel rule-based learning algorithm to infer regular expressions that propagate formats from the created examples to the entire column. Our system integrates crowdsourcing and algorithmic format-extraction techniques in a single workflow. Human experts no longer need to write regular expressions, which reduces both the time required and the opportunity for error. We conducted experiments on both synthetic and real-world datasets, and our results show that the proposed approach is applicable and effective across data types and formats.
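The format-propagation step can be illustrated with a toy version of regex inference (my own sketch, not the paper's rule-based learning algorithm): abstract each cell into a format signature, take the dominant signature as the column's expected format, and flag cells that deviate.

```python
import re
from collections import Counter

def cell_pattern(s):
    """Abstract a cell value into a regex-like format signature:
    runs of digits -> r'\d+', runs of letters -> '[A-Za-z]+',
    any other character kept literally."""
    out = []
    for m in re.finditer(r"(?P<d>[0-9]+)|(?P<a>[A-Za-z]+)|(?P<o>.)", s):
        out.append({"d": r"\d+", "a": "[A-Za-z]+"}.get(m.lastgroup, m.group()))
    return "".join(out)

def flag_inconsistent(column):
    """Return (dominant_pattern, indices of cells that deviate from it)."""
    pats = [cell_pattern(c) for c in column]
    dominant, _ = Counter(pats).most_common(1)[0]
    return dominant, [i for i, p in enumerate(pats) if p != dominant]
```

For a date column like `["12-Mar", "03-Apr", "2023/04/01"]`, the dominant signature is `\d+-[A-Za-z]+` and the third cell is flagged.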
Title: Multi-aspect Diffusion Network Inference
Authors: Hao Huang, Ke‐qi Han, Beicheng Xu, Ting Gan
DOI: 10.1145/3543507.3583228
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: To learn influence relationships between nodes in a diffusion network, most existing approaches rely on precise timestamps of historical node infections. The target network is customarily assumed to be a one-aspect diffusion network with homogeneous influence relationships. Nonetheless, tracing node infection timestamps is often infeasible due to high cost, and influence relationships may be heterogeneous because of the diversity of propagation media. In this work, we study how to infer a multi-aspect diffusion network with heterogeneous influence relationships, using only node infection statuses, which are more readily accessible in practice. Equipped with a probabilistic generative model, we iteratively conduct a posteriori quantitative analysis of the network's historical diffusion results, and infer the structure and strengths of the homogeneous influence relationships in each aspect. Extensive experiments on both synthetic and real-world networks verify the effectiveness and efficiency of our approach.
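To see what "using only infection statuses" buys, consider a much cruder proxy than the paper's probabilistic generative model (this sketch is my own illustration): estimate the influence strength of each ordered node pair from how often the two nodes are co-infected across cascades, with no timestamps involved.

```python
def coinfection_strengths(cascades, nodes):
    """Crude influence-strength proxy: for each ordered pair (u, v), the
    fraction of cascades containing u in which v is also infected.
    cascades: list of sets of infected nodes (statuses only, no times)."""
    strengths = {}
    for u in nodes:
        with_u = [c for c in cascades if u in c]
        for v in nodes:
            if u == v:
                continue
            strengths[(u, v)] = (
                sum(1 for c in with_u if v in c) / len(with_u)
                if with_u else 0.0)
    return strengths
```

This proxy is symmetric-ish and confounds correlation with influence; the appeal of a generative model is precisely that it can disentangle aspects and directionality that raw co-infection counts cannot.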
Title: Is IPFS Ready for Decentralized Video Streaming?
Authors: Zhengyu Wu, ChengHao Ryan Yang, Santiago Vargas, A. Balasubramanian
DOI: 10.1145/3543507.3583404
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: The InterPlanetary File System (IPFS) is a peer-to-peer protocol for decentralized content storage and retrieval. The IPFS platform has the potential to help users evade censorship and avoid a central point of failure. IPFS is seeing increasing adoption for distributing various kinds of files, including video. However, the performance of video streaming on IPFS has not been well studied. We conduct a measurement study of over 28,000 videos hosted on the IPFS network and find that video streaming experiences high stall rates due to relatively high round-trip times (RTTs). Further, videos are encoded at a single static quality, so streaming cannot adapt to different network conditions. A natural approach is to use adaptive bitrate (ABR) algorithms, which encode videos at multiple quality levels and stream according to the available throughput. However, traditional ABR algorithms perform poorly on IPFS because throughput cannot be estimated correctly: video segments can be retrieved from multiple sources, which makes estimation difficult. To overcome this issue, we have designed Telescope, an IPFS-aware ABR system. We conduct experiments on the IPFS network, with video providers geographically distributed across the globe. Our results show that Telescope significantly improves the Quality of Experience (QoE) of videos across a diverse set of network and cache conditions, compared to traditional ABR.
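The multi-source throughput problem can be made concrete with a conservative bitrate-selection sketch (my own assumption about how one might handle it, not Telescope's actual algorithm): keep per-provider throughput samples, estimate each provider by a harmonic mean, and pick the highest bitrate that even the slowest provider can sustain, with a safety margin.

```python
def pick_bitrate(bitrates, provider_samples, safety=0.8):
    """Choose the highest bitrate (kbps) sustainable when segments may
    come from several providers: harmonic-mean each provider's recent
    throughput samples (kbps), take the minimum across providers, and
    scale by a safety factor before matching against the bitrate ladder."""
    def harmonic_mean(xs):
        return len(xs) / sum(1.0 / x for x in xs)
    estimate = min(harmonic_mean(s) for s in provider_samples.values()) * safety
    feasible = [b for b in sorted(bitrates) if b <= estimate]
    return feasible[-1] if feasible else min(bitrates)
```

Taking the minimum across providers is deliberately pessimistic; a real system would weight providers by how likely each is to serve the next segment.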
Title: Web Structure Derived Clustering for Optimised Web Accessibility Evaluation
Authors: Alexander Hambley, Y. Yeşilada, Markel Vigo, S. Harper
DOI: 10.1145/3543507.3583508
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: Web accessibility evaluation is a costly and complex process due to limited time and resources, as well as ambiguity. To optimise the accessibility evaluation process, we aim to reduce the number of pages auditors must review by employing statistically representative pages, reducing a site of thousands of pages to a manageable review of archetypal pages. Our paper focuses on representativeness, one of six proposed metrics that form our methodology, to address the limitations we have identified in the W3C Website Accessibility Conformance Evaluation Methodology (WCAG-EM). These include the evaluative scope, the non-probabilistic sampling approach, and the potential for bias within the selected sample. Representativeness, in particular, is a metric that assesses the quality and coverage of sampling. To measure it, we systematically evaluate five web page representations on a website of 388 pages: tags, structure, the DOM tree, content, and a mixture of structure and content. Our findings highlight the importance of including structural components in representations. We validate our conclusions using the same methodology on three additional random sites of 500 pages each. As an exclusive attribute, we find that features derived from web content are suboptimal and can lead to lower-quality and more disparate clustering for optimised accessibility evaluation.
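A tag-based structural representation, the simplest of the five representations discussed, can be sketched as follows (an illustrative toy, not the paper's pipeline): represent each page by its HTML tag histogram and pick the page closest to the centroid as the archetype an auditor would review.

```python
import math
from collections import Counter

def tag_vector(tags, vocab):
    """Represent a page by its HTML tag frequency histogram over vocab."""
    c = Counter(tags)
    return [c.get(t, 0) for t in vocab]

def most_representative(pages):
    """Index of the page whose tag vector is closest (Euclidean) to the
    centroid of all pages -- a one-cluster stand-in for sampling one
    archetypal page per cluster.  pages: list of tag-name lists."""
    vocab = sorted({t for p in pages for t in p})
    vecs = [tag_vector(p, vocab) for p in pages]
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    def dist(v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, centroid)))
    return min(range(len(vecs)), key=lambda i: dist(vecs[i]))
```

Clustering pages first (e.g., k-means over these vectors) and taking one representative per cluster is the natural extension of this one-cluster sketch.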
Title: Anti-FakeU: Defending Shilling Attacks on Graph Neural Network based Recommender Model
Authors: X. You, Chi-Pan Li, Daizong Ding, Mi Zhang, Fuli Feng, Xudong Pan, Min Yang
DOI: 10.1145/3543507.3583289
Venue: Proceedings of the ACM Web Conference 2023, April 30, 2023
Abstract: Graph neural network (GNN) based recommendation models are observed to be more vulnerable to carefully designed malicious records injected into the system, i.e., shilling attacks, which manipulate the recommendations shown to ordinary users and thereby impair user trust. In this paper, we conduct the first systematic study of the vulnerability of GNN-based recommendation models to shilling attacks. With the aid of theoretical analysis, we attribute the root cause of this vulnerability to the neighborhood aggregation mechanism, which can make the negative impact of an attack propagate rapidly through the system. To restore the robustness of a GNN-based recommendation model, the key lies in detecting malicious records in the system and preventing the propagation of misinformation. To this end, we construct a user-user graph to capture the patterns of malicious behaviors and design a novel GNN-based detector to identify fake users. Furthermore, we develop a data augmentation strategy and a joint learning paradigm to train the recommender model together with the proposed detector. Extensive experiments on benchmark datasets validate the enhanced robustness of the proposed method in resisting various types of shilling attacks and identifying fake users; for example, it fully mitigates the impact of popularity attacks on target items and improves the accuracy of detecting fake users on the Gowalla dataset.
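The user-user behaviour graph idea can be illustrated with a minimal co-rating overlap score (my own sketch, not the paper's detector): coordinated shilling accounts injected together tend to rate near-identical item sets, so an unusually high maximum Jaccard overlap with another user is a simple suspicion signal and a natural edge weight for such a graph.

```python
def max_jaccard(user_items):
    """For each user, the maximum Jaccard overlap of rated-item sets with
    any other user.  user_items: dict mapping user -> set of item ids."""
    users = list(user_items)
    scores = {}
    for u in users:
        best = 0.0
        for v in users:
            if u == v:
                continue
            inter = len(user_items[u] & user_items[v])
            union = len(user_items[u] | user_items[v])
            best = max(best, inter / union if union else 0.0)
        scores[u] = best
    return scores
```

A learned GNN detector, as proposed in the paper, would go far beyond this single feature, but thresholding such overlap scores already separates obviously coordinated accounts from organic users.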