Graph anomaly detection (GAD) suffers from heterophily: abnormal nodes are sparse and are therefore mostly connected to large numbers of normal nodes. Current solutions built upon Graph Neural Networks (GNNs) blindly smooth the representations of neighboring nodes, thus undermining the discriminative information of the anomalies. To alleviate this issue, recent studies identify and discard inter-class edges by estimating and comparing node-level representation similarity. However, the representation of a single node can be misleading when its prediction error is high, which hinders the performance of such edge indicators. In graph signal processing, the smoothness index is a widely adopted metric that plays the role of frequency in classical spectral analysis. Treating the ground truth Y as a signal on the graph, the smoothness index is equivalent to the heterophily ratio. From this perspective, we aim to address the heterophily problem in the spectral domain. First, we point out that heterophily is positively associated with the frequency of a graph. Accordingly, we can prune inter-class edges by simply emphasizing and delineating the high-frequency components of the graph. Recalling that the graph Laplacian is a high-pass filter, we adopt it to measure the extent of 1-hop label change around each center node and thereby indicate high-frequency components. Since GAD can be formulated as a semi-supervised binary classification problem, only part of the nodes are labeled; as an alternative, we use node predictions to estimate this quantity. Through our analysis, we show that prediction errors are less likely to affect the identification process. Extensive empirical evaluations on four benchmarks demonstrate the effectiveness of the indicator over popular homophilic, heterophilic, and tailored fraud detection methods. Our proposed indicator can effectively reduce the heterophily degree of the graph, thus boosting overall GAD performance. Code is open-sourced at https://github.com/blacksingular/GHRN.
{"title":"Addressing Heterophily in Graph Anomaly Detection: A Perspective of Graph Spectrum","authors":"Yuan Gao, Xiang Wang, Xiangnan He, Zhenguang Liu, Huamin Feng, Yongdong Zhang","doi":"10.1145/3543507.3583268","DOIUrl":"https://doi.org/10.1145/3543507.3583268","url":null,"abstract":"Graph anomaly detection (GAD) suffers from heterophily — abnormal nodes are sparse so that they are connected to vast normal nodes. The current solutions upon Graph Neural Networks (GNNs) blindly smooth the representation of neiboring nodes, thus undermining the discriminative information of the anomalies. To alleviate the issue, recent studies identify and discard inter-class edges through estimating and comparing the node-level representation similarity. However, the representation of a single node can be misleading when the prediction error is high, thus hindering the performance of the edge indicator. In graph signal processing, the smoothness index is a widely adopted metric which plays the role of frequency in classical spectral analysis. Considering the ground truth Y to be a signal on graph, the smoothness index is equivalent to the value of the heterophily ratio. From this perspective, we aim to address the heterophily problem in the spectral domain. First, we point out that heterophily is positively associated with the frequency of a graph. Towards this end, we could prune inter-class edges by simply emphasizing and delineating the high-frequency components of the graph. Recall that graph Laplacian is a high-pass filter, we adopt it to measure the extent of 1-hop label changing of the center node and indicate high-frequency components. As GAD can be formulated as a semi-supervised binary classification problem, only part of the nodes are labeled. As an alternative, we use the prediction of the nodes to estimate it. Through our analysis, we show that prediction errors are less likely to affect the identification process. Extensive empirical evaluations on four benchmarks demonstrate the effectiveness of the indicator over popular homophilic, heterophilic, and tailored fraud detection methods. Our proposed indicator can effectively reduce the heterophily degree of the graph, thus boosting the overall GAD performance. Codes are open-sourced in https://github.com/blacksingular/GHRN.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131447467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rumaisa Habib, Aimen Inam, Ayesha Ali, I. Qazi, Z. Qazi
Public service websites act as official gateways to services provided by governments. Many of these websites are essential for citizens to receive reliable information and online government services. However, the lack of affordability of mobile broadband services in many developing countries and the rising complexity of websites create barriers for citizens in accessing these government websites. This paper presents the first large-scale analysis of the affordability of public service websites in developing countries. We do this by collecting a corpus of 1900 public service websites, comprising public websites from nine developing countries and, for comparison, websites from nine developed countries. Our investigation is driven by website complexity analysis as well as evaluation through a recently proposed affordability index. Our analysis reveals that, in general, public service websites in developing countries do not meet the affordability target set by the UN's Broadband Commission. However, we show that several countries can be brought within or closer to the affordability target by implementing webpage optimizations that reduce page sizes. We also discuss policy interventions that can help make access to public service websites more affordable.
{"title":"A First Look at Public Service Websites from the Affordability Lens","authors":"Rumaisa Habib, Aimen Inam, Ayesha Ali, I. Qazi, Z. Qazi","doi":"10.1145/3543507.3583415","DOIUrl":"https://doi.org/10.1145/3543507.3583415","url":null,"abstract":"Public service websites act as official gateways to services provided by governments. Many of these websites are essential for citizens to receive reliable information and online government services. However, the lack of affordability of mobile broadband services in many developing countries and the rising complexity of websites create barriers for citizens in accessing these government websites. This paper presents the first large-scale analysis of the affordability of public service websites in developing countries. We do this by collecting a corpus of 1900 public service websites, including public websites from nine developing countries and for comparison websites from nine developed countries. Our investigation is driven by website complexity analysis as well as evaluation through a recently proposed affordability index. Our analysis reveals that, in general, public service websites in developing countries do not meet the affordability target set by the UN’s Broadband Commission. However, we show that several countries can be brought within or closer to the affordability target by implementing webpage optimizations to reduce page sizes. We also discuss policy interventions that can help make access to public service website more affordable.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133794414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zainab Akhtar, Umair Qazi, Rizwan Sadiq, Aya El-Sakka, M. Sajjad, Ferda Ofli, Muhammad Imran
The devastating 2022 floods in Pakistan resulted in a catastrophe impacting millions of people and destroying thousands of homes. While disaster management efforts were undertaken, crisis responders struggled to understand the country-wide flood extent, population exposure, urgent needs of affected people, and various types of damage. To tackle this challenge, we leverage remote and social sensing with geospatial data using state-of-the-art machine learning techniques for text and image processing. Our satellite-based analysis over a one-month period (25 Aug–25 Sep) revealed that 11.48% of Pakistan was inundated. When combined with geospatial data, this indicated that 18.9 million people were at risk across 160 districts in Pakistan, with adults constituting 50% of the exposed population. Our social sensing data analysis surfaced 106.7k reports pertaining to deaths, injuries, and concerns of the affected people. To understand the urgent needs of the affected population, we analyzed tweet texts and found that South Karachi, Chitral and North Waziristan required the most basic necessities like food and shelter. Further analysis of tweet images revealed that Lasbela, Rajanpur, and Jhal Magsi had the highest damage reports normalized by their population. These damage reports were found to correlate strongly with affected-people reports and need reports, achieving R-squared values of 0.96 and 0.94, respectively. Our extensive study shows that combining remote sensing, social sensing, and geospatial data can provide accurate and timely information during a disaster event, which is crucial in prioritizing areas for immediate and gradual response.
{"title":"Mapping Flood Exposure, Damage, and Population Needs Using Remote and Social Sensing: A Case Study of 2022 Pakistan Floods","authors":"Zainab Akhtar, Umair Qazi, Rizwan Sadiq, Aya El-Sakka, M. Sajjad, Ferda Ofli, Muhammad Imran","doi":"10.1145/3543507.3583881","DOIUrl":"https://doi.org/10.1145/3543507.3583881","url":null,"abstract":"The devastating 2022 floods in Pakistan resulted in a catastrophe impacting millions of people and destroying thousands of homes. While disaster management efforts were taken, crisis responders struggled to understand the country-wide flood extent, population exposure, urgent needs of affected people, and various types of damage. To tackle this challenge, we leverage remote and social sensing with geospatial data using state-of-the-art machine learning techniques for text and image processing. Our satellite-based analysis over a one-month period (25 Aug–25 Sep) revealed that 11.48% of Pakistan was inundated. When combined with geospatial data, this meant 18.9 million people were at risk across 160 districts in Pakistan, with adults constituting 50% of the exposed population. Our social sensing data analysis surfaced 106.7k reports pertaining to deaths, injuries, and concerns of the affected people. To understand the urgent needs of the affected population, we analyzed tweet texts and found that South Karachi, Chitral and North Waziristan required the most basic necessities like food and shelter. Further analysis of tweet images revealed that Lasbela, Rajanpur, and Jhal Magsi had the highest damage reports normalized by their population. These damage reports were found to correlate strongly with affected people reports and need reports, achieving an R-Square of 0.96 and 0.94, respectively. Our extensive study shows that combining remote sensing, social sensing, and geospatial data can provide accurate and timely information during a disaster event, which is crucial in prioritizing areas for immediate and gradual response.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114805101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenyan Li, Yancheng Dong, Chen Gao, Yizhou Zhao, Dong Li, Jianye Hao, Kai Zhang, Yong Li, Zhi Wang
In the information-overloaded era of the Web, recommender systems that provide personalized content filtering have become the mainstream portal through which users access Web information. Recommender systems deploy machine learning models to learn users' preferences from collected historical data, which, due to the feedback loop, leads to increasingly centralized recommendation results. As a result, the ranking of content outside this narrowed scope is harmed and the options seen by users are limited. In this work, we first conduct data analysis from a graph view and observe that users' feedback is restricted to a limited set of items, verifying the phenomenon of centralized recommendation. We further develop a general simulation framework that reproduces the procedure of the recommender system, including data collection, model learning, and item exposure, which together form a loop. To address the filter bubble issue under the feedback loop, we then propose a general and easy-to-use reinforcement learning-based method, which can adaptively select a few effective connections between nodes from different communities as the exposure list. We conduct extensive experiments in the simulation framework based on large-scale real-world datasets. The results demonstrate that our proposed reinforcement learning-based control method can serve as an effective solution to alleviate the filter bubble and the separated communities it induces. We believe the proposed framework of controllable recommendation can inspire not only researchers of recommender systems, but also the broader community concerned with the impact of artificial intelligence algorithms on humanity, especially for vulnerable populations on the Web.
{"title":"Breaking Filter Bubble: A Reinforcement Learning Framework of Controllable Recommender System","authors":"Zhenyan Li, Yancheng Dong, Chen Gao, Yizhou Zhao, Dong Li, Jianye Hao, Kai Zhang, Yong Li, Zhi Wang","doi":"10.1145/3543507.3583856","DOIUrl":"https://doi.org/10.1145/3543507.3583856","url":null,"abstract":"In the information-overloaded era of the Web, recommender systems that provide personalized content filtering are now the mainstream portal for users to access Web information. Recommender systems deploy machine learning models to learn users’ preferences from collected historical data, leading to more centralized recommendation results due to the feedback loop. As a result, it will harm the ranking of content outside the narrowed scope and limit the options seen by users. In this work, we first conduct data analysis from a graph view to observe that the users’ feedback is restricted to limited items, verifying the phenomenon of centralized recommendation. We further develop a general simulation framework to derive the procedure of the recommender system, including data collection, model learning, and item exposure, which forms a loop. To address the filter bubble issue under the feedback loop, we then propose a general and easy-to-use reinforcement learning-based method, which can adaptively select few but effective connections between nodes from different communities as the exposure list. We conduct extensive experiments in the simulation framework based on large-scale real-world datasets. The results demonstrate that our proposed reinforcement learning-based control method can serve as an effective solution to alleviate the filter bubble and the separated communities induced by it. We believe the proposed framework of controllable recommendation in this work can inspire not only the researchers of recommender systems, but also a broader community concerned with artificial intelligence algorithms’ impact on humanity, especially for those vulnerable populations on the Web.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116946580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In online games, predicting massive battle outcomes is a fundamental task of many applications, such as team optimization and tactical formulation. Existing works do not pay adequate attention to massive battles. They either seek to evaluate individuals in isolation or mine simple pair-wise interactions between individuals, neither of which effectively captures the intricate interactions among massive numbers of units (e.g., individuals). Furthermore, as the team size increases, the phenomenon of diminishing marginal utility of units emerges. Such a diminishing pattern is rarely noticed in previous work, and how to capture it from data remains a challenge. To this end, we propose a novel Massive battle outcome predictor with margiNal Effect modules, namely MassNE, which comprehensively incorporates individual effects, cooperation effects (i.e., intra-team interactions), and suppression effects (i.e., inter-team interactions) for predicting battle outcomes. Specifically, we design marginal effect modules to learn how units' marginal utility changes with respect to their number, where a monotonicity assumption is applied to ensure rationality. In addition, we evaluate current classical models and provide mathematical proofs that MassNE is able to generalize several earlier works in massive settings. Massive battle datasets generated by StarCraft II APIs are adopted to evaluate the performance of MassNE. Extensive experiments empirically demonstrate the effectiveness of MassNE, and MassNE can reveal reasonable cooperation effects, suppression effects, and marginal utilities of combat units from the data.
{"title":"MassNE: Exploring Higher-Order Interactions with Marginal Effect for Massive Battle Outcome Prediction","authors":"Yin Gu, Kai Zhang, Qi Liu, Xin Lin, Zhenya Huang, Enhong Chen","doi":"10.1145/3543507.3583390","DOIUrl":"https://doi.org/10.1145/3543507.3583390","url":null,"abstract":"In online games, predicting massive battle outcomes is a fundamental task of many applications, such as team optimization and tactical formulation. Existing works do not pay adequate attention to the massive battle. They either seek to evaluate individuals in isolation or mine simple pair-wise interactions between individuals, neither of which effectively captures the intricate interactions between massive units (e.g., individuals). Furthermore, as the team size increases, the phenomenon of diminishing marginal utility of units emerges. Such a diminishing pattern is rarely noticed in previous work, and how to capture it from data remains a challenge. To this end, we propose a novel Massive battle outcome predictor with margiNal Effect modules, namely MassNE, which comprehensively incorporates individual effects, cooperation effects (i.e., intra-team interactions) and suppression effects (i.e., inter-team interactions) for predicting battle outcomes. Specifically, we design marginal effect modules to learn how units’ marginal utility changing respect to their number, where the monotonicity assumption is applied to ensure rationality. In addition, we evaluate the current classical models and provide mathematical proofs that MassNE is able to generalize several earlier works in massive settings. Massive battle datasets generated by StarCraft II APIs are adopted to evaluate the performances of MassNE. Extensive experiments empirically demonstrate the effectiveness of MassNE, and MassNE can reveal reasonable cooperation effects, suppression effects, and marginal utilities of combat units from the data.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117228726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph generative models have recently been gaining significant interest across application domains. They are commonly used to model social networks, knowledge graphs, and protein-protein interaction networks. In this talk we will present the potential of graph generative models and our recent relevant efforts in the biomedical domain. More specifically, we present a novel architecture that generates medical records as graphs with privacy guarantees. We build on and modify the graph variational autoencoder (VAE) architecture. We train the generative model on the well-known MIMIC medical database and generate data that are very similar to the real records while providing privacy guarantees. We also develop new GNNs for predicting antibiotic resistance and for other protein-related downstream tasks such as enzyme classification and Gene Ontology classification. Here too we achieve promising results, with potential for future application to broader biomedical tasks. Finally, we present future research directions for multi-modal generative models involving graphs.
{"title":"GNNs and Graph Generative models for biomedical applications","authors":"M. Vazirgiannis","doi":"10.1145/3543507.3593049","DOIUrl":"https://doi.org/10.1145/3543507.3593049","url":null,"abstract":"Graph generative models are recently gaining significant interest in current application domains. They are commonly used to model social networks, knowledge graphs, and protein-protein interaction networks. In this talk we will present the potential of graph generative models and our recent relevant efforts in the biomedical domain. More specifically we present a novel architecture that generates medical records as graphs with privacy guarantees. We capitalize and modify the graph Variational autoencoders (VAEs) architecture. We train the generative model with the well known MIMIC medical database and achieve generated data that are very similar to the real ones yet provide privacy guarantees. We also develop new GNNs for predicting antibiotic resistance and other protein related downstream tasks such as enzymes classifications and Gene Ontology classification. We achieve there as well promising results with potential for future application in broader biomedical related tasks. Finally we present future research directions for multi modal generative models involving graphs.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123707653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guoshan Lu, Haobo Wang, Saisai Yang, Jing Yuan, Guozheng Yang, Cheng Zang, Gang Chen, J. Zhao
Feature engineering often plays a crucial role in building mining systems for tabular data, and traditionally requires experienced human experts to perform. Thanks to rapid advances in reinforcement learning, an automated alternative has emerged, i.e., automated feature engineering (AutoFE). In this work, through scrutiny of prior AutoFE methods, we characterize several research challenges that remain in this regime, concerning system-wide efficiency, efficacy, and practicality toward production. We then propose Catch, a full-fledged new AutoFE framework that comprehensively addresses these challenges. At the core of Catch is a hierarchical-policy reinforcement learning scheme that performs collaborative feature engineering exploration and exploitation at the granularity of the whole feature set. At the higher level of the hierarchy, a decision-making module controls the post-processing of the attained feature engineering transformations. We extensively experiment with Catch on 26 academic standardized tabular datasets and 9 industrial real-world datasets. Measured by numerous metrics and analyses, Catch establishes a new state of the art from the perspectives of performance, latency, and practicality toward production. Source code can be found at https://github.com/1171000709/Catch.
{"title":"Catch: Collaborative Feature Set Search for Automated Feature Engineering","authors":"Guoshan Lu, Haobo Wang, Saisai Yang, Jing Yuan, Guozheng Yang, Cheng Zang, Gang Chen, J. Zhao","doi":"10.1145/3543507.3583527","DOIUrl":"https://doi.org/10.1145/3543507.3583527","url":null,"abstract":"Feature engineering often plays a crucial role in building mining systems for tabular data, which traditionally requires experienced human experts to perform. Thanks to the rapid advances in reinforcement learning, it has offered an automated alternative, i.e. automated feature engineering (AutoFE). In this work, through scrutiny of the prior AutoFE methods, we characterize several research challenges that remained in this regime, concerning system-wide efficiency, efficacy, and practicality toward production. We then propose Catch, a full-fledged new AutoFE framework that comprehensively addresses the aforementioned challenges. The core to Catch composes a hierarchical-policy reinforcement learning scheme that manifests a collaborative feature engineering exploration and exploitation grounded on the granularity of the whole feature set. At a higher level of the hierarchy, a decision-making module controls the post-processing of the attained feature engineering transformation. We extensively experiment with Catch on 26 academic standardized tabular datasets and 9 industrialized real-world datasets. Measured by numerous metrics and analyses, Catch establishes a new state-of-the-art, from perspectives performance, latency as well as its practicality towards production. Source code1 can be found at https://github.com/1171000709/Catch.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117206012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Wijenayake, Danula Hettiachchi, Jorge Gonçalves
Optimising the assignment of tasks to workers is an effective approach to ensuring high quality in crowdsourced data, particularly for heterogeneous micro tasks. However, previous attempts at heterogeneous micro task assignment based on worker characteristics are limited to using cognitive skills, despite literature emphasising that worker performance also varies with other parameters. This study is an initial step towards understanding whether and how multiple parameters such as cognitive skills, mood, personality, alertness, comprehension skill, and the social and physical context of workers can be leveraged in tandem to improve worker performance estimations in heterogeneous micro tasks. Our predictive models indicate that these parameters have varying effects on worker performance in the five task types considered: sentiment analysis, classification, transcription, named entity recognition, and bounding box. Moreover, we note a 0.003–0.018 reduction in the mean absolute error of predicted worker accuracy across all tasks when task assignment is based on models that consider all parameters versus models that only consider workers' cognitive skills. Our findings pave the way for the use of holistic approaches in micro task assignment that effectively quantify worker context.
{"title":"Combining Worker Factors for Heterogeneous Crowd Task Assignment","authors":"S. Wijenayake, Danula Hettiachchi, Jorge Gonçalves","doi":"10.1145/3543507.3583190","DOIUrl":"https://doi.org/10.1145/3543507.3583190","url":null,"abstract":"Optimising the assignment of tasks to workers is an effective approach to ensure high quality in crowdsourced data - particularly in heterogeneous micro tasks. However, previous attempts at heterogeneous micro task assignment based on worker characteristics are limited to using cognitive skills, despite literature emphasising that worker performance varies based on other parameters. This study is an initial step towards understanding whether and how multiple parameters such as cognitive skills, mood, personality, alertness, comprehension skill, and social and physical context of workers can be leveraged in tandem to improve worker performance estimations in heterogeneous micro tasks. Our predictive models indicate that these parameters have varying effects on worker performance in the five task types considered – sentiment analysis, classification, transcription, named entity recognition and bounding box. Moreover, we note 0.003 - 0.018 reduction in mean absolute error of predicted worker accuracy across all tasks, when task assignment is based on models that consider all parameters vs. models that only consider workers’ cognitive skills. Our findings pave the way for the use of holistic approaches in micro task assignment that effectively quantify worker context.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128273319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the explosive growth of video and text data on the web, text-video retrieval has become a vital task for online video platforms. Recently, text-video retrieval methods based on pre-trained models have attracted a lot of attention. However, existing methods cannot effectively capture the fine-grained information in videos, and typically suffer from the hubness problem, where a collection of similar videos is retrieved by a large number of different queries. In this paper, we propose Match4Match, a new text-video retrieval method based on CLIP (Contrastive Language-Image Pretraining) and graph optimization theory. To balance computational efficiency and model accuracy, Match4Match seamlessly supports three inference modes for different application scenarios. In fast vector retrieval mode, we embed texts and videos in the same space and employ a vector retrieval engine to obtain the top K videos. In fine-grained alignment mode, our method fully utilizes the pre-trained knowledge of the CLIP model to align words with corresponding video frames, and uses this fine-grained information to compute text-video similarity more accurately. In flow-style matching mode, to alleviate the detrimental impact of the hubness problem, we model retrieval as a combinatorial optimization problem and solve it with a minimum-cost maximum-flow algorithm. To demonstrate the effectiveness of our method, we conduct experiments on five public text-video datasets; our proposed method outperforms state-of-the-art methods overall. Additionally, we evaluate the computational efficiency of Match4Match. Benefiting from the three flexible inference modes, Match4Match can respond to a large number of query requests with low latency or achieve high recall with acceptable time consumption.
{"title":"Match4Match: Enhancing Text-Video Retrieval by Maximum Flow with Minimum Cost","authors":"Zhongjie Duan, Chengyu Wang, Cen Chen, Wenmeng Zhou, Jun Huang, Weining Qian","doi":"10.1145/3543507.3583365","DOIUrl":"https://doi.org/10.1145/3543507.3583365","url":null,"abstract":"With the explosive growth of video and text data on the web, text-video retrieval has become a vital task for online video platforms. Recently, text-video retrieval methods based on pre-trained models have attracted a lot of attention. However, existing methods cannot effectively capture the fine-grained information in videos, and typically suffer from the hubness problem where a collection of similar videos are retrieved by a large number of different queries. In this paper, we propose Match4Match, a new text-video retrieval method based on CLIP (Contrastive Language-Image Pretraining) and graph optimization theories. To balance calculation efficiency and model accuracy, Match4Match seamlessly supports three inference modes for different application scenarios. In fast vector retrieval mode, we embed texts and videos in the same space and employ a vector retrieval engine to obtain the top K videos. In fine-grained alignment mode, our method fully utilizes the pre-trained knowledge of the CLIP model to align words with corresponding video frames, and uses the fine-grained information to compute text-video similarity more accurately. In flow-style matching mode, to alleviate the detrimental impact of the hubness problem, we model the retrieval problem as a combinatorial optimization problem and solve it using maximum flow with minimum cost algorithm. To demonstrate the effectiveness of our method, we conduct experiments on five public text-video datasets. The overall performance of our proposed method outperforms state-of-the-art methods. Additionally, we evaluate the computational efficiency of Match4Match. Benefiting from the three flexible inference modes, Match4Match can respond to a large number of query requests with low latency or achieve high recall with acceptable time consumption.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129916676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic link prediction is essential for a wide range of domains, including social networks, bioinformatics, knowledge bases, and recommender systems. Existing works have demonstrated that structural information and temporal information are two of the most important types of information for this problem. However, existing works either model them independently or model the temporal dynamics of a single structural scale, neglecting the complex correlations among them. This paper proposes to model the inherent correlations among the evolving dynamics of different structural scales for dynamic link prediction. Following this idea, we propose an Attentional Multi-scale Co-evolving Network (AMCNet). Specifically, we model multi-scale structural information with a motif-based graph neural network with multi-scale pooling. Then, we design a hierarchical attention-based sequence-to-sequence model to learn the complex correlations among the evolution dynamics of different structural scales. Extensive experiments on four real-world datasets with different characteristics demonstrate that AMCNet significantly outperforms the state of the art in both single-step and multi-step dynamic link prediction tasks.
{"title":"An Attentional Multi-scale Co-evolving Model for Dynamic Link Prediction","authors":"Guozhen Zhang, Tian Ye, Depeng Jin, Yong Li","doi":"10.1145/3543507.3583396","DOIUrl":"https://doi.org/10.1145/3543507.3583396","url":null,"abstract":"Dynamic link prediction is essential for a wide range of domains, including social networks, bioinformatics, knowledge bases, and recommender systems. Existing works have demonstrated that structural information and temporal information are two of the most important information for this problem. However, existing works either focus on modeling them independently or modeling the temporal dynamics of a single structural scale, neglecting the complex correlations among them. This paper proposes to model the inherent correlations among the evolving dynamics of different structural scales for dynamic link prediction. Following this idea, we propose an Attentional Multi-scale Co-evolving Network (AMCNet). Specifically, We model multi-scale structural information by a motif-based graph neural network with multi-scale pooling. Then, we design a hierarchical attention-based sequence-to-sequence model for learning the complex correlations among the evolution dynamics of different structural scales. Extensive experiments on four real-world datasets with different characteristics demonstrate that AMCNet significantly outperforms the state-of-the-art in both single-step and multi-step dynamic link prediction tasks.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130070890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}