Title: Optimizing Guided Traversal for Fast Learned Sparse Retrieval
Authors: Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
DOI: https://doi.org/10.1145/3543507.3583497

Recent studies show that BM25-driven dynamic index skipping can greatly accelerate MaxScore-based document retrieval over the learned sparse representations produced by DeepImpact. This paper investigates the effectiveness of such a traversal guidance strategy during top-k retrieval with other models such as SPLADE and uniCOIL, and finds that unconstrained BM25-driven skipping can cause visible relevance degradation when the BM25 model is not well aligned with the learned weight model or when the retrieval depth k is small. This paper generalizes the previous work and optimizes BM25-guided index traversal with a two-level pruning control scheme and model alignment for fast retrieval over a sparse representation. Although this control can add latency relative to unconstrained skipping, the proposed scheme remains much faster than the original MaxScore method without BM25 guidance while retaining relevance effectiveness. The paper analyzes the competitiveness of this two-level pruning scheme and evaluates its tradeoff between ranking relevance and time efficiency on several test datasets.
Title: BiSR: Bidirectionally Optimized Super-Resolution for Mobile Video Streaming
Authors: Q. Yu, Qing Li, Rui He, Gareth Tyson, Wanxin Shi, Jianhui Lv, Zhenhui Yuan, Peng Zhang, Yulong Lan, Zhicheng Li
DOI: https://doi.org/10.1145/3543507.3583519

The user experience of mobile web video streaming is often impacted by insufficient and dynamic network bandwidth. In this paper, we design Bidirectionally Optimized Super-Resolution (BiSR) to improve the quality of experience (QoE) for mobile web users under limited bandwidth. BiSR exploits a deep neural network (DNN)-based model to super-resolve key frames efficiently without changing the inter-frame spatial-temporal information. We then propose a downscaling DNN and a mobile-specific optimized lightweight super-resolution DNN to enhance the performance. Finally, a novel reinforcement learning-based adaptive bitrate (ABR) algorithm is proposed to verify the performance of BiSR on real network traces. Our evaluation, using a full system implementation, shows that BiSR saves 26% of bitrate compared to the traditional H.264 codec and improves the SSIM of video by 3.7% compared to the prior state-of-the-art. Overall, BiSR enhances the user-perceived quality of experience by up to 30.6%.
Title: RSGNN: A Model-agnostic Approach for Enhancing the Robustness of Signed Graph Neural Networks
Authors: Zeyu Zhang, Jiamou Liu, Xianda Zheng, Yifei Wang, Pengqian Han, Yupan Wang, Kaiqi Zhao, Zijian Zhang
DOI: https://doi.org/10.1145/3543507.3583221

Signed graphs model complex relations using both positive and negative edges. Signed graph neural networks (SGNN) are powerful tools for analyzing signed graphs. We address the vulnerability of SGNN to potential edge noise in the input graph. Our goal is to strengthen existing SGNN, allowing them to withstand edge noise by extracting robust representations for signed graphs. First, we analyze the expressiveness of SGNN using an extended Weisfeiler-Lehman (WL) graph isomorphism test and identify the limitations of SGNN on unbalanced triangles. Then, we design structure-based regularizers, to be used in conjunction with an SGNN, that highlight intrinsic properties of a signed graph. These tools and insights allow us to propose a novel framework, Robust Signed Graph Neural Network (RSGNN), which adopts a dual architecture that simultaneously denoises the graph while learning node representations. We validate our model empirically on four real-world signed graph datasets (Bitcoin_OTC, Bitcoin_Alpha, Epinion, and Slashdot) and show that RSGNN clearly improves the robustness of popular SGNN models. When the signed graphs are affected by random noise, our method outperforms baselines by up to 9.35% Binary-F1 for link sign prediction. Our implementation is available in PyTorch.
Title: Message Function Search for Knowledge Graph Embedding
Authors: Shimin Di, Lei Chen
DOI: https://doi.org/10.1145/3543507.3583546

Recently, many promising embedding models have been proposed to embed knowledge graphs (KGs) and their more general forms, such as n-ary relational data (NRD) and hyper-relational KGs (HKG). To promote the data adaptability and performance of embedding models, KG searching methods propose to search for suitable models for a given KG dataset. However, they are restricted to a single KG form, and the searched models are restricted to a single type of embedding model. To tackle these issues, we propose to build a search space for the message function in graph neural networks (GNNs). This is, however, a non-trivial task: existing message function designs fix the structures and operators, which makes it difficult for them to handle different KG forms and datasets. Therefore, we first design a novel message function space in which both structures and operators can be searched for the given KG form (KG, NRD, or HKG) and data. The proposed space can flexibly take different KG forms as inputs and is expressive enough to search for different types of embedding models. In particular, some existing message function designs and some classic KG embedding models can be instantiated as special cases of our space. We empirically show that the searched message functions are data-dependent and can achieve leading performance on benchmark KGs, NRD, and HKGs.
Title: Unsupervised Anomaly Detection on Microservice Traces through Graph VAE
Authors: Zhe Xie, Haowen Xu, Wenxiao Chen, Wanxue Li, Huai Jiang, Lang Su, Hanzhang Wang, Dan Pei
DOI: https://doi.org/10.1145/3543507.3583215

The microservice architecture is widely employed in large Internet systems. For each user request, a few of the microservices are called, and a trace is formed to record the tree-like call dependencies among microservices and the time consumption at each call node. Traces are useful in diagnosing system failures, but their complex structures make it difficult to model their patterns and detect their anomalies. In this paper, we propose a novel dual-variable graph variational autoencoder (VAE) for unsupervised anomaly detection on microservice traces. To reconstruct the time consumption of nodes, we propose a novel dispatching layer. We find that negative log-likelihood (NLL) inversion occurs for some anomalous samples (they receive lower NLL than normal traces), which makes the NLL-based anomaly score unreliable for anomaly detection. To address this, we point out that the NLL can be decomposed into a KL-divergence term and a data entropy term, and that lower-dimensional anomalies can introduce an entropy gap relative to normal inputs. We propose three techniques to mitigate this entropy gap for trace anomaly detection: Bernoulli & Categorical Scaling, Node Count Normalization, and Gaussian Std-Limit. On five trace datasets from a top Internet company, our proposed TraceVAE achieves excellent F-scores.
Title: Improving Health Mention Classification Through Emphasising Literal Meanings: A Study Towards Diversity and Generalisation for Public Health Surveillance
Authors: O. T. Aduragba, Jialin Yu, A. Cristea, Yang Long
DOI: https://doi.org/10.1145/3543507.3583877

People often use disease or symptom terms on social media and online forums in ways other than to describe their health. Thus the NLP health mention classification (HMC) task aims to identify posts where users are discussing health conditions literally, not figuratively. Existing computational research typically only studies health mentions within well-represented groups in developed nations. Developing countries with limited health surveillance abilities fail to benefit from such data to manage public health crises. To advance HMC research and benefit more diverse populations, we present the Nairaland health mention dataset (NHMD), a new dataset collected from a dedicated web forum for Nigerians. NHMD consists of 7,763 manually labelled posts extracted based on four prevalent diseases (HIV/AIDS, Malaria, Stroke and Tuberculosis) in Nigeria. With NHMD, we conduct extensive experiments using current state-of-the-art models for HMC and find that, compared to existing public datasets, NHMD contains out-of-distribution examples. Hence, it is well suited for domain adaptation studies. The introduction of the NHMD dataset improves diversity coverage of vulnerable populations and generalisation for HMC tasks in a global public health surveillance setting. Additionally, we present a novel multi-task learning approach for HMC tasks that adds literal word meaning prediction as an auxiliary task. Experimental results demonstrate that the proposed approach outperforms state-of-the-art methods by a statistically significant margin (p < 0.01, Wilcoxon test) in terms of F1 score, and show that our new dataset poses a strong challenge to existing HMC methods.
Title: FedEdge: Accelerating Edge-Assisted Federated Learning
Authors: Kaibin Wang, Qiang He, Feifei Chen, Hai Jin, Yun Yang
DOI: https://doi.org/10.1145/3543507.3583264

Federated learning (FL) has been widely acknowledged as a promising solution for training machine learning (ML) models with privacy preservation. To reduce the traffic overheads incurred by FL systems, edge servers have been introduced between clients and the parameter server to aggregate clients' local models. Recent studies on this edge-assisted hierarchical FL scheme have focused on ensuring or accelerating model convergence by coping with various factors, e.g., uncertain network conditions, unreliable clients, and heterogeneous compute resources. This paper presents three new findings about the edge-assisted hierarchical FL scheme: 1) it wastes significant time during its two-phase training rounds; 2) it does not recognize or utilize model diversity when producing a global model; and 3) it is vulnerable to model poisoning attacks. To overcome these drawbacks, we propose FedEdge, a novel edge-assisted hierarchical FL scheme that accelerates model training with asynchronous local federated training and adaptive model aggregation. Extensive experiments are conducted on two widely-used public datasets. The results demonstrate that, compared with state-of-the-art FL schemes, FedEdge accelerates model convergence by 1.14×–3.20× and improves model accuracy by 2.14%–6.63%.
Title: MMMLP: Multi-modal Multilayer Perceptron for Sequential Recommendations
Authors: Jiahao Liang, Xiangyu Zhao, Muyang Li, Zijian Zhang, Wanyu Wang, Haochen Liu, Zitao Liu
DOI: https://doi.org/10.1145/3543507.3583378

Sequential recommendation aims to offer potentially interesting products to users by capturing their historical sequence of interacted items. Although it has facilitated numerous real-world scenarios, sequential recommendation for multi-modal sequences has long been neglected. Multi-modal data that depicts a user's historical interactions exists ubiquitously, such as product pictures, textual descriptions, and interacted item sequences, providing semantic information from multiple perspectives that comprehensively describes a user's preferences. However, existing sequential recommendation methods either fail to directly handle multi-modality or suffer from high computational complexity. To address this, we propose a novel Multi-Modal Multi-Layer Perceptron (MMMLP) for maintaining multi-modal sequences in sequential recommendation. MMMLP is a purely MLP-based architecture consisting of three modules (the Feature Mixer Layer, the Fusion Mixer Layer, and the Prediction Layer) and has an edge in both efficacy and efficiency. Extensive experiments show that MMMLP achieves state-of-the-art performance with linear complexity. We also conduct an ablation analysis to verify the contribution of each component. Furthermore, compatibility experiments are devised, and the results show that the multi-modal representation learned by our proposed model generally benefits other recommendation models, emphasizing our model's ability to handle multi-modal information. We have made our code available online to ease reproducibility.
Title: Maximizing Submodular Functions for Recommendation in the Presence of Biases
Authors: Anay Mehrotra, Nisheeth K. Vishnoi
DOI: https://doi.org/10.1145/3543507.3583195

Subset selection tasks arise in recommendation systems and search engines and ask to select a subset of items that maximizes the value for the user. The values of subsets often display diminishing returns, and hence submodular functions have been used to model them. If the inputs defining the submodular function are known, then existing algorithms can be used. In many applications, however, the inputs have been observed to carry social biases that reduce the utility of the output subset. Hence, interventions to improve the utility are desired. Prior works focus on maximizing linear functions, a special case of submodular functions, and show that fairness constraint-based interventions can not only ensure proportional representation but also achieve near-optimal utility in the presence of biases. We study the maximization of a family of submodular functions that captures the functions arising in the aforementioned applications. Our first result is that, unlike linear functions, constraint-based interventions cannot guarantee any constant fraction of the optimal utility for this family of submodular functions. Our second result is an algorithm for submodular maximization that provably outputs subsets with near-optimal utility for this family under mild assumptions and that proportionally represents items from each group. In empirical evaluation with both synthetic and real-world data, we observe that this algorithm improves the utility of the output subset for this family of submodular functions over baselines.
Title: Towards Model Robustness: Generating Contextual Counterfactuals for Entities in Relation Extraction
Authors: Mi Zhang, T. Qian, Ting Zhang, Xin Miao
DOI: https://doi.org/10.1145/3543507.3583504

The goal of relation extraction (RE) is to extract the semantic relations between or among entities in text. As a fundamental task in information systems, it is crucial to ensure the robustness of RE models. Despite the high accuracy current deep neural models have achieved in RE tasks, they are easily affected by spurious correlations. One solution to this problem is to train the model with counterfactually augmented data (CAD) so that it learns the causation rather than the confounding. However, no attempt has been made to generate counterfactuals for RE tasks. In this paper, we formulate the problem of automatically generating CAD for RE tasks from an entity-centric viewpoint and develop a novel approach to derive contextual counterfactuals for entities. Specifically, we exploit two elementary topological properties, centrality and the shortest path, in syntactic and semantic dependency graphs to first identify and then intervene on the contextual causal features for entities. We conduct a comprehensive evaluation on four RE datasets by combining our proposed approach with a variety of RE backbones. The results show that our approach not only improves the performance of the backbones but also makes them more robust in out-of-domain tests.