FedPS: A Privacy Protection Enhanced Personalized Search Framework
Jing Yao, Zhicheng Dou, Ji-Rong Wen. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449936
Personalized search returns more accurate results to each user by collecting the user's historical search behaviors to infer her interests and query intents. However, it brings the risk of user privacy leakage, which may greatly limit the practical application of personalized search. In this paper, we focus on the problem of privacy protection in personalized search and propose a privacy-protection-enhanced personalized search framework, denoted as FedPS. Under this framework, we keep each user's private data on her individual client and train a shared personalized ranking model over all users' decentralized data by means of federated learning. We implement two models within the framework: the first applies a personalization model with a personal module that fits the user's data distribution, alleviating the challenge of data heterogeneity in federated learning; the second introduces trustworthy proxies and group servers to address the problems of limited communication, performance bottlenecks and privacy attacks in FedPS. Experimental results verify that our proposed framework can enhance privacy protection without losing much accuracy.
Insightful Dimensionality Reduction with Very Low Rank Variable Subsets
Bruno Ordozgoiti, Sachith Pai, M. Kołczyńska. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3450067
Dimensionality reduction techniques can be employed to produce robust, cost-effective predictive models, and to enhance interpretability in exploratory data analysis. However, the models produced by many of these methods are formulated in terms of abstract factors or are too high-dimensional to facilitate insight and fit within low computational budgets. In this paper we explore an alternative approach to interpretable dimensionality reduction. Given a data matrix, we study the following question: are there subsets of variables that can be primarily explained by a single factor? We formulate this challenge as the problem of finding submatrices close to rank one. Despite its potential, this topic has not been sufficiently addressed in the literature, and there exist virtually no algorithms for this purpose that are simultaneously effective, efficient and scalable. We formalize the task as two problems which we characterize in terms of computational complexity, and propose efficient, scalable algorithms with approximation guarantees. Our experiments demonstrate how our approach can produce insightful findings in data, and show our algorithms to be superior to strong baselines.
Target-adaptive Graph for Cross-target Stance Detection
Bin Liang, Yonghao Fu, Lin Gui, Min Yang, Jiachen Du, Yulan He, Ruifeng Xu. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449790
The target plays an essential role in stance detection of an opinionated review or claim, since the stance expressed in the text often depends on it. In practice, we need to deal with targets unseen in the annotated training data; detecting stance towards an unknown or unseen target is therefore an important research problem. This paper presents a novel approach that automatically identifies and adapts the target-dependent and target-independent roles that a word plays with respect to a specific target in stance expressions, so as to achieve cross-target stance detection. More concretely, we construct heterogeneous target-adaptive pragmatics dependency graphs (TPDG) for each sentence towards a given target. An in-target graph captures the inherent pragmatics dependencies of words for a distinct target, while a cross-target graph models the versatility of words across all targets, boosting the learning of dominant word-level stance expressions that transfer to an unknown target. A graph-aware model with interactive Graph Convolutional Network (GCN) blocks derives the target-adaptive graph representation of the context for stance detection. Experimental results on a number of benchmark datasets show that our proposed model outperforms state-of-the-art methods in cross-target stance detection.
On the Feasibility of Automated Built-in Function Modeling for PHP Symbolic Execution
Penghui Li, W. Meng, Kangjie Lu, Changhua Luo. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3450002
Symbolic execution has been widely applied to detecting vulnerabilities in web applications, and modeling language-specific built-in functions is essential for it. Since built-in functions tend to be complicated and are typically implemented in low-level languages, a common strategy is to manually translate them into the SMT-LIB language for constraint solving. Such translation requires an excessive amount of human effort and a deep understanding of the function behaviors, and incorrect translation can invalidate the final results. The problem is aggravated in PHP applications because of their cross-language nature: the built-in functions are written in C, but the rest of the code is in PHP. In this paper, we explore the feasibility of automating the process of modeling PHP built-in functions for symbolic execution. We synthesize C programs by transforming the constraint solving task in PHP symbolic execution into a C-compliant format and integrating it with the C implementations of the built-in functions. We then apply symbolic execution to the synthesized C program to find a feasible path, which yields a solution to the original PHP constraints. In this way, we automate the modeling of built-in functions in PHP applications. We thoroughly compare our automated method with the state-of-the-art manual modeling tool. The evaluation results demonstrate that our automated method is more accurate, achieves higher function coverage, and can exploit a similar number of vulnerabilities. Our empirical analysis also shows that the manual and automated methods have complementary strengths, so the best practice is to combine them to optimize the accuracy, correctness, and coverage of symbolic execution.
Progressive, Holistic Geospatial Interlinking
G. Papadakis, G. Mandilaras, N. Mamoulis, Manolis Koubarakis. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449850
Geospatial data constitute a considerable part of Semantic Web data, but at the moment their sources are inadequately interlinked with topological relations in the Linked Open Data cloud. Geospatial Interlinking covers this gap with batch techniques that are restricted to individual topological relations, even though most operations are common to all main relations. In this work, we introduce a batch algorithm that simultaneously computes all topological relations, and we define the task of Progressive Geospatial Interlinking, which produces results in a pay-as-you-go manner when the available computational or temporal resources are limited. We propose two progressive algorithms and conduct a thorough experimental study over large, real datasets, demonstrating the superiority of our techniques over the current state of the art.
Wiki2Prop: A Multimodal Approach for Predicting Wikidata Properties from Wikipedia
Michael Luggen, J. Audiffren, D. Difallah, P. Cudré-Mauroux. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3450082
Wikidata is rapidly emerging as a key resource for a multitude of online tasks such as Speech Recognition, Entity Linking, Question Answering, and Semantic Search. The value of Wikidata is directly linked to the rich information associated with each entity, that is, the properties describing each entity as well as its relationships to other entities. Despite the tremendous manual and automatic efforts the community has invested in the Wikidata project, the growing number of entities (now more than 100 million) presents multiple challenges in terms of knowledge gaps in the graph that are hard to track. To help guide the community in filling these gaps, we propose to identify and rank the properties that an entity might be missing. In this work, we focus on entities with a dedicated Wikipedia page in any language, so that predictions can be made directly from textual content. We show that this problem can be formulated as a multi-label classification problem in which every property defined in Wikidata is a potential label. Our main contribution, Wiki2Prop, solves this problem with a multimodal deep learning method that predicts which properties should be attached to a given entity from its Wikipedia page embeddings. Moreover, Wiki2Prop can incorporate additional features in the form of multilingual embeddings and multimodal data such as images whenever available. We empirically evaluate our approach against the state of the art and show that Wiki2Prop significantly outperforms its competitors on the task of property prediction in Wikidata, and that the use of multilingual and multimodal data improves the results further. Finally, we make Wiki2Prop available as a property recommender system that can be activated and used directly in the context of a Wikidata entity page.
Demystifying Illegal Mobile Gambling Apps
Yuhao Gao, Haoyu Wang, Li Li, Xiapu Luo, Guoai Xu, Xuanzhe Liu. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449932
Mobile gambling apps, a new type of online gambling service emerging in the mobile era, have become one of the most popular and lucrative underground businesses in the mobile app ecosystem. Since their inception, mobile gambling apps have been subject to strict regulation by both government authorities and app markets. However, to the best of our knowledge, mobile gambling apps have not yet been investigated by our research community. In this paper, we take the first step to fill this void. Specifically, we first perform a 5-month data collection process to harvest illegal gambling apps in China, where mobile gambling apps are outlawed, collecting 3,366 unique gambling apps in 5,344 different versions. We then characterize the gambling apps from various perspectives, including app distribution channels, network infrastructure, malicious behaviors, and abused third-party and payment services. Our work reveals a number of covert distribution channels, the unique characteristics of gambling apps, and the abuse of fourth-party payment services. Finally, we propose a "guilt-by-association" expansion method to identify new suspicious gambling services, which helps us further identify over 140K suspicious gambling domains and over 57K gambling app candidates. Our study demonstrates the urgency of detecting and regulating illegal gambling apps.
Computing Views of OWL Ontologies for the Semantic Web
Jiaqi Li, Xuan Wu, Chang Lu, Wenxing Deng, Yizheng Zhao. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449881
This paper tackles the problem of computing views of OWL ontologies using a forgetting-based approach. In traditional relational databases, a view is a subset of a database, whereas in ontologies a view is more than a subset: it contains not only axioms contained in the original ontology, but may also contain newly derived axioms entailed by (i.e., implicitly contained in) the original ontology. Specifically, given an ontology O, the signature of O is the set of all the names in O, and a view of O is a new ontology obtained from O using only part of O's signature, namely the target signature, while preserving all logical entailments up to the target signature. Computing views of OWL ontologies is useful for Semantic Web applications such as ontology-based query answering, where the view can be used as a substitute for the original ontology to answer queries formulated in the target signature, and information hiding, in the sense that it restricts users from viewing certain information in an ontology. Forgetting is a form of non-standard reasoning concerned with eliminating from an ontology a subset of its signature, namely the forgetting signature, in such a way that all logical entailments are preserved up to the remaining names. Forgetting can thus be used as a means of computing views of OWL ontologies: the solution of forgetting a set of names from an ontology O is the view of O for the target signature consisting of the remaining names. In this paper, we present a forgetting-based method for computing views of OWL ontologies specified in the description logic ALCHOI, the basic ALC extended with role hierarchies, nominals and inverse roles. The method is terminating and sound. Although the method is not complete, an evaluation with a prototype implementation on a corpus of real-world ontologies has shown very good success rates. This is very useful from the perspective of the Semantic Web, as it provides knowledge engineers with a powerful tool for creating views of OWL ontologies.
Outlier-Resilient Web Service QoS Prediction
Fanghua Ye, Zhiwei Lin, Chuan Chen, Zibin Zheng, Hong Huang. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449938
The proliferation of Web services makes it difficult for users to select the most appropriate one among numerous functionally identical or similar candidates. Quality-of-Service (QoS) describes the non-functional characteristics of Web services and has become the key differentiator for service selection. However, users cannot invoke all Web services to obtain the corresponding QoS values, owing to the high time cost and huge resource overhead, so it is essential to predict unknown QoS values. Although various QoS prediction methods have been proposed, few take outliers into consideration, which can dramatically degrade prediction performance. To overcome this limitation, we propose an outlier-resilient QoS prediction method. Our method uses the Cauchy loss to measure the discrepancy between observed and predicted QoS values; owing to the robustness of the Cauchy loss, it is resilient to outliers. We further extend the method to provide time-aware QoS predictions by taking temporal information into consideration. Finally, we conduct extensive experiments on both static and dynamic datasets. The results demonstrate that our method achieves better performance than state-of-the-art baseline methods.
Graph Structure Estimation Neural Networks
Ruijia Wang, Shuai Mou, Xiao Wang, Wanpeng Xiao, Qi Ju, C. Shi, Xing Xie. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449952
Graph Neural Networks (GNNs) have drawn considerable attention in recent years and achieved outstanding performance on many tasks. Most empirical studies of GNNs assume that the observed graph represents a complete and accurate picture of the relationships between nodes. However, this fundamental assumption cannot always be satisfied, since real-world graphs from complex systems are error-prone and may not be compatible with the properties of GNNs. Therefore, GNNs that rely solely on the original graph may produce unsatisfactory results; one typical example is that GNNs perform well on graphs with homophily but fail in disassortative situations. In this paper, we propose graph estimation neural networks (GEN), which estimate a graph structure for GNNs. Specifically, GEN comprises a structure model that fits the mechanism of GNNs by generating graphs with community structure, and an observation model that injects multifaceted observations into the calculation of the posterior distribution of graphs and is the first to incorporate multi-order neighborhood information. With these two models, the graph is estimated via Bayesian inference to maximize the posterior probability, optimized jointly with the GNN parameters in an iterative framework. To comprehensively evaluate GEN, we perform experiments on several benchmark datasets with different degrees of homophily as well as a synthetic dataset; the results demonstrate the effectiveness of GEN and the rationality of the estimated graphs.