FedPS: A Privacy Protection Enhanced Personalized Search Framework
Jing Yao, Zhicheng Dou, Ji-Rong Wen. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449936
Personalized search returns more accurate results to each user by collecting the user's historical search behaviors to infer her interests and query intents. However, it brings the risk of user privacy leakage, which may greatly limit the practical application of personalized search. In this paper, we focus on the problem of privacy protection in personalized search and propose a privacy-protection-enhanced personalized search framework, denoted as FedPS. Under this framework, we keep each user's private data on her individual client and train a shared personalized ranking model over all users' decentralized data by means of federated learning. We implement two models within the framework: the first applies a personalization model with a personal module that fits the user's data distribution, alleviating the challenge of data heterogeneity in federated learning; the second introduces trustworthy proxies and group servers to address the problems of limited communication, performance bottlenecks and privacy attacks in FedPS. Experimental results verify that our proposed framework can enhance privacy protection without losing much accuracy.
Insightful Dimensionality Reduction with Very Low Rank Variable Subsets
Bruno Ordozgoiti, Sachith Pai, M. Kołczyńska. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3450067
Dimensionality reduction techniques can be employed to produce robust, cost-effective predictive models, and to enhance interpretability in exploratory data analysis. However, the models produced by many of these methods are formulated in terms of abstract factors or are too high-dimensional to facilitate insight and fit within low computational budgets. In this paper we explore an alternative approach to interpretable dimensionality reduction. Given a data matrix, we study the following question: are there subsets of variables that can be primarily explained by a single factor? We formulate this challenge as the problem of finding submatrices close to rank one. Despite its potential, this topic has not been sufficiently addressed in the literature, and there exist virtually no algorithms for this purpose that are simultaneously effective, efficient and scalable. We formalize the task as two problems which we characterize in terms of computational complexity, and propose efficient, scalable algorithms with approximation guarantees. Our experiments demonstrate how our approach can produce insightful findings in data, and show our algorithms to be superior to strong baselines.
Target-adaptive Graph for Cross-target Stance Detection
Bin Liang, Yonghao Fu, Lin Gui, Min Yang, Jiachen Du, Yulan He, Ruifeng Xu. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449790
The target plays an essential role in stance detection of an opinionated review or claim, since the stance expressed in the text often depends on it. In practice, we need to deal with targets unseen in the annotated training data; detecting stance towards an unknown or unseen target is therefore an important research problem. This paper presents a novel approach that automatically identifies and adapts the target-dependent and target-independent roles that a word plays with respect to a specific target in stance expressions, so as to achieve cross-target stance detection. More concretely, we construct heterogeneous target-adaptive pragmatics dependency graphs (TPDG) for each sentence towards a given target. An in-target graph captures the inherent pragmatics dependencies of words for a distinct target, while a cross-target graph models the versatility of words across all targets, boosting the learning of dominant word-level stance expressions that transfer to an unknown target. A graph-aware model with interactive Graph Convolutional Network (GCN) blocks derives the target-adaptive graph representation of the context for stance detection. Experimental results on a number of benchmark datasets show that our proposed model outperforms state-of-the-art methods in cross-target stance detection.
On the Feasibility of Automated Built-in Function Modeling for PHP Symbolic Execution
Penghui Li, W. Meng, Kangjie Lu, Changhua Luo. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3450002
Symbolic execution has been widely applied to detecting vulnerabilities in web applications, and modeling language-specific built-in functions is essential for it. Since built-in functions tend to be complicated and are typically implemented in low-level languages, a common strategy is to manually translate them into the SMT-LIB language for constraint solving. Such translation requires an excessive amount of human effort and a deep understanding of the function behaviors, and incorrect translation can invalidate the final results. The problem is aggravated in PHP applications because of their cross-language nature: the built-in functions are written in C, but the rest of the code is in PHP. In this paper, we explore the feasibility of automating the process of modeling PHP built-in functions for symbolic execution. We synthesize C programs by transforming the constraint solving task in PHP symbolic execution into a C-compliant format and integrating it with the C implementations of the built-in functions. We then apply symbolic execution to the synthesized C program to find a feasible path, which yields a solution to the original PHP constraints. In this way, we automate the modeling of built-in functions in PHP applications. We thoroughly compare our automated method with the state-of-the-art manual modeling tool. The evaluation results demonstrate that our automated method is more accurate, achieves higher function coverage, and can exploit a similar number of vulnerabilities. Our empirical analysis also shows that the manual and automated methods have complementary strengths, so the best practice is to combine them to optimize the accuracy, correctness, and coverage of symbolic execution.
Progressive, Holistic Geospatial Interlinking
G. Papadakis, G. Mandilaras, N. Mamoulis, Manolis Koubarakis. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449850
Geospatial data constitute a considerable part of Semantic Web data, but at the moment their sources are inadequately interlinked with topological relations in the Linked Open Data cloud. Geospatial Interlinking covers this gap with batch techniques that are restricted to individual topological relations, even though most operations are common to all main relations. In this work, we introduce a batch algorithm that simultaneously computes all topological relations, and we define the task of Progressive Geospatial Interlinking, which produces results in a pay-as-you-go manner when the available computational or temporal resources are limited. We propose two progressive algorithms and conduct a thorough experimental study over large, real datasets, demonstrating the superiority of our techniques over the current state of the art.
Wiki2Prop: A Multimodal Approach for Predicting Wikidata Properties from Wikipedia
Michael Luggen, J. Audiffren, D. Difallah, P. Cudré-Mauroux. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3450082
Wikidata is rapidly emerging as a key resource for a multitude of online tasks such as Speech Recognition, Entity Linking, Question Answering, and Semantic Search. The value of Wikidata is directly linked to the rich information associated with each entity, that is, the properties describing each entity as well as its relationships to other entities. Despite the tremendous manual and automatic efforts the community has invested in the Wikidata project, the growing number of entities (now more than 100 million) presents multiple challenges in terms of knowledge gaps in the graph that are hard to track. To help guide the community in filling these gaps, we propose to identify and rank the properties that an entity might be missing. In this work, we focus on entities with a dedicated Wikipedia page in any language, so that predictions can be made directly from textual content. We show that this problem can be formulated as a multi-label classification problem in which every property defined in Wikidata is a potential label. Our main contribution, Wiki2Prop, solves this problem with a multimodal deep learning method that predicts which properties should be attached to a given entity from its Wikipedia page embeddings. Moreover, Wiki2Prop can incorporate additional features in the form of multilingual embeddings and multimodal data such as images whenever available. We empirically evaluate our approach against the state of the art and show that Wiki2Prop significantly outperforms its competitors on the task of property prediction in Wikidata, and that the use of multilingual and multimodal data improves the results further. Finally, we make Wiki2Prop available as a property recommender system that can be activated and used directly in the context of a Wikidata entity page.
Demystifying Illegal Mobile Gambling Apps
Yuhao Gao, Haoyu Wang, Li Li, Xiapu Luo, Guoai Xu, Xuanzhe Liu. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449932
Mobile gambling apps, a new type of online gambling service emerging in the mobile era, have become one of the most popular and lucrative underground businesses in the mobile app ecosystem. Since their inception, mobile gambling apps have been subject to strict regulation by both government authorities and app markets. However, to the best of our knowledge, mobile gambling apps have not yet been investigated by our research community. In this paper, we take the first step to fill this void. Specifically, we first perform a 5-month data collection process to harvest illegal gambling apps in China, where mobile gambling apps are outlawed, collecting 3,366 unique gambling apps in 5,344 different versions. We then characterize the gambling apps from various perspectives, including app distribution channels, network infrastructure, malicious behaviors, and abused third-party and payment services. Our work reveals a number of covert distribution channels, the unique characteristics of gambling apps, and the abuse of fourth-party payment services. Finally, we propose a "guilt-by-association" expansion method to identify new suspicious gambling services, which helps us further identify over 140K suspicious gambling domains and over 57K gambling app candidates. Our study demonstrates the urgency of detecting and regulating illegal gambling apps.
Computing Views of OWL Ontologies for the Semantic Web
Jiaqi Li, Xuan Wu, Chang Lu, Wenxing Deng, Yizheng Zhao. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449881
This paper tackles the problem of computing views of OWL ontologies using a forgetting-based approach. In traditional relational databases, a view is a subset of a database, whereas in ontologies a view is more than a subset: it contains not only axioms contained in the original ontology, but may also contain newly derived axioms entailed by (i.e., implicitly contained in) the original ontology. Specifically, given an ontology O, the signature of O is the set of all the names in O, and a view of O is a new ontology obtained from O using only part of O's signature, namely the target signature, while preserving all logical entailments up to the target signature. Computing views of OWL ontologies is useful for Semantic Web applications such as ontology-based query answering, where the view can be used as a substitute for the original ontology to answer queries formulated in the target signature, and information hiding, in the sense that it restricts users from viewing certain information in an ontology. Forgetting is a form of non-standard reasoning concerned with eliminating from an ontology a subset of its signature, namely the forgetting signature, in such a way that all logical entailments are preserved up to the remaining names. Forgetting can thus be used as a means of computing views of OWL ontologies: the solution of forgetting a set of names from an ontology O is the view of O for the target signature consisting of the remaining names. In this paper, we present a forgetting-based method for computing views of OWL ontologies specified in the description logic ALCHOI, the basic ALC extended with role hierarchies, nominals and inverse roles. The method is terminating and sound. Although the method is not complete, an evaluation with a prototype implementation on a corpus of real-world ontologies has shown very good success rates. This is very useful from the perspective of the Semantic Web, as it provides knowledge engineers with a powerful tool for creating views of OWL ontologies.
Outlier-Resilient Web Service QoS Prediction
Fanghua Ye, Zhiwei Lin, Chuan Chen, Zibin Zheng, Hong Huang. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449938
The proliferation of Web services makes it difficult for users to select the most appropriate one among numerous functionally identical or similar candidates. Quality-of-Service (QoS) describes the non-functional characteristics of Web services and has become the key differentiator for service selection. However, users cannot invoke all Web services to obtain the corresponding QoS values, owing to the high time cost and huge resource overhead, so it is essential to predict unknown QoS values. Although various QoS prediction methods have been proposed, few take outliers into consideration, which can dramatically degrade prediction performance. To overcome this limitation, we propose an outlier-resilient QoS prediction method. Our method uses the Cauchy loss to measure the discrepancy between observed and predicted QoS values; owing to the robustness of the Cauchy loss, it is resilient to outliers. We further extend the method to provide time-aware QoS predictions by taking temporal information into consideration. Finally, we conduct extensive experiments on both static and dynamic datasets. The results demonstrate that our method achieves better performance than state-of-the-art baseline methods.
Graph Structure Estimation Neural Networks
Ruijia Wang, Shuai Mou, Xiao Wang, Wanpeng Xiao, Qi Ju, C. Shi, Xing Xie. Proceedings of the Web Conference 2021. https://doi.org/10.1145/3442381.3449952
Graph Neural Networks (GNNs) have drawn considerable attention in recent years and achieved outstanding performance on many tasks. Most empirical studies of GNNs assume that the observed graph represents a complete and accurate picture of the relationships between nodes. However, this fundamental assumption cannot always be satisfied, since real-world graphs from complex systems are error-prone and may not be compatible with the properties of GNNs. Therefore, GNNs that rely solely on the original graph may produce unsatisfactory results; one typical example is that GNNs perform well on graphs with homophily but fail in disassortative situations. In this paper, we propose graph estimation neural networks (GEN), which estimate a graph structure for GNNs. Specifically, GEN comprises a structure model that fits the mechanism of GNNs by generating graphs with community structure, and an observation model that injects multifaceted observations into the calculation of the posterior distribution of graphs and is the first to incorporate multi-order neighborhood information. With these two models, the graph is estimated via Bayesian inference to maximize the posterior probability, optimized jointly with the GNN parameters in an iterative framework. To comprehensively evaluate GEN, we perform experiments on several benchmark datasets with different degrees of homophily as well as a synthetic dataset; the results demonstrate the effectiveness of GEN and the rationality of the estimated graphs.