ACM Transactions on Knowledge Discovery from Data: Latest Articles, Page 8

Domain Generalization in Time Series Forecasting

IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-31 DOI: 10.1145/3643035

Songgaojun Deng, Olivier Sprangers, Ming Li, Sebastian Schelter, Maarten de Rijke

Domain generalization aims to design models that can effectively generalize to unseen target domains by learning from observed source domains. Domain generalization poses a significant challenge for time series data, due to varying data distributions and temporal dependencies. Existing approaches to domain generalization are not designed for time series data, which often results in suboptimal or unstable performance when confronted with diverse temporal patterns and complex data characteristics. We propose a novel approach to tackle the problem of domain generalization in time series forecasting. We focus on a scenario where time series domains share certain common attributes and exhibit no abrupt distribution shifts. Our method revolves around the incorporation of a key regularization term into an existing time series forecasting model: domain discrepancy regularization. In this way, we aim to enforce consistent performance across different domains that exhibit distinct patterns. We calibrate the regularization term by investigating the performance within individual domains and propose the domain discrepancy regularization with domain difficulty awareness. We demonstrate the effectiveness of our method on multiple datasets, including synthetic and real-world time series datasets from diverse domains such as retail, transportation, and finance. Our method is compared against traditional methods, deep learning models, and domain generalization approaches to provide comprehensive insights into its performance. In these experiments, our method showcases superior performance, surpassing both the base model and competing domain generalization models across all datasets. Furthermore, our method is highly general and can be applied to various time series models.
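The abstract does not give the exact form of the domain discrepancy regularizer. As an illustration only, the Python sketch below assumes one plausible reading: the mean forecasting loss across source domains plus a weight `lam` times the variance of the per-domain losses, with an optional per-domain difficulty estimate that rescales each loss before the spread is measured. The function name, `lam`, and `difficulty` are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def regularized_objective(domain_losses, lam=0.5, difficulty=None):
    """Mean loss over domains plus a penalty on how unevenly the loss is spread.

    domain_losses: hypothetical mapping from domain name to that domain's mean loss.
    lam: assumed weight of the discrepancy penalty.
    difficulty: optional hypothetical mapping from domain name to a positive
        difficulty estimate; losses are divided by it before the spread is measured.
    """
    losses = [domain_losses[d] for d in domain_losses]
    if difficulty is None:
        adjusted = losses
    else:
        adjusted = [domain_losses[d] / difficulty[d] for d in domain_losses]
    # Discrepancy term: population variance of the (difficulty-adjusted) per-domain losses.
    return mean(losses) + lam * pvariance(adjusted)


# Two domains with unequal losses are penalized; equal losses would not be.
print(regularized_objective({"retail": 1.0, "traffic": 3.0}))  # prints 2.5
```

Dividing by a difficulty estimate keeps a domain that is simply harder to forecast from dominating the penalty, which is one way to read "domain difficulty awareness".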

Citations: 0

X-FSPMiner: A Novel Algorithm for Frequent Similar Pattern Mining

IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-30 DOI: 10.1145/3643820

Ansel Y. Rodríguez-González, Ramón Aranda, Miguel Á. Álvarez-Carmona, Angel Díaz-Pacheco, Rosa María Valdovinos Rosas

Frequent similar pattern mining (FSP mining) reveals frequent patterns that remain hidden to the classical approach. However, similarity functions require more computational effort, which makes more efficient FSP mining algorithms necessary. This work aims to improve the efficiency of mining all FSPs when using Boolean and non-increasing monotonic similarity functions. We propose FV-Tree, a data structure that condenses a collection of object descriptions, and X-FSPMiner, an algorithm that mines all FSPs from an FV-Tree. The experimental results show that X-FSPMiner vastly outperforms state-of-the-art algorithms for mining all FSPs with Boolean and non-increasing monotonic similarity functions.
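To make "frequency under a Boolean similarity function" concrete, the Python sketch below counts a pattern's support as the number of objects that are similar to it on every feature the pattern fixes, using an assumed tolerance-based similarity (two values match when |a - b| <= tol), and mines patterns level-wise. It is a naive baseline for illustration, not FV-Tree or X-FSPMiner; all names and the tolerance rule are hypothetical.

```python
from itertools import combinations


def similar(a, b, tol):
    # Assumed Boolean similarity on one feature: values match if within tol.
    return abs(a - b) <= tol


def support(pattern, objects, tol):
    # Number of objects similar to the pattern on every (feature, value) pair it fixes.
    return sum(all(similar(obj[f], v, tol) for f, v in pattern) for obj in objects)


def frequent_similar_patterns(objects, min_support, tol):
    n_features = len(objects[0])
    # Level 1: every (feature, value) pair observed in the data is a candidate.
    level = {((f, obj[f]),) for obj in objects for f in range(n_features)}
    level = {p for p in level if support(p, objects, tol) >= min_support}
    result = set(level)
    while level:
        candidates = set()
        for p, q in combinations(sorted(level), 2):
            merged = tuple(sorted(set(p) | set(q)))
            features = [f for f, _ in merged]
            # Join patterns that differ by one item and never fix a feature twice.
            if len(merged) == len(p) + 1 and len(set(features)) == len(features):
                candidates.add(merged)
        # Only frequent patterns are extended: adding a feature can only shrink support.
        level = {c for c in candidates if support(c, objects, tol) >= min_support}
        result |= level
    return result
```

Because each extra feature adds another condition an object must satisfy, support can only stay the same or drop as a pattern grows, so infrequent patterns never need to be extended.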

Citations: 0

Scalable and Inductive Semi-supervised Classifier with Sample Weighting Based on Graph Topology

IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-30 DOI: 10.1145/3643645

Fadi Dornaika, Zoulfikar Ibrahim, Alirezah Bosaghzadeh

Recently, graph-based semi-supervised learning (GSSL) has garnered significant interest in machine learning and pattern recognition. Although the proposed methods have made progress, several shortcomings remain. There are three main limitations. First, the graphs used in these approaches are usually predefined, regardless of the task at hand. Second, because they rely on graphs, almost all approaches cannot handle data with a very large number of unlabeled samples. Third, the imbalance of the samples' topology is very often not taken into account. In particular, processing large datasets with GSSL can strain available computational resources. In this paper, we present a scalable and inductive GSSL method. First, we extend the graph topology imbalance paradigm to large databases. Second, we use the computed weights of the labeled samples in the label-matching term of the global objective function. This leads to a unified, scalable, semi-supervised learning model that simultaneously estimates the labels of the unlabeled data, the projection from the feature space onto the label space, and the anchor graph matrix. In the proposed scheme, the labels and features of the anchors are integrated to construct the anchor graph adaptively. Experiments were performed on four large databases: NORB, RCV1, Covtype, and MNIST. They show that the proposed method outperforms existing scalable semi-supervised learning models.
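Anchor-graph methods scale by linking each of the n samples to a small set of m anchors instead of to every other sample. The Python sketch below builds the n × m matrix Z in the common way: each sample keeps Gaussian-kernel weights to its `s` nearest anchors, normalized to sum to 1. This is the standard construction, not the adaptive, label-aware construction proposed in the paper; `s` and `sigma` are assumed parameters.

```python
import math


def anchor_graph(points, anchors, s=2, sigma=1.0):
    """Return the n x m matrix Z linking each point to its s nearest anchors."""
    Z = []
    for x in points:
        # Squared Euclidean distance from this point to every anchor.
        d2 = [sum((a - b) ** 2 for a, b in zip(x, u)) for u in anchors]
        nearest = sorted(range(len(anchors)), key=lambda j: d2[j])[:s]
        # Gaussian-kernel weights on the nearest anchors only; all others stay 0.
        w = {j: math.exp(-d2[j] / (2 * sigma ** 2)) for j in nearest}
        total = sum(w.values())
        Z.append([w.get(j, 0.0) / total for j in range(len(anchors))])
    return Z
```

With m much smaller than n, the implied sample-to-sample graph Z Λ⁻¹ Zᵀ (Λ being the diagonal matrix of Z's column sums) never has to be materialized, which is what makes anchor-based GSSL tractable on large databases.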

Citations: 0

Towards Differential Privacy in Sequential Recommendation: A Noisy Graph Neural Network Approach

IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-30 DOI: 10.1145/3643821

Wentao Hu, Hui Fang

With the increasing frequency of high-profile privacy breaches on online platforms, users are becoming more concerned about their privacy. Because recommender systems are the core component through which online platforms provide personalized service, their privacy preservation has attracted great attention. As the gold standard of privacy protection, differential privacy has been widely adopted to preserve privacy in recommender systems. However, existing differentially private recommender systems consider only static and independent interactions, so they cannot be applied to sequential recommendation, where behaviors are dynamic and dependent. Meanwhile, little attention has been paid to the privacy risk of sensitive user features; most existing systems protect only user feedback. In this work, we propose a novel DIfferentially Private Sequential recommendation framework with a noisy Graph Neural Network approach (denoted as DIPSGNN) to address these limitations. To the best of our knowledge, we are the first to achieve differential privacy in sequential recommendation with dependent interactions. Specifically, DIPSGNN first leverages the piecewise mechanism to protect sensitive user features. It then adds calibrated noise to the aggregation step of the graph neural network, based on the aggregation perturbation mechanism. This noisy graph neural network can protect sequentially dependent interactions and capture user preferences simultaneously. Extensive experiments demonstrate that our method strikes a better balance between privacy and accuracy than state-of-the-art differentially private recommender systems.
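The piecewise mechanism named in the abstract is a local differential privacy mechanism for numeric values. The Python sketch below implements it as it is commonly defined: a value t in [-1, 1] is mapped to an output in [-C, C] with C = (e^(ε/2) + 1) / (e^(ε/2) - 1), and the output is an unbiased estimate of t. It assumes each sensitive feature has already been scaled to [-1, 1] and given its own budget ε; how DIPSGNN normalizes features and splits the privacy budget is not stated in the abstract.

```python
import math
import random


def piecewise_mechanism(t, epsilon, rng=random):
    """Perturb t in [-1, 1]; the output lies in [-C, C] and has expectation t."""
    if not -1.0 <= t <= 1.0:
        raise ValueError("t must be normalized to [-1, 1]")
    e = math.exp(epsilon / 2)
    C = (e + 1) / (e - 1)
    left = (C + 1) / 2 * t - (C - 1) / 2
    right = left + C - 1
    if rng.random() < e / (e + 1):
        # High-probability branch: uniform on [left, right], an interval around t.
        return rng.uniform(left, right)
    # Low-probability branch: uniform on [-C, left) U (right, C], total length C + 1.
    u = rng.uniform(0, C + 1)
    return -C + u if u < left + C else right + (u - (left + C))


rng = random.Random(0)
noisy_feature = piecewise_mechanism(0.3, epsilon=4.0, rng=rng)
```

A single output can land far from t, but averaging many perturbed reports recovers t, which is what lets a downstream model still learn from the protected features.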

Citations: 0

Prerequisite-enhanced category-aware graph neural networks for course recommendation

IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-29 DOI: 10.1145/3643644

Jianshan Sun, Suyuan Mei, Kun Yuan, Yuanchun Jiang, Jie Cao

The rapid development of Massive Open Online Course (MOOC) platforms has created an urgent need for an efficient personalized course recommender system that can help learners of all backgrounds and levels of knowledge select appropriate courses. Currently, most existing methods follow a sequential recommendation paradigm that captures the user's learning interests from their learning history, typically through recurrent or graph neural networks. However, few studies have explored how to incorporate principles of human learning at both the course and category levels to enhance course recommendations. In this paper, we aim to address this gap by introducing a novel model for course recommendation, named Prerequisite-Enhanced Category-Aware Graph Neural Network (PCGNN). Specifically, we first construct a course prerequisite graph that reflects human learning principles and pre-train on the course prerequisite relationships to obtain base embeddings for courses and categories. Then, to capture the user's complex learning patterns, we build an item graph and a category graph from the user's historical learning records: (1) the item graph reflects course-level local learning transition patterns, and (2) the category graph provides insight into the user's long-term learning interest. Correspondingly, we propose a user interest encoder that employs a gated graph neural network to learn the course-level user interest embedding, and we design a category transition pattern encoder that uses a GRU to yield the category-level user interest embedding. Finally, the two fine-grained user interest embeddings are fused to achieve precise course prediction. Extensive experiments on two real-world datasets demonstrate the effectiveness of PCGNN compared with other state-of-the-art methods.
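The item graph and category graph are both built from a learner's history. The Python sketch below assumes one common construction used by session-based GNN recommenders: a directed edge for each consecutive pair in the sequence, with each node's outgoing weights normalized to sum to 1. Applied to the course sequence it gives an item graph; mapping courses to categories first gives a category graph. The course IDs and categories are hypothetical, and PCGNN's exact construction may differ.

```python
from collections import defaultdict


def transition_graph(sequence):
    """Directed graph of consecutive transitions, out-edge weights normalized per node."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(sequence, sequence[1:]):
        counts[src][dst] += 1
    graph = {}
    for src, outs in counts.items():
        total = sum(outs.values())
        graph[src] = {dst: c / total for dst, c in outs.items()}
    return graph


history = ["c1", "c2", "c3", "c2", "c4"]  # hypothetical course IDs in learning order
category_of = {"c1": "math", "c2": "cs", "c3": "math", "c4": "cs"}

item_graph = transition_graph(history)  # course-level local transitions
category_graph = transition_graph([category_of[c] for c in history])  # category-level
```

The category sequence repeats categories across many courses, so its graph summarizes longer-range interest drift, while the item graph captures step-by-step course transitions.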

Citations: 0

Attacking Click-through Rate Predictors via Generating Realistic Fake Samples

IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-27 DOI: 10.1145/3643685

Mingxing Duan, Kenli Li, Weinan Zhang, Jiarui Qin, Bin Xiao

Constructing imperceptible (realistic) fake samples is critical in adversarial attacks. Because recommender-system samples contain diverse features (both discrete and continuous), traditional gradient-based adversarial attack methods may fail to construct realistic fake samples. Meanwhile, most recommendation models adopt click-through rate (CTR) predictors, which are usually black-box deep models that take discrete features as input. Thus, efficiently constructing realistic fake samples for black-box recommender systems remains challenging. In this paper, we propose CTRAttack, a hierarchical adversarial attack method against black-box CTR models that works by generating realistic fake samples. To better train the generation network, its embedding layer shares weights with that of the substitute model, and both a similarity loss and a classification loss are used to update the generation network. To ensure that all discrete features of the generated fake samples are real, we first use the similarity loss to keep the distribution of the generated perturbed samples sufficiently close to that of the real features; a nearest-neighbor algorithm then retrieves, from the candidate instance set, the most appropriate real feature for each non-existent discrete feature. Extensive experiments demonstrate that CTRAttack can not only effectively attack black-box recommender systems but also improve the robustness of these models while maintaining prediction accuracy.
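The final retrieval step can be pictured as snapping each perturbed embedding back onto a real feature value. The Python sketch below assumes Euclidean distance and a hypothetical table of candidate feature embeddings per field; the abstract does not specify the distance or the data layout CTRAttack uses.

```python
import math


def snap_to_real_feature(perturbed, candidates):
    """Return the ID of the candidate embedding closest (Euclidean) to `perturbed`."""
    return min(candidates, key=lambda fid: math.dist(perturbed, candidates[fid]))


def realize_sample(perturbed_fields, candidate_table):
    """Replace every perturbed field embedding with the nearest real feature ID.

    perturbed_fields: field name -> perturbed embedding (hypothetical layout).
    candidate_table: field name -> {feature ID: embedding} drawn from real instances.
    """
    return {field: snap_to_real_feature(vec, candidate_table[field])
            for field, vec in perturbed_fields.items()}


candidate_table = {
    "city": {"paris": (0.0, 1.0), "rome": (1.0, 0.0)},
    "device": {"ios": (0.5, 0.5), "android": (-0.5, -0.5)},
}
fake = realize_sample({"city": (0.9, 0.2), "device": (0.4, 0.3)}, candidate_table)
```

Because every output field is an existing feature ID, the fake sample stays in the discrete input space a black-box CTR model accepts.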

Citations: 0

FiFrauD: Unsupervised Financial Fraud Detection in Dynamic Graph Streams

IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-27 DOI: 10.1145/3641857

Samira Khodabandehlou, Alireza Hashemi Golpayegani

Given a stream of financial transactions between traders in an e-market, how can we accurately detect fraudulent traders and suspicious behaviors in real time? Despite efforts to detect these fraudsters, the field still faces serious challenges, including the ineffectiveness of existing methods in the complex, streaming environment of e-markets. As a result, quickly and accurately detecting suspected traders and behavior patterns in real-time transactions remains difficult and is still considered an open problem. To address this problem, we propose FiFrauD, an unsupervised, scalable approach that models the behavior of manipulators in a transaction stream. In this approach, real-time transactions between traders are converted into a stream of graphs, and instead of relying on supervised or semi-supervised learning, fraudulent traders are detected precisely by exploiting density signals in the graphs. Specifically, we reveal the traits of fraudulent traders in the market and propose a novel metric that captures them from three perspectives: graph topology, time, and behavior. We then search for suspicious blocks by greedily optimizing this metric. Theoretical analysis establishes upper bounds on FiFrauD's effectiveness in catching suspicious trades. Extensive experiments on five real-world datasets with both actual and synthetic labels demonstrate that FiFrauD achieves significant accuracy improvements over state-of-the-art fraud detection methods. It can also find various suspicious behavior patterns in linear running time and provide interpretable results. Furthermore, FiFrauD is resistant to the camouflage tactics used by fraudulent traders.
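Greedy search for dense blocks is often done by peeling: repeatedly remove the node that contributes least and keep the densest intermediate subgraph. The Python sketch below uses plain edge-weight density (total weight divided by node count) as a stand-in, since FiFrauD's metric, which combines topology, time, and behavior, is not given in the abstract. It assumes an undirected graph without self-loops, and its naive minimum search makes it quadratic rather than linear in the number of nodes; trader names are hypothetical.

```python
from collections import defaultdict


def densest_block(edges):
    """Greedy peeling over an undirected weighted graph given as (u, v, weight) triples.

    Density is total edge weight / number of nodes, an assumed stand-in metric.
    Returns the densest node set seen during peeling and its density.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        adj[u][v] += w
        adj[v][u] += w
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    alive = set(adj)
    total = sum(w for _, _, w in edges)
    best_set, best_density = set(alive), total / len(alive)
    while len(alive) > 1:
        # Peel the node with the smallest weighted degree (ties broken by name).
        node = min(alive, key=lambda n: (degree[n], str(n)))
        alive.remove(node)
        total -= degree[node]
        for nbr, w in adj[node].items():
            if nbr in alive:
                degree[nbr] -= w
        density = total / len(alive)
        if density > best_density:
            best_set, best_density = set(alive), density
    return best_set, best_density


# Hypothetical traders: a tightly trading ring a-d plus a sparse chain d-e-f.
trades = [("a", "b", 1.0), ("a", "c", 1.0), ("a", "d", 1.0), ("b", "c", 1.0),
          ("b", "d", 1.0), ("c", "d", 1.0), ("d", "e", 1.0), ("e", "f", 1.0)]
block, density = densest_block(trades)
```

Here peeling strips the sparse chain first and returns the ring {a, b, c, d}; a real detector would replace the density function with a domain-specific suspiciousness score.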

Citations: 0

Math Word Problem Generation via Disentangled Memory Retrieval

IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-26 DOI: 10.1145/3639569

Wei Qin, Xiaowei Wang, Zhenzhen Hu, Lei Wang, Yunshi Lan, Richang Hong

The task of math word problem (MWP) generation, which generates an MWP given an equation and relevant topic words, has increasingly attracted researchers’ attention. In this work, we introduce a simple memory retrieval module that searches for related training MWPs, which are then used to augment the generation. To retrieve more relevant training data, we also propose a disentangled memory retrieval module built on the simple memory retrieval module. To this end, we first disentangle the training MWPs into logical descriptions and scenario descriptions and then record them in their respective memory modules. We then use the given equation and topic words as queries to retrieve relevant logical descriptions and scenario descriptions from the corresponding memory modules, respectively. The retrieved results are used to complement the MWP generation process. Extensive experiments and ablation studies verify the superior performance of our method and the effectiveness of each proposed module. The code is available at https://github.com/mwp-g/MWPG-DMR.
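The disentangled retrieval step in this abstract can be illustrated with a small Python sketch. It is a hedged illustration under our own assumptions, not the authors' implementation (their code is in the repository linked in the abstract): the training MWPs are shown already split by hand into a logical and a scenario description, bag-of-words cosine similarity stands in for whatever retriever the paper actually uses, numbers in equations are masked to a placeholder `N` so that retrieval matches equation structure, and "respectively" is read as pairing the equation with the logical memory and the topic words with the scenario memory. All names (`Memory`, `equation_template`, `retrieve_context`, `TRAIN`) are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercase whitespace-separated tokens and their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def equation_template(equation: str) -> str:
    """Mask numbers so retrieval matches the equation's structure, not its values."""
    return re.sub(r"\d+(?:\.\d+)?", "N", equation)


@dataclass
class Memory:
    """One memory module (hypothetical): retrieval compares the query against stored keys."""
    keys: list
    values: list

    def retrieve(self, query: str, k: int = 1) -> list:
        q = bow(query)
        ranked = sorted(zip(self.keys, self.values),
                        key=lambda kv: cosine(q, bow(kv[0])), reverse=True)
        return [value for _, value in ranked[:k]]


# Hypothetical training MWPs, already split into a logical and a scenario description.
TRAIN = [
    {"equation": "x = 3 + 5", "topic": "apples basket",
     "logic": "has 3, then gets 5 more; how many in total?",
     "scenario": "Tom picks apples into a basket."},
    {"equation": "x = 12 / 4", "topic": "pizza friends",
     "logic": "shares 12 equally among 4; how many each?",
     "scenario": "Anna shares pizza slices with her friends."},
]

# Logical memory is keyed by equation templates; scenario memory by topic words.
logic_memory = Memory([equation_template(m["equation"]) for m in TRAIN], [m["logic"] for m in TRAIN])
scenario_memory = Memory([m["topic"] for m in TRAIN], [m["scenario"] for m in TRAIN])


def retrieve_context(equation: str, topic_words: str) -> dict:
    """Query each memory with its own input (assumed pairing: equation->logic, topics->scenario)."""
    return {
        "logic": logic_memory.retrieve(equation_template(equation))[0],
        "scenario": scenario_memory.retrieve(topic_words)[0],
    }


print(retrieve_context("x = 7 + 2", "apples garden"))
```

The generator that consumes the retrieved `logic` and `scenario` strings is deliberately left out; in the abstract's setup, the retrieved results complement the generation process rather than replace it.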

Citation count: 0

A Survey on AutoML Methods and Systems for Clustering

IF 3.6 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-26 DOI: 10.1145/3643564

Yannis Poulakis, Christos Doulkeridis, Dimosthenis Kyriazis

Automated Machine Learning (AutoML) aims to identify the best-performing machine learning algorithm, along with its input parameters, for a given data set and a specific machine learning task. This is a challenging problem, as finding the best model and tuning it for the problem at hand is both time-consuming for a data scientist and computationally expensive. In this survey, we focus on unsupervised learning and turn our attention to AutoML methods for clustering. We present a systematic review that covers many recent research works on automated clustering. Furthermore, we provide a taxonomy for classifying existing works and perform a qualitative comparison. As a result, this survey provides a comprehensive overview of the field of AutoML for clustering. Moreover, we identify open challenges for future research in this field.
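To make the AutoML-for-clustering problem concrete, here is a minimal sketch of the simplest possible search strategy: an exhaustive grid over (algorithm, number of clusters) pairs, with each configuration scored by the silhouette coefficient, an internal validity index that needs no labels. This is an illustrative baseline under our own assumptions, not a method taken from the survey: the two candidate algorithms (k-means with deterministic farthest-first seeding, and single-linkage agglomerative clustering), the search space `ks`, the toy data, and all function names are hypothetical. The code uses only the Python standard library (3.8+ for `math.dist`).

```python
import math
from itertools import product


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def single_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the closest pair of points."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(math.dist(points[a], points[b])
                               for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient in [-1, 1]; higher means tighter, better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; rank such configurations last
    total = 0.0
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton clusters contribute 0 by convention
        a = sum(same) / len(same)
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
                for c in clusters if c != labels[i])
        total += (b - a) / max(a, b)
    return total / len(points)


ALGORITHMS = {"kmeans": kmeans, "single_linkage": single_linkage}


def auto_cluster(points, ks=range(2, 6)):
    """Exhaustive search over (algorithm, k); keep the configuration with the best silhouette."""
    best = None
    for (name, algo), k in product(ALGORITHMS.items(), ks):
        labels = algo(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, name, k, labels)
    return best


# Toy data: three well-separated groups of three 2-D points.
DATA = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best_score, best_algo, best_k, best_labels = auto_cluster(DATA)
print(best_algo, best_k, round(best_score, 3))
```

On data this well separated, the search settles on three clusters. Practical AutoML systems usually replace the exhaustive loop with a more sample-efficient search, such as Bayesian optimization, and the choice of internal validity index can change which configuration wins.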

Citation count: 0

CoBjeason: Reasoning Covered Object in Image by Multi-Agent Collaboration Based on Informed Knowledge Graph

IF 3.6 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data

Pub Date : 2024-01-26 DOI: 10.1145/3643565

Huan Rong, Minfeng Qian, Tinghuai Ma, Di Jin, Victor S. Sheng

Object detection is a widely studied problem in existing works. In this paper, however, we turn to the more challenging problem of “Covered Object Reasoning”, which aims to infer the category label of a target object in a given image, particularly when the object is totally covered (or invisible). To solve this problem, we propose CoBjeason, which seizes the opportunity where visual reasoning meets the knowledge graph: “empirical cognition” of common visual contexts is incorporated as a knowledge graph to conduct reinforced multi-hop reasoning via two collaborative agents. On the one hand, the two agents stand at the covered object (or unknown entity), observe the surrounding visual cues in the given image, and gradually select entities and relations from the global gallery-level knowledge graph, which contains entity pairs that occur frequently across the entire image collection, so as to infer the main structure of the image-level knowledge graph expanded forward from the unknown entity. On the other hand, based on the reasoned image-level knowledge graph, the semantic context among entities is aggregated backward into the unknown entity to select an appropriate entity from the global gallery-level knowledge graph as the reasoning result. Moreover, the two agents collaborate with each other, ensuring that the Forward & Backward Reasoning described above moves toward the same goal: higher performance on covered object reasoning. To the best of our knowledge, this is the first work on Covered Object Reasoning with Knowledge Graphs and reinforced Multi-Agent collaboration.
In particular, our study of Covered Object Reasoning and the proposed CoBjeason model could offer novel insights into more basic Computer Vision (CV) tasks, such as Semantic Segmentation, with a better understanding of the current scene when some objects are blurred or covered; Visual Question Answering, with stronger inference in more complicated visual contexts when some objects are covered or invisible; and Image Caption Generation, with richer visual context for images containing partially visible objects. Improvements on these basic CV tasks can in turn refine more complicated tasks that involve nuanced visual interpretation, such as Autonomous Driving, where recognizing and reasoning about partially visible or covered objects is critical. According to the experimental results, CoBjeason achieves the best overall ranking performance on covered object reasoning compared with other models, while also offering a lower “exploration cost”, insensitivity to long-tail covered objects, and acceptable time complexity.
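The core data structure in this abstract, a gallery-level knowledge graph built from entity pairs that co-occur across an image collection, together with forward expansion from the visible context and backward aggregation into the covered entity, can be illustrated with a deliberately simple, non-learned stand-in. This sketch is not CoBjeason: it has no reinforcement learning, no agents, and no visual features. Raw co-occurrence counts replace the learned policies, forward expansion is limited to a fixed number of hops, and the gallery, labels, and function names (`build_gallery_graph`, `forward_expand`, `backward_select`) are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical image collection: each image is the set of object labels it contains.
GALLERY = [
    {"road", "car", "traffic light"},
    {"road", "car", "pedestrian"},
    {"road", "car"},
    {"road", "traffic light", "pedestrian"},
    {"kitchen", "cup", "table"},
    {"table", "cup", "chair"},
]


def build_gallery_graph(images):
    """Gallery-level graph: weight of (a, b) = number of images in which a and b co-occur."""
    edges = Counter()
    for labels in images:
        for a, b in combinations(sorted(labels), 2):
            edges[(a, b)] += 1
            edges[(b, a)] += 1  # store both directions for easy neighbor lookup
    return edges


def neighbors(edges, node):
    """All entities that co-occur with `node` at least once."""
    return {b for (a, b) in edges if a == node}


def forward_expand(edges, visible, hops=1):
    """Forward step (toy): collect entities reachable from the visible objects within `hops` hops."""
    reached = set(visible)
    frontier = set(visible)
    for _ in range(hops):
        frontier = {n for v in frontier for n in neighbors(edges, v)} - reached
        reached |= frontier
    return reached - set(visible)


def backward_select(edges, visible, candidates):
    """Backward step (toy): aggregate co-occurrence weight from the visible objects into each candidate."""
    scores = {c: sum(edges[(v, c)] for v in visible) for c in candidates}
    return max(scores, key=scores.get), scores


edges = build_gallery_graph(GALLERY)
visible = {"road", "traffic light"}  # objects seen around the covered region
candidates = forward_expand(edges, visible)
guess, scores = backward_select(edges, visible, candidates)
print(guess)  # → car
```

With a road and a traffic light as context, the co-occurrence counts prefer `car` (weight 4) over `pedestrian` (weight 3), and the kitchen entities are never reached by the one-hop expansion. A learned, multi-agent approach such as the one described above would replace these fixed counts with policies trained to choose which entities and relations to follow.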

Citation count: 0