
ACM Transactions on Knowledge Discovery from Data: Latest Articles

NOODLE: Joint Cross-View Discrepancy Discovery and High-Order Correlation Detection for Multi-View Subspace Clustering
IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-03-20 | DOI: 10.1145/3653305
Zhibin Gu, Songhe Feng, Zhendong Li, Jiazheng Yuan, Jun Liu

Benefiting from the effective exploration of valuable topological pair-wise relationships among data points across multiple views, multi-view subspace clustering (MVSC) has received increasing attention in recent years. However, we observe that existing MVSC approaches still suffer from two limitations that must be addressed to improve clustering effectiveness. First, previous MVSC approaches mainly prioritize extracting multi-view consistency, often neglecting the cross-view discrepancy that may arise from noise, outliers, and view-inherent properties. Second, existing techniques are constrained by their reliance on pair-wise sample correlation and pair-wise view correlation, and thus fail to capture the high-order correlations enclosed within multiple views. To address these issues, we propose a novel MVSC framework via joiNt crOss-view discrepancy discOvery anD high-order correLation dEtection (NOODLE), which seeks an informative target subspace representation compatible across multiple features to facilitate the downstream clustering task. Specifically, we first exploit the self-representation mechanism to learn multiple view-specific affinity matrices, which are further decomposed into cohesive factors and incongruous factors to fit the multi-view consistency and discrepancy, respectively. Additionally, an explicit cross-view sparse regularization is applied to the incoherent parts, ensuring that the consistency and discrepancy are precisely separated from the initial subspace representations. Meanwhile, the multiple cohesive parts are stacked into a three-dimensional tensor constrained by a weighted tensor nuclear norm based on the tensor Singular Value Decomposition (t-SVD), enabling effective detection of the high-order correlations implicit in multi-view data. Our proposed method outperforms state-of-the-art multi-view clustering methods on six benchmark datasets, demonstrating its effectiveness.
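The abstract does not give the exact form of the cross-view sparse regularization. The following is a minimal sketch that assumes an ℓ2,1-type penalty grouping the same entry (i, j) of every view's incongruous matrix E_v. Under that assumption, the proximal operator is group soft-thresholding, so a discrepancy survives only if it is strong jointly across views. The function name and the threshold `tau` are illustrative. This is not the authors' optimization procedure, and the t-SVD tensor nuclear norm step is not shown.

```python
import math
from typing import List

Matrix = List[List[float]]


def cross_view_group_shrink(views: List[Matrix], tau: float) -> List[Matrix]:
    """Proximal step for tau * sum_{i,j} ||(E_1[i][j], ..., E_V[i][j])||_2.

    Assumed regularizer (not confirmed by the source): each entry position (i, j)
    forms one group spanning all views. The group is scaled by max(0, 1 - tau / norm),
    so weak cross-view discrepancies are zeroed and strong ones are shrunk.
    """
    n_rows, n_cols = len(views[0]), len(views[0][0])
    out = [[[0.0] * n_cols for _ in range(n_rows)] for _ in views]
    for i in range(n_rows):
        for j in range(n_cols):
            group = [E[i][j] for E in views]
            norm = math.sqrt(sum(x * x for x in group))
            scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
            for v, x in enumerate(group):
                out[v][i][j] = scale * x
    return out


if __name__ == "__main__":
    E1 = [[3.0, 0.1], [0.0, 0.2]]
    E2 = [[4.0, 0.1], [0.0, -0.2]]
    print(cross_view_group_shrink([E1, E2], tau=1.0))
```

In the demo, the (0, 0) group has norm 5 and is scaled by 0.8. Every other group has a norm below `tau` and is zeroed.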

Citations: 0
Representative and Back-In-Time Sampling from Real-World Hypergraphs
IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-03-19 | DOI: 10.1145/3653306
Minyoung Choe, Jaemin Yoo, Geon Lee, Woonsung Baek, U Kang, Kijung Shin

Graphs are widely used to represent pairwise interactions in complex systems. Since such real-world graphs are large and often ever-growing, sampling subgraphs is useful for various purposes, including simulation, visualization, stream processing, representation learning, and crawling. However, many complex systems consist of group interactions (e.g., collaborations of researchers and discussions on online Q&A platforms) and are thus represented more naturally and accurately by hypergraphs than by ordinary graphs. Motivated by the prevalence of large-scale hypergraphs, we study the problem of sampling from real-world hypergraphs, aiming to answer (Q1) how we can measure the goodness of sub-hypergraphs, and (Q2) how we can efficiently find a “good” sub-hypergraph. Regarding Q1, we distinguish between two goals: (a) representative sampling, which aims to capture the characteristics of the input hypergraph, and (b) back-in-time sampling, which aims to closely approximate a past snapshot of the input time-evolving hypergraph. To evaluate the similarity of the sampled sub-hypergraph to the target (i.e., the input hypergraph or its past snapshot), we consider 10 graph-level, hyperedge-level, and node-level statistics. Regarding Q2, we first conduct a thorough analysis of various intuitive approaches using 11 real-world hypergraphs. Then, based on this analysis, we propose MiDaS and MiDaS-B, designed for representative sampling and back-in-time sampling, respectively. Regarding representative sampling, we demonstrate through extensive experiments that MiDaS, which employs a sampling bias towards high-degree nodes in hyperedge selection, is (a) Representative: it finds the most representative samples overall among 15 considered approaches, (b) Fast: it is several orders of magnitude faster than the strongest competitors, and (c) Automatic: it automatically tunes the degree of sampling bias. Regarding back-in-time sampling, we demonstrate that MiDaS-B inherits the strengths of MiDaS despite an additional challenge: the unavailability of the target (i.e., the past snapshot). It handles this challenge effectively by focusing on replicating universal evolutionary patterns rather than directly replicating the target.
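The abstract states only that MiDaS biases hyperedge selection towards high-degree nodes and tunes the strength of that bias automatically. The sketch below assumes, for illustration, that each hyperedge e is drawn without replacement with probability proportional to w(e)^alpha, where w(e) is the minimum degree among the nodes of e. Here `alpha` is fixed by hand, whereas MiDaS is described as tuning it automatically. The function name and the weighting are hypothetical.

```python
import heapq
import random
from collections import Counter
from typing import FrozenSet, List

Hyperedge = FrozenSet[int]


def degree_biased_sample(hyperedges: List[Hyperedge], k: int, alpha: float,
                         seed: int = 0) -> List[Hyperedge]:
    """Sample k hyperedges without replacement, favouring those made of high-degree nodes.

    Assumed weighting (illustrative only): w(e) = (min node degree in e) ** alpha.
    alpha = 0 gives uniform sampling; larger alpha strengthens the degree bias.
    Hyperedges must be non-empty.
    """
    degree = Counter(v for e in hyperedges for v in e)  # node degree = number of hyperedges containing it
    rng = random.Random(seed)
    keyed = []
    for e in hyperedges:
        weight = min(degree[v] for v in e) ** alpha
        # Efraimidis-Spirakis trick: keeping the k largest keys u ** (1 / weight)
        # yields a weighted sample without replacement.
        keyed.append((rng.random() ** (1.0 / weight), e))
    return [e for _, e in heapq.nlargest(k, keyed, key=lambda pair: pair[0])]


if __name__ == "__main__":
    H = [frozenset({0, 1}), frozenset({0, 2}), frozenset({0, 1, 2}),
         frozenset({3, 4}), frozenset({5})]
    print(degree_biased_sample(H, k=3, alpha=1.0, seed=42))
```

Comparing statistics of the sample against those of the input (for example, degree distributions) for several `alpha` values is one way to see the representativeness trade-off that the paper's automatic tuning addresses.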

Citations: 0
Semi-supervised Multi-view Clustering based on Nonnegative Matrix Factorization with Fusion Regularization
IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-03-18 | DOI: 10.1145/3653022
Guosheng Cui, Ruxin Wang, Dan Wu, Ye Li

Multi-view clustering has attracted significant attention and found wide application. Nonnegative matrix factorization (NMF) is a popular feature learning technique in pattern recognition. In recent years, many semi-supervised NMF algorithms that exploit label information have been proposed, and they have achieved outstanding performance in multi-view clustering. However, most of these methods either fail to consider discriminative information effectively or involve too many hyper-parameters. To address these issues, this paper develops a semi-supervised multi-view NMF with a novel fusion regularization (FRSMNMF). In this work, we uniformly constrain the alignment of multiple views and the discriminative information among clusters with a designed fusion regularization. Meanwhile, to align the multiple views effectively, two kinds of compensating matrices are used to normalize the feature scales of different views. Additionally, we preserve the geometric structure of both labeled and unlabeled samples by simultaneously introducing graph regularization. For the proposed methods, two effective optimization strategies based on multiplicative update rules are designed. Experiments on six real-world datasets demonstrate the effectiveness of FRSMNMF compared with several state-of-the-art unsupervised and semi-supervised approaches.
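The optimization strategies mentioned above are built on multiplicative update rules. For reference, here is a sketch of the classic Lee and Seung multiplicative updates for plain single-view, unsupervised NMF, minimizing ||X - WH||_F^2. This is not FRSMNMF itself: its fusion regularization, compensating matrices, and graph terms would add extra terms to these numerators and denominators, and those terms are not reproduced here. The helper names are illustrative.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def frob_error(X: Matrix, W: Matrix, H: Matrix) -> float:
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf_multiplicative(X: Matrix, rank: int, iters: int = 200, seed: int = 0,
                       eps: float = 1e-10) -> Tuple[Matrix, Matrix, List[float]]:
    """Factor a nonnegative X (m x n) into W (m x rank) and H (rank x n).

    H <- H * (W^T X) / (W^T W H)   and   W <- W * (X H^T) / (W H H^T), element-wise.
    Starting from positive factors keeps every entry nonnegative; eps avoids division by 0.
    Returns the factors and the squared Frobenius error after each iteration.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    history = []
    for _ in range(iters):
        Wt = transpose(W)
        num_H, den_H = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num_H, den_H)]
        Ht = transpose(H)
        num_W, den_W = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num_W, den_W)]
        history.append(frob_error(X, W, H))
    return W, H, history
```

The multiplicative form never needs a step size, and the reconstruction error is non-increasing under these updates. This is part of why regularized variants such as the one described in the abstract keep the same update style.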

Citations: 0
A Dual Perspective Framework of Knowledge-correlation for Cross-domain Recommendation
IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-03-18 | DOI: 10.1145/3652520
Yuhan Wang, Qing Xie, Mengzi Tang, Lin Li, Jingling Yuan, Yongjian Liu

Recommender systems provide users with online services in a personalized way. The performance of traditional recommender systems may deteriorate because of problems such as cold start and data sparsity. Cross-domain recommender systems use the richer information from auxiliary domains to guide the task in the target domain. However, direct knowledge transfer may have a negative impact because of data heterogeneity and feature mismatch between domains. In this paper, we explore cross-domain correlation from the perspectives of content semantics and structural connectivity to fully exploit the information in a knowledge graph. First, we adopt domain adaptation that automatically extracts transferable features to capture cross-domain semantic relations. Second, we devise a knowledge-aware graph neural network to explicitly model high-order connectivity across domains. Third, we develop feature fusion strategies that combine the advantages of semantic and structural information. Experiments that simulate the cold-start scenario on two real-world datasets show that our method outperforms state-of-the-art (SOTA) methods in both accuracy and diversity. This demonstrates that our method can accurately predict users’ expressed preferences while exploring their potential diverse interests.
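The abstract does not specify the layer design. The following sketch only makes the second and third steps concrete. It shows one hop of attention-weighted neighbour aggregation over knowledge-graph triples, followed by a sigmoid-gated fusion of a semantic vector with a structural vector. The attention score (entity vector dotted with relation-scaled neighbour vector) and the single scalar gate are simplifying assumptions. All names are hypothetical, and the domain-adaptation step is not shown.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_aware_aggregate(entity: str, emb: Dict[str, Vector],
                              rel_emb: Dict[str, Vector], triples: List[Triple]) -> Vector:
    """One hop of relation-aware attention over an entity's outgoing triples.

    Assumed score for a tail t reached via relation r: <emb[entity], rel_emb[r] * emb[t]>.
    The result averages the entity's own vector with the attended neighbourhood vector.
    """
    own = emb[entity]
    neighbours = [(rel, tail) for head, rel, tail in triples if head == entity]
    if not neighbours:
        return list(own)
    scores = [sum(e * r * t for e, r, t in zip(own, rel_emb[rel], emb[tail]))
              for rel, tail in neighbours]
    weights = softmax(scores)
    attended = [sum(w * emb[tail][d] for w, (_, tail) in zip(weights, neighbours))
                for d in range(len(own))]
    return [(a + b) / 2 for a, b in zip(own, attended)]


def gated_fusion(semantic: Vector, structural: Vector, gate_logit: float) -> Vector:
    """Blend semantic and structural vectors with a sigmoid gate (learned in a real model)."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * s + (1.0 - g) * t for s, t in zip(semantic, structural)]
```

Stacking `knowledge_aware_aggregate` several times is what lets information propagate over multiple hops, which is the "high-order connectivity" idea in the abstract. In practice, the gate would be a learned function of both inputs rather than a fixed logit.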

Citations: 0
DeepMeshCity: A Deep Learning Model for Urban Grid Prediction
IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-03-15 | DOI: 10.1145/3652859
Chi Zhang, Linhao Cai, Meng Chen, Xiucheng Li, Gao Cong

Urban grid prediction can be applied to many classic spatial-temporal prediction tasks, such as air quality prediction, crowd density prediction, and traffic flow prediction, and is of great importance to smart city construction. Given its practical value, many methods have been developed for it and have achieved promising results. Despite their successes, two main challenges remain open: (a) how to capture global dependencies well, and (b) how to effectively model multi-scale spatial-temporal correlations. To address these two challenges, we propose a novel method, DeepMeshCity, with a carefully designed Self-Attention Citywide Grid Learner (SA-CGL) block comprising a self-attention unit and a Citywide Grid Learner (CGL) unit. The self-attention unit aims to capture global spatial dependencies, and the CGL unit is responsible for learning spatial-temporal correlations. In particular, a multi-scale memory unit is proposed to traverse all stacked SA-CGL blocks along a zigzag path to capture multi-scale spatial-temporal correlations. In addition, we propose to initialize the single-scale and multi-scale memory units with the corresponding units from the previous fragment stack, so as to speed up model training. We evaluate the proposed model against several state-of-the-art methods on four real-world datasets for two urban grid prediction applications. The experimental results verify the superiority of DeepMeshCity over existing methods. The code is available at https://github.com/ILoveStudying/DeepMeshCity.
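To illustrate why a self-attention unit captures global spatial dependencies, here is a minimal sketch of scaled dot-product self-attention over a citywide grid. Every cell attends to every other cell in a single step, no matter how far apart they are. Queries, keys, and values are the raw cell features, with no learned projections. The CGL unit, the zigzag multi-scale memory, and the actual architecture released at the repository above are not reproduced. The function name is hypothetical.

```python
import math
from typing import List

Grid = List[List[List[float]]]  # height x width x channels feature map


def grid_self_attention(grid: Grid) -> Grid:
    """Scaled dot-product self-attention in which each grid cell attends to all cells.

    Uses raw features as queries, keys and values (an illustrative simplification),
    so each output cell is a convex combination of all input cells.
    """
    height, width = len(grid), len(grid[0])
    cells = [grid[i][j] for i in range(height) for j in range(width)]
    dim = len(cells[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for q in cells:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in cells]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        mixed.append([sum(w * v[d] for w, v in zip(weights, cells)) for d in range(dim)])
    # Restore the height x width layout.
    return [mixed[i * width:(i + 1) * width] for i in range(height)]
```

The cost is quadratic in the number of cells, which is one reason grid models usually pair global attention with cheaper local operators.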

Citations: 0
FulBM: Fast fully batch maintenance for landmark-based 3-hop cover labeling
IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-03-15 | DOI: 10.1145/3650035
Wentai Zhang, HaiHong E, HaoRan Luo, Mingzhi Sun

Landmark-based 3-hop cover labeling is a category of approaches for shortest distance/path queries on large-scale complex networks. It pre-computes an index offline to accelerate online distance/path queries. Most real-world graphs undergo rapid topological changes, which makes index maintenance on dynamic graphs necessary. So far, most index maintenance methods can handle only one edge update (either an addition or a deletion) at a time. To keep up with frequently changing graphs, we study the fully batch maintenance problem for 3-hop cover labeling and propose a method called FulBM. FulBM is composed of two algorithms, InsBM and DelBM, which handle batch edge insertions and deletions, respectively. This separation is motivated by the insight that batch maintenance for edge insertions is much more time-efficient, and by the fact that most edge updates in the real world are incremental. Both InsBM and DelBM are equipped with well-designed pruning strategies to minimize the number of vertex accesses. We conducted comprehensive experiments on both synthetic and real-world graphs to verify the efficiency of FulBM and its variants for weighted graphs. The results show that our methods achieve 5.5× to 228× speedups over the state-of-the-art method.
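The abstract gives no algorithmic detail for InsBM. The simplified sketch below illustrates why batched insertions are comparatively cheap. It is not a 3-hop cover labeling. It keeps exact BFS distances from a few landmarks on an unweighted, undirected graph and answers queries with the landmark upper bound min over l of d(s, l) + d(l, t). Insertions can only shorten distances, so an entire batch can be applied by seeding one relaxation queue with every endpoint that improved. The index is not repaired edge by edge. Class and method names are hypothetical, and weighted graphs are not handled.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

INF = float("inf")


class LandmarkIndex:
    """Exact distances from each landmark; supports batched edge insertions."""

    def __init__(self, edges: Iterable[Tuple[int, int]], landmarks: List[int]):
        self.adj = defaultdict(set)
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)
        self.landmarks = list(landmarks)
        self.dist = {l: self._bfs(l) for l in self.landmarks}

    def _bfs(self, src: int) -> Dict[int, int]:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in self.adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    def query_upper_bound(self, s: int, t: int) -> float:
        # Any path through a landmark is a valid path, so this never underestimates.
        return min(self.dist[l].get(s, INF) + self.dist[l].get(t, INF) for l in self.landmarks)

    def insert_batch(self, new_edges: List[Tuple[int, int]]) -> None:
        for u, v in new_edges:
            self.adj[u].add(v)
            self.adj[v].add(u)
        for l in self.landmarks:
            d = self.dist[l]
            queue = deque()
            # Seed the queue once with every endpoint whose distance the batch improves.
            for u, v in new_edges:
                for a, b in ((u, v), (v, u)):
                    if d.get(a, INF) + 1 < d.get(b, INF):
                        d[b] = d[a] + 1
                        queue.append(b)
            # Label-correcting relaxation: only vertices whose distance dropped are revisited.
            while queue:
                x = queue.popleft()
                for y in self.adj[x]:
                    if d[x] + 1 < d.get(y, INF):
                        d[y] = d[x] + 1
                        queue.append(y)
```

Deletions have no comparable shortcut, because a removed edge can invalidate distances that must then be recomputed rather than merely lowered. This is consistent with the abstract's motivation for treating insertions and deletions separately.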

Citations: 0
Node Embedding Preserving Graph Summarization
IF 3.6 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-03-08 | DOI: 10.1145/3649505
Houquan Zhou, Shenghua Liu, Huawei Shen, Xueqi Cheng

Graph summarization is a useful tool for analyzing large-scale graphs. Some works have tried to preserve, on the summary graph, the original node embeddings that encode rich structural information about nodes. However, their algorithms are designed heuristically and lack theoretical guarantees. In this paper, we theoretically study the problem of preserving node embeddings on the summary graph. We prove that three matrix-factorization-based node embedding methods on the original graph can be approximated by their counterparts on the summary graph, and, based on this analysis, we propose a novel graph summarization method named HCSumm. Extensive experiments on real-world datasets evaluate the effectiveness of the proposed method. The experimental results show that our method outperforms state-of-the-art methods in preserving node embeddings.
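The following sketch makes the notion of a summary graph concrete using one common construction. It is assumed here, because the abstract does not give HCSumm's. Nodes are grouped into supernodes, and each supernode pair stores its edge density. The original adjacency matrix is then reconstructed by spreading that density uniformly over the member pairs. A grouping that matches the graph's structure gives a low reconstruction error. HCSumm's grouping procedure and its embedding-preservation guarantees are not reproduced, and all names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def summarize(edges: Set[Edge], groups: List[List[int]]) -> Tuple[Dict[int, int], Dict[Tuple[int, int], float]]:
    """Map each node to its supernode and compute the edge density of every supernode pair.

    Assumes a simple undirected graph (no self-loops) and a partition covering all nodes.
    """
    owner = {v: g for g, members in enumerate(groups) for v in members}
    counts: Dict[Tuple[int, int], int] = {}
    for u, v in edges:
        key = tuple(sorted((owner[u], owner[v])))
        counts[key] = counts.get(key, 0) + 1
    density = {}
    for (a, b), c in counts.items():
        size_a, size_b = len(groups[a]), len(groups[b])
        possible = size_a * (size_a - 1) // 2 if a == b else size_a * size_b
        density[(a, b)] = c / possible
    return owner, density


def reconstruct(n: int, owner: Dict[int, int], density: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """Expand the summary back to an n x n adjacency estimate (uniform within each block)."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        a, b = sorted((owner[u], owner[v]))
        A[u][v] = A[v][u] = density.get((a, b), 0.0)
    return A


def reconstruction_error(n: int, edges: Set[Edge], A_hat: List[List[float]]) -> float:
    """Squared error between the true 0/1 adjacency and the reconstruction, over node pairs."""
    err = 0.0
    for u, v in combinations(range(n), 2):
        true = 1.0 if (u, v) in edges or (v, u) in edges else 0.0
        err += (true - A_hat[u][v]) ** 2
    return err
```

Matrix-factorization-based embedding methods operate on (functions of) the adjacency matrix. A block-uniform reconstruction like this one is therefore the kind of object on which embeddings of the original graph could be approximated from the smaller summary.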

Citations: 0
Adaptive Content-Aware Influence Maximization via Online Learning to Rank
IF 3.6 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-03-08 DOI: 10.1145/3651987
Konstantinos Theocharidis, Panagiotis Karras, Manolis Terrovitis, Spiros Skiadopoulos, Hady W. Lauw

How can we adapt the composition of a post over a series of rounds to make it more appealing in a social network? Techniques that progressively learn how to make a fixed post more influential over rounds have been studied in the context of the Influence Maximization (IM) problem, which seeks a set of seed users that maximizes a post's influence. However, no prior work progressively learns how a post's features affect its influence. In this paper, we propose and study the problem of Adaptive Content-Aware Influence Maximization (ACAIM), which calls for finding k features to form a post in each round so as to maximize the cumulative influence of those posts over all rounds. We solve ACAIM by applying, for the first time, an Online Learning to Rank (OLR) framework to IM. We introduce the CATRID propagation model, which expresses how posts disseminate in a social network using click probabilities and post visibility criteria, and develop a simulator that runs CATRID via a training-testing scheme based on real posts from the VK social network, so as to represent the learning environment realistically. We deploy three learners that solve ACAIM in an online (real-time) manner. We demonstrate the practical suitability of our solutions through extensive experiments on multiple brands (each serving as a separate case study) and several VK datasets; the best learner, evaluated on 45 separate case studies, yields convincing results.
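The round-by-round loop of choosing k features, observing feedback, and updating can be sketched as follows. The abstract does not specify the three learners or the CATRID model, so this is only a toy illustration under loud assumptions: each feature independently earns a Bernoulli "click" with a fixed hidden probability, a post's reward is the sum over its features, and the learner is a top-k UCB1 ranker (a standard combinatorial-bandit baseline swapped in for illustration, not one of the paper's learners). `TopKUCB` and `run` are hypothetical names.

```python
import math
import random

class TopKUCB:
    """Illustrative online learner: rank n candidate features by UCB1 and show the top k."""

    def __init__(self, n_features, k):
        self.k = k
        self.counts = [0] * n_features   # times each feature was included in a post
        self.means = [0.0] * n_features  # empirical mean reward per feature
        self.t = 0                       # round counter

    def _index(self, f):
        if self.counts[f] == 0:
            return float("inf")          # force each feature to be tried once
        bonus = math.sqrt(2 * math.log(self.t) / self.counts[f])
        return self.means[f] + bonus

    def select(self):
        """Return the k features that form this round's post."""
        self.t += 1
        ranked = sorted(range(len(self.counts)), key=self._index, reverse=True)
        return ranked[: self.k]

    def update(self, features, rewards):
        """Incremental mean update from per-feature feedback."""
        for f, r in zip(features, rewards):
            self.counts[f] += 1
            self.means[f] += (r - self.means[f]) / self.counts[f]

def run(click_probs, k, rounds, seed=0):
    """Toy stand-in for a propagation simulator: feature f earns a click w.p. click_probs[f]."""
    rng = random.Random(seed)
    learner = TopKUCB(len(click_probs), k)
    total = 0
    for _ in range(rounds):
        post = learner.select()
        rewards = [1 if rng.random() < click_probs[f] else 0 for f in post]
        learner.update(post, rewards)
        total += sum(rewards)
    return learner, total
```

For example, `run([0.9, 0.8, 0.1, 0.05, 0.2], k=2, rounds=2000)` should concentrate almost all selections on features 0 and 1. The exploration bonus is what lets the learner keep testing features whose estimated appeal is still uncertain, which is the core trade-off any ACAIM learner has to manage.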

Citations: 0
Multi-Scenario and Multi-Task Aware Feature Interaction for Recommendation System
IF 3.6 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-03-06 DOI: 10.1145/3651312
Derun Song, Enneng Yang, Guibing Guo, Li Shen, Linying Jiang, Xingwei Wang

Multi-scenario and multi-task recommendation, which uses users' diverse feedback behaviors across different scenarios to learn their preferences and make recommendations, has attracted growing attention. However, existing work ignores feature interactions, as well as the fact that the same feature interaction pair can have different levels of importance under different scenario-task pairs, which leads to sub-optimal user preference learning. In this paper, we propose a Multi-scenario and Multi-task aware Feature Interaction model, dubbed MMFI, to explicitly model feature interactions and learn the importance of feature interaction pairs in different scenarios and tasks. Specifically, MMFI first incorporates a pairwise feature interaction unit and a scenario-task interaction unit to effectively capture the interactions of feature pairs and of scenario-task pairs. MMFI then uses a scenario-task aware attention layer to learn the importance of feature interactions at granularities ranging from coarse to fine, improving the model's performance on various scenario-task pairs. This attention layer consists of three modules: a fully shared bottom module, a partially shared middle module, and a specific output module. Finally, MMFI adopts two sparsity-aware functions to filter out useless feature interactions. Extensive experiments on two public datasets demonstrate the superiority of the proposed method over existing multi-task, multi-scenario, and multi-scenario & multi-task recommendation models.
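A single step of "score every pairwise feature interaction for a given scenario-task pair, then zero out the useless ones" can be sketched in NumPy. The abstract names neither the attention formulation nor the two sparsity-aware functions, so the following are assumptions for illustration only: interactions are element-wise products of field embeddings, a single `query` vector stands in for the scenario-task representation, and sparsemax serves as one example of a sparsity-aware normalizer that can assign exactly zero weight. This is not MMFI's three-module attention layer, and `interaction_pooling` is a hypothetical name.

```python
import itertools
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of scores z onto the probability simplex."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    support = 1 + ks * z_sorted > cumsum       # contiguous prefix of True values
    k = ks[support][-1]                        # size of the support
    tau = (cumsum[k - 1] - 1) / k              # threshold
    return np.maximum(z - tau, 0.0)            # entries below tau become exactly 0

def interaction_pooling(field_emb, query):
    """Hypothetical scenario-task-aware pooling of pairwise feature interactions.

    field_emb: (m, d) embeddings of m feature fields.
    query: (d,) vector standing in for a scenario-task representation.
    """
    pairs = list(itertools.combinations(range(field_emb.shape[0]), 2))
    inter = np.stack([field_emb[i] * field_emb[j] for i, j in pairs])  # (num_pairs, d)
    weights = sparsemax(inter @ query)         # some pairs may get zero weight
    return weights @ inter, weights, pairs
```

Compared with softmax, which always gives every pair a strictly positive weight, sparsemax can drop pairs entirely; for instance `sparsemax(np.array([1.0, 2.0, 3.0]))` returns `[0, 0, 1]`. That is one concrete way to realize the idea of removing useless interactions, though the paper's own functions may differ.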

Citations: 0
SsAG: Summarization and Sparsification of Attributed Graphs
IF 3.6 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-03-06 DOI: 10.1145/3651619
Sarwan Ali, Muhammad Ahmad, Maham Anwer Beg, Imdad Ullah Khan, Safiullah Faizullah, Muhammad Asad Khan

Graph summarization has become integral to managing and analyzing large-scale graphs in diverse real-world applications, including social, biological, and communication networks. Existing graph summarization methods often fall short: they are either computationally expensive, which limits their applicability to large graphs, or they do not incorporate node attributes. In response, we introduce SsAG, an efficient and scalable lossy graph summarization method designed to preserve the essential structure of the original graph. SsAG computes a sparse representation (summary) of the input graph G and accommodates graphs with node attributes. The summary is a graph on supernodes (subsets of the vertices of G), in which weighted superedges connect pairs of supernodes. The method constructs a summary graph with k supernodes, aiming to minimize the reconstruction error (RE), i.e., the difference between the original graph and the graph reconstructed from the summary, while maximizing homogeneity with respect to the node attributes. The summary is built by iteratively merging pairs of nodes. To improve computational efficiency, we derive a closed-form expression for the RE after merging a pair, which enables a constant-time approximation of this score. We assign each supernode a weight that quantifies its contribution to the scores of pairs, and use a weighted sampling strategy to select the best pair to merge. Notably, a sample of logarithmic size suffices to produce summaries of comparable quality under various measures. Additionally, we propose a sparsification step that reduces the storage cost of the constructed summary to a specified target size with only a marginal increase in RE. Empirical evaluations on diverse real-world graphs show that SsAG exhibits superior speed, running up to 17× faster, while generating summaries of comparable quality.
This work represents a significant advancement in the field, addressing computational challenges and showcasing the effectiveness of SsAG in graph summarization.
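The merge-based construction can be sketched in a few lines under simplifying assumptions. The abstract does not give the closed-form RE update or the supernode weights used for sampling, so this sketch recomputes the RE naively (O(n²) per candidate), samples about log₂ n candidate pairs uniformly at random instead of by weight, measures RE as the L1 difference between the adjacency matrix and its reconstruction, takes superedge weights to be edge densities, and ignores node attributes and the sparsification step. `reconstruction_error` and `greedy_summary` are hypothetical names.

```python
import math
import random
import numpy as np

def reconstruction_error(A, groups):
    """L1 error between A and the graph rebuilt from supernode edge densities."""
    A_hat = np.zeros_like(A, dtype=float)
    for a in groups:
        for b in groups:
            block = A[np.ix_(a, b)]
            A_hat[np.ix_(a, b)] = block.mean()   # superedge weight = edge density
    return float(np.abs(A - A_hat).sum())

def greedy_summary(A, k, seed=0):
    """Merge supernodes until k remain, testing ~log2(n) sampled candidate pairs per step."""
    rng = random.Random(seed)
    groups = [[v] for v in range(A.shape[0])]     # start with singleton supernodes
    n_samples = max(1, math.ceil(math.log2(A.shape[0])))
    while len(groups) > k:
        all_pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        candidates = rng.sample(all_pairs, min(n_samples, len(all_pairs)))

        def merged(pair):
            i, j = pair
            rest = [g for t, g in enumerate(groups) if t not in pair]
            return rest + [groups[i] + groups[j]]

        # Keep the sampled merge that yields the smallest reconstruction error.
        best = min(candidates, key=lambda p: reconstruction_error(A, merged(p)))
        groups = merged(best)
    return groups
```

On two triangles joined by a single edge, the partition into the two triangles has RE = 104/9, while the all-singleton partition reconstructs the graph exactly (RE = 0). Replacing the naive `reconstruction_error` call with an incremental update is precisely where a closed-form expression like the one the authors derive would pay off.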

Citations: 0