
IEEE Transactions on Big Data: Latest Articles

Crime Prediction With Missing Data Via Spatiotemporal Regularized Tensor Decomposition
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-06-06 | DOI: 10.1109/TBDATA.2023.3283098
Weichao Liang;Jie Cao;Lei Chen;Youquan Wang;Jia Wu;Amin Beheshti;Jiangnan Tang
The goal of crime prediction is to forecast the number of crime incidents at each region of a city based on the historical crime data. It has attracted a great deal of attention from both academic and industrial communities due to its considerable significance in improving urban safety and reducing financial losses. Although much progress has been made in this field, most of the existing approaches assume that the historical crime data are complete, which does not hold in many real-world scenarios. Meanwhile, crime incidents are affected by multiple factors and have intricate spatial, temporal, and categorical correlations, which are not fully utilized by the current methods. In this article, we propose a novel tensor decomposition based framework, named TD-Crime, to conduct prediction directly on the incomplete crime data. Specifically, we first organize the crime data as a tensor and then apply the nonnegative CP decomposition to it, which not only provides a natural solution to the missing data problem but also captures the spatial, temporal, and categorical correlations implicitly. Moreover, we attempt to exploit the spatial and temporal correlations explicitly by directly learning from the crime data to further improve the forecasting performance. Finally, we obtain a joint optimization problem and present an efficient alternating optimization scheme to find a satisfactory solution. Extensive experiments on the real-world crime datasets show that TD-Crime can address the crime prediction task effectively under different missing data scenarios.
Published in IEEE Transactions on Big Data, vol. 9, no. 5, pp. 1392-1407.
Citations: 1
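The TD-Crime abstract above describes fitting a nonnegative CP decomposition directly to an incomplete crime tensor. A minimal sketch of that general idea follows, assuming a region x time x category tensor and a binary observation mask. The function name `masked_nonneg_cp`, the multiplicative update rule, and the synthetic data are illustrative assumptions, not the authors' algorithm; the spatial and temporal regularizers mentioned in the abstract are omitted.

```python
import numpy as np

def masked_nonneg_cp(X, mask, rank, n_iter=200, eps=1e-9, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r] on observed entries only.

    Hypothetical helper: uses weighted multiplicative updates, which keep
    the factors nonnegative. mask is 1 where X is observed and 0 elsewhere.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank)) + 0.1
    B = rng.random((J, rank)) + 0.1
    C = rng.random((K, rank)) + 0.1
    Xo = mask * X  # missing entries contribute nothing to the updates
    losses = []
    for _ in range(n_iter):
        # Update each factor in turn, recomputing the masked reconstruction.
        Xh = mask * np.einsum('ir,jr,kr->ijk', A, B, C)
        A *= np.einsum('ijk,jr,kr->ir', Xo, B, C) / (
            np.einsum('ijk,jr,kr->ir', Xh, B, C) + eps)
        Xh = mask * np.einsum('ir,jr,kr->ijk', A, B, C)
        B *= np.einsum('ijk,ir,kr->jr', Xo, A, C) / (
            np.einsum('ijk,ir,kr->jr', Xh, A, C) + eps)
        Xh = mask * np.einsum('ir,jr,kr->ijk', A, B, C)
        C *= np.einsum('ijk,ir,jr->kr', Xo, A, B) / (
            np.einsum('ijk,ir,jr->kr', Xh, A, B) + eps)
        # Squared error measured on the observed entries only.
        Xh = np.einsum('ir,jr,kr->ijk', A, B, C)
        losses.append(float(np.sum(mask * (X - Xh) ** 2)))
    return A, B, C, losses

# Synthetic example: 6 regions, 5 time slots, 4 crime categories,
# with roughly 30% of the entries hidden.
rng = np.random.default_rng(1)
true_factors = [rng.random((n, 2)) for n in (6, 5, 4)]
X = np.einsum('ir,jr,kr->ijk', *true_factors)
mask = (rng.random(X.shape) > 0.3).astype(float)
A, B, C, losses = masked_nonneg_cp(X, mask, rank=2)
# Entries with mask == 0 can be read off the reconstruction as predictions.
X_pred = np.einsum('ir,jr,kr->ijk', A, B, C)
```

Because the loss only touches observed entries, no imputation step is needed before fitting, which is the property the abstract calls a natural solution to the missing data problem.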
Searching Density-Increasing Path to Local Density Peaks for Unsupervised Anomaly Detection
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-04-17 | DOI: 10.1109/TBDATA.2023.3265509
Jiachen Zhao;Fang Deng;Jiaqi Zhu;Jie Chen
Unsupervised anomaly detection (AD) is a challenging problem in the data mining community. Clustering-based AD methods aim to group normal data points into clusters and then regard a point belonging to none of the clusters as an anomaly. However, they may suffer from the problems of unknown cluster numbers and arbitrary cluster shapes. This paper presents a novel clustering-based AD method named Density-increasing Path (DIP) to tackle these challenges. DIP searches a path for each data point. The path starts at the data point itself, passes through several points with monotonically increasing densities, and ends at a density peak. Further, DIP defines the climbing difficulty of each path by combining the distance and density increment of each step along the path, which can be regarded as the anomaly score of the path starting point. DIP can adaptively decide the number of peaks to address the challenge of unknown cluster numbers. Since DIP requires the path to pass several points rather than directly reaching the peak, it handles arbitrary cluster shapes. We also propose the ensemble DIP to improve prediction accuracy. The experimental results on four synthetic datasets and eleven real-world benchmarks demonstrate that DIP outperforms existing methods.
Published in IEEE Transactions on Big Data, vol. 9, no. 4, pp. 1198-1209.
Citations: 0
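The DIP abstract above describes a path that climbs from each point through points of increasing density to a density peak, scored by combining step distance and density increment. The sketch below illustrates that idea only. The k-nearest-neighbour density estimate, the rule for choosing the next step, and the product `distance * density increment` are all assumptions made for illustration; the abstract does not give the paper's exact definitions.

```python
import math

def knn_density(points, k):
    """Assumed density: inverse of the mean distance to the k nearest neighbours."""
    dens = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dens.append(1.0 / (sum(d[:k]) / k + 1e-12))
    return dens

def climbing_difficulty(points, k=3):
    """Anomaly score per point: cost of climbing to a local density peak."""
    dens = knn_density(points, k)
    n = len(points)
    scores = []
    for start in range(n):
        cur, score = start, 0.0
        while True:
            # The k nearest neighbours of the current point.
            neigh = sorted((j for j in range(n) if j != cur),
                           key=lambda j: math.dist(points[cur], points[j]))[:k]
            higher = [j for j in neigh if dens[j] > dens[cur]]
            if not higher:
                break  # no denser neighbour: cur is a local density peak
            nxt = min(higher, key=lambda j: math.dist(points[cur], points[j]))
            # Assumed step cost: distance times density increment.
            score += math.dist(points[cur], points[nxt]) * (dens[nxt] - dens[cur])
            cur = nxt
        scores.append(score)
    return scores

# A 3x3 grid cluster plus one far-away point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points.append((10.0, 10.0))
scores = climbing_difficulty(points, k=3)
```

Density strictly increases at every step, so each path terminates. In this toy data the far-away point needs one long step with a large density jump, so it receives the largest score.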
Self-Consistent Graph Neural Networks for Semi-Supervised Node Classification
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-04-12 | DOI: 10.1109/TBDATA.2023.3266590
Yanbei Liu;Shichuan Zhao;Xiao Wang;Lei Geng;Zhitao Xiao;Jerry Chun-Wei Lin
Graph Neural Networks (GNNs), a powerful graph representation technique based on deep learning, have attracted great research interest in recent years. Although many GNNs have achieved state-of-the-art accuracy on a set of standard benchmark datasets, they are still limited to the traditional semi-supervised framework and lack sufficient supervision information, especially for the large amount of unlabeled data. To overcome this issue, we propose a novel self-consistent graph neural network (SCGNN) framework that enriches the supervision information from two aspects: the self-consistency of unlabeled data and the label information of labeled data. First, to extract self-supervision information from the numerous unlabeled nodes, we perform graph data augmentation and leverage a self-consistent constraint to maximize the mutual information of the unlabeled nodes across different augmented graph views. The self-consistency constraint exploits the intrinsic structural attributes of the graph to extract self-supervision information from unlabeled data and improve the subsequent classification result. Second, to further extract supervision information from scarce labeled nodes, we introduce a fusion mechanism that obtains comprehensive node embeddings by fusing the node representations of two positive graph views, and we optimize the classification loss over labeled nodes to maximize the utilization of label information. We conduct comprehensive empirical studies on six public benchmark datasets for the node classification task. In terms of accuracy, SCGNN improves on the best baseline by 2.08% on average, and by 5.8% on the Disease dataset.
Published in IEEE Transactions on Big Data, vol. 9, no. 4, pp. 1186-1197.
Citations: 0
Efficient and Secure Data Sharing Scheme on Interoperable Blockchain Database
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-04-06 | DOI: 10.1109/TBDATA.2023.3265178
Kun Hao;Junchang Xin;Zhiqiong Wang;Zhongming Yao;Guoren Wang
An Interoperable Blockchain Database (IBD) enables users to execute transactions that share data stored in various blockchains maintained by different organizations or communities in a transparent manner. However, compared to traditional distributed databases, an IBD can hardly provide high-level security and scalability, owing to many factors such as system architecture, consensus protocol, and interactive pattern. Among them, the consensus protocol is the most critical factor, since the credibility of consensus nodes inside the corresponding blockchains is difficult to guarantee. Additionally, the consensus protocol directly affects the verification efficiency for given transactions in IBD. In this paper, we formally study the problem of secure data sharing in IBD. We present a scheme named Hybridchain to execute transactions for sharing data securely and efficiently. We first propose a novel concept named Interoperable Consensus Group (ICG), which organizes a set of basic consensus nodes into a group, each of which is responsible for managing at least one local blockchain. Then, we present an interoperable cross-chain consensus protocol to achieve eventual consistency of blockchain transactions. We conduct extensive experiments, and the evaluation results show that our proposed approach achieves superior performance.
Published in IEEE Transactions on Big Data, vol. 9, no. 4, pp. 1171-1185.
Citations: 1
Self-Supervised Federated Adaptation for Multi-Site Brain Disease Diagnosis
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-04-03 | DOI: 10.1109/TBDATA.2023.3264109
Qiming Yang;Qi Zhu;Mingming Wang;Wei Shao;Zheng Zhang;Daoqiang Zhang
The multi-site approach has attracted increasing attention in brain disease diagnosis because it can improve prediction performance by integrating sample information from different medical institutions. However, its training procedure requires the transmission of subjects' original images or features among sites, which may cause privacy disclosure. In this article, we propose a self-supervised federated adaptation (S2FA) framework for robust multi-site prediction, which can reduce the risk of privacy disclosure. As far as we know, it is the first work to investigate cross-site brain disease diagnosis, which trains a model on source sites and tests it on a target site, a setting that often occurs in clinical practice. First, we implement a decentralized federated optimization strategy, by which each site communicates model parameters periodically. Second, we construct an auxiliary self-supervised model for the target site by transferring knowledge from the source sites with self-paced learning. Then, a hash mapping is proposed to encode the target features, simultaneously reducing the risk of privacy information disclosure and alleviating data heterogeneity among sites. Finally, we achieve cross-site prediction by weighting the federated source model and the auxiliary target model. Experimental results on multi-site datasets show that the proposed S2FA can accurately identify brain disease. Our code is available at https://github.com/nuaayqm/S2FA.
Published in IEEE Transactions on Big Data, vol. 9, no. 5, pp. 1334-1346.
Citations: 0
Quality Inference in Federated Learning With Secure Aggregation
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-03-29 | DOI: 10.1109/TBDATA.2023.3280406
Balázs Pejó;Gergely Biczók
Federated learning algorithms are developed both for efficiency reasons and to ensure the privacy and confidentiality of personal and business data. Although no data is shared explicitly, recent studies have shown that the mechanism can still leak sensitive information. Hence, secure aggregation is used in many real-world scenarios to prevent attribution to specific participants. In this paper, we focus on the quality (i.e., the ratio of correct labels) of individual training datasets and show that such quality information can be inferred and attributed to specific participants even when secure aggregation is applied. Specifically, through a series of image recognition experiments, we infer the relative quality ordering of participants. Moreover, we apply the inferred quality information to stabilize training performance, measure the individual contribution of participants, and detect misbehavior.
Published in IEEE Transactions on Big Data, vol. 9, no. 5, pp. 1430-1437. PDF: https://ieeexplore.ieee.org/iel7/6687317/10236926/10138056.pdf
Citations: 12
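The abstract above relies on secure aggregation, where the server learns only the sum of the participants' updates. The sketch below shows one common way to get that property, pairwise additive masks that cancel in the sum. It is a generic illustration, not the paper's quality-inference method. The function names are hypothetical, updates are assumed to be integer vectors, and a single seeded generator stands in for the pairwise key agreement a real protocol would use.

```python
import random

MODULUS = 2 ** 32  # all arithmetic is done modulo this value

def masked_updates(updates, seed=0):
    """Return each participant's update with pairwise masks applied.

    For every pair (i, j) with i < j, a shared random mask is added to
    participant i's vector and subtracted from participant j's vector.
    """
    rng = random.Random(seed)  # stand-in for pairwise shared secrets
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            for d in range(dim):
                m = rng.randrange(MODULUS)
                masked[i][d] = (masked[i][d] + m) % MODULUS
                masked[j][d] = (masked[j][d] - m) % MODULUS
    return masked

def aggregate(masked):
    """Server-side sum; the pairwise masks cancel modulo MODULUS."""
    dim = len(masked[0])
    return [sum(v[d] for v in masked) % MODULUS for d in range(dim)]

updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masked = masked_updates(updates)
total = aggregate(masked)
```

The server sees only the masked vectors, yet `total` equals the element-wise sum of the original updates. The abstract's point is that this aggregate alone can still reveal information about individual participants over many rounds.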
Privacy and Efficiency of Communications in Federated Split Learning
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-03-29 | DOI: 10.1109/TBDATA.2023.3280405
Zongshun Zhang;Andrea Pinto;Valeria Turina;Flavio Esposito;Ibrahim Matta
Every day, large amounts of sensitive data are distributed across mobile phones, wearable devices, and other sensors. Traditionally, these enormous datasets have been processed on a single system, with complex models being trained to make valuable predictions. Distributed machine learning techniques such as Federated and Split Learning have recently been developed to protect user data and privacy better while ensuring high performance. Both of these distributed learning architectures have advantages and disadvantages. In this article, we examine these tradeoffs and suggest a new hybrid Federated Split Learning architecture that combines the efficiency and privacy benefits of both. Our evaluation demonstrates how our hybrid Federated Split Learning approach can lower the amount of processing power required by each client running a distributed learning system, and reduce training and inference time while keeping a similar accuracy. We also discuss the resiliency of our approach to deep learning privacy inference attacks and compare our solution to other recently proposed benchmarks.
Published in IEEE Transactions on Big Data, vol. 9, no. 5, pp. 1380-1391.
Citations: 6
Cross-Region Courier Displacement for On-Demand Delivery With Multi-Agent Reinforcement Learning
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-03-28 | DOI: 10.1109/TBDATA.2023.3262408
Shuai Wang;Shijie Hu;Baoshen Guo;Guang Wang
On-demand delivery has become a prevalent way for people to order meals and groceries online, especially during the pandemic. It is essential to dispatch massive numbers of orders to a limited pool of couriers to satisfy on-demand delivery users, especially during peak hours. Existing studies mainly focus on order dispatching within a region, and they are difficult to apply to the cross-region courier displacement problem due to (1) unique practical factors, including regional spatial-temporal demand-supply dynamics and strict delivery time constraints, and (2) the large-scale setting and high-dimensional decision space that result from the massive number of couriers in on-demand delivery. To address these challenges, we propose an efficient cross-region courier displacement framework, Courier Displacement Reinforcement Learning (CDRL), based on a centralized multi-agent actor-critic. It first designs the actor-critic network with a time-varying displacement intensity control module to capture demand-supply dynamics, and then uses a multi-agent framework with centralized training and decentralized execution to address large-scale coordination. One month of real-world order records collected from one of the biggest on-demand delivery services in the world is used to evaluate our design. Extensive results show that our method offers a 47.97% increase in balancing supply and demand and simultaneously reduces idle ride time by 24.62%.
按需配送已成为人们在网上订餐和订购食品杂货的主流,尤其是在疫情期间。向有限的快递员发送大量订单以满足按需配送用户的需求至关重要,尤其是在高峰时段。现有的研究主要集中在一个区域内的订单调度,由于(1)独特的现实因素,包括区域时空供需动态和严格的交货时间限制,这些研究很难应用于跨区域快递员位移问题,以及(2)在按需递送中给大量快递员的大规模设置和高维决策空间。为了应对这些挑战,在这项工作中,我们提出了一个有效的跨区域信使位移框架,即基于集中式多智能体行动者-批评者的信使位移强化学习(CDRL的缩写),首先设计了具有时变位移强度控制模块的actor-critic网络来捕捉供需动态,并利用集中训练和分散执行的多智能体框架来解决大规模协调问题。从世界上最大的按需配送服务之一收集的一个月的真实订单记录用于显示我们的设计性能。广泛的结果表明,我们的方法在平衡供需方面增加了47.97%,同时减少了24.62%的空转时间。
Published in IEEE Transactions on Big Data, vol. 9, no. 5, pp. 1321-1333, 2023-03-28.
Citations: 1
Application of Mathematical Optimization in Data Visualization and Visual Analytics: A Survey
IF 7.2 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-03-27 DOI: 10.1109/TBDATA.2023.3262151
Guodao Sun;Zihao Zhu;Gefei Zhang;Chaoqing Xu;Yunchao Wang;Sujia Zhu;Baofeng Chang;Ronghua Liang
Mathematical optimization is the process of determining the set of globally or locally optimal parameters in a finite or infinite search space. It has been extensively employed in computer science, engineering, operations research, and economics. Its application has also been extended to data visualization, where it can enhance data processing and structure visualization and facilitate exploration. However, existing surveys of mathematical optimization in data visualization remain inadequate. In this article, we review and classify the existing techniques for advanced mathematical optimization in the fields of data visualization and visual analytics. The classification follows a classical visualization pipeline, including data enhancement and transformation, representation and rendering, and interactive exploration and analysis. We also discuss various mathematical optimization models and their solution methods to help readers better understand the relationship among models, visualization, and application scenarios. We additionally provide an online exploration demo that enables users to find relevant articles interactively. Based on the limitations and potential trends revealed in the existing literature, we define future challenges at the intersection of mathematical optimization and data visualization.
Published in IEEE Transactions on Big Data, vol. 9, no. 4, pp. 1018-1037, 2023-03-27.
Citations: 2
A Multi-Modal Hypergraph Neural Network via Parametric Filtering and Feature Sampling
IF 7.2 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-03-22 DOI: 10.1109/TBDATA.2023.3278988
Zijian Liu;Yang Luo;Xitong Pu;Geyong Min;Chunbo Luo
In the real world, relationships between objects are often complex, involving multiple variables and modes. Hypergraph neural networks possess the capability to capture and represent such intricate relationships by deriving and inheriting their graph-based counterparts. Nevertheless, both graph and hypergraph neural networks suffer from the problem of over-smoothing when multiple graph convolution layers are stacked. To address this issue, this article introduces the Multi-modal Hypergraph Neural Network with Parametric Filtering and Feature Sampling (MHNet) to encode complex hypergraph features and mitigate over-smoothing. The proposed approach uses hypergraph structures to model high-order and multi-modal data correlations, a polynomial hypergraph filter to dynamically extract multi-scale node features through parametric polynomial fitting, and a feature sampling strategy to learn from sparse and labeled samples while avoiding overfitting. Experimental results on four hypergraph datasets and two multi-modal visual datasets demonstrate that the proposed MHNet outperforms state-of-the-art algorithms.
Published in IEEE Transactions on Big Data, vol. 9, no. 5, pp. 1365-1379, 2023-03-22.
Citations: 0