IEEE Transactions on Big Data: Latest Articles

A Survey of Blockchain-Based Schemes for Data Sharing and Exchange
IF 7.2 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-07-07 | DOI: 10.1109/TBDATA.2023.3293279
Rui Song;Bin Xiao;Yubo Song;Songtao Guo;Yuanyuan Yang
The immutability, transparency, and decentralization of blockchain have led to its wide use in fields such as the Internet of Things, finance, energy, and healthcare. With the advent of the Big Data era, companies and organizations urgently need data from other parties for analysis and mining in order to provide better services. As a result, data sharing and data exchange have become an enormous industry. Traditional centralized data platforms face many problems, such as privacy leakage, high transaction costs, and lack of interoperability. Introducing blockchain into this field can address these problems while providing decentralized data storage and exchange, access control, identity authentication, and copyright protection. Although many impressive blockchain-based schemes for data sharing and data exchange have been presented in recent years, the area still lacks a review and summary of this work. In this paper, we conduct a detailed survey of blockchain-based data sharing and data exchange platforms, discussing the latest technical architectures and research results in this field. In particular, we first survey current blockchain-based data sharing solutions and provide a detailed analysis of system architecture, access control, interoperability, and security. We then review blockchain-based data exchange systems and data marketplaces, discussing the trading process, monetization, copyright protection, and other related topics.
IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1477-1495.
Citations: 0
Event Extraction by Associating Event Types and Argument Roles
IF 7.2 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-07-03 | DOI: 10.1109/TBDATA.2023.3291563
Qian Li;Shu Guo;Jia Wu;Jianxin Li;Jiawei Sheng;Hao Peng;Lihong Wang
Event extraction (EE), which acquires structural event knowledge from texts, can be divided into two sub-tasks: event type classification and element extraction (namely, identifying triggers and arguments under different role patterns). Because different event types have distinct extraction schemas (i.e., role patterns), previous work on EE usually follows an isolated learning paradigm, performing element extraction independently for each event type. This ignores meaningful associations among event types and argument roles, leading to relatively poor performance on less frequent types and roles. This paper proposes a novel neural association framework for the EE task. Given a document, it first performs type classification by constructing a document-level event graph that associates sentence nodes of different types and by adopting a document-aware graph attention network to learn sentence embeddings. Element extraction is then achieved by building a new schema of argument roles, with a type-aware parameter inheritance mechanism to enhance role preference for extracted elements. As such, our model takes type and role associations into account during EE, enabling implicit information sharing among them. Experimental results show that our approach consistently outperforms most state-of-the-art EE methods on both sub-tasks, with improvements of at least 2.51% and 1.12% on the event trigger identification and argument role classification sub-tasks, respectively. In particular, for types and roles with less training data, its performance is superior to that of existing methods.
IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1549-1560.
Citations: 2
Personalized Interventions to Increase the Employment Success of People With Disability
IF 7.2 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-07-03 | DOI: 10.1109/TBDATA.2023.3291547
Ha Xuan Tran;Thuc Duy Le;Jiuyong Li;Lin Liu;Jixue Liu;Yanchang Zhao;Tony Waters
An emerging problem in Disability Employment Services (DES) is recommending to people with disability the right skill to upgrade, and the right upgrade level, to achieve the maximum improvement in their employment success. This problem requires causal reasoning: the individual causal effect of each candidate factor on the outcome must be estimated in order to determine the most effective intervention. In this paper, we propose a causal-graph-based framework to solve the intervention recommendation problem for both a survival outcome (job retention time) and a non-survival outcome (employment status). A personalized causal graph is predicted for each individual. It indicates which factors affect the outcome and what their causal effects are at different intervention levels. Based on the causal graph, we can determine the most effective intervention for an individual, i.e., the one that generates the maximum increase in the outcome. Experiments with two case studies show that our framework can help people with disability increase their employment success. Evaluations on public datasets also show the advantage of our framework in other applications.
IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1561-1574.
Citations: 0
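The final selection step described in the abstract above (pick the intervention that yields the largest estimated outcome increase) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `best_intervention`, the skill names, and the effect values are hypothetical, and the step that actually predicts a personalized causal graph and estimates the effects is not shown.

```python
from typing import Dict, Tuple

Intervention = Tuple[str, int]  # (factor to upgrade, target level); hypothetical encoding


def best_intervention(effects: Dict[Intervention, float]) -> Tuple[Intervention, float]:
    """Return the (factor, level) pair with the largest estimated outcome increase.

    `effects` maps each candidate intervention to its estimated individual
    causal effect on the outcome. How those estimates are produced is outside
    the scope of this sketch.
    """
    if not effects:
        raise ValueError("no candidate interventions")
    choice = max(effects, key=lambda k: effects[k])
    return choice, effects[choice]


# Hypothetical effect estimates for one individual (made-up numbers).
example_effects = {
    ("computer_skills", 1): 0.04,
    ("computer_skills", 2): 0.11,
    ("communication", 1): 0.07,
    ("communication", 2): 0.09,
}

choice, gain = best_intervention(example_effects)
print(choice, gain)
```

The sketch only covers the argmax; the quality of a recommendation depends entirely on the causal effect estimates fed into it.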
A Survey of Visual Affordance Recognition Based on Deep Learning
IF 7.2 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-07-03 | DOI: 10.1109/TBDATA.2023.3291558
Dongpan Chen;Dehui Kong;Jinghua Li;Shaofan Wang;Baocai Yin
Visual affordance recognition is an important research topic in robotics, human-computer interaction, and other computer vision tasks. In recent years, deep learning-based affordance recognition methods have achieved remarkable performance. However, there has so far been no unified, in-depth survey of these methods. This article therefore reviews and investigates existing deep learning-based affordance recognition methods from a comprehensive perspective, with the aim of accelerating research in this domain. Specifically, this article first classifies affordance recognition into five tasks, delves into the methodologies of each task, and explores their rationales and essential relations. Second, several representative affordance recognition datasets are investigated carefully. Third, based on these datasets, this article provides a comprehensive performance comparison and analysis of current affordance recognition methods, reporting the results of different methods on the same datasets and the results of each method on different datasets. Finally, this article summarizes the progress of affordance recognition, outlines the existing difficulties and provides corresponding solutions, and discusses future application trends.
IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1458-1476.
Citations: 2
Multivariate Time-Series Forecasting Model: Predictability Analysis and Empirical Study
IF 7.2 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-06-22 | DOI: 10.1109/TBDATA.2023.3288693
Qinpei Zhao;Guangda Yang;Kai Zhao;Jiaming Yin;Weixiong Rao;Lei Chen
Multivariate time series forecasting has wide applications, such as traffic flow prediction and supermarket commodity demand forecasting, and a large number of forecasting models have been developed. Given these models, a natural question arises: what are the theoretical limits of the forecasting accuracy that they can achieve? Recent work on urban human mobility prediction has made progress on the maximum predictability that any algorithm can achieve. However, existing approaches to the maximum predictability of multivariate time series completely ignore the interrelationships among the variables. In this article, we propose a methodology to measure the upper limit of predictability for multivariate time series with multivariate constraint relations. The key to the proposed methodology is a novel entropy, named Multivariate Constraint Sample Entropy (McSE), which incorporates the multivariate constraint relations for better predictability. We conduct a systematic evaluation over eight datasets, compare existing methods with our proposed predictability measure, and find that ours yields a higher predictability. We also find that forecasting algorithms that capture multivariate constraint relation information, such as GNNs, can achieve higher accuracy, confirming the importance of multivariate constraint relations for predictability.
IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1536-1548.
Citations: 0
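For readers unfamiliar with the entropy family that the abstract above builds on, the following is a minimal sketch of classical univariate sample entropy. It is not McSE: the multivariate constraint relations that the paper adds are not described in the abstract and are not modeled here. The parameter names `m` (template length) and `r` (tolerance as a fraction of the standard deviation) follow common usage and are assumptions of this sketch.

```python
import math
import random
from statistics import pstdev


def sample_entropy(series, m=2, r=0.2):
    """Classical sample entropy SampEn(m, r) of a 1-D series.

    Counts pairs of templates that match within a tolerance (Chebyshev
    distance) at length m and at length m + 1, then returns
    -ln(matches at m + 1 / matches at m). Self-matches are excluded.
    """
    n = len(series)
    if n < m + 2:
        raise ValueError("series too short for the chosen template length")
    tol = r * pstdev(series)

    def count_matches(length):
        # The same n - m starting positions are used for both lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(series[i + k] - series[j + k]) <= tol for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # no matches found: entropy is undefined, reported as infinite
    return -math.log(a / b)


# A strictly alternating series is perfectly regular; Gaussian noise is not.
regular = sample_entropy([0.0, 1.0] * 50)
random.seed(0)
noisy = sample_entropy([random.gauss(0.0, 1.0) for _ in range(200)])
print(regular, noisy)
```

Lower sample entropy indicates a more regular series, which is the intuition behind using entropy to bound predictability. The quadratic pair loop is fine for short series but would need a faster neighbor search for long ones.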
Towards Privacy-Aware Causal Structure Learning in Federated Setting
IF 7.2 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-06-13 | DOI: 10.1109/TBDATA.2023.3285477
Jianli Huang;Xianjie Guo;Kui Yu;Fuyuan Cao;Jiye Liang
Causal structure learning has been extensively studied and is widely used in machine learning and various applications. To achieve ideal performance, existing causal structure learning algorithms often need to centralize a large amount of data from multiple data sources. However, in the privacy-preserving setting, it is impossible to centralize data from all sources and combine them into a single dataset. To preserve data privacy, federated learning, a new learning paradigm, has attracted much attention in machine learning in recent years. In this paper, we study a privacy-aware causal structure learning problem in the federated setting and propose a novel federated PC (FedPC) algorithm with two new strategies for preserving data privacy without centralizing data. Specifically, we first propose a novel layer-wise aggregation strategy that seamlessly adapts the PC algorithm to the federated learning paradigm for federated skeleton learning. We then design an effective strategy for learning consistent separation sets for federated edge orientation. Extensive experiments validate that FedPC is effective for causal structure learning in the federated learning setting.
IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1525-1535.
Citations: 0
RGSE: Robust Graph Structure Embedding for Anomalous Link Detection
IF 7.2 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-06-08 | DOI: 10.1109/TBDATA.2023.3284270
Zhen Liu;Wenbo Zuo;Dongning Zhang;Xiaodong Feng
Anomalous links, such as noisy links or adversarial edges, exist widely in real-world networks and may undermine the credibility of network studies, e.g., community detection in social networks. Anomalous links therefore need to be removed from the polluted network by a detector. Because normal and anomalous links co-exist, identifying anomalous links in a polluted network is a challenging issue. We design a robust graph structure embedding framework, called RGSE, in which link-level feature representations generated from both a global embedding view and a local stable view are used for anomalous link detection on contaminated graphs. Comparison experiments on a variety of datasets demonstrate that the new model and its variants achieve an average improvement of up to 5.2% in anomalous link detection accuracy over traditional graph representation models. Further analyses also provide interpretable evidence to support the model's superiority.
IEEE Transactions on Big Data, vol. 9, no. 5, pp. 1420-1429.
Citations: 0
Outsourced Privacy-Preserving Data Alignment on Vertically Partitioned Database
IF 7.2 | Zone 3, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-06-08 | DOI: 10.1109/TBDATA.2023.3284271
Zhuzhu Wang;Cui Hu;Bin Xiao;Yang Liu;Teng Li;Zhuo Ma;Jianfeng Ma
In the context of real-world secure outsourced computation, private data alignment has always been an essential preprocessing step. However, current private data alignment schemes, which are mainly circuit-based, suffer from high communication overhead and often need to transfer potentially gigabytes of data. In this paper, we propose a lightweight private data alignment protocol (called SC-PSI) that can overcome the communication bottleneck. Specifically, SC-PSI involves four phases of computation: data preprocessing, data outsourcing, private set member (PSM) evaluation, and circuit computation (CC). As in prior work, the major overhead of SC-PSI lies in the latter two phases. The improvement is that SC-PSI uses the function secret sharing technique to develop the PSM protocol, which avoids the multiple rounds of communication otherwise needed to compute the members of the intersection set. Moreover, benefiting from our specially designed PSM protocol, SC-PSI does not need to execute complex secure comparison circuits in the CC phase. Experimentally, we validate that, compared to prior work, SC-PSI can save around 61.39% of the running time and 89.61% of the communication overhead.
在现实世界安全外包计算的背景下,私有数据对齐一直是必不可少的预处理步骤。然而,当前的私有数据对齐方案,主要是基于电路的,存在高通信开销,并且经常需要传输潜在的千兆字节的数据。在本文中,我们提出了一种轻量级的专用数据对齐协议(称为SC-PSI),可以克服通信瓶颈。具体来说,SC-PSI涉及四个阶段的计算,包括数据预处理、数据外包、私有集成员(PSM)评估和电路计算(CC)。与先前的工作一样,SC-PSI的主要开销主要位于后两个阶段。改进之处在于SC-PSI利用函数秘密共享技术开发了PSM协议,避免了计算交集集成员的多轮通信。此外,得益于我们专门设计的PSM协议,SC-PSI在CC阶段不执行复杂的安全比较电路。实验证明,与以前的工作相比,SC-PSI可以节省约61.39%的运行时间和89.61%的通信开销。
{"title":"Outsourced Privacy-Preserving Data Alignment on Vertically Partitioned Database","authors":"Zhuzhu Wang;Cui Hu;Bin Xiao;Yang Liu;Teng Li;Zhuo Ma;Jianfeng Ma","doi":"10.1109/TBDATA.2023.3284271","DOIUrl":"10.1109/TBDATA.2023.3284271","url":null,"abstract":"In the context of real-world secure outsourced computations, private data alignment has been always the essential preprocessing step. However, current private data alignment schemes, mainly circuit-based, suffer from high communication overhead and often need to transfer potentially gigabytes of data. In this paper, we propose a lightweight private data alignment protocol (called SC-PSI) that can overcome the bottleneck of communication. Specifically, SC-PSI involves four phases of computations, including data preprocessing, data outsourcing, private set member (PSM) evaluation and circuit computation (CC). Like prior works, the major overhead of SC-PSI mainly lies in the latter two phases. The improvement is SC-PSI utilizes the function secret sharing technique to develop the PSM protocol, which avoids the multiple rounds of communication to compute intersection set members. Moreover, benefited from our specially designed PSM protocol, SC-PSI does not to execute complex secure comparison circuits in the CC phase. 
Experimentally, we validate that compared to prior works, SC-PSI can save around 61.39% running time and 89.61% communication overhead.","PeriodicalId":13106,"journal":{"name":"IEEE Transactions on Big Data","volume":"9 5","pages":"1408-1419"},"PeriodicalIF":7.2,"publicationDate":"2023-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44411408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards an Energy Complexity Model for Distributed Data Processing Algorithms
IF 7.2 Zone 3 Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-06-08 DOI: 10.1109/TBDATA.2023.3284259
Jie Song;Xingchen Zhao;Chaopeng Guo;Yu Gu;Ge Yu
Modern data centers serve as infrastructure in the era of Big Data. Big data processing applications are the major computing workload of data centers. Electricity cost accounts for about 50% of data centers’ operational costs. Therefore, the energy consumed by running distributed data processing algorithms in a data center is starting to attract attention from both academia and industry. Most work studies energy consumption from the hardware perspective, and only a little from the algorithm perspective. A general and hardware-independent energy evaluation model for algorithms is in demand. With such a model, algorithm designers can evaluate the energy consumption, compare the energy consumption features, and facilitate the energy optimization of distributed data processing algorithms. Inspired by the time complexity model, we propose an energy complexity model that describes how an algorithm's energy consumption grows with its input size. We argue that a good algorithm, especially for processing Big Data, should have a ‘small’ energy complexity. We define $E(n)$ to represent the functional relationship that associates an algorithm's input size $n$ with its notional energy consumption $E$. Based on the well-known abstract Bulk Synchronous Parallel (BSP) computer and programming model, we present a complete $E(n)$ solution, including abstraction, generalization, quantification, derivation, comparison, analysis, examples, verification, and applications. Comprehensive experimental analysis shows that the proposed energy complexity model is practical and, interestingly, not equivalent to time complexity.
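To make the closing claim of the abstract above concrete, here is a minimal sketch of why an energy measure on a BSP machine need not track running time. The time formula is the standard BSP superstep cost (slowest local work, plus h-relation times the gap `g`, plus barrier latency `l`). The energy tally is an assumption made only for illustration and is not the paper's $E(n)$: the names `notional_energy`, `e_op`, and `e_idle` are hypothetical, and communication energy is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Superstep:
    work: List[float]  # local operations executed by each processor
    h: int             # largest number of messages any processor sends or receives


def bsp_time(steps: List[Superstep], g: float, l: float) -> float:
    """Standard BSP cost: each superstep costs max(work) + h*g + l."""
    return sum(max(s.work) + s.h * g + l for s in steps)


def notional_energy(steps: List[Superstep], e_op: float, e_idle: float) -> float:
    """Hypothetical energy tally (an assumption, not the paper's model).

    Every processor pays e_op per operation it executes, plus e_idle per
    time unit it spends waiting at the barrier for the slowest processor.
    """
    total = 0.0
    for s in steps:
        slowest = max(s.work)
        busy = e_op * sum(s.work)
        idle = e_idle * sum(slowest - w for w in s.work)
        total += busy + idle
    return total


# Two one-superstep schedules on four processors with the same critical path.
balanced = [Superstep(work=[4, 4, 4, 4], h=2)]
skewed = [Superstep(work=[4, 1, 1, 1], h=2)]

for name, steps in (("balanced", balanced), ("skewed", skewed)):
    print(name,
          bsp_time(steps, g=1.5, l=3.0),
          notional_energy(steps, e_op=1.0, e_idle=0.25))
```

Both schedules take the same BSP time because time depends only on the slowest processor, while the tally sums over all processors and so separates them. Under these assumed constants the balanced schedule costs more energy than the skewed one despite identical time.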
Citations: 0
Model-Agnostic Method: Exposing Deepfake Using Pixel-Wise Spatial and Temporal Fingerprints
IF 7.2 Zone 3 Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-06-08 DOI: 10.1109/TBDATA.2023.3284272
Jun Yang;Yaoru Sun;Maoyu Mao;Lizhi Bai;Siyu Zhang;Fang Wang
Deepfake poses a serious threat to the reliability of judicial evidence and intellectual property protection. Existing detection methods either blindly utilize deep learning or use biosignal features, but neither considers the spatial and temporal relevance of facial features. These methods are increasingly unable to keep up with the growing realism of fake videos, and they generalize poorly. In this paper, we identify a reliable fingerprint through the consistency of AR coefficients and extend the original PPG signal to 3-dimensional fingerprints to effectively detect fake content. Using these reliable fingerprints, we propose a novel model-agnostic method to expose Deepfake by analyzing faint temporal and spatial synthetic signals hidden in portrait videos. Specifically, our method extracts two types of faint information, i.e., PPG features and AR features, which are used as the basis for forensics in the temporal and spatial domains, respectively. PPG allows remote estimation of the heart rate in face videos, and irregular heart rate fluctuations expose traces of tampering. AR coefficients reflect pixel-wise correlation and the spatial traces of smoothing caused by up-sampling in the process of generating fake faces. Furthermore, we employ two ACBlock-based DenseNets as classifiers. Our method provides state-of-the-art performance on multiple deep forgery datasets and demonstrates better generalization.
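The remark in the abstract above that AR coefficients capture smoothing introduced by up-sampling can be illustrated with a toy sketch. It assumes that "AR" denotes an autoregressive model, which the abstract does not spell out, and it fits only a first-order coefficient to a single made-up pixel row by least squares. The helper names `ar1_coefficient` and `upsample_linear_2x` and the sample row are hypothetical, and none of this reproduces the paper's feature extraction.

```python
from typing import List


def ar1_coefficient(signal: List[float]) -> float:
    """Least-squares fit of a in c[t] = a * c[t-1] on the mean-centered signal."""
    mean = sum(signal) / len(signal)
    c = [v - mean for v in signal]
    num = sum(c[t] * c[t - 1] for t in range(1, len(c)))
    den = sum(v * v for v in c[:-1])
    return num / den


def upsample_linear_2x(row: List[float]) -> List[float]:
    """Insert the midpoint between each pair of neighbors (linear interpolation)."""
    out: List[float] = []
    for a, b in zip(row, row[1:]):
        out.extend([a, (a + b) / 2])
    out.append(row[-1])
    return out


# A made-up, sharply alternating row of pixel intensities.
row = [10, 2, 9, 1, 8, 3, 7, 2]
smooth = upsample_linear_2x(row)

print("original row :", ar1_coefficient(row))
print("up-sampled   :", ar1_coefficient(smooth))
```

On this row the interpolated version yields a noticeably higher coefficient than the original, because interpolation makes each pixel more predictable from its neighbor. A detector along these lines would look for that kind of shift, although the paper's actual fingerprints are pixel-wise and fed to learned classifiers.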
Citations: 0