Pub Date: 2024-03-20 | DOI: 10.1109/TBDATA.2024.3403375
Jeongsu Park;Dong Hoon Lee
As a Big Data analysis technique, hierarchical clustering is helpful for summarizing data, since it returns both the clusters of the data and their clustering history. Cloud computing is the most suitable option for efficiently performing hierarchical clustering over large volumes of data. However, since a compromised cloud service provider can cause serious privacy problems by revealing data, these problems must be solved before an external cloud computing service is used. No privacy-preserving hierarchical clustering protocol for an outsourced computing environment has been proposed in existing work, and existing protocols either limit the number of participating data owners or disclose information about the data. In this article, we propose a parallelly running and privacy-preserving agglomerative hierarchical clustering (ppAHC) over the union of the datasets of multiple data owners in an outsourced computing environment, which, to the best of our knowledge, is the first such protocol. The proposed ppAHC discloses no information about its input and output, including the data access patterns. It is highly efficient and suitable for Big Data analysis over numerous data, since its cost for one round is independent of the amount of data, and it allows data owners without sufficient computing capability to participate in collaborative hierarchical clustering.
Title: Parallelly Running and Privacy-Preserving Agglomerative Hierarchical Clustering in Outsourced Cloud Computing Environments
IEEE Transactions on Big Data, vol. 11, no. 1, pp. 174-189
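The abstract does not describe the protocol's cryptographic machinery, but the underlying (non-private) computation is standard agglomerative hierarchical clustering. Below is a minimal plaintext sketch of that baseline, assuming single linkage and Euclidean distance; the paper's actual linkage choice and secure operations are not reproduced here.

```python
import numpy as np

def agglomerative_cluster(points, target_k=1):
    """Merge the closest pair of clusters until target_k clusters remain.
    Returns the final clusters and the merge history (the 'clustering
    history' the abstract mentions)."""
    clusters = {i: [i] for i in range(len(points))}
    history = []
    while len(clusters) > target_k:
        best = None
        for a in clusters:
            for b in clusters:
                if a >= b:
                    continue
                # single-linkage distance between clusters a and b
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a].extend(clusters.pop(b))   # merge b into a
        history.append((a, b, d))
    return clusters, history

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
clusters, history = agglomerative_cluster(pts, target_k=2)
```

Each round of this loop scans all cluster pairs, which is exactly the per-round cost the protocol claims to make independent of the data volume through outsourcing.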
Graph Neural Networks (GNNs) are powerful tools for graph representation learning, but they face challenges when applied to large-scale graphs due to substantial computational costs and memory requirements. To address these scalability limitations, various methods have been proposed, including sampling-based and decoupling-based methods. However, both have limitations: sampling-based methods inevitably discard some link information during sampling, while decoupling-based methods require alterations to the model's structure, reducing their adaptability to various GNNs. This paper proposes a novel graph pooling method, Graph Partial Pooling (GPPool), for scaling GNNs to large-scale graphs. GPPool is a versatile and straightforward technique that enhances training efficiency while simultaneously reducing memory requirements. GPPool constructs small-scale pooled graphs by pooling a subset of nodes into supernodes. Each pooled graph consists of supernodes and unpooled nodes, preserving valuable local and global information. Training GNNs on these graphs reduces memory demands and enhances their performance. Additionally, this paper provides a theoretical analysis of training GNNs on GPPool-constructed graphs from a graph diffusion perspective, showing that training on a large-scale graph can be transferred to the pooled graphs with minimal approximation error. A series of experiments on datasets of varying scales demonstrates the effectiveness of GPPool.
Title: Training Large-Scale Graph Neural Networks via Graph Partial Pooling
Authors: Qi Zhang;Yanfeng Sun;Shaofan Wang;Junbin Gao;Yongli Hu;Baocai Yin
Pub Date: 2024-03-20 | DOI: 10.1109/TBDATA.2024.3403380
IEEE Transactions on Big Data, vol. 11, no. 1, pp. 221-233
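The core operation described above, pooling some nodes into supernodes while leaving the rest untouched, can be sketched with the standard assignment-matrix coarsening rule. This is a hypothetical minimal illustration: GPPool's actual node-selection strategy and diffusion-based analysis are not reproduced, and `partial_pool` is an assumed helper name.

```python
import numpy as np

def partial_pool(adj, groups):
    """adj: (n, n) adjacency matrix; groups: lists of node indices to merge
    into supernodes. Nodes not listed stay as singleton (unpooled) nodes.
    Returns the pooled adjacency matrix."""
    n = adj.shape[0]
    pooled = {i for g in groups for i in g}
    clusters = list(groups) + [[i] for i in range(n) if i not in pooled]
    # assignment matrix S: S[i, c] = 1 iff node i belongs to cluster c
    S = np.zeros((n, len(clusters)))
    for c, g in enumerate(clusters):
        for i in g:
            S[i, c] = 1.0
    return S.T @ adj @ S   # standard graph-coarsening rule

# path graph 0-1-2-3; pool nodes 0 and 1 into one supernode
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_pooled = partial_pool(A, groups=[[0, 1]])
```

The pooled graph has three nodes (one supernode plus the two unpooled nodes); the supernode's diagonal entry records its internal edge, and off-diagonal entries preserve connections to unpooled nodes, which is how both local and global structure survive pooling.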
Pub Date: 2024-03-19 | DOI: 10.1109/TBDATA.2024.3378090
Tao Li;Yuhua Qian;Feijiang Li;Xinyan Liang;Zhi-Hui Zhan
Selecting informative features that preserve the manifold structure of the original feature space is a challenging task, and many unsupervised feature selection methods still suffer from poor clustering performance on the selected feature subset. To tackle this problem, a feature subspace learning-based binary differential evolution algorithm is proposed for unsupervised feature selection. First, a new unsupervised feature selection framework based on evolutionary computation is designed, in which feature subspace learning and the population search mechanism are combined into a unified unsupervised feature selection process. Second, a local manifold structure learning strategy and a sample pseudo-label learning strategy are presented to calculate the importance of the selected feature subspace. Third, a binary differential evolution algorithm is developed to optimize the selected feature subspace, in which a binary information migration mutation operator and an adaptive crossover operator are designed to promote the search for the globally optimal feature subspace. Experimental results on various real-world datasets demonstrate that the proposed algorithm obtains a more informative feature subset and competitive clustering performance compared with eight state-of-the-art unsupervised feature selection methods.
Title: Feature Subspace Learning-Based Binary Differential Evolution Algorithm for Unsupervised Feature Selection
IEEE Transactions on Big Data, vol. 11, no. 1, pp. 99-114
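To make the search procedure concrete, here is a hedged sketch of textbook binary differential evolution applied to a feature mask. The paper's feature-subspace-learning fitness (manifold structure plus pseudo-labels) is replaced by a toy variance-based score, and its information migration mutation and adaptive crossover operators are simplified to their basic binary forms; all function names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X):
    # Toy surrogate score: total variance of the selected features.
    # The paper instead scores the subspace via manifold/pseudo-label learning.
    return X[:, mask.astype(bool)].var(axis=0).sum() if mask.any() else 0.0

def binary_de(X, pop_size=10, generations=30, f_prob=0.8, cr=0.5):
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_feat))  # population of bit masks
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, size=3, replace=False)]
            diff = b ^ c                                    # binary difference vector
            mutant = np.where(rng.random(n_feat) < f_prob, a ^ diff, a)
            trial = np.where(rng.random(n_feat) < cr, mutant, pop[i])
            if fitness(trial, X) > fitness(pop[i], X):      # greedy selection
                pop[i] = trial
    scores = [fitness(ind, X) for ind in pop]
    return pop[int(np.argmax(scores))]

X = rng.normal(size=(50, 8))
best = binary_de(X)   # best-found feature mask under the toy fitness
```

The population search explores masks globally, while the fitness function is the only place the unsupervised subspace-learning signal enters, which is the coupling the proposed framework unifies.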
Pub Date: 2024-03-19 | DOI: 10.1109/TBDATA.2024.3378100
Jing Zhang;Ming Wu;Zeyi Sun;Cangqi Zhou
Crowdsourcing has played an essential role in machine learning because it can obtain large numbers of labels economically and quickly for training increasingly complex learning models. However, crowdsourcing learning still faces several challenges, such as the low quality of crowd labels and the urgent need for learning models that adapt to label noise. Many studies have focused on truth inference algorithms to improve the quality of labels obtained by crowdsourcing. By comparison, end-to-end predictive model learning in crowdsourcing scenarios, especially with cutting-edge deep learning techniques, is still in its infancy. In this paper, we propose a novel graph convolutional network-based framework, CGNNAT, which models the correlation of instances by combining a GCN with an attention mechanism to learn more representative node embeddings and thus better capture the bias tendencies of crowd workers. Furthermore, a dedicated projection processing layer in CGNNAT models the reliability of each crowd worker, making the model an end-to-end neural network trained directly on noisy crowd labels. Experimental results on several real-world and synthetic datasets show that CGNNAT outperforms state-of-the-art and classical methods in label prediction.
Title: Learning From Crowds Using Graph Neural Networks With Attention Mechanism
IEEE Transactions on Big Data, vol. 11, no. 1, pp. 86-98
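CGNNAT itself requires a full GNN stack, but the idea its per-worker projection layer generalizes, estimating each worker's reliability and weighting their labels accordingly, has a classic lightweight form. The sketch below is that reliability-weighted consensus baseline, not the paper's model; the function name and the iterative scheme are illustrative assumptions.

```python
import numpy as np

def weighted_consensus(labels, n_iter=10):
    """labels: (n_items, n_workers) binary crowd labels.
    Iteratively re-estimates worker reliability from agreement with the
    current consensus, then re-votes with log-odds weights.
    Returns (consensus, worker_reliability)."""
    n_items, n_workers = labels.shape
    reliability = np.full(n_workers, 0.8)            # optimistic prior
    for _ in range(n_iter):
        weights = np.log(reliability / (1 - reliability))   # log-odds weights
        score = labels @ weights - (1 - labels) @ weights   # weighted vote
        consensus = (score > 0).astype(int)
        agree = (labels == consensus[:, None]).mean(axis=0)
        reliability = np.clip(agree, 0.05, 0.95)     # avoid degenerate weights
    return consensus, reliability

# synthetic crowd: 3 workers at ~90% accuracy plus 2 random spammers
rng = np.random.default_rng(1)
truth = rng.integers(0, 2, size=100)
good = truth[:, None] ^ (rng.random((100, 3)) < 0.1).astype(int)
bad = rng.integers(0, 2, size=(100, 2))
labels = np.concatenate([good, bad], axis=1)
consensus, rel = weighted_consensus(labels)
```

An end-to-end model like CGNNAT learns this reliability jointly with the classifier instead of in a separate inference loop, which is what lets it train directly on the noisy labels.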
Pub Date: 2024-03-12 | DOI: 10.1109/TBDATA.2024.3375150
Yunpeng Xiao;Xufeng Li;Tun Li;Rong Wang;Yucai Pang;Guoyin Wang
Vertical federated learning can aggregate the data features of multiple participants. To address the issue of insufficient overlapping data in vertical federated learning, this study presents a generative adversarial network model that enables distributed data augmentation. First, since a generative adversarial network can generate simulated samples, this study proposes FeCGAN, a distributed generative adversarial network for multiple participants with insufficient overlapping data. This network is suitable for multiple data sources and can augment participants' local data. Second, to address the learning divergence caused by the differing local distributions of multiple data sources, this study proposes the aggregation algorithm FedKL, which aggregates the feedback of the local discriminators to interact with the generator and learns the local data distributions more accurately. Finally, given the data waste caused by the unavailability of nonoverlapping data, this study proposes a data augmentation method called VFeDA, which uses FeCGAN to generate pseudo features and expand the overlapping data, thereby improving data utilization. Experiments showed that the proposed model is suitable for multiple data sources and can generate high-quality data.
Title: A Distributed Generative Adversarial Network for Data Augmentation Under Vertical Federated Learning
IEEE Transactions on Big Data, vol. 11, no. 1, pp. 74-85
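The abstract names FedKL but does not give its aggregation rule. As a loosely hedged illustration of the general pattern, aggregating per-party discriminator feedback into one training signal for the shared generator, the sketch below uses data-proportional weights, a common federated heuristic that is not necessarily the paper's formula; `aggregate_feedback` is a hypothetical helper.

```python
import numpy as np

def aggregate_feedback(local_scores, local_sizes):
    """local_scores: list of per-party discriminator outputs on the same
    batch of generated samples; local_sizes: per-party sample counts.
    Returns a single aggregated score vector for the generator's update."""
    sizes = np.asarray(local_sizes, dtype=float)
    weights = sizes / sizes.sum()               # data-proportional weights
    return sum(w * s for w, s in zip(weights, np.asarray(local_scores)))

# two parties score the same two generated samples; party 1 holds 3x the data
scores = [np.array([0.9, 0.2]), np.array([0.5, 0.6])]
agg = aggregate_feedback(scores, local_sizes=[300, 100])
```

FedKL presumably replaces these static weights with divergence-aware ones so that parties whose local distributions drift further from the others do not dominate the generator's update.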
Pub Date: 2024-03-12 | DOI: 10.1109/TBDATA.2024.3375152
Yeyu Yan;Zhongying Zhao;Zhan Yang;Yanwei Yu;Chao Li
Due to the widespread applications of heterogeneous graphs in the real world, heterogeneous graph neural networks (HGNNs) have developed rapidly and achieved great success in recent years. To capture the complex interactions in heterogeneous graphs effectively, various attention mechanisms are widely used in designing HGNNs. However, employing these attention mechanisms brings two key problems: high computational complexity and poor robustness. To address these problems, we propose a Fast