Big Data Mining and Analytics: Latest Articles (Page 3)

VDCM: A Data Collection Mechanism for Crowd Sensing in Vehicular Ad Hoc Networks

IF 13.6, Tier 1 Computer Science, Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2022.9020041

Juli Yin;Linfeng Wei;Zhiquan Liu;Xi Yang;Hongliang Sun;Yudan Cheng;Jianbin Mai

With the rapid development of mobile devices, aggregation security and efficiency are more important than ever in crowd sensing. When collecting large-scale vehicle-provided data, the data transmitted via autonomous networks are publicly accessible to all attackers, which increases the risk of vehicle exposure, so data aggregation security must be ensured. In addition, low aggregation efficiency leads to insufficient sensing data, leaving the data unable to support data mining services. To address aggregation security and efficiency in large-scale data collection, this article proposes a data collection mechanism (VDCM) for crowd sensing in vehicular ad hoc networks (VANETs). The mechanism includes two mechanism assumptions and selects the appropriate method to reduce consumption. It selects sub-mechanism 1 when there are very few vehicles or a coalition cannot be formed; otherwise it selects sub-mechanism 2. Sub-mechanism 1 uses single aggregation to collect data. In sub-mechanism 2, cooperative vehicles are selected using a coalition formation strategy and an auction cooperation agreement, and multi-aggregation is used to collect data. Both sub-mechanisms use Paillier homomorphic encryption to ensure the security of data aggregation. In addition, the mechanism adds data update and scoring steps to increase the amount of available data. The performance analysis shows that the proposed mechanism can safely aggregate data and reduce consumption. The simulation results indicate that the proposed mechanism reduces time consumption and increases the amount of available data compared with existing mechanisms.
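The additive property that makes Paillier encryption suitable for secure aggregation can be illustrated with a toy sketch. This is not the VDCM protocol: the primes are deliberately tiny, the function names (`keygen`, `encrypt`, `decrypt`, `aggregate`) are hypothetical, and a real deployment would rely on a vetted cryptographic library with much larger keys. The sketch assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes (assumed inputs)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                    # common simplifying choice of generator
    mu = pow(lam, -1, n)         # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer m with 0 <= m < n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def aggregate(pub, ciphertexts):
    """Multiply ciphertexts; the product decrypts to the sum of the plaintexts."""
    n, _ = pub
    n2 = n * n
    acc = 1
    for c in ciphertexts:
        acc = (acc * c) % n2
    return acc


if __name__ == "__main__":
    pub, priv = keygen(1009, 1013)          # toy primes, far too small for real use
    readings = [12, 30, 7]                   # hypothetical per-vehicle readings
    total = decrypt(pub, priv, aggregate(pub, [encrypt(pub, m) for m in readings]))
    print(total == sum(readings))
```

The aggregator only ever multiplies ciphertexts, so it learns the sum after decryption by the key holder but never sees an individual reading. The sum must stay below `n` for the result to be correct.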

Vol. 6, No. 4, pp. 391-403. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233240.pdf

Citations: 0

Call for Papers: Special Issue on Edge AI Empowered Giant Model Training

IF 13.6, Tier 1 Computer Science, Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29

Citations: 0

AI-Based Hybrid Models for Predicting Loan Risk in the Banking Sector

IF 13.6, Tier 1 Computer Science, Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2022.9020037

Vikas Kumar;Shaiku Shahida Saheb;Preeti;Atif Ghayas;Sunil Kumari;Jai Kishan Chandel;Saroj Kumar Pandey;Santosh Kumar

Every real-world scenario is now digitally replicated in order to reduce paperwork and human labor costs. Machine Learning (ML) models are also being used to make predictions in these applications. Accurate forecasting requires knowledge of these ML models and their distinguishing features. The same dataset, used as input to different types of ML models, yields different results, so the choice of an ML model for a dataset is critical. A loan risk model is used to show how ML models for a dataset can be linked together. The purpose of this study is to examine how machine learning could be used to quantify or forecast mortgage credit risk. This phrase refers to the process of evaluating massive amounts of data in order to derive useful information for making decisions in a variety of fields. If credit risk is considered, a method based on an examination of what caused mortgage credit risk, and how it affected credit defaults during the still-ongoing economic crisis of 2021, will be tried. Various approaches to credit risk calculation will be examined, ranging from the most basic to the most complex. In addition, we will conduct a case study on a sample of mortgage loans and compare the results of three analytical approaches (logistic regression, decision tree, and gradient boosting) to see which one produces the most commercially useful insights.

Vol. 6, No. 4, pp. 478-490. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233246.pdf

Citations: 0

Personalized Federated Learning for Heterogeneous Residential Load Forecasting

IF 13.6, Tier 1 Computer Science, Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2022.9020043

Xiaodong Qu;Chengcheng Guan;Gang Xie;Zhiyi Tian;Keshav Sood;Chaoli Sun;Lei Cui

Accurate load forecasting is critical for electricity production, transmission, and maintenance. Deep learning (DL) models have replaced classical models as the most popular prediction models. However, a deep prediction model requires users to provide a large amount of private electricity consumption data, which poses potential privacy risks. With federated learning (FL), edge nodes can jointly train a global model through aggregation. As a novel distributed machine learning (ML) technique, FL exchanges only model parameters and does not share raw data. However, existing forecasting methods based on FL still face challenges from data heterogeneity and privacy disclosure. Accordingly, we propose a user-level load forecasting system based on personalized federated learning (PFL) to address these issues. The obtained personalized model outperforms the global model on local data. Further, we introduce a novel differential privacy (DP) algorithm in the proposed system to provide an additional privacy guarantee. Based on the principle of the generative adversarial network (GAN), the algorithm balances privacy and prediction accuracy through an adversarial game. We perform simulation experiments on a real-world dataset, and the experimental results show that the proposed system can meet the requirements for accuracy and privacy in real load forecasting scenarios.
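The aggregation step by which edge nodes train a global model while exchanging only parameters can be sketched as a sample-size-weighted average, in the style of the generic FedAvg algorithm. This is not the personalized scheme or the DP algorithm of the paper; the function name `fedavg` and the flat-list parameter representation are assumptions made for illustration.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client parameter vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all parameter vectors must have the same length")
    # Only parameters and sample counts reach the server; raw data stays local.
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical households with 1 and 3 local samples respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

A personalized variant would typically keep or fine-tune part of each client's model locally after this step rather than overwrite it with the global average.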

Vol. 6, No. 4, pp. 421-432. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233242.pdf

Citations: 0

K-Means Clustering with Local Distance Privacy

IF 13.6, Tier 1 Computer Science, Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2022.9020050

Mengmeng Yang;Longxia Huang;Chenghua Tang

With the development of information technology, a mass of data is generated every day. Collecting and analysing these data helps service providers improve their services and gain an advantage in fierce market competition. K-means clustering has been widely used for cluster analysis in real life. However, these analyses are based on users' data, which can disclose users' private information. Local differential privacy has attracted much attention recently due to its strong privacy guarantee and has been applied to clustering analysis. However, existing K-means clustering methods with local differential privacy protection cannot obtain an ideal clustering result because of the large amount of noise introduced to the whole dataset to ensure the privacy guarantee. To solve this problem, we propose a novel method that provides local distance privacy for users who participate in the clustering analysis. Instead of making users' records indistinguishable from each other in high-dimensional space, we map each user's record into a one-dimensional distance space and make the records indistinguishable from each other in that distance space. Specifically, we first generate a noisy distance and then synthesize the high-dimensional data record. We propose a Bounded Laplace Method (BLM) and a Cluster Indistinguishable Method (CIM) to sample such a noisy distance, which satisfy the local differential privacy guarantee and the local d_E-privacy guarantee, respectively. Furthermore, we introduce a way to generate synthetic data records in high-dimensional space. Our experimental evaluation shows that our methods significantly outperform the traditional methods.
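The idea of sampling a noisy distance that stays inside a valid range can be sketched with rejection sampling over Laplace noise. This is only an illustration of a bounded Laplace draw, not the paper's BLM: the function names, the bounds, and the scale are assumptions, and truncating by rejection changes the privacy accounting, so the scale would need to be calibrated separately to claim a specific guarantee.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def bounded_laplace(value, scale, lower, upper, rng=None, max_tries=10000):
    """Add Laplace noise to value, redrawing until the result lies in [lower, upper]."""
    if not lower <= value <= upper:
        raise ValueError("value must lie inside [lower, upper]")
    if scale <= 0:
        raise ValueError("scale must be positive")
    rng = rng or random.Random()
    for _ in range(max_tries):
        noisy = value + laplace_noise(scale, rng)
        if lower <= noisy <= upper:
            return noisy
    raise RuntimeError("no sample fell inside the bounds")


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical normalized distance of a record to its reference point.
    print(bounded_laplace(0.3, scale=0.5, lower=0.0, upper=1.0, rng=rng))
```

Working in a one-dimensional distance space means a single bounded draw per record, rather than independent noise on every coordinate of a high-dimensional vector.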

Vol. 6, No. 4, pp. 433-442. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233248.pdf

Citations: 0

Elastic Optimization for Stragglers in Edge Federated Learning

IF 13.6, Tier 1 Computer Science, Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2022.9020046

Khadija Sultana;Khandakar Ahmed;Bruce Gu;Hua Wang

To fully exploit the enormous data generated by intelligent devices in edge computing, edge federated learning (EFL) is envisioned as a promising solution. Compared with traditional centralized model training, distributed collaborative training in EFL addresses delay and privacy issues. However, the existence of straggling devices, which respond slowly to servers, degrades model performance. We consider data heterogeneity from two aspects: high-dimensional data generated at edge devices, where the number of features is greater than the number of observations, and the heterogeneity caused by partial device participation. With a large number of features, computation overhead on the devices increases, causing edge devices to become stragglers. Moreover, incorporating partial training results causes gradients to diverge, and the divergence is further exacerbated when more training is performed to reach local optima. In this paper, we introduce elastic optimization methods for stragglers caused by data heterogeneity in edge federated learning. Specifically, we define the problem of stragglers in EFL. Then, we formulate an optimization problem to be solved at edge devices. We customize a benchmark algorithm, FedAvg, to obtain a new elastic optimization algorithm (FedEN), which is applied in the local training of edge devices. FedEN mitigates stragglers by balancing lasso and ridge penalization, thereby generating sparse model updates and keeping parameters as close as possible to local optima. We have evaluated the proposed model on the MNIST and CIFAR-10 datasets. Simulated experiments demonstrate that our approach improves run-time training performance by achieving average accuracy with fewer communication rounds. The results confirm the improved performance of our approach over benchmark algorithms.
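How a combined lasso and ridge penalty yields sparse updates can be sketched with one proximal gradient step of a generic elastic net. The abstract does not give the FedEN objective, so the update rule, the function names, and the hyperparameters `lr`, `l1`, and `l2` below are assumptions for illustration only.

```python
from typing import List, Sequence


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def elastic_net_step(weights: Sequence[float], grads: Sequence[float],
                     lr: float, l1: float, l2: float) -> List[float]:
    """One local update: gradient step with ridge term, then lasso shrinkage."""
    return [
        # The ridge term l2 * w is added to the loss gradient; the lasso term
        # is applied afterwards by soft-thresholding, which zeroes small weights.
        soft_threshold(w - lr * (g + l2 * w), lr * l1)
        for w, g in zip(weights, grads)
    ]


if __name__ == "__main__":
    # With zero loss gradient, only the penalties act on the weights.
    print(elastic_net_step([1.0, 0.05, -0.5], [0.0, 0.0, 0.0], lr=0.1, l1=1.0, l2=0.0))
```

Weights whose magnitude falls below `lr * l1` are set exactly to zero, which is what makes the resulting model update sparse and cheaper to compute and transmit.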

Vol. 6, No. 4, pp. 404-420. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233241.pdf

Citations: 3

A PLS-SEM Based Approach: Analyzing Generation Z Purchase Intention Through Facebook's Big Data

IF 13.6, Tier 1 Computer Science, Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2022.9020033

Vikas Kumar;Preeti;Shaiku Shahida Saheb;Sunil Kumari;Kanishka Pathak;Jai Kishan Chandel;Neeraj Varshney;Ankit Kumar

The objective of this paper is to provide a better account of Generation Z's intentions to purchase retail products through Facebook. The study revolved around how Generation Z's favorable attitude formation translates into intentions to purchase retail products through Facebook. The role of the antecedents of attitude, namely enjoyment, credibility, and peer communication, was also explored. The main purpose was to analyze the pervasiveness of F-commerce (retail purchases through Facebook) among Generation Z in India and how it could be realized effectively. A conceptual framework was proposed after a review of the relevant literature. The study focused exclusively on the Generation Z population. The data were statistically analyzed using partial least squares structural equation modelling. The study found that the proposed conceptual model had high predictive power for Generation Z's intentions to purchase retail products through Facebook, verifying the materialization of F-commerce. Enjoyment, credibility, and peer communication proved to be good predictors of attitude (R²=0.589), and attitude was found to be a strong antecedent of purchase intentions (R²=0.540).

Vol. 6, No. 4, pp. 491-503. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233245.pdf

Citations: 0

Towards Privacy-Aware and Trustworthy Data Sharing Using Blockchain for Edge Intelligence

IF 13.6, Tier 1 Computer Science, Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2023.9020012

Youyang Qu;Lichuan Ma;Wenjie Ye;Xuemeng Zhai;Shui Yu;Yunfeng Li;David Smith

The popularization of intelligent healthcare devices and big data analytics significantly boosts the development of Smart Healthcare Networks (SHNs). To enhance the precision of diagnosis, different participants in SHNs share health data that contain sensitive information. The data exchange process therefore raises privacy concerns, especially when the integration of health data from multiple sources (a linkage attack) results in further leakage. The linkage attack is a dominant type of attack in the privacy domain, which can leverage various data sources to mine private data. Furthermore, adversaries launch poisoning attacks to falsify health data, which leads to misdiagnosis or even physical harm. To protect private health data, we propose a personalized differential privacy model based on the trust levels among users. Trust is evaluated by a defined community density, while the corresponding privacy protection level is mapped to controllable randomized noise constrained by differential privacy. To avoid linkage attacks in personalized differential privacy, we design a noise correlation decoupling mechanism using a Markov stochastic process. In addition, we build the community model on a blockchain, which can mitigate the risk of poisoning attacks during differentially private data transmission over SHNs. Extensive experiments and analysis on real-world datasets validate the proposed model, which achieves better performance than existing research in terms of privacy protection and effectiveness.
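Mapping a trust level to a privacy protection level and then to randomized noise can be sketched with the standard Laplace mechanism. The linear trust-to-epsilon mapping, its bounds, and all function names below are hypothetical choices made for illustration; the paper's community-density trust measure and its Markov-based noise decoupling are not reproduced here.

```python
import random


def epsilon_for_trust(trust, eps_min=0.1, eps_max=1.0):
    """Hypothetical mapping: higher trust in [0, 1] gives a larger epsilon."""
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    return eps_min + trust * (eps_max - eps_min)


def laplace_scale(sensitivity, epsilon):
    """Scale of the Laplace mechanism for a given sensitivity and epsilon."""
    return sensitivity / epsilon


def perturb(value, trust, sensitivity=1.0, rng=None):
    """Release value with Laplace noise sized by the recipient's trust level."""
    rng = rng or random.Random()
    scale = laplace_scale(sensitivity, epsilon_for_trust(trust))
    # Difference of two exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


if __name__ == "__main__":
    rng = random.Random(1)
    # The same hypothetical heart-rate reading shared with two recipients.
    print(perturb(70.0, trust=0.9, rng=rng), perturb(70.0, trust=0.1, rng=rng))
```

Less trusted recipients receive a smaller epsilon and therefore noisier data. Releasing several independently perturbed copies of the same value is exactly what a linkage attack can average out, which is why the paper adds a decoupling mechanism on top of this kind of mapping.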

智能医疗设备和大数据分析的普及大大推动了智能医疗网络（SHN）的发展。为了提高诊断的准确性，SHN的不同参与者共享包含敏感信息的健康数据。因此，数据交换过程引发了隐私问题，尤其是当来自多个来源的健康数据集成（链接攻击）导致进一步泄露时。链接攻击是隐私领域的一种主要攻击，它可以利用各种数据源进行私人数据挖掘。此外，对手发动中毒攻击以伪造健康数据，从而导致误诊甚至身体损伤。为了保护私人健康数据，我们提出了一个基于用户之间信任水平的个性化差异隐私模型。信任通过定义的社区密度来评估，而相应的隐私保护级别被映射到受差分隐私约束的可控随机噪声。为了避免个性化差分隐私中的链接攻击，我们使用马尔可夫随机过程设计了一种噪声相关解耦机制。此外，我们在区块链上建立了社区模型，可以降低SHN上差异私有数据传输过程中中毒攻击的风险。在真实世界数据集上进行的大量实验和分析验证了所提出的模型，并从隐私保护和有效性的角度与现有研究相比取得了更好的性能。

{"title":"Towards Privacy-Aware and Trustworthy Data Sharing Using Blockchain for Edge Intelligence","authors":"Youyang Qu;Lichuan Ma;Wenjie Ye;Xuemeng Zhai;Shui Yu;Yunfeng Li;David Smith","doi":"10.26599/BDMA.2023.9020012","DOIUrl":"10.26599/BDMA.2023.9020012","url":null,"abstract":"The popularization of intelligent healthcare devices and big data analytics significantly boosts the development of Smart Healthcare Networks (SHNs). To enhance the precision of diagnosis, different participants in SHNs share health data that contain sensitive information. Therefore, the data exchange process raises privacy concerns, especially when the integration of health data from multiple sources (linkage attack) results in further leakage. Linkage attack is a type of dominant attack in the privacy domain, which can leverage various data sources for private data mining. Furthermore, adversaries launch poisoning attacks to falsify the health data, which leads to misdiagnosing or even physical damage. To protect private health data, we propose a personalized differential privacy model based on the trust levels among users. The trust is evaluated by a defined community density, while the corresponding privacy protection level is mapped to controllable randomized noise constrained by differential privacy. To avoid linkage attacks in personalized differential privacy, we design a noise correlation decoupling mechanism using a Markov stochastic process. In addition, we build the community model on a blockchain, which can mitigate the risk of poisoning attacks during differentially private data transmission over SHNs. Extensive experiments and analysis on real-world datasets have testified the proposed model, and achieved better performance compared with existing research from perspectives of privacy protection and effectiveness.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 4","pages":"443-464"},"PeriodicalIF":13.6,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10233239/10233247.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48363167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 1

τSQWRL: A TSQL2-Like Query Language for Temporal Ontologies Generated from JSON Big Data

IF 13.6 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-04-07 DOI: 10.26599/BDMA.2022.9020044

Zouhaier Brahmia;Fabio Grandi;Rafik Bouaziz

Temporal ontologies make it possible to represent not only concepts, their properties, and their relationships, but also time-varying information, through explicit versioning of definitions or through the four-dimensional perdurantist view. They are widely used to formally represent temporal data semantics in applications from different fields (e.g., Semantic Web, expert systems, knowledge bases, big data, and artificial intelligence). They facilitate temporal knowledge representation and discovery, with the support of temporal data querying and reasoning. However, there is no standard or consensual temporal ontology query language. In a previous work, we proposed an approach named τJOWL (temporal OWL 2 from temporal JSON, where OWL 2 stands for “OWL 2 Web Ontology Language” and JSON stands for “JavaScript Object Notation”). τJOWL makes it possible (1) to automatically build a temporal OWL 2 ontology of data, following the Closed World Assumption (CWA), from temporal JSON-based big data, and (2) to manage its incremental maintenance, accommodating their evolution, in a temporal and multi-schema-version environment. In this paper, we propose a temporal ontology query language for τJOWL, named τSQWRL (temporal SQWRL), designed as a temporal extension of the ontology query language Semantic Query-enhanced Web Rule Language (SQWRL). The new language is inspired by the features of the consensual temporal query language TSQL2 (Temporal SQL2), which is well known in the temporal (relational) database community. The aim of the proposal is to enable and simplify the task of retrieving any desired ontology version or of specifying any (complex) temporal query on time-varying ontologies generated from time-varying big data. Some examples in the Internet of Healthcare Things (IoHT) domain are provided to motivate and illustrate our proposal.
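
The abstract names two tasks: retrieving a desired ontology version, and querying time-varying ontologies over time. The sketch below illustrates the underlying valid-time selection in plain Python. It is not τSQWRL syntax and not part of τJOWL. The `OntologyVersion` class, the helper functions, the half-open interval convention, and the sample `VERSIONS` data are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class OntologyVersion:
    """One version of an ontology, valid on the half-open interval [valid_from, valid_to)."""
    label: str
    valid_from: date  # inclusive
    valid_to: date    # exclusive; date.max stands for "until changed"


def snapshot_at(versions: List[OntologyVersion], instant: date) -> Optional[OntologyVersion]:
    """Return the version valid at `instant`, or None if no version covers it."""
    for version in versions:
        if version.valid_from <= instant < version.valid_to:
            return version
    return None


def overlapping(versions: List[OntologyVersion], start: date, end: date) -> List[OntologyVersion]:
    """Return every version whose validity interval intersects [start, end)."""
    return [v for v in versions if v.valid_from < end and start < v.valid_to]


# Hypothetical version history of an IoHT device ontology.
VERSIONS = [
    OntologyVersion("v1", date(2020, 1, 1), date(2021, 1, 1)),
    OntologyVersion("v2", date(2021, 1, 1), date(2022, 1, 1)),
    OntologyVersion("v3", date(2022, 1, 1), date.max),
]

if __name__ == "__main__":
    print(snapshot_at(VERSIONS, date(2021, 6, 1)))
    print([v.label for v in overlapping(VERSIONS, date(2020, 6, 1), date(2022, 6, 1))])
```

Half-open intervals are used here so that adjacent versions never both match the same instant. A real temporal query language would express these selections declaratively instead of through helper functions.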


{"title":"τSQWRL: A TSQL2-Like Query Language for Temporal Ontologies Generated from JSON Big Data","authors":"Zouhaier Brahmia;Fabio Grandi;Rafik Bouaziz","doi":"10.26599/BDMA.2022.9020044","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020044","url":null,"abstract":"Temporal ontologies allow to represent not only concepts, their properties, and their relationships, but also time-varying information through explicit versioning of definitions or through the four-dimensional perdurantist view. They are widely used to formally represent temporal data semantics in several applications belonging to different fields (e.g., Semantic Web, expert systems, knowledge bases, big data, and artificial intelligence). They facilitate temporal knowledge representation and discovery, with the support of temporal data querying and reasoning. However, there is no standard or consensual temporal ontology query language. In a previous work, we have proposed an approach named τJOWL (temporal OWL 2 from temporal JSON, where OWL 2 stands for “OWL 2 Web Ontology Language” and JSON stands for “JavaScript Object Notation”). τJOWL allows (1) to automatically build a temporal OWL 2 ontology of data, following the Closed World Assumption (CWA), from temporal JSON-based big data, and (2) to manage its incremental maintenance accommodating their evolution, in a temporal and multi-schema-version environment. In this paper, we propose a temporal ontology query language for τJOWL, named τSQWRL (temporal SQWRL), designed as a temporal extension of the ontology query language-Semantic Query-enhanced Web Rule Language (SQWRL). The new language has been inspired by the features of the consensual temporal query language TSQL2 (Temporal SQL2), well known in the temporal (relational) database community. The aim of the proposal is to enable and simplify the task of retrieving any desired ontology version or of specifying any (complex) temporal query on time-varying ontologies generated from time-varying big data. Some examples, in the Internet of Healthcare Things (IoHT) domain, are provided to motivate and illustrate our proposal.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 3","pages":"288-300"},"PeriodicalIF":13.6,"publicationDate":"2023-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/10097649/10097652.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67837480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Call for Papers: Special Issue on Intelligent Network Video Advances Based on Transformers

IF 13.6 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-04-07 DOI: 10.26599/BDMA.2022.9020053

Citations: 0