
Big Data Mining and Analytics: Latest Articles

IoTDQ: An Industrial IoT Data Analysis Library for Apache IoTDB
Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-25 | DOI: 10.26599/BDMA.2023.9020010
Pengyu Chen;Wendi He;Wenxuan Ma;Xiangdong Huang;Chen Wang
There is a growing demand for time series data analysis in industry areas. Apache IoTDB is a time series database designed for the Internet of Things (IoT) with enhanced storage and I/O performance. With User-Defined Functions (UDF) provided, computation for time series can be executed on Apache IoTDB directly. To satisfy most of the common requirements in industrial time series analysis, we create a UDF library, IoTDQ, on Apache IoTDB. This library integrates stream computation functions on data quality analysis, data profiling, anomaly detection, data repairing, etc. IoTDQ enables users to conduct a wide range of analyses, such as monitoring, error diagnosis, equipment reliability analysis. It provides a framework for users to examine IoT time series with data quality problems. Experiments show that IoTDQ keeps the same level of performance compared to mainstream alternatives, and shortens I/O consumption for Apache IoTDB users.
Big Data Mining and Analytics, vol. 7, no. 1, pp. 29-41. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372952
Citations: 0
Call for Papers: Special Issue on Challenges and Opportunities in Biomedical Big Data Analysis: From Large Language Models to Clinical Applications
Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-25 | DOI: 10.26599/BDMA.2023.9020026
Big Data Mining and Analytics, vol. 7, no. 1, p. 244. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372958
Citations: 0
Molecular Generation and Optimization of Molecular Properties Using a Transformer Model
Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-25 | DOI: 10.26599/BDMA.2023.9020009
Zhongyin Xu;Xiujuan Lei;Mei Ma;Yi Pan
Generating novel molecules to satisfy specific properties is a challenging task in modern drug discovery, which requires the optimization of a specific objective based on satisfying chemical rules. Herein, we aim to optimize the properties of a specific molecule to satisfy the specific properties of the generated molecule. The Matched Molecular Pairs (MMPs), which contain the source and target molecules, are used herein, and logD and solubility are selected as the optimization properties. The main innovative work lies in the calculation related to a specific transformer from the perspective of a matrix dimension. Threshold intervals and state changes are then used to encode logD and solubility for subsequent tests. During the experiments, we screen the data based on the proportion of heavy atoms to all atoms in the groups and select 12 365, 1503, and 1570 MMPs as the training, validation, and test sets, respectively. Transformer models are compared with the baseline models with respect to their abilities to generate molecules with specific properties. Results show that the transformer model can accurately optimize the source molecules to satisfy specific properties.
Big Data Mining and Analytics, vol. 7, no. 1, pp. 142-155. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373001
Citations: 0
Incremental Data Stream Classification with Adaptive Multi-Task Multi-View Learning
Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-25 | DOI: 10.26599/BDMA.2023.9020006
Jun Wang;Maiwang Shi;Xiao Zhang;Yan Li;Yunsheng Yuan;Chenglei Yang;Dongxiao Yu
With the enhancement of data collection capabilities, massive streaming data have been accumulated in numerous application scenarios. Specifically, the issue of classifying data streams based on mobile sensors can be formalized as a multi-task multi-view learning problem with a specific task comprising multiple views with shared features collected from multiple sensors. Existing incremental learning methods are often single-task single-view, which cannot learn shared representations between relevant tasks and views. An adaptive multi-task multi-view incremental learning framework for data stream classification called MTMVIS is proposed to address the above challenges, utilizing the idea of multi-task multi-view learning. Specifically, the attention mechanism is first used to align different sensor data of different views. In addition, MTMVIS uses adaptive Fisher regularization from the perspective of multi-task multi-view learning to overcome catastrophic forgetting in incremental learning. Results reveal that the proposed framework outperforms state-of-the-art methods based on the experiments on two different datasets with other baselines.
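As background for the Fisher regularization mentioned in this abstract, a common non-adaptive form is the elastic weight consolidation penalty, sketched below. The symbols are assumptions introduced here for illustration: \theta^{*} denotes the parameters learned on earlier data, F_i the diagonal Fisher information of parameter i, and \lambda a trade-off weight. The adaptive multi-task multi-view variant used by MTMVIS is not specified in the abstract and may differ.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta)
  + \frac{\lambda}{2} \sum_{i} F_i \left(\theta_i - \theta_i^{*}\right)^{2}
```

Parameters with large F_i are held close to their earlier values, which is how this kind of penalty limits catastrophic forgetting.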
Big Data Mining and Analytics, vol. 7, no. 1, pp. 87-106. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373002
Citations: 0
Discriminatively Constrained Semi-Supervised Multi-View Nonnegative Matrix Factorization with Graph Regularization
Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-25 | DOI: 10.26599/BDMA.2023.9020004
Guosheng Cui;Ye Li;Jianzhong Li;Jianping Fan
Nonnegative Matrix Factorization (NMF) is one of the most popular feature learning technologies in the field of machine learning and pattern recognition. It has been widely used and studied in multi-view clustering tasks because of its effectiveness. This study proposes a general semi-supervised multi-view nonnegative matrix factorization algorithm. This algorithm incorporates discriminative and geometric information on data to learn a better-fused representation, and adopts a feature normalizing strategy to align the different views. Two specific implementations of this algorithm are developed to validate the effectiveness of the proposed framework: Graph regularization based Discriminatively Constrained Multi-View Nonnegative Matrix Factorization (GDCMVNMF) and Extended Multi-View Constrained Nonnegative Matrix Factorization (ExMVCNMF). The intrinsic connection between these two specific implementations is discussed, and the optimization based on multiplicative update rules is presented. Experiments on six datasets show that GDCMVNMF and ExMVCNMF outperform several representative unsupervised and semi-supervised multi-view NMF approaches.
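To illustrate the kind of update rule this abstract refers to, the following is a minimal sketch of plain single-view NMF with the standard Lee and Seung multiplicative updates. It is not GDCMVNMF or ExMVCNMF: it has no labels, no graph regularizer, and no multi-view fusion. All function names and parameters (`nmf`, `rank`, `iterations`, `eps`) are hypothetical and chosen for this example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization keeps every factor nonnegative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors
```

Because each update only multiplies by a nonnegative ratio, the factors stay nonnegative without any projection step, which is the main appeal of multiplicative rules over plain gradient descent.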
Big Data Mining and Analytics, vol. 7, no. 1, pp. 55-74. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372950
Citations: 0
QAR Data Imputation Using Generative Adversarial Network with Self-Attention Mechanism
Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-25 | DOI: 10.26599/BDMA.2023.9020001
Jingqi Zhao;Chuitian Rong;Xin Dang;Huabo Sun
Quick Access Recorder (QAR), an important device for storing data from various flight parameters, contains a large amount of valuable data and comprehensively records the real state of the airline flight. However, the recorded data have certain missing values due to factors, such as weather and equipment anomalies. These missing values seriously affect the analysis of QAR data by aeronautical engineers, such as airline flight scenario reproduction and airline flight safety status assessment. Therefore, imputing missing values in the QAR data, which can further guarantee the flight safety of airlines, is crucial. QAR data also have multivariate, multiprocess, and temporal features. Therefore, we innovatively propose the imputation models A-AEGAN (“A” denotes attention mechanism, “AE” denotes autoencoder, and “GAN” denotes generative adversarial network) and SA-AEGAN (“SA” denotes self-attentive mechanism) for missing values of QAR data, which can be effectively applied to QAR data. Specifically, we apply an innovative generative adversarial network to impute missing values from QAR data. The improved gated recurrent unit is then introduced as the neural unit of GAN, which can successfully capture the temporal relationships in QAR data. In addition, we modify the basic structure of GAN by using an autoencoder as the generator and a recurrent neural network as the discriminator. The missing values in the QAR data are imputed by using the adversarial relationship between generator and discriminator. We introduce an attention mechanism in the autoencoder to further improve the capability of the proposed model to capture the features of QAR data. Attention mechanisms can maintain the correlation among QAR data and improve the capability of the model to impute missing data. Furthermore, we improve the proposed model by integrating a self-attention mechanism to further capture the relationship between different parameters within the QAR data. 
Experimental results on real datasets demonstrate that the model can reasonably impute the missing values in QAR data with excellent results.
Big Data Mining and Analytics, vol. 7, no. 1, pp. 12-28. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372953
Citations: 0
Multi-Smart Meter Data Encryption Scheme Based on Distributed Differential Privacy
Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-25 | DOI: 10.26599/BDMA.2023.9020008
Renwu Yan;Yang Zheng;Ning Yu;Cen Liang
Under the general trend of the rapid development of smart grids, data security and privacy are facing serious challenges; protecting the privacy data of single users under the premise of obtaining user-aggregated data has attracted widespread attention. In this study, we propose an encryption scheme on the basis of differential privacy for the problem of user privacy leakage when aggregating data from multiple smart meters. First, we use an improved homomorphic encryption method to realize the encryption aggregation of users' data. Second, we propose a double-blind noise addition protocol to generate distributed noise through interaction between users and a cloud platform to prevent semi-honest participants from stealing data by colluding with one another. Finally, the simulation results show that the proposed scheme can encrypt the transmission of multi-intelligent meter data under the premise of satisfying the differential privacy mechanism. Even if an attacker has enough background knowledge, the security of the electricity information of one another can be ensured.
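The idea of distributed noise can be sketched as follows, assuming the textbook construction in which each of n meters adds the difference of two Gamma(1/n, b) draws so that the summed noise follows a Laplace(0, b) distribution. This is not the double-blind protocol of the paper, and the homomorphic encryption layer is omitted, so readings are summed in the clear here. The names `noise_share` and `noisy_aggregate` and the parameters `sensitivity` and `epsilon` are hypothetical.

```python
import random

def noise_share(n_users, scale, rng):
    """One user's share of Laplace(0, scale) noise.

    A Laplace variable is the sum of n independent differences of two
    Gamma(shape=1/n, scale) variables, so no single user holds the full noise.
    """
    shape = 1.0 / n_users
    return rng.gammavariate(shape, scale) - rng.gammavariate(shape, scale)

def noisy_aggregate(readings, sensitivity, epsilon, rng):
    """Each meter perturbs its own reading; the aggregator only sums reports."""
    scale = sensitivity / epsilon  # Laplace scale b for epsilon-differential privacy
    n = len(readings)
    reports = [r + noise_share(n, scale, rng) for r in readings]
    return sum(reports)

if __name__ == "__main__":
    rng = random.Random(7)
    meters = [1.2, 0.8, 2.5, 1.9, 0.4]  # made-up kWh readings for illustration
    print(noisy_aggregate(meters, sensitivity=3.0, epsilon=1.0, rng=rng))
```

In a deployed scheme each report would be encrypted before leaving the meter, so the aggregator would learn only the noisy total.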
Big Data Mining and Analytics, vol. 7, no. 1, pp. 131-141. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372998
Citations: 0
Diagnosis and Detection of Alzheimer's Disease Using Learning Algorithm
IF 13.6 | Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-12-01 | DOI: 10.26599/bdma.2022.9020049
G. Shukla, Santosh Kumar, S. Pandey, Rohit Agarwal, Neeraj Varshney, Ankit Kumar
Citations: 0
Replication-Based Query Management for Resource Allocation Using Hadoop and MapReduce over Big Data
IF 13.6 | Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-08-29 | DOI: 10.26599/BDMA.2022.9020026
Ankit Kumar;Neeraj Varshney;Surbhi Bhatiya;Kamred Udham Singh
We live in an age where everything around us is being created. Data generation rates are so scary, creating pressure to implement costly and straightforward data storage and recovery processes. MapReduce model functionality is used for creating a cluster parallel, distributed algorithm, and large datasets. The MapReduce strategy from Hadoop helps develop a community of non-commercial use to offer a new algorithm for resolving such problems for commercial applications as expected from this working algorithm with insights as a result of disproportionate or discriminatory Hadoop cluster results. Expected results are obtained in the work and the exam conducted under this job; many of them are scheduled to set schedules, match matrices' data positions, clustering before determining to click, and accurate mapping and internal reliability to be closed together to avoid running and execution times. Mapper output and proponents have been implemented, and the map has been used to reduce the function. The execution input key/value pair and output key/value pair have been set. This paper focuses on evaluating this technique for the efficient retrieval of large volumes of data. The technique allows for capabilities to inform a massive database of information, from storage and indexing techniques to the distribution of queries, scalability, and performance in heterogeneous environments. The results show that the proposed work reduces the data processing time by 30%.
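The input and output key/value pairs this abstract mentions can be illustrated with a small in-memory sketch of the MapReduce pattern. It assumes a word-count job and runs in a single process. It does not use the Hadoop API, and every function name below is hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to each input (key, value) pair and stream its output pairs."""
    for key, value in records:
        yield from mapper(key, value)

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

def word_count_mapper(_offset, line):
    """Input pair: (line offset, line text). Output pairs: (word, 1)."""
    for word in line.lower().split():
        yield word, 1

def word_count_reducer(_word, counts):
    """Input pair: (word, list of counts). Output value: total count."""
    return sum(counts)

def run_job(records, mapper, reducer):
    """Run map, shuffle, and reduce in sequence on in-memory records."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)
```

On a real cluster the map and reduce phases run in parallel across nodes and the shuffle moves data over the network. The sketch keeps only the key/value contract between the phases.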
Big Data Mining and Analytics, vol. 6, no. 4, pp. 465-477. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233249.pdf
Citations: 0
A Clinical Data Analysis Based Diagnostic Systems for Heart Disease Prediction Using Ensemble Method
IF 13.6 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2023-08-29 DOI: 10.26599/BDMA.2022.9020052
Ankit Kumar; Kamred Udham Singh; Manish Kumar
The correct diagnosis of heart disease can save lives, while the incorrect diagnosis can be lethal. The UCI machine learning heart disease dataset compares the results and analyses of various machine learning approaches, including deep learning. We used a dataset with 13 primary characteristics to carry out the research. Support vector machine and logistic regression algorithms are used to process the datasets, and the latter displays the highest accuracy in predicting coronary disease. Python programming is used to process the datasets. Multiple research initiatives have used machine learning to speed up the healthcare sector. We also used conventional machine learning approaches in our investigation to uncover the links between the numerous features available in the dataset and then used them effectively in anticipation of heart infection risks. Using the accuracy and confusion matrix has resulted in some favorable outcomes. To get the best results, the dataset contains certain unnecessary features that are dealt with using isolation logistic regression and Support Vector Machine (SVM) classification.
Big Data Mining and Analytics, vol. 6, no. 4, pp. 513-525. Published 2023-08-29. DOI: 10.26599/BDMA.2022.9020052. PDF: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233243.pdf
Citations: 0