Big Data Mining and Analytics: Latest Articles

IoTDQ: An Industrial IoT Data Analysis Library for Apache IoTDB

Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020010

Pengyu Chen;Wendi He;Wenxuan Ma;Xiangdong Huang;Chen Wang

There is a growing demand for time series data analysis in industry. Apache IoTDB is a time series database designed for the Internet of Things (IoT) with enhanced storage and I/O performance. Because it provides User-Defined Functions (UDF), computation on time series can be executed directly on Apache IoTDB. To satisfy most of the common requirements in industrial time series analysis, we create a UDF library, IoTDQ, on Apache IoTDB. This library integrates stream computation functions for data quality analysis, data profiling, anomaly detection, data repairing, etc. IoTDQ enables users to conduct a wide range of analyses, such as monitoring, error diagnosis, and equipment reliability analysis. It provides a framework for users to examine IoT time series with data quality problems. Experiments show that IoTDQ maintains the same level of performance as mainstream alternatives and reduces I/O consumption for Apache IoTDB users.

Citations: 0

Call for Papers: Special Issue on Challenges and Opportunities in Biomedical Big Data Analysis: From Large Language Models to Clinical Applications

Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020026

Citations: 0

Molecular Generation and Optimization of Molecular Properties Using a Transformer Model

Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020009

Zhongyin Xu;Xiujuan Lei;Mei Ma;Yi Pan

Generating novel molecules to satisfy specific properties is a challenging task in modern drug discovery, which requires the optimization of a specific objective based on satisfying chemical rules. Herein, we aim to optimize the properties of a specific molecule to satisfy the specific properties of the generated molecule. The Matched Molecular Pairs (MMPs), which contain the source and target molecules, are used herein, and logD and solubility are selected as the optimization properties. The main innovative work lies in the calculation related to a specific transformer from the perspective of a matrix dimension. Threshold intervals and state changes are then used to encode logD and solubility for subsequent tests. During the experiments, we screen the data based on the proportion of heavy atoms to all atoms in the groups and select 12 365, 1503, and 1570 MMPs as the training, validation, and test sets, respectively. Transformer models are compared with the baseline models with respect to their abilities to generate molecules with specific properties. Results show that the transformer model can accurately optimize the source molecules to satisfy specific properties.

Vol. 7, No. 1, pp. 142-155. Full text: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373001

Citations: 0

Incremental Data Stream Classification with Adaptive Multi-Task Multi-View Learning

Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020006

Jun Wang;Maiwang Shi;Xiao Zhang;Yan Li;Yunsheng Yuan;Chenglei Yang;Dongxiao Yu

With the enhancement of data collection capabilities, massive streaming data have been accumulated in numerous application scenarios. Specifically, the issue of classifying data streams based on mobile sensors can be formalized as a multi-task multi-view learning problem with a specific task comprising multiple views with shared features collected from multiple sensors. Existing incremental learning methods are often single-task single-view, which cannot learn shared representations between relevant tasks and views. An adaptive multi-task multi-view incremental learning framework for data stream classification called MTMVIS is proposed to address the above challenges, utilizing the idea of multi-task multi-view learning. Specifically, the attention mechanism is first used to align different sensor data of different views. In addition, MTMVIS uses adaptive Fisher regularization from the perspective of multi-task multi-view learning to overcome catastrophic forgetting in incremental learning. Results reveal that the proposed framework outperforms state-of-the-art methods based on the experiments on two different datasets with other baselines.
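Fisher regularization of the kind mentioned in this abstract is commonly realized as a quadratic penalty in the style of elastic weight consolidation. The sketch below shows that generic, non-adaptive penalty in NumPy; it is not the MTMVIS formulation, and the function names, the `lam` weight, and the diagonal Fisher approximation are illustrative assumptions.

```python
import numpy as np

def fisher_diagonal(per_sample_grads):
    """Estimate the diagonal Fisher information as the mean squared gradient.

    per_sample_grads: array of shape (n_samples, n_params) holding the
    gradient of the log-likelihood for each sample of the previous task.
    """
    grads = np.asarray(per_sample_grads, dtype=float)
    return np.mean(grads ** 2, axis=0)

def fisher_penalty(theta, theta_old, fisher, lam=1.0):
    """Quadratic penalty (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_old, dtype=float)
    return 0.5 * lam * float(np.sum(fisher * diff ** 2))

def regularized_loss(task_loss, theta, theta_old, fisher, lam=1.0):
    """New-task loss plus the penalty that discourages forgetting."""
    return task_loss + fisher_penalty(theta, theta_old, fisher, lam)
```

Parameters with a large Fisher value are held close to their previous values, while parameters the earlier task was insensitive to remain free to change.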

Vol. 7, No. 1, pp. 87-106. Full text: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373002

Citations: 0

Discriminatively Constrained Semi-Supervised Multi-View Nonnegative Matrix Factorization with Graph Regularization

Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020004

Guosheng Cui;Ye Li;Jianzhong Li;Jianping Fan

Nonnegative Matrix Factorization (NMF) is one of the most popular feature learning techniques in machine learning and pattern recognition. Because of its effectiveness, it has been widely used and studied in multi-view clustering tasks. This study proposes a general semi-supervised multi-view nonnegative matrix factorization algorithm. This algorithm incorporates discriminative and geometric information on data to learn a better-fused representation, and adopts a feature normalizing strategy to align the different views. Two specific implementations of this algorithm are developed to validate the effectiveness of the proposed framework: Graph regularization based Discriminatively Constrained Multi-View Nonnegative Matrix Factorization (GDCMVNMF) and Extended Multi-View Constrained Nonnegative Matrix Factorization (ExMVCNMF). The intrinsic connection between these two specific implementations is discussed, and the optimization based on multiplicative update rules is presented. Experiments on six datasets show that GDCMVNMF and ExMVCNMF outperform several representative unsupervised and semi-supervised multi-view NMF approaches.
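For readers unfamiliar with multiplicative update rules, the sketch below shows the classic single-view, unsupervised NMF updates for the Frobenius objective. It is a baseline illustration only, not GDCMVNMF or ExMVCNMF; the function name, the `eps` guard, and the random initialization are assumptions made for this example.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorize nonnegative X (m x n) as W @ H with W, H >= 0.

    Uses the classic multiplicative updates for the Frobenius objective
    ||X - W H||_F^2; eps guards against division by zero.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is rescaled element-wise, so nonnegativity is preserved.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(X - W @ H)))
    return W, H, errors
```

Because each update multiplies by a nonnegative ratio rather than subtracting a gradient step, no projection back onto the nonnegative orthant is needed.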

Vol. 7, No. 1, pp. 55-74. Full text: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372950

Citations: 0

QAR Data Imputation Using Generative Adversarial Network with Self-Attention Mechanism

Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020001

Jingqi Zhao;Chuitian Rong;Xin Dang;Huabo Sun

Quick Access Recorder (QAR), an important device for storing data from various flight parameters, contains a large amount of valuable data and comprehensively records the real state of the airline flight. However, the recorded data have certain missing values due to factors such as weather and equipment anomalies. These missing values seriously affect the analysis of QAR data by aeronautical engineers, such as airline flight scenario reproduction and airline flight safety status assessment. Therefore, imputing missing values in the QAR data, which can further guarantee the flight safety of airlines, is crucial. QAR data also have multivariate, multiprocess, and temporal features. Therefore, we innovatively propose the imputation models A-AEGAN (“A” denotes attention mechanism, “AE” denotes autoencoder, and “GAN” denotes generative adversarial network) and SA-AEGAN (“SA” denotes self-attention mechanism) for missing values of QAR data, which can be effectively applied to QAR data. Specifically, we apply an innovative generative adversarial network to impute missing values from QAR data. The improved gated recurrent unit is then introduced as the neural unit of GAN, which can successfully capture the temporal relationships in QAR data. In addition, we modify the basic structure of GAN by using an autoencoder as the generator and a recurrent neural network as the discriminator. The missing values in the QAR data are imputed by using the adversarial relationship between generator and discriminator. We introduce an attention mechanism in the autoencoder to further improve the capability of the proposed model to capture the features of QAR data. Attention mechanisms can maintain the correlation among QAR data and improve the capability of the model to impute missing data. Furthermore, we improve the proposed model by integrating a self-attention mechanism to further capture the relationship between different parameters within the QAR data. Experimental results on real datasets demonstrate that the model can reasonably impute the missing values in QAR data with excellent results.

Vol. 7, No. 1, pp. 12-28. Full text: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372953

Citations: 0

Multi-Smart Meter Data Encryption Scheme Based on Distributed Differential Privacy

Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-12-25 DOI: 10.26599/BDMA.2023.9020008

Renwu Yan;Yang Zheng;Ning Yu;Cen Liang

With the rapid development of smart grids, data security and privacy face serious challenges; protecting the private data of individual users while still obtaining user-aggregated data has attracted widespread attention. In this study, we propose an encryption scheme based on differential privacy to address user privacy leakage when aggregating data from multiple smart meters. First, we use an improved homomorphic encryption method to realize the encrypted aggregation of users' data. Second, we propose a double-blind noise addition protocol that generates distributed noise through interaction between users and a cloud platform, preventing semi-honest participants from stealing data by colluding with one another. Finally, simulation results show that the proposed scheme can encrypt the transmission of data from multiple smart meters while satisfying the differential privacy mechanism. Even if an attacker has sufficient background knowledge, the security of each user's electricity information can be ensured.
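One well-known way to generate distributed noise for differential privacy relies on the fact that a Laplace variable with scale b equals the sum, over n users, of differences of two independent Gamma(1/n, b) draws. The sketch below illustrates that decomposition only; it is not the paper's double-blind protocol, it omits the homomorphic encryption step entirely, and the function names and parameters are assumptions chosen for illustration.

```python
import numpy as np

def noise_share(n_users, sensitivity, epsilon, rng):
    """One user's noise share: the difference of two Gamma(1/n, b) draws."""
    scale = sensitivity / epsilon  # Laplace scale b
    g1 = rng.gamma(shape=1.0 / n_users, scale=scale)
    g2 = rng.gamma(shape=1.0 / n_users, scale=scale)
    return g1 - g2

def noisy_aggregate(readings, sensitivity, epsilon, seed=None):
    """Sum of readings where each user adds only a partial noise share."""
    rng = np.random.default_rng(seed)
    n = len(readings)
    # Each reading is perturbed locally before it leaves the meter.
    perturbed = [r + noise_share(n, sensitivity, epsilon, rng) for r in readings]
    return float(sum(perturbed))
```

No single share is Laplace-distributed on its own; only the sum of all n shares carries the full Laplace(0, sensitivity / epsilon) noise, which is why the noise is described as distributed.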

Vol. 7, No. 1, pp. 131-141. Full text: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372998

Citations: 0

Diagnosis and Detection of Alzheimer's Disease Using Learning Algorithm

IF 13.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-12-01 DOI: 10.26599/bdma.2022.9020049

G. Shukla, Santosh Kumar, S. Pandey, Rohit Agarwal, Neeraj Varshney, Ankit Kumar

Citations: 0

Total Contents

IF 13.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29

Citations: 0

Diagnosis and Detection of Alzheimer's Disease Using Learning Algorithm

IF 13.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Big Data Mining and Analytics

Pub Date : 2023-08-29 DOI: 10.26599/BDMA.2022.9020049

Gargi Pant Shukla;Santosh Kumar;Saroj Kumar Pandey;Rohit Agarwal;Neeraj Varshney;Ankit Kumar

In Computer-Aided Detection (CAD), brain disease classification is a vital issue. Alzheimer's Disease (AD) and brain tumors are primary causes of death. These diseases are studied using Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), and Computed Tomography (CT) scans, which require expertise to interpret. The disease is most prevalent in the elderly and can be fatal in its later stages. A diagnosis can be made by calculating the Mini-Mental State Examination score, followed by an MRI scan of the brain. Apart from that, various classification algorithms, such as machine learning and deep learning, are useful for diagnosing MRI scans. However, they have some limitations in terms of accuracy. This paper proposes some insightful pre-processing methods that significantly improve the classification performance on these MRI images. Additionally, they reduce the time needed to train various pre-existing learning algorithms. A dataset was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and converted from a 4D format to a 2D format. Selective clipping, grayscale image conversion, and histogram equalization techniques were used to pre-process the images. After pre-processing, we propose three learning algorithms for AD classification, namely random forest, XGBoost, and Convolutional Neural Networks (CNN). Results computed on the dataset show that the approach outperforms existing work, with an accuracy of 97.57% and a sensitivity of 97.60%.
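Of the pre-processing steps named in this abstract, histogram equalization is the easiest to illustrate in isolation. The sketch below is a generic NumPy version for 8-bit grayscale images, offered as an assumption about what such a step could look like rather than the authors' pipeline; the function name and the handling of constant images are choices made for this example.

```python
import numpy as np

def equalize_histogram(image):
    """Histogram-equalize an 8-bit grayscale image given as a 2D uint8 array."""
    image = np.asarray(image, dtype=np.uint8)
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest occupied level
    total = image.size
    if total == cdf_min:               # constant image: nothing to stretch
        return image.copy()
    # Map each gray level through the normalized CDF onto 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]
```

The lookup table spreads the occupied gray levels across the full 0 to 255 range, which raises contrast in scans whose intensities cluster in a narrow band.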

Vol. 6, No. 4, pp. 504-512. Full text: https://ieeexplore.ieee.org/iel7/8254253/10233239/10233244.pdf

Citations: 0
