Improved 3D Rotation-based Geometric Data Perturbation Based on Medical Data Preservation in Big Data
Jayanti Dansana, M. R. Kabat, P. Pattnaik
International Journal of Advanced Computer Science and Applications, 2023
DOI: 10.14569/ijacsa.2023.0140592
Abstract
With advances in technology, huge volumes of data are being processed with data mining, especially in the healthcare sector. Medical data usually contain a great deal of personal information, and third parties use these data for data mining. Perturbing healthcare data helps prevent intruders from compromising patients' privacy. One of the challenges in data perturbation is balancing data utility against privacy protection, and medical data mining has special properties compared with other data mining fields. This work therefore introduces a machine learning (ML)-based perturbation approach that gives healthcare data stronger privacy, applying clustering together with IGDP-3DR to improve privacy preservation. First, the dataset is pre-processed by data normalization. Next, the dimensionality is reduced by singular value decomposition (SVD) combined with principal component analysis (PCA). Clustering is then performed by Improved Fuzzy C-Means (IFCM), which divides the high-dimensional data into several segments and sets each partition as a cluster. Improved Geometric Data Perturbation (IGDP), a combination of geometric data perturbation (GDP) with 3D rotation (3DR), is then used to perturb the clustered data. Finally, the perturbed data are classified by an ML classifier, a kernel Support Vector Machine combined with Horse Herd Optimization (KSVM-HHO), to ensure better accuracy. The proposed method is implemented and evaluated in Python. The performance of IGDP-KSVM-HHO is compared on two benchmark datasets: Wisconsin Prognostic Breast Cancer (WBC) and Pima Indians Diabetes (PID). On the perturbed data, the proposed method achieves an overall accuracy of 98.08% for the WBC dataset and 98.04% for the PID dataset.
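The abstract names the pipeline stages but gives no formulas for IGDP. The sketch below is a rough illustration only, built on a set of assumptions. It takes the geometric data perturbation form commonly given in the literature, G(x) = Rx + Ψ + Δ: a rotation, then a translation shared by all records, then additive Gaussian noise. The rotation matrix R is composed from random 3D rotations applied to successive attribute triples, and each cluster is perturbed with its own independent key.

The following pieces are illustrative choices and are not taken from the paper:

- Min-max normalization and PCA computed via `numpy.linalg.svd`.
- The per-cluster loop.
- The overlapping last window for leftover attributes.
- A median threshold standing in for IFCM cluster labels.
- Every function name, which is hypothetical.

The improved FCM and the KSVM-HHO classifier are not reproduced.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero
    return (X - col_min) / col_range


def pca_via_svd(X, n_components):
    """Project mean-centred X onto its top principal directions via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def rotation_3d(alpha, beta, gamma):
    """3x3 rotation matrix Rz(gamma) @ Ry(beta) @ Rx(alpha)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return rz @ ry @ rx


def build_rotation(d, rng):
    """Compose random 3D rotations over attribute triples into a d x d orthogonal R."""
    if d < 3:
        raise ValueError("3D rotation needs at least 3 attributes")
    starts = list(range(0, d - 2, 3))
    if d % 3:
        starts.append(d - 3)  # overlapping window so leftover attributes are rotated too
    R = np.eye(d)
    for s in starts:
        block = np.eye(d)
        block[s:s + 3, s:s + 3] = rotation_3d(*rng.uniform(0.0, 2 * np.pi, size=3))
        R = block @ R  # a product of orthogonal matrices stays orthogonal
    return R


def geometric_perturb(X, noise_sigma=0.05, seed=None):
    """Row-wise G(x) = R x + psi + delta: rotation, shared translation, Gaussian noise."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    R = build_rotation(d, rng)
    psi = rng.uniform(0.0, 1.0, size=d)                # one translation vector for all rows
    delta = rng.normal(0.0, noise_sigma, size=(n, d))  # independent per-record noise
    return X @ R.T + psi + delta, R, psi


def perturb_by_cluster(X, labels, noise_sigma=0.05, seed=None):
    """Perturb each cluster with its own independently drawn R, psi and noise."""
    rng = np.random.default_rng(seed)
    Y = np.empty(X.shape, dtype=float)
    for c in np.unique(labels):
        mask = labels == c
        Y[mask] = geometric_perturb(X[mask], noise_sigma, int(rng.integers(2**32)))[0]
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    raw = rng.normal(size=(100, 8))  # synthetic stand-in for a medical dataset
    Z = pca_via_svd(min_max_normalize(raw), n_components=6)
    labels = (Z[:, 0] > np.median(Z[:, 0])).astype(int)  # placeholder for IFCM labels
    Y = perturb_by_cluster(Z, labels, noise_sigma=0.02, seed=7)
    print(Y.shape)
```

With `noise_sigma=0`, the rotation and translation preserve pairwise Euclidean distances within each cluster. This is why distance-based models such as an RBF-kernel SVM can in principle still be trained on the perturbed records. The noise term, and the use of a different key for each cluster (which does not preserve distances across clusters), are what trade some utility for privacy in this sketch.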
About the journal:
IJACSA is a scholarly computer science journal representing the best in research. Its mission is to provide an outlet for quality research to be publicised and published to a global audience. The journal aims to publish papers selected through rigorous double-blind peer review to ensure originality, timeliness, relevance, and readability. In line with the journal's vision "to be a respected publication that publishes peer reviewed research articles, as well as review and survey papers contributed by International community of Authors", we have drawn reviewers and editors from institutions and universities across the globe. A double-blind peer review process is conducted to ensure that we retain high standards. At IJACSA, we stand strong because we know that global challenges make way for new innovations, new ways and new talent. The International Journal of Advanced Computer Science and Applications publishes carefully refereed research, review and survey papers that offer a significant contribution to the computer science literature and are of interest to a wide audience. Coverage extends to all mainstream branches of computer science and related applications.