A Feature Based Approach for Medical Databases

Proceedings of the International Conference on Advances in Information Communication Technology & Computing. Pub Date: 2016-08-12. DOI: 10.1145/2979779.2979873

Ritu Chauhan, Harleen Kaur, Sukrati Sharma


Citations: 4

Abstract

Medical data mining is an emerging field used to discover hidden knowledge in large datasets for the early diagnosis of disease. Large databases usually comprise numerous features, which may contain missing values, noise, and outliers. Such features can mislead future medical diagnosis. Moreover, to deal with irrelevant and redundant features in large databases, proper data preprocessing techniques need to be applied. In past studies, data mining techniques such as feature selection have been applied efficiently to deal with irrelevant, noisy, and redundant features. This paper explains the application of data mining techniques using feature selection for pancreatic cancer patients to conduct machine learning studies on collected patient records. We have evaluated different feature selection techniques, such as the Correlation-Based Filter Method (CFS) and Wrapper Subset Evaluation, using Naive Bayes and J48 (an implementation of C4.5) classifiers on medical databases to analyze varied data mining algorithms that can effectively classify medical data for future medical diagnosis. Further, experimental techniques have been used to measure the effectiveness and efficiency of the feature selection algorithms. The experimental analysis has proven beneficial in determining machine learning methods for effective analysis of pancreatic cancer diagnosis.
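To make the correlation-based filter idea concrete, the following is a minimal sketch of CFS-style subset scoring with greedy forward selection. It is not the authors' implementation. It uses the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. Pearson correlation is assumed as the correlation measure for simplicity, since the abstract does not state which measure was used. The feature names and the toy values are hypothetical and are not patient data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(columns, subset, target):
    """CFS merit: rewards correlation with the class, penalizes inter-feature correlation."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, target, tol=1e-9):
    """Greedily add the feature that most improves merit; stop when nothing improves it."""
    selected, best_merit = [], 0.0
    remaining = list(columns)
    while remaining:
        candidate = max(
            remaining, key=lambda f: cfs_merit(columns, selected + [f], target)
        )
        merit = cfs_merit(columns, selected + [candidate], target)
        if merit <= best_merit + tol:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_merit = merit
    return selected, best_merit


# Hypothetical toy data: one informative feature, an exact copy of it, and an unrelated one.
target = [0, 0, 0, 1, 1, 1]
columns = {
    "f_relevant": [1, 2, 1, 5, 6, 5],
    "f_redundant": [1, 2, 1, 5, 6, 5],  # duplicate of f_relevant, adds no information
    "f_noise": [3, 1, 2, 3, 1, 2],      # uncorrelated with the class
}

selected, merit = cfs_forward_select(columns, target)
print(selected, round(merit, 4))
```

In this toy case the duplicate feature does not raise the merit and the unrelated feature lowers it, so only one feature is kept. A wrapper approach would instead score each candidate subset by training the classifier itself (for example Naive Bayes or J48) and measuring its cross-validated accuracy, which is usually more expensive than a filter score.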

