Breast Cancer Prediction Using Spark MLlib and ML Packages

P. D. Hung, Tran Duc Hanh, V. Diep
{"title":"Breast Cancer Prediction Using Spark MLlib and ML Packages","authors":"P. D. Hung, Tran Duc Hanh, V. Diep","doi":"10.1145/3309129.3309133","DOIUrl":null,"url":null,"abstract":"Nowadays, Machine Learning has been applied in variety aspects of life especially in health care. Classifications using Machine learning has been greatly improved in order to make predictions and to support doctors making diagnoses. Furthermore, human lives are changing with Big Data covering a wide of array of science knowledge and with Data Mining solving problems by analyzing data and discovering patterns in present databases. The prediction process is heavily data driven and therefore advanced machine learning techniques are often utilized. In this paper, we will take a look at what types experiment data are typically used, do preliminary analysis on them, and generate breast cancer prediction models - all with PySpark and its machine learning frameworks. Using a database with more than a hundred sets of data gathered in routine blood analysis, the accuracy rates of detection and classification are about 72% and 83% respectively.","PeriodicalId":326530,"journal":{"name":"Proceedings of the 5th International Conference on Bioinformatics Research and Applications","volume":"81 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th International Conference on Bioinformatics Research and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3309129.3309133","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23

Abstract

Nowadays, Machine Learning has been applied in variety aspects of life especially in health care. Classifications using Machine learning has been greatly improved in order to make predictions and to support doctors making diagnoses. Furthermore, human lives are changing with Big Data covering a wide of array of science knowledge and with Data Mining solving problems by analyzing data and discovering patterns in present databases. The prediction process is heavily data driven and therefore advanced machine learning techniques are often utilized. In this paper, we will take a look at what types experiment data are typically used, do preliminary analysis on them, and generate breast cancer prediction models - all with PySpark and its machine learning frameworks. Using a database with more than a hundred sets of data gathered in routine blood analysis, the accuracy rates of detection and classification are about 72% and 83% respectively.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用Spark MLlib和ML包进行乳腺癌预测
如今,机器学习已经应用于生活的各个方面,特别是在医疗保健方面。为了做出预测和支持医生做出诊断,使用机器学习的分类已经得到了很大的改进。此外,大数据涵盖了广泛的科学知识,而数据挖掘通过分析数据和发现现有数据库中的模式来解决问题,人类的生活正在发生变化。预测过程在很大程度上是数据驱动的,因此经常使用先进的机器学习技术。在本文中,我们将看看通常使用什么类型的实验数据,对它们进行初步分析,并生成乳腺癌预测模型-所有这些都使用PySpark及其机器学习框架。利用百余组血常规分析数据的数据库,检测准确率约72%,分类准确率约83%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Proceedings of the 5th International Conference on Bioinformatics Research and Applications A Study on Optimizing MarkDuplicate in Genome Sequencing Pipeline Biotable: A Tool to Extract Semantic Structure of Table in Biology Literature Screening Feasibility and Comparison of Deep Artificial Neural Networks Algorithms for Classification of Skin Lesions Identification of Viable Embryos Using Deep Learning for Medical Image
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1