Target classification using machine learning approaches with applications to clinical studies

Chen Qian, Jayesh P. Rai, Jianmin Pan, A. Bhatnagar, C. McClain, S. Rai
Journal: Biometrics & Biostatistics International Journal
Published: 2020-05-02
DOI: 10.15406/bbij.2020.09.00305 (https://doi.org/10.15406/bbij.2020.09.00305)
Citations: 1

Abstract

Machine learning is a trending topic, and almost every research area seeks to incorporate some of its techniques. In this paper, we demonstrate several machine learning models on two data sets. The first is thermogram time-series data from a cancer study conducted at the University of Louisville Hospital; the second comes from the world-renowned Framingham Heart Study. Thermograms can be used to determine a patient's health status, yet the difficulty of analyzing such high-dimensional data means they are rarely applied, especially in cancer research. Previously, Rai et al. [1] proposed an approach for data reduction and compared a parametric method, a non-parametric method (KNN), and a semiparametric method (DTW-KNN) for group classification. They concluded that two-group classification performs better than three-group classification. In addition, classification between types of cancer is somewhat challenging. The Framingham Heart Study is a well-known longitudinal data set that includes risk factors that could potentially lead to heart disease. Previously, Weng et al. [2] and Alaa et al. [3] concluded that machine learning could significantly improve the accuracy of cardiovascular risk prediction. Since the original Framingham data have been thoroughly analyzed, it is of interest to see how machine learning models could improve prediction. In this manuscript, we further analyze both the thermogram and the Framingham Heart Study data sets with several learning models, including gradient boosting, neural networks, and random forests, using SAS Visual Data Mining and Machine Learning on SAS Viya. Each method is briefly discussed along with a model comparison. We select the best learning model based on Youden's index and the misclassification rate. For big data inference, SAS Visual Data Mining and Machine Learning on SAS Viya, a cloud computing and structured statistical solution, may become a choice of computing.
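The two selection criteria named in the abstract can be illustrated with a small sketch. Youden's index is J = sensitivity + specificity - 1, and the misclassification rate is the share of wrongly classified cases. The code below is only an illustration in Python, not the authors' SAS Viya workflow: the confusion-matrix counts are hypothetical, the names `Confusion` and `select_best` are invented here, and the rule for combining the two criteria (highest J first, lowest misclassification rate as a tie-breaker) is an assumption, since the abstract does not state how they are weighed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Binary confusion-matrix counts for one model on a validation set."""
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives


def youden_index(c: Confusion) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return sensitivity + specificity - 1.0


def misclassification_rate(c: Confusion) -> float:
    """Fraction of all cases that the model labels incorrectly."""
    total = c.tp + c.fn + c.fp + c.tn
    return (c.fp + c.fn) / total


def select_best(models: Dict[str, Confusion]) -> str:
    """Pick the model with the highest J; break ties by the lowest
    misclassification rate (an assumed rule, not taken from the paper)."""
    return max(
        models,
        key=lambda name: (
            youden_index(models[name]),
            -misclassification_rate(models[name]),
        ),
    )


# Hypothetical counts for illustration only; these are not results from the paper.
models = {
    "gradient_boosting": Confusion(tp=40, fn=10, fp=5, tn=45),
    "neural_network": Confusion(tp=35, fn=15, fp=10, tn=40),
    "random_forest": Confusion(tp=38, fn=12, fp=8, tn=42),
}

best = select_best(models)
for name, c in models.items():
    print(f"{name}: J={youden_index(c):.2f}, "
          f"misclassification={misclassification_rate(c):.2f}")
print("selected:", best)
```

Youden's index rewards a model only when it does well on both classes, which matters for imbalanced clinical outcomes where a low misclassification rate alone can hide poor sensitivity.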