Target classification using machine learning approaches with applications to clinical studies

Biometrics & biostatistics international journal. Pub Date: 2020-05-02. DOI: 10.15406/bbij.2020.09.00305

Chen Qian, Jayesh P. Rai, Jianmin Pan, A. Bhatnagar, C. McClain, S. Rai


Citations: 1



Target classification using machine learning approaches with applications to clinical studies

Machine learning is a trending topic, and almost every research area now seeks to incorporate some of its techniques. In this paper, we demonstrate several machine learning models on two data sets. The first is thermogram time series data from a cancer study conducted at the University of Louisville Hospital; the second comes from the world-renowned Framingham Heart Study. Thermograms can be used to determine a patient’s health status, yet the difficulty of analyzing such high-dimensional data means they are rarely applied, especially in cancer research. Previously, Rai et al. [1] proposed an approach for data reduction and compared a parametric method, a non-parametric method (KNN), and a semiparametric method (DTW-KNN) for group classification. They concluded that two-group classification performs better than three-group classification. In addition, classification between types of cancer is somewhat challenging. The Framingham Heart Study is a well-known longitudinal study whose data include risk factors that could potentially lead to heart disease. Previously, Weng et al. [2] and Alaa et al. [3] concluded that machine learning could significantly improve the accuracy of cardiovascular risk prediction. Since the original Framingham data have been thoroughly analyzed, it is of interest to see how machine learning models could improve prediction. In this manuscript, we further analyze both the thermogram and the Framingham Heart Study datasets with several learning models, including gradient boosting, neural networks, and random forests, using SAS Visual Data Mining and Machine Learning on SAS Viya. Each method is briefly discussed along with a model comparison. We select the best learning model based on Youden’s index and the misclassification rate. For big data inference, SAS Visual Data Mining and Machine Learning on SAS Viya, a cloud-based, structured statistical solution, may become a computing option.
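The two selection criteria named in the abstract can be illustrated with a short sketch. Youden's index is J = sensitivity + specificity - 1, and the misclassification rate is the share of wrong predictions. The Python below is only an illustration under stated assumptions: the study itself used SAS Viya, the function names are hypothetical, labels are assumed to be coded 0/1, and the rule of ranking by J first and breaking ties on the misclassification rate is an assumption, since the abstract does not say how the two criteria are combined.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_index(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of observations whose predicted label is wrong."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (fp + fn) / (tp + fp + tn + fn)


def select_best(y_true: Sequence[int], predictions: Dict[str, Sequence[int]]) -> str:
    """Pick the model with the highest J; ties go to the lower error rate.

    The tie-breaking rule is an assumption made for this sketch.
    """
    return max(
        predictions,
        key=lambda name: (
            youden_index(y_true, predictions[name]),
            -misclassification_rate(y_true, predictions[name]),
        ),
    )


if __name__ == "__main__":
    # Made-up labels and predictions, for illustration only.
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    candidates = {
        "model_a": [1, 1, 1, 0, 0, 0, 0, 1],
        "model_b": [1, 1, 0, 0, 0, 0, 1, 1],
    }
    for name, pred in candidates.items():
        print(name, youden_index(truth, pred), misclassification_rate(truth, pred))
    print("selected:", select_best(truth, candidates))
```

A model that predicts no better than chance has J near 0 and a perfect classifier has J = 1. The two criteria can disagree when the classes are imbalanced, which is one reason to report both.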
