Statistical Machine Learning Approaches to Liver Disease Prediction

Livers. Pub Date: 2021-12-01. DOI: 10.3390/livers1040023
Fahad B. Mostafa, E. Hasan, Morgan R. Williamson, Hafiz T A Khan
Citations: 14

Abstract

Medical diagnoses have important implications for patient care, research, and policy. To reach a diagnosis, health professionals use a range of pathological methods and interpret medical reports in light of each patient's condition. Clinicians have recently been actively engaged in improving medical diagnoses, and the use of artificial intelligence and machine learning (ML) in combination with clinical findings has further improved disease detection. Modern computing makes it possible to collect data and to visualize patterns that are otherwise hidden, such as the extent of missing data in medical research. Statistical ML algorithms chosen for a specific problem can support decision making, and data-driven ML algorithms can be used to validate existing methods and to help researchers reach new conclusions. The purpose of this study was to extract significant predictors of liver disease from the medical records of 615 individuals using ML algorithms. Data visualizations were used to reveal notable features of the data, such as missing values. Multiple imputation by chained equations (MICE) was applied to fill in missing data points, and principal component analysis (PCA) was used to reduce the dimensionality. Variable importance ranking using the Gini index was used to verify the significant predictors obtained from the PCA. The data were divided into a training set (n_train = 399) for learning and a testing set (n_test = 216) for predicting classifications. The study compared binary ML classifiers (artificial neural network, random forest (RF), and support vector machine) on a published liver disease data set to classify individuals with liver disease, with the aim of helping health professionals make a better diagnosis. The synthetic minority oversampling technique (SMOTE) was applied to oversample the minority class and to limit overfitting. The RF achieved an accuracy of 98.14%, significantly higher (p < 0.001) than the other methods. This suggests that ML methods can predict liver disease from its risk factors, which may improve the inference-based diagnosis of patients.
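
Three of the steps named in the abstract can be illustrated with a short, dependency-free Python sketch: the 399/216 split of the 615 records, oversampling of the minority class in the style of SMOTE, and the Gini impurity that underlies Gini-based variable importance. This is not the authors' code. The function names (split_indices, smote, gini_impurity), the random shuffling, the seed, the neighbour count k, and the toy points are all assumptions made for illustration; the abstract does not say how the split was drawn or which SMOTE settings were used.

```python
import math
import random


def split_indices(n_total, n_train, seed=0):
    """Shuffle row indices and cut them into training and testing parts.

    Assumption: a simple random split; the abstract only reports the sizes.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


# Split sizes reported in the abstract: 615 records, 399 for training.
train_idx, test_idx = split_indices(615, 399)
print(len(train_idx), len(test_idx))  # 399 216

# Toy minority class (hypothetical values, not data from the study).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(toy_minority, n_new=6, k=2)
print(len(new_points))  # 6

# A perfectly mixed binary node has impurity 0.5; a pure node has 0.0.
print(gini_impurity([0, 0, 1, 1]), gini_impurity([1, 1, 1]))  # 0.5 0.0
```

In a random forest, Gini-based importance ranks a variable by how much the splits on it reduce this impurity, averaged over the trees. Oversampling is normally applied to the training part only, so that the testing part still reflects the original class balance; the abstract does not state at which stage it was applied in the study.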