Attribute Reduction and Decision Tree Pruning to Simplify Liver Fibrosis Prediction Algorithms: A Cohort Study

M. Mabrouk, Abubakr Awad, H. Shousha, Wafaa Alake, A. Salama, T. Awad
DOI: 10.5121/CSIT.2019.90927 (https://doi.org/10.5121/CSIT.2019.90927)
Venue: 9th International Conference on Computer Science, Engineering and Applications (CCSEA 2019)
Published: 2019-07-13
Citations: 1

Abstract

Background: Assessing liver fibrosis is essential for therapeutic decisions and prognostic evaluation in chronic hepatitis. Liver biopsy is considered the definitive investigation for staging liver fibrosis, but it has several limitations. FIB-4 and APRI also have limited accuracy. The National Committee for Control of Viral Hepatitis (NCCVH) in Egypt has supplied a valuable pool of electronic patient data that data mining techniques can analyze to uncover hidden patterns and trends, supporting the development of predictive algorithms.

Aim: To collaborate with physicians to develop a novel, reliable, easy-to-comprehend noninvasive model that predicts the stage of liver fibrosis from routine workup, without imposing extra costs for additional examinations, especially in areas with limited resources such as Egypt.

Methods: This multicenter retrospective study included baseline demographic, laboratory, and histopathological data of 69106 patients with chronic hepatitis C. We started with data collection, preprocessing, cleansing, and formatting for knowledge discovery from Electronic Health Records (EHRs). Data mining was used to build a decision tree (Reduced Error Pruning tree, REP tree) with 10-fold internal cross-validation. Histopathology results were used to assess accuracy for fibrosis stages. Machine learning feature selection and reduction (CfsSubsetEval / best first) reduced the initial input features (N=15) to the most relevant ones (N=6) for developing the prediction model.

Results: In this study, 32419 patients had F(0-1), 25073 had F(2), and 11615 had F(3-4). Revalidation of FIB-4 and APRI in our cohort showed low accuracy and high discordance with biopsy results, with overall AUC of 0.68 and 0.58, respectively. Out of 15 attributes, machine learning selected age, AFP, AST, glucose, albumin, and platelets as the most relevant. The REP tree achieved an overall classification accuracy of up to 70% and a ROC area of 0.74, which was almost unaffected by attribute reduction and pruning. However, attribute reduction and tree pruning produced a simpler model that is easier for physicians to understand and takes less time to execute.

Conclusion: In this study, we had the chance to study a large cohort of 69106 chronic hepatitis patients with available liver biopsy results to revise and validate the accuracy of FIB-4 and APRI. The study represents a collaboration between computer scientists and hepatologists to provide clinicians with an accurate, novel, reliable, and noninvasive model to predict the stage of liver fibrosis.
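
The abstract revalidates FIB-4 and APRI but does not restate how they are computed. As a reminder, the sketch below uses the standard published definitions of the two scores; it is not taken from the paper. The function names, the example patient, and the default AST upper limit of normal (40 U/L) are assumptions for illustration. Note that FIB-4 needs ALT, which is not among the six attributes the abstract says were selected for the tree.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT)).

    Units: age in years, AST and ALT in U/L, platelets in 10^9/L.
    """
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, platelets_1e9_l, ast_uln_u_l=40.0):
    """APRI = (AST / AST upper limit of normal) x 100 / platelets.

    The default upper limit of normal (40 U/L) is an assumption;
    laboratories use different reference values.
    """
    return (ast_u_l / ast_uln_u_l) * 100.0 / platelets_1e9_l


if __name__ == "__main__":
    # Hypothetical patient, not data from the study.
    print(f"FIB-4: {fib4(50, 80, 64, 150):.2f}")
    print(f"APRI:  {apri(80, 150):.2f}")
```

Both scores rely only on routine laboratory values, which is why they are the natural baselines for a model built from routine workup.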