A Data-Driven Method for Selecting Optimal Models Based on Graphical Visualisation of Differences in Sequentially Fitted ROC Model Parameters

Q2 Computer Science Data Science Journal Pub Date : 2013-05-02 DOI:10.2481/dsj.WDS-045
Kassim S. Mwitondi, R. Moustafa, A. Hadi
{"title":"A Data-Driven Method for Selecting Optimal Models Based on Graphical Visualisation of Differences in Sequentially Fitted ROC Model Parameters","authors":"Kassim S. Mwitondi, R. Moustafa, A. Hadi","doi":"10.2481/dsj.WDS-045","DOIUrl":null,"url":null,"abstract":"Differences in modelling techniques and model performance assessments typically impinge on the quality of knowledge extraction from data. We propose an algorithm for determining optimal patterns in data by separately training and testing three decision tree models in the Pima Indians Diabetes and the Bupa Liver Disorders datasets. Model performance is assessed using ROC curves and the Youden Index. Moving differences between sequential fitted parameters are then extracted, and their respective probability density estimations are used to track their variability using an iterative graphical data visualisation technique developed for this purpose. Our results show that the proposed strategy separates the groups more robustly than the plain ROC/Youden approach, eliminates obscurity, and minimizes over-fitting. Further, the algorithm can easily be understood by non-specialists and demonstrates multi-disciplinary compliance.","PeriodicalId":35375,"journal":{"name":"Data Science Journal","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Science Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2481/dsj.WDS-045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 8

Abstract

Differences in modelling techniques and model performance assessments typically impinge on the quality of knowledge extraction from data. We propose an algorithm for determining optimal patterns in data by separately training and testing three decision tree models in the Pima Indians Diabetes and the Bupa Liver Disorders datasets. Model performance is assessed using ROC curves and the Youden Index. Moving differences between sequential fitted parameters are then extracted, and their respective probability density estimations are used to track their variability using an iterative graphical data visualisation technique developed for this purpose. Our results show that the proposed strategy separates the groups more robustly than the plain ROC/Youden approach, eliminates obscurity, and minimizes over-fitting. Further, the algorithm can easily be understood by non-specialists and demonstrates multi-disciplinary compliance.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一种基于序列拟合ROC模型参数差异图形可视化的数据驱动选择最优模型的方法
建模技术和模型性能评估的差异通常会影响从数据中提取知识的质量。我们提出了一种算法,通过在皮马印第安人糖尿病和Bupa肝脏疾病数据集中分别训练和测试三种决策树模型来确定数据中的最佳模式。使用ROC曲线和约登指数评估模型性能。然后提取序列拟合参数之间的移动差异,并使用为此目的开发的迭代图形数据可视化技术,使用它们各自的概率密度估计来跟踪它们的可变性。我们的结果表明,所提出的策略比普通的ROC/Youden方法更稳健地分离组,消除了模糊性,并最大限度地减少了过度拟合。此外,该算法可以很容易地被非专业人员理解,并表现出多学科的遵从性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Data Science Journal
Data Science Journal Computer Science-Computer Science (miscellaneous)
CiteScore
5.40
自引率
0.00%
发文量
17
审稿时长
10 weeks
期刊介绍: The Data Science Journal is a peer-reviewed electronic journal publishing papers on the management of data and databases in Science and Technology. Details can be found in the prospectus. The scope of the journal includes descriptions of data systems, their publication on the internet, applications and legal issues. All of the Sciences are covered, including the Physical Sciences, Engineering, the Geosciences and the Biosciences, along with Agriculture and the Medical Science. The journal publishes papers about data and data systems; it does not publish data or data compilations. However it may publish papers about methods of data compilation or analysis.
期刊最新文献
Data on the Margins – Data from LGBTIQ+ Populations in European Social Science Data Archives Insights on Sustainability of Earth Science Data Infrastructure Projects Using OpenBIS as Virtual Research Environment: An ELN-LIMS Open-Source Database Tool as a Framework within the CRC 1411 Design of Particulate Products Umbrella Data Management Plans to Integrate FAIR Data: Lessons From the ISIDORe and BY-COVID Consortia for Pandemic Preparedness The Launch of the <em>Data Science Journal</em>&nbsp;in 2002
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1