Zoish: A Novel Feature Selection Approach Leveraging Shapley Additive Values for Machine Learning Applications in Healthcare

Hossein Javedani Sadaei, Salvatore Loguercio, Mahdi Shafiei Neyestanak, Ali Torkamani, Daria Prilutsky
Published in: Pacific Symposium on Biocomputing, 2023-12-15
DOI: 10.1142/9789811286421_0007 (https://doi.org/10.1142/9789811286421_0007)

Abstract

In the intricate landscape of healthcare analytics, effective feature selection is a prerequisite for generating robust predictive models, especially given the common challenges of sample sizes and potential biases. Zoish uniquely addresses these issues by employing Shapley additive values—an idea rooted in cooperative game theory—to enable both transparent and automated feature selection. Unlike existing tools, Zoish is versatile, designed to seamlessly integrate with an array of machine learning libraries including scikit-learn, XGBoost, CatBoost, and imbalanced-learn. The distinct advantage of Zoish lies in its dual algorithmic approach for calculating Shapley values, allowing it to efficiently manage both large and small datasets. This adaptability renders it exceptionally suitable for a wide spectrum of healthcare-related tasks. The tool also places a strong emphasis on interpretability, providing comprehensive visualizations for analyzed features. Its customizable settings offer users fine-grained control over feature selection, thus optimizing for specific predictive objectives. This manuscript elucidates the mathematical framework underpinning Zoish and how it uniquely combines local and global feature selection into a single, streamlined process. To validate Zoish’s efficiency and adaptability, we present case studies in breast cancer prediction and Montreal Cognitive Assessment (MoCA) prediction in Parkinson’s disease, along with evaluations on 300 synthetic datasets. These applications underscore Zoish’s unparalleled performance in diverse healthcare contexts and against its counterparts.
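The idea of combining local and global Shapley-based feature selection can be illustrated with a minimal sketch. This is not Zoish's API or the authors' algorithm: every name below (`shapley_values`, `make_value`, `select_features`, `toy_model`) is hypothetical, the model is a toy linear function, and "absent" features are assumed to be replaced by their column mean. It computes exact Shapley values per sample (local), averages their absolute values across samples (global), and keeps the top-ranked features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a set function `value` over feature indices."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def make_value(model, x, baseline):
    """Coalition value: model output with absent features set to the baseline.

    Assumption for this sketch: 'absent' means 'replaced by the column mean'.
    """
    def value(subset):
        mixed = [x[j] if j in subset else baseline[j] for j in range(len(x))]
        return model(mixed)
    return value


def select_features(model, rows, top_k):
    """Rank features by mean absolute local Shapley value and keep the top_k."""
    n = len(rows[0])
    baseline = [sum(r[j] for r in rows) / len(rows) for j in range(n)]
    # Local explanations: one Shapley vector per sample
    local = [shapley_values(make_value(model, r, baseline), n) for r in rows]
    # Global importance: average magnitude of the local values
    importance = [sum(abs(v[j]) for v in local) / len(local) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: importance[j], reverse=True)
    return ranked[:top_k], importance, local


def toy_model(v):
    """Hypothetical model: feature 1 has no effect on the output."""
    return 3.0 * v[0] + 0.0 * v[1] - 1.0 * v[2]


rows = [[1.0, 5.0, 2.0], [2.0, 1.0, 0.0], [3.0, 3.0, 4.0]]
selected, importance, local = select_features(toy_model, rows, top_k=2)
print(selected)
```

Exact enumeration costs on the order of 2^n model evaluations per sample, so it is only practical for a handful of features; real tools rely on approximations or model-specific algorithms, which is presumably the motivation for the dual algorithmic approach the abstract mentions.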