Gaussian process regression and classification using International Classification of Disease codes as covariates

IF 0.7 4区 数学 Q3 STATISTICS & PROBABILITY Stat Pub Date : 2023-10-07 DOI:10.1002/sta4.618
Sanvesh Srivastava, Zongyi Xu, Yunyi Li, W. Nick Street, Stephanie Gilbertson-White
{"title":"Gaussian process regression and classification using International Classification of Disease codes as covariates","authors":"Sanvesh Srivastava, Zongyi Xu, Yunyi Li, W. Nick Street, Stephanie Gilbertson-White","doi":"10.1002/sta4.618","DOIUrl":null,"url":null,"abstract":"In electronic health records (EHRs) data analysis, nonparametric regression and classification using International Classification of Disease (ICD) codes as covariates remain understudied. Automated methods have been developed over the years for predicting biomedical responses using EHRs, but relatively less attention has been paid to developing patient similarity measures that use ICD codes and chronic conditions, where a chronic condition is defined as a set of ICD codes. We address this problem by first developing a string kernel function for measuring the similarity between a pair of primary chronic conditions, represented as subsets of ICD codes. Second, we extend this similarity measure to a family of covariance functions on subsets of chronic conditions. This family is used in developing Gaussian process (GP) priors for Bayesian nonparametric regression and classification using diagnoses and other demographic information as covariates. Markov chain Monte Carlo (MCMC) algorithms are used for posterior inference and predictions. The proposed methods are tuning free, so they are ideal for automated prediction of biomedical responses depending on chronic conditions. We evaluate the practical performance of our method on EHR data collected from 1660 patients at the University of Iowa Hospitals and Clinics (UIHC) with six different primary cancer sites. Our method provides better sensitivity and specificity than its competitors in classifying different primary cancer sites and estimates the marginal associations between chronic conditions and primary cancer sites.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"27 1","pages":""},"PeriodicalIF":0.7000,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Stat","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/sta4.618","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

Abstract

In electronic health records (EHRs) data analysis, nonparametric regression and classification using International Classification of Disease (ICD) codes as covariates remain understudied. Automated methods have been developed over the years for predicting biomedical responses using EHRs, but relatively less attention has been paid to developing patient similarity measures that use ICD codes and chronic conditions, where a chronic condition is defined as a set of ICD codes. We address this problem by first developing a string kernel function for measuring the similarity between a pair of primary chronic conditions, represented as subsets of ICD codes. Second, we extend this similarity measure to a family of covariance functions on subsets of chronic conditions. This family is used in developing Gaussian process (GP) priors for Bayesian nonparametric regression and classification using diagnoses and other demographic information as covariates. Markov chain Monte Carlo (MCMC) algorithms are used for posterior inference and predictions. The proposed methods are tuning free, so they are ideal for automated prediction of biomedical responses depending on chronic conditions. We evaluate the practical performance of our method on EHR data collected from 1660 patients at the University of Iowa Hospitals and Clinics (UIHC) with six different primary cancer sites. Our method provides better sensitivity and specificity than its competitors in classifying different primary cancer sites and estimates the marginal associations between chronic conditions and primary cancer sites.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用国际疾病分类代码作为协变量的高斯过程回归和分类
在电子健康记录(EHRs)数据分析中,使用国际疾病分类(ICD)代码作为协变量的非参数回归和分类仍未得到充分研究。多年来,人们已经开发出自动化方法,利用电子病历预测生物医学反应,但相对较少关注使用ICD代码和慢性病开发患者相似性测量,其中慢性病被定义为一组ICD代码。为了解决这个问题,我们首先开发了一个字符串核函数,用于测量一对主要慢性疾病之间的相似性,表示为ICD代码的子集。其次,我们将这种相似性度量扩展到慢性病子集上的协方差函数族。该家族用于开发高斯过程(GP)先验,用于贝叶斯非参数回归和分类,使用诊断和其他人口统计信息作为协变量。马尔可夫链蒙特卡罗(MCMC)算法用于后验推理和预测。所提出的方法是免费调整的,因此它们是根据慢性疾病自动预测生物医学反应的理想选择。我们对来自爱荷华大学医院和诊所(UIHC)的6个不同原发癌症部位的1660名患者的电子病历数据进行了评估。我们的方法在分类不同的原发癌部位和估计慢性疾病与原发癌部位之间的边际关联方面提供了比其竞争对手更好的敏感性和特异性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Stat
Stat Decision Sciences-Statistics, Probability and Uncertainty
CiteScore
1.10
自引率
0.00%
发文量
85
期刊介绍: Stat is an innovative electronic journal for the rapid publication of novel and topical research results, publishing compact articles of the highest quality in all areas of statistical endeavour. Its purpose is to provide a means of rapid sharing of important new theoretical, methodological and applied research. Stat is a joint venture between the International Statistical Institute and Wiley-Blackwell. Stat is characterised by: • Speed - a high-quality review process that aims to reach a decision within 20 days of submission. • Concision - a maximum article length of 10 pages of text, not including references. • Supporting materials - inclusion of electronic supporting materials including graphs, video, software, data and images. • Scope - addresses all areas of statistics and interdisciplinary areas. Stat is a scientific journal for the international community of statisticians and researchers and practitioners in allied quantitative disciplines.
期刊最新文献
Communication‐Efficient Distributed Estimation of Causal Effects With High‐Dimensional Data A Joint Temporal Model for Hospitalizations and ICU Admissions Due to COVID‐19 in Quebec Bitcoin Price Prediction Using Deep Bayesian LSTM With Uncertainty Quantification: A Monte Carlo Dropout–Based Approach Exact interval estimation for three parameters subject to false positive misclassification Novel Closed‐Form Point Estimators for a Weighted Exponential Family Derived From Likelihood Equations
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1