Treatment Discontinuation Prediction in Patients With Diabetes Using a Ranking Model: Machine Learning Model Development

JMIR bioinformatics and biotechnology | Pub Date: 2022-09-23 | DOI: 10.2196/37951

Hisashi Kurasawa, Kayo Waki, Akihiro Chiba, Tomohisa Seki, Katsuyoshi Hayashi, Akinori Fujino, Tsuneyuki Haga, Takashi Noguchi, Kazuhiko Ohe


Citations: 0

Abstract



Treatment Discontinuation Prediction in Patients With Diabetes Using a Ranking Model: Machine Learning Model Development.

Background: Treatment discontinuation (TD) is one of the major prognostic issues in diabetes care. Several binary classification models have been proposed to predict a missed appointment that may lead to TD in patients with diabetes, with the aim of detecting TD early and providing intervention support for patients. However, because binary classification models output the probability of a missed appointment occurring within a predetermined period, they are limited in their ability to estimate the magnitude of TD risk in patients with inconsistent intervals between appointments. This makes it difficult to prioritize the patients for whom intervention support should be provided.

Objective: This study aimed to develop a machine-learned prediction model that can output a TD risk score defined by the length of time until TD and prioritize patients for intervention according to their TD risk.

Methods: The model was developed with patients who had diagnostic codes indicative of diabetes at the University of Tokyo Hospital between September 3, 2012, and May 17, 2014. It was internally validated with patients from the same hospital from May 18, 2014, to January 29, 2016. The data used in this study included 7551 patients who visited the hospital after January 1, 2004, and had diagnostic codes indicative of diabetes; specifically, the data recorded in the electronic medical records between September 3, 2012, and January 29, 2016, were used. The main outcome was TD, defined as missing a scheduled clinical appointment and having no hospital visits within 3 times the patient's average number of days between visits and within 60 days. The TD risk score was calculated by using the parameters derived from the machine-learned ranking model. The prediction capacity was evaluated on test data by using the C-index for the performance of ranking patients, the area under the receiver operating characteristic curve and the area under the precision-recall curve for discrimination, and a calibration plot.
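The TD outcome definition in the Methods can be illustrated with a minimal Python sketch. The abstract does not spell out how the two windows combine, so this sketch assumes that "no visits within 3 times the average interval and within 60 days" means no visit inside the longer of the two windows. The function names, arguments, and dates are hypothetical and are not taken from the study.

```python
from datetime import date, timedelta


def mean_visit_interval_days(visits):
    """Average number of days between consecutive visits (needs >= 2 visits)."""
    ordered = sorted(visits)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def is_treatment_discontinuation(prior_visits, missed_appointment, later_visits):
    """Return True if a missed appointment counts as TD under the assumed reading.

    Assumption: the patient must have no visit within 3x their mean visit
    interval AND no visit within 60 days, i.e. no visit inside the longer
    of the two windows that start at the missed appointment.
    """
    window_days = max(3 * mean_visit_interval_days(prior_visits), 60)
    deadline = missed_appointment + timedelta(days=window_days)
    # Any visit after the missed appointment and up to the deadline rules out TD.
    return not any(missed_appointment < visit <= deadline for visit in later_visits)


# Hypothetical patient seen every 30 days, so the window is max(90, 60) = 90 days.
prior = [date(2013, 1, 1), date(2013, 1, 31), date(2013, 3, 2)]
missed = date(2013, 4, 1)
returned_in_time = is_treatment_discontinuation(prior, missed, [date(2013, 6, 15)])
never_returned = is_treatment_discontinuation(prior, missed, [date(2013, 8, 1)])
```

Because the window scales with each patient's own visit rhythm, a patient seen weekly falls back to the 60-day floor, while a patient seen every 30 days gets a 90-day window.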

Results: The means (95% confidence limits) of the C-index, area under the receiver operating characteristic curve, and area under the precision-recall curve for the TD risk score were 0.749 (0.655, 0.823), 0.758 (0.649, 0.857), and 0.713 (0.554, 0.841), respectively. In the calibration plots, the observed and predicted probabilities were correlated.
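For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index in plain Python. The abstract does not state which C-index variant or software the authors used, so this is an illustration of the general idea only: among comparable patient pairs, it measures how often the patient who reached TD earlier also received the higher risk score. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, scores):
    """Harrell's C-index for risk scores, where a higher score means higher risk.

    times  : days until TD or until the end of follow-up (censoring)
    events : 1 if TD was observed, 0 if the patient was censored
    scores : predicted risk scores
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable when patient i had an observed TD
            # strictly earlier than patient j's TD or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0  # earlier TD was ranked as riskier
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four hypothetical patients, the third one censored at day 30.
toy_times = [10, 20, 30, 40]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_scores)
```

In the toy data, 4 of the 5 comparable pairs are ordered correctly, giving a C-index of 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.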

Conclusions: A TD risk score was developed for patients with diabetes by combining a machine-learned method with electronic medical records. The score calculation can be integrated into medical records to identify patients at high risk of TD, which would be useful in supporting diabetes care and preventing TD.

