PLOS digital health: latest articles

Inferring gender from first names: Comparing the accuracy of Genderize, Gender API, and the gender R package on authors of diverse nationality.
Pub Date : 2024-10-29 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000456
Alexander D VanHelene, Ishaani Khatri, C Beau Hilton, Sanjay Mishra, Ece D Gamsiz Uzun, Jeremy L Warner

Meta-researchers commonly leverage tools that infer gender from first names, especially when studying gender disparities. However, tools vary in their accuracy, ease of use, and cost. The objective of this study was to compare the accuracy and cost of the commercial software Genderize and Gender API, and the open-source gender R package. Differences in binary gender prediction accuracy between the three services were evaluated. Gender prediction accuracy was tested on a multinational dataset of 32,968 gender-labeled clinical trial authors. Additionally, two datasets from previous studies with 5,779 and 6,131 names, respectively, were re-evaluated with modern implementations of Genderize and Gender API. The gender inference accuracy of Genderize and Gender API was compared, both with and without supplying trialists' country of origin in the API call. The accuracy of the gender R package was only evaluated without supplying countries of origin. The accuracy of Genderize, Gender API, and the gender R package was defined as the percentage of correct gender predictions. Accuracy differences between methods were evaluated using McNemar's test. Genderize and Gender API demonstrated 96.6% and 96.1% accuracy, respectively, when countries of origin were not supplied in the API calls. Genderize and Gender API achieved the highest accuracy when predicting the gender of German authors, with accuracies greater than 98%. Genderize and Gender API were least accurate with South Korean, Chinese, Singaporean, and Taiwanese authors, demonstrating below 82% accuracy. Genderize can provide similar accuracy to Gender API while being 4.85 times less expensive. The gender R package achieved below 86% accuracy on the full dataset. In the replication studies, Genderize and Gender API demonstrated better performance than in the original publications. Our results indicate that Genderize and Gender API achieve similar accuracy on a multinational dataset. The gender R package is uniformly less accurate than Genderize and Gender API.
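
The two quantities the abstract relies on, accuracy as the percentage of correct predictions and McNemar's test on paired predictions, can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say whether the exact or the chi-square form of McNemar's test was used, so the exact binomial form is assumed here, and the function names and toy labels are hypothetical.

```python
from math import comb


def accuracy(predictions, labels):
    """Percentage of predictions that match the reference labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def mcnemar_exact(pred_a, pred_b, labels):
    """Exact two-sided McNemar test on the discordant pairs.

    b = names tool A got right and tool B got wrong,
    c = names tool A got wrong and tool B got right.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    b = sum(a == y and o != y for a, o, y in zip(pred_a, pred_b, labels))
    c = sum(a != y and o == y for a, o, y in zip(pred_a, pred_b, labels))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the tools never disagree on correctness
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data (not from the study): reference labels and two tools.
labels = ["F", "M", "F", "M", "F", "M", "F", "M"]
tool_a = ["F", "M", "F", "M", "F", "M", "F", "F"]
tool_b = ["F", "M", "M", "F", "M", "M", "F", "F"]

print(accuracy(tool_a, labels), accuracy(tool_b, labels))
print(mcnemar_exact(tool_a, tool_b, labels))
```

Only the discordant pairs enter the test, which is why two tools with close overall accuracy can still differ significantly on a dataset of this size.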

Journal : PLOS digital health Volume: 3 Issue: 10 Pages: e0000456 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11521266/pdf/
Citations: 0
Machine Learning For Risk Prediction After Heart Failure Emergency Department Visit or Hospital Admission Using Administrative Health Data.
Pub Date : 2024-10-25 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000636
Nowell M Fine, Sunil V Kalmady, Weijie Sun, Russ Greiner, Jonathan G Howlett, James A White, Finlay A McAlister, Justin A Ezekowitz, Padma Kaul

Aims: Patients visiting the emergency department (ED) or hospitalized for heart failure (HF) are at increased risk for subsequent adverse outcomes; however, effective risk stratification remains challenging. We utilized a machine-learning (ML)-based approach to identify HF patients at risk of adverse outcomes after an ED visit or hospitalization using a large regional administrative healthcare data system.

Methods and results: Patients visiting the ED or hospitalized with HF between 2002 and 2016 in Alberta, Canada, were included. Outcomes of interest were 30-day and 1-year HF-related ED visits, HF hospital readmission, or all-cause mortality. We applied a feature extraction method using deep feature synthesis from multiple sources of health data and compared the performance of a gradient boosting algorithm (CatBoost) with logistic regression modelling. The area under the receiver operating characteristic curve (AUC-ROC) was used to assess model performance. We included 50,630 patients with 93,552 HF ED visits/hospitalizations. At 30-day follow-up in the holdout validation cohort, the AUC-ROC for the combined endpoint of HF ED visit, HF hospital readmission, or death for the CatBoost and logistic regression models was 74.16 (73.18-75.11) versus 62.25 (61.25-63.18), respectively. At 1-year follow-up, corresponding values were 76.80 (76.1-77.47) versus 69.52 (68.77-70.26), respectively. AUC-ROC values for the endpoint of all-cause death alone at 30-day and 1-year follow-up were 83.21 (81.83-84.41) versus 69.53 (67.98-71.18), and 85.73 (85.14-86.29) versus 69.40 (68.57-70.26), for the CatBoost and logistic regression models, respectively.

Conclusions: ML-based modelling with deep feature synthesis provided superior risk stratification for HF patients at 30-day and 1-year follow-up after an ED visit or hospitalization using data from a large administrative regional healthcare system.
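
For readers unfamiliar with the metric, the AUC-ROC used above can be read as the probability that a randomly chosen patient who had the outcome receives a higher risk score than a randomly chosen patient who did not. The sketch below computes it directly from that definition. It is illustrative only: the function name and toy scores are hypothetical, and it returns a value between 0 and 1, whereas the abstract reports values multiplied by 100.

```python
def auc_roc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels are 1 (outcome occurred) or 0 (no outcome); ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores from some model, with observed outcomes.
outcomes = [1, 0, 1, 0, 0, 1]
risk = [0.9, 0.3, 0.6, 0.7, 0.2, 0.4]

print(round(100 * auc_roc(risk, outcomes), 2))
```

The pairwise form is quadratic in the number of patients, so for a cohort the size of the one described here a rank-based implementation from a statistics library would be the practical choice.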

Journal : PLOS digital health Volume: 3 Issue: 10 Pages: e0000636 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11508085/pdf/
Citations: 0
An integrative systematic review on interventions to improve layperson's ability to identify trustworthy digital health information.
Pub Date : 2024-10-25 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000638
Hind Mohamed, Esme Kittle, Nehal Nour, Ruba Hamed, Kaylem Feeney, Jon Salsberg, Dervla Kelly

Health information on the Internet has a ubiquitous influence on health consumers' behaviour. Searching and evaluating online health information poses a real challenge for many health consumers. To our knowledge, our systematic review paper is the first to explore the interventions targeting lay people to improve their e-health literacy skills. Our paper aims to explore interventions to improve laypeople's ability to identify trustworthy online health information. The search was conducted on Ovid Medline, Embase, Cochrane database, Academic Search Complete, and APA PsycInfo. Publications were selected by screening title, abstract, and full text, followed by manual review of the reference lists of selected publications. Data were extracted from eligible studies into an Excel sheet covering the types of interventions, the outcomes of the interventions and whether they were effective, and the barriers and facilitators for consumers using the interventions. A mixed-methods appraisal tool was used to appraise evidence from quantitative, qualitative, and mixed-methods studies. Whittemore and Knafl's integrative review approach was used as guidance for narrative synthesis. Twelve studies were included. Media literacy interventions are the most common type of intervention. Few studies measured the effect of the interventions on patient health outcomes. All the procedural and navigation/evaluation skills-building interventions are significantly effective. Computer/internet illiteracy and the absence of guidance/facilitators are significant barriers to web-based intervention use. Few interventions are distinguished by their implementation in a context tailored to consumers, using a human-centred design approach, and delivery through a partnership of multiple health stakeholders. There is potential for further research to understand how to improve consumers' health information use, focusing on collaborative learning, using human-centred approaches, and addressing the social determinants of health.

Journal : PLOS digital health Volume: 3 Issue: 10 Pages: e0000638 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11508166/pdf/
Citations: 0
Data as scientific currency: Challenges experienced by researchers with sharing health data in sub-Saharan Africa.
Pub Date : 2024-10-24 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000635
Jyothi Chabilall, Qunita Brown, Nezerith Cengiz, Keymanthri Moodley

Innovative information-sharing techniques and rapid access to stored research data as scientific currency have proved highly beneficial in healthcare and health research. Yet, researchers often experience conflict between data sharing to promote health-related scientific knowledge for the common good and their personal academic advancement. There is a scarcity of studies exploring the perspectives of health researchers in sub-Saharan Africa (SSA) regarding the challenges with data sharing in the context of data-intensive research. The study began with a quantitative survey and research, after which the researchers engaged in a qualitative study. This qualitative cross-sectional baseline study reports on the challenges faced by health researchers in terms of data sharing. In-depth interviews were conducted via Microsoft Teams between July 2022 and April 2023 with 16 health researchers from 16 different countries across SSA. We employed purposive and snowball sampling techniques to invite participants via email. The recorded interviews were transcribed, coded and analysed thematically using ATLAS.ti. Five recurrent themes and several subthemes emerged related to (1) individual researcher concerns (fears regarding data sharing, publication and manuscript pressure), (2) structural issues impacting data sharing, (3) recognition in academia (scooping of research data, acknowledgement and research incentives), (4) ethical challenges experienced by health researchers in SSA (confidentiality and informed consent, commercialisation and benefit sharing) and (5) legal lacunae (gaps in laws and regulations). Significant discomfort about data sharing exists amongst health researchers in this sample of respondents from SSA, resulting in a reluctance to share data despite acknowledging the scientific benefits of such sharing. This discomfort is related to the lack of adequate guidelines and governance processes in the context of health research collaborations, both locally and internationally. Consequently, concerns about ethical and legal issues are increasing. Resources are needed in SSA to improve the quality, value and veracity of data, as these are ethical imperatives. Strengthening data governance via robust guidelines, legislation and appropriate data sharing agreements will increase trust amongst health researchers and data donors alike.

Journal : PLOS digital health Volume: 3 Issue: 10 Pages: e0000635 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500889/pdf/
Citations: 0
Use of a continuous single lead electrocardiogram analytic to predict patient deterioration requiring rapid response team activation.
Pub Date : 2024-10-24 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000465
Sooin Lee, Bryce Benson, Ashwin Belle, Richard P Medlin, David Jerkins, Foster Goss, Ashish K Khanna, Michael A DeVita, Kevin R Ward

Identifying the onset of patient deterioration is challenging despite the potential to respond to patients earlier with better vital sign monitoring and rapid response team (RRT) activation. In this study, an ECG-based software as a medical device, the Analytic for Hemodynamic Instability Predictive Index (AHI-PI), was compared with the vital signs of heart rate, blood pressure, and respiratory rate to evaluate how early it indicated risk before an RRT activation. A higher proportion of the events had risk indication by AHI-PI (92.71%) than by vital signs (41.67%). AHI-PI indicated risk early, on average more than a day before RRT events. In events whose risks were indicated by both AHI-PI and vital signs, AHI-PI demonstrated earlier recognition of deterioration than vital signs. A case-control study showed that situations requiring RRTs were more likely to have AHI-PI risk indication than those that did not. The study derived several insights in support of AHI-PI's efficacy as a clinical decision support system. The findings demonstrated AHI-PI's potential to serve as a reliable predictor of future RRT events. It could potentially help clinicians recognize early clinical deterioration and respond to deterioration unnoticed by vital signs, thereby helping clinicians improve clinical outcomes.
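
The two summary figures in this abstract, the share of RRT events preceded by a risk indication and the average lead time of that indication, could be derived from event timestamps roughly as sketched below. This is an assumed data layout for illustration only: the function name, the tuple format, and the timestamps are hypothetical and do not come from the study.

```python
from datetime import datetime, timedelta


def lead_time_summary(events):
    """Summarise early-warning performance over RRT events.

    events: list of (rrt_time, first_alert_time) tuples, where
    first_alert_time is None when no risk was indicated beforehand.
    Returns (percentage of events flagged, mean lead time or None).
    """
    leads = [rrt - alert for rrt, alert in events
             if alert is not None and alert <= rrt]
    flagged_pct = 100.0 * len(leads) / len(events)
    mean_lead = sum(leads, timedelta()) / len(leads) if leads else None
    return flagged_pct, mean_lead


# Hypothetical events: one RRT activation per row.
rrt = datetime(2024, 1, 10, 12, 0)
toy_events = [
    (rrt, rrt - timedelta(hours=30)),
    (rrt, rrt - timedelta(hours=18)),
    (rrt, None),  # never flagged before the activation
    (rrt, rrt - timedelta(hours=24)),
]

print(lead_time_summary(toy_events))
```

Running the same summary once with alerts from the ECG analytic and once with alerts from vital-sign thresholds would give the kind of side-by-side comparison the abstract reports.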

Journal : PLOS digital health Volume: 3 Issue: 10 Pages: e0000465 Type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500862/pdf/
Citations: 0
Conceptualizing bias in EHR data: A case study in performance disparities by demographic subgroups for a pediatric obesity incidence classifier.
Pub Date : 2024-10-23 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000642
Elizabeth A Campbell, Saurav Bose, Aaron J Masino

Electronic Health Records (EHRs) are increasingly used to develop machine learning models in predictive medicine. There has been limited research on utilizing machine learning methods to predict childhood obesity and on related disparities in classifier performance among vulnerable patient subpopulations. In this work, classification models are developed to recognize pediatric obesity using temporal condition patterns obtained from patient EHR data in a U.S. study population. We trained four machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosted Trees, and Neural Networks) to classify cases and controls as obesity positive or negative, and optimized hyperparameter settings through a bootstrapping methodology. To assess the classifiers for bias, we studied model performance by population subgroups, then used permutation analysis to identify the most predictive features for each model and the demographic characteristics of patients with these features. Mean AUC-ROC values were consistent across classifiers, ranging from 0.72 to 0.80. Some evidence of bias was identified, although this was through the models performing better for minority subgroups (African Americans and patients enrolled in Medicaid). Permutation analysis revealed that patients from vulnerable population subgroups were over-represented among patients with the most predictive diagnostic patterns. We hypothesize that our models performed better on under-represented groups because the features more strongly associated with obesity were more commonly observed among minority patients. These findings highlight the complex ways that bias may arise in machine learning models and can be incorporated into future research to develop a thorough analytical approach to identify and mitigate bias that may arise from features and within EHR datasets when developing more equitable models.
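
Permutation analysis, as mentioned above, generally means shuffling one feature at a time and measuring how much a performance metric drops. The sketch below shows that general idea under stated assumptions: the abstract does not describe the authors' exact procedure, accuracy stands in for the metric, and the toy classifier, feature names (`dx_pattern`, `noise`), and data are hypothetical.

```python
import random


def fraction_correct(predictions, labels):
    """Share of predictions equal to the labels (0 to 1)."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def permutation_importance(predict, rows, labels, feature,
                           metric=fraction_correct, n_repeats=20, seed=0):
    """Mean drop in the metric after shuffling one feature column.

    predict: function mapping a row (dict) to a predicted label.
    A larger drop means the model leans more heavily on that feature.
    """
    rng = random.Random(seed)
    baseline = metric([predict(r) for r in rows], labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [{**r, feature: v} for r, v in zip(rows, column)]
        drops.append(baseline - metric([predict(s) for s in shuffled], labels))
    return sum(drops) / n_repeats


# Hypothetical toy classifier that only looks at one diagnostic pattern.
rows = [{"dx_pattern": i % 2, "noise": (i * 7) % 3} for i in range(12)]
labels = [r["dx_pattern"] for r in rows]


def toy_predict(row):
    return row["dx_pattern"]


print(permutation_importance(toy_predict, rows, labels, "dx_pattern"))
print(permutation_importance(toy_predict, rows, labels, "noise"))
```

Once the most predictive features are known, tabulating how often they occur in each demographic subgroup is one way to examine the over-representation the abstract describes.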

Citations: 0
Impact of observability period on the classification of COPD diagnosis timing among Medicare beneficiaries with lung cancer.
Pub Date : 2024-10-22 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000633
Eman Metwally, Sarah E Soppe, Jennifer L Lund, Sharon Peacock Hinton, Caroline A Thompson

Background: Investigators often use claims data to estimate the diagnosis timing of chronic conditions. However, misclassification of chronic conditions is common due to variability in healthcare utilization and in claims history across patients.

Objective: We aimed to quantify the effect of various Medicare fee-for-service continuous enrollment periods and lookback periods (LBPs) on misclassification of COPD and on sample size.

Methods: We present a stepwise tutorial for classifying COPD by its diagnosis timing relative to lung cancer diagnosis, using the Surveillance, Epidemiology, and End Results cancer registry linked to Medicare insurance claims. We used 3 approaches, varying the LBP and the required continuous enrollment (i.e., observability) period between 1 and 5 years. Patients with lung cancer were classified, based on their COPD-related healthcare utilization, into 3 groups: pre-existing COPD (diagnosis at least 3 months before lung cancer diagnosis), concurrent COPD (diagnosis within ±3 months of lung cancer diagnosis), and non-COPD. Among those with 5 years of continuous enrollment, we estimated the sensitivity of the LBP to ascertain COPD diagnosis as the number of patients with pre-existing COPD using a shorter LBP divided by the number of patients with pre-existing COPD using a longer LBP.

Results: Extending the LBP from 1 to 5 years increased the prevalence of pre-existing COPD from ~36% to 51% and decreased both concurrent COPD (from ~34% to 23%) and non-COPD (from ~29% to 25%). There was minimal effect of extending the required continuous enrollment period beyond one year across various LBPs. In those with 5 years of continuous enrollment, sensitivity of COPD classification (95% CI) increased with longer LBP, from 70.1% (69.7% to 70.4%) for a one-year LBP to 100% for a 5-year LBP.

Conclusion: The length of optimum LBP and continuous enrollment period depends on the context of the research question and the data generating mechanisms. Among Medicare beneficiaries, the best approach to identify diagnosis timing of COPD relative to lung cancer diagnosis is to use all available LBP with at least one year of required continuous enrollment.
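The three-group classification and the LBP sensitivity ratio described in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: "3 months" is approximated as 90 days, a year as 365 days, a single COPD claim is treated as sufficient evidence, and the names `classify_copd` and `lbp_sensitivity` are hypothetical.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=90)  # assumption: "3 months" approximated as 90 days


def classify_copd(cancer_dx, copd_claim_dates, lbp_years):
    """Classify a patient as 'pre-existing', 'concurrent', or 'non-COPD'
    using only claims visible within the lookback period (LBP)."""
    lbp_start = cancer_dx - timedelta(days=365 * lbp_years)  # assumption: 365-day years
    visible = [d for d in copd_claim_dates
               if lbp_start <= d <= cancer_dx + WINDOW]
    if not visible:
        return "non-COPD"
    first = min(visible)
    # "At least 3 months before" the lung cancer diagnosis counts as pre-existing.
    return "pre-existing" if first <= cancer_dx - WINDOW else "concurrent"


def lbp_sensitivity(patients, short_lbp, long_lbp):
    """Pre-existing COPD count under the shorter LBP divided by the count
    under the longer LBP, as described in the Methods."""
    def n_pre(lbp):
        return sum(classify_copd(dx, claims, lbp) == "pre-existing"
                   for dx, claims in patients)
    denominator = n_pre(long_lbp)
    if denominator == 0:
        raise ValueError("no pre-existing COPD under the longer LBP")
    return n_pre(short_lbp) / denominator


if __name__ == "__main__":
    dx = date(2015, 6, 1)
    # Toy patients, invented for illustration: (cancer diagnosis date, COPD claim dates).
    patients = [
        (dx, [date(2012, 1, 10)]),   # visible only with a long LBP
        (dx, [date(2014, 12, 1)]),   # pre-existing under any LBP of 1+ years
        (dx, [date(2015, 5, 15)]),   # inside the 90-day window: concurrent
        (dx, []),                    # no COPD claims
    ]
    print(lbp_sensitivity(patients, short_lbp=1, long_lbp=5))
```

The first toy patient shows the mechanism behind the reported results: with a one-year LBP the 2012 claim is invisible, so the patient drops out of the pre-existing group, which lowers the sensitivity relative to the five-year LBP.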

Citations: 0
Learning and diSentangling patient static information from time-series Electronic hEalth Records (STEER).
Pub Date : 2024-10-21 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000640
Wei Liao, Joel Voldman

Recent work in machine learning for healthcare has raised concerns about patient privacy and algorithmic fairness. Previous work has shown that self-reported race can be predicted from medical data that does not explicitly contain racial information. However, the extent of data identification is unknown, and we lack ways to develop models whose outcomes are minimally affected by such information. Here we systematically investigated the ability of time-series electronic health record data to predict patient static information. We found that not only the raw time-series data, but also learned representations from machine learning models, can be trained to predict a variety of static information with area under the receiver operating characteristic curve as high as 0.851 for biological sex, 0.869 for binarized age and 0.810 for self-reported race. Such high predictive performance can be extended to various comorbidity factors and exists even when the model was trained for different tasks, using different cohorts, using different model architectures and databases. Given the privacy and fairness concerns these findings pose, we develop a variational autoencoder-based approach that learns a structured latent space to disentangle patient-sensitive attributes from time-series data. Our work thoroughly investigates the ability of machine learning models to encode patient static information from time-series electronic health records and introduces a general approach to protect patient-sensitive information for downstream tasks.

Citations: 0
Derivation and validation of an algorithm to predict transitions from community to residential long-term care among persons with dementia: A retrospective cohort study.
Pub Date : 2024-10-18 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000441
Wenshan Li, Luke Turcotte, Amy T Hsu, Robert Talarico, Danial Qureshi, Colleen Webber, Steven Hawken, Peter Tanuseputro, Douglas G Manuel, Greg Huyer

Objectives: To develop and validate a model to predict time to long-term care (LTC) admission among individuals with dementia.

Design: Population-based retrospective cohort study using health administrative data.

Setting and participants: Community-dwelling older adults (65+) in Ontario living with dementia and assessed with the Resident Assessment Instrument for Home Care (RAI-HC) between April 1, 2010 and March 31, 2017.

Methods: Individuals in the derivation cohort (n = 95,813; assessed before March 31, 2015) were followed for up to 360 days after the index RAI-HC assessment for admission into LTC. We used a multivariable Fine-Gray sub-distribution hazard model to predict the cumulative incidence of LTC entry while accounting for all-cause mortality as a competing risk. The model was validated in 34,038 older adults with dementia with an index RAI-HC assessment between April 1, 2015 and March 31, 2017.

Results: Within one year of a RAI-HC assessment, 35,513 (37.1%) individuals in the derivation cohort and 10,735 (31.5%) in the validation cohort entered LTC. Our algorithm was well-calibrated (Emax = 0.119, ICIavg = 0.057) and achieved a c-statistic of 0.707 (95% confidence interval: 0.703-0.712) in the validation cohort.

Conclusions and implications: We developed an algorithm to predict time to LTC entry among individuals living with dementia. This tool can inform care planning for individuals with dementia and their family caregivers.
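For readers unfamiliar with the sub-distribution hazard model named in the Methods, its standard textbook form is sketched below. The notation is an assumption introduced here, not taken from the paper: $T$ is the time to the first event, $\varepsilon = 1$ marks LTC entry and $\varepsilon = 2$ marks death, $x$ is the covariate vector, and $\lambda_{10}$ is an unspecified baseline sub-distribution hazard.

```latex
% Cumulative incidence function (CIF) for the event of interest (LTC entry)
F_1(t \mid x) = \Pr(T \le t,\ \varepsilon = 1 \mid x)

% Sub-distribution hazard, defined through the CIF
\lambda_1(t \mid x) = -\frac{d}{dt} \log\bigl(1 - F_1(t \mid x)\bigr)

% Fine-Gray proportional sub-distribution hazards assumption
\lambda_1(t \mid x) = \lambda_{10}(t)\, \exp\bigl(\beta^{\top} x\bigr)

% Implied predicted cumulative incidence, with
% \Lambda_{10}(t) = \int_0^t \lambda_{10}(u)\, du
F_1(t \mid x) = 1 - \exp\Bigl(-\Lambda_{10}(t)\, e^{\beta^{\top} x}\Bigr)
```

Because individuals who die remain in the risk set under this definition, the fitted coefficients map directly onto the cumulative incidence of LTC entry, which is the quantity the abstract says the model predicts.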

Citations: 0
Using interpretable machine learning to predict bloodstream infection and antimicrobial resistance in patients admitted to ICU: Early alert predictors based on EHR data to guide antimicrobial stewardship.
Pub Date : 2024-10-16 eCollection Date: 2024-10-01 DOI: 10.1371/journal.pdig.0000641
Davide Ferrari, Pietro Arina, Jonathan Edgeworth, Vasa Curcin, Veronica Guidetti, Federica Mandreoli, Yanzhong Wang

Nosocomial infections and Antimicrobial Resistance (AMR) stand as formidable healthcare challenges on a global scale. To address these issues, various infection control protocols and personalized treatment strategies, guided by laboratory tests, aim to detect bloodstream infections (BSI) and assess the potential for AMR. In this study, we introduce a machine learning (ML) approach based on Multi-Objective Symbolic Regression (MOSR), an evolutionary approach that creates ML models in the form of readable mathematical equations in a multi-objective way, to overcome the limitations of standard single-objective approaches. This method leverages readily available clinical data collected upon admission to intensive care units, with the goal of predicting the presence of BSI and AMR. We further assess its performance by comparing it to established ML algorithms using both naturally imbalanced real-world data and data that has been balanced through oversampling techniques. Our findings reveal that traditional ML models exhibit subpar performance across all training scenarios. In contrast, MOSR, specifically configured to minimize false negatives by also optimizing for the F1-score, outperforms other ML algorithms and consistently delivers reliable results irrespective of training-set balance, with F1-scores 0.22 and 0.28 higher than any other alternative. This research signifies a promising path forward in enhancing Antimicrobial Stewardship (AMS) strategies. Notably, the MOSR approach can be readily implemented on a large scale, offering a new ML tool to find solutions to these critical healthcare issues affected by limited data availability.
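Since the abstract's comparison rests on the F1-score, a small sketch of how that metric penalizes false negatives may help. The function name `f1_score_binary` and the toy labels are assumptions for illustration; this is the standard definition, not the authors' evaluation code.

```python
def f1_score_binary(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no positives at all.
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Toy imbalanced data: 2 positives among 10 cases.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    always_negative = [0] * 10
    one_hit = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print(f1_score_binary(y_true, always_negative))
    print(f1_score_binary(y_true, one_hit))
```

A classifier that always predicts "negative" is 80% accurate on this toy data yet has an F1-score of zero, which is why F1 is a common optimization target when missed infections matter more than overall accuracy.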

Citations: 0