AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science: latest articles

Characterizing Disparities in the Treatment of Intimate Partner Violence.
Çerağ Oğuztüzün, Mehmet Koyutürk, Günnur Karakurt

Exposure to Intimate Partner Violence (IPV) has lasting adverse effects on the physical, behavioral, cognitive, and emotional health of survivors. It is therefore critical to understand how effective IPV treatment strategies are in reducing IPV and its debilitating effects. Meta-analyses designed to comprehensively describe the effectiveness of treatments offer unique advantages. However, heterogeneity within and between studies makes their findings difficult to interpret, so meta-analyses alone are unlikely to identify the factors that underlie disparities in treatment efficacy. To address this, we develop a comprehensive computational and statistical framework that uses meta-regression to characterize the effect of demographic and social variables on treatment outcomes. The innovations in our methodology include (i) standardization of outcome variables to enable meaningful comparisons among studies, and (ii) two parallel meta-regression pipelines to reliably handle missing data.
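
The two ideas named in this abstract, outcome standardization and meta-regression, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes Cohen's d as the standardized outcome and a fixed-effect, inverse-variance weighted regression on a single moderator, and the function names and example numbers are hypothetical.

```python
from math import sqrt


def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference between treatment and control arms."""
    pooled_sd = sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd


def weighted_meta_regression(effects, variances, moderator):
    """Fit effect = intercept + slope * moderator by inverse-variance weighted least squares."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, moderator)) / total
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, moderator))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, moderator, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical studies: effect size, its sampling variance, and mean participant age.
effects = [0.2, 0.3, 0.4]
variances = [0.04, 0.01, 0.02]
mean_age = [20.0, 30.0, 40.0]
intercept, slope = weighted_meta_regression(effects, variances, mean_age)
print(f"intercept={intercept:.3f}, slope per year of age={slope:.3f}")
```

Putting every study's outcome on the same standardized scale is what makes a single regression across studies meaningful; the inverse-variance weights let more precise studies count for more.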

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 408-417. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283094/pdf/2326.pdf
Citations: 0
Automatic Detection of Intimate Partner Violence Victims from Social Media for Proactive Delivery of Support.
Yuting Guo, Sangmi Kim, Elise Warren, Yuan-Chi Yang, Sahithi Lakamana, Abeed Sarker

Social media platforms are increasingly being used by intimate partner violence (IPV) victims to share experiences and seek support. If such information is automatically curated, it may be possible to conduct social media-based surveillance and even design interventions over such platforms. In this paper, we describe the development of a supervised classification system that automatically characterizes IPV-related posts on the social network Reddit. We collected data from four IPV-related subreddits and manually annotated the data to indicate whether a post is a self-report of IPV or not. Using the annotated data (N=289), we trained, evaluated, and compared supervised machine learning systems. A transformer-based classifier, RoBERTa, obtained the best classification performance, with an overall accuracy of 78% and an F1-score of 0.67 for the IPV self-report class. Post-classification error analyses revealed that misclassifications often occur for posts that are very long or are non-first-person reports of IPV. Despite the relatively small annotated dataset, our classification methods obtained promising results, indicating that it may be possible to detect and, hence, provide support to IPV victims over Reddit.
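
The abstract reports both overall accuracy and a class-specific F1-score, and the two can differ noticeably when the positive class is the minority. The sketch below shows how each is computed from predictions; the labels are hypothetical toy data, not the study's annotations.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy plus precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = self-report of IPV, 0 = anything else.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

In this toy case the accuracy is higher than the positive-class F1, because the many correctly rejected negatives raise accuracy but do not enter the F1 calculation.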

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 254-260. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283132/pdf/2018.pdf
Citations: 0
The Association of Learning Health System Practicing Hospitals and other Health Information Interested Hospitals with Patient-Generated Health Data Uptake.
Ibukun E Fowe, Neal T Wallace, Jeffrey Kaye

Patient-generated health data (PGHD) have been described as a necessary addition to provider-generated information for improving care processes in US hospitals. This study evaluated the distribution of Health Information Interested (HII) US hospitals that are more likely to capture or use PGHD. The literature suggests that HII hospitals are more likely to capture and use PGHD. Cross-sectional analysis of the 2018 American Hospital Association's (AHA) health-IT-supplement and other supporting datasets showed that HII hospitals collectively, and the majority of the HII hospital subcategories evaluated, were associated with increased PGHD capture and use. The full Learning Health System (LHS) hospital subcategory had the highest association, and hospitals in the meaningful use stage three compliant (MU3) and PCORI-funded subcategories also had higher rates of PGHD capture or use when in combination with LHS hospitals. Hence, being an LHS hospital appears to be the strongest practice and policy lever to increase PGHD capture and use.

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 176-185. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283141/pdf/2055.pdf
Citations: 0
Enrichment of a Data Lake to Support Population Health Outcomes Studies Using Social Determinants Linked EHR Data.
Md Kamruz Zaman Rana, Xing Song, Humayera Islam, Tanmoy Paul, Khuder Alaboud, Lemuel R Waitman, Abu S M Mosa

The integration of electronic health records (EHRs) with social determinants of health (SDoH) is crucial for population health outcome research, but it requires the collection of identifiable information and poses security risks. This study presents a framework for making de-identified clinical data available alongside privacy-preserved, geocoded, linked SDoH data in a Data Lake. A reidentification risk detection algorithm was also developed to evaluate the transmission risk of the data. The utility of this framework was demonstrated through a population health outcomes study analyzing the correlation between socioeconomic status and the risk of having chronic conditions. The results of this study inform the development of evidence-based interventions and support the use of this framework in understanding the complex relationships between SDoH and health outcomes. For researchers and research institutes, this framework reduces computational and administrative workload and security risks, preserves data privacy, and enables rapid and reliable research on SDoH-connected clinical data.

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 448-457. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283101/pdf/2450.pdf
Citations: 0
Detection of Suicidal Behavior and Self-harm Among Children Presenting to Emergency Departments: A Tree-based Classification Approach.
Juliet B Edgcomb, Chi-Hong Tseng, Mengtong Pan, Alexandra Klomhaus, Bonnie Zima

Suicide is the second leading cause of death of U.S. children over 10 years old. Application of statistical learning to structured EHR data may improve detection of children with suicidal behavior and self-harm. Classification trees (CART) were developed and cross-validated using mental health-related emergency department (MH-ED) visits (2015-2019) of children aged 10-17 years (N=600) across two sites. Performance was compared with the CDC Surveillance Case Definition ICD-10-CM code list. The gold standard was child psychiatrist chart review. Visits were suicide-related among 284/600 (47.3%) children. ICD-10-CM detected cases with sensitivity 70.7 (95% CI 67.0-74.3), specificity 99.0 (98.8-100), and 85/284 (29.9%) false negatives. CART detected cases with sensitivity 85.1 (64.7-100) and specificity 94.9 (89.2-100). The strongest predictors were suicide-related code, MH- and suicide-related chief complaints, site, area deprivation index, and depression. Diagnostic codes miss nearly one-third of children with suicidal behavior and self-harm. Advances in EHR-based phenotyping have the potential to improve detection of childhood-onset suicidality.

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 108-117. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283119/pdf/2295.pdf
Citations: 0
Data Mining Pipeline for COVID-19 Vaccine Safety Analysis Using a Large Electronic Health Record.
Yan Huang, Xiaojin Li, Deepa Dongarwar, Hulin Wu, Guo-Qiang Zhang

We developed a novel data mining pipeline that automatically extracts potential COVID-19 vaccine-related adverse events from a large Electronic Health Record (EHR) dataset. We applied this pipeline to the Optum® de-identified COVID-19 EHR dataset containing COVID-19 vaccine records between December 11, 2020 and January 20, 2022. We compared post-vaccination diagnoses between the COVID-19 vaccine group and the influenza vaccine group among 553,682 individuals without COVID-19 infection. We extracted 1,414 ICD-10 diagnosis categories (first three ICD-10 digits) within 180 days after the first dose of the COVID-19 vaccine. We then ranked the diagnosis codes using the adverse event rates and adjusted odds ratios based on the self-controlled case series analysis. Using inverse probability of censoring weighting, we estimated the right-censored time-to-event records. Our results show that the COVID-19 vaccine has an adverse event rate similar to that of the influenza vaccine. We found 20 types of potential COVID-19 vaccine-related adverse events that may need further investigation.

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 271-280. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283124/pdf/2352.pdf
Citations: 0
Exploring Automated Machine Learning for Cognitive Outcome Prediction from Multimodal Brain Imaging using STREAMLINE.
Xinkai Wang, Yanbo Feng, Boning Tong, Jingxuan Bao, Marylyn D Ritchie, Andrew J Saykin, Jason H Moore, Ryan Urbanowicz, Li Shen

STREAMLINE is a simple, transparent, end-to-end automated machine learning (AutoML) pipeline for easily conducting rigorous machine learning (ML) modeling and analysis. The initial version is limited to binary classification. In this work, we extend STREAMLINE by implementing multiple regression-based ML models, including linear regression, elastic net, group lasso, and the L2,1 norm. We demonstrate the effectiveness of the regression version of STREAMLINE by applying it to the prediction of Alzheimer's disease (AD) cognitive outcomes using multimodal brain imaging data. Our empirical results demonstrate the feasibility and effectiveness of the newly expanded STREAMLINE as an AutoML pipeline for evaluating AD regression models and for discovering multimodal imaging biomarkers.

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 544-553. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283099/pdf/2390.pdf
Citations: 0
Avoiding Biased Clinical Machine Learning Model Performance Estimates in the Presence of Label Selection.
Conor K Corbin, Michael Baiocchi, Jonathan H Chen

When evaluating the performance of clinical machine learning models, one must consider the deployment population. When the population of patients with observed labels is only a subset of the deployment population (label selection), standard model performance estimates on the observed population may be misleading. In this study, we describe three classes of label selection and simulate five causally distinct scenarios to assess how particular selection mechanisms bias a suite of commonly reported binary machine learning model performance metrics. Simulations reveal that when selection is affected by observed features, naive estimates of model discrimination may be misleading. When selection is affected by labels, naive estimates of calibration fail to reflect reality. We borrow traditional weighting estimators from causal inference literature and find that when selection probabilities are properly specified, they recover full population estimates. We then tackle the real-world task of monitoring the performance of deployed machine learning models whose interactions with clinicians feed back into and affect the selection mechanism of the labels. We train three machine learning models to flag low-yield laboratory diagnostics, and simulate their intended consequence of reducing wasteful laboratory utilization. We find that naive estimates of AUROC on the observed population undershoot actual performance by up to 20%. Such a disparity could be large enough to lead to the wrongful termination of a successful clinical decision support tool. We propose an altered deployment procedure, one that combines injected randomization with traditional weighted estimates, and find it recovers true model performance.
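
The weighting idea in this abstract can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' code: it uses accuracy instead of AUROC for simplicity, assumes the selection probabilities are known exactly, and uses hypothetical counts. Each observed patient is weighted by the inverse of the probability that their label was observed, so under-sampled groups count for more.

```python
def naive_accuracy(observed):
    """Accuracy computed only on patients whose labels were observed."""
    return sum(r["correct"] for r in observed) / len(observed)


def ipw_accuracy(observed):
    """Inverse-probability-weighted accuracy (normalized weights)."""
    weights = [1.0 / r["p_select"] for r in observed]
    weighted_correct = sum(w * r["correct"] for w, r in zip(weights, observed))
    return weighted_correct / sum(weights)


def make_records(n_correct, n_wrong, p_select):
    """Build hypothetical observed records for one stratum of patients."""
    return ([{"correct": 1, "p_select": p_select}] * n_correct
            + [{"correct": 0, "p_select": p_select}] * n_wrong)


# Hypothetical deployment population of 200 patients in two strata:
#   stratum A: 100 patients, model correct on 90, labels observed for half
#   stratum B: 100 patients, model correct on 50, labels observed for 10%
# True full-population accuracy is (90 + 50) / 200 = 0.70.
observed = make_records(45, 5, 0.5) + make_records(5, 5, 0.1)

print("naive:", round(naive_accuracy(observed), 3))
print("ipw:  ", round(ipw_accuracy(observed), 3))
```

Here the naive estimate is too optimistic because the easier stratum is over-represented among labeled patients. The direction of the bias depends on the selection mechanism, which is why the abstract can report a naive estimate that undershoots instead.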

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 81-90. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283136/pdf/2405.pdf
Citations: 0
Linking Ambient NO2 Pollution Measures with Electronic Health Record Data to Study Asthma Exacerbations.
Alana Schreibman, Sherrie Xie, Rebecca A Hubbard, Blanca E Himes

Electronic health record (EHR)-derived data can be linked to geospatially distributed socioeconomic and environmental factors to conduct large-scale epidemiologic studies. Ambient NO2 is a known environmental risk factor for asthma. However, health exposure studies often rely on data from geographically sparse regulatory monitors that may not reflect true individual exposure. We contrasted use of interpolated NO2 regulatory monitor data with raw satellite measurements and satellite-derived ground estimates, building on previous work that computed improved exposure estimates from remotely sensed data. Raw satellite and satellite-derived ground measurements captured spatial variation missed by interpolated ground monitor measurements. Multivariable analyses comparing these three NO2 measurement approaches (interpolated monitor, raw satellite, and satellite-derived) revealed a positive relationship between exposure and asthma exacerbations for both satellite measurements. Exposure-outcome relationships using the interpolated monitor NO2 were inconsistent with known relationships to asthma, suggesting that interpolated monitor data might yield misleading results in small-region studies.
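
To see why interpolating from sparse monitors can miss local variation, consider a minimal sketch of one common interpolation scheme. The abstract does not say which method the authors used; inverse distance weighting (IDW) is assumed here purely for illustration, and the coordinates and NO2 values are hypothetical.

```python
from math import hypot


def idw_estimate(monitors, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    monitors: iterable of (monitor_x, monitor_y, value) tuples.
    """
    numerator = 0.0
    denominator = 0.0
    for mx, my, value in monitors:
        distance = hypot(x - mx, y - my)
        if distance == 0.0:
            return value  # the query point sits exactly on a monitor
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical monitors with NO2 readings of 10 and 20 (arbitrary units).
monitors = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_estimate(monitors, 1.0, 0.0))
```

Every interpolated value is a weighted average of the monitor readings, so it can never fall outside their range. A local hotspot between monitors, such as one near a busy road, is invisible to this kind of estimate, whereas gridded satellite data can register it.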

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 467-476. Published 2023-06-16. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283087/pdf/2145.pdf
Citations: 0
TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments.
Changye Li, Weizhe Xu, Trevor Cohen, Martin Michalowski, Serguei Pakhomov

Evidence is growing that machine and deep learning methods can learn the subtle differences between the language produced by people with various forms of cognitive impairment, such as dementia, and that produced by cognitively healthy individuals. Valuable public data repositories such as TalkBank have made it possible for researchers in the computational community to join forces and learn from each other to make significant advances in this area. However, due to variability in the approaches and data selection strategies used by various researchers, results obtained by different groups have been difficult to compare directly. In this paper, we present TRESTLE (Toolkit for Reproducible Execution of Speech, Text and Language Experiments), an open-source platform that focuses on two datasets from the TalkBank repository, with dementia detection as an illustrative domain. Successfully deployed in the hackallenge (Hackathon/Challenge) of the International Workshop on Health Intelligence at AAAI 2022, TRESTLE provides a precise digital blueprint of the data pre-processing and selection strategies that can be reused via TRESTLE by other researchers seeking results comparable with those of their peers and with current state-of-the-art (SOTA) approaches.

AMIA Joint Summits on Translational Science proceedings, vol. 2023, pp. 360-369. Published 2023-06-16. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283131/pdf/2277.pdf
Citations: 0