Exposure to Intimate Partner Violence (IPV) has lasting adverse effects on the physical, behavioral, cognitive, and emotional health of survivors. It is therefore critical to understand the effectiveness of IPV treatment strategies in reducing IPV and its debilitating effects. Meta-analyses designed to comprehensively describe the effectiveness of treatments offer unique advantages. However, heterogeneity within and between studies poses challenges in interpreting findings, so meta-analyses are unlikely to identify the factors that underlie disparities in treatment efficacy. To characterize the effect of demographic and social factors on treatment effectiveness, we develop a comprehensive computational and statistical framework that uses meta-regression to relate these variables to treatment outcomes. The innovations in our methodology include (i) standardization of outcome variables to enable meaningful comparisons among studies, and (ii) two parallel meta-regression pipelines to reliably handle missing data.
"Characterizing Disparities in the Treatment of Intimate Partner Violence." Çerağ Oğuztüzün, Mehmet Koyutürk, Günnur Karakurt. AMIA Joint Summits on Translational Science Proceedings 2023:408-417. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283094/pdf/2326.pdf
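The outcome-standardization step described in the abstract is not specified in detail; a common choice for placing heterogeneous study outcomes on a comparable scale is the standardized mean difference. A minimal sketch, assuming Hedges' g with its small-sample correction (the function name and inputs are illustrative, not the authors' implementation):

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (Hedges' g) between treatment and control arms."""
    # Pooled standard deviation across the two arms
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp  # Cohen's d
    # Small-sample bias correction factor
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    return j * d
```

Each study's g (together with an inverse-variance weight) could then serve as the dependent variable in a meta-regression on demographic and social moderators.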
Social media platforms are increasingly being used by intimate partner violence (IPV) victims to share experiences and seek support. If such information is automatically curated, it may be possible to conduct social media-based surveillance and even design interventions over such platforms. In this paper, we describe the development of a supervised classification system that automatically characterizes IPV-related posts on the social network Reddit. We collected data from four IPV-related subreddits and manually annotated the data to indicate whether or not a post is a self-report of IPV. Using the annotated data (N=289), we trained, evaluated, and compared supervised machine learning systems. A transformer-based classifier, RoBERTa, obtained the best classification performance, with an overall accuracy of 78% and an IPV-self-report class F1-score of 0.67. Post-classification error analyses revealed that misclassifications often occur for posts that are very long or are non-first-person reports of IPV. Despite the relatively small annotated dataset, our classification methods obtained promising results, indicating that it may be possible to detect and, hence, provide support to IPV victims over Reddit.
"Automatic Detection of Intimate Partner Violence Victims from Social Media for Proactive Delivery of Support." Yuting Guo, Sangmi Kim, Elise Warren, Yuan-Chi Yang, Sahithi Lakamana, Abeed Sarker. AMIA Joint Summits on Translational Science Proceedings 2023:254-260. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283132/pdf/2018.pdf
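The reported metrics follow directly from a confusion matrix; a minimal sketch of the per-class F1 computation used to score a positive class such as IPV self-reports (function name illustrative):

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```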
Patient-generated health data (PGHD) has been described as a necessary addition to provider-generated information for improving care processes in US hospitals. This study evaluated the distribution of Health Information Interested (HII) US hospitals that are more likely to capture or use PGHD. The literature suggests that HII hospitals are more likely to capture and use PGHD. Cross-sectional analysis of the 2018 American Hospital Association (AHA) health IT supplement and other supporting datasets showed that HII hospitals collectively, and the majority of HII hospital subcategories evaluated, were associated with increased PGHD capture and use. The full Learning Health System (LHS) hospital subcategory had the highest association, and hospitals in the meaningful use stage 3 compliant (MU3) and PCORI-funded subcategories also had higher rates of PGHD capture or use when in combination with LHS hospitals. Hence, being an LHS appears to be the strongest practice and policy lever to increase PGHD capture and use.
"The Association of Learning Health System Practicing Hospitals and other Health Information Interested Hospitals with Patient-Generated Health Data Uptake." Ibukun E Fowe, Neal T Wallace, Jeffrey Kaye. AMIA Joint Summits on Translational Science Proceedings 2023:176-185. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283141/pdf/2055.pdf
Md Kamruz Zaman Rana, Xing Song, Humayera Islam, Tanmoy Paul, Khuder Alaboud, Lemuel R Waitman, Abu S M Mosa
The integration of electronic health records (EHRs) with social determinants of health (SDoH) is crucial for population health outcomes research, but it requires the collection of identifiable information and poses security risks. This study presents a framework for linking de-identified clinical data with privacy-preserved, geocoded SDoH data in a Data Lake. A reidentification risk detection algorithm was also developed to evaluate the transmission risk of the data. The utility of this framework was demonstrated through a population health outcomes study analyzing the correlation between socioeconomic status and the risk of having chronic conditions. The results of this study inform the development of evidence-based interventions and support the use of this framework in understanding the complex relationships between SDoH and health outcomes. The framework reduces computational and administrative workload and security risks for researchers, preserves data privacy, and enables rapid and reliable research on SDoH-connected clinical data for research institutes.
"Enrichment of a Data Lake to Support Population Health Outcomes Studies Using Social Determinants Linked EHR Data." Md Kamruz Zaman Rana, Xing Song, Humayera Islam, Tanmoy Paul, Khuder Alaboud, Lemuel R Waitman, Abu S M Mosa. AMIA Joint Summits on Translational Science Proceedings 2023:448-457. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283101/pdf/2450.pdf
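The paper's reidentification risk detection algorithm is not described in the abstract; a common baseline for linked, geocoded data is a k-anonymity check over quasi-identifiers. A hypothetical sketch (field names, threshold, and return shape are assumptions, not the authors' method):

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers, k=5):
    """Flag equivalence classes (distinct quasi-identifier combinations)
    with fewer than k records, which are at elevated reidentification risk."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    risky = {combo: n for combo, n in classes.items() if n < k}
    # Share of records falling in a class smaller than k
    at_risk = sum(risky.values()) / len(records)
    return at_risk, risky
```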
Juliet B Edgcomb, Chi-Hong Tseng, Mengtong Pan, Alexandra Klomhaus, Bonnie Zima
Suicide is the second leading cause of death among U.S. children over 10 years old. Application of statistical learning to structured EHR data may improve detection of children with suicidal behavior and self-harm. Classification trees (CART) were developed and cross-validated using mental health-related emergency department (MH-ED) visits (2015-2019) of children aged 10-17 years (N=600) across two sites. Performance was compared with the CDC Surveillance Case Definition ICD-10-CM code list. The gold standard was child psychiatrist chart review. Visits were suicide-related among 284/600 (47.3%) children. ICD-10-CM detected cases with sensitivity 70.7 (95% CI 67.0-74.3), specificity 99.0 (98.8-100), and 85/284 (29.9%) false negatives. CART detected cases with sensitivity 85.1 (64.7-100) and specificity 94.9 (89.2-100). The strongest predictors were a suicide-related code, MH- and suicide-related chief complaints, site, area deprivation index, and depression. Diagnostic codes miss nearly one-third of children with suicidal behavior and self-harm. Advances in EHR-based phenotyping have the potential to improve detection of childhood-onset suicidality.
"Detection of Suicidal Behavior and Self-harm Among Children Presenting to Emergency Departments: A Tree-based Classification Approach." Juliet B Edgcomb, Chi-Hong Tseng, Mengtong Pan, Alexandra Klomhaus, Bonnie Zima. AMIA Joint Summits on Translational Science Proceedings 2023:108-117. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283119/pdf/2295.pdf
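The sensitivity and specificity figures above are proportions with confidence intervals; a minimal sketch of how such estimates can be computed from case counts, using a simple Wald interval (the authors' interval method may differ):

```python
import math

def sens_spec(tp, fn, tn, fp, z=1.96):
    """Sensitivity and specificity (as percentages) with Wald 95% CIs."""
    def prop_ci(x, n):
        p = x / n
        se = math.sqrt(p * (1 - p) / n)  # normal-approximation standard error
        return 100 * p, 100 * max(p - z * se, 0.0), 100 * min(p + z * se, 1.0)
    return {"sensitivity": prop_ci(tp, tp + fn),  # true positives / all positives
            "specificity": prop_ci(tn, tn + fp)}  # true negatives / all negatives
```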
Yan Huang, Xiaojin Li, Deepa Dongarwar, Hulin Wu, Guo-Qiang Zhang
We developed a novel data mining pipeline that automatically extracts potential COVID-19 vaccine-related adverse events from a large Electronic Health Record (EHR) dataset. We applied this pipeline to the Optum® de-identified COVID-19 EHR dataset, which contains COVID-19 vaccine records between December 11, 2020 and January 20, 2022. We compared post-vaccination diagnoses between the COVID-19 vaccine group and the influenza vaccine group among 553,682 individuals without COVID-19 infection. We extracted 1,414 ICD-10 diagnosis categories (first three ICD-10 digits) within 180 days after the first dose of the COVID-19 vaccine. We then ranked the diagnosis codes using adverse event rates and adjusted odds ratios based on a self-controlled case series analysis. Using inverse probability of censoring weighting, we accounted for right-censoring in the time-to-event records. Our results show that the COVID-19 vaccine has an adverse event rate similar to that of the influenza vaccine. We found 20 types of potential COVID-19 vaccine-related adverse events that may need further investigation.
"Data Mining Pipeline for COVID-19 Vaccine Safety Analysis Using a Large Electronic Health Record." Yan Huang, Xiaojin Li, Deepa Dongarwar, Hulin Wu, Guo-Qiang Zhang. AMIA Joint Summits on Translational Science Proceedings 2023:271-280. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283124/pdf/2352.pdf
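The ranking by adverse event rates and odds ratios rests on 2x2 contingency tables per diagnosis category; a minimal unadjusted sketch (the covariate adjustment and self-controlled design used in the paper are not reproduced here):

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) 95% CI.
    a, b: events / non-events in the exposed group
    c, d: events / non-events in the comparison group"""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

Diagnosis categories could then be ranked by point estimate, keeping only those whose interval excludes 1.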
Xinkai Wang, Yanbo Feng, Boning Tong, Jingxuan Bao, Marylyn D Ritchie, Andrew J Saykin, Jason H Moore, Ryan Urbanowicz, Li Shen
STREAMLINE is a simple, transparent, end-to-end automated machine learning (AutoML) pipeline for easily conducting rigorous machine learning (ML) modeling and analysis. The initial version is limited to binary classification. In this work, we extend STREAMLINE by implementing multiple regression-based ML models, including linear regression, elastic net, group lasso, and the L21 norm. We demonstrate the effectiveness of the regression version of STREAMLINE by applying it to the prediction of Alzheimer's disease (AD) cognitive outcomes using multimodal brain imaging data. Our empirical results demonstrate the feasibility and effectiveness of the newly expanded STREAMLINE as an AutoML pipeline for evaluating AD regression models and for discovering multimodal imaging biomarkers.
"Exploring Automated Machine Learning for Cognitive Outcome Prediction from Multimodal Brain Imaging using STREAMLINE." Xinkai Wang, Yanbo Feng, Boning Tong, Jingxuan Bao, Marylyn D Ritchie, Andrew J Saykin, Jason H Moore, Ryan Urbanowicz, Li Shen. AMIA Joint Summits on Translational Science Proceedings 2023:544-553. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283099/pdf/2390.pdf
When evaluating the performance of clinical machine learning models, one must consider the deployment population. When the population of patients with observed labels is only a subset of the deployment population (label selection), standard model performance estimates on the observed population may be misleading. In this study, we describe three classes of label selection and simulate five causally distinct scenarios to assess how particular selection mechanisms bias a suite of commonly reported binary machine learning model performance metrics. Simulations reveal that when selection is affected by observed features, naive estimates of model discrimination may be misleading. When selection is affected by labels, naive estimates of calibration fail to reflect reality. We borrow traditional weighting estimators from the causal inference literature and find that when selection probabilities are properly specified, they recover full-population estimates. We then tackle the real-world task of monitoring the performance of deployed machine learning models whose interactions with clinicians feed back into and affect the selection mechanism of the labels. We train three machine learning models to flag low-yield laboratory diagnostics, and simulate their intended consequence of reducing wasteful laboratory utilization. We find that naive estimates of AUROC on the observed population undershoot actual performance by up to 20%. Such a disparity could be large enough to lead to the wrongful termination of a successful clinical decision support tool. We propose an altered deployment procedure, one that combines injected randomization with traditional weighted estimates, and find that it recovers true model performance.
"Avoiding Biased Clinical Machine Learning Model Performance Estimates in the Presence of Label Selection." Conor K Corbin, Michael Baiocchi, Jonathan H Chen. AMIA Joint Summits on Translational Science Proceedings 2023:81-90. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283136/pdf/2405.pdf
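The weighting estimators borrowed from the causal inference literature reduce, in the simplest case, to inverse-probability-weighted performance estimates over the labeled subset. A minimal sketch for accuracy, assuming selection probabilities are known (AUROC weighting follows the same idea; function and variable names are illustrative):

```python
def weighted_accuracy(correct, selected, p_select):
    """Estimate full-population accuracy from the labeled subset only,
    weighting each labeled case by the inverse of its selection probability."""
    num = sum(c / p for c, s, p in zip(correct, selected, p_select) if s)
    den = sum(1 / p for s, p in zip(selected, p_select) if s)
    return num / den
```

For example, with two easy cases always labeled and two hard cases labeled half the time, the naive accuracy over labeled cases is 2/3, while the weighted estimate recovers the true population accuracy of 0.5.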
Alana Schreibman, Sherrie Xie, Rebecca A Hubbard, Blanca E Himes
Electronic health record (EHR)-derived data can be linked to geospatially distributed socioeconomic and environmental factors to conduct large-scale epidemiologic studies. Ambient NO2 is a known environmental risk factor for asthma. However, health exposure studies often rely on data from geographically sparse regulatory monitors that may not reflect true individual exposure. We contrasted the use of interpolated NO2 regulatory monitor data with raw satellite measurements and satellite-derived ground estimates, building on previous work that has computed improved exposure estimates from remotely sensed data. Raw satellite and satellite-derived ground measurements captured spatial variation missed by interpolated ground monitor measurements. Multivariable analyses comparing these three NO2 measurement approaches (interpolated monitor, raw satellite, and satellite-derived) revealed a positive relationship between exposure and asthma exacerbations for both satellite measurements. Exposure-outcome relationships using the interpolated monitor NO2 were inconsistent with known relationships to asthma, suggesting that interpolated monitor data might yield misleading results in small-region studies.
"Linking Ambient NO2 Pollution Measures with Electronic Health Record Data to Study Asthma Exacerbations." Alana Schreibman, Sherrie Xie, Rebecca A Hubbard, Blanca E Himes. AMIA Joint Summits on Translational Science Proceedings 2023:467-476. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283087/pdf/2145.pdf
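Interpolation from sparse regulatory monitors is commonly done by inverse distance weighting; a minimal sketch of the kind of monitor-based estimate that the satellite products are contrasted with (the authors' exact interpolation method is not specified in the abstract):

```python
def idw(x, y, stations, power=2):
    """Inverse-distance-weighted pollutant estimate at (x, y).
    stations: list of (sx, sy, value) monitor readings."""
    num = den = 0.0
    for sx, sy, v in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return v  # query point coincides with a monitor
        w = 1.0 / d2 ** (power / 2)  # weight falls off with distance^power
        num += w * v
        den += w
    return num / den
```

The smoothing this produces is exactly what can hide fine-grained spatial variation in a small study region.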
Changye Li, Weizhe Xu, Trevor Cohen, Martin Michalowski, Serguei Pakhomov
Evidence is growing that machine and deep learning methods can learn the subtle differences between the language produced by people with various forms of cognitive impairment, such as dementia, and that of cognitively healthy individuals. Valuable public data repositories such as TalkBank have made it possible for researchers in the computational community to join forces and learn from each other to make significant advances in this area. However, due to variability in the approaches and data selection strategies used by various researchers, results obtained by different groups have been difficult to compare directly. In this paper, we present TRESTLE (Toolkit for Reproducible Execution of Speech, Text and Language Experiments), an open-source platform that focuses on two datasets from the TalkBank repository, with dementia detection as an illustrative domain. Successfully deployed in the hackallenge (Hackathon/Challenge) of the International Workshop on Health Intelligence at AAAI 2022, TRESTLE provides a precise digital blueprint of data pre-processing and selection strategies that can be reused via TRESTLE by other researchers seeking results comparable with their peers and current state-of-the-art (SOTA) approaches.
"TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments." Changye Li, Weizhe Xu, Trevor Cohen, Martin Michalowski, Serguei Pakhomov. AMIA Joint Summits on Translational Science Proceedings 2023:360-369. Published 2023-06-16. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283131/pdf/2277.pdf