
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science: latest articles

DETECT: Feature extraction method for disease trajectory modeling in electronic health records.
Pankhuri Singhal, Lindsay Guare, Colleen Morse, Anastasia Lucas, Marta Byrska-Bishop, Marie A Guerraty, Dokyoon Kim, Marylyn D Ritchie, Anurag Verma

Modeling with longitudinal electronic health record (EHR) data proves challenging given the high dimensionality, redundancy, and noise captured in EHR. In order to improve precision medicine strategies and identify predictors of disease risk in advance, evaluating meaningful patient disease trajectories is essential. In this study, we develop the algorithm DiseasE Trajectory fEature extraCTion (DETECT) for feature extraction and trajectory generation in high-throughput temporal EHR data. This algorithm can 1) simulate longitudinal individual-level EHR data, specified to user parameters of scale, complexity, and noise and 2) use a convergent relative risk framework to test intermediate codes occurring between specified index code(s) and outcome code(s) to determine if they are predictive features of the outcome. Temporal range can be specified to investigate predictors occurring during a specific period of time prior to onset of the outcome. We benchmarked our method on simulated data and generated real-world disease trajectories using DETECT in a cohort of 145,575 individuals diagnosed with hypertension in Penn Medicine EHR for severe cardiometabolic outcomes.
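
The relative risk test on intermediate codes can be illustrated with a minimal sketch. It assumes a plain relative risk (outcome rate among patients who have the candidate code between index and outcome, divided by the rate among those who do not); the abstract does not spell out the "convergent" framework, so this is not the authors' implementation. The function name, the `(patient_id, code, day)` tuple layout, and the code labels `HTN`, `T2D`, and `MI` are hypothetical.

```python
from collections import defaultdict


def relative_risk(events, index_code, candidate_code, outcome_code):
    """Plain relative risk of the outcome for a candidate intermediate code.

    events: iterable of (patient_id, code, day) tuples (hypothetical layout).
    A patient counts as exposed if the candidate code occurs after the first
    index code and before the first subsequent outcome code (if any).
    Returns None when the ratio is undefined.
    """
    by_patient = defaultdict(list)
    for patient_id, code, day in events:
        by_patient[patient_id].append((day, code))

    # exposed flag -> [patients with outcome, total patients]
    counts = {True: [0, 0], False: [0, 0]}
    for records in by_patient.values():
        index_days = [d for d, c in records if c == index_code]
        if not index_days:
            continue  # only patients with the index code enter the cohort
        start = min(index_days)
        outcome_days = [d for d, c in records if c == outcome_code and d > start]
        end = min(outcome_days) if outcome_days else float("inf")
        exposed = any(c == candidate_code and start < d < end for d, c in records)
        counts[exposed][0] += bool(outcome_days)
        counts[exposed][1] += 1

    (a, n_exposed), (c, n_unexposed) = counts[True], counts[False]
    if n_exposed == 0 or n_unexposed == 0 or c == 0:
        return None
    return (a / n_exposed) / (c / n_unexposed)


# Toy cohort: every patient has the hypothetical index code "HTN" on day 0.
toy_events = [
    ("p1", "HTN", 0), ("p1", "T2D", 10), ("p1", "MI", 20),
    ("p2", "HTN", 0), ("p2", "T2D", 5),
    ("p3", "HTN", 0), ("p3", "MI", 30),
    ("p4", "HTN", 0),
    ("p5", "HTN", 0),
    ("p6", "HTN", 0),
]
rr = relative_risk(toy_events, "HTN", "T2D", "MI")
```

In the toy cohort, one of two exposed patients and one of four unexposed patients reach the outcome, so the ratio is 0.5 / 0.25. A temporal range such as the one the abstract mentions could be added by narrowing the `start < d < end` window.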

Published: 2023-06-16. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283148/pdf/2354.pdf
Citations: 0
High Resolution and Spatiotemporal Place-Based Computable Exposures at Scale.
Erika Rasnick, Patrick Ryan, Jeff Blossom, Heike Luttmann-Gibson, Nathan Lothrop, Rima Habre, Diane R Gold, Andrew Vancil, Joel Schwartz, James E Gern, Cole Brokamp

Place-based exposures, termed "geomarkers", are powerful determinants of health but are often understudied because of a lack of open data and integration tools. Existing DeGAUSS (Decentralized Geomarker Assessment for Multisite Studies) software has been successfully implemented in multi-site studies, ensuring reproducibility and protection of health information. However, DeGAUSS relies on transporting geomarker data, which is not feasible for high-resolution spatiotemporal data too large to store locally or download over the internet. We expanded the DeGAUSS framework for high-resolution spatiotemporal geomarkers. Our approach stores data subsets based on coarsened location and year in an online repository, and appropriate subsets are downloaded to complete exposure assessment locally using exact date and location. We created and validated two free and open-source DeGAUSS containers for estimation of high-resolution, daily ambient air pollutant exposures, transforming published exposure assessment models into computable exposures for geomarker assessment at scale.
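
The idea of storing subsets by coarsened location and year can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how locations are coarsened, so the one-degree grid cell, the key format, and the names `subset_key` and `group_by_subset` are all hypothetical and are not the DeGAUSS interface.

```python
import math
from datetime import date


def subset_key(lat, lon, when, cell_deg=1.0):
    """Coarsen an exact location and date to a (year, grid cell) key.

    Assumption: a simple lat/lon grid of cell_deg degrees stands in for
    whatever coarsening the real containers use.
    """
    lat_cell = math.floor(lat / cell_deg)
    lon_cell = math.floor(lon / cell_deg)
    return f"{when.year}/{lat_cell}_{lon_cell}"


def group_by_subset(records, cell_deg=1.0):
    """Group records by the subset they need, so each subset is fetched once.

    records: list of dicts with hypothetical keys "lat", "lon", and "date".
    """
    groups = {}
    for rec in records:
        key = subset_key(rec["lat"], rec["lon"], rec["date"], cell_deg)
        groups.setdefault(key, []).append(rec)
    return groups


records = [
    {"id": 1, "lat": 39.14, "lon": -84.51, "date": date(2018, 7, 4)},
    {"id": 2, "lat": 39.90, "lon": -84.02, "date": date(2018, 1, 15)},
    {"id": 3, "lat": 39.14, "lon": -84.51, "date": date(2019, 7, 4)},
]
groups = group_by_subset(records)
```

Under this scheme only the coarse key would be needed to request a subset, while the exact date and coordinates are used afterwards on the local machine to look up the exposure within the downloaded subset.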

Published: 2023-06-16. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283107/pdf/2349.pdf
Citations: 0
Mining Correlation between Fluid Intelligence and Whole-brain Large Scale Structural Connectivity.
Sumita Garai, Frederick Xu, Duy Anh Duong-Tran, Yize Zhao, Li Shen

Exploring the neural basis of intelligence and its associations with brain networks has been an active area of research in network neuroscience. Up to now, the majority of explorations mining human intelligence in brain connectomics have leveraged whole-brain functional connectivity patterns. In this study, structural connectivity patterns are instead used to explore relationships between brain connectivity and different behavioral/cognitive measures such as fluid intelligence. Specifically, we conduct a study using the 397 unrelated subjects from the Human Connectome Project (Young Adults) dataset to estimate individual-level structural connectivity matrices. We show that topological features, as quantified by our proposed measurements, Average Persistence (AP) and Persistent Entropy (PE), have statistically significant associations with different behavioral/cognitive measures. We also perform a parallel study using traditional graph-theoretical measures, provided by the Brain Connectivity Toolbox, as benchmarks for our study. Our findings indicate that an individual's structural connectivity indeed offers reliable predictive power for different behavioral/cognitive measures, including but not limited to fluid intelligence. Our results suggest that structural connectomes provide complementary insights (compared to using functional connectomes) in predicting human intelligence and warrant future studies on human intelligence and/or other behavioral/cognitive measures involving a multi-modal approach.
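
For readers unfamiliar with the two topological summaries, a minimal sketch follows. Persistent entropy is computed here in its usual form, the Shannon entropy of the normalized bar lengths of a persistence diagram. The abstract does not define Average Persistence, so treating it as the mean bar length is an assumption, as is the choice to skip bars that never die.

```python
import math


def lifetimes(diagram):
    """Bar lengths (death - birth) for the finite bars of a persistence diagram.

    diagram: list of (birth, death) pairs; infinite bars are skipped (assumption).
    """
    return [death - birth for birth, death in diagram
            if math.isfinite(death) and death > birth]


def average_persistence(diagram):
    """Mean bar length. Assumed definition; the paper's AP may differ."""
    bars = lifetimes(diagram)
    return sum(bars) / len(bars) if bars else 0.0


def persistent_entropy(diagram):
    """Shannon entropy of bar lengths normalized to sum to one."""
    bars = lifetimes(diagram)
    total = sum(bars)
    if total == 0:
        return 0.0
    return -sum((bar / total) * math.log(bar / total) for bar in bars)


# Toy diagram with three finite bars (lengths 1, 1, 2) and one infinite bar.
toy_diagram = [(0.0, 1.0), (0.0, 1.0), (0.0, 2.0), (0.0, math.inf)]
ap = average_persistence(toy_diagram)
pe = persistent_entropy(toy_diagram)
```

For the toy diagram the normalized lengths are 0.25, 0.25, and 0.5, so the entropy equals 1.5 ln 2. Equal-length bars maximize the entropy, while a single dominant bar pushes it toward zero.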

Published: 2023-06-16. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283120/pdf/2239.pdf
Citations: 0
Hypergraph Transformers for EHR-based Clinical Predictions.
Ran Xu, Mohammed K Ali, Joyce C Ho, Carl Yang

Electronic health record (EHR) data contain rich information about patients' health conditions, including diagnoses, procedures, and medications, and have been widely used to facilitate digital medicine. Despite their importance, it is often non-trivial to learn useful representations for patients' visits that support downstream clinical predictions, as each visit contains massive and diverse medical codes. As a result, the complex interactions among medical codes are often not captured, which leads to substandard predictions. To better model these complex relations, we leverage hypergraphs, which go beyond pairwise relations to jointly learn the representations for visits and medical codes. We also propose to use the self-attention mechanism to automatically identify the most relevant medical codes for each visit based on the downstream clinical predictions with better generalization power. Experiments on two EHR datasets show that our proposed method not only yields superior performance, but also provides reasonable insights towards the target tasks.
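
How a hypergraph goes beyond pairwise relations can be seen from its incidence matrix. The sketch below assumes that each visit is a hyperedge and each medical code a node, which is one natural reading of the abstract rather than a confirmed detail of the model. The attention layers are omitted, and the function names and code labels are hypothetical.

```python
def build_incidence(visits):
    """Build a visit-by-code incidence matrix.

    visits: list of sets of medical codes. Each row is one visit (hyperedge)
    and each column one code (node); an entry is 1 if the code appears in
    the visit. Returns (sorted code list, matrix as list of lists).
    """
    codes = sorted({code for visit in visits for code in visit})
    column = {code: j for j, code in enumerate(codes)}
    matrix = [[0] * len(codes) for _ in visits]
    for i, visit in enumerate(visits):
        for code in visit:
            matrix[i][column[code]] = 1
    return codes, matrix


def code_degrees(matrix):
    """Number of visits each code participates in (column sums)."""
    return [sum(col) for col in zip(*matrix)]


# Two toy visits with hypothetical code labels.
toy_visits = [
    {"dx:E11", "rx:metformin", "px:HbA1c"},
    {"dx:E11", "dx:I10"},
]
codes, H = build_incidence(toy_visits)
degrees = code_degrees(H)
```

A single row connects all codes in a visit at once, whereas an ordinary graph would need one edge per pair of codes and would lose the information that they co-occurred in the same visit.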

Published: 2023-06-16. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283128/pdf/2220.pdf
Citations: 0
Extracting Temporal Expressions of First Seizure Onset from Epilepsy Patient Discharge Summaries.
Shiqiang Tao, Rashmie Abeysinghe, Blanca Talavera De La Esperanza, Samden Lhatoo, Guo-Qiang Zhang, Licong Cui

Early onset of seizure is a potential risk factor for Sudden Unexpected Death in Epilepsy (SUDEP). However, the first seizure onset information is often documented as clinical narratives in epilepsy monitoring unit (EMU) discharge summaries. Manually extracting first seizure onset time from discharge summaries is time consuming and labor-intensive. In this work, we developed a rule-based natural language processing pipeline for automatically extracting the temporal information of patients' first seizure onset from EMU discharge summaries. We use the Epilepsy and Seizure Ontology (EpSO) as the core knowledge resource and construct 4 extraction rules based on 300 randomly selected EMU discharge summaries. To evaluate the effectiveness of the extraction pipeline, we apply the constructed rules on another 200 unseen discharge summaries and compare the results against the manual evaluation of a domain expert. Overall, our extraction pipeline achieved a precision of 0.75, recall of 0.651, and F1-score of 0.697. This is an encouraging initial result which will allow us to gain insights into potentially better-performing approaches.
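
A rule of the kind described, together with the reported F1-score, can be sketched as below. The regular expression is a hypothetical example written for illustration; it is not one of the four rules from the paper and does not use EpSO. The F1 helper applies the standard harmonic-mean formula to the reported precision and recall.

```python
import re

# Hypothetical rule: "first/initial seizure ... at (the) age (of) N".
ONSET_AGE = re.compile(
    r"\b(?:first|initial)\s+seizure\s+(?:onset\s+)?(?:was\s+|occurred\s+)?"
    r"at\s+(?:the\s+)?age\s+(?:of\s+)?(\d{1,2})\b",
    re.IGNORECASE,
)


def extract_onset_age(text):
    """Return the age at first seizure onset if the rule matches, else None."""
    match = ONSET_AGE.search(text)
    return int(match.group(1)) if match else None


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


age = extract_onset_age("The patient's first seizure occurred at the age of 7 years.")
reported_f1 = round(f1_score(0.75, 0.651), 3)
```

With precision 0.75 and recall 0.651 the formula gives 2 × 0.75 × 0.651 / 1.401, which rounds to the reported 0.697. A recall lower than precision is what one would expect from a small set of hand-written rules: they are usually right when they fire but miss phrasings they were not written for.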

Published: 2023-06-16. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283149/pdf/2272.pdf
Citations: 0
The Association of Learning Health System Practicing Hospitals and other Health Information Interested Hospitals with Patient-Generated Health Data Uptake.
Ibukun E Fowe, Neal T Wallace, Jeffrey Kaye

Patient generated health data (PGHD) has been described as a necessary addition to provider-generated information for improving care processes in US hospitals. This study evaluated the distribution of Health Information Interested (HII) US hospitals that are more likely to capture or use PGHD. The literature suggests that HII hospitals are more likely to capture and use PGHD. Cross-sectional analysis of the 2018 American Hospital Association's (AHA) health-IT-supplement and other supporting datasets showed that HII hospitals collectively and majority of HII hospital subcategories evaluated were associated with increased PGHD capture and use. The full Learning Health System (LHS) hospital subcategory had the highest association and hospitals in the meaningful use stage three compliant (MU3) and PCORI funded subcategory also had higher rates of PGHD capture or use when in combination with LHS hospitals. Hence, being LHS appears to be the strongest practice and policy lever to increase PGHD capture and use.

Published: 2023-06-16. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283141/pdf/2055.pdf
Citations: 0
Automatic Detection of Intimate Partner Violence Victims from Social Media for Proactive Delivery of Support.
Yuting Guo, Sangmi Kim, Elise Warren, Yuan-Chi Yang, Sahithi Lakamana, Abeed Sarker

Social media platforms are increasingly being used by intimate partner violence (IPV) victims to share experiences and seek support. If such information is automatically curated, it may be possible to conduct social media-based surveillance and even design interventions over such platforms. In this paper, we describe the development of a supervised classification system that automatically characterizes IPV-related posts on the social network Reddit. We collected data from four IPV-related subreddits and manually annotated the data to indicate whether a post is a self-report of IPV or not. Using the annotated data (N=289), we trained, evaluated, and compared supervised machine learning systems. A transformer-based classifier, RoBERTa, obtained the best classification performance, with an overall accuracy of 78% and an IPV-self-report class F1-score of 0.67. Post-classification error analyses revealed that misclassifications often occur for posts that are very long or are non-first-person reports of IPV. Despite the relatively small annotated dataset, our classification methods obtained promising results, indicating that it may be possible to detect and, hence, provide support to IPV victims over Reddit.

Published: 2023-06-16. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283132/pdf/2018.pdf
Citations: 0
Enrichment of a Data Lake to Support Population Health Outcomes Studies Using Social Determinants Linked EHR Data.
Md Kamruz Zaman Rana, Xing Song, Humayera Islam, Tanmoy Paul, Khuder Alaboud, Lemuel R Waitman, Abu S M Mosa

The integration of electronic health records (EHRs) with social determinants of health (SDoH) is crucial for population health outcome research, but it requires the collection of identifiable information and poses security risks. This study presents a framework for providing de-identified clinical data linked with privacy-preserved, geocoded SDoH data in a Data Lake. A reidentification risk detection algorithm was also developed to evaluate the transmission risk of the data. The utility of this framework was demonstrated through a population health outcomes study analyzing the correlation between socioeconomic status and the risk of having chronic conditions. The results of this study inform the development of evidence-based interventions and support the use of this framework in understanding the complex relationships between SDoH and health outcomes. For researchers, the framework reduces computational and administrative workload and security risks; for research institutes, it preserves data privacy and enables rapid and reliable research on SDoH-connected clinical data.
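
The abstract does not describe how the reidentification risk detection algorithm works. As a stand-in, the sketch below uses a k-anonymity-style check, a common way to flag reidentification risk in linked data: it counts how many records share each combination of quasi-identifiers and reports the combinations held by fewer than k records. The function name, field names, and threshold are hypothetical.

```python
from collections import Counter


def small_cells(rows, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records.

    rows: list of dicts; quasi_identifiers: field names that could be
    combined to single someone out (for example age band, sex, area).
    This is a k-anonymity-style check, not the paper's algorithm.
    """
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Toy linked records with hypothetical fields.
toy_rows = [
    {"age_band": "40-49", "sex": "F", "tract": "A"},
    {"age_band": "40-49", "sex": "F", "tract": "A"},
    {"age_band": "40-49", "sex": "F", "tract": "A"},
    {"age_band": "80+", "sex": "M", "tract": "B"},
]
risky = small_cells(toy_rows, ["age_band", "sex", "tract"], k=2)
```

Combinations flagged this way would typically be suppressed or coarsened further (for example, by merging geographic units) before the data are released to researchers.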

Published: 2023-06-16. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283101/pdf/2450.pdf
Citations: 0
Detection of Suicidal Behavior and Self-harm Among Children Presenting to Emergency Departments: A Tree-based Classification Approach.
Juliet B Edgcomb, Chi-Hong Tseng, Mengtong Pan, Alexandra Klomhaus, Bonnie Zima

Suicide is the second leading cause of death of U.S. children over 10 years old. Application of statistical learning to structured EHR data may improve detection of children with suicidal behavior and self-harm. Classification trees (CART) were developed and cross-validated using mental health-related emergency department (MH-ED) visits (2015-2019) of children 10-17 years (N=600) across two sites. Performance was compared with the CDC Surveillance Case Definition ICD-10-CM code list. Gold-standard was child psychiatrist chart review. Visits were suicide-related among 284/600 (47.3%) children. ICD-10-CM detected cases with sensitivity 70.7 (95%CI 67.0-74.3), specificity 99.0 (98.8-100), and 85/284 (29.9%) false negatives. CART detected cases with sensitivity 85.1 (64.7-100) and specificity 94.9 (89.2-100). Strongest predictors were suicide-related code, MH- and suicide-related chief complaints, site, area deprivation index, and depression. Diagnostic codes miss nearly one-third of children with suicidal behavior and self-harm. Advances in EHR-based phenotyping have the potential to improve detection of childhood-onset suicidality.
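
The detection metrics quoted above follow from confusion-matrix counts. The short sketch below shows the standard definitions and re-derives the two percentages that can be checked from counts given in the abstract (284 of 600 and 85 of 284). The counts passed to `sensitivity` and `specificity` in the example are hypothetical, since the abstract does not report the full confusion matrices.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual cases that the method detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-cases that the method correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


def percent(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Figures that can be re-derived from counts stated in the abstract.
suicide_related_share = percent(284, 600)   # suicide-related visits
missed_by_codes_share = percent(85, 284)    # cases missed by ICD-10-CM codes

# Hypothetical counts, only to show how the two metrics are computed.
example_sensitivity = sensitivity(true_pos=90, false_neg=10)
example_specificity = specificity(true_neg=95, false_pos=5)
```

The 85 missed cases out of 284 are the basis for the statement that diagnostic codes miss nearly one-third of children with suicidal behavior and self-harm.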

Published 2023-06-16 in AMIA Joint Summits on Translational Science proceedings (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283119/pdf/2295.pdf
Citations: 0
Exploring Automated Machine Learning for Cognitive Outcome Prediction from Multimodal Brain Imaging using STREAMLINE.
Xinkai Wang, Yanbo Feng, Boning Tong, Jingxuan Bao, Marylyn D Ritchie, Andrew J Saykin, Jason H Moore, Ryan Urbanowicz, Li Shen

STREAMLINE is a simple, transparent, end-to-end automated machine learning (AutoML) pipeline for easily conducting rigorous machine learning (ML) modeling and analysis. The initial version is limited to binary classification. In this work, we extend STREAMLINE by implementing multiple regression-based ML models, including linear regression, elastic net, group lasso, and L21 norm. We demonstrate the effectiveness of the regression version of STREAMLINE by applying it to the prediction of Alzheimer's disease (AD) cognitive outcomes using multimodal brain imaging data. Our empirical results demonstrate the feasibility and effectiveness of the newly expanded STREAMLINE as an AutoML pipeline for evaluating AD regression models and for discovering multimodal imaging biomarkers.
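
As background on the "L21 norm" mentioned in this abstract, here is a minimal sketch of the quantity itself, assuming the common row-wise convention in which the L2,1 norm of a weight matrix is the sum of the Euclidean norms of its rows. The abstract does not say which convention or implementation STREAMLINE uses, so the function name `l21_norm` and the example matrix are hypothetical.

```python
import math


def l21_norm(W):
    """Sum of the Euclidean (L2) norms of the rows of W.

    Assumes the row-wise convention: each row holds one feature's
    weights across all outcomes, so a zero row drops that feature.
    """
    return sum(math.sqrt(sum(x * x for x in row)) for row in W)


# Hypothetical 3-feature, 2-outcome weight matrix.
W = [
    [3.0, 4.0],   # row norm 5
    [0.0, 0.0],   # row norm 0 (feature unused for every outcome)
    [5.0, 12.0],  # row norm 13
]
print(l21_norm(W))  # 5 + 0 + 13 = 18.0
```

Penalizing this quantity tends to push whole rows to zero, which is why it is often used to select features shared across several outcomes.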

Published 2023-06-16 in AMIA Joint Summits on Translational Science proceedings (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10283099/pdf/2390.pdf
Citations: 0