Pub Date: 2023-06-01 | Epub Date: 2023-12-11 | DOI: 10.1109/ichi57859.2023.00135
Ko-Hong Lin, Jay-Jiguang Zhu, Judith A Smith, Yejin Kim, Xiaoqian Jiang
Our study aims to address the challenges in drug development for glioblastoma, a highly aggressive brain cancer with poor prognosis. We propose a computational framework that utilizes machine learning-based propensity score matching to estimate counterfactual treatment effects and predict synergistic effects of drug combinations. Through our in-silico analysis, we identified promising drug candidates and drug combinations that warrant further investigation. To validate these computational findings, we conducted in-vitro experiments on two GBM cell lines, U87 and T98G. The experimental results demonstrated that some of the identified drugs and drug combinations indeed exhibit strong suppressive effects on GBM cell growth. Our end-to-end pipeline showcases the feasibility of integrating computational models with biological experiments to expedite drug repurposing and discovery efforts. By bridging the gap between in-silico analysis and in-vitro validation, we demonstrate the potential of this approach to accelerate the development of novel and effective treatments for glioblastoma.
Title: An End-to-end In-Silico and In-Vitro Drug Repurposing Pipeline for Glioblastoma. IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 738-745. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10956733/pdf/
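The matching step behind propensity-score-based effect estimation in the glioblastoma abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes propensity scores have already been produced by some model, and the function name `match_and_estimate`, the greedy 1:1 strategy, and the `caliper` parameter are all assumptions made for the example.

```python
def match_and_estimate(treated, control, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching on precomputed propensity scores.

    treated, control: lists of (propensity_score, outcome) tuples.
    Returns (mean outcome difference over matched pairs, number of pairs).
    The scores are assumed to come from an upstream model (hypothetical here).
    """
    available = list(control)  # controls not yet used in a pair
    diffs = []
    for score, outcome in sorted(treated):
        if not available:
            break
        # Index of the unused control whose score is closest to this treated unit.
        j = min(range(len(available)), key=lambda k: abs(available[k][0] - score))
        # Only accept the pair if the scores are within the caliper.
        if abs(available[j][0] - score) <= caliper:
            _, control_outcome = available.pop(j)
            diffs.append(outcome - control_outcome)
    if not diffs:
        return None, 0
    return sum(diffs) / len(diffs), len(diffs)


if __name__ == "__main__":
    treated = [(0.8, 1.0), (0.5, 0.0)]
    control = [(0.75, 0.0), (0.52, 0.0), (0.1, 1.0)]
    print(match_and_estimate(treated, control))  # (0.5, 2)
```

The control unit with score 0.1 is never used because no treated unit falls within the caliper of it, which is the point of matching: only comparable units contribute to the estimate.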
Pub Date: 2023-06-01 | Epub Date: 2023-12-11 | DOI: 10.1109/ichi57859.2023.00047
Yaohua Wang, Lisanne Van Dijk, Abdallah S R Mohamed, Mohamed Naser, Clifton David Fuller, Xinhua Zhang, G Elisabeta Marai, Guadalupe Canahuate
Patient-Reported Outcomes (PRO) are collected directly from patients using symptom questionnaires. For head and neck cancer patients, PRO surveys are recorded every week during treatment, at each clinic visit, and at different follow-up times after the treatment has concluded. PRO surveys can be very informative regarding the patient's status and the effect of treatment on the patient's quality of life (QoL). Processing PRO data is challenging for several reasons. First, missing data is frequent, as patients might skip a question or a questionnaire altogether. Second, PROs are patient-dependent: a rating of 5 for one patient might be a rating of 10 for another patient. Finally, most patients experience severe symptoms during treatment, which usually subside over time. However, for some patients, late toxicities persist, negatively affecting the patient's QoL. These long-term severe symptoms are hard to predict and are the focus of this study. In this work, we model as time series the PRO data collected from head and neck cancer patients treated at the MD Anderson Cancer Center using the MD Anderson Symptom Inventory (MDASI) questionnaire. We impute missing values with a combination of K nearest neighbor (KNN) and Long Short-Term Memory (LSTM) neural networks and then apply an LSTM to predict late symptom severity 12 months after treatment. We compare performance against clinical and ARIMA models. We show that the LSTM model combined with KNN imputation is effective in predicting late-stage symptom ratings for occurrence and severity under the AUC and F1 score metrics.
Title: Improving Prediction of Late Symptoms using LSTM and Patient-reported Outcomes for Head and Neck Cancer Patients. IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 292-300. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10853990/pdf/
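The KNN half of the imputation described in the abstract above can be illustrated with a small sketch. It is an assumption-laden toy, not the paper's method: the function name `knn_impute`, the root-mean-square distance over co-observed time points, and the choice of `k` are all hypothetical, and the LSTM stage is not shown.

```python
import math


def knn_impute(rows, k=2):
    """Fill missing values (None) in per-patient symptom series.

    rows: list of equal-length lists, one per patient, one entry per time point.
    Each missing entry is replaced by the mean of that time point among the
    k most similar patients who did report it. Similarity is computed only
    over time points both patients answered (an assumption of this sketch).
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing in common, cannot compare
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for t, value in enumerate(row):
            if value is not None:
                continue
            # Other patients with an observed rating at time t, nearest first.
            candidates = sorted(
                (dist(row, other), j)
                for j, other in enumerate(rows)
                if j != i and other[t] is not None
            )
            neighbors = [j for d, j in candidates[:k] if d < math.inf]
            if neighbors:
                filled[i][t] = sum(rows[j][t] for j in neighbors) / len(neighbors)
    return filled


if __name__ == "__main__":
    ratings = [[1, 2, None], [1, 2, 3], [1, 3, 5], [9, 9, 9]]
    print(knn_impute(ratings)[0])  # [1, 2, 4.0]
```

The fourth patient reports uniformly severe symptoms and is too distant to be among the two nearest neighbors, so the imputed value reflects only patients with a similar trajectory.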
Pub Date: 2023-06-01 | Epub Date: 2023-12-11 | DOI: 10.1109/ichi57859.2023.00104
Muskan Garg, Sunghwan Sohn
With advancements in the analysis of cognitive decline in electronic health records, the research community has witnessed a recent surge in social media posts by caregivers and/or loved ones of people with cognitive decline. The major challenges in this area are the availability of large and diverse datasets, the ethics of data collection and sharing, diagnostic specificity, and clinical acceptability. To this end, we construct a new dataset, Caregivers experiences with cognitive Decline (CareD), of 1005 posts with more than 194K words and 9541 sentences, highlighting discussions on people with dementia and Alzheimer's disease on Reddit. We discuss the changing trends of discussions on cognitive decline in social media and open challenges for natural language processing and social computing. We first identify the Reddit posts reflecting substantial information as candidate posts. We then formulate the annotation guidelines and handle ambiguous cases (perplexities) to investigate the presence of experiences, self-reported articles, and potential caregivers in candidate posts, resulting in the discovery of latent symptoms, firsthand information, and prospective sources of longitudinal information about the patient, respectively.
Title: CareD: Caregiver's Experience with Cognitive Decline in Reddit Posts. IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 581-587. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10877621/pdf/
Pub Date: 2023-06-01 | Epub Date: 2023-12-11 | DOI: 10.1109/ichi57859.2023.00021
Yuhang Jiang, Ramakanth Kavuluru
Combination drug therapies are treatment regimens that involve two or more drugs, administered more commonly for patients with cancer, HIV, malaria, or tuberculosis. Currently there are over 350K articles in PubMed that use the combination drug therapy MeSH heading, with at least 10K articles published per year over the past two decades. Extracting combination therapies from scientific literature inherently constitutes an n-ary relation extraction problem. Unlike in the general n-ary setting where n is fixed (e.g., drug-gene-mutation relations where n = 3), extracting combination therapies is a special setting where n ≥ 2 is dynamic, depending on each instance. Recently, Tiktinsky et al. (NAACL 2022) introduced a first-of-its-kind dataset, CombDrugExt, for extracting such therapies from literature. Here, we use a sequence-to-sequence style end-to-end extraction method to achieve an F1-score of 66.7% on the CombDrugExt test set for positive (or effective) combinations. This is an absolute ≈ 5% F1-score improvement even over the prior best relation classification score obtained with drug entities already spotted (hence, not end-to-end). Thus, our effort introduces the first model for end-to-end extraction on this task, and it is already superior to the best prior non-end-to-end model. Our model extracts all drug entities and relations in a single pass and is highly suitable for dynamic n-ary extraction scenarios.
Title: End-to-End n-ary Relation Extraction for Combination Drug Therapies. IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 72-80. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10814995/pdf/
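To make the "dynamic n" idea in the abstract above concrete, the sketch below decodes a linearized model output into relations of varying arity. The output format (relations separated by `|`, drugs separated by `;`, a label before `:`) and the labels `POS` and `OTHER` are purely hypothetical choices for this illustration; the abstract does not specify how the sequence-to-sequence model serializes its predictions.

```python
def parse_linearized(output):
    """Decode a linearized string into (label, drugs) relations.

    Hypothetical format: "LABEL: drug ; drug ; ... | LABEL: drug ; drug".
    Each relation may contain any number of drugs n >= 2; chunks with
    fewer than two drugs are discarded as invalid combinations.
    """
    relations = []
    for chunk in output.split("|"):
        chunk = chunk.strip()
        if ":" not in chunk:
            continue  # malformed chunk without a label
        label, _, body = chunk.partition(":")
        drugs = tuple(d.strip() for d in body.split(";") if d.strip())
        if len(drugs) >= 2:
            relations.append((label.strip(), drugs))
    return relations


if __name__ == "__main__":
    generated = "POS: cisplatin ; paclitaxel | OTHER: a ; b ; c | POS: lonely"
    for relation in parse_linearized(generated):
        print(relation)
    # ('POS', ('cisplatin', 'paclitaxel'))
    # ('OTHER', ('a', 'b', 'c'))
```

Because the arity is recovered from the generated text itself, the same decoder handles pairs, triples, and larger combinations without a fixed n.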
Pub Date: 2023-06-01 | Epub Date: 2023-12-11 | DOI: 10.1109/ichi57859.2023.00018
Pulakesh Upadhyaya, Yaobin Ling, Luyao Chen, Yejin Kim, Xiaoqian Jiang
Alzheimer's disease (AD) is one of the leading causes of death in the United States, especially among the elderly. Recent studies have shown how hypertension is related to cognitive decline in elderly patients, which in turn leads to increased mortality as well as morbidity. There have been various studies that have looked at the effect of antihypertensive drugs in reducing cognitive decline, and their results have proved inconclusive. However, most of these studies assume the treatment effect is similar for all patients, thus considering only the average treatment effects of antihypertensive drugs. In this paper, we assume that the effect of antihypertensives on the onset of AD depends on patient characteristics. We develop a deep learning method called LASSO-Dragonnet to estimate the individualized treatment effects of each patient. We considered six antihypertensive drugs, and each of the six models considered one of the drugs as the treatment and the remaining as control. Our studies showed that although many antihypertensives have a positive impact in delaying AD onset on average, the impact varies from individual to individual, depending on their various characteristics. We also analyzed the importance of various covariates in such an estimation. Our results showed that the individualized treatment effects of each patient could be estimated accurately using a deep learning method, and that the importance of various covariates could be determined.
Title: Inferring Personalized Treatment Effect of Antihypertensives on Alzheimer's Disease Using Deep Learning. IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 49-57. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10956734/pdf/
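The distinction the abstract above draws between average and individualized treatment effects can be written down in standard potential-outcomes notation. The symbols are assumptions introduced here, not taken from the paper: X denotes patient covariates, Y(1) and Y(0) the outcomes with and without a given antihypertensive, and the fitted functions \hat{\mu}_1, \hat{\mu}_0 stand for whatever outcome predictors a model such as LASSO-Dragonnet learns.

```latex
\begin{align}
\tau(x) &= \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]
  && \text{(individualized treatment effect)} \\
\mathrm{ATE} &= \mathbb{E}_{X}\left[\, \tau(X) \,\right]
  && \text{(average treatment effect)} \\
\hat{\tau}(x_i) &= \hat{\mu}_1(x_i) - \hat{\mu}_0(x_i)
  && \text{(plug-in estimate for patient } i\text{)}
\end{align}
```

Studies that report only the ATE collapse \tau(x) to a single number, which is why heterogeneous effects across patients can make their conclusions look inconclusive.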
Pub Date: 2023-06-01 | Epub Date: 2023-12-11 | DOI: 10.1109/ichi57859.2023.00027
Chen Lin, Jianghong Zhou, Jing Zhang, Carl Yang, Eugene Agichtein
The use of web search activity for pandemic forecasting has significant implications for managing disease spread and informing policy decisions. However, web search records tend to be noisy and influenced by geographical location, making it difficult to develop large-scale models. While regularized linear models have been effective in predicting the spread of respiratory illnesses like COVID-19, they are limited to specific locations. The failure to incorporate data from neighboring areas and the inability to transfer models to new locations with limited data have impeded further progress. To address these limitations, this study proposes a novel self-supervised message-passing neural network (SMPNN) framework for modeling local and cross-location dynamics in pandemic forecasting. The SMPNN framework uses an MPNN module to learn cross-location dependencies through self-supervised learning and to improve local predictions with graph-generated features. The framework is designed as an end-to-end solution and is compared with state-of-the-art statistical and deep learning models using COVID-19 data from England and the US. The results demonstrate that the SMPNN model outperforms other models, achieving up to a 6.9% improvement in prediction accuracy and lower prediction errors during the early stages of disease outbreaks. This approach represents a significant advancement in disease surveillance and forecasting, providing a novel methodology, datasets, and insights that combine web search data and spatial information. The proposed SMPNN framework offers a promising avenue for modeling the spread of pandemics, leveraging both local and cross-location information, and has the potential to inform public health policy decisions.
Title: Graph Neural Network Modeling of Web Search Activity for Real-time Pandemic Forecasting. IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 128-137. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10853009/pdf/
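The cross-location idea in the abstract above rests on message passing over a graph of locations. The sketch below shows one generic round of mean-aggregation message passing; it is not the SMPNN architecture, and the function name `message_passing_step`, the fixed `self_weight` mix, and the toy graph are assumptions chosen only to show how a location's features absorb information from its neighbors.

```python
def message_passing_step(features, edges, self_weight=0.5):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping location name -> list of floats (e.g. search signals).
    edges: list of (location_a, location_b) pairs for adjacent locations.
    Each location's new vector mixes its own features with the mean of its
    neighbors' features; isolated locations are left unchanged.
    """
    neighbors = {node: [] for node in features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = {}
    for node, vec in features.items():
        nbrs = neighbors[node]
        if not nbrs:
            updated[node] = list(vec)
            continue
        # Aggregate: element-wise mean over neighboring locations.
        mean = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(len(vec))]
        # Update: convex combination of own features and the aggregated message.
        updated[node] = [self_weight * v + (1 - self_weight) * m for v, m in zip(vec, mean)]
    return updated


if __name__ == "__main__":
    signals = {"A": [1.0], "B": [3.0], "C": [5.0]}
    print(message_passing_step(signals, [("A", "B"), ("B", "C")]))
    # {'A': [2.0], 'B': [3.0], 'C': [4.0]}
```

In a learned model the fixed mean and mixing weight would be replaced by trainable functions, but the flow of information between neighboring locations is the same.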
Pub Date: 2023-06-01 | Epub Date: 2023-12-11 | DOI: 10.1109/ichi57859.2023.00108
Xuguang Ai, Ramakanth Kavuluru
End-to-end relation extraction (E2ERE) is an important task in information extraction, more so for biomedicine as scientific literature continues to grow exponentially. E2ERE typically involves identifying entities (i.e., named entity recognition (NER)) and associated relations, while most RE tasks simply assume that the entities are provided upfront and end up performing relation classification. E2ERE is inherently more difficult than RE alone, given the potential snowball effect of errors from NER leading to more errors in RE. A complex dataset in biomedical E2ERE is the ChemProt dataset (BioCreative VI, 2017), which identifies relations between chemical compounds and genes/proteins in scientific literature. ChemProt is included in all recent biomedical natural language processing benchmarks, including BLUE, BLURB, and BigBio. However, its treatment in these benchmarks and in other separate efforts is typically not end-to-end, with few exceptions. In this effort, we employ a span-based pipeline approach to produce a new state-of-the-art E2ERE performance on the ChemProt dataset, resulting in > 4% improvement in F1-score over the prior best effort. Our results indicate that a straightforward fine-grained tokenization scheme helps span-based approaches excel in E2ERE, especially with regard to handling complex named entities. Our error analysis also identifies a few key failure modes in E2ERE for ChemProt.
Title: End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies. IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 610-618. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10809256/pdf/
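Why fine-grained tokenization helps span-based extraction, as the abstract above reports, can be seen in a small sketch. The exact scheme used in the paper is not given in the abstract; the rule below (split into runs of ASCII letters, runs of digits, and single symbols) and the function name `fine_tokenize` are assumptions for illustration.

```python
import re

# Runs of ASCII letters, runs of digits, or any single non-space symbol.
_TOKEN_PATTERN = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def fine_tokenize(text):
    """Split text so that every punctuation mark and digit run is its own token.

    With whitespace tokenization, "IL-2/STAT5" is a single token, so a span
    model cannot mark "IL-2" and "STAT5" as separate entities. After
    fine-grained splitting, every entity boundary falls on a token boundary.
    """
    return _TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print("IL-2/STAT5 (p38)".split())
    # ['IL-2/STAT5', '(p38)']
    print(fine_tokenize("IL-2/STAT5 (p38)"))
    # ['IL', '-', '2', '/', 'STAT', '5', '(', 'p', '38', ')']
```

The trade-off is longer token sequences, which increases the number of candidate spans a span-based model must score.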
Pub Date: 2023-06-01 | Epub Date: 2023-12-11 | DOI: 10.1109/ICHI57859.2023.00028
Yuelyu Ji, Yuhe Gao, Runxue Bao, Qi Li, Disheng Liu, Yiming Sun, Ye Ye
The coronavirus disease 2019 (COVID-19) has led to a global pandemic of significant severity. In addition to its high level of contagiousness, COVID-19 can have a heterogeneous clinical course, ranging from asymptomatic carriers to severe and potentially life-threatening health complications. Many patients have to revisit the emergency room (ER) within a short time after discharge, which significantly increases the workload for medical staff. Early identification of such patients is crucial for helping physicians focus on treating life-threatening cases. In this study, we obtained Electronic Health Records (EHRs) of 3,210 encounters from 13 affiliated ERs within the University of Pittsburgh Medical Center between March 2020 and January 2021. We leveraged a Natural Language Processing technique, ScispaCy, to extract clinical concepts and used the 1001 most frequent concepts to develop 7-day revisit models for COVID-19 patients in ERs. The research data we collected were obtained from 13 ERs, which may have distributional differences that could affect the model development. To address this issue, we employed a classic deep transfer learning method called the Domain Adversarial Neural Network (DANN) and evaluated different modeling strategies, including the Multi-DANN algorithm (which considers the source differences), the Single-DANN algorithm (which doesn't consider the source differences), and three baseline methods: using only source data, using only target data, and using a mixture of source and target data. Results showed that the Multi-DANN models outperformed the Single-DANN models and baseline models in predicting revisits of COVID-19 patients to the ER within 7 days after discharge (median AUROC = 0.8 vs. 0.5). Notably, the Multi-DANN strategy effectively addressed the heterogeneity among multiple source domains and improved the adaptation of source data to the target domain. 
Moreover, the high performance of Multi-DANN models indicates that EHRs are informative for developing a prediction model to identify COVID-19 patients who are very likely to revisit an ER within 7 days after discharge.
Title: Prediction of COVID-19 Patients' Emergency Room Revisit using Multi-Source Transfer Learning. IEEE International Conference on Healthcare Informatics, vol. 2023, pp. 138-144. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10939709/pdf/
Pub Date: 2023-06-01; Epub Date: 2023-12-11; DOI: 10.1109/ichi57859.2023.00100
Zenan Sun, Cui Tao
Alzheimer's Disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Finding effective treatments for this disease is crucial. Clinical trials play an essential role in developing and testing new treatments for AD. However, identifying eligible participants can be challenging, time-consuming, and costly. In recent years, the development of natural language processing (NLP) techniques, specifically named entity recognition (NER) and named entity normalization (NEN), has helped to automate the identification and extraction of relevant information from eligibility criteria (EC), facilitating semi-automatic patient recruitment and enabling data FAIRness for clinical trial data. Nevertheless, most current biomedical NER models only provide annotations for a restricted set of entity types that may not be applicable to clinical trial data. Additionally, there are currently no established techniques for accurately performing NEN on entities that are negated with a negative prefix. In this paper, we introduce a pipeline designed for information extraction from AD clinical trial EC, which involves preprocessing of the EC data, clinical NER, and biomedical NEN to the Unified Medical Language System (UMLS). Our NER model can identify named entities in seven pre-defined categories, while our NEN model employs a combination of exact match and partial match search strategies, as well as customized rules, to accurately normalize entities with negative prefixes. To evaluate the performance of our pipeline, we measured the precision, recall, and F1 score for the NER component, and we manually reviewed the top five mapping results produced by the NEN component. Our evaluation showed that the pipeline can successfully normalize named entities in clinical trial EC with high accuracy.
The NER component achieved an overall F1 of 0.816, demonstrating its ability to accurately identify seven types of named entities in clinical text. The NEN component also performed well: customized rules combined with exact and partial match strategies yielded an accuracy of 0.940 for normalized entities.
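The normalization strategy described in this abstract (exact match, then a negative-prefix rule, then partial match) can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary `TOY_VOCAB`, its placeholder identifiers, the prefix list `NEGATIVE_PREFIXES`, and the token-overlap (Jaccard) scoring are all assumptions chosen for illustration, and a real system would query UMLS rather than a small dictionary.

```python
import re

# Hypothetical stand-in for a UMLS lookup; the identifiers are placeholders, not real CUIs.
TOY_VOCAB = {
    "diabetes mellitus": "C0000001",
    "smoker": "C0000002",
    "responsive": "C0000003",
}

# Illustrative negative prefixes (assumed; the abstract does not list the actual rules).
NEGATIVE_PREFIXES = ("non-", "non ", "non", "un")


def normalize(mention, vocab=TOY_VOCAB):
    """Map a mention to a vocabulary concept, or return None if nothing matches."""
    text = mention.lower().strip()

    # Step 1: exact match on the full mention.
    if text in vocab:
        return {"concept": text, "cui": vocab[text], "negated": False, "match": "exact"}

    # Step 2: negative-prefix rule. Strip the prefix only if the remainder
    # is itself a known concept, so words like "uncle" are not treated as negations.
    for prefix in NEGATIVE_PREFIXES:
        if text.startswith(prefix):
            base = text[len(prefix):].strip()
            if base in vocab:
                return {"concept": base, "cui": vocab[base], "negated": True, "match": "exact"}

    # Step 3: partial match by Jaccard overlap between token sets.
    tokens = set(re.findall(r"[a-z0-9]+", text))
    best, best_score = None, 0.0
    for term in vocab:
        term_tokens = set(term.split())
        score = len(tokens & term_tokens) / len(tokens | term_tokens)
        if score > best_score:
            best, best_score = term, score
    if best is not None:
        return {"concept": best, "cui": vocab[best], "negated": False, "match": "partial"}
    return None


if __name__ == "__main__":
    for mention in ["non-smoker", "Unresponsive", "type 2 diabetes mellitus", "aspirin"]:
        print(mention, "->", normalize(mention))
```

Requiring the stripped base term to exist in the vocabulary is one possible design choice; it trades recall for fewer false negations and is not claimed to match the customized rules used in the paper.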
Named Entity Recognition and Normalization for Alzheimer's Disease Eligibility Criteria. Zenan Sun, Cui Tao. IEEE International Conference on Healthcare Informatics, 2023, pp. 558-564. Publication date: 2023-06-01. DOI: 10.1109/ichi57859.2023.00100. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10815931/pdf/
Pub Date: 2023-06-01; Epub Date: 2023-12-11; DOI: 10.1109/ichi57859.2023.00139
Aokun Chen, Qian Li, Elizabeth Shenkman, Yonghui Wu, Yi Guo, Jiang Bian
Clinical trials are vital tools for establishing the effectiveness and safety of medications. To maximize generalizability, the study sample should represent both the study population and the target population. However, clinical trial design tends to favor the evaluation of drug safety and procedure (i.e., internal validity) without clear knowledge of the penalty this imposes on trial generalizability (i.e., external validity). Alzheimer's Disease (AD) trials are known to have generalizability issues. Thus, in this study, we explore the effect of eligibility criteria on AD severity and on severe adverse events (SAEs) among eligible patients.
Exploring the Effect of Eligibility Criteria on AD Severity and Severe Adverse Event in Eligible Patients. Aokun Chen, Qian Li, Elizabeth Shenkman, Yonghui Wu, Yi Guo, Jiang Bian. IEEE International Conference on Healthcare Informatics, 2023, pp. 756-759. Publication date: 2023-06-01. DOI: 10.1109/ichi57859.2023.00139. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11273173/pdf/