IEEE International Conference on Healthcare Informatics: latest articles

An End-to-end In-Silico and In-Vitro Drug Repurposing Pipeline for Glioblastoma.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ichi57859.2023.00135

Ko-Hong Lin, Jay-Jiguang Zhu, Judith A Smith, Yejin Kim, Xiaoqian Jiang

Our study aims to address the challenges in drug development for glioblastoma, a highly aggressive brain cancer with poor prognosis. We propose a computational framework that utilizes machine learning-based propensity score matching to estimate counterfactual treatment effects and predict synergistic effects of drug combinations. Through our in-silico analysis, we identified promising drug candidates and drug combinations that warrant further investigation. To validate these computational findings, we conducted in-vitro experiments on two GBM cell lines, U87 and T98G. The experimental results demonstrated that some of the identified drugs and drug combinations indeed exhibit strong suppressive effects on GBM cell growth. Our end-to-end pipeline showcases the feasibility of integrating computational models with biological experiments to expedite drug repurposing and discovery efforts. By bridging the gap between in-silico analysis and in-vitro validation, we demonstrate the potential of this approach to accelerate the development of novel and effective treatments for glioblastoma.
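
The matching step behind a propensity-score analysis can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the propensity scores have already been estimated by some model, and the function names, the caliper value, and the toy numbers are all hypothetical.

```python
def match_nearest(treated, controls, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching on propensity score, without replacement.

    treated / controls: lists of (propensity_score, outcome) tuples.
    Returns a list of (treated_outcome, control_outcome) pairs.
    """
    available = list(controls)
    pairs = []
    for score, outcome in sorted(treated, key=lambda t: t[0]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        best = min(available, key=lambda c: abs(c[0] - score))
        if abs(best[0] - score) <= caliper:
            pairs.append((outcome, best[1]))
            available.remove(best)
    return pairs


def att(pairs):
    """Average treatment effect on the treated, computed from matched pairs."""
    if not pairs:
        raise ValueError("no matched pairs within the caliper")
    return sum(t - c for t, c in pairs) / len(pairs)


# Toy data (illustrative only): (propensity score, outcome)
treated = [(0.80, 14.0), (0.60, 11.0), (0.30, 9.0)]
controls = [(0.78, 10.0), (0.55, 9.0), (0.33, 8.0), (0.10, 7.0)]

pairs = match_nearest(treated, controls)
effect = att(pairs)
print(len(pairs), round(effect, 3))
```

Greedy matching without replacement is only one of several matching schemes; the caliper discards treated units that have no sufficiently similar control, which trades sample size for comparability.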

Volume 2023, pp. 738-745. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10956733/pdf/

Citations: 0

Improving Prediction of Late Symptoms using LSTM and Patient-reported Outcomes for Head and Neck Cancer Patients.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ichi57859.2023.00047

Yaohua Wang, Lisanne Van Dijk, Abdallah S R Mohamed, Mohamed Naser, Clifton David Fuller, Xinhua Zhang, G Elisabeta Marai, Guadalupe Canahuate

Patient-reported outcomes (PROs) are collected directly from patients using symptom questionnaires. For head and neck cancer patients, PRO surveys are recorded every week during treatment, when the patient visits the clinic, and at different follow-up times after treatment has concluded. PRO surveys can be very informative regarding the patient's status and the effect of treatment on the patient's quality of life (QoL). Processing PRO data is challenging for several reasons. First, missing data are frequent because patients may skip a question or an entire questionnaire. Second, PROs are patient-dependent: a rating of 5 for one patient might be a rating of 10 for another. Finally, most patients experience severe symptoms during treatment, which usually subside over time. For some patients, however, late toxicities persist and negatively affect QoL. These long-term severe symptoms are hard to predict and are the focus of this study. In this work, we model as time series the PRO data collected from head and neck cancer patients treated at the MD Anderson Cancer Center using the MD Anderson Symptom Inventory (MDASI) questionnaire. We impute missing values with a combination of K-nearest-neighbor (KNN) and Long Short-Term Memory (LSTM) neural networks and then apply an LSTM to predict late symptom severity 12 months after treatment. We compare performance against clinical and ARIMA models. We show that the LSTM model combined with KNN imputation is effective in predicting late-stage symptom ratings for occurrence and severity under the AUC and F1 score metrics.
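
The KNN half of the imputation step can be illustrated with a small sketch. It is an assumption-laden toy, not the authors' method: the distance definition, the choice of k, the function name `knn_impute`, and the ratings below are hypothetical, and the LSTM component is not shown.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows that observed that entry.

    Distance is the root-mean-square difference over positions that both rows
    observed, so rows with different amounts of missing data stay comparable.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observed position j.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Toy weekly symptom ratings on a 0-10 scale (illustrative, not MDASI data)
ratings = [
    [2, 4, 6, 5],
    [2, 4, None, 5],
    [3, 5, 7, 6],
    [9, 9, 1, 0],
]
imputed = knn_impute(ratings, k=2)
print(imputed[1])
```

Entries with no usable donor are left as None here, so a downstream model would still need a fallback for patients whose questionnaires overlap with nobody else's.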

Volume 2023, pp. 292-300. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10853990/pdf/

Citations: 0

CareD: Caregiver's Experience with Cognitive Decline in Reddit Posts.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ichi57859.2023.00104

Muskan Garg, Sunghwan Sohn

Alongside advances in the analysis of cognitive decline in electronic health records, the research community has witnessed a recent surge in social media posts by caregivers and/or loved ones of people with cognitive decline. The major challenges in this area are the availability of large and diverse datasets, the ethics of data collection and sharing, diagnostic specificity, and clinical acceptability. To this end, we construct a new dataset, Caregivers experiences with cognitive Decline (CareD), of 1005 posts with more than 194K words and 9541 sentences, highlighting discussions on people with dementia and Alzheimer's disease on Reddit. We discuss the changing trends of discussions on cognitive decline in social media and open challenges for natural language processing and social computing. We first identify the Reddit posts reflecting substantial information as candidate posts. We then formulate annotation guidelines and resolve ambiguous cases to determine whether each candidate post contains experiences, self-reported articles, or a potential caregiver; these three labels support, respectively, the discovery of latent symptoms, firsthand information, and a prospective source of longitudinal information about the patient.

Volume 2023, pp. 581-587. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10877621/pdf/

Citations: 0

End-to-End n-ary Relation Extraction for Combination Drug Therapies.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ichi57859.2023.00021

Yuhang Jiang, Ramakanth Kavuluru

Combination drug therapies are treatment regimens that involve two or more drugs, administered more commonly for patients with cancer, HIV, malaria, or tuberculosis. Currently, over 350K articles in PubMed use the combination drug therapy MeSH heading, with at least 10K articles published per year over the past two decades. Extracting combination therapies from scientific literature inherently constitutes an n-ary relation extraction problem. Unlike the general n-ary setting where n is fixed (e.g., drug-gene-mutation relations where n = 3), extracting combination therapies is a special setting where n ≥ 2 is dynamic and depends on each instance. Recently, Tiktinsky et al. (NAACL 2022) introduced a first-of-its-kind dataset, CombDrugExt, for extracting such therapies from literature. Here, we use a sequence-to-sequence style end-to-end extraction method to achieve an F1-score of 66.7% on the CombDrugExt test set for positive (or effective) combinations. This is an absolute ≈ 5% F1-score improvement even over the prior best relation classification score with spotted drug entities (hence, not end-to-end). Thus our effort introduces a first, state-of-the-art model for end-to-end extraction that is already superior to the best prior non-end-to-end model for this task. Our model extracts all drug entities and relations in a single pass and is highly suitable for dynamic n-ary extraction scenarios.
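
One way to see how a sequence-to-sequence model can handle a variable n is to look at how relations might be written out as a target string and read back. The sketch below is an assumption for illustration: the separators, the `POS` label, and the function names are hypothetical and are not taken from the paper or from the CombDrugExt dataset.

```python
def linearize(relations):
    """Turn a list of (label, [drug, ...]) relations into one target string."""
    parts = [f"{label}: " + " ; ".join(drugs) for label, drugs in relations]
    return " | ".join(parts)


def parse(target):
    """Inverse of linearize: recover (label, [drug, ...]) relations from a string."""
    relations = []
    for chunk in target.split(" | "):
        if not chunk.strip():
            continue
        label, _, drug_text = chunk.partition(": ")
        drugs = [d.strip() for d in drug_text.split(" ; ") if d.strip()]
        if len(drugs) >= 2:  # a combination needs at least two drugs
            relations.append((label.strip(), drugs))
    return relations


def f1(predicted, gold):
    """Exact-match F1 over relations, ignoring drug order within a combination."""
    def normalize(rels):
        return {(label, frozenset(drugs)) for label, drugs in rels}

    p, g = normalize(predicted), normalize(gold)
    true_positives = len(p & g)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(p)
    recall = true_positives / len(g)
    return 2 * precision * recall / (precision + recall)


# Two combinations of different sizes (n = 2 and n = 3) in the same output
gold = [
    ("POS", ["paclitaxel", "carboplatin"]),
    ("POS", ["rifampicin", "isoniazid", "pyrazinamide"]),
]
target = linearize(gold)
decoded = parse(target)
print(target)
```

Because each relation simply lists as many drugs as it needs, nothing in the output format fixes n in advance. A real system would also have to cope with drug names that contain the separator characters, which this toy format ignores.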

Volume 2023, pp. 72-80. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10814995/pdf/

Citations: 0

Graph Neural Network Modeling of Web Search Activity for Real-time Pandemic Forecasting.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ichi57859.2023.00027

Chen Lin, Jianghong Zhou, Jing Zhang, Carl Yang, Eugene Agichtein

Using web search activity for pandemic forecasting has significant implications for managing disease spread and informing policy decisions. However, web search records tend to be noisy and influenced by geographical location, making it difficult to develop large-scale models. While regularized linear models have been effective in predicting the spread of respiratory illnesses like COVID-19, they are limited to specific locations. The failure to incorporate data from neighboring areas and the inability to transfer models to new locations with limited data have impeded further progress. To address these limitations, this study proposes a novel self-supervised message-passing neural network (SMPNN) framework for modeling local and cross-location dynamics in pandemic forecasting. The SMPNN framework uses an MPNN module to learn cross-location dependencies through self-supervised learning and to improve local predictions with graph-generated features. The framework is designed as an end-to-end solution and is compared with state-of-the-art statistical and deep learning models using COVID-19 data from England and the US. The results demonstrate that the SMPNN model outperforms other models, achieving up to a 6.9% improvement in prediction accuracy and lower prediction errors during the early stages of disease outbreaks. This approach represents a significant advancement in disease surveillance and forecasting, providing a novel methodology, datasets, and insights that combine web search data and spatial information. The proposed SMPNN framework offers a promising avenue for modeling the spread of pandemics, leveraging both local and cross-location information, and has the potential to inform public health policy decisions.
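
The idea of passing information between neighbouring locations can be shown with one round of mean aggregation over a small graph. This is a deliberately simplified sketch: a trained MPNN learns its message and update functions, whereas the fixed `self_weight` blend, the function name, and the toy graph below are assumptions made for illustration.

```python
def message_passing_step(features, edges, self_weight=0.5):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping location -> list of floats (e.g. search-activity signals)
    edges:    iterable of (location_a, location_b) pairs
    Returns a new dict in which each vector blends the node's own features
    with the mean of its neighbours' features.
    """
    neighbours = {node: set() for node in features}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    updated = {}
    for node, own in features.items():
        nbrs = neighbours[node]
        if not nbrs:
            updated[node] = list(own)  # an isolated node keeps its features
            continue
        dim = len(own)
        mean = [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        updated[node] = [
            self_weight * own[d] + (1 - self_weight) * mean[d] for d in range(dim)
        ]
    return updated


# Toy chain of three adjacent regions (illustrative values only)
features = {"A": [1.0, 0.0], "B": [3.0, 2.0], "C": [5.0, 4.0]}
edges = [("A", "B"), ("B", "C")]
smoothed = message_passing_step(features, edges)
print(smoothed)
```

Repeating the step lets information travel further: after two rounds, region A's features already reflect region C even though the two are not directly connected.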

Volume 2023, pp. 128-137. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10853009/pdf/

Citations: 0

Inferring Personalized Treatment Effect of Antihypertensives on Alzheimer's Disease Using Deep Learning.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ichi57859.2023.00018

Pulakesh Upadhyaya, Yaobin Ling, Luyao Chen, Yejin Kim, Xiaoqian Jiang

Alzheimer's disease (AD) is one of the leading causes of death in the United States, especially among the elderly. Recent studies have shown that hypertension is related to cognitive decline in elderly patients, which in turn leads to increased mortality and morbidity. Various studies have looked at the effect of antihypertensive drugs in reducing cognitive decline, and their results have proved inconclusive. However, most of these studies assume the treatment effect is similar for all patients and thus consider only the average treatment effects of antihypertensive drugs. In this paper, we assume that the effect of antihypertensives on the onset of AD depends on patient characteristics. We develop a deep learning method called LASSO-Dragonnet to estimate the individualized treatment effect for each patient. We considered six antihypertensive drugs, and each of the six models treated one of the drugs as the treatment and the remaining drugs as the control. Our studies showed that although many antihypertensives have a positive impact in delaying AD onset on average, the impact varies from individual to individual, depending on their characteristics. We also analyzed the importance of various covariates in such an estimation. Our results showed that the individualized treatment effect for each patient could be estimated accurately using a deep learning method, and that the importance of various covariates could be determined.
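
The distinction between average and individualized effects can be written out explicitly. The notation below is standard potential-outcomes notation introduced here as an assumption for illustration; the abstract does not state the formulas used by LASSO-Dragonnet, and the fitted outcome model \hat{Q} is a placeholder for whatever model produces the predictions.

```latex
\begin{align}
\tau(x) &= \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]
  && \text{(individualized effect for covariates } x\text{)} \\
\hat{\tau}(x_i) &= \hat{Q}(1, x_i) - \hat{Q}(0, x_i)
  && \text{(plug-in estimate from a fitted outcome model } \hat{Q}\text{)} \\
\widehat{\mathrm{ATE}} &= \frac{1}{n} \sum_{i=1}^{n} \hat{\tau}(x_i)
  && \text{(average effect over } n \text{ patients)}
\end{align}
```

Here Y(1) and Y(0) are a patient's potential outcomes with and without the drug, and X is the vector of patient characteristics. A favourable average can coexist with individual estimates that differ widely in size or even in sign, which is why an analysis restricted to the average can look inconclusive.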

Volume 2023, pp. 49-57. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10956734/pdf/

Citations: 0

Designing Software for Genomics Medicine Service Leaders to Engage Stakeholders.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ichi57859.2023.00059

Juandalyn Coffen-Burke, Kai-Wen K Yang, Zoljargal Lkhagvajav, Yuzhi L Lu, Tamisha Dzifa Segbefia, Natalie Wang, James Michael Stevenson, Casey Overby Taylor

Background: Stakeholder engagement can be challenging yet is critically important for the success of a new genomic medicine service or genetic test offering. Services may include ordering, analysis, interpretation, and counseling associated with using a patient's genetic information to make decisions about medical choices.

Objective: The aim of this study was to use a human-centered design (HCD) approach to design software for genomic medicine service leaders seeking to engage stakeholders. Of particular interest were stakeholders that could be partners to champion a new genetic test offering.

Methods: Our HCD process was driven by a modified design sprint methodology involving participatory design sessions and semi-structured interviews with experts. Subsequently, we created a low-fidelity prototype aiming to facilitate early engagement with potential genomic medicine service partners. The prototype was used to test the design with current genomic medicine service leaders. To guide our evaluation, we developed a set of design considerations for the software application that reflected common strategies used by programs successfully implementing genetic test offerings in diverse settings.

Results: We analyzed notes collected from interview sessions with seven genomic medicine service leaders. We identified ten sub-themes, synthesized corresponding notes, and focused our interpretations around three design considerations: #1-identify potential genomic medicine service partners to champion a new genetic test offering; #2-train and educate stakeholders; and #3-obtain and use genomic medicine partner feedback. The top three ranked sub-themes were: 1 - add new information, 2 - general approval, and 3 - add/change functionality.

Conclusion: Our findings suggest that genomic medicine service leaders approve of our software design and process to facilitate key stakeholders' review and endorsement of a new genetic test offering. We also demonstrated an evaluation strategy that draws from lessons of successful genomic medicine programs to identify and prioritize areas to improve the software in future design iterations.

Volume 2023, pp. 398-406. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12743354/pdf/

Citations: 0

End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ichi57859.2023.00108

Xuguang Ai, Ramakanth Kavuluru

End-to-end relation extraction (E2ERE) is an important task in information extraction, more so for biomedicine as scientific literature continues to grow exponentially. E2ERE typically involves identifying entities (named entity recognition, NER) and the relations among them, while most RE tasks simply assume that the entities are provided upfront and end up performing relation classification. E2ERE is inherently more difficult than RE alone, given the potential snowball effect of errors from NER leading to more errors in RE. A complex dataset in biomedical E2ERE is the ChemProt dataset (BioCreative VI, 2017), which identifies relations between chemical compounds and genes/proteins in scientific literature. ChemProt is included in all recent biomedical natural language processing benchmarks, including BLUE, BLURB, and BigBio. However, its treatment in these benchmarks and in other separate efforts is typically not end-to-end, with few exceptions. In this effort, we employ a span-based pipeline approach to produce a new state-of-the-art E2ERE performance on the ChemProt dataset, resulting in > 4% improvement in F1-score over the prior best effort. Our results indicate that a straightforward fine-grained tokenization scheme helps span-based approaches excel in E2ERE, especially with regard to handling complex named entities. Our error analysis also identifies a few key failure modes in E2ERE for ChemProt.
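
The abstract does not spell out the tokenization scheme, so the following is only a minimal sketch of why finer tokens can help a span-based model. It assumes one plausible scheme (each run of letters or digits is a token, and every other non-space character is its own token); the function names `fine_tokenize` and `enumerate_spans` and the `max_len` parameter are hypothetical and are not taken from the paper.

```python
import re

# Assumed scheme: alphanumeric runs are tokens; any other non-space
# character (e.g. "+", "/", "-") becomes a single-character token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def fine_tokenize(text):
    """Return (token, start_offset, end_offset) triples for the text."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def enumerate_spans(tokens, max_len=4):
    """List the character spans (start, end) of every run of up to max_len tokens.

    A span-based NER model would score each candidate span; this sketch
    only enumerates the candidates.
    """
    spans = []
    for i in range(len(tokens)):
        for j in range(i, min(i + max_len, len(tokens))):
            spans.append((tokens[i][1], tokens[j][2]))
    return spans


if __name__ == "__main__":
    text = "Ca2+/calmodulin-dependent kinase"
    tokens = fine_tokenize(text)
    print([t[0] for t in tokens])
    # With whitespace tokenization, "Ca2+" could never be a candidate span,
    # because it sits inside the larger token "Ca2+/calmodulin-dependent".
    print((0, 4) in enumerate_spans(tokens))
```

Under this assumption, a chemical mention nested inside a hyphenated or slash-joined string still lines up with token boundaries, so it can appear among the candidate spans.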

Volume 2023, pages 610-618. Publication type: Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10809256/pdf/

Citations: 0

Prediction of COVID-19 Patients' Emergency Room Revisit using Multi-Source Transfer Learning.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ICHI57859.2023.00028

Yuelyu Ji, Yuhe Gao, Runxue Bao, Qi Li, Disheng Liu, Yiming Sun, Ye Ye

The coronavirus disease 2019 (COVID-19) has led to a global pandemic of significant severity. In addition to its high level of contagiousness, COVID-19 can have a heterogeneous clinical course, ranging from asymptomatic carriers to severe and potentially life-threatening health complications. Many patients have to revisit the emergency room (ER) within a short time after discharge, which significantly increases the workload for medical staff. Early identification of such patients is crucial for helping physicians focus on treating life-threatening cases. In this study, we obtained Electronic Health Records (EHRs) of 3,210 encounters from 13 affiliated ERs within the University of Pittsburgh Medical Center between March 2020 and January 2021. We leveraged a Natural Language Processing technique, ScispaCy, to extract clinical concepts and used the 1001 most frequent concepts to develop 7-day revisit models for COVID-19 patients in ERs. The research data we collected were obtained from 13 ERs, which may have distributional differences that could affect the model development. To address this issue, we employed a classic deep transfer learning method called the Domain Adversarial Neural Network (DANN) and evaluated different modeling strategies, including the Multi-DANN algorithm (which considers the source differences), the Single-DANN algorithm (which doesn't consider the source differences), and three baseline methods: using only source data, using only target data, and using a mixture of source and target data. Results showed that the Multi-DANN models outperformed the Single-DANN models and baseline models in predicting revisits of COVID-19 patients to the ER within 7 days after discharge (median AUROC = 0.8 vs. 0.5). Notably, the Multi-DANN strategy effectively addressed the heterogeneity among multiple source domains and improved the adaptation of source data to the target domain. Moreover, the high performance of Multi-DANN models indicates that EHRs are informative for developing a prediction model to identify COVID-19 patients who are very likely to revisit an ER within 7 days after discharge.
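
The feature-selection step described above (keeping the most frequent extracted concepts as model inputs) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: it assumes concepts have already been extracted per encounter, that frequency is counted once per encounter, and that features are binary indicators. The function names and the toy data are hypothetical.

```python
from collections import Counter


def top_k_concepts(encounters, k):
    """Return the k concepts that occur in the most encounters.

    Assumption: a concept is counted once per encounter, and ties are
    broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for concepts in encounters:
        counts.update(set(concepts))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:k]]


def to_binary_features(encounters, vocab):
    """Encode each encounter as a 0/1 vector over the selected concepts."""
    rows = []
    for concepts in encounters:
        present = set(concepts)
        rows.append([1 if concept in present else 0 for concept in vocab])
    return rows


if __name__ == "__main__":
    # Toy, made-up encounters; the study used 3,210 encounters and k = 1001.
    encounters = [
        ["fever", "cough"],
        ["cough", "dyspnea"],
        ["cough", "fever", "fever"],
    ]
    vocab = top_k_concepts(encounters, k=2)
    print(vocab)
    print(to_binary_features(encounters, vocab))
```

The resulting vectors would then be the inputs to whichever classifier is trained, for example the DANN variants compared in the abstract.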

Volume 2023, pages 138-144. Publication type: Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10939709/pdf/

Citations: 0

Named Entity Recognition and Normalization for Alzheimer's Disease Eligibility Criteria.

IEEE International Conference on Healthcare Informatics

Pub Date : 2023-06-01 Epub Date: 2023-12-11 DOI: 10.1109/ichi57859.2023.00100

Zenan Sun, Cui Tao

Alzheimer's Disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Finding effective treatments for this disease is crucial. Clinical trials play an essential role in developing and testing new treatments for AD. However, identifying eligible participants can be challenging, time-consuming, and costly. In recent years, the development of natural language processing (NLP) techniques, specifically named entity recognition (NER) and named entity normalization (NEN), has helped to automate the identification and extraction of relevant information from the eligibility criteria (EC) more efficiently, in order to facilitate semi-automatic patient recruitment and enable data FAIRness for clinical trial data. Nevertheless, most current biomedical NER models only provide annotations for a restricted set of entity types that may not be applicable to the clinical trial data. Additionally, there are currently no established techniques for accurately performing NEN on entities that are negated using a negative prefix. In this paper, we introduce a pipeline designed for information extraction from AD clinical trial EC, which involves preprocessing of the EC data, clinical NER, and biomedical NEN to the Unified Medical Language System (UMLS). Our NER model can identify named entities in seven pre-defined categories, while our NEN model employs a combination of exact match and partial match search strategies, as well as customized rules to accurately normalize entities with negative prefixes. To evaluate the performance of our pipeline, we measured the precision, recall, and F1 score for the NER component, and we manually reviewed the top five mapping results produced by the NEN component. Our evaluation of the pipeline's performance revealed that it can successfully normalize named entities in clinical trial ECs with high accuracy. The NER component achieved an overall F1 of 0.816, demonstrating its ability to accurately identify seven types of named entities in clinical text. The NEN component of the pipeline also demonstrated impressive performance, with customized rules and a combination of exact and partial match strategies leading to an accuracy of 0.940 for normalized entities.
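
For readers unfamiliar with the NER metrics named above, the following is a minimal sketch of entity-level precision, recall, and F1. It assumes exact matching on (start, end, type) triples, which is a common convention but is not confirmed by the abstract; the function name `prf1`, the entity type labels, and the toy annotations are hypothetical.

```python
def prf1(gold, pred):
    """Compute entity-level precision, recall, and F1 under exact matching.

    Each entity is a (start, end, type) triple; a prediction counts as
    correct only if all three fields match a gold annotation.
    """
    gold, pred = set(gold), set(pred)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up annotations for illustration only.
    gold = [(0, 5, "Drug"), (10, 20, "Condition"), (25, 30, "Procedure"), (40, 45, "Drug")]
    pred = [(0, 5, "Drug"), (10, 20, "Condition"), (26, 30, "Procedure")]
    precision, recall, f1 = prf1(gold, pred)
    print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In the toy example the third prediction has the right type but a boundary that is off by one character, so under exact matching it counts as both a false positive and a missed gold entity.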

Volume 2023, pages 558-564. Publication type: Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10815931/pdf/

Citations: 0