Pub Date: 2024-09-01 | DOI: 10.1016/j.jbi.2024.104718
Ziqi Zhang, Ailian Jiang
Radiology report generation automates the synthesis of diagnostic narratives from medical imaging data. Current report generation methods primarily use knowledge graphs for image enhancement, neglecting the interpretability and guiding function of the knowledge graphs themselves. Additionally, few approaches leverage the stable modal alignment information from multimodal pre-trained models to facilitate the generation of radiology reports. We propose Terms-Guided Radiology Report Generation (TGR), a simple and practical model for generating reports guided primarily by anatomical terms. Specifically, we use a dual-stream visual feature extraction module, comprising a detail extraction module and a frozen multimodal pre-trained model, to extract visual detail features and semantic features separately. Furthermore, we propose a Visual Enhancement Module (VEM) to further enrich the visual features, thereby facilitating the generation of a list of anatomical terms. We integrate the anatomical terms with image features and apply contrastive learning against frozen text embeddings, using the stable feature space of these embeddings to further strengthen modal alignment. Our model also accepts manual input, enabling it to generate a list of organs for specific abnormal areas of interest or to produce more accurate single-sentence descriptions based on selected anatomical terms. Comprehensive experiments demonstrate the effectiveness of our method in report generation tasks: our TGR-S model reduces training parameters by 38.9% while performing comparably to current state-of-the-art models, and our TGR-B model exceeds the best baseline models across multiple metrics.
Title: Interactive dual-stream contrastive learning for radiology report generation. Journal of Biomedical Informatics, Volume 157, Article 104718.
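The abstract above mentions contrastive learning against frozen text embeddings but gives no formula. As a rough illustration only, the sketch below computes a generic image-to-text InfoNCE loss in plain Python. The function name `info_nce`, the temperature of 0.07, and the pairing of row i in both lists are assumptions made for this example; none of them come from the paper.

```python
import math


def info_nce(image_vecs, text_vecs, temperature=0.07):
    """Mean image-to-text InfoNCE loss; pair i is (image_vecs[i], text_vecs[i]).

    Hypothetical helper for illustration. Assumes every vector is non-zero.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    images = [normalize(v) for v in image_vecs]
    # The text side plays the role of fixed ("frozen") targets in this sketch.
    texts = [normalize(v) for v in text_vecs]

    total = 0.0
    for i, img in enumerate(images):
        # Cosine similarity to every text embedding, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
        # Cross-entropy with the matching text (index i) as the correct class.
        total += log_denominator - logits[i]
    return total / len(images)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is small when each image embedding is closest to its own text embedding and large when the pairs are mismatched, which is the behaviour a modal-alignment objective is meant to reward.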
Pub Date: 2024-09-01 | DOI: 10.1016/j.jbi.2024.104719
Pengyuan Nie, Jinzhong Ning, Mengxuan Lin, Zhihao Yang, Lei Wang
Document-level chemical-disease interaction extraction aims to infer the interaction relations between chemical entities and disease entities across multiple sentences. Compared with sentence-level relation extraction, document-level relation extraction can capture the associations between different entities throughout the entire document, which is more practical for biomedical text. However, current biomedical extraction methods mainly concentrate on sentence-level relation extraction, making it difficult to access the rich structural information contained in documents in practical application scenarios. We put forward SSGU-CD, a combined Semantic and Structural information Graph U-shaped network for document-level Chemical-Disease interaction extraction. This framework stores document semantic and structural information as graphs and can fuse the original context information of documents. Using the framework, we propose a balanced combination of cross-entropy loss functions to facilitate collaborative optimization among models, with the aim of enhancing the ability to extract chemical-disease interaction relations. We evaluated SSGU-CD on the document-level relation extraction datasets CDR and BioRED, and the results demonstrate that the framework can significantly improve extraction performance.
Title: SSGU-CD: A combined semantic and structural information graph U-shaped network for document-level Chemical-Disease interaction extraction. Journal of Biomedical Informatics, Volume 157, Article 104719.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S1532046424001370/pdfft?md5=ccbd03895ffdd2c9164f4a506fad5a18&pid=1-s2.0-S1532046424001370-main.pdf
Pub Date: 2024-09-01 | DOI: 10.1016/j.jbi.2024.104712
Yan Guo, Yongqiang Gao, Jiawei Song
In today’s era of rapid development of large models, the traditional drug development process is undergoing a profound transformation. The vast demand for data and consumption of computational resources are making independent drug discovery increasingly difficult. By integrating federated learning technology into the drug discovery field, we have found a solution that both protects privacy and shares computational power. However, the differences in data held by various pharmaceutical institutions and the diversity in drug design objectives have exacerbated the issue of data heterogeneity, making traditional federated learning consensus models unable to meet the personalized needs of all parties. In this study, we introduce and evaluate an innovative drug discovery framework, MolCFL, which uses a multi-layer perceptron (MLP) as the generator and a graph convolutional network (GCN) as the discriminator in a generative adversarial network (GAN). By learning the graph structure of molecules, it generates new molecules in a highly personalized manner and then optimizes the learning process through clustered federated learning, grouping compound data with high similarity. MolCFL not only enhances the model’s ability to protect privacy but also significantly improves the efficiency and personalization of molecular design. MolCFL exhibits superior performance compared to traditional models when handling data that are not independent and identically distributed (non-IID). Experimental results show that the framework demonstrates outstanding performance on two benchmark datasets, with the generated new molecules achieving over 90% in Uniqueness and close to 100% in Novelty. MolCFL not only improves the quality and efficiency of drug molecule design but also, through its highly customized clustered federated learning environment, promotes collaboration and specialization in the drug discovery process while ensuring data privacy. These features make MolCFL a powerful tool for addressing the various challenges faced in modern drug research and development.
Title: MolCFL: A personalized and privacy-preserving drug discovery framework based on generative clustered federated learning. Journal of Biomedical Informatics, Volume 157, Article 104712.
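The abstract above reports Uniqueness and Novelty scores without defining them. The sketch below follows one common convention from molecular generation benchmarks, which may differ from the paper's exact definitions: uniqueness is the share of generated molecules that are distinct, and novelty is the share of distinct generated molecules absent from the training set. The function name and the toy SMILES strings are hypothetical, and the code assumes the strings are already valid and canonicalized, since it compares them as plain text.

```python
def uniqueness_and_novelty(generated, training_set):
    """Return (uniqueness, novelty) for a list of generated molecule strings.

    Hypothetical helper: molecules are compared as plain strings, so inputs
    are assumed to be valid, canonical SMILES.
    """
    unique = set(generated)
    if not unique:
        return 0.0, 0.0
    # Fraction of generated molecules that are distinct from one another.
    uniqueness = len(unique) / len(generated)
    # Fraction of distinct generated molecules not seen during training.
    novelty = len(unique - set(training_set)) / len(unique)
    return uniqueness, novelty


generated = ["CCO", "CCO", "CCN", "c1ccccc1"]  # toy data, not from the paper
training = ["CCO"]
print(uniqueness_and_novelty(generated, training))
```

In this toy case three of the four generated strings are distinct, and two of those three do not appear in the training set.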
Pub Date: 2024-09-01 | DOI: 10.1016/j.jbi.2024.104716
Hui Zong, Rongrong Wu, Jiaxue Cha, Weizhe Feng, Erman Wu, Jiakun Li, Aibin Shao, Liang Tao, Zuofeng Li, Buzhou Tang, Bairong Shen
Objective
This study aims to review the recent advances in community challenges for biomedical text mining in China.
Methods
We collected information on the evaluation tasks released in community challenges for biomedical text mining, including task description, dataset description, data source, task type, and related links. We conducted a systematic summary and comparative analysis of various biomedical natural language processing tasks, such as named entity recognition, entity normalization, attribute extraction, relation extraction, event extraction, text classification, text similarity, knowledge graph construction, question answering, text generation, and large language model evaluation.
Results
We identified 39 evaluation tasks from 6 community challenges held between 2017 and 2023. Our analysis revealed the diverse range of evaluation task types and data sources in biomedical text mining. We explored the potential clinical applications of these community challenge tasks from a translational biomedical informatics perspective. We compared these tasks with their English counterparts and discussed the contributions, limitations, lessons, and guidelines of these community challenges, while highlighting future directions in the era of large language models.
Conclusion
Community challenge evaluation competitions have played a crucial role in promoting technology innovation and fostering interdisciplinary collaboration in the field of biomedical text mining. These challenges provide valuable platforms for researchers to develop state-of-the-art solutions.
Title: Advancing Chinese biomedical text mining with community challenges. Journal of Biomedical Informatics, Volume 157, Article 104716.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S1532046424001345/pdfft?md5=90f19a6b5c337cb24358bf3c1497f985&pid=1-s2.0-S1532046424001345-main.pdf
Pub Date: 2024-08-26 | DOI: 10.1016/j.jbi.2024.104715
Yuewei Xue, Shaopeng Guan, Wanhai Jia
Accurately predicting blood glucose levels is crucial in diabetes management to mitigate patients’ risk of complications. However, blood glucose values exhibit instability, and existing prediction methods often struggle to capture their volatile nature, leading to inaccurate trend forecasts. To address these challenges, we propose a novel blood glucose level prediction model based on the Informer architecture: BGformer. Our model introduces a feature enhancement module and a microscale overlapping concerns mechanism. The feature enhancement module integrates periodic and trend feature extractors, enhancing the model’s ability to capture relevant information from the data. By extending the feature extraction capacity of time series data, it provides richer feature representations for analysis. Meanwhile, the microscale overlapping concerns mechanism adopts a window-based strategy, computing attention scores only within specific windows. This approach reduces computational complexity while enhancing the model’s capacity to capture local temporal dependencies. Furthermore, we introduce a dual attention enhancement module to augment the model’s expressive capability. Through prediction experiments on blood glucose values from sixteen diabetic patients, our model outperformed eight benchmark models in terms of both MAE and RMSE metrics for future 60-minute and 90-minute predictions. Our proposed scheme significantly improves the model’s dependency-capturing ability, resulting in more accurate blood glucose level predictions.
Title: BGformer: An improved Informer model to enhance blood glucose prediction. Journal of Biomedical Informatics, Volume 157, Article 104715.
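The BGformer abstract describes a window-based strategy in which attention scores are computed only within specific windows. The sketch below shows a generic overlapping-window attention in plain Python, not the paper's mechanism. The window size, the stride, and the choice to average the outputs of positions covered by more than one window are all assumptions made for this example.

```python
import math


def windowed_attention(queries, keys, values, window, stride):
    """Scaled dot-product attention restricted to overlapping windows.

    Hypothetical sketch: each position attends only to positions in the same
    window; positions covered by several windows get the mean of their outputs.
    """
    if not 1 <= stride <= window:
        raise ValueError("stride must be between 1 and window to cover every position")
    n, dim = len(queries), len(queries[0])
    sums = [[0.0] * len(values[0]) for _ in range(n)]
    counts = [0] * n

    starts = list(range(0, max(n - window, 0) + 1, stride))
    if starts[-1] + window < n:
        starts.append(n - window)  # extra window so the tail is covered

    for start in starts:
        idx = range(start, min(start + window, n))
        for i in idx:
            # Scores are computed only against keys inside the current window.
            scores = [
                sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(dim)
                for j in idx
            ]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            norm = sum(weights)
            for weight, j in zip(weights, idx):
                for c, value in enumerate(values[j]):
                    sums[i][c] += (weight / norm) * value
            counts[i] += 1

    return [[s / counts[i] for s in row] for i, row in enumerate(sums)]


zeros = [[0.0] for _ in range(6)]
vals = [[float(j)] for j in range(6)]
print(windowed_attention(zeros, zeros, vals, window=4, stride=2))
```

Each window of size w needs about w² score computations instead of n² for the full sequence, which is where the reduction in computational cost comes from. With all-zero queries and keys the attention is uniform, so every output is simply the mean of the values in its window or windows.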
Pub Date: 2024-08-17 | DOI: 10.1016/j.jbi.2024.104710
Tong Zhang, Shao-Wu Zhang, Ming-Yu Xie, Yan Li
Objective
Identifying cancer driver genes, especially rare or patient-specific cancer driver genes, is a primary goal in cancer therapy. Although researchers have proposed some methods to tackle this problem, these methods mostly identify cancer driver genes at the single-gene level, overlooking the cooperative relationships among cancer driver genes. Identifying cooperating cancer driver genes in individual patients is pivotal for understanding cancer etiology and advancing the development of personalized therapies.
Methods
Here, we propose a novel Personalized Cooperating cancer Driver Genes (PCoDG) method that uses a hypergraph random walk to identify the cancer driver genes that cooperatively drive cancer progression in an individual patient. Leveraging the ability of hypergraphs to represent multi-way relationships, PCoDG first builds a personalized hypergraph to depict the complex interactions among the mutated genes and differentially expressed genes of an individual patient. Then, a hypergraph random walk algorithm based on hyperedge similarity calculates the importance scores of mutated genes, and these scores are integrated with signaling pathway data to identify the cooperating cancer driver genes in individual patients.
Results
The experimental results on three TCGA cancer datasets (i.e., BRCA, LUAD, and COADREAD) demonstrate the effectiveness of PCoDG in identifying personalized cooperating cancer driver genes. The genes identified by PCoDG not only offer valuable insights into patient stratification correlating with clinical outcomes, but also provide a useful reference resource for tailoring personalized treatments.
Conclusion
We propose a novel method that can effectively identify cooperating cancer driver genes for individual patients, thereby deepening our understanding of the cooperative relationship among personalized cancer driver genes and advancing the development of precision oncology.
Title: Identifying cooperating cancer driver genes in individual patients through hypergraph random walk. Journal of Biomedical Informatics, Volume 157, Article 104710.
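The PCoDG abstract names a hypergraph random walk for scoring mutated genes, but the hyperedge-similarity weighting it uses is not described there. The sketch below is therefore a textbook-style random walk with restart on a weighted hypergraph and not the authors' algorithm. At each step the walker picks a hyperedge containing its current vertex with probability proportional to the hyperedge weight, then moves to a vertex chosen uniformly from that hyperedge. The function name, the restart probability of 0.3, and the toy hypergraph are hypothetical.

```python
def hypergraph_random_walk(hyperedges, seeds, restart=0.3, iterations=100):
    """Random walk with restart on a weighted hypergraph.

    hyperedges: dict mapping a name to (positive weight, set of vertices).
    seeds: vertices the walker jumps back to on restart.
    Returns a dict mapping each vertex to its visiting probability.
    """
    vertices = sorted({v for _, members in hyperedges.values() for v in members})
    # Weighted degree: total weight of the hyperedges that contain the vertex.
    degree = {
        v: sum(w for w, members in hyperedges.values() if v in members)
        for v in vertices
    }
    restart_dist = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in vertices}
    scores = dict(restart_dist)

    for _ in range(iterations):
        updated = {v: restart * restart_dist[v] for v in vertices}
        for weight, members in hyperedges.values():
            # Probability mass entering this hyperedge from its member vertices.
            flow = sum(scores[u] * weight / degree[u] for u in members)
            # That mass is spread uniformly over the vertices of the hyperedge.
            for v in members:
                updated[v] += (1.0 - restart) * flow / len(members)
        scores = updated
    return scores


toy = {"e1": (1.0, {"A", "B", "C"}), "e2": (1.0, {"C", "D"})}  # hypothetical genes
ranking = hypergraph_random_walk(toy, seeds={"A"})
print(sorted(ranking, key=ranking.get, reverse=True))
```

Vertices that share hyperedges with the seed accumulate more probability mass than distant ones, which is the general idea behind using such scores to rank genes.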
Pub Date: 2024-08-15 | DOI: 10.1016/j.jbi.2024.104709
Lara J. Kanbar, Anagh Mishra, Alexander Osborn, Andrew Cifuentes, Jennifer Combs, Michael Sorter, Drew Barzman, Judith W. Dexheimer
Objectives
Natural language processing and machine learning have the potential to lead to biased predictions. We designed a novel Automated RIsk Assessment (ARIA) machine learning algorithm that assesses risk of violence and aggression in adolescents using natural language processing of transcribed student interviews. This work evaluated the possible sources of bias in the study design and the algorithm, tested how much of a prediction was explained by demographic covariates, and investigated the misclassifications based on demographic variables.
Methods
We recruited students who were 10–18 years of age and enrolled in middle or high schools in Ohio, Kentucky, Indiana, and Tennessee. The reference standard outcome was determined by a forensic psychiatrist as either a “high” or “low” risk level. ARIA used L2-regularized logistic regression to predict a risk level for each student using contextual and semantic features. We conducted three analyses: a PROBAST analysis of risk in study design; an analysis of demographic variables as covariates; and a prediction analysis. The covariates included in the linear regression analyses were race, sex, ethnicity, household education, annual household income, age at the time of visit, and utilization of public assistance.
Results
We recruited 412 students from 204 schools. ARIA performed with an AUC of 0.92, sensitivity of 71%, NPV of 77%, and specificity of 95%. Of the recruited students, 387 with complete demographic information were included in the analysis. Individual linear regressions resulted in a coefficient of determination less than 0.08 across all demographic variables. When using all demographic variables to predict ARIA’s risk assessment score, the multiple linear regression model resulted in a coefficient of determination of 0.189. ARIA performed with a lower False Negative Rate (FNR) of 15.2% (CI [0 – 40]) for the Black subgroup and 12.7% (CI [0 – 41.4]) for Other races, compared to an FNR of 26.1% (CI [14.1 – 41.8]) in the White subgroup.
Conclusions
Bias assessment is needed to address shortcomings within machine learning. In our work, student race, ethnicity, sex, use of public assistance, and annual household income did not explain ARIA’s risk assessment score of students. ARIA will continue to be evaluated regularly with increased subject recruitment.
Title: Investigation of bias in the automated assessment of school violence. Journal of Biomedical Informatics, Volume 157, Article 104709.
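The subgroup comparison in the abstract above rests on the false negative rate, FNR = FN / (FN + TP), computed separately per demographic group. Since FNR is one minus sensitivity, the reported overall sensitivity of 71% corresponds to an overall FNR of 29%. The sketch below shows the per-group calculation on toy records. The function name, the group labels "X" and "Y", and the data are hypothetical, and the confidence intervals reported in the abstract are not reproduced here.

```python
from collections import defaultdict


def false_negative_rate_by_group(records):
    """Compute FNR per group from (group, true_label, predicted_label) records.

    Labels are "high" or "low" risk; a false negative is a truly "high" case
    predicted as "low". Groups with no "high" cases are omitted.
    """
    positives = defaultdict(int)
    misses = defaultdict(int)
    for group, truth, predicted in records:
        if truth == "high":
            positives[group] += 1
            if predicted == "low":
                misses[group] += 1
    return {group: misses[group] / positives[group] for group in positives}


records = [  # toy data, not from the study
    ("X", "high", "high"), ("X", "high", "high"), ("X", "high", "high"),
    ("X", "high", "low"), ("X", "low", "low"),
    ("Y", "high", "high"), ("Y", "high", "low"), ("Y", "low", "high"),
]
print(false_negative_rate_by_group(records))
```

Truly low-risk cases never enter the calculation, so a group's FNR depends only on how many of its high-risk cases the model missed. With few high-risk cases in a group the estimate becomes unstable, which is consistent with the wide intervals reported above.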
Pub Date : 2024-08-13DOI: 10.1016/j.jbi.2024.104707
Majid Afshar , Yanjun Gao , Deepak Gupta , Emma Croxford , Dina Demner-Fushman
Objective:
Traditional knowledge-based and machine learning diagnostic decision support systems have benefited from integrating the medical domain knowledge encoded in the Unified Medical Language System (UMLS). The emergence of Large Language Models (LLMs) to supplant traditional systems raises questions about the quality and extent of the medical knowledge in the models’ internal knowledge representations and about the need for external knowledge sources. The objective of this study is three-fold: to probe the diagnosis-related medical knowledge of popular LLMs, to examine the benefit of providing the UMLS knowledge to LLMs (grounding the diagnosis predictions), and to evaluate the correlations between human judgments and the UMLS-based metrics for generations by LLMs.
Methods:
We evaluated diagnoses generated by LLMs from consumer health questions and daily care notes in the electronic health records using the ConsumerQA and Problem Summarization datasets. Probing LLMs for the UMLS knowledge was performed by prompting the LLM to complete the diagnosis-related UMLS knowledge paths. Grounding the predictions was examined in an approach that integrated the UMLS graph paths and clinical notes in prompting the LLMs. The results were compared to prompting without the UMLS paths. The final experiments examined the alignment of different evaluation metrics, UMLS-based and non-UMLS, with human expert evaluation.
Results:
In probing the UMLS knowledge, GPT-3.5 significantly outperformed Llama2 and a simple baseline yielding an F1 score of 10.9% in completing one-hop UMLS paths for a given concept. Grounding diagnosis predictions with the UMLS paths improved the results for both models on both tasks, with the highest improvement (4%) in SapBERT score. There was a weak correlation between the widely used evaluation metrics (ROUGE and SapBERT) and human judgments.
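The F1 score reported for one-hop path completion can be illustrated with a minimal sketch. It assumes a set-based comparison of predicted and reference (relation, target) pairs for a single concept; the abstract does not state how the score was computed, and the function name and toy pairs below are hypothetical.

```python
def path_completion_f1(predicted, gold):
    """Set-based precision, recall and F1 over (relation, target) pairs.

    Assumption: a predicted one-hop path counts as correct only when the
    exact pair appears in the reference set for the concept.
    """
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set or not gold_set:
        return 0.0, 0.0, 0.0
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    if true_pos == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only (not taken from the study).
gold = [("may_treat", "aspirin"), ("isa", "pain"), ("finding_site", "head")]
pred = [("isa", "pain"), ("may_treat", "ibuprofen")]
precision, recall, f1 = path_completion_f1(pred, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

An exact-match criterion like this is strict; a softer, embedding-based match would give partial credit to near-synonyms but would be a different metric.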
Conclusion:
We found that while popular LLMs contain some medical knowledge in their internal representations, augmentation with the UMLS knowledge provides performance gains in diagnosis generation. The UMLS needs to be tailored for the task to improve the LLMs’ predictions. Finding evaluation metrics that are aligned with human judgments better than the traditional ROUGE and BERT-based scores remains an open research question.
{"title":"On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models","authors":"Majid Afshar , Yanjun Gao , Deepak Gupta , Emma Croxford , Dina Demner-Fushman","doi":"10.1016/j.jbi.2024.104707","DOIUrl":"10.1016/j.jbi.2024.104707","url":null,"abstract":"<div><h3>Objective:</h3><p>Traditional knowledge-based and machine learning diagnostic decision support systems have benefited from integrating the medical domain knowledge encoded in the Unified Medical Language System (UMLS). The emergence of Large Language Models (LLMs) to supplant traditional systems poses questions of the quality and extent of the medical knowledge in the models’ internal knowledge representations and the need for external knowledge sources. The objective of this study is three-fold: to probe the diagnosis-related medical knowledge of popular LLMs, to examine the benefit of providing the UMLS knowledge to LLMs (grounding the diagnosis predictions), and to evaluate the correlations between human judgments and the UMLS-based metrics for generations by LLMs.</p></div><div><h3>Methods:</h3><p>We evaluated diagnoses generated by LLMs from consumer health questions and daily care notes in the electronic health records using the ConsumerQA and Problem Summarization datasets. Probing LLMs for the UMLS knowledge was performed by prompting the LLM to complete the diagnosis-related UMLS knowledge paths. Grounding the predictions was examined in an approach that integrated the UMLS graph paths and clinical notes in prompting the LLMs. The results were compared to prompting without the UMLS paths. The final experiments examined the alignment of different evaluation metrics, UMLS-based and non-UMLS, with human expert evaluation.</p></div><div><h3>Results:</h3><p>In probing the UMLS knowledge, GPT-3.5 significantly outperformed Llama2 and a simple baseline yielding an F1 score of 10.9% in completing one-hop UMLS paths for a given concept. 
Grounding diagnosis predictions with the UMLS paths improved the results for both models on both tasks, with the highest improvement (4%) in SapBERT score. There was a weak correlation between the widely used evaluation metrics (ROUGE and SapBERT) and human judgments.</p></div><div><h3>Conclusion:</h3><p>We found that while popular LLMs contain some medical knowledge in their internal representations, augmentation with the UMLS knowledge provides performance gains around diagnosis generation. The UMLS needs to be tailored for the task to improve the LLMs predictions. Finding evaluation metrics that are aligned with human judgments better than the traditional ROUGE and BERT-based scores remains an open research question.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104707"},"PeriodicalIF":4.0,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141982357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-10DOI: 10.1016/j.jbi.2024.104705
Yuqing Lei , Adam Christian Naj , Hua Xu , Ruowang Li , Yong Chen
Objective
Phenotypic misclassification in genetic association analyses can impact the accuracy of polygenic risk score (PRS)-based prediction models. The bias reduction method proposed by Tong et al. (2019) reduces bias in the estimated association parameters between genotype and phenotype while minimizing variance, by employing chart reviews on a subset of the data to validate phenotypes; however, whether it improves subsequent PRS prediction accuracy remains unclear. Our study aims to fill this gap by assessing the performance of simulated PRS models and estimating the optimal number of chart reviews needed for validation.
Methods
To comprehensively assess the efficacy of the bias reduction method proposed by Tong et al. in enhancing the accuracy of PRS-based prediction models, we simulated each phenotype under different correlation structures (an independent model, a weakly correlated model, a strongly correlated model) and introduced error-prone phenotypes using two distinct error mechanisms (differential and non-differential phenotyping errors). To facilitate this, we used genotype and phenotype data from 12 case-control datasets in the Alzheimer’s Disease Genetics Consortium (ADGC) to produce simulated phenotypes. The evaluation included analyses across various misclassification rates of the original phenotypes as well as various validation set sizes. Additionally, we determined the median threshold, identifying the minimal validation size required for a meaningful improvement in the accuracy of PRS-based predictions across a broad spectrum.
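The non-differential error mechanism can be sketched as follows. This is a minimal illustration, assuming each true binary phenotype is flipped independently according to a fixed sensitivity and specificity; it is not the study's simulation code, and the function name, sample sizes, and rates are hypothetical.

```python
import random


def misclassify(true_labels, sensitivity, specificity, rng):
    """Return error-prone labels under non-differential misclassification.

    Assumption: the error rates depend only on the true label, not on
    genotype or any other covariate.
    """
    observed = []
    for y in true_labels:
        if y == 1:
            # A true case is recorded as a case with probability = sensitivity.
            observed.append(1 if rng.random() < sensitivity else 0)
        else:
            # A true control is recorded as a control with probability = specificity.
            observed.append(0 if rng.random() < specificity else 1)
    return observed


rng = random.Random(0)  # fixed seed so the sketch is reproducible
true_labels = [1] * 5000 + [0] * 5000
observed = misclassify(true_labels, sensitivity=0.90, specificity=0.95, rng=rng)

# Hypothetical chart-review subset: indices whose true labels would be validated.
validation_idx = rng.sample(range(len(true_labels)), k=500)

error_rate = sum(t != o for t, o in zip(true_labels, observed)) / len(true_labels)
print(f"overall misclassification rate: {error_rate:.3f}")
```

A differential mechanism would instead let the sensitivity and specificity vary with a covariate, for example by passing per-record rates rather than two constants.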
Results
This simulation study demonstrated that incorporating chart review does not universally guarantee enhanced performance of PRS-based prediction models. Specifically, in scenarios with minimal misclassification rates and limited validation sizes, PRS models utilizing debiased regression coefficients demonstrated inferior predictive capabilities compared to models using error-prone phenotypes. Put differently, the effectiveness of the bias reduction method is contingent upon the misclassification rates of phenotypes and the size of the validation set employed during chart reviews. Notably, when dealing with datasets featuring higher misclassification rates, the advantages of utilizing this bias reduction method become more evident, requiring a smaller validation set to achieve better performance.
Conclusion
This study highlights the importance of choosing an appropriate validation set size to balance the effort of chart review against the gain in PRS prediction accuracy. Consequently, our study provides valuable guidance for validation planning across a diverse array of sensitivity and specificity combinations.
{"title":"Balancing the efforts of chart review and gains in PRS prediction accuracy: An empirical study","authors":"Yuqing Lei , Adam Christian Naj , Hua Xu , Ruowang Li , Yong Chen","doi":"10.1016/j.jbi.2024.104705","DOIUrl":"10.1016/j.jbi.2024.104705","url":null,"abstract":"<div><h3>Objective</h3><p>Phenotypic misclassification in genetic association analyses can impact the accuracy of PRS-based prediction models. The bias reduction method proposed by Tong et al. (2019) has demonstrated its efficacy in reducing the effects of bias on the estimation of association parameters between genotype and phenotype while minimizing variance by employing chart reviews on a subset of the data for validating phenotypes, however its improvement of subsequent PRS prediction accuracy remains unclear. Our study aims to fill this gap by assessing the performance of simulated PRS models and estimating the optimal number of chart reviews needed for validation.</p></div><div><h3>Methods</h3><p>To comprehensively assess the efficacy of the bias reduction method proposed by Tong et al. in enhancing the accuracy of PRS-based prediction models, we simulated each phenotype under different correlation structures (an independent model, a weakly correlated model, a strongly correlated model) and introduced error-prone phenotypes using two distinct error mechanisms (differential and non-differential phenotyping errors). To facilitate this, we used genotype and phenotype data from 12 case-control datasets in the Alzheimer’s Disease Genetics Consortium (ADGC) to produce simulated phenotypes. The evaluation included analyses across various misclassification rates of original phenotypes as well as quantities of validation set. 
Additionally, we determined the median threshold, identifying the minimal validation size required for a meaningful improvement in the accuracy of PRS-based predictions across a broad spectrum.</p></div><div><h3>Results</h3><p>This simulation study demonstrated that incorporating chart review does not universally guarantee enhanced performance of PRS-based prediction models. Specifically, in scenarios with minimal misclassification rates and limited validation sizes, PRS models utilizing debiased regression coefficients demonstrated inferior predictive capabilities compared to models using error-prone phenotypes. Put differently, the effectiveness of the bias reduction method is contingent upon the misclassification rates of phenotypes and the size of the validation set employed during chart reviews. Notably, when dealing with datasets featuring higher misclassification rates, the advantages of utilizing this bias reduction method become more evident, requiring a smaller validation set to achieve better performance.</p></div><div><h3>Conclusion</h3><p>This study highlights the importance of choosing an appropriate validation set size to balance between the efforts of chart review and the gain in PRS prediction accuracy. 
Consequently, our study establishes a valuable guidance for validation planning, across a diverse array of sensitivity and specificity combinations.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104705"},"PeriodicalIF":4.0,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141971201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-08DOI: 10.1016/j.jbi.2024.104706
Boran Hao , Yang Hu , William G. Adams , Sabrina A. Assoumou , Heather E. Hsu , Nahid Bhadelia , Ioannis Ch. Paschalidis
Objective
To develop an Artificial Intelligence (AI)-based anomaly detection model as a complement to an “astute physician” in detecting novel disease cases in a hospital and preventing emerging outbreaks.
Methods
Data included hospitalized patients (n = 120,714) at a safety-net hospital in Massachusetts. A novel Generative Pre-trained Transformer (GPT)-based clinical anomaly detection system was designed and further trained using Empirical Risk Minimization (ERM), which can model a hospitalized patient’s Electronic Health Records (EHR) and detect atypical patients. Methods and performance metrics, similar to the ones behind the recent Large Language Models (LLMs), were leveraged to capture the dynamic evolution of the patient’s clinical variables and compute an Out-Of-Distribution (OOD) anomaly score.
Results
In a completely unsupervised setting, hospitalizations for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infection could have been predicted by our GPT model at the beginning of the COVID-19 pandemic, with an Area Under the Receiver Operating Characteristic Curve (AUC) of 92.2%, using 31 extracted clinical variables and a 3-day detection window. Our GPT achieves individual patient-level anomaly detection and mortality prediction AUCs of 78.3% and 94.7%, outperforming traditional linear models by 6.6% and 9%, respectively. Different types of clinical trajectories of a SARS-CoV-2 infection are captured by our model to make interpretable detections, while a trend of over-pessimistic outcome prediction yields a more effective detection pathway. Furthermore, our comprehensive GPT model can potentially assist clinicians with forecasting patient clinical variables and developing personalized treatment plans.
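How an anomaly score turns into an AUC can be sketched in a few lines. The sketch assumes the score is the mean negative log-likelihood that a sequence model assigns to a patient's observed values, which is one common choice and not necessarily the one used in the study; the probabilities below are invented toy numbers and both function names are hypothetical.

```python
import math


def anomaly_score(step_probs):
    """Mean negative log-likelihood of the observed sequence (assumed score)."""
    return -sum(math.log(p) for p in step_probs) / len(step_probs)


def auc_from_scores(scores, labels):
    """AUC as the fraction of (anomalous, typical) pairs ranked correctly.

    Ties count as half a correct pair; labels use 1 for anomalous.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one typical case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy per-step probabilities the model assigned to what was actually observed.
patients = [
    ([0.90, 0.80, 0.85], 0),  # typical trajectory, well predicted
    ([0.70, 0.90, 0.60], 0),
    ([0.20, 0.10, 0.30], 1),  # atypical trajectory, poorly predicted
    ([0.40, 0.05, 0.50], 1),
]
scores = [anomaly_score(probs) for probs, _ in patients]
labels = [label for _, label in patients]
auc = auc_from_scores(scores, labels)
print(f"AUC = {auc:.2f}")
```

The pairwise loop is quadratic in the number of patients; for a cohort the size of the one described, a rank-based computation would be the practical choice.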
Conclusion
This study demonstrates that an emerging outbreak can be accurately detected within a hospital, by using a GPT to model patient EHR time sequences and labeling them as anomalous when actual outcomes are not supported by the model. Such a GPT is also a comprehensive model with the functionality of generating future patient clinical variables, which can potentially assist clinicians in developing personalized treatment plans.
{"title":"A GPT-based EHR modeling system for unsupervised novel disease detection","authors":"Boran Hao , Yang Hu , William G. Adams , Sabrina A. Assoumou , Heather E. Hsu , Nahid Bhadelia , Ioannis Ch. Paschalidis","doi":"10.1016/j.jbi.2024.104706","DOIUrl":"10.1016/j.jbi.2024.104706","url":null,"abstract":"<div><h3>Objective</h3><p>To develop an <em>Artificial Intelligence (AI)</em>-based anomaly detection model as a complement of an “astute physician” in detecting novel disease cases in a hospital and preventing emerging outbreaks<em>.</em></p></div><div><h3>Methods</h3><p>Data included hospitalized patients (n = 120,714) at a safety-net hospital in Massachusetts. A novel <em>Generative Pre-trained Transformer (GPT)</em>-based clinical anomaly detection system was designed and further trained using <em>Empirical Risk Minimization (ERM)</em>, which can model a hospitalized patient’s <em>Electronic Health Records (EHR)</em> and detect atypical patients. Methods and performance metrics, similar to the ones behind the recent <em>Large Language Models (LLMs)</em>, were leveraged to capture the dynamic evolution of the patient’s clinical variables and compute an <em>Out-Of-Distribution (OOD)</em> anomaly score.</p></div><div><h3>Results</h3><p>In a completely unsupervised setting, hospitalizations for <em>Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)</em> infection could have been predicted by our GPT model at the beginning of the COVID-19 pandemic, with an Area Under the Receiver Operating Characteristic Curve (AUC) of 92.2 %, using 31 extracted clinical variables and a 3-day detection window. Our GPT achieves individual patient-level anomaly detection and mortality prediction AUC of 78.3 % and 94.7 %, outperforming traditional linear models by 6.6 % and 9 %, respectively. 
Different types of clinical trajectories of a SARS-CoV-2 infection are captured by our model to make interpretable detections, while a trend of over-pessimistic outcome prediction yields a more effective detection pathway. Furthermore, our comprehensive GPT model can potentially assist clinicians with forecasting patient clinical variables and developing personalized treatment plans.</p></div><div><h3>Conclusion</h3><p>This study demonstrates that an emerging outbreak can be accurately detected within a hospital, by using a GPT to model patient EHR time sequences and labeling them as anomalous when actual outcomes are not supported by the model. Such a GPT is also a comprehensive model with the functionality of generating future patient clinical variables, which can potentially assist clinicians in developing personalized treatment plans.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"157 ","pages":"Article 104706"},"PeriodicalIF":4.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141912806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}