
Pacific Symposium on Biocomputing: Latest Articles

Uncovering Important Diagnostic Features for Alzheimer's, Parkinson's and Other Dementias Using Interpretable Association Mining Methods.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0045
Kazi Noshin, Mary Regina Boland, Bojian Hou, Victoria Lu, Carol Manning, Li Shen, Aidong Zhang

Alzheimer's Disease and Related Dementias (ADRD) afflict almost 7 million people in the USA alone. The majority of research in ADRD is conducted using post-mortem samples of brain tissue or carefully recruited clinical trial patients. While these resources are excellent, they lack sex/gender and racial/ethnic inclusiveness. Electronic Health Record (EHR) data has the potential to bridge this gap by including real-world ADRD patients treated during routine clinical care. In this study, we utilize EHR data from a cohort of 70,420 ADRD patients diagnosed and treated at Penn Medicine. Our goal is to uncover important risk features leading to three types of Neuro-Degenerative Disorders (NDD): Alzheimer's Disease (AD), Parkinson's Disease (PD) and Other Dementias (OD). We employ a variety of Machine Learning (ML) methods, including univariate and multivariate approaches, and compare accuracies across them. We also investigate the types of features identified by each method, the overlapping features and the unique features, to highlight the advantages and disadvantages of each approach for specific NDD types. Our study is relevant to those interested in studying ADRD and NDD in EHRs, as it highlights the strengths and limitations of popular approaches employed in the ML community. We found that the univariate approach was able to uncover features that were important and rare for specific types of NDD (AD, PD, OD), which matters from a clinical perspective. Features found across all methods are the most robust.
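The comparison of overlapping and unique features described above can be illustrated with plain set operations. This is a minimal sketch, not the authors' pipeline: the function name `compare_feature_sets`, the method labels, and the placeholder feature names are all hypothetical, and the abstract does not say how the selected features were actually compared.

```python
def compare_feature_sets(selected):
    """Return (shared, unique) for a dict of method name -> set of feature names.

    shared: features selected by every method (the "most robust" ones).
    unique: per method, the features that no other method selected.
    """
    if not selected:
        raise ValueError("at least one method is required")
    shared = set.intersection(*selected.values())
    unique = {}
    for method, features in selected.items():
        others = set()
        for other_method, other_features in selected.items():
            if other_method != method:
                others |= other_features
        unique[method] = features - others
    return shared, unique


# Hypothetical placeholder feature names, not results from the study.
selected = {
    "univariate": {"feature_a", "feature_b", "feature_c"},
    "multivariate_1": {"feature_b", "feature_c", "feature_d"},
    "multivariate_2": {"feature_c", "feature_e"},
}
shared, unique = compare_feature_sets(selected)
print(sorted(shared))
print({method: sorted(features) for method, features in unique.items()})
```

In this toy input only `feature_c` is shared by all three methods, while each method contributes one feature that the others miss.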

Pacific Symposium on Biocomputing, vol. 30, pp. 631-646 (2025). https://doi.org/10.1142/9789819807024_0045
Citations: 0
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0015
Guangzhi Xiong, Qiao Jin, Xiao Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang

The emergent abilities of large language models (LLMs) have demonstrated great potential in solving medical questions. LLMs can possess considerable medical knowledge, but they may still hallucinate and are inflexible when knowledge needs updating. While Retrieval-Augmented Generation (RAG) has been proposed to enhance the medical question-answering capabilities of LLMs with external knowledge bases, it may still fail in complex cases where multiple rounds of information-seeking are required. To address this issue, we propose iterative RAG for medicine (i-MedRAG), where LLMs can iteratively ask follow-up queries based on previous information-seeking attempts. In each iteration of i-MedRAG, the follow-up queries are answered by a vanilla RAG system and are then used to guide query generation in the next iteration. Our experiments show that i-MedRAG improves the performance of various LLMs over vanilla RAG on complex questions from clinical vignettes in the United States Medical Licensing Examination (USMLE), as well as on various knowledge tests in the Massive Multitask Language Understanding (MMLU) dataset. Notably, our zero-shot i-MedRAG outperforms all existing prompt engineering and fine-tuning methods on GPT-3.5, achieving an accuracy of 69.68% on the MedQA dataset. In addition, we characterize the scaling properties of i-MedRAG with different numbers of follow-up iterations and different numbers of queries per iteration. Our case studies show that i-MedRAG can flexibly ask follow-up queries to form reasoning chains, providing an in-depth analysis of medical questions. To the best of our knowledge, this is the first study to incorporate follow-up queries into medical RAG.
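The iterative loop described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the i-MedRAG implementation: the three callables (`generate_queries`, `answer_with_rag`, `answer_final`) are hypothetical stand-ins for an LLM query generator, a vanilla RAG system, and a final answering step, and keeping the history as query-answer pairs is an assumption. The two loop parameters mirror the "iterations" and "queries per iteration" that the abstract says were varied.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (follow-up query, RAG answer) pairs


def iterative_rag(
    question: str,
    generate_queries: Callable[[str, History, int], List[str]],
    answer_with_rag: Callable[[str], str],
    answer_final: Callable[[str, History], str],
    n_iterations: int = 2,
    queries_per_iteration: int = 2,
) -> Tuple[str, History]:
    """Ask follow-up queries over several rounds, then answer the question."""
    history: History = []
    for _ in range(n_iterations):
        # New queries are conditioned on everything gathered so far.
        queries = generate_queries(question, history, queries_per_iteration)
        for query in queries:
            history.append((query, answer_with_rag(query)))
    return answer_final(question, history), history


# Hypothetical stubs so the sketch runs without any model or retriever.
def demo_generate(question: str, history: History, k: int) -> List[str]:
    return [f"follow-up {len(history) + i + 1}" for i in range(k)]


def demo_rag(query: str) -> str:
    return f"answer to {query}"


def demo_final(question: str, history: History) -> str:
    return f"final answer using {len(history)} query-answer pairs"


final, history = iterative_rag(
    "example question", demo_generate, demo_rag, demo_final
)
print(final)
```

Passing the components in as callables keeps the loop itself independent of any particular model or retrieval library.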

Pacific Symposium on Biocomputing, vol. 30, pp. 199-214 (2025). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11997844/pdf/
Citations: 0
AI in Point-of-Care - A Sustainable Healthcare Revolution at the Edge.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0055
Yousuf Rajput, Tarek Tarif, Akira Wolfe, Eric Dawson, Keolu Fox

This paper examines the integration of artificial intelligence (AI) in point-of-care testing (POCT) to enhance diagnostic speed, accuracy, and accessibility, particularly in underserved regions. AI-driven POCT is shown to optimize clinical decision-making, reduce diagnostic times, and offer personalized healthcare solutions, with applications in genome sequencing and infectious disease management. The paper highlights the environmental challenges of AI, including high energy consumption and electronic waste, and proposes solutions such as energy-efficient algorithms and edge computing. It also addresses ethical concerns, emphasizing the reduction of algorithmic bias and the need for equitable access to AI technologies. While AI in POCT can improve healthcare and promote sustainability, collaboration among researchers, healthcare providers, and policymakers within the POCT ecosystem is essential to overcome the ethical, environmental, and technological challenges.

Pacific Symposium on Biocomputing, vol. 30, pp. 734-747 (2025).
Citations: 0
Frequency of adding salt is a stronger predictor of chronic kidney disease in individuals with genetic risk.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0039
Manu Shivakumar, Yanggyun Kim, Sang-Hyuk Jung, Jakob Woerner, Dokyoon Kim

The incidence of chronic kidney disease (CKD) is increasing worldwide, but there is no specific treatment available. Therefore, understanding and controlling the risk factors for CKD are essential for preventing disease occurrence. Salt intake raises blood pressure by increasing fluid volume and contributes to the deterioration of kidney function by enhancing the renin-angiotensin system and sympathetic tone. Thus, a low-salt diet is important to reduce blood pressure and prevent kidney diseases. With recent advancements in genetic research, our understanding of the etiology and genetic background of CKD has deepened, enabling the identification of populations with a high genetic predisposition to CKD. It is thought that the impact of lifestyle or environmental factors on disease occurrence or prevention may vary based on genetic factors. This study aims to investigate whether the frequency of adding salt has different effects depending on genetic risk for CKD. CKD polygenic risk scores (PRS) were generated using CKDGen Consortium GWAS (N = 765,348) summary statistics. We then applied the CKD PRS to UK Biobank subjects. A total of 331,318 European individuals aged 40-69 without CKD were enrolled in the study between 2006 and 2010. The average age at enrollment was 56.69 years, and 46% of participants were male. Over an average follow-up period of 8 years, 12,279 CKD cases were identified. Compared to the group that did not develop CKD, the group that developed CKD had a higher percentage of individuals who added salt (46.37% vs. 43.04%) and a higher proportion of CKD high-risk PRS values (23.53% vs. 19.86%). We classified the individuals into four groups based on PRS: low (0-19%), intermediate (20-79%), high (80-94%), and very high (≥ 95%). Incidence of CKD increased incrementally with CKD PRS, even after adjusting for age, sex, race, Townsend deprivation index, body mass index, estimated glomerular filtration rate, smoking, alcohol, physical activity, diabetes mellitus, dyslipidemia, hypertension, coronary artery diseases, and cerebrovascular diseases at baseline. Compared to the group that "never/rarely" added salt, the group that "always" added salt had a higher incidence of CKD, in proportion to the frequency of adding salt. However, the significant association of the "always" group with incident CKD disappeared in the low PRS group. This study validated the signal from PRSs for CKD across a large cohort and confirmed that the frequency of adding salt contributes to the occurrence of CKD. Additionally, it confirmed that the effect of "always" adding salt on CKD incidence is greater in those with more than intermediate CKD PRS. This study suggests that increased salt intake is particularly concerning for individuals with genetic risk factors for CKD, underscoring the clinical importance of reducing salt intake for these individuals.
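The four PRS groups named in the abstract can be expressed as a small binning function. This is a sketch under assumptions: the abstract gives the ranges as whole percentages, so treating them as percentiles with half-open boundaries (for example, anything below 20 is "low") is an assumption, and the function name `prs_risk_group` is hypothetical.

```python
def prs_risk_group(percentile: float) -> str:
    """Map a PRS percentile (0-100) to the risk group used in the abstract.

    Assumed boundaries: low [0, 20), intermediate [20, 80),
    high [80, 95), very high [95, 100].
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 20:
        return "low"
    if percentile < 80:
        return "intermediate"
    if percentile < 95:
        return "high"
    return "very high"


for value in (5, 20, 79.9, 80, 95):
    print(value, prs_risk_group(value))
```

Under these boundaries the intermediate group covers 60% of the score distribution and the very high group covers the top 5%.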

Pacific Symposium on Biocomputing, vol. 30, pp. 551-564 (2025). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12008778/pdf/
Citations: 0
Automated Evaluation of Antibiotic Prescribing Guideline Concordance in Pediatric Sinusitis Clinical Notes.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0011
Davy Weissenbacher, Lauren Dutcher, Mickael Boustany, Leigh Cressman, Karen O'Connor, Keith W Hamilton, Jeffrey Gerber, Robert Grundmeier, Graciela Gonzalez-Hernandez

Background: Ensuring antibiotics are prescribed only when necessary is crucial for maintaining their effectiveness and is a key focus of public health initiatives worldwide. In cases of sinusitis, among the most common reasons for antibiotic prescriptions in children, healthcare providers must distinguish between bacterial and viral causes based on clinical signs and symptoms. However, due to the overlap between symptoms of acute sinusitis and viral upper respiratory infections, antibiotics are often over-prescribed.

Objectives: Currently, there are no electronic health record (EHR)-based methods, such as lab tests or ICD-10 codes, to retroactively assess the appropriateness of prescriptions for sinusitis, making manual chart reviews the only available method for evaluation, which is time-intensive and not feasible at a large scale. In this study, we propose using natural language processing to automate this assessment.

Methods: We developed, trained, and evaluated generative models to classify the appropriateness of antibiotic prescriptions in 300 clinical notes from pediatric patients with sinusitis seen at a primary care practice in the Children's Hospital of Philadelphia network. We utilized standard prompt engineering techniques, including few-shot learning and chain-of-thought prompting, to refine an initial prompt. Additionally, we employed Parameter-Efficient Fine-Tuning to train a medium-sized generative model, Llama 3 70B-instruct.

Results: While parameter-efficient fine-tuning did not enhance performance, the combination of few-shot learning and chain-of-thought prompting proved beneficial. Our best results were achieved using the largest generative model publicly available to date, the Llama 3.1 405B-instruct. On our evaluation set, the model correctly identified 94.7% of the 152 notes where antibiotic prescription was appropriate and 66.2% of the 83 notes where it was not appropriate. However, 15 notes that were insufficiently, vaguely, or ambiguously documented by physicians posed a challenge to our model, as none were accurately classified.

Conclusion: Our generative model demonstrated good performance in the challenging task of chart review. This level of performance may be sufficient for deploying the model within the EHR, where it can assist physicians in real-time to prescribe antibiotics in concordance with the guidelines, or for monitoring antibiotic stewardship on a large scale.
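The per-class figures in the Results can be combined into a rough overall accuracy. The abstract does not report this number, so the sketch below is a derived estimate under two assumptions: the correct-note counts are back-calculated by rounding the reported percentages, and the evaluation set is taken to be the 250 notes mentioned (152 + 83 + 15). The helper name `back_calculate` is hypothetical.

```python
def back_calculate(rate_percent: float, total: int) -> int:
    """Estimate a whole-number count from a reported percentage and a total."""
    return round(rate_percent / 100 * total)


# Totals and rates as stated in the abstract.
appropriate_total = 152
inappropriate_total = 83
ambiguous_total = 15

# Estimated counts of correctly classified notes (assumption: rounding).
appropriate_correct = back_calculate(94.7, appropriate_total)
inappropriate_correct = back_calculate(66.2, inappropriate_total)
ambiguous_correct = 0  # "none were accurately classified"

evaluated = appropriate_total + inappropriate_total + ambiguous_total
overall_accuracy = (
    appropriate_correct + inappropriate_correct + ambiguous_correct
) / evaluated

print(appropriate_correct, inappropriate_correct, evaluated)
print(f"estimated overall accuracy: {overall_accuracy:.1%}")
```

The gap between the two per-class rates shows why a single accuracy figure would hide the model's weaker performance on notes where the prescription was not appropriate.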

Pacific Symposium on Biocomputing, vol. 30, pp. 138-153 (2025).
Citations: 0
Amyloid, Tau, and APOE in Alzheimer's Disease: Impact on White Matter Tracts.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0029
Bramsh Qamar Chandio, Julio E Villalon-Reina, Talia M Nir, Sophia I Thomopoulos, Yixue Feng, Sebastian Benavidez, Neda Jahanshad, Jaroslaw Harezlak, Eleftherios Garyfallidis, Paul M Thompson

Alzheimer's disease (AD) is characterized by cognitive decline and memory loss due to the abnormal accumulation of amyloid-beta (Aβ) plaques and tau tangles in the brain; its onset and progression also depend on genetic factors such as the apolipoprotein E (APOE) genotype. Understanding how these factors affect the brain's neural pathways is important for early diagnostics and interventions. Tractometry is an advanced technique for 3D quantitative assessment of white matter tracts, localizing microstructural abnormalities in diseased populations in vivo. In this work, we applied BUAN (Bundle Analytics) tractometry to 3D diffusion MRI data from 730 participants in ADNI3 (phase 3 of the Alzheimer's Disease Neuroimaging Initiative; age range: 55-95 years, 349M/381F, 214 with mild cognitive impairment, 69 with AD, and 447 cognitively healthy controls). Using along-tract statistical analysis, we assessed the localized impact of amyloid, tau, and APOE genetic variants on the brain's neural pathways. BUAN quantifies microstructural properties of white matter tracts, supporting along-tract statistical analyses that identify factors associated with brain microstructure. We visualize the 3D profile of white matter tract associations with tau and amyloid burden in Alzheimer's disease; strong associations near the cortex may support models of disease propagation along neural pathways. Relative to the neutral genotype, APOE ϵ3/ϵ3, carriers of the AD-risk conferring APOE ϵ4 genotype show microstructural abnormalities, while carriers of the protective ϵ2 genotype also show subtle differences. Of all the microstructural metrics, mean diffusivity (MD) generally shows the strongest associations with AD pathology, followed by axial diffusivity (AxD) and radial diffusivity (RD), while fractional anisotropy (FA) is typically the least sensitive metric. 
Along-tract microstructural metrics are sensitive to tau and amyloid accumulation, showing the potential of diffusion MRI to track AD pathology and map its impact on neural pathways.
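The four microstructural metrics compared in the abstract (MD, AxD, RD, FA) are standard scalar summaries of the three eigenvalues of a diffusion tensor. The sketch below computes them from the textbook definitions; it is not the BUAN pipeline, the function name `dti_metrics` is hypothetical, and the example eigenvalues are illustrative values rather than data from the study.

```python
import math


def dti_metrics(eigenvalues):
    """Compute MD, AxD, RD and FA from three diffusion tensor eigenvalues."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)  # l1 is the largest
    md = (l1 + l2 + l3) / 3   # mean diffusivity
    axd = l1                  # axial diffusivity (along the main direction)
    rd = (l2 + l3) / 2        # radial diffusivity (perpendicular to it)
    spread = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    norm = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy: 0 for isotropic diffusion, 1 for a single direction.
    fa = math.sqrt(1.5 * spread / norm) if norm > 0 else 0.0
    return {"MD": md, "AxD": axd, "RD": rd, "FA": fa}


# Illustrative eigenvalues in mm^2/s (an assumption, not study data).
metrics = dti_metrics((1.7e-3, 0.3e-3, 0.2e-3))
for name, value in metrics.items():
    print(name, value)
```

Because MD, AxD and RD are all simple averages of the eigenvalues while FA is a normalized ratio, the metrics can respond differently to the same tissue change, which is consistent with the abstract reporting different sensitivities for them.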

Pacific Symposium on Biocomputing, vol. 30, pp. 394-411, 2025. DOI: 10.1142/9789819807024_0029
Citations: 0
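
The four diffusion metrics compared in the abstract above (MD, AxD, RD, FA) are standard scalar summaries of a diffusion tensor's three eigenvalues. The sketch below computes them with NumPy from the textbook definitions; it is not the BUAN pipeline, and the function name `dti_scalars` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def dti_scalars(evals):
    """Return (MD, AxD, RD, FA) from diffusion tensor eigenvalues.

    evals: array-like of shape (..., 3), one eigenvalue triple per voxel
    or per along-tract point. Hypothetical helper, not part of BUAN.
    """
    # Sort each triple in descending order so that l1 is the principal eigenvalue.
    ev = np.sort(np.asarray(evals, dtype=float), axis=-1)[..., ::-1]
    l1, l2, l3 = ev[..., 0], ev[..., 1], ev[..., 2]

    md = (l1 + l2 + l3) / 3.0   # mean diffusivity
    axd = l1                    # axial diffusivity
    rd = (l2 + l3) / 2.0        # radial diffusivity

    # Fractional anisotropy: sqrt(3/2) * ||lambda - MD|| / ||lambda||.
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    safe_den = np.where(den > 0, den, 1.0)  # avoid division by zero
    fa = np.where(den > 0, np.sqrt(1.5 * num / safe_den), 0.0)
    return md, axd, rd, fa
```

Applied to every point of a resampled tract, these values would give the kind of along-tract profile on which the abstract's statistical comparisons operate.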
Uncovering Important Diagnostic Features for Alzheimer's, Parkinson's and Other Dementias Using Interpretable Association Mining Methods.
Kazi Noshin, Mary Regina Boland, Bojian Hou, Victoria Lu, Carol Manning, Li Shen, Aidong Zhang

Alzheimer's Disease and Related Dementias (ADRD) afflict almost 7 million people in the USA alone. The majority of research in ADRD is conducted using post-mortem samples of brain tissue or carefully recruited clinical trial patients. While these resources are excellent, they lack sex/gender and racial/ethnic inclusiveness. Electronic Health Records (EHR) data have the potential to bridge this gap by including real-world ADRD patients treated during routine clinical care. In this study, we utilize EHR data from a cohort of 70,420 ADRD patients diagnosed and treated at Penn Medicine. Our goal is to uncover important risk features leading to three types of Neuro-Degenerative Disorders (NDD): Alzheimer's Disease (AD), Parkinson's Disease (PD) and Other Dementias (OD). We employ a variety of Machine Learning (ML) methods, including univariate and multivariate approaches, and compare accuracies across them. We also investigate the types of features identified by each method, the overlapping features and the unique features, to highlight the advantages and disadvantages of each approach for specific NDD types. Our study is important for those interested in studying ADRD and NDD in EHRs, as it highlights the strengths and limitations of popular approaches employed in the ML community. We found that the univariate approach was able to uncover features that were important and rare for specific types of NDD (AD, PD, OD), which matters from a clinical perspective. Features found across all methods are the most robust.

Pacific Symposium on Biocomputing, vol. 30, pp. 631-646, 2025. DOI: 10.1142/9789819807024_0045. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649014/pdf/
Citations: 0
A Dynamic Model for Early Prediction of Alzheimer's Disease by Leveraging Graph Convolutional Networks and Tensor Algebra.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0048
Cagri Ozdemir, Mohammad Al Olaimat, Serdar Bozdag

Alzheimer's disease (AD) is a neurocognitive disorder that degrades memory and impairs cognitive functions. Mild Cognitive Impairment (MCI) is generally considered an intermediate phase between normal cognitive aging and more severe conditions such as AD. Although not all individuals with MCI will develop AD, they are at increased risk of doing so. Diagnosing AD once strong symptoms are already present is of limited value, as AD leads to irreversible cognitive decline and brain damage. Thus, it is crucial to develop methods for the early prediction of AD in individuals with MCI. Recurrent Neural Network (RNN)-based methods have been effectively used to predict the progression from MCI to AD by analyzing electronic health records (EHR). However, despite their widespread use, existing RNN-based tools may increase model complexity and often have difficulty capturing long-term dependencies. In this study, we introduced a novel Dynamic deep learning model for Early Prediction of AD (DyEPAD) to predict MCI subjects' progression to AD utilizing EHR data. In the first phase of DyEPAD, embeddings for each time step or visit are captured through Graph Convolutional Networks (GCN) and aggregation functions. In the final phase, DyEPAD employs tensor algebraic operations for frequency domain analysis of these embeddings, capturing the full scope of evolutionary patterns across all time steps. Our experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC) datasets demonstrate that our proposed model outperforms or is on par with the state-of-the-art and baseline methods.

Pacific Symposium on Biocomputing, vol. 30, pp. 675-689, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649016/pdf/
Citations: 0
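
The two phases described in the DyEPAD abstract above (per-visit graph-convolutional embeddings, then frequency-domain analysis across visits) can be illustrated with a generic sketch. This is not the authors' implementation: the abstract does not specify the graph construction, the aggregation functions, or the tensor operations, so the code assumes nodes are subjects, uses one standard GCN propagation step with a symmetric normalized adjacency, and substitutes a plain FFT along the time axis. All function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(a):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by standard GCNs."""
    a_hat = a + np.eye(a.shape[0])            # add self-loops, so every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ w, 0.0)

def visit_embeddings(adjs, feats, w):
    """Embed every visit with a shared weight matrix; returns a (T, N, H) tensor."""
    return np.stack([gcn_layer(normalize_adjacency(a), x, w)
                     for a, x in zip(adjs, feats)])

def frequency_summary(emb):
    """Magnitude spectrum of each embedding coordinate along the visit axis.

    Assumption: a real FFT over time stands in for the paper's tensor-algebraic
    frequency-domain analysis, which the abstract does not detail.
    """
    return np.abs(np.fft.rfft(emb, axis=0))   # shape (T // 2 + 1, N, H)
```

Because the transform runs over all visits at once, each output coefficient depends on the whole sequence rather than on a step-by-step recurrence, which is the contrast with RNNs that the abstract draws.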
A Comprehensive Bibliometric Analysis: Celebrating the Thirtieth Anniversary of the Pacific Symposium on Biocomputing.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0001
Rachit Kumar, Rasika Venkatesh, David Y Zhang, Teri E Klein, Marylyn D Ritchie

The 2025 Pacific Symposium on Biocomputing (PSB) represents a remarkable milestone, as it is the thirtieth anniversary of PSB. We use this opportunity to analyze the bibliometric output of 30 years of PSB publications in a wide range of analyses with a focus on various eras that represent important disruptive breakpoints in the field of bioinformatics and biocomputing. These include an analysis of paper topics and keywords, flight emissions produced by travel to PSB by authors, citation and co-authorship networks and metrics, and a broad assessment of diversity and representation in PSB authors. We use the results of these analyses to identify insights that we can carry forward to the upcoming decades of PSB.

Pacific Symposium on Biocomputing, vol. 30, pp. 1-15, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649015/pdf/
Citations: 0
Using Large Language Models for Efficient Cancer Registry Coding in the Real Hospital Setting: A Feasibility Study.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0010
Chen-Kai Wang, Cheng-Rong Ke, Ming-Siang Huang, Inn-Wen Chong, Yi-Hsin Yang, Vincent S Tseng, Hong-Jie Dai

The primary challenge in reporting cancer cases lies in the labor-intensive and time-consuming process of manually reviewing numerous reports. Current methods predominantly rely on rule-based approaches or custom supervised learning models, which predict diagnostic codes based on a single pathology report per patient. Although these methods show promising evaluation results, their biased outcomes in controlled settings may hinder adaptation to real-world reporting workflows. In this feasibility study, we focused on lung cancer as a test case and developed an agentic retrieval-augmented generation (RAG) system to evaluate the potential of publicly available large language models (LLMs) for cancer registry coding. Our findings demonstrate that: (1) directly applying publicly available LLMs without fine-tuning is feasible for cancer registry coding; and (2) prompt engineering can significantly enhance the capability of pre-trained LLMs in cancer registry coding. The off-the-shelf LLM, combined with our proposed system architecture and basic prompts, achieved a macro-averaged F-score of 0.637 when evaluated on testing data consisting of patients' medical reports spanning 1.5 years since their first visit. By employing chain of thought (CoT) reasoning and our proposed coding item grouping, the system outperformed the baseline by 0.187 in terms of the macro-averaged F-score. These findings demonstrate the great potential of leveraging LLMs with prompt engineering for cancer registry coding. Our system could offer cancer registrars a promising reference tool to enhance their daily workflow, improving efficiency and accuracy in cancer case reporting.

Pacific Symposium on Biocomputing, vol. 30, pp. 121-137, 2025.
Citations: 0
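
The cancer registry abstract above reports its results as a macro-averaged F-score. As a reminder of what that metric measures, the sketch below computes a macro-averaged F1 in plain Python: F1 is calculated separately for each class and the per-class values are averaged without weighting, so rare codes count as much as common ones. The abstract does not say whether the average is taken over code values or over coding items, so this per-class version is an assumption, and `macro_f1` is a hypothetical name.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); a class with no hits scores 0.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with three made-up classes: per-class F1 is 2/3, 2/3 and 1.
example = macro_f1(["a", "a", "b", "c"], ["a", "b", "b", "c"])
```

In the toy example the macro average is (2/3 + 2/3 + 1) / 3 = 7/9, even though class "c" appears only once.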