Pacific Symposium on Biocomputing: Latest Articles

Command line to pipeLine: Cross-biobank analyses with Nextflow.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0050
Anurag Verma, Zachary Rodriguez, Lindsay Guare, Katie Cardone, Christopher Carson

Biobanks hold immense potential for genomic research, but fragmented data and incompatible tools slow progress. This workshop equipped participants with Nextflow, a powerful workflow language to streamline bioinformatic analyses across biobanks. We taught participants to write code in their preferred language and demonstrated how Nextflow handles the complexities, ensuring consistent, reproducible results across different platforms. This interactive session was ideal for beginner-to-intermediate researchers who want to (1) Leverage biobank data for genomic discoveries, (2) Build portable and scalable analysis pipelines, (3) Ensure reproducibility in their findings, (4) Gain hands-on experience through presentations, demonstrations, tutorials, and discussions with bioinformatics experts.

Volume 30, pages 696-701
Citations: 0
Electronic Health Record Analysis for Personalized Medicine: Predicting Malnutrition-Related Health Outcomes and Secondary Neuropsychiatric Health Concerns.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0043
Pinar Gurkas, Gunnur Karakurt

Malnutrition poses risks to cognitive, behavioral, and physical well-being. The aim of this study was to investigate the prevalent health issues associated with malnutrition using electronic health record (EHR) data. The IBM Watson Health Explorys platform was used to access the EHR data. Two cohorts were created by two queries: patients with a history of malnutrition (n=5180) and patients without a history of malnutrition diagnosis (n=413890). The log odds ratio and χ2 statistic were used to identify statistically significant differences between these two cohorts. We found 35 terms that were more common in the cohort with the malnutrition diagnosis. These terms were categorized under developmental anomalies, infectious agents, respiratory system issues, digestive system issues, pregnancy/prenatal problems, mental, behavioral, or neurodevelopmental disorders, diseases of the ear or mastoid process, diseases of the visual system, and chromosomal anomalies. The management of malnutrition in children is a complex problem that can be addressed with a multifactorial approach. Based on the key themes emerging from the prevalent terms identified in our study, infection prevention, education in appropriate nutritional solutions for digestive health issues, supportive services to address neurodevelopmental needs, and quality prenatal healthcare would constitute beneficial prevention efforts. Improving our understanding of malnutrition is necessary to develop new interventions for prevention and treatment.
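
The cohort comparison described above can be illustrated with a short sketch. The abstract reports only the two cohort sizes (n=5180 and n=413890); the per-term counts below are hypothetical, the function names are illustrative rather than the authors' code, and the χ2 statistic is assumed to be the Pearson form without continuity correction.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Natural-log odds ratio for a 2x2 table.

    a: malnutrition cohort, term present    b: malnutrition cohort, term absent
    c: comparison cohort, term present      d: comparison cohort, term absent
    """
    return math.log((a * d) / (b * c))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Cohort sizes come from the abstract; the term counts are HYPOTHETICAL.
n_malnutrition, n_comparison = 5180, 413890
term_in_malnutrition, term_in_comparison = 518, 20000

a = term_in_malnutrition
b = n_malnutrition - a
c = term_in_comparison
d = n_comparison - c

print(f"log odds ratio = {log_odds_ratio(a, b, c, d):.3f}")
print(f"chi-square     = {chi_square_2x2(a, b, c, d):.1f}")
```

A positive log odds ratio indicates that the term is more common in the malnutrition cohort. At one degree of freedom, a χ2 value above 3.84 corresponds to p < 0.05 before any multiple-testing correction.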

Volume 30, pages 599-613 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649008/pdf/
Citations: 0
Connecting intermediate phenotypes to disease using multi-omics in heart failure.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0036
Anni Moore, Rasika Venkatesh, Michael G Levin, Scott M Damrauer, Nosheen Reza, Thomas P Cappola, Marylyn D Ritchie

Heart failure (HF) is one of the most common, complex, heterogeneous diseases in the world, with roughly 1-3% of the global population living with the condition. Progression of HF can be tracked via MRI measures of structural and functional changes to the heart, namely the left ventricle (LV), including ejection fraction, mass, end-diastolic volume, and LV end-systolic volume. Moreover, while genome-wide association studies (GWAS) have been a useful tool to identify candidate variants involved in HF risk, they lack crucial tissue-specific and mechanistic information, which can be gained by incorporating additional data modalities. This study addresses this gap by incorporating transcriptome-wide and proteome-wide association studies (TWAS and PWAS) to gain insights into genetically regulated changes in gene expression and protein abundance in precursors to HF, measured using MRI-derived cardiac measures, as well as in full-stage all-cause HF. We identified several gene and protein overlaps between LV ejection fraction and end-systolic volume measures. Many of the overlaps identified in MRI-derived measurements through TWAS and PWAS appear to be shared with all-cause HF. We implicate many putative pathways relevant to HF that are associated with these genes and proteins via gene-set enrichment and protein-protein interaction network approaches. The results of this study (1) highlight the benefit of using multi-omics to better understand genetics and (2) provide novel insights into how changes in heart structure and function may relate to HF.

Volume 30, pages 504-521 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11822568/pdf/
Citations: 0
The Impact of Ancestry on Genome-Wide Association Studies.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0019
Steven Christopher Jones, Katie M Cardone, Yuki Bradford, Sarah A Tishkoff, Marylyn D Ritchie

Genome-wide association studies (GWAS) are an important tool for the study of complex disease genetics. Decisions regarding the quality control (QC) procedures employed as part of a GWAS can have important implications for the results and their biological interpretation. Many GWAS have been conducted predominantly in cohorts of European ancestry, but many initiatives aim to increase the representation of diverse ancestries in genetic studies. The question of how these data should be combined and the consequences that genetic variation across ancestry groups might have on GWAS results warrants further investigation. In this study, we focus on several commonly used methods for combining genetic data across diverse ancestry groups and the impact these decisions have on the outcome of GWAS summary statistics. We ran GWAS on two binary phenotypes using ancestry-specific, multi-ancestry mega-analysis, and meta-analysis approaches. We found that while multi-ancestry mega-analysis and meta-analysis approaches can aid in identifying signals shared across ancestries, they can diminish the signal of ancestry-specific associations and modify their effect sizes. These results demonstrate the potential impact on downstream post-GWAS analyses and follow-up studies. Decisions regarding how the genetic data are combined have the potential to mask important findings that might serve individuals of ancestries that have been historically underrepresented in genetic studies. New methods that consider ancestry-specific variants in conjunction with the shared variants need to be developed.
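
The dilution of an ancestry-specific signal described above can be sketched numerically. The abstract does not say which meta-analysis method was used; the example assumes a fixed-effect, inverse-variance-weighted model, and the effect sizes and standard errors are hypothetical values chosen only for illustration.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effect meta-analysis of one variant.

    Returns the pooled effect size, its standard error, and the z-score.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# HYPOTHETICAL summary statistics for a single variant in two ancestry groups:
# a clear effect in the smaller group, no effect in the larger group.
betas = [0.30, 0.00]
standard_errors = [0.05, 0.02]

group_z = betas[0] / standard_errors[0]
pooled_beta, pooled_se, pooled_z = fixed_effect_meta(betas, standard_errors)

print(f"ancestry-specific: beta={betas[0]:.3f}, z={group_z:.2f}")
print(f"meta-analysis:     beta={pooled_beta:.3f}, z={pooled_z:.2f}")
```

Because the larger group carries more weight, the pooled effect size and z-score are both smaller than in the group where the association exists, which is the kind of masking the abstract warns about.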

Volume 30, pages 251-267 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11694900/pdf/
Citations: 0
Session Introduction: Translating Big Data Imaging Genomics Findings to the Individual: Prediction of Risks and Outcomes in Neuropsychiatric Illnesses.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0042
Peter Kochunov, Li Shen, Zhongming Zhao, Paul M Thompson

This PSB 2025 session is focused on opportunities, challenges and solutions for translating Big Data Imaging Genomic findings toward powering decision making in personalized medicine and guiding individual clinical decisions. It combines many of the scientific directions that are of interest to PSB members including Big Data analyses, pattern recognition, machine learning and AI, electronic health records and others.

Volume 30, pages 594-598
Citations: 0
Artificial Allies: Validation of Synthetic Text for Peer Support Tools through Data Augmentation in NLP Model Development.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0008
Josué Godeme, Julia Hill, Stephen P Gaughan, Wade J Hirschbuhl, Amanda J Emerson, Christian Darabos, Carly A Bobak, Karen L Fortuna

This study investigates the potential of using synthetic text to augment training data for Natural Language Processing (NLP) models, specifically within the context of peer support tools. We surveyed 22 participants (13 professional peer supporters and 9 AI-proficient individuals) tasked with distinguishing between AI-generated and human-written sentences. Using signal detection theory and confidence-based metrics, we evaluated the accuracy and confidence levels of both groups. The results show no significant differences in rater agreement between the two groups (p = 0.116), with overall classification accuracy falling below chance levels (mean accuracy = 43.10%, p < 0.001). Both groups exhibited a tendency to misclassify low-fidelity sentences as AI-generated, with peer supporters showing a significant bias (p = 0.007). Further analysis revealed a significant negative correlation between errors and confidence among AI-proficient raters (r = -0.429, p < 0.001), suggesting that as their confidence increased, their error rates decreased. Our findings support the feasibility of using synthetic text to mimic human communication, with important implications for improving the fidelity of peer support interventions through NLP model development.
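
For readers unfamiliar with signal detection theory, the sketch below shows how sensitivity (d') and response bias (criterion) are typically derived from a rater's answers. The abstract does not give per-rater counts or the exact formulas used, so the counts are hypothetical, and the log-linear correction and the choice of "AI-generated" as the signal class are assumptions made here.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) from one rater's response counts.

    "Signal" trials are AI-generated sentences; "noise" trials are human-written.
    A log-linear correction (add 0.5 to each count) keeps rates away from 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# HYPOTHETICAL counts for one rater judging 20 AI-generated and 20 human-written sentences.
d_prime, criterion = sdt_measures(hits=9, misses=11, false_alarms=12, correct_rejections=8)
print(f"d' = {d_prime:.2f}, criterion = {criterion:.2f}")
```

A d' near zero means the rater cannot tell the two sources apart, and a negative criterion indicates a bias toward answering "AI-generated".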

Volume 30, pages 94-108
Citations: 0
Biologically Enhanced Machine Learning Model to uncover Novel Gene-Drug Targets for Alzheimer's Disease.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0032
Alena Orlenko, Mythreye Venkatesan, Li Shen, Marylyn D Ritchie, Zhiping Paul Wang, Tayo Obafemi-Ajayi, Jason H Moore

Given the complexity and multifactorial nature of Alzheimer's disease, investigating potential drug-gene targets is imperative for developing effective therapies and advancing our understanding of the underlying mechanisms driving the disease. We present an explainable ML model that integrates the role and impact of gene interactions to drive the genomic variant feature selection. The model leverages both the Alzheimer's knowledge base and the Drug-Gene interaction database (DGIdb) to identify a list of biologically plausible novel gene-drug targets for further investigation. Model validation is performed on an ethnically diverse study sample obtained from the Alzheimer's Disease Sequencing Project (ADSP), a multi-ancestry multi-cohort genomic study. To mitigate population stratification and spurious associations from ML analysis, we implemented novel data curation methods. The study outcomes include a set of possible gene targets for further functional follow-up and drug repurposing.

Volume 30, pages 441-456
Citations: 0
Unsupervised Dimensionality Reduction Techniques for the Assessment of ASD Biomarkers.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0044
Zachary Jacokes, Ian Adoremos, Arham Rameez Hussain, Benjamin T Newman, Kevin A Pelphrey, John Darrell Van Horn

Autism Spectrum Disorder (ASD) encompasses a range of developmental disabilities marked by differences in social functioning, cognition, and behavior. Both genetic and environmental factors are known to contribute to ASD, yet the exact etiological factors remain unclear. Developing integrative models to explore the effects of gene expression on behavioral and cognitive traits attributed to ASD can uncover environmental and genetic interactions. A notable aspect of ASD research is the sex-wise diagnostic disparity: males are diagnosed more frequently than females, which suggests potential sex-specific biological influences. Investigating neuronal microstructure, particularly axonal conduction velocity, offers insights into the neural basis of ASD. Developing robust models that evaluate the vast multidimensional datasets generated from genetic and microstructural processing poses significant challenges. Traditional feature selection techniques have limitations; thus, this research aims to integrate principal component analysis (PCA) with supervised machine learning algorithms to navigate the complex data space. By leveraging various neuroimaging techniques and transcriptomics data analysis methods, this methodology builds on traditional implementations of PCA to better contextualize the complex genetic and phenotypic heterogeneity linked to sex differences in ASD and pave the way for tailored interventions.

自闭症谱系障碍(ASD)包括一系列以社会功能、认知和行为差异为特征的发育障碍。已知遗传和环境因素都有助于ASD,但确切的病因尚不清楚。开发整合模型来探索基因表达对ASD行为和认知特征的影响,可以揭示环境和遗传的相互作用。自闭症谱系障碍研究的一个值得注意的方面是性别方面的诊断差异:男性的诊断频率高于女性,这表明潜在的性别特异性生物学影响。研究神经元微观结构,特别是轴突传导速度,有助于深入了解自闭症谱系障碍的神经基础。开发健壮的模型来评估由遗传和微观结构处理产生的大量多维数据集,这构成了重大挑战。传统的特征选择技术存在局限性;因此,本研究旨在将主成分分析(PCA)与监督机器学习算法相结合,以导航复杂的数据空间。通过利用各种神经成像技术和转录组学数据分析方法,该方法建立在传统PCA实现的基础上,以更好地了解与ASD性别差异相关的复杂遗传和表型异质性,并为量身定制的干预措施铺平道路。
{"title":"Unsupervised Dimensionality Reduction Techniques for the Assessment of ASD Biomarkers.","authors":"Zachary Jacokes, Ian Adoremos, Arham Rameez Hussain, Benjamin T Newman, Kevin A Pelphrey, John Darrell Van Horn","doi":"10.1142/9789819807024_0044","DOIUrl":"10.1142/9789819807024_0044","url":null,"abstract":"<p><p>Autism Spectrum Disorder (ASD) encompasses a range of developmental disabilities marked by differences in social functioning, cognition, and behavior. Both genetic and environmental factors are known to contribute to ASD, yet the exact etiological factors remain unclear. Developing integrative models to explore the effects of gene expression on behavioral and cognitive traits attributed to ASD can uncover environmental and genetic interactions. A notable aspect of ASD research is the sex-wise diagnostic disparity: males are diagnosed more frequently than females, which suggests potential sex-specific biological influences. Investigating neuronal microstructure, particularly axonal conduction velocity offers insights into the neural basis of ASD. Developing robust models that evaluate the vast multidimensional datasets generated from genetic and microstructural processing poses significant challenges. Traditional feature selection techniques have limitations; thus, this research aims to integrate principal component analysis (PCA) with supervised machine learning algorithms to navigate the complex data space. By leveraging various neuroimaging techniques and transcriptomics data analysis methods, this methodology builds on traditional implementations of PCA to better contextualize the complex genetic and phenotypic heterogeneity linked to sex differences in ASD and pave the way for tailored interventions.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. 
Pacific Symposium on Biocomputing","volume":"30 ","pages":"614-630"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12262183/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LLM-CGM: A Benchmark for Large Language Model-Enabled Querying of Continuous Glucose Monitoring Data for Conversational Diabetes Management.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0007
Elizabeth Healey, Isaac Kohane

Over the past decade, wearable technology has dramatically changed how patients manage chronic diseases. The widespread availability of on-body sensors, such as heart rate monitors and continuous glucose monitoring (CGM) sensors, has allowed patients to have real-time data about their health. Most of these data are readily available on patients' smartphone applications, where patients can view their current and retrospective data. For patients with diabetes, CGM has transformed how their disease is managed. Many sensor devices interface with smartphones to display charts, metrics, and alerts. However, these metrics and plots may be challenging for some patients to interpret. In this work, we explore how large language models (LLMs) can be used to answer questions about CGM data. We produce an open-source benchmark of time-series question-answering tasks for CGM data in diabetes management. We evaluate different LLM frameworks to provide a performance benchmark. Lastly, we highlight the need for more research on how to optimize LLM frameworks to best handle questions about wearable data. Our benchmark is publicly available for future use and development. While this benchmark is specifically designed for diabetes care, our model implementation and several of the statistical tasks can be extended to other wearable device domains.
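
To make the idea of a time-series question-answering task concrete, here is a minimal sketch of how one such item and its ground-truth answer might be computed. Everything in it is an assumption for illustration: the readings are synthetic, the 5-minute sampling interval, the mg/dL units, the 180 mg/dL threshold, the question wording, and the helper names `mean_glucose` and `fraction_above` are hypothetical and are not taken from the LLM-CGM benchmark.

```python
from datetime import datetime, timedelta
from statistics import mean

# Synthetic glucose readings in mg/dL, one every 5 minutes.
# These values are invented for illustration, NOT benchmark data.
start = datetime(2025, 1, 1, 8, 0)
values = [110, 118, 131, 150, 172, 185, 190, 176, 160, 142, 128, 115]
readings = [(start + timedelta(minutes=5 * i), v) for i, v in enumerate(values)]


def mean_glucose(readings, t0, t1):
    """Mean of all readings whose timestamp t satisfies t0 <= t < t1."""
    window = [v for t, v in readings if t0 <= t < t1]
    if not window:
        raise ValueError("no readings in the requested window")
    return mean(window)


def fraction_above(readings, threshold):
    """Fraction of readings strictly above the given threshold."""
    return sum(v > threshold for _, v in readings) / len(readings)


# A hypothetical question paired with a programmatically computed answer.
question = "What was my average glucose between 08:00 and 08:30?"
answer = mean_glucose(readings, start, start + timedelta(minutes=30))

# Share of readings above an illustrative threshold of 180 mg/dL.
share_high = fraction_above(readings, 180)
```

Computing the reference answer in code rather than by hand is one way a benchmark of this kind could score a model's free-text reply against an exact numeric target.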

Pacific Symposium on Biocomputing, vol. 30, pp. 82-93. Journal Article.
Citations: 0
One-Versus-Others Attention: Scalable Multimodal Integration for Biomedical Data.
Q2 Computer Science Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0041
Michal Golovanevsky, Eva Schiller, Akira Nair, Eric Han, Ritambhara Singh, Carsten Eickhoff

Multimodal models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to disease diagnosis. Despite the importance of multimodal learning, existing efforts focus on vision-language applications, where the number of modalities rarely exceeds four (images, text, audio, video). However, data in the healthcare domain may include many more modalities, such as X-rays, PET scans, MRIs, genetic screening, genomic data, and clinical notes, creating a need for both efficient and accurate data integration. Many state-of-the-art multimodal models rely on cross-attention or self-attention for effective data integration, which do not scale well for applications with more than two modalities. The complexity per layer of computing attention in either paradigm is, at best, quadratic with respect to the number of modalities, posing a computational bottleneck that impedes broad adoption. To address this, we propose a new attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities, thus offering a significant reduction in computational complexity compared to existing multimodal attention methods. Using three clinical datasets with multiple diverse modalities, we show that our method decreases computation costs while maintaining or increasing performance compared to popular integration techniques. Across all clinical datasets, OvO reduced the number of required floating point operations (FLOPs) by at least 91.98%, demonstrating its significant impact on efficiency and enabling multimodal predictions in healthcare.
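
The quadratic-versus-linear scaling claim can be illustrated with a small counting sketch. It rests on a simplified, assumed cost model that the abstract does not spell out: pairwise cross-attention is taken to need one attention operation per ordered pair of distinct modalities, while a one-versus-others scheme is taken to need one operation per modality. The function names are hypothetical, and this is not the authors' implementation or FLOP accounting.

```python
def pairwise_cross_attention_ops(n_modalities: int) -> int:
    """Assumed cost model: one attention operation per ordered pair
    (modality i attends to modality j, i != j). Grows as n * (n - 1)."""
    return n_modalities * (n_modalities - 1)


def one_versus_others_ops(n_modalities: int) -> int:
    """Assumed cost model: one attention operation per modality,
    each taken against the remaining modalities as a group. Grows as n."""
    return n_modalities


# Compare how the two counts grow as modalities are added.
for n in (2, 4, 6, 8):
    print(n, pairwise_cross_attention_ops(n), one_versus_others_ops(n))
```

Under this model the two counts coincide at two modalities and diverge quickly afterwards, which matches the abstract's point that the bottleneck appears once more than two modalities are involved.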

Pacific Symposium on Biocomputing, vol. 30, pp. 580-593. Journal Article.
Citations: 0