Pacific Symposium on Biocomputing: Latest Publications (Page 9)

nSEA: n-Node Subnetwork Enumeration Algorithm Identifies Lower Grade Glioma Subtypes with Altered Subnetworks and Distinct Prognostics.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2024-01-01

Zhihan Zhang, Christiana Wang, Ziyin Zhao, Ziyue Yi, Arda Durmaz, Jennifer S Yu, Gurkan Bebek

Advances in molecular characterization have reshaped our understanding of low-grade glioma (LGG) subtypes, emphasizing the need for comprehensive classification beyond histology. Leveraging these advances, we present a novel approach, network-based Subnetwork Enumeration and Analysis (nSEA), to identify distinct LGG patient groups based on dysregulated molecular pathways. Using gene expression profiles from 516 patients and a protein-protein interaction network, we generated 25 million subnetworks. Through our unsupervised bottom-up approach, we selected 92 subnetworks that categorized LGG patients into five groups. Notably, a previously unidentified LGG patient subgroup lacking mutations in EGFR, NF1, and PTEN emerged, with unique clinical features and subnetwork states. Validation of the patient groups on an independent dataset demonstrated the robustness of our approach and revealed consistent survival traits across different patient populations. This study offers a comprehensive molecular classification of LGG, providing insights beyond traditional genetic markers. By integrating network analysis with patient clustering, we unveil a previously overlooked patient subgroup with potential implications for prognosis and treatment strategies. Our approach sheds light on the synergistic nature of driver genes and highlights the biological relevance of the identified subnetworks. With broad implications for glioma research, our findings pave the way for further investigations into the mechanistic underpinnings of LGG subtypes and their clinical relevance.

Availability: Source code and supplementary data are available at https://github.com/bebeklab/nSEA.
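The abstract does not describe how the subnetworks are enumerated. As a rough illustration of what enumerating n-node subnetworks of an interaction network involves, the sketch below lists every connected node set of exactly n nodes using an ESU-style extension search. It is a generic technique chosen for illustration, not the nSEA implementation; the function name `enumerate_connected_subsets` and the toy graph are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def enumerate_connected_subsets(adj: Dict[str, Set[str]], n: int) -> List[FrozenSet[str]]:
    """List every connected node set of exactly n nodes (ESU-style search).

    `adj` must be a symmetric adjacency map. Each set is grown only from its
    smallest node (the root), and new candidates must exceed the root and
    must not already neighbour the current set, so nothing is emitted twice.
    """
    results: List[FrozenSet[str]] = []

    def extend(sub: Set[str], ext: Set[str], root: str) -> None:
        if len(sub) == n:
            results.append(frozenset(sub))
            return
        # nodes already inside or adjacent to the current subnetwork
        closed = set(sub)
        for x in sub:
            closed |= adj[x]
        ext = set(ext)
        while ext:
            w = ext.pop()
            # exclusive neighbours of w: larger than root and not yet reachable
            exclusive = {u for u in adj[w] if u > root and u not in closed}
            extend(sub | {w}, ext | exclusive, root)

    for v in sorted(adj):
        extend({v}, {u for u in adj[v] if u > v}, v)
    return results


# Hypothetical toy network (a path A-B-C-D), not real protein interactions.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(sorted(sorted(s) for s in enumerate_connected_subsets(toy, 3)))
# [['A', 'B', 'C'], ['B', 'C', 'D']]
```

The number of connected subsets grows combinatorially with n and with network density, which is why exhaustive enumeration over a genome-scale network can yield candidate sets in the millions before any selection step.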

Volume 29, pages 521-533

Citation count: 0

Expanding the access of wearable silicone wristbands in community-engaged research through best practices in data analysis and integration.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2024-01-01

Lisa M Bramer, Holly M Dixon, David J Degnan, Diana Rohlman, Julie B Herbstman, Kim A Anderson, Katrina M Waters

Wearable silicone wristbands are a rapidly growing exposure assessment technology that offers researchers the ability to study previously inaccessible cohorts and has the potential to provide a more comprehensive picture of chemical exposure within diverse communities. However, there are no established best practices for analyzing these data within a study or across multiple studies, which limits their impact and their availability for larger meta-analyses. We use data from three studies, comprising over 600 wristbands worn by participants in New York City and Eugene, Oregon, to present a first-of-its-kind manuscript detailing the properties of wristband data. We further discuss, with concrete examples, key areas and considerations in common statistical modeling methods where best practices must be established to enable meta-analyses and the integration of data from multiple studies. Finally, we detail important and challenging aspects of machine learning, meta-analysis, and data integration that researchers will face when extending beyond the limited scope of individual studies focused on specific populations.
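Because the abstract frames cross-study integration in terms of meta-analysis, a minimal sketch of one standard pooling technique, fixed-effect inverse-variance weighting, may make the idea concrete. It is a generic textbook method, not the authors' recommended practice; the function name and the per-study estimates below are hypothetical, not results from the three wristband studies.

```python
import math
from typing import List, Tuple


def fixed_effect_pool(estimates: List[float], std_errors: List[float]) -> Tuple[float, float]:
    """Inverse-variance (fixed-effect) pooling of per-study estimates.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns (pooled estimate, pooled standard error).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical per-study effects (e.g. log-concentration differences), not real data.
est, se = fixed_effect_pool([0.2, 0.4], [0.1, 0.2])
print(round(est, 3), round(se, 4))  # 0.24 0.0894
```

A fixed-effect model assumes all studies estimate one common effect. When cohorts differ substantially, a random-effects model (for example, DerSimonian-Laird) is the usual alternative; which model suits wristband data is left open here.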

Volume 29, pages 170-186
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10766083/pdf/

Citation count: 0

PopGenAdapt: Semi-Supervised Domain Adaptation for Genotype-to-Phenotype Prediction in Underrepresented Populations.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2024-01-01

Marçal Comajoan Cara, Daniel Mas Montserrat, Alexander G Ioannidis

The lack of diversity in genomic datasets, which are currently skewed towards individuals of European ancestry, presents a challenge for developing inclusive biomedical models. This scarcity is particularly evident in labeled datasets that link genomic data to electronic health records. To address this gap, this paper presents PopGenAdapt, a genotype-to-phenotype prediction model that adopts semi-supervised domain adaptation (SSDA) techniques originally proposed for computer vision. PopGenAdapt is designed to leverage the substantial labeled data available from individuals of European ancestry, as well as the limited labeled data and the larger amount of unlabeled data from currently underrepresented populations. The method is evaluated in underrepresented populations from Nigeria, Sri Lanka, and Hawaii for the prediction of several disease outcomes. The results suggest a significant improvement in the performance of genotype-to-phenotype models for these populations over state-of-the-art supervised learning methods, establishing SSDA as a promising strategy for creating more inclusive machine learning models in biomedical research.

Our code is available at https://github.com/AI-sandbox/PopGenAdapt.
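The abstract does not detail which SSDA techniques PopGenAdapt adapts. To illustrate only the data setting it describes (plentiful labeled source data, a few labeled target samples, and many unlabeled target samples), the sketch below uses plain self-training with pseudo-labels on a pure-Python logistic regression. This is a deliberately simple substitute technique, not PopGenAdapt's method; the function names, the 0.9 confidence threshold, and the two-feature toy data (standing in for genotypes) are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Vector, x: Sequence[float]) -> float:
    """P(y = 1 | x); zip stops at len(x), so the last entry of w is the intercept."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def fit_logreg(X: List[Vector], y: Vector, lr: float = 0.1, epochs: int = 200) -> Vector:
    """Binary logistic regression fitted by full-batch gradient descent."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, t in zip(X, y):
            err = predict_proba(w, x) - t
            for j in range(d):
                grad[j] += err * x[j]
            grad[d] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def self_train(X_src, y_src, X_tgt_lab, y_tgt_lab, X_tgt_unlab,
               threshold: float = 0.9, rounds: int = 3) -> Vector:
    """Fit on labeled source + labeled target, then refit with confident
    pseudo-labels for the unlabeled target samples (re-selected each round)."""
    X_lab, y_lab = X_src + X_tgt_lab, y_src + y_tgt_lab
    w = fit_logreg(X_lab, y_lab)
    for _ in range(rounds):
        pseudo = []
        for x in X_tgt_unlab:
            p = predict_proba(w, x)
            if p >= threshold or p <= 1 - threshold:
                pseudo.append((x, 1.0 if p >= 0.5 else 0.0))
        if not pseudo:
            break
        w = fit_logreg(X_lab + [x for x, _ in pseudo], y_lab + [t for _, t in pseudo])
    return w


def make_cohort(rng: random.Random, n: int, shift: Tuple[float, float]):
    """Hypothetical toy data: class means at (-3,-3) and (3,3), moved by `shift`."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        c = 3.0 if label == 1 else -3.0
        X.append([c + shift[0] + rng.gauss(0, 1), c + shift[1] + rng.gauss(0, 1)])
        y.append(float(label))
    return X, y


rng = random.Random(0)
X_s, y_s = make_cohort(rng, 200, (0.0, 0.0))   # large labeled "source" cohort
X_tl, y_tl = make_cohort(rng, 10, (1.0, 0.0))  # few labeled "target" samples
X_tu, _ = make_cohort(rng, 100, (1.0, 0.0))    # unlabeled, shifted target samples
w = self_train(X_s, y_s, X_tl, y_tl, X_tu)
```

Self-training is one of the simplest ways to let unlabeled target data shape the decision boundary; the computer-vision SSDA methods the abstract refers to typically add explicit alignment or entropy-based objectives on top of this idea.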

Volume 29, pages 327-340
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10906137/pdf/

Citation count: 0

Systematic Estimation of Treatment Effect on Hospitalization Risk as a Drug Repurposing Screening Method.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2024-01-01

Costa Georgantas, Jaume Banus, Roger Hullin, Jonas Richiardi

Drug repurposing (DR) aims to identify new uses for approved medications outside their original indication. Computational methods for finding DR candidates usually rely on prior biological and chemical information about a specific drug or target but rarely use real-world observations. In this work, we propose a simple and effective systematic screening approach that measures the impact of medications on hospitalization risk using large-scale observational data. We use common classification systems to group drugs and diseases into broader functional categories and test for non-zero effects in each drug-disease category pair. Treatment effects on the hospitalization risk for an individual disease are obtained by combining widely used methods for causal inference and time-to-event modelling. A total of 6,468 drug-disease pairs were tested using data from the UK Biobank, focusing on cardiovascular, metabolic, and respiratory diseases. We determined key parameters that reduce the number of spurious correlations and, after correcting for multiple testing, identified 7 statistically significant associations with reduced hospitalization risk. Some of these associations have already been reported in other studies, and the findings include new potential applications for cardioselective beta-blockers and thiazides. We also found evidence for proton pump inhibitor side effects and multiple possible associations for anti-diabetic drugs. Our work demonstrates the applicability of the present screening approach and the utility of real-world data for identifying potential DR candidates.
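The abstract reports 7 associations surviving correction for multiple testing across 6,468 drug-disease pairs but does not name the correction procedure. The sketch below shows two common choices, Bonferroni (family-wise error rate) and Benjamini-Hochberg (false discovery rate), to show how strongly the number of tests raises the evidence bar. Which procedure the authors used is not assumed here, and the p-values in the example are hypothetical.

```python
from typing import List


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Step-up Benjamini-Hochberg procedure (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values, even if an earlier one missed its own cutoff.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# With 6,468 tests, the Bonferroni cutoff at alpha = 0.05 is about 7.7e-6.
print(0.05 / 6468)

# Hypothetical p-values for four drug-disease pairs:
pvals = [0.01, 0.04, 0.03, 0.02]
print(bonferroni(pvals))          # [True, False, False, False]
print(benjamini_hochberg(pvals))  # [True, True, True, True]
```

Bonferroni is simpler but much stricter; Benjamini-Hochberg tolerates a controlled fraction of false discoveries, which is often preferred for screening.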

Volume 29, pages 232-246

Citation count: 0

VetLLM: Large Language Model for Predicting Diagnosis from Veterinary Notes.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2024-01-01

Yixing Jiang, Jeremy A Irvin, Andrew Y Ng, James Zou

Lack of diagnosis coding is a barrier to leveraging veterinary notes for medical and public health research. Previous work has been limited to developing specialized rule-based or customized supervised learning models to predict diagnosis codes, which is tedious and not easily transferable. In this work, we show that open-source large language models (LLMs) pretrained on general corpora can achieve reasonable performance in a zero-shot setting. Alpaca-7B achieves a zero-shot F1 of 0.538 on the CSU test data and 0.389 on the PP test data, two standard benchmarks for coding from veterinary notes. Furthermore, with appropriate fine-tuning, the performance of LLMs can be substantially boosted, exceeding that of strong state-of-the-art supervised models. VetLLM, which is fine-tuned from Alpaca-7B using just 5,000 veterinary notes, achieves an F1 of 0.747 on the CSU test data and 0.637 on the PP test data. Notably, our fine-tuning is data-efficient: fine-tuning on 200 notes can outperform supervised models trained on more than 100,000 notes. These findings demonstrate the great potential of LLMs for language processing tasks in medicine, and we advocate this new paradigm for processing clinical text.
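Diagnosis coding from a note is a multi-label task, so an F1 score has to be aggregated across codes and notes. The abstract does not state which averaging its F1 values use; the sketch below assumes micro-averaging (pooling true positives, false positives, and false negatives over all notes before computing F1) purely for illustration, and the code names are hypothetical.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all notes, then compute F1 once."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred need one code set per note")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # codes predicted and correct
        fp += len(p - g)  # codes predicted but not in the gold standard
        fn += len(g - p)  # gold codes the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical code sets for two notes:
gold = [{"neoplasm", "infectious"}, {"digestive"}]
pred = [{"neoplasm"}, {"digestive", "skin"}]
print(round(micro_f1(gold, pred), 3))  # 0.667
```

Macro-averaging (computing F1 per code, then averaging) weights rare codes more heavily and can give noticeably different numbers on the same predictions.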

Volume 29, pages 120-133

Citation count: 0

KombOver: Efficient k-core and K-truss based characterization of perturbations within the human gut microbiome.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2024-01-01

Nicolae Sapoval, Marko Tanevski, Todd J Treangen

The microbes present in the human gastrointestinal tract are regularly linked to human health and disease outcomes. Thanks to technological and methodological advances in recent years, metagenomic sequencing data and computational methods designed to analyze them have improved our understanding of the link between the human gut microbiome and disease. However, while numerous methods have recently been developed to extract quantitative and qualitative results from host-associated microbiome data, improved computational tools are still needed to track microbiome dynamics with short-read sequencing data. We previously proposed KOMB as a de novo tool for identifying copy number variations in metagenomes to characterize microbial genome dynamics in response to perturbations. In this work, we present KombOver (KO), which makes four key contributions relative to our previous work: (i) it scales to large microbiome study cohorts; (ii) it includes both k-core and K-truss based analysis; (iii) it provides the foundation for a theoretical understanding of the relationship between various graph-based metagenome representations; and (iv) it offers an improved user experience with easier-to-run code and more descriptive outputs and results. To highlight these benefits, we applied KO to nearly 1,000 human microbiome samples, requiring less than 10 minutes and 10 GB of RAM per sample. Furthermore, we show how graph-based approaches such as k-core and K-truss can help pinpoint microbial community dynamics within a myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) cohort. KO is open source and available for download at https://github.com/treangenlab/komb.
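For readers unfamiliar with the two graph decompositions named above, the sketch below computes them on a plain adjacency-set graph: the k-core by repeatedly peeling nodes with degree below k, and the K-truss by repeatedly removing edges that lie in fewer than K - 2 triangles (the common convention in which every edge of a K-truss is supported by at least K - 2 triangles). It is a generic illustration on a hypothetical toy graph, not KombOver's implementation or its metagenome graph construction.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def from_edges(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build a symmetric adjacency-set graph from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def k_core(g: Graph, k: int) -> Graph:
    """Maximal subgraph in which every node has degree >= k (peeling)."""
    h = {u: set(vs) for u, vs in g.items()}
    queue = [u for u, vs in h.items() if len(vs) < k]
    while queue:
        u = queue.pop()
        if u not in h:  # already peeled
            continue
        for v in h.pop(u):
            h[v].discard(u)
            if len(h[v]) < k:
                queue.append(v)
    return h


def k_truss(g: Graph, k: int) -> Graph:
    """Maximal subgraph in which every edge lies in >= k - 2 triangles."""
    h = {u: set(vs) for u, vs in g.items()}
    changed = True
    while changed:
        changed = False
        for u in list(h):
            for v in list(h[u]):
                # support = number of triangles through edge (u, v)
                if u < v and len(h[u] & h[v]) < k - 2:
                    h[u].discard(v)
                    h[v].discard(u)
                    changed = True
    return {u: vs for u, vs in h.items() if vs}  # drop isolated nodes


# Hypothetical toy graph: a 4-clique {0,1,2,3}, node 4 joined to 0 and 1, node 5 hanging off 4.
g = from_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4), (1, 4), (4, 5)])
print(sorted(k_core(g, 3)))   # [0, 1, 2, 3]
print(sorted(k_truss(g, 3)))  # [0, 1, 2, 3, 4]
```

The k-core keeps node 4 out at k = 3 because it has only two neighbours left once node 5 is peeled, while the 3-truss keeps it because edges 0-4 and 1-4 each sit in a triangle. This difference in granularity (node degree versus edge support) is the kind of distinction the two analyses can expose.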

Volume 29, pages 506-520
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10764071/pdf/

Citation count: 0

Low- and high-level information analyses of transcriptome connecting endometrial-decidua-placental origin of preeclampsia subtypes: A preliminary study.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2024-01-01

Herdiantri Sufriyana, Yu-Wei Wu, Emily Chia-Yu Su

Background: Existing proposed models of preeclampsia (PE) pathogenesis were applied only to the early-onset subtype and did not consider pre-pregnancy and competing risks. We aimed to decipher PE subtypes by identifying related transcriptomes that represent endometrial maturation and histologic chorioamnionitis.

Methods: We used eight mRNA expression arrays for discovery (n=289) and another eight arrays for validation (n=352). We assessed the overlap of differentially expressed genes (DEGs) between: (1) healthy samples from the endometrium, decidua, and placenta, and placenta samples with histologic chorioamnionitis; and (2) placenta samples for each of the subtypes. The subtypes comprised all possible combinations along four axes: (1) pregnancy-induced hypertension; (2) placental dysfunction-related diseases (e.g., fetal growth restriction [FGR]); (3) onset; and (4) severity.

Results: The DEGs of late-secretory-phase endometrium, but not those of the decidua, significantly overlapped with those of any subtype with: (1) early onset (p-values ≤0.008); (2) severe hypertension and proteinuria (p-values ≤0.042); or (3) chronic hypertension and/or severe PE with FGR (p-values ≤0.042). Although placenta under chorioamnionitis and late-secretory endometrium significantly overlapped with the DEGs of the same subtypes, gene regulation was mostly counter-expressed in placenta under chorioamnionitis (n=13/18, 72.22%; odds ratio [OR] upper bounds ≤0.21) but co-expressed in late-secretory endometrium (n=3/9, 66.67%; OR lower bounds ≥1.17). Neither the first- nor the second-trimester placental DEGs under normotensive pregnancy significantly overlapped with those of late-onset, severe PE without FGR.

Conclusions: We identified the transcriptome of endometrial maturation in placental dysfunction that distinguishes early- from late-onset PE, and found evidence that chorioamnionitis is a competing risk for PE. This study suggests that it is feasible to develop and validate pathogenesis models that include pre-pregnancy and competing risks, which could help decide whether prospective PE data, including chorioamnionitis information, should be collected starting from pre-pregnancy.
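The Results report p-values for whether two DEG lists overlap more than expected by chance. The abstract does not name the test used; a common choice for this question is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2x2 overlap table), sketched below with Python's standard library. The function name and the gene-universe and list sizes in the example are hypothetical.

```python
from math import comb


def overlap_pvalue(universe: int, size_a: int, size_b: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    X is the overlap between a fixed list of size_a genes and size_b genes drawn
    at random (without replacement) from `universe` genes.
    """
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, x) * comb(universe - size_a, size_b - x)
        for x in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total  # exact integer arithmetic until this final division


# Hypothetical sizes: 20,000 measured genes, DEG lists of 500 and 400 genes sharing 30.
expected = 500 * 400 / 20000  # overlap expected by chance: 10 genes
print(expected, overlap_pvalue(20000, 500, 400, 30))
```

Overlap significance alone does not say whether shared genes move in the same direction; the co- versus counter-expression comparison in the Results requires a separate test on the direction of regulation.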

Volume 29, pages 549-563

Citations: 0

MaTiLDA: An Integrated Machine Learning and Topological Data Analysis Platform for Brain Network Dynamics.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date : 2024-01-01

Katrina Prantzalos, Dipak Upadhyaya, Nassim Shafiabadi, Guadalupe Fernandez-Baca Vaca, Nick Gurski, Kenneth Yoshimoto, Subhashini Sivagnanam, Amitava Majumdar, Satya S Sahoo

Topological data analysis (TDA) combined with machine learning (ML) algorithms is a powerful approach for investigating complex brain interaction patterns in neurological disorders such as epilepsy. However, the use of ML algorithms and TDA for analysis of aberrant brain interactions requires substantial domain knowledge in computing as well as pure mathematics. To lower the threshold for clinical and computational neuroscience researchers to effectively use ML algorithms together with TDA to study neurological disorders, we introduce an integrated web platform called MaTiLDA. MaTiLDA is the first tool that enables users to intuitively use TDA methods together with ML models to characterize interaction patterns derived from neurophysiological signal data such as electroencephalogram (EEG) recorded during routine clinical practice. MaTiLDA features support for TDA methods, such as persistent homology, that enable classification of signal data using ML models to provide insights into complex brain interaction patterns in neurological disorders. We demonstrate the practical use of MaTiLDA by analyzing high-resolution intracranial EEG from refractory epilepsy patients to characterize the distinct phases of seizure propagation to different brain regions. The MaTiLDA platform is available at: https://bmhinformatics.case.edu/nicworkflow/MaTiLDA.
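
Persistent homology, mentioned above, tracks how connected components (and higher-dimensional features) of a channel network appear and disappear as a distance threshold grows. The Python sketch below computes only the 0-dimensional part for a few signals, using a hypothetical channel distance of 1 - |Pearson r|; it is not MaTiLDA's implementation, which this listing does not describe. In dimension 0 every component is born at 0 and dies at the edge length that merges it into another, so the finite death times are exactly the edge weights of a minimum spanning tree, found here with Kruskal's algorithm.

```python
from itertools import combinations
from math import inf, sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def h0_persistence(signals: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """0-dimensional (birth, death) pairs of a Rips filtration on channels.

    Assumed distance: 1 - |r|; not taken from MaTiLDA. Kruskal's algorithm
    with union-find merges components in order of distance; each merge ends one bar.
    """
    n = len(signals)
    edges = sorted(
        (1.0 - abs(pearson(signals[i], signals[j])), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(u: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    diagram = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge joins two components, so one of them dies here
            parent[ri] = rj
            diagram.append((0.0, dist))
    diagram.append((0.0, inf))  # the last surviving component never dies
    return diagram


# Three toy "channels": two perfectly correlated, one alternating.
channels = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
for birth, death in h0_persistence(channels):
    print(birth, death)
```

A constant signal has zero variance and makes the correlation undefined, so real EEG epochs would need that case handled. Higher homology dimensions (loops, voids) require building the full Rips complex, which libraries such as GUDHI or Ripser provide.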

Pacific Symposium on Biocomputing, vol. 29, pp. 65-80.

Citations: 0

Zoish: A Novel Feature Selection Approach Leveraging Shapley Additive Values for Machine Learning Applications in Healthcare.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date : 2024-01-01

Hossein Javedani Sadaei, Salvatore Loguercio, Mahdi Shafiei Neyestanak, Ali Torkamani, Daria Prilutsky

In the intricate landscape of healthcare analytics, effective feature selection is a prerequisite for generating robust predictive models, especially given the common challenges posed by sample size and potential bias. Zoish uniquely addresses these issues by employing Shapley additive values, an idea rooted in cooperative game theory, to enable both transparent and automated feature selection. Unlike existing tools, Zoish is versatile and designed to integrate seamlessly with an array of machine learning libraries, including scikit-learn, XGBoost, CatBoost, and imbalanced-learn. The distinct advantage of Zoish lies in its dual algorithmic approach for calculating Shapley values, which allows it to handle both large and small datasets efficiently. This adaptability makes it exceptionally suitable for a wide spectrum of healthcare-related tasks. The tool also places a strong emphasis on interpretability, providing comprehensive visualizations of the analyzed features. Its customizable settings give users fine-grained control over feature selection, so it can be tuned to specific predictive objectives. This manuscript elucidates the mathematical framework underpinning Zoish and how it uniquely combines local and global feature selection into a single, streamlined process. To validate Zoish's efficiency and adaptability, we present case studies in breast cancer prediction and Montreal Cognitive Assessment (MoCA) prediction in Parkinson's disease, along with evaluations on 300 synthetic datasets. These applications underscore Zoish's unparalleled performance in diverse healthcare contexts and against its counterparts.
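
Zoish ranks features by their Shapley values. The Python sketch below computes exact Shapley values for a hypothetical subset-scoring function `toy_value` (in practice the score would come from a model, for example its performance or prediction on a feature subset) and keeps the top-k features by absolute value. It is not Zoish's API, and the abstract does not detail its "dual algorithmic approach"; exact enumeration needs on the order of 2^n evaluations per feature, which is why practical tools rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial
from typing import AbstractSet, Callable, Dict, List, Sequence


def shapley_values(
    features: Sequence[str], value: Callable[[AbstractSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values (brute-force illustration, not Zoish's API):

    phi_i = sum over S subset of N without i of
            |S|! (n - |S| - 1)! / n! * (v(S with i) - v(S))
    """
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def select_top_k(phi: Dict[str, float], k: int) -> List[str]:
    """Keep the k features with the largest absolute Shapley value."""
    return sorted(phi, key=lambda f: abs(phi[f]), reverse=True)[:k]


def toy_value(s: AbstractSet[str]) -> float:
    """Hypothetical subset score: additive terms plus an age-glucose interaction."""
    v = 0.0
    if "age" in s:
        v += 0.3
    if "bmi" in s:
        v += 0.05
    if "age" in s and "glucose" in s:
        v += 0.2
    return v


phi = shapley_values(["age", "bmi", "glucose"], toy_value)
print(select_top_k(phi, 2))  # ['age', 'glucose']
```

The 0.2 interaction is split evenly between age and glucose, giving values of 0.4, 0.1, and 0.05 for age, glucose, and bmi; these sum to the full-set score of 0.55, as the Shapley efficiency property requires.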

Pacific Symposium on Biocomputing, vol. 29, pp. 81-95. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10764073/pdf/

Citations: 0

Session Introduction: Precision Medicine: Innovative methods for advanced understanding of molecular underpinnings of disease.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date : 2024-01-01

Yana Bromberg, Hannah Carter, Steven E Brenner

Precision medicine, also often referred to as personalized medicine, targets the development of treatments and preventive measures specific to an individual's genomic signature, lifestyle, and environmental conditions. The series of Precision Medicine sessions at PSB has continuously highlighted advances in this field. Our 2024 collection of manuscripts showcases algorithmic advances that integrate data from distinct modalities and introduce innovative approaches for extracting new, medically relevant information from existing data. These evolving technologies and analytical methods promise to advance the goals of precision medicine: improving health and increasing lifespan.

Citations: 0