Jiarong Song, Josh Lamstein, Vivek Gopal Ramaswamy, Michelle Webb, Gabriel Zada, Steven Finkbeiner, David W Craig
Spatial transcriptomics (ST) represents a pivotal advancement in biomedical research, enabling the transcriptional profiling of cells within their morphological context and providing a valuable tool for understanding spatial heterogeneity in cancer tissues. However, current analytical approaches, akin to single-cell analysis, largely depend on gene expression, underutilizing the rich morphological information inherent in the tissue. We present a novel method integrating spatial transcriptomics and histopathological image data to better capture biologically meaningful patterns in patient data, focusing on aggressive cancer types such as glioblastoma and triple-negative breast cancer. We used a ResNet-50 deep learning model to extract key morphological features from high-resolution whole-slide histology images. Spot-level PCA-reduced vectors of both the ResNet-50 image features and the spatial gene expression data were used in Louvain clustering to enable image-aware feature discovery. Features from image-aware clustering pinpointed key biological features identified by manual histopathology, such as regions of fibrosis and necrosis, and improved edge definition in EGFR-rich areas. Importantly, our combinatorial approach revealed crucial characteristics seen in histopathology that gene-expression-only analysis had missed. Supplemental Material: https://github.com/davcraig75/song_psb2014/blob/main/SupplementaryData.pdf.
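The fusion step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes the per-spot expression PCs and image PCs are already computed, scales each column, concatenates the two vectors per spot, and builds a k-nearest-neighbor graph of the kind a Louvain implementation from a graph library would take as input. The function names (`zscore_columns`, `fuse`, `knn_edges`) and the choice of z-scoring are assumptions made for the example.

```python
import math

def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def fuse(expr_pcs, image_pcs):
    """Concatenate scaled expression PCs and image PCs for each spot (assumed fusion rule)."""
    e = zscore_columns(expr_pcs)
    i = zscore_columns(image_pcs)
    return [a + b for a, b in zip(e, i)]

def knn_edges(vectors, k):
    """Return sorted undirected edges linking each spot to its k nearest neighbors."""
    edges = set()
    for a, va in enumerate(vectors):
        dists = sorted((math.dist(va, vb), b)
                       for b, vb in enumerate(vectors) if b != a)
        for _, b in dists[:k]:
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)

# Hypothetical toy data: four spots, two expression PCs and one image PC each.
expr_pcs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
image_pcs = [[1.0], [1.1], [9.0], [9.2]]
fused = fuse(expr_pcs, image_pcs)
edges = knn_edges(fused, k=1)
```

Scaling before concatenation keeps one modality from dominating the distance purely because of its units; whether the original work weights the two modalities equally is not stated in the abstract.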
Title: Enhancing Spatial Transcriptomics Analysis by Integrating Image-Aware Deep Learning Methods. Pacific Symposium on Biocomputing. Publication date: 2024-01-01.
Kathleen M Cardone, Scott Dudek, Karl Keat, Yuki Bradford, Zinhle Cindi, Eric S Daar, Roy Gulick, Sharon A Riddler, Jeffrey L Lennox, Phumla Sinxadi, David W Haas, Marylyn D Ritchie
Access to safe and effective antiretroviral therapy (ART) is a cornerstone in the global response to the HIV pandemic. Among people living with HIV, there is considerable interindividual variability in absolute CD4 T-cell recovery following initiation of virally suppressive ART. The contribution of host genetics to this variability is not well understood. We explored the contribution of a polygenic score which was derived from large, publicly available summary statistics for absolute lymphocyte count from individuals in the general population (PGSlymph) due to a lack of publicly available summary statistics for CD4 T-cell count. We explored associations with baseline CD4 T-cell count prior to ART initiation (n=4959) and change from baseline to week 48 on ART (n=3274) among treatment-naïve participants in prospective, randomized ART studies of the AIDS Clinical Trials Group. We separately examined an African-ancestry-derived and a European-ancestry-derived PGSlymph, and evaluated their performance across all participants, and also in the African and European ancestral groups separately. Multivariate models that included PGSlymph, baseline plasma HIV-1 RNA, age, sex, and 15 principal components (PCs) of genetic similarity explained ∼26-27% of variability in baseline CD4 T-cell count, but PGSlymph accounted for <1% of this variability. Models that also included baseline CD4 T-cell count explained ∼7-9% of variability in CD4 T-cell count increase on ART, but PGSlymph accounted for <1% of this variability. In univariate analyses, PGSlymph was not significantly associated with baseline or change in CD4 T-cell count. Among individuals of African ancestry, the African PGSlymph term in the multivariate model was significantly associated with change in CD4 T-cell count while not significant in the univariate model. 
When applied to lymphocyte count in a general medical biobank population (Penn Medicine BioBank), PGSlymph explained ∼6-10% of variability in multivariate models (including age, sex, and PCs) but only ∼1% in univariate models. In summary, a lymphocyte count PGS derived from the general population was not consistently associated with CD4 T-cell recovery on ART. Nonetheless, adjusting for clinical covariates is quite important when estimating such polygenic effects.
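As a rough illustration of the quantities discussed in this abstract, a polygenic score is typically a weighted sum of effect-allele dosages, and the variance a single predictor explains in a univariate model is the squared Pearson correlation. The sketch below assumes that standard form; the variant identifiers and weights are hypothetical, and it does not reproduce the study's covariate-adjusted models.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2) for one person.

    Variants without a weight in the summary statistics are skipped.
    """
    return sum(weights[snp] * d for snp, d in dosages.items() if snp in weights)

def r_squared(x, y):
    """Squared Pearson correlation: share of variance in y explained by x alone."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical per-variant effect sizes (not taken from any real summary statistics).
weights = {"snp_a": 0.5, "snp_b": -0.2}
score = polygenic_score({"snp_a": 2, "snp_b": 1, "snp_c": 1}, weights)
```

Comparing `r_squared` for the score alone against the fit of a model that also includes age, sex, and principal components is what separates the univariate and multivariate figures quoted above.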
Title: Lymphocyte Count Derived Polygenic Score and Interindividual Variability in CD4 T-cell Recovery in Response to Antiretroviral Therapy. Pacific Symposium on Biocomputing. Publication date: 2024-01-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10764076/pdf/
Pub Date: 2023-12-17 DOI: 10.1142/9789811286421_0046
R. Kember, S. Verma, A. Verma, B. Xiao, Anastasia Lucas, Colleen M Kripke, R. Judy, Jinbo Chen, S. Damrauer, D. J. Rader, Marylyn D. Ritchie
Polygenic risk scores (PRS) have predominantly been derived from genome-wide association studies (GWAS) conducted in European ancestry (EUR) individuals. In this study, we present an in-depth evaluation of PRS based on multi-ancestry GWAS for five cardiometabolic phenotypes in the Penn Medicine BioBank (PMBB) followed by a phenome-wide association study (PheWAS). We examine the PRS performance across all individuals and separately in African ancestry (AFR) and EUR ancestry groups. For AFR individuals, PRS derived using the multi-ancestry LD panel showed a higher effect size for four out of five PRSs (DBP, SBP, T2D, and BMI) than those derived from the AFR LD panel. In contrast, for EUR individuals, the multi-ancestry LD panel PRS demonstrated a higher effect size for two out of five PRSs (SBP and T2D) compared to the EUR LD panel. These findings underscore the potential benefits of utilizing a multi-ancestry LD panel for PRS derivation in diverse genetic backgrounds and demonstrate overall robustness in all individuals. Our results also revealed significant associations between PRS and various phenotypic categories. For instance, CAD PRS was linked with 18 phenotypes in AFR and 82 in EUR, while T2D PRS correlated with 84 phenotypes in AFR and 78 in EUR. Notably, associations like hyperlipidemia, renal failure, atrial fibrillation, coronary atherosclerosis, obesity, and hypertension were observed across different PRSs in both AFR and EUR groups, with varying effect sizes and significance levels. However, in AFR individuals, the strength and number of PRS associations with other phenotypes were generally reduced compared to EUR individuals. Our study underscores the need for future research to prioritize 1) conducting GWAS in diverse ancestry groups and 2) creating a cosmopolitan PRS methodology that is universally applicable across all genetic backgrounds. Such advances will foster a more equitable and personalized approach to precision medicine.
Title: Polygenic risk scores for cardiometabolic traits demonstrate importance of ancestry for predictive precision medicine. Pacific Symposium on Biocomputing. Publication date: 2023-12-17. https://doi.org/10.1142/9789811286421_0046
Pub Date: 2023-12-17 DOI: 10.1142/9789811286421_0010
Yixing Jiang, Jeremy Irvin, Andrew Y. Ng, James Zou
Lack of diagnosis coding is a barrier to leveraging veterinary notes for medical and public health research. Previous work has been limited to developing specialized rule-based or customized supervised learning models to predict diagnosis codes, which is tedious and not easily transferable. In this work, we show that open-source large language models (LLMs) pretrained on a general corpus can achieve reasonable performance in a zero-shot setting. Alpaca-7B achieves a zero-shot F1 of 0.538 on CSU test data and 0.389 on PP test data, two standard benchmarks for coding from veterinary notes. Furthermore, with appropriate fine-tuning, the performance of LLMs can be substantially boosted, exceeding that of strong state-of-the-art supervised models. VetLLM, which is fine-tuned from Alpaca-7B using just 5000 veterinary notes, achieves an F1 of 0.747 on CSU test data and 0.637 on PP test data. Notably, our fine-tuning is data-efficient: fine-tuning on 200 notes can outperform supervised models trained with more than 100,000 notes. The findings demonstrate the great potential of leveraging LLMs for language processing tasks in medicine, and we advocate this new paradigm for processing clinical text.
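For readers unfamiliar with how an F1 score is computed when each note can carry several diagnosis codes, here is a minimal sketch of micro-averaged F1 over sets of predicted and true codes. The abstract does not say which averaging scheme the reported scores use, so micro-averaging is an assumption, and the code labels in the example are made up.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over multi-label predictions.

    Each element of true_sets / pred_sets is the set of codes for one note.
    """
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # codes predicted and correct
        fp += len(pred - truth)   # codes predicted but wrong
        fn += len(truth - pred)   # codes missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels for two notes.
truth = [{"dermatitis", "otitis"}, {"fracture"}]
preds = [{"dermatitis"}, {"fracture", "arthritis"}]
score = micro_f1(truth, preds)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.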
Title: VetLLM: Large Language Model for Predicting Diagnosis from Veterinary Notes. Pacific Symposium on Biocomputing. Publication date: 2023-12-17. https://doi.org/10.1142/9789811286421_0010
Pub Date: 2023-12-17 DOI: 10.1142/9789811286421_0022
Chinmaya U. Joisa, Kevin A Chen, Samantha Beville, T. Stuhlmiller, Matthew E. Berginski, Denis O Okumu, B. Golitz, M. East, Gary L Johnson, Shawn M Gomez
Protein kinases are a primary focus in targeted therapy development for cancer, owing to their role as regulators in nearly all areas of cell life. Recent strategies targeting the kinome with combination therapies have shown promise, such as trametinib and dabrafenib in advanced melanoma, but empirical design for less characterized pathways remains a challenge. Computational combination screening is an attractive alternative, allowing in-silico filtering prior to experimental testing of drastically fewer leads, increasing efficiency and effectiveness of drug development pipelines. In this work, we generated combined kinome inhibition states of 40,000 kinase inhibitor combinations from kinobeads-based kinome profiling across 64 doses. We then integrated these with transcriptomics from CCLE to build machine learning models with elastic-net feature selection to predict cell line sensitivity across nine cancer types, with R² of roughly 0.75-0.9. We then validated the model using a PDX-derived TNBC cell line and saw good global accuracy (R² ∼ 0.7) as well as high accuracy in predicting synergy using four popular metrics (R² ∼ 0.9). Additionally, the model was able to predict a highly synergistic combination of trametinib and omipalisib for TNBC treatment, which incidentally was recently in phase I clinical trials. Our choice of tree-based models for greater interpretability allowed interrogation of highly predictive kinases in each cancer type, such as the MAPK, CDK, and STK kinases. Overall, these results suggest that kinome inhibition states of kinase inhibitor combinations are strongly predictive of cell line responses and have great potential for integration into computational drug screening pipelines. This approach may facilitate the identification of effective kinase inhibitor combinations and accelerate the development of novel cancer therapies, ultimately improving patient outcomes.
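The abstract mentions four synergy metrics without naming them. One widely used metric is Bliss independence, sketched below as an illustration only; whether it is among the four used in the paper is an assumption. Effects are expressed as fractional inhibition between 0 and 1, and the numbers are invented.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined fractional inhibition if the two drugs act independently."""
    return effect_a + effect_b - effect_a * effect_b

def bliss_excess(observed, effect_a, effect_b):
    """Observed minus expected effect; positive values suggest synergy."""
    return observed - bliss_expected(effect_a, effect_b)

# Hypothetical single-agent and combination effects at one dose pair.
expected = bliss_expected(0.5, 0.4)
excess = bliss_excess(0.8, 0.5, 0.4)
```

Here the independent expectation is 0.7, so an observed combined inhibition of 0.8 gives an excess of about 0.1.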
Title: Combined kinome inhibition states are predictive of cancer cell line sensitivity to kinase inhibitor combination therapies. Pacific Symposium on Biocomputing. Publication date: 2023-12-17. https://doi.org/10.1142/9789811286421_0022
Pub Date: 2023-12-17 DOI: 10.1142/9789811286421_0017
Yi Yang, Han Xie, Hejie Cui, Carl Yang
Recent advancements in neuroimaging techniques have sparked a growing interest in understanding the complex interactions between anatomical regions of interest (ROIs), forming into brain networks that play a crucial role in various clinical tasks, such as neural pattern discovery and disorder diagnosis. In recent years, graph neural networks (GNNs) have emerged as powerful tools for analyzing network data. However, due to the complexity of data acquisition and regulatory restrictions, brain network studies remain limited in scale and are often confined to local institutions. These limitations greatly challenge GNN models to capture useful neural circuitry patterns and deliver robust downstream performance. As a distributed machine learning paradigm, federated learning (FL) provides a promising solution in addressing resource limitation and privacy concerns, by enabling collaborative learning across local institutions (i.e., clients) without data sharing. While the data heterogeneity issues have been extensively studied in recent FL literature, cross-institutional brain network analysis presents unique data heterogeneity challenges, that is, the inconsistent ROI parcellation systems and varying predictive neural circuitry patterns across local neuroimaging studies. To this end, we propose FedBrain, a GNN-based personalized FL framework that takes into account the unique properties of brain network data. Specifically, we present a federated atlas mapping mechanism to overcome the feature and structure heterogeneity of brain networks arising from different ROI atlas systems, and a clustering approach guided by clinical prior knowledge to address varying predictive neural circuitry patterns regarding different patient groups, neuroimaging modalities and clinical outcomes. 
Compared to existing FL strategies, our approach demonstrates superior and more consistent performance, showcasing its strong potential and generalizability in cross-institutional connectome-based brain imaging analysis. The implementation is available here.
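To make the federated learning setting concrete, the sketch below shows plain federated averaging (FedAvg), the baseline aggregation step in which a server combines client model parameters weighted by local sample counts without seeing any client data. This is a generic illustration under that assumption; FedBrain's atlas mapping and clustering mechanisms are not reproduced here, and the parameter vectors are toy values.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by each client's sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]

# Two hypothetical institutions with 1 and 3 local samples respectively.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

The larger client pulls the average toward its parameters, which is one reason heterogeneous institutions can motivate personalized variants.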
Title: FedBrain: Federated Training of Graph Neural Networks for Connectome-based Brain Imaging Analysis. Pacific Symposium on Biocomputing. Publication date: 2023-12-17. https://doi.org/10.1142/9789811286421_0017
Pub Date: 2023-12-17 DOI: 10.1142/9789811286421_0041
Kiyoshi Ferreira Fukutani, Thomas H. Hampton, Carly A. Bobak, Todd A. MacKenzie, Bruce A. Stanton
The availability of multiple publicly available datasets studying the same phenomenon has the promise of accelerating scientific discovery. Meta-analysis can address issues of reproducibility and often increase power. The promise of meta-analysis is especially germane to rarer diseases like cystic fibrosis (CF), which affects roughly 100,000 people worldwide. A recent search of the National Institutes of Health's Gene Expression Omnibus revealed 1.3 million data sets related to cancer compared to about 2,000 related to CF. These studies are highly diverse, involving different tissues, animal models, treatments, and clinical covariates. In our search for gene expression studies of primary human airway epithelial cells, we identified three studies with compatible methodologies and sufficient metadata: GSE139078, Sala Study, and PRJEB9292. Even so, experimental designs were not identical, and we identified significant batch effects that would have complicated functional analysis. Here we present quantile discretization and Bayesian network construction using the hill-climbing method as a powerful tool to overcome experimental differences and reveal biologically relevant responses to the CF genotype itself and to exposure to viruses, bacteria, and drugs used to treat CF. Functional analysis with clusterProfiler revealed significant alterations in interferon signaling, interferon gamma signaling, interleukin 4 and 13 signaling, interleukin 6 signaling, interleukin 21 signaling, and inactivation of CSF3/G-CSF signaling pathways. These pathways were consistently associated with higher gene expression in CF epithelial cells compared to non-CF cells, suggesting that targeting these pathways could improve clinical outcomes. The success of quantile discretization and Bayesian network analysis in the context of CF suggests that these approaches might be applicable to other contexts where exactly comparable data sets are hard to find.
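Quantile discretization, as named in this abstract, can be sketched as a rank-based binning applied within each data set. The example below is a minimal illustration with an assumed three-bin scheme and invented expression values; the number of bins and tie handling used in the paper are not stated in the abstract.

```python
def quantile_discretize(values, n_bins=3):
    """Assign each value to one of n_bins equal-sized rank bins (0 = lowest).

    Ties are broken by original position because Python's sort is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Hypothetical expression values for one gene across six samples in one study.
study_a = [10, 200, 35, 90, 5, 60]
# The same samples measured on a different, shifted and rescaled platform.
study_b = [v * 3 + 7 for v in study_a]

bins_a = quantile_discretize(study_a)
bins_b = quantile_discretize(study_b)
```

Because only ranks matter, any monotonic shift or rescaling between studies leaves the bins unchanged, which is why this kind of discretization can blunt batch effects before network construction.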
Title: Application of Quantile Discretization and Bayesian Network Analysis to Publicly Available Cystic Fibrosis Data Sets. Journal: Pacific Symposium on Biocomputing. DOI: https://doi.org/10.1142/9789811286421_0041
Pub Date: 2023-12-17 DOI: 10.1142/9789811286421_0011
Milos Vukadinovic, Gauri Renjith, Victoria Yuan, Alan Kwan, Susan C. Cheng, Debiao Li, Shoa L. Clarke, David Ouyang
Recent research has effectively used quantitative traits from imaging to boost the capabilities of genome-wide association studies (GWAS), providing further understanding of disease biology and various traits. However, phenotyping inherently carries measurement error and noise that could influence subsequent genetic analyses. This study focused on left ventricular ejection fraction (LVEF), a vital yet potentially imprecise quantitative measurement, to investigate how imprecision in phenotype measurement affects genetic studies. Several methods of acquiring LVEF, along with simulated measurement noise, were assessed for their effects on downstream genetic analyses. The results showed that introducing just 7.9% measurement noise could eliminate all genetic associations in an LVEF GWAS of almost forty thousand individuals. Moreover, a 1% increase in mean absolute error (MAE) in LVEF reduced GWAS power as much as a 10% reduction in cohort sample size. Therefore, improving phenotyping accuracy is crucial to maximizing the effectiveness of genome-wide association studies.
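The general effect described in this abstract, phenotype measurement noise weakening a genetic association signal, can be sketched with a small simulation. This is a hedged toy example and not the study's analysis: the sample size, allele frequency, effect size, and noise level are hypothetical, the helper names `pearson_r` and `association_t` are introduced here for illustration, and the phenotype is a generic quantitative trait, not real LVEF data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def association_t(genotypes, phenotype):
    """t-statistic for the slope of phenotype regressed on genotype."""
    r = pearson_r(genotypes, phenotype)
    n = len(genotypes)
    return r * math.sqrt((n - 2) / (1 - r * r))


rng = random.Random(0)           # fixed seed so the run is repeatable
n, maf, beta = 20000, 0.3, 0.2   # hypothetical cohort and effect size

# Genotype = number of minor alleles (0, 1, or 2) at a single variant.
genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]

# "True" phenotype, then the same phenotype with added measurement noise.
true_pheno = [beta * g + rng.gauss(0, 1) for g in genotypes]
noisy_pheno = [y + rng.gauss(0, 2) for y in true_pheno]

t_clean = association_t(genotypes, true_pheno)
t_noisy = association_t(genotypes, noisy_pheno)

print(f"t-statistic without measurement noise: {t_clean:.2f}")
print(f"t-statistic with measurement noise:    {t_noisy:.2f}")
```

In this toy setup the added noise inflates the phenotype variance without changing the genetic effect, so the test statistic shrinks. This is the same qualitative mechanism by which noisier phenotyping acts like a smaller cohort, although the specific percentages reported in the abstract come from the authors' data and are not reproduced here.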
Title: Impact of Measurement Noise on Genetic Association Studies of Cardiac Function. Journal: Pacific Symposium on Biocomputing. DOI: https://doi.org/10.1142/9789811286421_0011
Pub Date: 2023-12-17 DOI: 10.1142/9789811286421_0018
Megan M. Shuey, J. Hellwege, Nikhil Khankari, Marijana Vujkovic, Todd L. Edwards
This PSB 2024 session discusses the broad range of biological, computational, and statistical approaches currently being used for therapeutic drug target identification and repurposing of existing treatments. Drug repurposing efforts have the potential to dramatically improve the treatment landscape by more rapidly identifying drug targets and alternative strategies for untreated or poorly managed diseases. The overarching theme for this session is the use and integration of real-world data to identify drug-disease pairs with potential therapeutic use. These drug-disease pairs may be identified through genomic, proteomic, biomarker, and protein interaction analyses, electronic health records, and chemical profiling. Taken together, this session combines novel applications of methods and innovative modeling strategies with diverse real-world data to suggest new pharmaceutical treatments for human diseases.
Title: Session Introduction: Drug-repurposing and discovery in the era of "big" real-world data: how the incorporation of observational data, genetics, and other -omic technologies can move us forward. Journal: Pacific Symposium on Biocomputing. DOI: https://doi.org/10.1142/9789811286421_0018
Pub Date: 2023-12-17 DOI: 10.1142/9789811286421_0049
D. Martschenko, Nicole Martinez-Martin, Meghan Halley
The following sections are included: Workshop Description; Learning Objectives; Presenter Information; About the Workshop Organizers; Presentations; Speaker Presentations.
Title: Practical Approaches to Enhancing Fairness, Social Responsibility and the Inclusion of Diverse Viewpoints in Biomedicine. Journal: Pacific Symposium on Biocomputing. DOI: https://doi.org/10.1142/9789811286421_0049