Lindsay A Guare, Jagyashila Das, Lannawill Caruth, Shefali Setia-Verma
Women's health conditions are influenced by both genetic and environmental factors. Understanding these factors individually and their interactions is crucial for implementing preventative, personalized medicine. However, since genetics and environmental exposures, particularly social determinants of health (SDoH), are correlated with race and ancestry, risk models that do not carefully account for these measures can exacerbate health disparities. We focused on seven women's health disorders in the All of Us Research Program: breast cancer, cervical cancer, endometriosis, ovarian cancer, preeclampsia, uterine cancer, and uterine fibroids. We computed polygenic risk scores (PRSs) from publicly available weights and tested the effect of each PRS on its respective phenotype, as well as any effect of genetic risk on age at diagnosis. We next tested the effects of environmental risk factors (BMI, lifestyle measures, and SDoH) on age at diagnosis. Finally, we examined how environmental exposures modulate genetic risk by fitting logistic regressions stratified by tertile of each environmental variable and comparing the effect sizes of the PRS. Of the twelve sets of weights for the seven conditions, nine were significantly and positively associated with their respective phenotypes. None of the PRSs was associated with age at diagnosis in the time-to-event analyses. The highest environmental risk group tended to be diagnosed earlier than the low- and medium-risk groups. For example, cases of breast cancer, ovarian cancer, uterine cancer, and uterine fibroids in the highest BMI tertile were diagnosed significantly earlier than those in the low and medium BMI tertiles. PRS regression coefficients were often largest in the highest environmental risk groups, suggesting increased susceptibility to genetic risk.
This study's strengths include the diversity of the All of Us study cohort, the consideration of SDoH themes, and the examination of key risk factors and their interrelationships. These elements collectively underscore the importance of integrating genetic and environmental data to develop more precise risk models, enhance personalized medicine, and ultimately reduce health disparities.
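The scoring and stratification steps described in this abstract can be illustrated with a minimal sketch. It assumes a PRS is the usual weighted sum of effect-allele dosages and that tertiles are cut at the empirical one-third and two-thirds quantiles; the variant IDs, weights, and BMI values are hypothetical, and the stratified logistic regression itself is not shown.

```python
from statistics import quantiles


def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2) over variants that have a weight."""
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


def tertile_labels(values):
    """Label each value 0 (low), 1 (medium), or 2 (high) using the two tertile cut points."""
    low_cut, high_cut = quantiles(values, n=3)
    return [0 if v <= low_cut else 1 if v <= high_cut else 2 for v in values]


# Hypothetical published weights and one participant's genotype dosages.
weights = {"rs1": 0.12, "rs2": -0.05, "rs3": 0.30}
participant = {"rs1": 2, "rs2": 1, "rs3": 0}
prs = polygenic_risk_score(participant, weights)

# Hypothetical BMI values; each participant is assigned to a tertile.
bmi = [19.5, 22.0, 24.1, 27.3, 30.8, 35.2]
groups = tertile_labels(bmi)
```

A separate model would then be fitted within each tertile, and the PRS coefficients compared across the three fits.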
{"title":"Social Determinants of Health and Lifestyle Risk Factors Modulate Genetic Susceptibility for Women's Health Outcomes.","authors":"Lindsay A Guare, Jagyashila Das, Lannawill Caruth, Shefali Setia-Verma","doi":"","DOIUrl":"","url":null,"PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"30 ","pages":"296-313"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658798/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexis T Akerele, Jacqueline A Piekos, Jeewoo Kim, Nikhil K Khankari, Jacklyn N Hellwege, Todd L Edwards, Digna R Velez Edwards
Uterine leiomyomata (fibroids, UFs) are common, benign tumors in females, with an estimated prevalence of up to 80%. They are fibrous masses that grow within the myometrium and lead to chronic symptoms such as dysmenorrhea, abnormal uterine bleeding, anemia, severe pelvic pain, and infertility. Hypertension (HTN) is a common risk factor for UFs, though less prevalent in premenopausal individuals. While observational studies have indicated strong associations between UFs and HTN, the biological mechanisms linking the two conditions remain unclear. Understanding the relationship between HTN and UFs is crucial because both lead to substantial comorbidities that adversely impact female health. Identifying the common underlying biological mechanisms can improve treatment strategies for both conditions. To clarify the genetic and causal relationships between UFs and blood pressure (BP), we conducted a bidirectional, two-sample Mendelian randomization (MR) analysis and evaluated the genetic correlations between BP traits and UFs. We used data from a multi-ancestry genome-wide association study (GWAS) meta-analysis of UFs (44,205 cases and 356,552 controls) and data from a cross-ancestry GWAS meta-analysis of BP phenotypes (diastolic BP [DBP], systolic BP [SBP], and pulse pressure [PP]; N=447,758). We evaluated the genetic correlation of BP phenotypes and UFs with linkage disequilibrium score regression (LDSC). LDSC results indicated a positive genetic correlation between DBP and UFs (Rg=0.132, p<5.0×10^-5) and between SBP and UFs (Rg=0.063, p<2.5×10^-2). MR using UFs as the exposure and BP traits as outcomes indicated that UFs increase DBP (odds ratio [OR]=1.20, p<2.7×10^-3). Using BP traits as exposures and UFs as the outcome showed that DBP and SBP increase risk for UFs (OR=1.04, p<2.2×10^-3; OR=1.00, p<4.0×10^-2; respectively).
Our results provide evidence of shared genetic architecture and pleiotropy between HTN and UFs, suggesting common biological pathways driving their etiologies. Based on these findings, DBP appears to be a stronger risk factor for UFs compared to SBP and PP.
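To make the two-sample MR step concrete, here is a minimal sketch of the fixed-effect inverse-variance weighted (IVW) estimator, a commonly used MR estimator. The abstract does not state which estimator was used, so IVW is an assumption here, and the per-variant summary statistics are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error from per-variant summary statistics."""
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical instruments: variant effects on the exposure and on the outcome (log-odds scale).
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.02, 0.04, 0.03]
se_y = [0.01, 0.01, 0.01]

estimate, std_err = ivw_estimate(beta_x, beta_y, se_y)
odds_ratio = math.exp(estimate)  # OR per unit increase in the exposure
```

In a bidirectional design the same calculation is run twice, swapping which trait supplies the instruments and which supplies the outcome effects.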
{"title":"Uterine fibroids show evidence of shared genetic architecture with blood pressure traits.","authors":"Alexis T Akerele, Jacqueline A Piekos, Jeewoo Kim, Nikhil K Khankari, Jacklyn N Hellwege, Todd L Edwards, Digna R Velez Edwards","doi":"","DOIUrl":"","url":null,"PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"30 ","pages":"281-295"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649017/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gokul Srinivasan, Matthew J Davis, Matthew R LeBoeuf, Michael Fatemi, Zarif L Azher, Yunrui Lu, Alos B Diallo, Marietta K Saldias Montivero, Fred W Kolling, Laurent Perrard, Lucas A Salas, Brock C Christensen, Thomas J Palys, Margaret R Karagas, Scott M Palisoul, Gregory J Tsongalis, Louis J Vaickus, Sarah M Preum, Joshua J Levy
The advent of spatial transcriptomics technologies has heralded a renaissance in research to advance our understanding of the spatial cellular and transcriptional heterogeneity within tissues. Spatial transcriptomics allows investigation of the interplay between cells, molecular pathways, and the surrounding tissue architecture and can help elucidate developmental trajectories, disease pathogenesis, and various niches in the tumor microenvironment. Photoaging is the histological and molecular skin damage resulting from chronic/acute sun exposure and is a major risk factor for skin cancer. Spatial transcriptomics technologies hold promise for improving the reliability of evaluating photoaging and developing new therapeutics. Challenges to current methods include limited focus on dermal elastosis variations and reliance on self-reported measures, which can introduce subjectivity and inconsistency. Spatial transcriptomics offers an opportunity to assess photoaging objectively and reproducibly in studies of carcinogenesis and to discern the effectiveness of therapies that intervene in photoaging and prevent cancer. Evaluation of distinct histological architectures using highly multiplexed spatial technologies can identify specific cell lineages that have been understudied due to their location beyond the depth of UV penetration. However, the cost and interpatient variability of state-of-the-art assays such as the 10x Genomics Spatial Transcriptomics assays limit the scope and scale of molecular epidemiologic studies. Here, we investigate the inference of spatial transcriptomics information from routine hematoxylin and eosin-stained (H&E) tissue slides. We employed the Visium CytAssist spatial transcriptomics assay to analyze over 18,000 genes at a 50-micron resolution for four patients from a cohort of 261 skin specimens collected adjacent to surgical resection sites for basal cell and squamous cell keratinocyte tumors.
The spatial transcriptomics data was co-registered with 40x resolution whole slide imaging (WSI) information. We developed machine learning models that achieved a macro-averaged median AUC and F1 score of 0.80 and 0.61 and Spearman coefficient of 0.60 in inferring transcriptomic profiles across the slides, and accurately captured biological pathways across various tissue architectures.
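As an illustration of one of the reported metrics, the sketch below computes a Spearman rank correlation between measured and inferred expression for a single gene across spots. It is a generic standard-library implementation with average ranks for ties, not the authors' evaluation code, and the expression values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measured vs. inferred expression of one gene at four spots.
measured = [5.0, 1.0, 3.0, 2.0]
inferred = [0.9, 0.1, 0.2, 0.4]
rho = spearman(measured, inferred)
```

A summary across genes could then be taken as the median of the per-gene coefficients.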
{"title":"Potential to Enhance Large Scale Molecular Assessments of Skin Photoaging through Virtual Inference of Spatial Transcriptomics from Routine Staining.","authors":"Gokul Srinivasan, Matthew J Davis, Matthew R LeBoeuf, Michael Fatemi, Zarif L Azher, Yunrui Lu, Alos B Diallo, Marietta K Saldias Montivero, Fred W Kolling, Laurent Perrard, Lucas A Salas, Brock C Christensen, Thomas J Palys, Margaret R Karagas, Scott M Palisoul, Gregory J Tsongalis, Louis J Vaickus, Sarah M Preum, Joshua J Levy","doi":"","DOIUrl":"","url":null,"PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"29 ","pages":"477-491"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10813837/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139075197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jacqueline A Piekos, Jeewoo Kim, Jacob M Keaton, Jacklyn N Hellwege, Todd L Edwards, Digna R Velez Edwards
There is a desire in research to move away from the concept of race as a clinical factor because it is a societal construct used as an imprecise proxy for geographic ancestry. In this study, we leverage the biobank from Vanderbilt University Medical Center, BioVU, to investigate relationships between genetic ancestry proportion and the clinical phenome. For all samples in BioVU, we calculated six ancestry proportions based on 1000 Genomes references: eastern African (EAFR), western African (WAFR), northern European (NEUR), southern European (SEUR), eastern Asian (EAS), and southern Asian (SAS). From PheWAS, we found that phecode categories were significantly enriched for neoplasms in EAFR, WAFR, and SEUR, and for pregnancy complications in SEUR, NEUR, SAS, and EAS (p < 0.003). We then selected the phenotypes hypertension (HTN) and atrial fibrillation (AFib) to further investigate their relationships with EAFR, WAFR, SEUR, and NEUR using logistic regression modeling and non-linear restricted cubic spline (RCS) modeling. For EAS and SAS, we chose renal failure (RF) for further modeling. The relationships of HTN and AFib with the ancestries EAFR, WAFR, and SEUR were best fit by the linear model (beta p < 1×10^-4 for all), while the relationships with NEUR were best fit with RCS (HTN ANOVA p = 0.001, AFib ANOVA p < 1×10^-4). For RF, the relationship with SAS was best fit with a linear model (beta p < 1×10^-4), while the RCS model was a better fit for EAS (ANOVA p < 1×10^-4). In this study, we identify relationships between genetic ancestry and phenotypes that are best fit with non-linear modeling techniques. The assumption of linearity is integral to properly fitting a regression model, and there is no way of knowing before modeling whether the relationship is truly linear.
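The restricted cubic spline terms compared against the linear model can be sketched as follows. This uses the widely used truncated-power parameterization in which the curve is constrained to be linear beyond the outermost knots; the abstract does not report the number or placement of knots, so the three knots below are hypothetical.

```python
def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)]: the linear term plus the k-2 non-linear RCS terms."""
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least three knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u > 0, otherwise 0.
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2   # keeps the spline terms on the scale of x
    last_gap = t[-1] - t[-2]
    terms = [x]
    for j in range(k - 2):
        s = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / last_gap
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / last_gap
        )
        terms.append(s / scale)
    return terms


# Hypothetical knots on an ancestry proportion that ranges from 0 to 1.
knots = [0.1, 0.5, 0.9]
row = rcs_basis(0.3, knots)  # design-matrix columns for one individual
```

With these columns in a logistic regression, a joint test of the non-linear terms (for example an ANOVA or likelihood-ratio comparison with the linear-only model) indicates whether the spline fits better than a straight line.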
{"title":"EVALUATING THE RELATIONSHIPS BETWEEN GENETIC ANCESTRY AND THE CLINICAL PHENOME.","authors":"Jacqueline A Piekos, Jeewoo Kim, Jacob M Keaton, Jacklyn N Hellwege, Todd L Edwards, Digna R Velez Edwards","doi":"","DOIUrl":"","url":null,"PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"29 ","pages":"389-403"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10802858/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139075245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bao Hoang, Yijiang Pang, Hiroko H Dodge, Jiayu Zhou
Mild cognitive impairment (MCI) represents the early stage of dementia, including Alzheimer's disease (AD), and is a crucial stage for therapeutic interventions and treatment. Early detection of MCI offers opportunities for early intervention and significantly benefits cohort enrichment for clinical trials. Imaging and in vivo biomarkers in plasma and cerebrospinal fluid have high detection performance, yet their prohibitive cost and intrusiveness demand more affordable and accessible alternatives. Recent advances in digital biomarkers, especially language markers, have shown great potential: variables informative of MCI are derived from linguistic and/or speech data and later used for predictive modeling. A major challenge in modeling language markers comes from the variability in how each person speaks. As the cohort size for language studies is usually small due to extensive data collection efforts, the variability among persons makes language markers hard to generalize to unseen subjects. In this paper, we propose a novel subject harmonization tool to address the issue of distributional differences in language markers across subjects, thus enhancing the generalization performance of machine learning models. Our empirical results show that machine learning models built on our harmonized features have improved prediction performance on unseen data. The source code and experiment scripts are available at https://github.com/illidanlab/subject_harmonization.
{"title":"Subject Harmonization of Digital Biomarkers: Improved Detection of Mild Cognitive Impairment from Language Markers.","authors":"Bao Hoang, Yijiang Pang, Hiroko H Dodge, Jiayu Zhou","doi":"","DOIUrl":"","url":null,"PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"29 ","pages":"187-200"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11017207/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139075250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francisco M De La Vega, Kathleen C Barnes, Keolu Fox, Alexander Ioannidis, Eimear Kenny, Rasika A Mathias, Bogdan Pasaniuc
The following sections are included: Overview; Dealing with the lack of diversity in current research datasets; Development of fair machine learning algorithms; Race, genetic ancestry, and population structure; Conclusion; Acknowledgments.
{"title":"Session Introduction: Overcoming health disparities in precision medicine.","authors":"Francisco M De La Vega, Kathleen C Barnes, Keolu Fox, Alexander Ioannidis, Eimear Kenny, Rasika A Mathias, Bogdan Pasaniuc","doi":"","DOIUrl":"","url":null,"PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"29 ","pages":"322-326"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139075217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kiyoshi Ferreira Fukutani, Thomas H Hampton, Carly A Bobak, Todd A MacKenzie, Bruce A Stanton
The availability of multiple publicly available datasets studying the same phenomenon has the promise of accelerating scientific discovery. Meta-analysis can address issues of reproducibility and often increases power. The promise of meta-analysis is especially germane to rarer diseases like cystic fibrosis (CF), which affects roughly 100,000 people worldwide. A recent search of the National Institutes of Health's Gene Expression Omnibus revealed 1.3 million data sets related to cancer compared to about 2,000 related to CF. These studies are highly diverse, involving different tissues, animal models, treatments, and clinical covariates. In our search for gene expression studies of primary human airway epithelial cells, we identified three studies with compatible methodologies and sufficient metadata: GSE139078, Sala Study, and PRJEB9292. Even so, experimental designs were not identical, and we identified significant batch effects that would have complicated functional analysis. Here we present quantile discretization and Bayesian network construction using the hill-climbing method as a powerful tool to overcome experimental differences and reveal biologically relevant responses to the CF genotype itself and to exposure to viruses, bacteria, and drugs used to treat CF. Functional analysis with clusterProfiler revealed significant alterations in interferon signaling, interferon gamma signaling, interleukin 4 and 13 signaling, interleukin 6 signaling, interleukin 21 signaling, and inactivation of CSF3/G-CSF signaling pathways. These pathways were consistently associated with higher gene expression in CF epithelial cells compared to non-CF cells, suggesting that targeting these pathways could improve clinical outcomes. The success of quantile discretization and Bayesian network analysis in the context of CF suggests that these approaches might be applicable to other contexts where exactly comparable data sets are hard to find.
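The quantile discretization step can be illustrated with a short sketch. It assumes each study's values for a gene are binned against that study's own quantile cut points, which is why a study-wide shift or rescaling leaves the bins unchanged; the abstract does not specify the number of bins or the exact grouping, so three bins and the toy values are assumptions.

```python
from statistics import quantiles


def quantile_discretize(values, n_bins=3):
    """Map each value to a bin index 0..n_bins-1 using the sample's own quantile cut points."""
    cuts = quantiles(values, n=n_bins)
    # The bin index is the number of cut points the value exceeds.
    return [sum(v > c for c in cuts) for v in values]


# Hypothetical expression of one gene in two studies; study B carries a batch shift and rescaling.
study_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
study_b = [10 * v + 100 for v in study_a]

bins_a = quantile_discretize(study_a)
bins_b = quantile_discretize(study_b)
```

Because only rank information within each study survives, the discretized values from different studies can be pooled as categorical inputs to a Bayesian network structure search.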
{"title":"APPLICATION OF QUANTILE DISCRETIZATION AND BAYESIAN NETWORK ANALYSIS TO PUBLICLY AVAILABLE CYSTIC FIBROSIS DATA SETS.","authors":"Kiyoshi Ferreira Fukutani, Thomas H Hampton, Carly A Bobak, Todd A MacKenzie, Bruce A Stanton","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>The availability of multiple publicly-available datasets studying the same phenomenon has the promise of accelerating scientific discovery. Meta-analysis can address issues of reproducibility and often increase power. The promise of meta-analysis is especially germane to rarer diseases like cystic fibrosis (CF), which affects roughly 100,000 people worldwide. A recent search of the National Institute of Health's Gene Expression Omnibus revealed 1.3 million data sets related to cancer compared to about 2,000 related to CF. These studies are highly diverse, involving different tissues, animal models, treatments, and clinical covariates. In our search for gene expression studies of primary human airway epithelial cells, we identified three studies with compatible methodologies and sufficient metadata: GSE139078, Sala Study, and PRJEB9292. Even so, experimental designs were not identical, and we identified significant batch effects that would have complicated functional analysis. Here we present quantile discretization and Bayesian network construction using the Hill climb method as a powerful tool to overcome experimental differences and reveal biologically relevant responses to the CF genotype itself, exposure to virus, bacteria, and drugs used to treat CF. Functional patterns revealed by cluster Profiler included interferon signaling, interferon gamma signaling, interleukins 4 and 13 signaling, interleukin 6 signaling, interleukin 21 signaling, and inactivation of CSF3/G-CSF signaling pathways showing significant alterations. 
These pathways were consistently associated with higher gene expression in CF epithelial cells compared to non-CF cells, suggesting that targeting these pathways could improve clinical outcomes. The success of quantile discretization and Bayesian network analysis in the context of CF suggests that these approaches might be applicable to other contexts where exactly comparable data sets are hard to find.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"29 ","pages":"534-548"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10783867/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139075237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
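To illustrate why quantile discretization can help with the batch effects mentioned in the cystic fibrosis abstract above, here is a minimal Python sketch. It assumes three bins (tertiles) and discretizes each study separately; the abstract does not state the number of bins or the exact procedure, so both choices, along with the function name `quantile_discretize` and the toy values, are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_discretize(values, n_bins=3):
    """Map each value to a bin index in 0..n_bins-1.

    Cut points are computed from `values` itself, so only the rank
    structure within one study matters, not the absolute scale.
    """
    cuts = quantiles(values, n=n_bins, method="inclusive")
    # A value equal to a cut point falls into the upper bin.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical expression values for one gene in two studies.
# Study B carries a constant batch offset of +100.
study_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
study_b = [101.0, 102.0, 103.0, 104.0, 105.0, 106.0]

bins_a = quantile_discretize(study_a)
bins_b = quantile_discretize(study_b)

print(bins_a)  # [0, 0, 1, 1, 2, 2]
print(bins_b)  # [0, 0, 1, 1, 2, 2]
```

Because each study is binned against its own quantiles, the additive offset disappears and the discretized values become directly comparable. A Bayesian network structure search (for example, hill climbing) could then be run on the pooled discrete data, which is outside the scope of this sketch.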
Functional brain networks represent dynamic and complex interactions among anatomical regions of interest (ROIs), providing crucial clinical insights for neural pattern discovery and disorder diagnosis. In recent years, graph neural networks (GNNs) have proven highly effective in analyzing structured network data. However, because the high complexity of data acquisition limits the amount of neuroimaging data available for training, GNNs, like all deep learning models, suffer from overfitting. Moreover, their capability to capture useful neural patterns for downstream prediction is also adversely affected. To address this challenge, this study proposes BrainSTEAM, an integrated framework featuring a spatio-temporal module that consists of an EdgeConv GNN model, an autoencoder network, and a Mixup strategy. In particular, the spatio-temporal module dynamically segments the time series signals of the ROI features for each subject into chunked sequences. We use each sequence to construct correlation networks, thereby increasing the training data. Additionally, we employ the EdgeConv GNN to capture ROI connectivity structures, an autoencoder for data denoising, and Mixup for enhancing model training through linear data augmentation. We evaluate our framework on two real-world neuroimaging datasets, ABIDE for Autism prediction and HCP for gender prediction. Extensive experiments demonstrate the superiority and robustness of BrainSTEAM when compared to a variety of existing models, showcasing the strong potential of our proposed mechanisms in generalizing to other studies for connectome-based fMRI analysis.
Alexis Li, Yi Yang, Hejie Cui, Carl Yang. BrainSTEAM: A Practical Pipeline for Connectome-based fMRI Analysis towards Subject Classification. Pacific Symposium on Biocomputing, vol. 29, pp. 53-64, published 2024-01-01.
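The chunking and Mixup steps described in the BrainSTEAM abstract above can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: it assumes fixed-length, non-overlapping chunks (the abstract says segmentation is dynamic), Pearson correlation as the edge weight, and a fixed mixing coefficient `lam` (in practice Mixup usually draws it from a Beta distribution). All function names and toy signals are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def chunk_correlation_networks(signals, chunk_len):
    """Split per-ROI time series into non-overlapping chunks and
    return one ROI-by-ROI correlation matrix per chunk."""
    n_time = len(signals[0])
    networks = []
    for start in range(0, n_time - chunk_len + 1, chunk_len):
        chunk = [roi[start:start + chunk_len] for roi in signals]
        networks.append([[pearson(a, b) for b in chunk] for a in chunk])
    return networks


def mixup(x1, y1, x2, y2, lam):
    """Linear interpolation of two samples (matrices) and their labels."""
    mixed_x = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
               for r1, r2 in zip(x1, x2)]
    mixed_y = lam * y1 + (1 - lam) * y2
    return mixed_x, mixed_y


# Toy subject with 2 ROIs and 8 time points.
signals = [
    [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0],
    [2.0, 4.0, 6.0, 8.0, 1.0, 2.0, 3.0, 4.0],
]
nets = chunk_correlation_networks(signals, chunk_len=4)

# One subject now yields two training networks instead of one.
mixed_x, mixed_y = mixup(nets[0], 1.0, nets[1], 0.0, lam=0.5)
```

In the toy data the two ROIs are perfectly correlated in the first chunk and perfectly anti-correlated in the second, so a single whole-series correlation would hide structure that the per-chunk networks expose.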
Mengzhou Hu, Xikun Zhang, Andrew Latham, Andrej Šali, Trey Ideker, Emma Lundberg
Cells consist of large components, such as organelles, that recursively factor into smaller systems, such as condensates and protein complexes, forming a dynamic multi-scale structure of the cell. Recent technological innovations have paved the way for systematic interrogation of subcellular structures, yielding unprecedented insights into their roles and interactions. In this workshop, we discuss progress, challenges, and collaboration to marshal various computational approaches toward assembling an integrated structural map of the human cell.
Tools for assembling the cell: Towards the era of cell structural bioinformatics. Pacific Symposium on Biocomputing, vol. 29, pp. 661-665, published 2024-01-01.
Chinmaya U Joisa, Kevin A Chen, Samantha Beville, Timothy Stuhlmiller, Matthew E Berginski, Denis Okumu, Brian T Golitz, Michael P East, Gary L Johnson, Shawn M Gomez
Protein kinases are a primary focus in targeted therapy development for cancer, owing to their role as regulators in nearly all areas of cell life. Recent strategies targeting the kinome with combination therapies have shown promise, such as trametinib and dabrafenib in advanced melanoma, but empirical design for less-characterized pathways remains a challenge. Computational combination screening is an attractive alternative, allowing in silico filtering prior to experimental testing of drastically fewer leads, increasing the efficiency and effectiveness of drug development pipelines. In this work, we generated combined kinome inhibition states of 40,000 kinase inhibitor combinations from kinobeads-based kinome profiling across 64 doses. We then integrated these with transcriptomics from CCLE to build machine learning models with elastic-net feature selection to predict cell line sensitivity across nine cancer types, with accuracy R² ≈ 0.75-0.9. We then validated the model using a PDX-derived TNBC cell line and saw good global accuracy (R² ≈ 0.7) as well as high accuracy in predicting synergy using four popular metrics (R² ≈ 0.9). Additionally, the model was able to predict a highly synergistic combination of trametinib and omipalisib for TNBC treatment, which incidentally was recently in phase I clinical trials. Our choice of tree-based models for greater interpretability allowed interrogation of highly predictive kinases in each cancer type, such as the MAPK, CDK, and STK kinases. Overall, these results suggest that kinome inhibition states of kinase inhibitor combinations are strongly predictive of cell line responses and have great potential for integration into computational drug screening pipelines. This approach may facilitate the identification of effective kinase inhibitor combinations and accelerate the development of novel cancer therapies, ultimately improving patient outcomes.
Combined kinome inhibition states are predictive of cancer cell line sensitivity to kinase inhibitor combination therapies. Pacific Symposium on Biocomputing, vol. 29, pp. 276-290, published 2024-01-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413988/pdf/
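The kinase inhibitor abstract above refers to four synergy metrics without naming them. As a purely illustrative example of what such a metric computes, the sketch below uses Bliss independence, one widely used synergy score. Whether it is among the four used in the paper is an assumption, and the function names and effect values are hypothetical.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined fractional inhibition if two drugs act independently.

    Effects are fractions in [0, 1], where 1 means complete inhibition.
    """
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed, effect_a, effect_b):
    """Observed minus expected inhibition: positive values suggest synergy,
    negative values suggest antagonism."""
    return observed - bliss_expected(effect_a, effect_b)


# Hypothetical single-agent and combination effects on one cell line.
expected = bliss_expected(0.3, 0.4)
excess = bliss_excess(0.8, 0.3, 0.4)
```

With single-agent inhibitions of 0.3 and 0.4, independence predicts a combined inhibition of about 0.58, so an observed 0.8 gives a positive excess of about 0.22. A model that predicts combination response well can therefore also be scored on how well it recovers this excess.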