Pub Date: 2024-12-30
DOI: 10.1101/2023.09.14.23295428
Michael Lape, Daniel Schnell, Sreeja Parameswaran, Kevin Ernst, Shannon O'Connor, Nathan Salomonis, Lisa J Martin, Brett M Harnett, Leah C Kottyan, Matthew T Weirauch
There are many well-established relationships between pathogens and human disease, but far fewer for non-communicable diseases (NCDs). We leverage data from the UK Biobank and TriNetX to perform a systematic survey across 20 pathogens and 426 diseases, primarily NCDs, assessing the association between disease status and proxies for infection history. We identify 206 pathogen-disease pairs that replicate in both cohorts. We replicate many established relationships, including Helicobacter pylori with several gastroenterological diseases and Epstein-Barr virus with multiple sclerosis and lupus. Overall, our approach identified evidence of association for 15 pathogens and 96 distinct diseases, including a currently controversial link between human cytomegalovirus (CMV) and ulcerative colitis (UC). We validate this connection through two orthogonal analyses, revealing increased CMV gene expression in UC patients and enrichment of UC genetic risk signal near human genes whose expression is altered upon CMV infection. Collectively, these results form a foundation for future investigations into the mechanistic roles played by pathogens in NCDs. All results are accessible on our website, https://tf.cchmc.org/pathogen-disease.
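The abstract does not say which statistical model links disease status to the infection-history proxies. The minimal sketch below assumes the simplest case, an unadjusted 2x2 table of cases versus non-cases with the proxy present or absent. It computes an odds ratio with a Wald confidence interval. The function name and the counts are hypothetical. A real cohort analysis would also need to handle covariates and multiple testing, which this sketch ignores.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: float, b: float, c: float, d: float,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval for a 2x2 table.

    a: cases with the infection proxy      b: cases without it
    c: non-cases with the infection proxy  d: non-cases without it
    Returns (odds_ratio, lower_bound, upper_bound); z=1.96 gives a ~95% interval.
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


# Hypothetical counts: 30 of 100 cases vs 20 of 100 non-cases carry the proxy
print(odds_ratio_wald(30, 70, 20, 80))
```

An interval that excludes 1 would indicate an association in this unadjusted setting only. It says nothing about direction of causation.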
Title: After the Infection: A Survey of Pathogens and Non-communicable Human Disease.
Journal: medRxiv: the preprint server for health sciences
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/3d/2c/nihpp-2023.09.14.23295428v1.PMC10516055.pdf
Pub Date: 2024-12-14
DOI: 10.1101/2023.03.15.23287145
G L Barlow, C M Schürch, S S Bhate, D Phillips, A Young, S Dong, H A Martinez, G Kaber, N Nagy, S Ramachandran, J Meng, E Korpos, J A Bluestone, G P Nolan, P L Bollyky
In autoimmune Type 1 diabetes (T1D), immune cells infiltrate and destroy the islets of Langerhans - islands of endocrine tissue dispersed throughout the pancreas. However, the contribution of cellular programs outside islets to insulitis is unclear. Here, using CO-Detection by indEXing (CODEX) tissue imaging and cadaveric pancreas samples, we simultaneously examine islet and extra-islet inflammation in human T1D. We identify four sub-states of inflamed islets characterized by the activation profiles of CD8+ T cells enriched in islets relative to the surrounding tissue. We further find that the extra-islet space of lobules with extensive islet-infiltration differs from the extra-islet space of less infiltrated areas within the same tissue section. Finally, we identify lymphoid structures away from islets enriched in CD45RA+ T cells - a population also enriched in one of the inflamed islet sub-states. Together, these data help define the coordination between islets and the extra-islet pancreas in the pathogenesis of human T1D.
Title: The Extra-Islet Pancreas Supports Autoimmunity in Human Type 1 Diabetes.
Journal: medRxiv: the preprint server for health sciences
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055577/pdf/
Pub Date: 2024-11-18
DOI: 10.1101/2023.01.26.23285060
Rohan Goli, Keerthana Komatineni, Shailesh Alluri, Nina Hubig, Hua Min, Yang Gong, Dean F Sittig, Lior Rennert, David Robinson, Paul Biondich, Adam Wright, Christian Nøhr, Timothy Law, Arild Faxvaag, Aneesa Weaver, Ronald Gimbel, Xia Jing
Background: Interoperable clinical decision support system (CDSS) rules provide a pathway to interoperability, a well-recognized challenge in health information technology. Building an ontology facilitates the creation of interoperable CDSS rules, and this can be achieved by identifying keyphrases (KPs) in the existing literature. Ontology construction is traditionally a manual effort by human domain experts. Recently developed natural language processing techniques such as KP identification can serve as a critical automated complement to that effort. However, KP identification requires human expertise, consensus, and contextual understanding for data labeling.
Methods: This paper presents a semi-supervised KP identification framework (bidirectional long short-term memory encoders with a conditional random field decoder, BiLSTM-CRF). It uses minimal human-labeled data, hierarchical attention over the documents (i.e., at the word, sentence, and abstract levels), and domain adaptation. We created synthetic labels for initial training and used human-labeled data for fine-tuning. We also tested different options during NLP preprocessing and machine learning (ML) training to optimize the ML pipeline.
Results: Our method outperforms prior neural architectures by combining synthetic labels for initial training, document-level contextual learning, language modeling, and fine-tuning with limited gold-standard labeled data. We found that the BIO encoding schema performed slightly better than BILOU, and that domain adaptation techniques can improve the quality of synthetic labels. In addition, document-level context, pre-trained language models (LMs), and pre-trained word embeddings (WEs) all contributed to better model performance in our tasks. Adding 2 to 4 human-labeled documents for every 100 synthetically labeled documents improves model performance without exhausting the human-labeled documents too quickly.
Conclusions: To the best of our knowledge, this is the first functional framework for identifying KPs in the CDSS sub-domain that is trained on limited human-labeled data. It contributes to general natural language processing (NLP) architectures in areas such as clinical NLP, where manual data labeling is challenging. In such areas, lightweight deep learning models can play an important role in real-time KP identification as a complement to human experts' effort.
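The Results paragraph compares tag-encoding schemas for keyphrase identification framed as sequence labeling. The minimal sketch below shows the BIO (Begin/Inside/Outside) scheme in both directions. It assumes a single hypothetical keyphrase label "KP" and non-overlapping token spans given as (start, end), with end exclusive. A BiLSTM-CRF tagger would predict one such tag per token. The function names are illustrative and not taken from the authors' code.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start token index, end token index, exclusive)


def spans_to_bio(tokens: List[str], spans: List[Span], label: str = "KP") -> List[str]:
    """Encode non-overlapping keyphrase spans as one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"invalid span {(start, end)}")
        tags[start] = f"B-{label}"          # first token of the keyphrase
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into spans; a stray I- tag after O starts a new span."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            if start is not None:
                spans.append((start, i))    # close the previous span
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


tokens = "interoperable clinical decision support system rules".split()
tags = spans_to_bio(tokens, [(1, 5)])
print(list(zip(tokens, tags)))
print(bio_to_spans(tags))
```

The B- tag is what lets adjacent keyphrases stay separate. Richer schemes add explicit end and single-token tags at the cost of more classes.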
Title: Keyphrase Identification Using Minimal Labeled Data with Hierarchical Contexts and Transfer Learning.
Journal: medRxiv: the preprint server for health sciences
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/b9/97/nihpp-2023.01.26.23285060v2.PMC10246160.pdf
Pub Date: 2024-11-01
DOI: 10.1101/2023.05.25.23290531
Nansu Zong, Shaika Chowdhury, Shibo Zhou, Sivaraman Rajaganapathy, Yue Yu, Liewei Wang, Qiying Dai, Pengyang Li, Xiaoke Liu, Suzette J Bielinski, Jun Chen, Yongbin Chen, James R Cerhan
Introduction: The high mortality rates associated with heart failure (HF) have propelled drug repurposing, a strategy that seeks new therapeutic uses for existing approved drugs to manage HF symptoms more effectively. An emerging trend uses real-world data, such as electronic health records (EHRs), to mimic randomized controlled trials (RCTs) and evaluate treatment outcomes through so-called emulated trials (ETs). Nonetheless, the intricacies of EHR data introduce notable discrepancies between EHR data and the stringent standards of RCTs. These intricacies include detailed patient histories spread across databases, missing biomarkers or specific diagnostic tests, and partial records of symptoms. This gap makes it substantially harder to conduct an ET that accurately predicts treatment efficacy.
Objective: This research aims to predict the efficacy of drugs repurposed for HF in randomized trials by leveraging EHR data in ETs.
Methods: We proposed an ET framework to predict drug efficacy, integrating target prediction based on biomedical databases with statistical analysis using EHR data. Specifically, we developed a novel target prediction model that learns low-dimensional representations of drug molecules, protein sequences, and diverse biomedical associations from a knowledge graph. Additionally, we crafted strategies to improve the prediction by considering the interactions between HF drugs and biological factors in the context of HF prognostic markers.
Results: Validated against the BETA benchmark, our drug-target prediction model outperformed existing baseline algorithms. It achieved an average AUC-ROC of 97.7%, PR-AUC of 97.4%, F1 score of 93.1%, and General Score of 96.1%. We then applied the ET framework to 17 repurposed drugs, derived from 266 phase 3 HF RCTs, using data from 59,000 patients at the Mayo Clinic, and it showed high predictive accuracy. This analysis accounted for biological variables (e.g., gender, age, ethnicity), HF medications (e.g., ACE inhibitors, beta-blockers, ARBs, loop diuretics), HF types (HFpEF and HFrEF), confounders, and prognostic markers (e.g., NT-proBNP, BUN, creatinine, and hemoglobin). The ET framework significantly improved accuracy compared with the baseline efficacy analysis that used EHR data. Notably, the best-performing configuration improved AUC-ROC from 75.71% to 93.57% and PR-AUC from 78.66% to 90.34% relative to the baseline models.
Conclusion: Our study presents an ET framework that significantly enhances drug efficacy emulation by integrating EHR-based analysis with target prediction. We demonstrated substantial success in predicting the efficacy of 17 HF drugs repurposed for phase 3 RCTs, showcasing the framework's potential in advancing HF treatment strategies.
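The AUC-ROC values reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The minimal sketch below computes AUC-ROC in that pairwise (Mann-Whitney) form. It is a generic illustration of the metric only; how the study labeled drugs as positive or negative is not specified here. PR-AUC summarizes a different curve (precision versus recall) and is not computed by this function.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(score of a random positive > score of a random negative).

    Ties count as half a win (Mann-Whitney U formulation).
    Runs in O(n_pos * n_neg), which is fine for small evaluation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted efficacy scores for two effective (1) and two ineffective (0) drugs
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise loop is quadratic in the number of examples. For large evaluation sets, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` is the usual choice.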
Title: Advancing Efficacy Prediction for EHR-based Emulated Trials in Repurposing Heart Failure Therapies.
Journal: medRxiv: the preprint server for health sciences
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/b0/45/nihpp-2023.05.25.23290531v1.PMC10312819.pdf
Pub Date: 2024-10-29
DOI: 10.1101/2023.06.12.23291297
Arielle Klepper, James Asaki, Andrew F Kung, Sara E Vazquez, Aaron Bodansky, Anthea Mitchell, Sabrina A Mann, Kelsey Zorn, Isaac Avila-Vargas, Swathi Kari, Melawit Tekeste, Javier Castro, Briton Lee, Maria Duarte, Mandana Khalili, Monica Yang, Paul Wolters, Jennifer Price, Emily Perito, Sandy Feng, Jacquelyn J Maher, Jennifer C Lai, Christina Weiler-Normann, Ansgar W Lohse, Joseph DeRisi, Michele Tana
Background and aims: Autoimmune hepatitis (AIH) is a severe disease characterized by elevated immunoglobulin levels. However, the role of autoantibodies in the pathophysiology of AIH remains uncertain.
Methods: Phage immunoprecipitation-sequencing (PhIP-seq) was employed to identify autoantibodies in the serum of patients with AIH (n = 115). These were compared with patients with other liver diseases (metabolic associated steatotic liver disease (MASH), n = 178; primary biliary cholangitis (PBC), n = 26) and with healthy controls (n = 94).
Results: Logistic regression using PhIP-seq-enriched peptides as inputs yielded a classification AUC of 0.81, indicating a predictive humoral immune signature for AIH. Embedded within this signature were several disease-relevant targets:
- SLA/LP, the target of a well-recognized autoantibody in AIH;
- disco interacting protein 2 homolog A (DIP2A);
- relaxin family peptide receptor 1 (RXFP1).
The autoreactive fragment of DIP2A was a 9-amino-acid stretch nearly identical to the U27 protein of human herpesvirus 6 (HHV-6). Fine mapping of this epitope suggests that the HHV-6 U27 sequence is preferentially enriched relative to the corresponding DIP2A sequence. Antibodies against RXFP1, a receptor involved in anti-fibrotic signaling, were also highly specific to AIH. The enriched peptides lie within a motif that sits adjacent to the receptor-binding domain and is required for signaling. Serum from AIH patients positive for anti-RXFP1 antibodies significantly inhibited relaxin-2 signaling. Depleting IgG from anti-RXFP1-positive serum abrogated this effect.
Conclusions: These data provide evidence for a novel serological profile in AIH. It includes a possible functional role for anti-RXFP1 antibodies, as well as antibodies that cross-react with the HHV-6 U27 protein.
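The classification AUC of 0.81 reported above comes from logistic regression on PhIP-seq-enriched peptides. The minimal sketch below fits that model class with L2-penalized batch gradient descent. It assumes each input is a per-peptide enrichment value, for example 1 if a peptide is enriched in a sample and 0 otherwise. The feature encoding, learning rate, penalty, and function names are illustrative assumptions, not the study's settings. In practice a library implementation with cross-validation would be preferable.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of the positive class; the last entry of w is the intercept."""
    return sigmoid(w[-1] + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.1, epochs: int = 2000,
                              l2: float = 0.01) -> List[float]:
    """Fit weights by batch gradient descent on the L2-penalized mean log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log loss)/d(linear score)
            for j, xij in enumerate(xi):
                grad[j] += err * xij
            grad[-1] += err
        for j in range(n_features):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
        w[-1] -= lr * grad[-1] / n  # intercept is not penalized
    return w


# Hypothetical binary enrichment calls for two peptides in four samples
X = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
y = [1, 1, 0, 0]  # 1 = AIH, 0 = comparison group
w = train_logistic_regression(X, y)
print(round(predict_proba(w, [1.0, 0.0]), 3), round(predict_proba(w, [0.0, 1.0]), 3))
```

Scoring held-out samples with `predict_proba` and computing AUC-ROC over those scores is what would yield a figure comparable to the reported classification AUC.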
Title: Novel autoantibody targets identified in patients with autoimmune hepatitis (AIH) by PhIP-Seq reveals pathogenic insights.
Journal: medRxiv: the preprint server for health sciences
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/3f/66/nihpp-2023.06.12.23291297v2.PMC10312872.pdf
Pub Date: 2024-10-25
DOI: 10.1101/2023.04.17.23286845
Lu Zeng, Charles C White, David A Bennett, Hans-Ulrich Klein, Philip L De Jager
Myeloid cells, including monocytes, macrophages, microglia, dendritic cells, and neutrophils, are part of the innate immune system and play a major role in orchestrating innate and adaptive immune responses. Susceptibility loci for both Alzheimer's disease (AD) and inflammatory bowel disease (IBD) are enriched for genes expressed in myeloid cells, but it is not clear whether these myeloid risk factors are shared between the two diseases. Leveraging results of genome-wide association studies, we investigated the causal effect of IBD variants on AD and its endophenotypes, including variants for ulcerative colitis (UC) and Crohn's disease (CD). Microglia and monocyte expression quantitative trait loci (eQTLs) were used to examine the functional consequences of IBD and AD variants. Our results revealed distinct sets of genes and pathways for AD and IBD susceptibility loci: AD loci are enriched for microglial eQTLs, whereas IBD loci are enriched for monocyte eQTLs. However, we also found that genetically determined IBD is associated with a protective effect against AD (p<0.03). Yet a genetic propensity for the CD subtype is associated with increased amyloid accumulation (beta=7.14, p-value=0.02) and susceptibility to AD. Susceptibility to UC was associated with increased deposition of TDP-43 (beta=7.58, p-value=6.11×10⁻⁴). The relation of these gastrointestinal inflammatory diseases to AD is therefore complex. Different subsets of susceptibility variants preferentially affect different myeloid cell subtypes, yet there do appear to be certain shared pathways. The possible protective effect of IBD susceptibility on AD risk may also provide therapeutic insights.
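The abstract does not name its causal-inference method. Suppose it were two-sample Mendelian randomization on GWAS summary statistics, a common design for estimating the causal effect of one trait's variants on another; this is an assumption. The simplest estimator would then be the fixed-effect inverse-variance weighted (IVW) estimate sketched below. The function name and inputs are hypothetical. Published MR analyses commonly add sensitivity estimators such as MR-Egger or the weighted median, which this sketch omits.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float], beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization estimate.

    Each variant j contributes a Wald ratio beta_outcome[j] / beta_exposure[j].
    IVW combines them with weights beta_exposure[j]**2 / se_outcome[j]**2
    (first-order weights that ignore uncertainty in the exposure effects).
    Returns (causal_estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have equal length")
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical per-variant effects on the exposure (IBD) and outcome (AD)
est, se = ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.1, 0.15], [0.01, 0.02, 0.01])
print(est, se)
```

The IVW estimate is equivalent to a weighted regression of outcome effects on exposure effects through the origin. It is unbiased only if every variant is a valid instrument, which is why sensitivity analyses are usually reported alongside it.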
Title: Genetic insights into the association between inflammatory bowel disease and Alzheimer's disease.
Journal: medRxiv: the preprint server for health sciences
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/39/e6/nihpp-2023.04.17.23286845v1.PMC10153331.pdf
Pub Date: 2024-10-23
DOI: 10.1101/2023.07.16.23292724
David S M Lee, Kathleen M Cardone, David Y Zhang, Noah L Tsao, Sarah Abramowitz, Pranav Sharma, John S DePaolo, Mitchell Conery, Krishna G Aragam, Kiran Biddinger, Ozan Dilitikas, Lily Hoffman-Andrews, Renae L Judy, Atlas Khan, Iftikhar Kulo, Megan J Puckelwartz, Nosheen Reza, Benjamin A Satterfield, Pankhuri Singhal, Zoltan P Arany, Thomas P Cappola, Eric Carruth, Sharlene M Day, Ron Do, Christopher M Haggarty, Jacob Joseph, Elizabeth M McNally, Girish Nadkarni, Anjali T Owens, Daniel J Rader, Marylyn D Ritchie, Yan V Sun, Benjamin F Voight, Michael G Levin, Scott M Damrauer
Heart failure (HF) is a complex trait, influenced by environmental and genetic factors, which affects over 30 million individuals worldwide. Historically, the genetics of HF have been studied in Mendelian forms of disease, where rare genetic variants have been linked to familial cardiomyopathies. More recently, genome-wide association studies (GWAS) have successfully identified common genetic variants associated with risk of HF. However, the relative importance of genetic variants across the allele-frequency spectrum remains incompletely characterized. Here, we report the results of common- and rare-variant association studies of all-cause heart failure, applying recently developed methods to quantify the heritability of HF attributable to different classes of genetic variation. We combine GWAS data across multiple populations including 207,346 individuals with HF and 2,151,210 without, identifying 176 risk loci at genome-wide significance (P-value < 5×10-8). Signals at newly identified common-variant loci include coding variants in Mendelian cardiomyopathy genes (MYBPC3, BAG3) and in regulators of lipoprotein (LPL) and glucose metabolism (GIPR, GLP1R). These signals are enriched in myocyte and adipocyte cell types and can be clustered into 5 broad modules based on pleiotropic associations with anthropomorphic traits/obesity, blood pressure/renal function, atherosclerosis/lipids, immune activity, and arrhythmias. Gene burden studies across three biobanks (PMBB, UKB, AOU), including 27,208 individuals with HF and 349,126 without, uncover exome-wide significant (P-value < 1.57×10-6) associations for HF and rare predicted loss-of-function (pLoF) variants in TTN, MYBPC3, FLNC, and BAG3. Total burden heritability of rare coding variants (2.2%, 95% CI 0.99-3.5%) is highly concentrated in a small set of Mendelian cardiomyopathy genes, while common variant heritability (4.3%, 95% CI 3.9-4.7%) is more diffusely spread throughout the genome. 
Finally, we show that common-variant background, in the form of a polygenic risk score (PRS), significantly modifies the risk of HF among carriers of pathogenic truncating variants in the Mendelian cardiomyopathy gene TTN. Together, these findings provide a genetic link between dysregulated metabolism and HF, and suggest that HF has a significant polygenic component that is not captured by current clinical genetic testing.
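For readers less familiar with polygenic risk scores: a PRS is conventionally computed as an additive, weighted sum of an individual's effect-allele dosages, with each variant weighted by its GWAS effect size (for example, a per-allele log odds ratio). The sketch below assumes that standard additive form; the `prs` helper, the three variant weights, and the two genotypes are hypothetical and do not reflect the variants, weights, or scoring method used in this study.

```python
import math

def prs(dosages: list[float], log_odds: list[float]) -> float:
    """Additive polygenic risk score for one individual (hypothetical helper).

    dosages  -- effect-allele counts (0, 1 or 2, or imputed dosages) per variant
    log_odds -- per-allele log odds ratios for the same variants (GWAS weights)
    """
    if len(dosages) != len(log_odds):
        raise ValueError("dosages and log_odds must have the same length")
    return sum(d * b for d, b in zip(dosages, log_odds))

# Hypothetical weights for three variants (not taken from the HF GWAS).
weights = [0.10, -0.05, 0.20]
# Two hypothetical TTN truncating-variant carriers with different backgrounds.
carrier_high = [2, 0, 1]  # more risk alleles
carrier_low = [0, 2, 0]   # more protective alleles

s_high = prs(carrier_high, weights)
s_low = prs(carrier_low, weights)
# Under the additive log-odds model, exp(score difference) is the implied odds ratio.
print(f"PRS high={s_high:.2f}, low={s_low:.2f}, implied OR={math.exp(s_high - s_low):.3f}")
```

In a setting like the one described above, such scores would be compared among carriers of TTN truncating variants (for example, by PRS quantile) to ask whether polygenic background shifts risk; how the authors constructed and stratified their score is not specified here.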
Title: Common- and rare-variant genetic architecture of heart failure across the allele frequency spectrum.
Journal: medRxiv: the preprint server for health sciences; publication date 2024-10-23; DOI: 10.1101/2023.07.16.23292724
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/ec/53/nihpp-2023.07.16.23292724v3.PMC10371173.pdf
Pub Date: 2024-10-22; DOI: 10.1101/2023.05.26.23290469
Dennis Wylie, Xiaoping Wang, Jun Yao, Hengyi Xu, Elizabeth A Ferrick-Kiddie, Toshiaki Iwase, Savitri Krishnamurthy, Naoto T Ueno, Alan M Lambowitz
Inflammatory breast cancer (IBC) is the most aggressive and lethal breast cancer subtype but lacks unequivocal genomic differences or robust biomarkers that differentiate it from non-IBC. Here, Thermostable Group II intron Reverse Transcriptase RNA-sequencing (TGIRT-seq) revealed myriad differences in tumor samples, peripheral blood mononuclear cells (PBMCs), and plasma that distinguished IBC from non-IBC patients and healthy donors across all tested receptor-based subtypes. These included numerous differentially expressed protein-coding gene RNAs and non-coding RNAs in all three sample types, a granulocytic immune response in IBC PBMCs, and over-expression of antisense RNAs, suggesting widespread enhanced transcription in both IBC tumors and PBMCs. By using TGIRT-seq to quantitate intron-exon depth ratios (IDRs) and mapping reads to both genome and transcriptome reference sequences, we developed methods for parallel analysis of transcriptional and post-transcriptional gene regulation. This analysis identified numerous differentially and non-differentially expressed protein-coding genes in IBC tumors and PBMCs with high IDRs, indicative of rate-limiting RNA splicing that negatively impacts mRNA production. Mirroring gene expression differences in tumors and PBMCs, over-represented protein-coding gene RNAs in IBC patient plasma were largely intronic RNAs, while those in non-IBC patient and healthy donor plasma were largely mRNA fragments. Potential IBC biomarkers in plasma included T-cell receptor pre-mRNAs and intronic, LINE-1, and antisense RNAs. Our findings provide new insights into IBC and set the stage for monitoring disease progression and response to treatment by liquid biopsy. The methods developed for parallel analysis of transcriptional and post-transcriptional gene regulation have potentially broad RNA-seq and clinical applications.
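The abstract does not give a formula for the intron-exon depth ratio. One plausible reading, assumed here, is the mean per-base read depth over a gene's intronic positions divided by the mean depth over its exonic positions, so that genes whose introns are removed slowly (rate-limiting splicing) show a higher ratio. The function name, the per-base input layout, and the pseudocount are hypothetical illustrations, not the authors' pipeline.

```python
def intron_exon_depth_ratio(depth: list[int], is_exon: list[bool],
                            pseudocount: float = 1e-9) -> float:
    """Hypothetical IDR for one gene: mean intronic depth / mean exonic depth.

    depth   -- per-base read depth across the gene body
    is_exon -- True where the base lies in an exon, False where it is intronic
    """
    if len(depth) != len(is_exon):
        raise ValueError("depth and is_exon must be the same length")
    exon = [d for d, e in zip(depth, is_exon) if e]
    intron = [d for d, e in zip(depth, is_exon) if not e]
    if not exon or not intron:
        raise ValueError("gene needs both exonic and intronic positions")
    mean_exon = sum(exon) / len(exon)
    mean_intron = sum(intron) / len(intron)
    # Small pseudocount avoids division by zero for genes with no exonic reads.
    return mean_intron / (mean_exon + pseudocount)

# Toy gene: exon (4 bases), intron (4 bases), exon (2 bases).
depth = [40, 42, 38, 40, 10, 12, 8, 10, 44, 36]
is_exon = [True] * 4 + [False] * 4 + [True] * 2
print(round(intron_exon_depth_ratio(depth, is_exon), 3))
```

In practice, per-base depths of this kind would come from aligned reads and the exon/intron mask from a gene annotation; both inputs are assumed here rather than taken from the study.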
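Inflammatory breast cancer (IBC) is the most aggressive and lethal breast cancer subtype but lags behind in biomarker identification. Here, we used an improved Thermostable Group II intron Reverse Transcriptase RNA-sequencing (TGIRT-seq) method to simultaneously profile coding and non-coding RNAs from tumors, PBMCs, and plasma of IBC and non-IBC patients and healthy donors. In addition to RNAs from known IBC-associated genes, we identified hundreds of other over-expressed coding and non-coding RNAs (p ≤ 0.001) in IBC tumors and PBMCs, including a higher proportion with elevated intron-exon depth ratios (IDRs), which may reflect the accumulation of intronic RNAs due to enhanced transcription. Correspondingly, differentially expressed protein-coding gene RNAs in IBC plasma were mainly intronic RNA fragments, whereas those in healthy donor and non-IBC plasma were mainly fragmented mRNAs. Potential IBC biomarkers in plasma included T-cell receptor pre-mRNA fragments traced to IBC tumors and PBMCs; intronic RNA fragments associated with high-IDR genes; and LINE-1 and other retroelement RNAs, which we found to be globally up-regulated in IBC and preferentially enriched in plasma. Our findings provide new insights into IBC and demonstrate the advantages of broad transcriptome profiling for biomarker identification. The RNA-seq and data analysis methods developed for this study may be broadly applicable to other diseases.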
Title: TGIRT-seq of Inflammatory Breast Cancer Tumor and Blood Samples Reveals Widespread Enhanced Transcription Impacting RNA Splicing and Intronic RNAs in Plasma.
Journal: medRxiv: the preprint server for health sciences; publication date 2024-10-22; DOI: 10.1101/2023.05.26.23290469
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/5f/b3/nihpp-2023.05.26.23290469v1.PMC10312853.pdf
Pub Date: 2024-10-22; DOI: 10.1101/2023.08.24.23294574
Min Gu Kwak, Lingchao Mao, Zhiyang Zheng, Yi Su, Fleming Lure, Jing Li
Early detection of Alzheimer's Disease (AD) is crucial for timely intervention and optimal treatment outcomes. Although integrating multimodal neuroimages such as MRI and PET is promising, handling datasets with incomplete modalities remains under-researched. Yet incomplete modalities are common in real-world settings, because not every patient has all modalities owing to practical constraints such as cost, access, and safety concerns. We propose a deep learning framework employing cross-modal Mutual Knowledge Distillation (MKD) to model different sub-cohorts of patients based on their available modalities. In MKD, the multimodal model (e.g., MRI and PET) serves as the teacher, while the single-modality model (e.g., MRI only) is the student. Our MKD framework features three components: a Modality-Disentangling Teacher (MDT) model designed through information disentanglement, a student model that learns from both classification errors and the MDT's knowledge, and a teacher model enhanced by distilling the student's single-modal feature extraction capabilities. We demonstrate the effectiveness of the proposed method through theoretical analysis and validate its performance with simulation studies. In addition, we demonstrate the method in a case study with Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets, underscoring the potential of artificial intelligence for handling incomplete multimodal neuroimaging datasets and advancing early AD detection.
Note to practitioners: This paper was motivated by the challenge of early AD diagnosis, particularly in scenarios in which clinicians encounter varied availability of patient imaging data, such as MRI and PET scans, often because of cost or accessibility constraints. We propose an incomplete multimodal learning framework that produces tailored models for patients with only MRI and for patients with both MRI and PET. Through bi-directional knowledge transfer, this approach improves the accuracy and effectiveness of early AD diagnosis, especially when imaging resources are limited. We introduce a teacher model that prioritizes extracting information shared across modalities, which substantially improves the student model's learning. This paper includes theoretical analysis, a simulation study, and a real-world case study to illustrate the method's potential for early AD detection. However, practitioners should be mindful of the complexity of model tuning. Future work will focus on improving model interpretability and broadening the framework's application, including developing methods to identify the brain regions most important for predictions, which would enhance clinical trust, and extending the framework to incorporate a broader range of imaging modalities, demographic information, and clinical data. These advances aim to provide a more comprehensive view of patient health and improve diagnostic accuracy across various neurodegenerative diseases.
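The MKD and MDT objectives are specific to this work and are not spelled out in the abstract. As background only, the sketch below shows the generic temperature-scaled knowledge-distillation loss on which teacher-student schemes are commonly built: the student (standing in for an MRI-only model) is trained on a weighted mix of hard-label cross-entropy and the KL divergence from the teacher's softened predictions (standing in for an MRI + PET model). The temperature `T`, the weight `alpha`, and the logits are assumptions; this is not the authors' loss.

```python
import math

def softmax(logits: list[float], T: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits: list[float], teacher_logits: list[float],
            label: int, T: float = 2.0, alpha: float = 0.5) -> float:
    """Generic distillation loss for one example (not the MKD objective):
    alpha * CE(student, hard label) + (1 - alpha) * T^2 * KL(teacher_T || student_T).
    """
    ce = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

# Hypothetical two-class logits for a single subject.
teacher = [0.5, 2.0]  # stand-in for a multimodal (MRI + PET) model
student = [1.0, 1.2]  # stand-in for an MRI-only model
print(round(kd_loss(student, teacher, label=1), 4))
```

Scaling the KL term by T² keeps its gradient magnitude comparable across temperatures. The bi-directional transfer described above, in which the teacher also learns from the student, would add a term in the opposite direction that this sketch does not show.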
Title: A Cross-Modal Mutual Knowledge Distillation Framework for Alzheimer's Disease Diagnosis: Addressing Incomplete Modalities.
Journal: medRxiv: the preprint server for health sciences; publication date 2024-10-22; DOI: 10.1101/2023.08.24.23294574
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/70/04/nihpp-2023.08.24.23294574v1.PMC10473798.pdf
Pub Date: 2024-10-08; DOI: 10.1101/2023.06.06.23290887
Hellen Lesmann, Alexander Hustinx, Shahida Moosa, Hannah Klinkhammer, Elaine Marchi, Pilar Caro, Ibrahim M Abdelrazek, Jean Tori Pantel, Merle Ten Hagen, Meow-Keong Thong, Rifhan Azwani Binti Mazlan, Sok Kun Tae, Tom Kamphans, Wolfgang Meiswinkel, Jing-Mei Li, Behnam Javanmardi, Alexej Knaus, Annette Uwineza, Cordula Knopp, Tinatin Tkemaladze, Miriam Elbracht, Larissa Mattern, Rami Abou Jamra, Clara Velmans, Vincent Strehlow, Maureen Jacob, Angela Peron, Cristina Dias, Beatriz Carvalho Nunes, Thainá Vilella, Isabel Furquim Pinheiro, Chong Ae Kim, Maria Isabel Melaragno, Hannah Weiland, Sophia Kaptain, Karolina Chwiałkowska, Miroslaw Kwasniewski, Ramy Saad, Sarah Wiethoff, Himanshu Goel, Clara Tang, Anna Hau, Tahsin Stefan Barakat, Przemysław Panek, Amira Nabil, Julia Suh, Frederik Braun, Israel Gomy, Luisa Averdunk, Ekanem Ekure, Gaber Bergant, Borut Peterlin, Claudio Graziano, Nagwa Gaboon, Moisés Fiesco-Roa, Alessandro Mauro Spinelli, Nina-Maria Wilpert, Prasit Phowthongkum, Nergis Güzel, Tobias B Haack, Rana Bitar, Andreas Tzschach, Agusti Rodriguez-Palmero, Theresa Brunet, Sabine Rudnik-Schöneborn, Silvina Noemi Contreras-Capetillo, Ava Oberlack, Carole Samango-Sprouse, Teresa Sadeghin, Margaret Olaya, Konrad Platzer, Artem Borovikov, Franziska Schnabel, Lara Heuft, Vera Herrmann, Renske Oegema, Nour Elkhateeb, Sheetal Kumar, Katalin Komlosi, Khoushoua Mohamed, Silvia Kalantari, Fabio Sirchia, Antonio F Martinez-Monseny, Matthias Höller, Louiza Toutouna, Amal Mohamed, Amaia Lasa-Aranzasti, John A Sayer, Nadja Ehmke, Magdalena Danyel, Henrike Sczakiel, Sarina Schwartzmann, Felix Boschann, Max Zhao, Ronja Adam, Lara Einicke, Denise Horn, Kee Seang Chew, Choy Chen Kam, Miray Karakoyun, Ben Pode-Shakked, Aviva Eliyahu, Rachel Rock, Teresa Carrion, Odelia Chorin, Yuri A Zarate, Marcelo Martinez Conti, Mert Karakaya, Moon Ley Tung, Bharatendu Chandra, Arjan Bouman, Aime Lumaka, Naveed Wasif, Marwan Shinawi, Patrick R Blackburn, Tianyun Wang, Tim Niehues, Axel Schmidt, Regina Rita Roth, Dagmar Wieczorek, Ping Hu, Rebekah L Waikel, Suzanna E Ledgister Hanchard, Gehad Elmakkawy, Sylvia Safwat, Frédéric Ebstein, Elke Krüger, Sébastien Küry, Stéphane Bézieau, Annabelle Arlt, Eric Olinger, Felix Marbach, Dong Li, Lucie Dupuis, Roberto Mendoza-Londono, Sofia Douzgou Houge, Denisa Weis, Brian Hon-Yin Chung, Christopher C Y Mak, Hülya Kayserili, Nursel Elcioglu, Ayca Aykut, Peli Özlem Şimşek-Kiper, Nina Bögershausen, Bernd Wollnik, Heidi Beate Bentzen, Ingo Kurth, Christian Netzer, Aleksandra Jezela-Stanek, Koen Devriendt, Karen W Gripp, Martin Mücke, Alain Verloes, Christian P Schaaf, Christoffer Nellåker, Benjamin D Solomon, Markus M Nöthen, Ebtesam Abdalla, Gholson J Lyon, Peter M Krawitz, Tzung-Chien Hsieh
The most important factor that complicates the work of dysmorphologists is the significant phenotypic variability of the human face. Next-Generation Phenotyping (NGP) tools that assist clinicians in recognizing characteristic syndromic patterns are particularly challenged when confronted with patients from populations different from those in their training data. We therefore systematically analyzed the impact of genetic ancestry on facial dysmorphism. For this purpose, we established the GestaltMatcher Database (GMDB) as a reference dataset of medical images of patients with rare genetic disorders from around the world. We collected 10,980 frontal facial images, more than a quarter of them previously unpublished, from 8,346 patients representing 581 rare disorders. Although the predominant ancestry is still European (67%), the share of data from underrepresented populations has been increased considerably through global collaborations (19% Asian and 7% African), including previously unpublished reports for more than 40% of the African patients. NGP analysis of this diverse dataset revealed characteristic performance differences that depended on the composition of the training and test sets with respect to genetic relatedness. For clinical use of NGP, incorporating non-European patients substantially improved GestaltMatcher performance: the top-5 accuracy increased by 11.29%. Importantly, this improvement in identifying the correct disorder from a facial portrait was achieved without decreasing performance on European patients. By design, GMDB complies with the FAIR principles, rendering the curated medical data findable, accessible, interoperable, and reusable, so GMDB can also serve as a dataset for training and benchmarking. In summary, our study of facial dysmorphism in a global sample revealed considerable cross-ancestral phenotypic variability that confounds NGP and should be counteracted by international efforts to increase data diversity.
GMDB will serve as a vital reference database for clinicians and a transparent training set for advancing NGP technology.
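The top-5 accuracy cited above counts a case as correct when the true disorder appears among the five highest-scoring candidate syndromes. A minimal sketch of that metric, assuming each case is represented by a vector of per-disorder scores; the `top_k_accuracy` helper, the scores, and the labels are invented for illustration and are not GestaltMatcher output.

```python
def top_k_accuracy(scores: list[list[float]], truth: list[int], k: int = 5) -> float:
    """Fraction of cases whose true class index is among the k highest scores."""
    if len(scores) != len(truth):
        raise ValueError("one score vector per case is required")
    hits = 0
    for case_scores, true_idx in zip(scores, truth):
        # Indices sorted by descending score; sorted() is stable, so ties keep lower index first.
        ranked = sorted(range(len(case_scores)), key=lambda i: -case_scores[i])
        if true_idx in ranked[:k]:
            hits += 1
    return hits / len(truth)

# Three hypothetical patients scored against six candidate disorders.
scores = [
    [0.9, 0.1, 0.3, 0.2, 0.05, 0.0],  # true disorder 0 ranked 1st
    [0.2, 0.8, 0.7, 0.6, 0.5, 0.4],   # true disorder 0 ranked 6th (miss at k=5)
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],   # true disorder 1 ranked 5th (hit at k=5)
]
truth = [0, 0, 1]
print(top_k_accuracy(scores, truth, k=5))
```

Comparing two training sets, as in the ancestry analysis above, would mean computing this metric for each trained model on the same held-out test cases.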
Title: GestaltMatcher Database - A global reference for facial phenotypic variability in rare human diseases.
Journal: medRxiv: the preprint server for health sciences; publication date 2024-10-08; DOI: 10.1101/2023.06.06.23290887
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/2b/fe/nihpp-2023.06.06.23290887v1.PMC10371103.pdf