medRxiv : the preprint server for health sciences: latest articles

After the Infection: A Survey of Pathogens and Non-communicable Human Disease.
Pub Date : 2024-12-30 DOI: 10.1101/2023.09.14.23295428
Michael Lape, Daniel Schnell, Sreeja Parameswaran, Kevin Ernst, Shannon O'Connor, Nathan Salomonis, Lisa J Martin, Brett M Harnett, Leah C Kottyan, Matthew T Weirauch

There are many well-established relationships between pathogens and human disease, but far fewer when focusing on non-communicable diseases (NCDs). We leverage data from the UK Biobank and TriNetX to perform a systematic survey across 20 pathogens and 426 diseases, primarily NCDs. To this end, we assess the association between disease status and infection history proxies. We identify 206 pathogen-disease pairs that replicate in both cohorts. We replicate many established relationships, including Helicobacter pylori with several gastroenterological diseases and connections of Epstein-Barr virus with multiple sclerosis and lupus. Overall, our approach identified evidence of association for 15 pathogens and 96 distinct diseases, including a currently controversial link between human cytomegalovirus (CMV) and ulcerative colitis (UC). We validate this connection through two orthogonal analyses, revealing increased CMV gene expression in UC patients and enrichment for UC genetic risk signal near human genes that have altered expression upon CMV infection. Collectively, these results form a foundation for future investigations into mechanistic roles played by pathogens in NCDs. All results are easily accessible on our website, https://tf.cchmc.org/pathogen-disease.
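
As a rough illustration of what "replicating in both cohorts" can mean for one pathogen-disease pair, the sketch below computes an odds ratio with a 95% Woolf (log-based) interval from a 2x2 table and calls a pair replicated when both cohort intervals exclude 1 in the same direction. The abstract does not state the authors' statistical model, so the test, the replication rule, the function names, and the counts are all assumptions made for illustration.

```python
import math


def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Return (odds ratio, lower, upper) using a 95% Woolf interval.

    "Exposed" here stands for a positive infection-history proxy;
    this is an illustrative choice, not the paper's definition.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the log-based interval")
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - 1.96 * se_log)
    upper = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, lower, upper


def replicates(cohort_a, cohort_b):
    """Hypothetical rule: both intervals exclude 1 on the same side."""
    _, lo_a, hi_a = odds_ratio(*cohort_a)
    _, lo_b, hi_b = odds_ratio(*cohort_b)
    both_risk = lo_a > 1 and lo_b > 1
    both_protective = hi_a < 1 and hi_b < 1
    return both_risk or both_protective


# Made-up counts for two cohorts (not data from the paper).
discovery = (30, 70, 10, 90)
replication = (60, 140, 25, 175)
print(odds_ratio(*discovery))
print(replicates(discovery, replication))
```

A real analysis of this kind would normally adjust for covariates and correct for the many pairs tested; the sketch leaves both out.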

Citations: 0
The Extra-Islet Pancreas Supports Autoimmunity in Human Type 1 Diabetes.
Pub Date : 2024-12-14 DOI: 10.1101/2023.03.15.23287145
G L Barlow, C M Schürch, S S Bhate, D Phillips, A Young, S Dong, H A Martinez, G Kaber, N Nagy, S Ramachandran, J Meng, E Korpos, J A Bluestone, G P Nolan, P L Bollyky

In autoimmune Type 1 diabetes (T1D), immune cells infiltrate and destroy the islets of Langerhans - islands of endocrine tissue dispersed throughout the pancreas. However, the contribution of cellular programs outside islets to insulitis is unclear. Here, using CO-Detection by indEXing (CODEX) tissue imaging and cadaveric pancreas samples, we simultaneously examine islet and extra-islet inflammation in human T1D. We identify four sub-states of inflamed islets characterized by the activation profiles of CD8+ T cells enriched in islets relative to the surrounding tissue. We further find that the extra-islet space of lobules with extensive islet infiltration differs from the extra-islet space of less infiltrated areas within the same tissue section. Finally, we identify lymphoid structures away from islets enriched in CD45RA+ T cells - a population also enriched in one of the inflamed islet sub-states. Together, these data help define the coordination between islets and the extra-islet pancreas in the pathogenesis of human T1D.

Citations: 0
Keyphrase Identification Using Minimal Labeled Data with Hierarchical Contexts and Transfer Learning.
Pub Date : 2024-11-18 DOI: 10.1101/2023.01.26.23285060
Rohan Goli, Keerthana Komatineni, Shailesh Alluri, Nina Hubig, Hua Min, Yang Gong, Dean F Sittig, Lior Rennert, David Robinson, Paul Biondich, Adam Wright, Christian Nøhr, Timothy Law, Arild Faxvaag, Aneesa Weaver, Ronald Gimbel, Xia Jing

Background: Interoperable clinical decision support system (CDSS) rules provide a pathway to interoperability, a well-recognized challenge in health information technology. Building an ontology facilitates creating interoperable CDSS rules, which can be achieved by identifying the keyphrases (KP) from the existing literature. Ontology construction is traditionally a manual effort by human domain experts, and the newly advanced natural language processing techniques, such as KP identification, can be a critical complementary automatic part of building ontology. However, KP identification requires human expertise, consensus, and contextual understanding for data labeling.

Methods: This paper presents a semi-supervised KP identification framework (long short-term memory-based encoders and conditional random field-based decoder models, BiLSTM-CRF) using minimal human-labeled data, based on hierarchical attention (i.e., at word, sentence, and abstract levels) over the documents and domain adaptation. We created synthetic labels for initial training and human-labeled data for fine-tuning. We also tested different options during NLP preprocessing and ML training to optimize the ML pipeline.

Results: Our method outperforms the prior neural architectures by learning through synthetic labels for initial training, document-level contextual learning, language modeling, and fine-tuning with limited gold standard label data. After comparison, we found that the BIO encoding schema performed slightly better than Blue, and domain adaptation techniques can improve the quality of synthetic labels. In addition, document-level context, pre-trained LM, and pre-trained WE all contributed to better model performance in our tasks. Adding 2 to 4 human-labeled documents for every 100 synthetic labeled documents improves the model performance without exhausting human-labeled documents too quickly.
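
For readers unfamiliar with the BIO encoding schema mentioned in the results, the sketch below shows the general idea: each token is tagged as the Beginning of a keyphrase, Inside one, or Outside any. The function name `bio_tags`, the `KP` label suffix, whitespace tokenization, and exact case-insensitive matching are assumptions for illustration only; the abstract does not describe how the authors generated or matched their labels.

```python
def bio_tags(tokens, keyphrases):
    """Tag tokens with B-KP / I-KP / O by exact, case-insensitive phrase match."""
    tags = ["O"] * len(tokens)
    lowered = [token.lower() for token in tokens]
    for phrase in keyphrases:
        words = phrase.lower().split()
        width = len(words)
        if width == 0:
            continue
        for start in range(len(tokens) - width + 1):
            span_is_free = all(tag == "O" for tag in tags[start:start + width])
            if span_is_free and lowered[start:start + width] == words:
                tags[start] = "B-KP"  # first token of the keyphrase
                for i in range(start + 1, start + width):
                    tags[i] = "I-KP"  # remaining tokens of the keyphrase
    return tags


tokens = "Clinical decision support rules improve interoperability".split()
print(bio_tags(tokens, ["clinical decision support", "interoperability"]))
# ['B-KP', 'I-KP', 'I-KP', 'O', 'O', 'B-KP']
```

A sequence model such as a BiLSTM-CRF is then trained to predict one of these tags per token.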

Conclusions: To the best of our knowledge, this is the first functional framework for the CDSS sub-domain to identify KPs that is trained on limited human-labeled data. It contributes to the general natural language processing (NLP) architectures in areas such as clinical NLP, where manual data labeling is challenging, and lightweight deep learning models play an important role in real-time KP identification as a complementary approach to human experts' effort.

Citations: 0
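Advancing Efficacy Prediction for EHR-based Emulated Trials in Repurposing Heart Failure Therapies. (Chinese listing title, translated: Artificial intelligence-based efficacy prediction for phase 3 heart failure clinical trials.)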
Pub Date : 2024-11-01 DOI: 10.1101/2023.05.25.23290531
Nansu Zong, Shaika Chowdhury, Shibo Zhou, Sivaraman Rajaganapathy, Yue Yu, Liewei Wang, Qiying Dai, Pengyang Li, Xiaoke Liu, Suzette J Bielinski, Jun Chen, Yongbin Chen, James R Cerhan

Introduction: The high mortality rates associated with heart failure (HF) have propelled the strategy of drug repurposing, which seeks new therapeutic uses for existing, approved drugs to enhance the management of HF symptoms effectively. An emerging trend focuses on utilizing real-world data, like EHR, to mimic randomized controlled trials (RCTs) for evaluating treatment outcomes through what are known as emulated trials (ET). Nonetheless, the intricacies inherent in EHR data (detailed patient histories in databases, the omission of certain biomarkers or specific diagnostic tests, and partial records of symptoms) introduce notable discrepancies between EHR data and the stringent standards of RCTs. This gap poses a substantial challenge in conducting an ET to accurately predict treatment efficacy.

Objective: The objective of this research is to predict the efficacy of drugs repurposed for HF in randomized trials by leveraging EHR in ET.

Methods: We proposed an ET framework to predict drug efficacy, integrating target prediction based on biomedical databases with statistical analysis using EHR data. Specifically, we developed a novel target prediction model that learns low-dimensional representations of drug molecules, protein sequences, and diverse biomedical associations from a knowledge graph. Additionally, we crafted strategies to improve the prediction by considering the interactions between HF drugs and biological factors in the context of HF prognostic markers.

Results: Our validation of the drug-target prediction model against the BETA benchmark demonstrated superior performance, with an average AUC-ROC of 97.7%, PRAUC of 97.4%, F1 score of 93.1%, and a General Score of 96.1%, surpassing existing baseline algorithms. Further analysis of our ET framework on identifying 17 repurposed drugs (derived from 266 phase 3 HF RCTs) using data from 59,000 patients at the Mayo Clinic highlighted the framework's remarkable predictive accuracy. This analysis took into account various factors such as biological variables (e.g., gender, age, ethnicity), HF medications (e.g., ACE inhibitors, beta-blockers, ARBs, loop diuretics), types of HF (HFpEF and HFrEF), confounders, and prognostic markers (e.g., NT-proBNP, BUN, creatinine, and hemoglobin). The ET framework significantly improved the accuracy compared to the baseline efficacy analysis that utilized EHR data. Notably, the best results improved in AUC-ROC from 75.71% to 93.57% and in PRAUC from 78.66% to 90.34%, compared to the baseline models.
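
The PRAUC figures above summarize a precision-recall curve in a single number. One common estimator is average precision, sketched below in plain Python. Whether the authors used this estimator or another interpolation is not stated in the abstract, so treat the choice, the function name, and the toy labels and scores as assumptions; the sketch also assumes no tied scores.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision values at each true positive,
    walking the predictions from highest to lowest score."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / positives


# Toy example: 1 = drug judged efficacious, 0 = not (made-up values).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(round(average_precision(labels, scores), 4))  # 0.8333
```

Unlike AUC-ROC, this score depends on the proportion of positives, which is one reason papers often report both.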

Conclusion: Our study presents an ET framework that significantly enhances drug efficacy emulation by integrating EHR-based analysis with target prediction. We demonstrated substantial success in predicting the efficacy of 17 HF drugs repurposed for phase 3 RCTs, showcasing the framework's potential in advancing HF treatment strategies.

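Chinese-language summary (translated): Introduction: Drug repurposing involves finding new therapeutic uses for already approved drugs, which can save costs because their pharmacokinetics and pharmacodynamics are already known. Predicting efficacy based on clinical endpoints is valuable for designing phase 3 trials and making decisions, given the potential confounding effects in phase 2. Objective: This study aims to predict the efficacy of repurposed heart failure (HF) drugs in phase 3 clinical trials. Methods: Our study provides a comprehensive framework for predicting drug efficacy in phase 3 trials, combining drug-target prediction using biomedical knowledge bases with statistical analysis of real-world data. We developed a novel drug-target prediction model that uses low-dimensional representations of drug chemical structures and gene sequences together with biomedical knowledge bases. In addition, we performed statistical analyses of electronic health records to assess the effectiveness of repurposed drugs in relation to clinical measurements (e.g., NT-proBNP). Results: From 266 phase 3 clinical trials, we identified 24 repurposed drugs for heart failure (9 positive and 15 non-positive). We used 25 HF-related genes for drug-target prediction and screened electronic health records (EHR) from the Mayo Clinic, which include more than 58,000 HF patients treated with various drugs and categorized by HF subtype. Compared with six cutting-edge baseline methods, our proposed drug-target prediction model performed very well on all seven tests of the BETA benchmark (i.e., it performed best in 266 of 404 tasks). For the overall prediction of the 24 drugs, our model achieved an AUROC of 82.59% and a PRAUC (average precision) of 73.39%. Conclusion: The study achieved excellent results in predicting the efficacy of repurposed drugs in phase 3 clinical trials, highlighting the potential of this approach to advance computational drug repurposing.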
Citations: 0
Novel autoantibody targets identified in patients with autoimmune hepatitis (AIH) by PhIP-Seq reveals pathogenic insights.
Pub Date : 2024-10-29 DOI: 10.1101/2023.06.12.23291297
Arielle Klepper, James Asaki, Andrew F Kung, Sara E Vazquez, Aaron Bodansky, Anthea Mitchell, Sabrina A Mann, Kelsey Zorn, Isaac Avila-Vargas, Swathi Kari, Melawit Tekeste, Javier Castro, Briton Lee, Maria Duarte, Mandana Khalili, Monica Yang, Paul Wolters, Jennifer Price, Emily Perito, Sandy Feng, Jacquelyn J Maher, Jennifer C Lai, Christina Weiler-Normann, Ansgar W Lohse, Joseph DeRisi, Michele Tana

Background and aims: Autoimmune hepatitis (AIH) is a severe disease characterized by elevated immunoglobulin levels. However, the role of autoantibodies in the pathophysiology of AIH remains uncertain.

Methods: Phage Immunoprecipitation-Sequencing (PhIP-seq) was employed to identify autoantibodies in the serum of patients with AIH (n = 115), compared to patients with other liver diseases (metabolic associated steatotic liver disease (MASH), n = 178; primary biliary cholangitis (PBC), n = 26) or healthy controls (n = 94).

Results: Logistic regression using PhIP-seq enriched peptides as inputs yielded a classification AUC of 0.81, indicating the presence of a predictive humoral immune signature for AIH. Embedded within this signature were disease-relevant targets, including SLA/LP, the target of a well-recognized autoantibody in AIH, disco interacting protein 2 homolog A (DIP2A), and the relaxin family peptide receptor 1 (RXFP1). The autoreactive fragment of DIP2A was a 9-amino acid stretch nearly identical to the U27 protein of human herpes virus 6 (HHV-6). Fine mapping of this epitope suggests the HHV-6 U27 sequence is preferentially enriched relative to the corresponding DIP2A sequence. Antibodies against RXFP1, a receptor involved in anti-fibrotic signaling, were also highly specific to AIH. The enriched peptides are within a motif adjacent to the receptor binding domain, required for signaling, and serum from AIH patients positive for anti-RXFP1 antibody was able to significantly inhibit relaxin-2 signaling. Depletion of IgG from anti-RXFP1 positive serum abrogated this effect.
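
To make the reported classification AUC concrete: the area under the ROC curve equals the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, with ties counted as half. The sketch below computes it that way by direct pair counting. The function name and the toy scores are illustrative assumptions and have no connection to the study's data or pipeline.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(case_scores) * len(control_scores))


# Toy scores: 1 = AIH, 0 = control (made-up values).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

On this reading, an AUC of 0.81 means a case outranks a control in roughly four of five random pairs.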

Conclusions: These data provide evidence for a novel serological profile in AIH, including a possible functional role for anti-RXFP1, and antibodies that cross-react with HHV-6 U27 protein.

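Chinese-language summary (translated): Autoimmune hepatitis (AIH) is a severe autoimmune disease characterized by the presence of autoantibodies. However, the role of autoantibodies in the pathophysiology of AIH remains uncertain. Here, we employed Phage Immunoprecipitation-Sequencing (PhIP-Seq) to identify novel autoantibodies in AIH. Using these results, a logistic regression classifier was able to predict which patients had AIH, indicating the presence of a distinct humoral immune signature. To further investigate the autoantibodies most specific to AIH, significant peptides were identified relative to a broad control group (298 patients with non-alcoholic fatty liver disease (NAFLD) or primary biliary cholangitis (PBC), or healthy controls). Top-ranked autoreactive targets included SLA, the target of a well-recognized autoantibody in AIH, and disco interacting protein 2 homolog A (DIP2A). The autoreactive fragment of DIP2A shares a 9-amino acid stretch nearly identical to the U27 protein of HHV-6B, a virus found in the liver. In addition, antibodies against peptides derived from the leucine-rich repeat N-terminal (LRRNT) domain of the relaxin family peptide receptor 1 (RXFP1) were highly enriched and specific to AIH. The enriched peptides map to a motif adjacent to the receptor binding domain, which is required for RXFP1 signaling. RXFP1 is a G protein-coupled receptor that binds relaxin-2, an anti-fibrotic molecule that reduces the myofibroblastic phenotype of hepatic stellate cells. Eight of nine patients with antibodies to RXFP1 had evidence of advanced fibrosis (F3 or greater). Furthermore, serum from AIH patients positive for anti-RXFP1 antibody was able to significantly inhibit relaxin-2 signaling in the human monocytic cell line THP1. Depletion of IgG from anti-RXFP1 positive serum abrogated this effect. These data provide supporting evidence that HHV-6 plays a role in the development of AIH and point to a potential pathogenic role for anti-RXFP1 IgG in some patients. Identification of anti-RXFP1 in patient serum may enable risk stratification of AIH patients for fibrosis progression and lead to the development of new strategies for disease intervention.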
Citations: 0
Genetic insights into the association between inflammatory bowel disease and Alzheimer's disease.
Pub Date : 2024-10-25 DOI: 10.1101/2023.04.17.23286845
Lu Zeng, Charles C White, David A Bennett, Hans-Ulrich Klein, Philip L De Jager

Myeloid cells, including monocytes, macrophages, microglia, dendritic cells and neutrophils, are part of the innate immune system, playing a major role in orchestrating innate and adaptive immune responses. Both Alzheimer's disease (AD) and inflammatory bowel disease (IBD) susceptibility loci are enriched for genes expressed in myeloid cells, but it is not clear whether these myeloid risk factors are shared between the two diseases. Leveraging results of genome-wide association studies, we investigated the causal effect of IBD (including ulcerative colitis (UC) and Crohn's disease (CD)) variants on AD and its endophenotypes. Microglia and monocyte expression quantitative trait loci (eQTLs) were used to examine the functional consequences of IBD and AD variants. Our results revealed distinct sets of genes and pathways of AD and IBD susceptibility loci. Specifically, AD loci are enriched for microglial eQTLs, while IBD loci are enriched for monocyte eQTLs. However, we also found that genetically determined IBD is associated with a protective effect against AD (p<0.03). Yet, a genetic propensity for the CD subtype is associated with increased amyloid accumulation (beta=7.14, p-value=0.02) and susceptibility to AD. Susceptibility to UC was associated with increased deposition of TDP-43 (beta=7.58, p-value=6.11×10^-4). The relation of these gastrointestinal inflammatory diseases to AD is therefore complex; while the different subsets of susceptibility variants preferentially affect different myeloid cell subtypes, there do appear to be certain shared pathways, and the possible protective effect of IBD susceptibility on the risk of AD may provide therapeutic insights.
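
Causal effects estimated from genome-wide association results are often obtained with Mendelian randomization. The abstract does not name the estimator used, so the sketch below is only an assumption about one standard option, the fixed-effect inverse-variance weighted (IVW) estimate, which combines per-variant effects on the exposure (here, IBD) and on the outcome (here, AD). All names and numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    Each list holds one entry per genetic variant: the variant's effect
    on the exposure, its effect on the outcome, and the standard error
    of the outcome effect.
    """
    triples = list(zip(beta_exposure, beta_outcome, se_outcome))
    numerator = sum(bx * by / se ** 2 for bx, by, se in triples)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in triples)
    if denominator == 0:
        raise ValueError("exposure effects must not all be zero")
    return numerator / denominator, math.sqrt(1 / denominator)


# Made-up summary statistics for two variants (not from the paper).
estimate, std_error = ivw_estimate([0.1, 0.2], [-0.05, -0.1], [0.01, 0.02])
print(estimate, std_error)
```

A negative estimate in such a setup would point toward a protective direction, but real analyses add sensitivity checks for pleiotropy that this sketch omits.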

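Chinese-language summary (translated): Background: Myeloid cells, including monocytes, macrophages, microglia, dendritic cells, and neutrophils, are part of innate immunity and play a major role in orchestrating innate and adaptive immune responses. Microglia are the resident myeloid cells of the central nervous system, and many Alzheimer's disease (AD) risk loci are found in or near genes that are highly or sometimes uniquely expressed in myeloid cells. Similarly, inflammatory bowel disease (IBD) loci are also enriched for genes expressed in myeloid cells. However, the extent of overlap between AD and IBD susceptibility loci in myeloid cells remains unclear, and the large body of IBD genetic mapping may help accelerate AD research. Methods: Here, we leveraged summary statistics from large-scale genome-wide association studies (GWAS) to investigate the causal effect of IBD (including ulcerative colitis and Crohn's disease) variants on AD and AD endophenotypes. Microglia and monocyte expression quantitative trait loci (eQTLs) were used to examine the functional consequences of IBD and AD risk variant enrichment in two different myeloid cell subtypes. Results: Our results show that, although PTK2B is implicated in both diseases and both sets of risk loci are enriched for myeloid genes, AD and IBD susceptibility loci largely involve distinct genes and pathways. AD loci are significantly more enriched for microglial eQTLs than IBD loci. We also found that genetically determined IBD is associated with a lower risk of AD, which may be driven by a negative effect on neurofibrillary tangle accumulation (β=1.04, p=0.013). In addition, IBD showed significant positive genetic correlations with psychiatric disorders and multiple sclerosis, and AD showed a significant positive genetic correlation with amyotrophic lateral sclerosis. Conclusions: To our knowledge, this is the first study to systematically contrast the genetic associations between IBD and AD. Our findings highlight a possible genetically protective effect of IBD on AD, even though the majority of effects of the two sets of disease variants on myeloid cell gene expression are distinct. Thus, IBD myeloid studies may not help accelerate AD functional studies, but our observations reinforce the role of myeloid cells in the accumulation of tau proteinopathy and provide a new avenue for discovering a protective factor.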
Citations: 0
Common- and rare-variant genetic architecture of heart failure across the allele frequency spectrum.
Pub Date : 2024-10-23 DOI: 10.1101/2023.07.16.23292724
David S M Lee, Kathleen M Cardone, David Y Zhang, Noah L Tsao, Sarah Abramowitz, Pranav Sharma, John S DePaolo, Mitchell Conery, Krishna G Aragam, Kiran Biddinger, Ozan Dilitikas, Lily Hoffman-Andrews, Renae L Judy, Atlas Khan, Iftikhar Kulo, Megan J Puckelwartz, Nosheen Reza, Benjamin A Satterfield, Pankhuri Singhal, Zoltan P Arany, Thomas P Cappola, Eric Carruth, Sharlene M Day, Ron Do, Christopher M Haggarty, Jacob Joseph, Elizabeth M McNally, Girish Nadkarni, Anjali T Owens, Daniel J Rader, Marylyn D Ritchie, Yan V Sun, Benjamin F Voight, Michael G Levin, Scott M Damrauer

Heart failure (HF) is a complex trait, influenced by environmental and genetic factors, which affects over 30 million individuals worldwide. Historically, the genetics of HF have been studied in Mendelian forms of disease, where rare genetic variants have been linked to familial cardiomyopathies. More recently, genome-wide association studies (GWAS) have successfully identified common genetic variants associated with risk of HF. However, the relative importance of genetic variants across the allele-frequency spectrum remains incompletely characterized. Here, we report the results of common- and rare-variant association studies of all-cause heart failure, applying recently developed methods to quantify the heritability of HF attributable to different classes of genetic variation. We combine GWAS data across multiple populations including 207,346 individuals with HF and 2,151,210 without, identifying 176 risk loci at genome-wide significance (P-value < 5×10⁻⁸). Signals at newly identified common-variant loci include coding variants in Mendelian cardiomyopathy genes (MYBPC3, BAG3) and in regulators of lipoprotein (LPL) and glucose metabolism (GIPR, GLP1R). These signals are enriched in myocyte and adipocyte cell types and can be clustered into 5 broad modules based on pleiotropic associations with anthropometric traits/obesity, blood pressure/renal function, atherosclerosis/lipids, immune activity, and arrhythmias. Gene burden studies across three biobanks (PMBB, UKB, AOU), including 27,208 individuals with HF and 349,126 without, uncover exome-wide significant (P-value < 1.57×10⁻⁶) associations between HF and rare predicted loss-of-function (pLoF) variants in TTN, MYBPC3, FLNC, and BAG3. Total burden heritability of rare coding variants (2.2%, 95% CI 0.99-3.5%) is highly concentrated in a small set of Mendelian cardiomyopathy genes, while common variant heritability (4.3%, 95% CI 3.9-4.7%) is more diffusely spread throughout the genome.
Finally, we show that common-variant background, in the form of a polygenic risk score (PRS), significantly modifies the risk of HF among carriers of pathogenic truncating variants in the Mendelian cardiomyopathy gene TTN. Together, these findings provide a genetic link between dysregulated metabolism and HF, and suggest a significant polygenic component to HF exists that is not captured by current clinical genetic testing.
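The abstract does not say how the polygenic risk score was built. As a minimal sketch, assuming the most common additive form (effect-allele dosage multiplied by a per-variant weight, summed over variants), a score could be computed as follows. The function name, weights, and dosages are hypothetical illustrations, not values from the study.

```python
def polygenic_risk_score(dosages, weights):
    """Additive score: sum of effect-allele dosage (0, 1, or 2) times weight."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-variant weights (e.g. log odds ratios) and one person's dosages.
example_weights = [0.10, -0.05, 0.20]
example_dosages = [2, 1, 0]
example_score = polygenic_risk_score(example_dosages, example_weights)
```

In practice such scores are usually standardized within a reference population before being compared between carriers and non-carriers.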

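Heart failure (HF) is a complex trait influenced by environmental and genetic factors that affects more than 30 million people worldwide. Historically, the genetics of HF have been studied in Mendelian forms of disease, where rare genetic variants have been linked to familial cardiomyopathies. More recently, genome-wide association studies (GWAS) have successfully identified common genetic variants associated with HF risk. However, the relative importance of genetic variants across the allele-frequency spectrum remains incompletely characterized. Here, we report the results of common- and rare-variant association studies of all-cause heart failure, applying recently developed methods to quantify the heritability of HF attributable to different classes of genetic variation. We combined GWAS data across multiple populations, including 207,346 individuals with HF and 2,151,210 without, identifying 176 risk loci at genome-wide significance (p<5×10⁻⁸). Signals at newly identified common-variant loci include coding variants in Mendelian cardiomyopathy genes (MYBPC3, BAG3) and in regulators of lipoprotein (LPL) and glucose metabolism (GIPR, GLP1R), and are enriched in cardiac, muscle, nerve, and vascular tissues as well as in myocyte and adipocyte cell types. Gene burden studies across three biobanks (PMBB, UKB, AOU), including 27,208 individuals with HF and 349,126 without, uncover exome-wide significant (p<3.15×10⁻⁶) associations between HF and rare predicted loss-of-function (pLoF) variants in TTN, MYBPC3, FLNC, and BAG3. Total burden heritability of rare coding variants (2.2%, 95% CI 0.99-3.5%) is highly concentrated in a small set of Mendelian cardiomyopathy genes and is lower than common-variant heritability (4.3%, 95% CI 3.9-4.7%), which is spread more diffusely throughout the genome. Finally, we show that common-variant background, in the form of a polygenic risk score (PRS), significantly modifies the risk of HF among carriers of pathogenic truncating variants in the Mendelian cardiomyopathy gene TTN. These findings suggest that a significant polygenic component to HF exists that is not captured by current clinical genetic testing.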
Citations: 0
TGIRT-seq of Inflammatory Breast Cancer Tumor and Blood Samples Reveals Widespread Enhanced Transcription Impacting RNA Splicing and Intronic RNAs in Plasma. Identifying inflammatory breast cancer biomarkers by simultaneous TGIRT-seq profiling of coding and non-coding RNAs in tumor and blood.
Pub Date : 2024-10-22 DOI: 10.1101/2023.05.26.23290469
Dennis Wylie, Xiaoping Wang, Jun Yao, Hengyi Xu, Elizabeth A Ferrick-Kiddie, Toshiaki Iwase, Savitri Krishnamurthy, Naoto T Ueno, Alan M Lambowitz

Inflammatory breast cancer (IBC) is the most aggressive and lethal breast cancer subtype but lacks unequivocal genomic differences or robust biomarkers that differentiate it from non-IBC. Here, Thermostable Group II intron Reverse Transcriptase RNA-sequencing (TGIRT-seq) revealed myriad differences in tumor samples, Peripheral Blood Mononuclear Cells (PBMCs), and plasma that distinguished IBC from non-IBC patients and healthy donors across all tested receptor-based subtypes. These included numerous differentially expressed protein-coding gene and non-coding RNAs in all three sample types, a granulocytic immune response in IBC PBMCs, and over-expression of antisense RNAs, suggesting widespread enhanced transcription in both IBC tumors and PBMCs. By using TGIRT-seq to quantitate Intron-exon Depth Ratios (IDRs) and mapping reads to both genome and transcriptome reference sequences, we developed methods for parallel analysis of transcriptional and post-transcriptional gene regulation. This analysis identified numerous differentially and non-differentially expressed protein-coding genes in IBC tumors and PBMCs with high IDRs, the latter reflecting rate-limiting RNA splicing that negatively impacts mRNA production. Mirroring gene expression differences in tumors and PBMCs, over-represented protein-coding gene RNAs in IBC patient plasma were largely intronic RNAs, while those in non-IBC patients and healthy donor plasma were largely mRNA fragments. Potential IBC biomarkers in plasma included T-cell receptor pre-mRNAs and intronic, LINE-1, and antisense RNAs. Our findings provide new insights into IBC and set the stage for monitoring disease progression and response to treatment by liquid biopsy. The methods developed for parallel transcriptional and post-transcriptional gene regulation analysis have potentially broad RNA-seq and clinical applications.
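The abstract names the Intron-exon Depth Ratio (IDR) but does not give its formula. One plausible reading, assumed here purely for illustration, is the mean per-base read depth over a gene's introns divided by the mean per-base depth over its exons. The function name and the depth values below are hypothetical.

```python
def intron_exon_depth_ratio(intron_depths, exon_depths):
    """Mean per-base intron depth divided by mean per-base exon depth."""
    if not intron_depths or not exon_depths:
        raise ValueError("need at least one intron and one exon position")
    mean_intron = sum(intron_depths) / len(intron_depths)
    mean_exon = sum(exon_depths) / len(exon_depths)
    if mean_exon == 0:
        raise ValueError("exon depth is zero; the ratio is undefined")
    return mean_intron / mean_exon


# Hypothetical per-base depths for a single gene.
example_idr = intron_exon_depth_ratio([2, 4, 6], [10, 10, 20, 40])
```

Under this reading, a high ratio means that unspliced intronic sequence is abundant relative to exonic sequence, consistent with the abstract's interpretation of high IDRs as rate-limiting splicing.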

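Inflammatory breast cancer (IBC) is the most aggressive and lethal breast cancer subtype, but biomarker identification for it has lagged. Here, we used an improved Thermostable Group II intron Reverse Transcriptase RNA-sequencing (TGIRT-seq) method to simultaneously profile coding and non-coding RNAs from tumors, PBMCs, and plasma of IBC and non-IBC patients and healthy donors. In addition to RNAs from known IBC-associated genes, we identified hundreds of other over-expressed coding and non-coding RNAs (p≤0.001) in IBC tumors and PBMCs, including a higher proportion with elevated intron-exon depth ratios (IDRs), which likely reflects enhanced transcription leading to the accumulation of intronic RNAs. Consequently, differentially represented protein-coding gene RNAs in IBC plasma were largely intronic RNA fragments, whereas those in healthy donor and non-IBC plasma were largely fragmented mRNAs. Potential IBC biomarkers in plasma included T-cell receptor pre-mRNA fragments traced to IBC tumors and PBMCs; intronic RNA fragments associated with high-IDR genes; and LINE-1 and other retroelement RNAs that we found to be globally up-regulated in IBC and preferentially enriched in plasma. Our findings provide new insights into IBC and demonstrate the advantages of broadly analyzing transcriptomes for biomarker identification. The RNA-seq and data analysis methods developed for this study may be broadly applicable to other diseases.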
Citations: 0
A Cross-Modal Mutual Knowledge Distillation Framework for Alzheimer's Disease Diagnosis: Addressing Incomplete Modalities. A mutual knowledge distillation-empowered AI framework for early detection of Alzheimer's disease using incomplete multimodal images.
Pub Date : 2024-10-22 DOI: 10.1101/2023.08.24.23294574
Min Gu Kwak, Lingchao Mao, Zhiyang Zheng, Yi Su, Fleming Lure, Jing Li

Early detection of Alzheimer's Disease (AD) is crucial for timely interventions and optimizing treatment outcomes. Despite the promise of integrating multimodal neuroimages such as MRI and PET, handling datasets with incomplete modalities remains under-researched. This phenomenon, however, is common in real-world scenarios as not every patient has all modalities due to practical constraints such as cost, access, and safety concerns. We propose a deep learning framework employing cross-modal Mutual Knowledge Distillation (MKD) to model different sub-cohorts of patients based on their available modalities. In MKD, the multimodal model (e.g., MRI and PET) serves as a teacher, while the single-modality model (e.g., MRI only) is the student. Our MKD framework features three components: a Modality-Disentangling Teacher (MDT) model designed through information disentanglement, a student model that learns from classification errors and MDT's knowledge, and the teacher model enhanced via distilling the student's single-modal feature extraction capabilities. Moreover, we show the effectiveness of the proposed method through theoretical analysis and validate its performance with simulation studies. In addition, our method is demonstrated through a case study with Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets, underscoring the potential of artificial intelligence in addressing incomplete multimodal neuroimaging datasets and advancing early AD detection.
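The student objective described above (learning from classification errors and from the teacher's knowledge) can be sketched with a generic distillation loss. This is a standard temperature-scaled formulation assumed for illustration, not the authors' MKD implementation; the names `student_loss`, `alpha`, and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Weighted sum of the classification loss and the distillation loss."""
    # Classification term: cross-entropy against the true label.
    ce = -math.log(softmax(student_logits)[label])
    # Distillation term: match the teacher's softened class probabilities.
    kd = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return (1 - alpha) * ce + alpha * kd


# Hypothetical logits for one patient: MRI-only student vs MRI+PET teacher.
example_loss = student_loss([2.0, 0.0], [3.0, -1.0], label=0)
```

The distillation term vanishes when the student already reproduces the teacher's distribution, so `alpha` controls how strongly the multimodal teacher shapes the single-modality student.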

Note to practitioners: This paper was motivated by the challenge of early AD diagnosis, particularly in scenarios where clinicians encounter varied availability of patient imaging data, such as MRI and PET scans, often constrained by cost or accessibility issues. We propose an incomplete multimodal learning framework that produces tailored models for patients with only MRI and patients with both MRI and PET. This approach improves the accuracy and effectiveness of early AD diagnosis, especially when imaging resources are limited, via bi-directional knowledge transfer. We introduced a teacher model that prioritizes extracting common information between different modalities, significantly enhancing the student model's learning process. This paper includes theoretical analysis, a simulation study, and a real-world case study to illustrate the method's promising potential in early AD detection. However, practitioners should be mindful of the complexities involved in model tuning. Future work will focus on improving model interpretability and expanding its application. This includes developing methods to discover the key brain regions for predictions, enhancing clinical trust, and extending the framework to incorporate a broader range of imaging modalities, demographic information, and clinical data. These advancements aim to provide a more comprehensive view of patient health and improve diagnostic accuracy across various neurodegenerative diseases.

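Early detection of Alzheimer's disease (AD) is crucial to ensure timely interventions and optimize treatment outcomes for patients. While integrating multimodal neuroimages such as MRI and PET shows great promise, research on effectively handling incomplete multimodal image datasets in such integration is limited. To this end, we propose a deep learning-based framework that uses Mutual Knowledge Distillation (MKD) to jointly model different sub-cohorts based on their respective available image modalities. In MKD, the model with more modalities (e.g., MRI and PET) is treated as the teacher, while the model with fewer modalities (e.g., MRI only) is treated as the student. Our proposed MKD framework comprises three key components. First, we design a student-oriented teacher model, the Student-oriented Multimodal Teacher (SMT), through multimodal information disentanglement. Second, we train the student model not only to minimize its classification error but also to learn from the SMT teacher. Third, we update the teacher model by transfer learning from the student's feature extractor, because the student model is trained with more samples. Evaluation on Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets highlights the effectiveness of our method. Our work demonstrates the potential of using artificial intelligence to address the challenge of incomplete multimodal neuroimage datasets, opening new avenues for advancing early AD detection and treatment strategies.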
Citations: 0
GestaltMatcher Database - A global reference for facial phenotypic variability in rare human diseases. GestaltMatcher Database - a FAIR database for medical imaging data of rare disorders.
Pub Date : 2024-10-08 DOI: 10.1101/2023.06.06.23290887
Hellen Lesmann, Alexander Hustinx, Shahida Moosa, Hannah Klinkhammer, Elaine Marchi, Pilar Caro, Ibrahim M Abdelrazek, Jean Tori Pantel, Merle Ten Hagen, Meow-Keong Thong, Rifhan Azwani Binti Mazlan, Sok Kun Tae, Tom Kamphans, Wolfgang Meiswinkel, Jing-Mei Li, Behnam Javanmardi, Alexej Knaus, Annette Uwineza, Cordula Knopp, Tinatin Tkemaladze, Miriam Elbracht, Larissa Mattern, Rami Abou Jamra, Clara Velmans, Vincent Strehlow, Maureen Jacob, Angela Peron, Cristina Dias, Beatriz Carvalho Nunes, Thainá Vilella, Isabel Furquim Pinheiro, Chong Ae Kim, Maria Isabel Melaragno, Hannah Weiland, Sophia Kaptain, Karolina Chwiałkowska, Miroslaw Kwasniewski, Ramy Saad, Sarah Wiethoff, Himanshu Goel, Clara Tang, Anna Hau, Tahsin Stefan Barakat, Przemysław Panek, Amira Nabil, Julia Suh, Frederik Braun, Israel Gomy, Luisa Averdunk, Ekanem Ekure, Gaber Bergant, Borut Peterlin, Claudio Graziano, Nagwa Gaboon, Moisés Fiesco-Roa, Alessandro Mauro Spinelli, Nina-Maria Wilpert, Prasit Phowthongkum, Nergis Güzel, Tobias B Haack, Rana Bitar, Andreas Tzschach, Agusti Rodriguez-Palmero, Theresa Brunet, Sabine Rudnik-Schöneborn, Silvina Noemi Contreras-Capetillo, Ava Oberlack, Carole Samango-Sprouse, Teresa Sadeghin, Margaret Olaya, Konrad Platzer, Artem Borovikov, Franziska Schnabel, Lara Heuft, Vera Herrmann, Renske Oegema, Nour Elkhateeb, Sheetal Kumar, Katalin Komlosi, Khoushoua Mohamed, Silvia Kalantari, Fabio Sirchia, Antonio F Martinez-Monseny, Matthias Höller, Louiza Toutouna, Amal Mohamed, Amaia Lasa-Aranzasti, John A Sayer, Nadja Ehmke, Magdalena Danyel, Henrike Sczakiel, Sarina Schwartzmann, Felix Boschann, Max Zhao, Ronja Adam, Lara Einicke, Denise Horn, Kee Seang Chew, Choy Chen Kam, Miray Karakoyun, Ben Pode-Shakked, Aviva Eliyahu, Rachel Rock, Teresa Carrion, Odelia Chorin, Yuri A Zarate, Marcelo Martinez Conti, Mert Karakaya, Moon Ley Tung, Bharatendu Chandra, Arjan Bouman, Aime Lumaka, Naveed Wasif, Marwan Shinawi, Patrick R Blackburn, Tianyun Wang, Tim Niehues, Axel 
Schmidt, Regina Rita Roth, Dagmar Wieczorek, Ping Hu, Rebekah L Waikel, Suzanna E Ledgister Hanchard, Gehad Elmakkawy, Sylvia Safwat, Frédéric Ebstein, Elke Krüger, Sébastien Küry, Stéphane Bézieau, Annabelle Arlt, Eric Olinger, Felix Marbach, Dong Li, Lucie Dupuis, Roberto Mendoza-Londono, Sofia Douzgou Houge, Denisa Weis, Brian Hon-Yin Chung, Christopher C Y Mak, Hülya Kayserili, Nursel Elcioglu, Ayca Aykut, Peli Özlem Şimşek-Kiper, Nina Bögershausen, Bernd Wollnik, Heidi Beate Bentzen, Ingo Kurth, Christian Netzer, Aleksandra Jezela-Stanek, Koen Devriendt, Karen W Gripp, Martin Mücke, Alain Verloes, Christian P Schaaf, Christoffer Nellåker, Benjamin D Solomon, Markus M Nöthen, Ebtesam Abdalla, Gholson J Lyon, Peter M Krawitz, Tzung-Chien Hsieh

The most important factor that complicates the work of dysmorphologists is the significant phenotypic variability of the human face. Next-Generation Phenotyping (NGP) tools that assist clinicians with recognizing characteristic syndromic patterns are particularly challenged when confronted with patients from populations different from their training data. To that end, we systematically analyzed the impact of genetic ancestry on facial dysmorphism. For that purpose, we established the GestaltMatcher Database (GMDB) as a reference dataset for medical images of patients with rare genetic disorders from around the world. We collected 10,980 frontal facial images, more than a quarter of them previously unpublished, from 8,346 patients, representing 581 rare disorders. Although the predominant ancestry is still European (67%), data from underrepresented populations have been increased considerably via global collaborations (19% Asian and 7% African). This includes previously unpublished reports for more than 40% of the African patients. The NGP analysis on this diverse dataset revealed characteristic performance differences depending on the composition of training and test sets corresponding to genetic relatedness. For clinical use of NGP, incorporating non-European patients resulted in a profound enhancement of GestaltMatcher performance: the top-5 accuracy rate increased by 11.29%. Importantly, this improvement in delineating the correct disorder from a facial portrait was achieved without decreasing the performance on European patients. By design, GMDB complies with the FAIR principles by rendering the curated medical data findable, accessible, interoperable, and reusable. This means GMDB can also serve as data for training and benchmarking. In summary, our study on facial dysmorphism on a global sample revealed considerable cross-ancestral phenotypic variability confounding NGP that should be counteracted by international efforts for increasing data diversity.
GMDB will serve as a vital reference database for clinicians and a transparent training set for advancing NGP technology.
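The top-5 accuracy reported above counts a case as correct when the true disorder appears among the five highest-ranked suggestions. A minimal sketch of that metric follows, assuming each prediction is a list of disorder labels ordered from most to least likely; the function name and example labels are hypothetical and are not taken from GMDB.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=5):
    """Fraction of cases whose true label is among the first k ranked labels."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("one ranked list is required per case")
    if not true_labels:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked suggestions for four patients.
example_ranked = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
    ["A", "C", "B"],
]
example_truth = ["A", "A", "B", "B"]
example_top2 = top_k_accuracy(example_ranked, example_truth, k=2)
```

Evaluating the same function separately on subsets of cases grouped by ancestry is one way to compare performance across populations.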

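The value of computer-assisted image analysis has been demonstrated in several studies. The performance of AI tools such as GestaltMatcher improves with the size and diversity of the training set, but properly labeled training data is currently the biggest bottleneck in developing next-generation phenotyping (NGP) applications. We therefore developed the GestaltMatcher Database (GMDB), a database of machine-readable medical image data that complies with the FAIR principles and improves the openness and accessibility of scientific findings in medical genetics. An entry in GMDB consists of a medical image, such as a portrait, X-ray, or fundoscopy, and machine-readable meta-information, such as clinical features encoded in HPO terminology or a disease-causing mutation reported in HGVS format. Initially, data were collected mainly by curators gathering images from the literature. Currently, clinicians and individuals recruited from patient support groups contribute their previously unpublished data. For this patient-centered approach, we developed a digital consent form. GMDB is a modern publication medium for case reports that complements preprints, e.g., on medRxiv. To enable inter-cohort comparisons, we implemented a research feature in GMDB that computes pairwise syndromic similarity between hand-picked cases. Through a community-driven effort, we have collected images of more than 7,533 cases with 792 disorders in GMDB. Most of the data come from 2,058 publications. In addition, about 1,018 frontal images of 498 previously unpublished cases were obtained. The web interface enables gene- and phenotype-centered queries or infinite scrolling in the gallery. Digital consent has led to increasing adoption of the approach by patients. The research app within GMDB was used to generate syndromic similarity matrices to characterize two novel phenotypes (CSNK2B, PSMC3). GMDB is the first FAIR database for NGP, in which data are findable, accessible, interoperable, and reusable. It is a repository for medical images that cannot be included in medRxiv. This means GMDB connects clinicians with a shared interest in particular phenotypes and improves the performance of AI.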