Comprehensive evaluation of the implementation of episignatures for diagnosis of neurodevelopmental disorders (NDDs).

IF 3.8 | Zone 2 (Biology) | JCR Q2, GENETICS & HEREDITY | Human Genetics | Pub Date: 2023-12-01 | Epub Date: 2023-10-27 | DOI: 10.1007/s00439-023-02609-2
Edoardo Giuili, Robin Grolaux, Catarina Z N M Macedo, Laurence Desmyter, Bruno Pichon, Sebastian Neuens, Catheline Vilain, Catharina Olsen, Sonia Van Dooren, Guillaume Smits, Matthieu Defrance
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10676303/pdf/
Citations: 0

Abstract



Episignatures are popular tools for the diagnosis of rare neurodevelopmental disorders. They are commonly based on a set of differentially methylated CpGs used in combination with a support vector machine model. DNA methylation (DNAm) data often include missing values due to changes in data generation technology and batch effects. While many normalization methods exist for DNAm data, their impact on episignature performance has never been assessed. In addition, technologies to quantify DNAm evolve quickly, and this may lead to poor transposition of existing episignatures generated on deprecated array versions to new ones. Indeed, probe removal between array versions, between technologies, or during preprocessing leads to missing values. Thus, the effect of missing data on episignature performance must also be carefully evaluated and addressed through imputation or an innovative approach to episignature design. In this paper, we used data from patients with Kabuki and Sotos syndromes to evaluate the influence of normalization methods, classification models and missing data on the prediction performance of two existing episignatures. We compare how six popular normalization methods for methylation array data affect episignature classification performance in Kabuki and Sotos syndromes and provide best-practice suggestions for building new episignatures. In this setting, we show that the Illumina, Noob and Funnorm normalization methods achieved higher classification performance on the testing sets than the Quantile, Raw and Swan normalization methods. We further show that penalized logistic regression and support vector machines perform best in the classification of Kabuki and Sotos syndrome patients. Then, we describe a new paradigm to build episignatures based on the detection of differentially methylated regions (DMRs) and evaluate their performance compared to classical episignatures based on differentially methylated cytosines (DMCs) in the presence of missing data. We show that the performance of classical DMC-based episignatures suffers more from missing data than that of the DMR-based approach. We present a comprehensive evaluation of how the normalization of DNA methylation data affects episignature performance, using three popular classification models. We further evaluate how missing data affect those models' predictions. Finally, we propose a novel methodology to develop episignatures based on the identification of differentially methylated regions and show that this method slightly outperforms classical episignatures in the presence of missing data.
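
The contrast between DMC-based and DMR-based features under missing probes can be illustrated with a minimal sketch. The abstract does not say how probes within a region are aggregated, so averaging the observed probes is an assumption made here for illustration, and the probe IDs, region names and beta values are all hypothetical toy data rather than anything from the study.

```python
import math
from statistics import mean

# Hypothetical toy data: probe IDs mapped to beta values (0..1) for one sample.
sample = {"cg01": 0.82, "cg02": 0.79, "cg03": 0.85, "cg04": 0.20, "cg05": 0.25}

# Hypothetical regions: each DMR groups neighbouring probes.
regions = {"dmr_A": ["cg01", "cg02", "cg03"], "dmr_B": ["cg04", "cg05"]}


def drop_probes(betas, missing):
    """Simulate probe removal by setting the listed probes to NaN."""
    return {p: (math.nan if p in missing else v) for p, v in betas.items()}


def dmc_features(betas, probes):
    """One feature per CpG: a missing probe leaves a NaN that needs imputation."""
    return [betas.get(p, math.nan) for p in probes]


def dmr_features(betas, regions):
    """One feature per region: mean of the probes that are still observed.

    Assumption: simple averaging; the published method may aggregate differently.
    """
    feats = {}
    for name, probes in regions.items():
        observed = [
            betas[p] for p in probes if p in betas and not math.isnan(betas[p])
        ]
        feats[name] = mean(observed) if observed else math.nan
    return feats


# Remove two probes, as might happen when moving between array versions.
degraded = drop_probes(sample, missing={"cg02", "cg05"})
print(dmc_features(degraded, ["cg01", "cg02", "cg04", "cg05"]))
print(dmr_features(degraded, regions))
```

In this toy case the CpG-level feature vector contains NaN entries that a classifier cannot use without imputation, whereas each region still yields a value as long as at least one of its probes is observed. A region only becomes missing when all of its probes are lost.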

Source journal
Human Genetics (Biology: Genetics)
CiteScore: 10.80
Self-citation rate: 3.80%
Articles published: 94
Review time: 1 month
Journal description: Human Genetics is a monthly journal publishing original and timely articles on all aspects of human genetics. The Journal particularly welcomes articles in the areas of Behavioral genetics, Bioinformatics, Cancer genetics and genomics, Cytogenetics, Developmental genetics, Disease association studies, Dysmorphology, ELSI (ethical, legal and social issues), Evolutionary genetics, Gene expression, Gene structure and organization, Genetics of complex diseases and epistatic interactions, Genetic epidemiology, Genome biology, Genome structure and organization, Genotype-phenotype relationships, Human Genomics, Immunogenetics and genomics, Linkage analysis and genetic mapping, Methods in Statistical Genetics, Molecular diagnostics, Mutation detection and analysis, Neurogenetics, Physical mapping and Population Genetics. Articles reporting animal models relevant to human biology or disease are also welcome. Preference will be given to articles that address clinically relevant questions or provide new insights into human biology. Unless they report entirely novel and unusual aspects of a topic, the following will normally not be accepted: clinical case reports, cytogenetic case reports, papers on descriptive population genetics, articles dealing with the frequency of polymorphisms or additional mutations within genes in which numerous lesions have already been described, and papers that report meta-analyses of previously published datasets. The Journal typically will not consider for publication manuscripts that report merely the isolation, map position, structure, and tissue expression profile of a gene of unknown function unless the gene is of particular interest or is a candidate gene involved in a human trait or disorder.
Latest articles from this journal
Integrative genomic analyses identify neuroblastoma risk genes involved in neuronal differentiation.
VCAT: an integrated variant function annotation tools.
Structure-informed protein language models are robust predictors for variant effects.
Assessing predictions on fitness effects of missense variants in HMBS in CAGI6.
GBF1 deficiency causes cataracts in human and mouse.