Who's afraid of the X? Incorporating the X and Y chromosomes into the analysis of DNA methylation array data
Amy M Inkster, Martin T Wong, Allison M Matthews, Carolyn J Brown, Wendy P Robinson
Epigenetics & Chromatin 16(1):1. Published 2023-01-07. DOI: https://doi.org/10.1186/s13072-022-00477-0
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825011/pdf/
Abstract
Background: Many human disease phenotypes manifest differently by sex, making the development of methods for incorporating X and Y-chromosome data into analyses vital. Unfortunately, X and Y chromosome data are frequently excluded from large-scale analyses of the human genome and epigenome due to analytical complexity associated with sex chromosome dosage differences between XX and XY individuals, and the impact of X-chromosome inactivation (XCI) on the epigenome. As such, little attention has been given to considering the methods by which sex chromosome data may be included in analyses of DNA methylation (DNAme) array data.
Results: With Illumina Infinium HumanMethylation450 DNAme array data from 634 placental samples, we investigated the effects of probe filtering, normalization, and batch correction on DNAme data from the X and Y chromosomes. Processing steps were evaluated in both mixed-sex and sex-stratified subsets of the analysis cohort to identify whether including both sexes impacted processing results. We found that identification of probes that have a high detection p-value, or that are non-variable, should be performed in sex-stratified data subsets to avoid over- and under-estimation of the quantity of probes eligible for removal, respectively. All normalization techniques investigated returned X and Y DNAme data that were highly correlated with the raw data from the same samples. We found no difference in batch correction results after application to mixed-sex or sex-stratified cohorts. Additionally, we identify two analytical methods suitable for XY chromosome data, the choice between which should be guided by the research question of interest, and we performed a proof-of-concept analysis studying differential DNAme on the X and Y chromosome in the context of placental acute chorioamnionitis. Finally, we provide an annotation of probe types that may be desirable to filter in X and Y chromosome analyses, including probes in repetitive elements, the X-transposed region, and cancer-testis gene promoters.
Conclusion: While there may be no single "best" approach for analyzing DNAme array data from the X and Y chromosome, analysts must consider key factors during processing and analysis of sex chromosome data to accommodate the underlying biology of these chromosomes, and the technical limitations of DNA methylation arrays.
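The sex-stratified probe filtering described in the Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy values, and every threshold (detection p-value cutoff of 0.01, 5% of samples, a 10th to 90th percentile beta range of 0.05) are assumptions chosen only for illustration. It shows why a mixed-sex call can over-count failed Y-linked probes and under-count non-variable X-linked probes.

```python
import numpy as np

def failed_detection(detp, p_cut=0.01, max_frac=0.05):
    """Flag probes whose detection p-value exceeds p_cut in more than
    max_frac of samples. Both thresholds are illustrative assumptions."""
    return (detp > p_cut).mean(axis=1) > max_frac

def non_variable(beta, min_range=0.05):
    """Flag probes whose 10th-90th percentile beta range is below
    min_range. The threshold is an illustrative assumption."""
    lo, hi = np.percentile(beta, [10, 90], axis=1)
    return (hi - lo) < min_range

# Toy cohort: three XX and three XY samples (hypothetical values).
sex = np.array(["XX", "XX", "XX", "XY", "XY", "XY"])
is_xy = sex == "XY"

# One Y-linked probe: no real signal in XX samples, so their detection
# p-values are high even though the probe works well in XY samples.
detp_y = np.array([[0.60, 0.45, 0.72, 1e-6, 1e-5, 1e-6]])

# One X-linked probe: stable within each sex, but shifted between sexes
# (for example by X-chromosome inactivation in XX samples).
beta_x = np.array([[0.50, 0.51, 0.49, 0.05, 0.06, 0.04]])

# Mixed-sex calls.
y_fail_mixed = failed_detection(detp_y)
x_nonvar_mixed = non_variable(beta_x)

# Sex-stratified calls: the Y probe is assessed in XY samples only, and
# the X probe counts as non-variable only if it is so within each sex.
y_fail_xy = failed_detection(detp_y[:, is_xy])
x_nonvar_strat = non_variable(beta_x[:, is_xy]) & non_variable(beta_x[:, ~is_xy])

print("Y probe failed (mixed, XY only):", y_fail_mixed[0], y_fail_xy[0])
print("X probe non-variable (mixed, stratified):", x_nonvar_mixed[0], x_nonvar_strat[0])
```

In this toy data, the Y-linked probe is flagged as failed only in the mixed-sex call, and the X-linked probe is recognised as non-variable only in the stratified call, which mirrors the over- and under-estimation described in the abstract. How per-sex flags should be combined in a real analysis is a design choice that depends on the research question.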
About the journal:
Epigenetics & Chromatin is a peer-reviewed, open access, online journal that publishes research and reviews providing novel insights into epigenetic inheritance and chromatin-based interactions. The journal aims to understand how gene and chromosomal elements are regulated and how their activities are maintained during processes such as cell division, differentiation, and environmental alteration.