DNA methylation-based age estimation from semen: Genome-wide marker identification and model development

Forensic Science International-Genetics, Volume 76, Article 103215
IF 3.2 | Zone 2 (Medicine) | JCR Q2 (Genetics & Heredity) | Pub Date: 2024-12-25 | DOI: 10.1016/j.fsigen.2024.103215
Ya Li, Xiaozhao Liu, Maomin Chen, Shaohua Yi, Ximiao He, Chao Xiao, Daixin Huang
Full text: https://www.sciencedirect.com/science/article/pii/S1872497324002114
Citations: 0

Abstract

DNA methylation-based age estimation from semen: Genome-wide marker identification and model development
DNA methylation at age-related CpG (AR-CpG) sites holds significant promise for forensic age estimation. However, somatic models perform poorly in semen due to unique methylation dynamics during spermatogenesis, and current studies are constrained by the limited coverage of methylation microarrays. This study aimed to identify novel semen-specific AR-CpG sites using double-enzyme reduced representation bisulfite sequencing (dRRBS) and validate these markers, alongside previously reported sites and neighboring CpGs, using bisulfite amplicon sequencing (BSAS) to develop robust age estimation models. A methylome-wide association study was conducted on semen samples from 21 healthy Chinese men across three age groups, generating over 4 million CpG sites per sample at ≥5× depth. Analysis of 721,840 shared CpG sites revealed that more than 95% were not covered by conventional methylation microarrays. Differential methylation and correlation analyses identified 139 AR-CpG sites. A two-stage validation process using multiplex PCR-based BSAS was performed. In the first stage, 47 top dRRBS-identified AR-CpG sites, 26 literature-reported sites, and 242 neighboring CpGs were assessed in 129 semen samples (22–64 years), validating 31 dRRBS, 26 literature-reported, and 152 neighboring CpGs as age-related. The second stage examined 154 CpG sites in 247 samples (22–67 years), confirming 71 AR-CpG sites with |rho| > 0.50. Among these, chr2:129071885 (cg19998819) emerged as the strongest age-associated marker (rho = 0.81). Using the second BSAS dataset, age estimation models were developed with multiple linear regression and random forest (RF) algorithms within a repeated nested cross-validation (CV) framework (10-fold outer CV with 10-fold inner CV, repeated 10 times). The RF models demonstrated superior accuracy across feature subsets of 5–25 CpGs. The optimized 9-CpG RF model achieved an average root mean square error of 4.73 years (4.62–4.96, SD=0.10) and an average mean absolute error of 3.30 years (3.23–3.43, SD=0.06). This study demonstrates the utility of dRRBS for large-scale AR-CpG discovery and provides a robust age estimation model and a comprehensive reference database of semen-specific AR-CpG sites for forensic applications.
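
The repeated nested cross-validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. To stay dependency-free, a k-nearest-neighbour regressor stands in for the random forest, the data are synthetic, and the fold and repeat counts are reduced (5/5/3 rather than the 10/10/10 reported). All function names and the simulated methylation values are assumptions made for illustration only. The point is the structure: the inner loop tunes a hyperparameter using training samples only, the outer loop scores held-out samples, and each repeat yields one RMSE and one MAE that are then summarised as a mean, range and SD.

```python
import random
from statistics import mean, stdev


def make_folds(indices, k, rng):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def knn_predict(train_idx, X, y, x_new, k):
    """Predict age as the mean age of the k nearest training samples
    (stand-in learner; the source study used random forest and MLR)."""
    by_distance = sorted(
        train_idx,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x_new)),
    )
    return mean(y[i] for i in by_distance[:k])


def rmse(pairs):
    return mean((t - p) ** 2 for t, p in pairs) ** 0.5


def mae(pairs):
    return mean(abs(t - p) for t, p in pairs)


def inner_select(train_idx, X, y, candidates, n_inner, rng):
    """Inner CV: choose the hyperparameter with the lowest RMSE,
    using only the outer-training samples."""
    folds = make_folds(train_idx, n_inner, rng)
    best_k, best_err = None, float("inf")
    for k in candidates:
        pairs = []
        for fold in folds:
            held = set(fold)
            fit_idx = [i for i in train_idx if i not in held]
            pairs += [(y[i], knn_predict(fit_idx, X, y, X[i], k)) for i in fold]
        err = rmse(pairs)
        if err < best_err:
            best_k, best_err = k, err
    return best_k


def repeated_nested_cv(X, y, candidates, n_outer=5, n_inner=5, n_repeats=3, seed=0):
    """Return one (RMSE, MAE) pair per repeat from outer-fold predictions."""
    rng = random.Random(seed)
    all_idx = list(range(len(y)))
    results = []
    for _ in range(n_repeats):
        pairs = []
        for fold in make_folds(all_idx, n_outer, rng):
            held = set(fold)
            train_idx = [i for i in all_idx if i not in held]
            k = inner_select(train_idx, X, y, candidates, n_inner, rng)
            pairs += [(y[i], knn_predict(train_idx, X, y, X[i], k)) for i in fold]
        results.append((rmse(pairs), mae(pairs)))
    return results


def synthetic_cohort(n=60, n_cpgs=3, seed=1):
    """Simulated methylation fractions rising linearly with age plus noise.
    Purely illustrative; not derived from the study's data."""
    rng = random.Random(seed)
    ages = [rng.uniform(22, 67) for _ in range(n)]
    X = [
        [min(1.0, max(0.0, 0.2 + 0.008 * age + rng.gauss(0, 0.03)))
         for _ in range(n_cpgs)]
        for age in ages
    ]
    return X, ages


X, ages = synthetic_cohort()
results = repeated_nested_cv(X, ages, candidates=[1, 3, 5, 7])
rmses = [r for r, _ in results]
maes = [m for _, m in results]
print(f"RMSE mean={mean(rmses):.2f} range={min(rmses):.2f}-{max(rmses):.2f} SD={stdev(rmses):.2f}")
print(f"MAE  mean={mean(maes):.2f} range={min(maes):.2f}-{max(maes):.2f} SD={stdev(maes):.2f}")
```

Because the hyperparameter is chosen without ever seeing the outer test fold, the per-repeat errors are not flattered by tuning, which is the usual motivation for nesting the two loops.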
Source journal
CiteScore: 7.50
Self-citation rate: 32.30%
Articles published: 132
Review time: 11.3 weeks
About the journal: Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts.
The scope of the journal includes:
Forensic applications of human polymorphism. Testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, identification of human remains by DNA testing methodologies.
Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms. Autosomal DNA polymorphisms, mini- and microsatellites (or short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications.
Non-human DNA polymorphisms for crime scene investigation.
Population genetics of human polymorphisms of forensic interest. Population data, especially from DNA polymorphisms of interest for the solution of forensic problems.
DNA typing methodologies and strategies.
Biostatistical methods in forensic genetics. Evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), classical and new statistical approaches.
Standards in forensic genetics. Recommendations of regulatory bodies concerning methods, markers, interpretation or strategies or proposals for procedural or technical standards.
Quality control. Quality control and quality assurance strategies, proficiency testing for DNA typing methodologies.
Criminal DNA databases. Technical, legal and statistical issues.
General ethical and legal issues related to forensic genetics.
Latest articles from this journal
Investigation into the genotyping performance of a unique molecular identifier based microhaplotypes MPS panel in complex DNA mixture
Massively parallel sequencing of a forensic combined panel of 107-plex STR loci and 292-plex SNP loci in the Han Chinese population
Editorial Board
The IPEFA model: An initiative for online training and education as applied by the International Society for Forensic Genetics
Preparing for shotgun sequencing in forensic genetics – Evaluation of DNA extraction and library building methods