DMR_Kmeans: Identifying Differentially Methylated Regions Based on kmeans Clustering and Read Methylation Haplotype Filtering

IF 2.4 | Zone 3 (Biology) | JCR Q3 (Biochemical Research Methods) | Current Bioinformatics | Pub Date: 2023-10-06 | DOI: 10.2174/0115748936245495230925112419
Xiaoqing Peng, Wanxin Cui, Xiangyan Kong, Yuannan Huang, Ji Li
Citations: 0

Abstract

Introduction: Differentially methylated regions (DMRs), including tissue-specific and disease-specific DMRs, can be used to reveal the mechanisms of gene regulation and to screen for diseases. Many methods have been proposed to detect DMRs from bisulfite sequencing data. These methods usually identify differentially methylated CpG sites and DMRs with statistical tests or distribution models, which neglect the joint methylation statuses provided in each read and result in inaccurate DMR boundaries.

Objective: To make use of the joint methylation statuses provided in each read and to predict accurate DMR boundaries.

Methods: This paper proposes DMR_Kmeans, a method that detects DMRs based on k-means clustering and read methylation haplotype filtering. For each CpG site, the k-means algorithm clusters the methylation levels from the two groups, and the methylation difference of the CpG is measured based on the different distributions in the clusters. Methylation haplotypes of reads are then used to extract the methylation patterns in a candidate region. Finally, DMRs are identified from the methylation differences and the methylation patterns in candidate regions.

Results: DMR_Kmeans was compared with eight DMR detection methods on whole-genome bisulfite sequencing data from six pairs of tissues. At a methylation difference threshold greater than 0.4, DMR_Kmeans achieves higher Qn and Ql and overlaps more promoters than the other methods, which indicates that its DMRs have accurate boundaries and contain fewer CpGs with small methylation differences.

Conclusion: The total length of the DMRs predicted by DMR_Kmeans is longer, and the total number of CpG sites in them is greater, than for the other methods, which suggests that DMR_Kmeans can provide a high-quality DMR set for downstream analysis.
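
The per-CpG clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the "different distributions in clusters" are turned into a methylation difference, so the score below is an assumption made only for illustration: the absolute difference between the fractions of each group that land in the high-methylation cluster. The function names `two_means_1d` and `cpg_difference` and the sample values are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def two_means_1d(values: Sequence[float], max_iter: int = 100) -> Tuple[float, float, List[int]]:
    """Cluster 1-D values into a low and a high cluster with Lloyd's k-means (k=2).

    Returns (low_center, high_center, labels); labels[i] is 0 (low) or 1 (high).
    """
    low, high = min(values), max(values)  # deterministic initial centers
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins the nearer center (ties go to low).
        new_labels = [1 if abs(v - high) < abs(v - low) else 0 for v in values]
        members_low = [v for v, lab in zip(values, new_labels) if lab == 0]
        members_high = [v for v, lab in zip(values, new_labels) if lab == 1]
        # Update step: move each center to the mean of its members.
        new_low = sum(members_low) / len(members_low) if members_low else low
        new_high = sum(members_high) / len(members_high) if members_high else high
        converged = new_labels == labels and new_low == low and new_high == high
        low, high, labels = new_low, new_high, new_labels
        if converged:
            break
    return low, high, labels


def cpg_difference(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Score one CpG site by how differently two groups fall into the clusters.

    Assumed score (not taken from the paper): the absolute difference between
    the fraction of group A and the fraction of group B in the high cluster.
    """
    pooled = list(group_a) + list(group_b)
    _, _, labels = two_means_1d(pooled)
    labels_a = labels[: len(group_a)]
    labels_b = labels[len(group_a):]
    frac_a = sum(labels_a) / len(labels_a)
    frac_b = sum(labels_b) / len(labels_b)
    return abs(frac_a - frac_b)


if __name__ == "__main__":
    # Hypothetical per-sample methylation levels (fraction methylated) at two CpG sites.
    separated = cpg_difference([0.9, 0.85, 0.95], [0.1, 0.2, 0.15])
    mixed = cpg_difference([0.9, 0.1, 0.85], [0.15, 0.95, 0.2])
    print(separated, mixed)
```

Under this assumed score, a site where the two groups fall into different clusters scores close to 1, while a site where both groups are spread across the clusters scores lower. Sites scoring above a chosen threshold could then be merged into candidate regions before the read-haplotype filtering that the abstract describes.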
Source journal
Current Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 6.60
Self-citation rate: 2.50%
Articles published: 77
Review time: >12 weeks
Journal description: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely in-depth reviews and mini-reviews, research papers, and guest-edited thematic issues written by leaders in the field, covering a wide range of the integration of biology with computer and information science. The journal focuses on advances in computational molecular and structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.