
DMR_Kmeans: Identifying Differentially Methylated Regions Based on kmeans Clustering and Read Methylation Haplotype Filtering

IF 2.4 | Zone 3, Biology | Q3 BIOCHEMICAL RESEARCH METHODS | Current Bioinformatics | Pub Date: 2023-10-06 | DOI: 10.2174/0115748936245495230925112419

Xiaoqing Peng, Wanxin Cui, Xiangyan Kong, Yuannan Huang, Ji Li


Citations: 0


Abstract

Background: Differentially methylated regions (DMRs), including tissue-specific and disease-specific DMRs, can be used to reveal mechanisms of gene regulation and to screen for diseases. Many methods have been proposed to detect DMRs from bisulfite sequencing data. Typically, these methods identify differentially methylated CpG sites and DMRs with statistical tests or distribution models, which ignore the joint methylation statuses carried by each read and therefore yield inaccurate DMR boundaries.

Objective: To exploit the joint methylation statuses provided by each read and predict accurate DMR boundaries.

Methods: This paper proposes DMR_Kmeans, a method that detects DMRs based on k-means clustering and read methylation haplotype filtering. For each CpG site, DMR_Kmeans uses the k-means algorithm to cluster the methylation levels from the two groups, and measures the methylation difference of the CpG from how the two groups' levels are distributed across the clusters. Methylation haplotypes of reads are then used to extract the methylation patterns in each candidate region. Finally, DMRs are identified based on the methylation differences and the methylation patterns in the candidate regions.
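The per-CpG clustering step in the Methods can be illustrated with a minimal Python sketch. The abstract does not give the exact difference measure, so the score below is a hypothetical stand-in: it runs 1-D k-means with k = 2 on the pooled methylation levels of both groups (assumed here to be replicate beta values in [0, 1]) and multiplies the gap between the two centroids by how unevenly the groups are split across the clusters. The names `kmeans_1d_two_clusters` and `cpg_methylation_difference` are illustrative and are not taken from DMR_Kmeans.

```python
from typing import List, Sequence, Tuple


def kmeans_1d_two_clusters(
    values: Sequence[float], max_iter: int = 100
) -> Tuple[List[int], Tuple[float, float]]:
    """Lloyd's k-means with k = 2 on scalar methylation levels.

    Centroids start at the minimum and maximum value, so label 0 is the
    low-methylation cluster and label 1 the high-methylation cluster.
    """
    low, high = min(values), max(values)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid wins; ties go to the low cluster.
        new_labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        low_members = [v for v, c in zip(values, labels) if c == 0]
        high_members = [v for v, c in zip(values, labels) if c == 1]
        if low_members:
            low = sum(low_members) / len(low_members)
        if high_members:
            high = sum(high_members) / len(high_members)
    return labels, (low, high)


def cpg_methylation_difference(
    group_a: Sequence[float], group_b: Sequence[float]
) -> float:
    """Hypothetical per-CpG score: gap between the two cluster centroids,
    weighted by how differently the two groups fall into the high cluster
    (1.0 = clusters match the groups exactly, 0.0 = groups equally mixed)."""
    labels, (low, high) = kmeans_1d_two_clusters(list(group_a) + list(group_b))
    labels_a, labels_b = labels[: len(group_a)], labels[len(group_a):]
    frac_a_high = sum(labels_a) / len(labels_a)
    frac_b_high = sum(labels_b) / len(labels_b)
    return (high - low) * abs(frac_a_high - frac_b_high)


# Usage: assumed replicate methylation levels at one CpG in two tissues.
print(cpg_methylation_difference([0.9, 0.85, 0.95], [0.1, 0.15, 0.05]))  # approximately 0.8
```

Initializing the centroids at the minimum and maximum keeps the low/high labels in a fixed order, which the score relies on; a CpG whose groups are equally mixed across the clusters scores 0 regardless of the centroid gap.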
Results: DMR_Kmeans was compared with eight DMR detection methods on whole-genome bisulfite sequencing data from six pairs of tissues. At methylation-difference thresholds greater than 0.4, DMR_Kmeans achieved higher Qn and Ql and overlapped more promoters than the other methods, indicating that the DMRs predicted by DMR_Kmeans, with their more accurate boundaries, contain fewer CpGs with small methylation differences than those predicted by other methods.

Conclusion: The results also suggest that DMR_Kmeans can provide a high-quality DMR set for downstream analysis, since the DMRs it predicts have a greater total length and contain more CpG sites in total than those of the other methods.
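The remaining Methods steps, forming candidate regions, filtering them with read methylation haplotypes, and calling the final DMRs, can be sketched in the same hedged way. Everything below is an assumption for illustration: candidate regions are taken as runs of consecutive CpGs whose per-CpG difference passes a threshold, each read is encoded as a 'C'/'T' string over the region's CpGs, and the filter keeps a region only when each group's most frequent haplotype is well supported and the two differ. None of these rules or names (`candidate_regions`, `passes_haplotype_filter`, `call_dmrs`) come from the paper.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical encoding: each read is the string of its methylation calls over
# the CpGs of a candidate region ('C' = methylated, 'T' = unmethylated), and
# every read is assumed to span all CpGs of that region.


def candidate_regions(
    diffs: Sequence[float], threshold: float, min_cpgs: int = 3
) -> List[Tuple[int, int]]:
    """Runs of at least `min_cpgs` consecutive CpGs whose per-CpG methylation
    difference is >= `threshold`, as half-open (start, end) index pairs."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # A trailing -inf sentinel closes a run that reaches the last CpG.
    for i, d in enumerate(list(diffs) + [float("-inf")]):
        if d >= threshold and start is None:
            start = i
        elif d < threshold and start is not None:
            if i - start >= min_cpgs:
                regions.append((start, i))
            start = None
    return regions


def dominant_haplotype(reads: Sequence[str]) -> Tuple[str, float]:
    """Most frequent read haplotype and its share of all reads."""
    haplotype, count = Counter(reads).most_common(1)[0]
    return haplotype, count / len(reads)


def passes_haplotype_filter(
    reads_a: Sequence[str], reads_b: Sequence[str], min_support: float = 0.6
) -> bool:
    """Keep a region only if each group has a well-supported dominant
    haplotype and the two groups' dominant haplotypes differ."""
    hap_a, share_a = dominant_haplotype(reads_a)
    hap_b, share_b = dominant_haplotype(reads_b)
    return hap_a != hap_b and share_a >= min_support and share_b >= min_support


def call_dmrs(
    diffs: Sequence[float],
    threshold: float,
    reads_for_region: Callable[[int, int], Tuple[Sequence[str], Sequence[str]]],
    min_cpgs: int = 3,
) -> List[Tuple[int, int]]:
    """Candidate regions from per-CpG differences, filtered by read haplotypes."""
    return [
        (start, end)
        for start, end in candidate_regions(diffs, threshold, min_cpgs)
        if passes_haplotype_filter(*reads_for_region(start, end))
    ]


# Usage with made-up per-CpG differences and reads for the one candidate region.
diffs = [0.1, 0.5, 0.6, 0.7, 0.2, 0.8, 0.9]
reads = {(1, 4): (["CCC", "CCC", "CCT", "CCC"], ["TTT", "TTT", "TTT", "CTT"])}
print(call_dmrs(diffs, 0.4, lambda s, e: reads[(s, e)]))  # [(1, 4)]
```

Because the filter compares whole read haplotypes rather than per-CpG averages, it draws on the joint methylation status of the CpGs on each read, which the Background notes is ignored by approaches built on statistical tests or distribution models.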


Source journal: Current Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 6.60
Self-citation rate: 2.50%
Articles published: 77
Review time: >12 weeks

Journal description: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely in-depth reviews and mini-reviews, research papers and guest-edited thematic issues written by leaders in the field, covering a wide range of topics that integrate biology with computer and information science. The journal focuses on advances in computational molecular/structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.

Latest articles in this journal:

Mining Transcriptional Data for Precision Medicine: Bioinformatics Insights into Inflammatory Bowel Disease
Prediction of miRNA-disease Associations by Deep Matrix Decomposition Method based on Fused Similarity Information
TCM@MPXV: A Resource for Treating Monkeypox Patients in Traditional Chinese Medicine
Identifying Key Clinical Indicators Associated with the Risk of Death in Hospitalized COVID-19 Patients
A Parallel Implementation for Large-Scale TSR-based 3D Structural Comparisons of Protein and Amino Acid
