DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools.

IF 6.2 | Zone 2 (Biology) | Q1 Biochemistry & Molecular Biology | Genome Research | Pub Date: 2024-11-20 | DOI: 10.1101/gr.279095.124
Anupama Jha, Stephanie C Bohaczuk, Yizi Mao, Jane Ranchalis, Benjamin J Mallory, Alan T Min, Morgan O Hamm, Elliott Swanson, Danilo Dubocanin, Connor Finkbeiner, Tony Li, Dale Whittington, William Stafford Noble, Andrew B Stergachis, Mitchell R Vollger
Genome Research, pp. 1976-1986. Journal Article.
Citation count: 0

Abstract

Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation and the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as coprocessing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semisupervised convolutional neural network for fast and accurate identification of m6A-marked bases using Pacific Biosciences (PacBio) single-molecule long-read sequencing, as well as the coprocessing of long-read genetic and epigenetic data produced using either the PacBio or Oxford Nanopore Technologies (ONT) sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kb long DNA molecules with an ∼1000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.
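
To make the idea of converting between molecular and reference coordinate systems concrete, the following is a minimal sketch, not the fibertools implementation. It assumes a read aligned with a standard SAM CIGAR string and maps 0-based positions on the aligned read sequence (for example, m6A calls) to 0-based reference positions. The function name `lift_to_reference` and its parameters are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Optional

# SAM CIGAR operations: a length followed by one operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def lift_to_reference(
    cigar: str, ref_start: int, read_positions: List[int]
) -> List[Optional[int]]:
    """Map 0-based read (molecular) positions to 0-based reference positions.

    Hypothetical helper for illustration only. Positions that fall inside
    insertions or soft clips have no reference coordinate and map to None.
    Read positions are assumed to index the sequence as stored in the
    alignment record (that is, already in reference orientation).
    """
    wanted = sorted(set(read_positions))
    lifted = {}
    read_pos, ref_pos = 0, ref_start
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in "M=X":  # consumes both read and reference
            for p in wanted:
                if read_pos <= p < read_pos + length:
                    lifted[p] = ref_pos + (p - read_pos)
            read_pos += length
            ref_pos += length
        elif op in "IS":  # consumes read only
            read_pos += length
        elif op in "DN":  # consumes reference only
            ref_pos += length
        # H and P consume neither read nor reference
    return [lifted.get(p) for p in read_positions]


if __name__ == "__main__":
    # 2 soft-clipped bases, 3 matches, 2 inserted bases, 4 matches,
    # 1 deleted reference base, 3 matches; alignment starts at reference 100.
    print(lift_to_reference("2S3M2I4M1D3M", 100, [0, 2, 4, 5, 7, 10, 11, 13]))
```

Keeping unmappable positions as `None` rather than dropping them preserves the one-to-one correspondence with the input list, which matters when the same molecule carries both variant and methylation calls.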

Source journal: Genome Research (Biology: Biochemistry & Molecular Biology)
CiteScore: 12.40
Self-citation rate: 1.40%
Articles published: 140
Review time: 6 months
Journal description: Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine. Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies. New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal's website where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.