LT1, an ONT long-read-based assembly scaffolded with Hi-C data and polished with short reads.

GigaByte (Hong Kong, China). Pub Date: 2022-05-04. eCollection Date: 2022-01-01. DOI: 10.46471/gigabyte.51
Hui-Su Kim, Asta Blazyte, Sungwon Jeon, Changhan Yoon, Yeonkyung Kim, Changjae Kim, Dan Bolser, Ji-Hye Ahn, Jeremy S Edwards, Jong Bhak
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9650228/pdf/

Abstract

We present LT1, the first high-quality human reference genome from the Baltic States. LT1 is a female de novo human reference genome assembly, constructed using 57× nanopore long reads and polished using 47× short paired-end reads. We utilized 72 GB of Hi-C chromosomal mapping data for scaffolding, to maximize assembly contiguity and accuracy. The contig assembly of LT1 was 2.73 Gbp in length, comprising 4490 contigs with an NG50 value of 12.0 Mbp. After scaffolding with Hi-C data and manual curation, the final assembly has an NG50 value of 137 Mbp and 4699 scaffolds. Assessment of gene prediction quality using Benchmarking Universal Single-Copy Orthologs (BUSCO) identified 89.3% of the single-copy orthologous genes included in the benchmark. Detailed characterization of LT1 suggests it has 73,744 predicted transcripts, 4.2 million autosomal SNPs, 974,616 short indels, and 12,079 large structural variants. These data may be used as a benchmark for further in-depth genomic analyses of Baltic populations.
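The NG50 values quoted above can be read as follows: sort the contigs (or scaffolds) from longest to shortest, accumulate their lengths, and report the length at which the running total first reaches half of an estimated genome size. This differs from N50, which uses half of the assembly's own total length. The sketch below is a minimal illustration on toy lengths, not LT1 data; the function name `ng50` and the genome size passed to it are assumptions made for the example, and the abstract does not state which genome size estimate the authors used.

```python
from typing import Iterable, Optional


def ng50(lengths: Iterable[int], genome_size: int) -> Optional[int]:
    """Return the NG50 of a set of sequence lengths.

    NG50 is the length L such that sequences of length >= L together
    cover at least half of `genome_size` (an externally supplied
    estimate, not the assembly total). Returns None when the assembly
    is too short to reach half of the genome size.
    """
    half = genome_size / 2
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


# Toy example (hypothetical lengths, arbitrary units).
toy_lengths = [50, 40, 30, 20, 10]

# With an assumed genome size of 200, half is 100:
# 50 -> 90 -> 120, so the third-longest sequence (30) is the NG50.
print(ng50(toy_lengths, 200))  # 30

# Using the assembly total (150) as the size gives the ordinary N50.
print(ng50(toy_lengths, sum(toy_lengths)))  # 40
```

Because the denominator is fixed by the genome size estimate, NG50 values can be compared across assemblies of the same species even when the assemblies differ in total length.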
