Long-read HiFi sequencing correctly assembles repetitive heavy fibroin silk genes in new moth and caddisfly genomes

A. Kawahara, Caroline G. Storer, A. Markee, J. Heckenhauer, A. Powell, David M. Plotkin, S. Hotaling, T. Cleland, Rebecca B. Dikow, Torsten Dikow, Ryoichi B. Kuranishi, Rebeccah L. Messcher, S. Pauls, R. Stewart, K. Tojo, P. Frandsen
DOI: 10.1101/2022.06.01.494423 (https://doi.org/10.1101/2022.06.01.494423)
Journal: GigaByte (Hong Kong, China)
Publication date: 2022-06-03
Publication type: Journal Article
Citation count: 11

Abstract

Insect silk is an incredibly versatile biomaterial. Lepidoptera and their sister lineage, Trichoptera, display some of the most diverse uses of silk, which varies in strength, adhesive qualities, and elastic properties. Silk fibroin genes are long (> 20 kb) and contain many repetitive motifs, features that make them challenging to sequence. Most research thus far has focused on the conserved N- and C-terminal regions of fibroin genes because a full comparison of repetitive regions across taxa has not been possible. Using the PacBio Sequel II system and SMRT sequencing, we generated high-fidelity (HiFi) long-read genomic and transcriptomic sequences for the Indianmeal moth (Plodia interpunctella) and genomic sequences for the caddisfly Eubasilissa regina. Both genomes were highly contiguous (N50 = 9.7 Mbp/32.4 Mbp, L50 = 13/11; values given as P. interpunctella/E. regina) and complete (BUSCO Complete = 99.3%/95.2%), with complete and contiguous recovery of the silk heavy fibroin gene sequences. This study demonstrates that HiFi long-read sequencing can substantially improve our understanding of genes with highly contiguous, repetitive regions.
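The contiguity statistics quoted in the abstract (N50 and L50) follow a standard definition: with contigs sorted from longest to shortest, N50 is the length of the contig at which the running total first reaches half of the assembly size, and L50 is the number of contigs needed to get there. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50_l50` and the toy contig lengths are assumptions made for illustration and are not data from either genome.

```python
from typing import Iterable, Tuple


def n50_l50(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    # Sort longest first so the running sum reaches the halfway
    # point using as few contigs as possible.
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")

    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # `length` is N50; `count` is L50.
            return length, count

    # Not reachable for non-empty input; kept for static checkers.
    raise AssertionError("unreachable")


# Hypothetical toy lengths (arbitrary units), not from the paper.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (70, 2)
```

In the toy example the total is 300, so the halfway point of 150 is reached after the two longest contigs (80 + 70), giving N50 = 70 and L50 = 2. Read the same way, a higher N50 together with a lower L50 indicates a more contiguous assembly.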