GigaScience: Latest Articles

Current status of global conservation and characterisation of wild and cultivated Brassicaceae genetic resources.
IF 3.5; Zone 2 (Biology); Q1 Multidisciplinary Sciences; Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae050
Elena Castillo-Lorenzo, Elinor Breman, Pablo Gómez Barreiro, Juan Viruel

Background: The economic importance of the globally distributed Brassicaceae family resides in the large diversity of crops within the family and the substantial variety of agronomic and functional traits they possess. We reviewed the current classifications of crop wild relatives (CWRs) in the Brassicaceae family with the aim of identifying new potential cross-compatible species from a total of 1,242 species using phylogenetic approaches.

Results: In general, cross-compatibility data between wild species and crops, as well as phenotype and genotype characterisation data, were available for major crops but very limited for minor crops, restricting the identification of new potential CWRs. Around 70% of wild Brassicaceae did not have genetic sequence data available in public repositories, and only 40% had published chromosome counts. Using phylogenetic distances, we propose 103 new potential CWRs for this family, which we recommend as priorities for cross-compatibility tests with crops and for phenotypic characterisation, including 71 newly identified CWRs for 10 minor crops. Of all the species included in this study, more than half had no record of ex situ conservation, and 80% had not been assessed for conservation status or were data deficient (IUCN Red List Assessments).

Conclusions: Substantial ex situ conservation efforts are needed to make material accessible for characterising and evaluating these species for future breeding programmes. We identified the Mediterranean region as a key conservation area for wild Brassicaceae species, with large numbers of endemic and threatened species. Conservation assessments are urgently needed for most of these wild Brassicaceae.

Evaluation of Swin Transformer and knowledge transfer for denoising of super-resolution structured illumination microscopy data.
IF 3.5; Zone 2 (Biology); Q1 Multidisciplinary Sciences; Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giad109
Zafran Hussain Shah, Marcel Müller, Wolfgang Hübner, Tung-Cheng Wang, Daniel Telman, Thomas Huser, Wolfram Schenck

Background: Convolutional neural network (CNN)-based methods have shown excellent performance in denoising and reconstruction of super-resolved structured illumination microscopy (SR-SIM) data. Therefore, CNN-based architectures have been the focus of existing studies. However, Swin Transformer, an alternative and recently proposed deep learning-based image restoration architecture, has not been fully investigated for denoising SR-SIM images. Furthermore, it has not been fully explored how well transfer learning strategies work for denoising SR-SIM images with different noise characteristics and recorded cell structures for these different types of deep learning-based methods. Currently, the scarcity of publicly available SR-SIM datasets limits the exploration of the performance and generalization capabilities of deep learning methods.

Results: In this work, we present SwinT-fairSIM, a novel method based on the Swin Transformer for restoring SR-SIM images with a low signal-to-noise ratio. The experimental results show that SwinT-fairSIM outperforms previous CNN-based denoising methods. As a second contribution, two types of transfer learning (direct transfer and fine-tuning) were benchmarked in combination with SwinT-fairSIM and CNN-based methods for denoising SR-SIM data. Direct transfer did not prove to be a viable strategy, but fine-tuning produced results comparable to conventional training from scratch while saving computational time and potentially reducing the amount of training data required. As a third contribution, we publish four datasets of raw SIM images and already reconstructed SR-SIM images. These datasets cover two different types of cell structures: tubulin filaments and vesicle structures. Different noise levels are available for the tubulin filaments.

Conclusion: The SwinT-fairSIM method is well suited for denoising SR-SIM images. By fine-tuning, already trained models can be easily adapted to different noise characteristics and cell structures. Furthermore, the provided datasets are structured in a way that the research community can readily use them for research on denoising, super-resolution, and transfer learning strategies.

Gapless genome assembly and epigenetic profiles reveal gene regulation of whole-genome triplication in lettuce.
IF 3.5; Zone 2 (Biology); Q1 Multidisciplinary Sciences; Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae043
Shuai Cao, Nunchanoke Sawettalake, Lisha Shen

Background: Lettuce, an important member of the Asteraceae family, is a globally cultivated cash vegetable crop. With a highly complex genome (∼2.5 Gb; 2n = 18) rich in repeat sequences, current lettuce reference genomes exhibit thousands of gaps, impeding a comprehensive understanding of the lettuce genome.

Findings: Here, we present a near-complete gapless reference genome for cutting lettuce with high transformability, using long-read PacBio HiFi and Nanopore sequencing data. In comparison with the stem lettuce genome, we identify 127,681 structural variations (SVs, present in 0.41 Gb of sequence), reflecting the divergence of leafy and stem lettuce. Interestingly, these SVs are related to transposons and DNA methylation states. Furthermore, we identify 4,612 whole-genome triplication genes exhibiting high expression levels associated with low DNA methylation levels and high N6-methyladenosine RNA modifications. DNA methylation changes are also associated with activation of genes involved in callus formation.

Conclusions: Our gapless lettuce genome assembly, an unprecedented achievement in the Asteraceae family, establishes a solid foundation for functional genomics, epigenomics, and crop breeding and sheds new light on understanding the complexity of gene regulation associated with the dynamics of DNA and RNA epigenetics in genome evolution.

PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata.
IF 3.5; Zone 2 (Biology); Q1 Multidisciplinary Sciences; Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae033
Nathan J LeRoy, Oleksandr Khoroshevskyi, Aaron O'Brien, Rafał Stępień, Alip Arslan, Nathan C Sheffield

Background: As biological data increase, we need additional infrastructure to share them and promote interoperability. While major effort has been put into sharing data, relatively less emphasis is placed on sharing metadata. Yet, sharing metadata is also important and in some ways has a wider scope than sharing data themselves.

Results: Here, we present PEPhub, an approach to improve sharing and interoperability of biological metadata. PEPhub provides an API, natural-language search, and user-friendly web-based sharing and editing of sample metadata tables. We used PEPhub to process more than 100,000 published biological research projects and index them with fast semantic natural-language search. PEPhub thus provides a fast and user-friendly way to find existing biological research data or to share new data.

Availability: https://pephub.databio.org.

Innovative approach for high-throughput exploiting sex-specific markers in Japanese parrotfish Oplegnathus fasciatus.
IF 3.5; Zone 2 (Biology); Q1 Multidisciplinary Sciences; Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae045
Yongshuang Xiao, Zhizhong Xiao, Lin Liu, Yuting Ma, Haixia Zhao, Yanduo Wu, Jinwei Huang, Pingrui Xu, Jing Liu, Jun Li

Background: The use of sex-specific molecular markers has become a prominent method in enhancing fish production and economic value, as well as providing a foundation for understanding the complex molecular mechanisms involved in fish sex determination. Over the past decades, research on male and female sex identification has predominantly employed molecular biology methodologies such as restriction fragment length polymorphism, random amplification of polymorphic DNA, simple sequence repeat, and amplified fragment length polymorphism. The emergence of high-throughput sequencing technologies, particularly Illumina, has led to the utilization of single nucleotide polymorphism and insertion/deletion variants as significant molecular markers for investigating sex identification in fish. The advancement of sex-controlled breeding encounters numerous challenges, including the inefficiency of current methods, intricate experimental protocols, high costs of development, elevated rates of false positives, marker instability, and cumbersome field-testing procedures. Nevertheless, the emergence and swift progress of PacBio high-throughput sequencing technology, characterized by its long-read output capabilities, offers novel opportunities to overcome these obstacles.

Findings: Utilizing male/female assembled genome information in conjunction with short-read sequencing data survey and long-read PacBio sequencing data, a catalog of large-segment (>100 bp) insertion/deletion genetic variants was generated through a genome-wide variant site-scanning approach with bidirectional comparisons. The sequence tagging sites were ranked based on the long-read depth of the insertion/deletion site, with markers exhibiting lower long-read depth being considered more effective for large-segment deletion variants. Subsequently, a catalog of bulk primers and simulated PCR for the male/female variant loci was developed, incorporating primer design for the target region and electronic PCR (e-PCR) technology. The Japanese parrotfish (Oplegnathus fasciatus), belonging to the Oplegnathidae family within the Centrarchiformes order, holds significant economic value as a rocky reef fish indigenous to East Asia. The criteria for rapid identification of male and female differences in Japanese parrotfish were established through agarose gel electrophoresis, which revealed 2 amplified bands for males and 1 amplified band for females. A high-throughput identification catalog of sex-specific markers was then constructed using this method, resulting in the identification of 3,639 (2,786 INS/853 DEL, ♀ as reference) and 3,672 (2,876 INS/833 DEL, ♂ as reference) markers in conjunction with 1,021 and 894 high-quality genetic sex identification markers, respectively. Sixteen differential loci were randomly chosen from the catalog for validation, with 11 of them meeting the criteria for male/female distinctions. The implementation of cost-effective and efficient technical workflows will promote the rapid development of genetic breeding by accelerating the high-throughput development of sex genetic markers in different species.

Conclusions: Our study used male and female genome information obtained from PacBio, together with short-read sequencing data survey and long-read PacBio sequencing data. We made extensive use of genome-wide variant site scanning and identification, high-throughput primer design for target regions, e-PCR batch amplification, and statistical analysis and ranking of the long-read depth at variant sites. Through this integrated approach, we compiled a catalog of large insertion/deletion sites (>100 bp) for male and female Japanese parrotfish.
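
The filtering and ranking step described in the Findings can be illustrated with a minimal Python sketch. It assumes a simple in-memory record per variant; the `Indel` class, its field names, and the toy loci are hypothetical and are not taken from the paper. The sketch keeps insertion/deletion variants longer than 100 bp, orders them by ascending long-read depth (the abstract states that lower depth is considered more effective for large-segment deletion variants; applying the same ordering to insertions is an assumption), and encodes the gel criterion of 2 bands for males and 1 band for females.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    """Hypothetical record for one candidate variant (illustration only)."""
    locus: str
    kind: str               # "INS" or "DEL"
    length_bp: int
    long_read_depth: float  # long-read depth at the variant site


def rank_candidates(variants, min_length_bp=100):
    """Keep variants longer than min_length_bp, lowest long-read depth first."""
    large = [v for v in variants if v.length_bp > min_length_bp]
    return sorted(large, key=lambda v: v.long_read_depth)


def call_sex(band_count):
    """Gel criterion from the abstract: 2 bands = male, 1 band = female."""
    return {1: "female", 2: "male"}.get(band_count, "undetermined")


# Toy data, invented for this example.
toy_variants = [
    Indel("chr1:1000", "DEL", 250, 2.0),
    Indel("chr2:500", "INS", 80, 1.0),   # dropped: not longer than 100 bp
    Indel("chr3:42", "DEL", 400, 0.5),
    Indel("chr4:7", "INS", 150, 8.0),
]

for variant in rank_candidates(toy_variants):
    print(variant.locus, variant.kind, variant.length_bp, variant.long_read_depth)
print(call_sex(2), call_sex(1))
```

In practice the candidate list would come from the genome-wide variant scan, and each top-ranked locus would still need primer design and gel validation, as the abstract reports for the 16 randomly chosen loci.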
A high-quality chromosomal genome assembly of the sea cucumber Chiridota heheva and its hydrothermal adaptation.
IF 3.5; Zone 2 (Biology); Q1 Multidisciplinary Sciences; Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giad107
Yujin Pu, Yang Zhou, Jun Liu, Haibin Zhang

Background: Chiridota heheva is a cosmopolitan holothurian well adapted to diverse deep-sea ecosystems, especially chemosynthetic environments. Besides high hydrostatic pressure and limited light, high concentrations of metal ions also represent harsh conditions in hydrothermal environments. Few holothurian species can live in such extreme conditions. Therefore, it is valuable to elucidate the adaptive genetic mechanisms of C. heheva in hydrothermal environments.

Findings: Herein, we report a high-quality reference genome assembly of C. heheva from the Kairei vent, which is the first chromosome-level genome of Apodida. The chromosome-level genome size was 1.43 Gb, with a scaffold N50 of 53.24 Mb and BUSCO completeness score of 94.5%. Contig sequences were clustered, ordered, and assembled into 19 natural chromosomes. Comparative genome analysis found that the expanded gene families and positively selected genes of C. heheva were involved in the DNA damage repair process. The expanded gene families and the unique genes contributed to maintaining iron homeostasis in an iron-enriched environment. The positively selected gene RFC2 with 10 positively selected sites played an essential role in DNA repair under extreme environments.

Conclusions: This first chromosome-level genome assembly of C. heheva reveals the hydrothermal adaptation of holothurians. As the first chromosome-level genome of the order Apodida, this genome will provide a resource for investigating the evolution of the class Holothuroidea.
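
For readers unfamiliar with the scaffold N50 statistic quoted in the Findings (53.24 Mb), the following minimal Python sketch shows how N50 is conventionally computed from a list of scaffold lengths. The function name and the toy lengths are illustrative assumptions; they are not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (not taken from the paper).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

A larger N50 means that half of the assembly sits in longer scaffolds, which is why it is commonly reported alongside BUSCO completeness as a contiguity measure.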

{"title":"A high-quality chromosomal genome assembly of the sea cucumber Chiridota heheva and its hydrothermal adaptation.","authors":"Yujin Pu, Yang Zhou, Jun Liu, Haibin Zhang","doi":"10.1093/gigascience/giad107","DOIUrl":"10.1093/gigascience/giad107","url":null,"abstract":"<p><strong>Background: </strong>Chiridota heheva is a cosmopolitan holothurian well adapted to diverse deep-sea ecosystems, especially chemosynthetic environments. Besides high hydrostatic pressure and limited light, high concentrations of metal ions also represent harsh conditions in hydrothermal environments. Few holothurian species can live in such extreme conditions. Therefore, it is valuable to elucidate the adaptive genetic mechanisms of C. heheva in hydrothermal environments.</p><p><strong>Findings: </strong>Herein, we report a high-quality reference genome assembly of C. heheva from the Kairei vent, which is the first chromosome-level genome of Apodida. The chromosome-level genome size was 1.43 Gb, with a scaffold N50 of 53.24 Mb and BUSCO completeness score of 94.5%. Contig sequences were clustered, ordered, and assembled into 19 natural chromosomes. Comparative genome analysis found that the expanded gene families and positively selected genes of C. heheva were involved in the DNA damage repair process. The expanded gene families and the unique genes contributed to maintaining iron homeostasis in an iron-enriched environment. The positively selected gene RFC2 with 10 positively selected sites played an essential role in DNA repair under extreme environments.</p><p><strong>Conclusions: </strong>This first chromosome-level genome assembly of C. heheva reveals the hydrothermal adaptation of holothurians. 
As the first chromosome-level genome of order Apodida, this genome will provide the resource for investigating the evolution of class Holothuroidea.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10764150/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139086481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A graph clustering algorithm for detection and genotyping of structural variants from long reads.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad112
Nicolás Gaitán, Jorge Duitama

Background: Structural variants (SVs) are genomic polymorphisms defined by their length (>50 bp). The usual types of SVs are deletions, insertions, translocations, inversions, and copy number variants. SV detection and genotyping are fundamental given the role of SVs in phenomena such as phenotypic variation and evolutionary events. Thus, methods to identify SVs using long-read sequencing data have recently been developed.

Findings: We present an accurate and efficient algorithm to predict germline SVs from long-read sequencing data. The algorithm starts by collecting evidence (signatures) of SVs from read alignments. Signatures are then clustered based on a Euclidean graph whose coordinates are calculated from lengths and genomic positions. Clustering is performed by the DBSCAN algorithm, which has the advantage of delimiting clusters with high resolution. Clusters are transformed into SVs, and a Bayesian model precisely genotypes the SVs based on their supporting evidence. This algorithm is integrated into the single-sample variants detector of the Next Generation Sequencing Experience Platform, which facilitates integration with other functionalities for genomics analysis. We performed multiple benchmark experiments, including simulated and real data, representing different genome profiles, sequencing technologies (PacBio HiFi, ONT), and read depths.

Conclusion: The results show that our approach outperformed state-of-the-art tools on germline SV calling and genotyping, especially at low depths and in error-prone repetitive regions. We believe this work contributes significantly to the development of bioinformatic strategies that maximize the use of long-read sequencing technologies.
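
The clustering step described in the Findings can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each signature is reduced to a raw (genomic position, length) pair, uses a plain textbook DBSCAN written in pure Python, and summarizes each cluster by its medians. The `eps` and `min_pts` values, the sample signatures, and the `summarize` helper are all hypothetical choices made for illustration.

```python
from math import hypot
from statistics import median


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on 2D points; returns a cluster id per point, -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # All points within Euclidean distance eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


def summarize(points, labels):
    """Hypothetical cluster-to-SV step: median position, median length, support."""
    calls = {}
    for cid in sorted(set(labels) - {-1}):
        members = [p for p, lab in zip(points, labels) if lab == cid]
        calls[cid] = (median(p[0] for p in members),
                      median(p[1] for p in members),
                      len(members))
    return calls


# Hypothetical signatures: (genomic position, SV length) from read alignments.
signatures = [(1000, 300), (1005, 310), (998, 295),
              (5000, 80), (5003, 82), (5001, 79),
              (9000, 1200)]
labels = dbscan(signatures, eps=20, min_pts=3)
calls = summarize(signatures, labels)
print(labels)
print(calls)
```

In this toy input the isolated signature at position 9000 has no neighbors within `eps`, so it is labeled as noise rather than forced into a cluster, which is the property of density-based clustering the abstract alludes to. A real caller would still need the coordinate scaling and the Bayesian genotyping step, neither of which is specified here.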

Citations: 0
MMV_Im2Im: an open-source microscopy machine vision toolbox for image-to-image transformation.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad120
Justin Sonneck, Yu Zhou, Jianxu Chen

Over the past decade, deep learning (DL) research in computer vision has been growing rapidly, with many advances in DL-based image analysis methods for biomedical problems. In this work, we introduce MMV_Im2Im, a new open-source Python package for image-to-image transformation in bioimaging applications. MMV_Im2Im is designed with a generic image-to-image transformation framework that can be used for a wide range of tasks, including semantic segmentation, instance segmentation, image restoration, and image generation. Our implementation takes advantage of state-of-the-art machine learning engineering techniques, allowing researchers to focus on their research without worrying about engineering details. We demonstrate the effectiveness of MMV_Im2Im on more than 10 different biomedical problems, showcasing its general potential and applicability. For computational biomedical researchers, MMV_Im2Im provides a starting point for developing new biomedical image analysis or machine learning algorithms: they can either reuse the code in this package or fork and extend it to facilitate the development of new methods. Experimental biomedical researchers can benefit from this work by gaining a comprehensive view of the image-to-image transformation concept through diversified examples and use cases. We hope this work can inspire the community on how DL-based image-to-image transformation can be integrated into the assay development process, enabling new biomedical studies that cannot be done with traditional experimental assays alone. To help researchers get started, we have provided source code, documentation, and tutorials for MMV_Im2Im at https://github.com/MMV-Lab/mmv_im2im under the MIT license.

Citations: 0
The first high-altitude autotetraploid haplotype-resolved genome assembled (Rhododendron nivale subsp. boreale) provides new insights into mountaintop adaptation.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae052
Zhen-Yu Lyu, Xiong-Li Zhou, Si-Qi Wang, Gao-Ming Yang, Wen-Guang Sun, Jie-Yu Zhang, Rui Zhang, Shi-Kang Shen

Background: Rhododendron nivale subsp. boreale Philipson et M. N. Philipson is an alpine woody species with ornamental qualities that serves as the predominant species in mountainous scrub habitats at an altitude of ∼4,200 m. As a high-altitude woody polyploid, this species may serve as a model to understand how plants adapt to alpine environments. Despite its ecological significance, the lack of genomic resources has hindered a comprehensive understanding of its evolutionary and adaptive characteristics in high-altitude mountainous environments.

Findings: We sequenced and assembled the genome of R. nivale subsp. boreale, the first genome assembly for subgenus Rhododendron and for a high-altitude woody flowering tetraploid, contributing an important genomic resource for alpine woody flora. The assembly included 52 pseudochromosomes (scaffold N50 = 42.93 Mb; BUSCO = 98.8%; QV = 45.51; S-AQI = 98.69), which belonged to 4 haplotypes and harbored 127,810 predicted protein-coding genes. Conjoint k-mer analysis, collinearity assessment, and phylogenetic investigation corroborated autotetraploid identity. Comparative genomic analysis revealed that R. nivale subsp. boreale originated as a neopolyploid of R. nivale and underwent 2 rounds of ancient polyploidy events. Transcriptional expression analysis showed that differences in expression between alleles were common and randomly distributed in the genome. We identified expanded gene families and signatures of positive selection that are involved not only in adaptation to the mountaintop ecosystem (response to stress and developmental regulation) but also in autotetraploid reproduction (meiotic stabilization). Additionally, the expression levels of the group VII ethylene response factor transcription factors (ERF VIIs) were significantly higher than the mean global gene expression. We suspect that these changes have enabled the success of this species at high altitudes.

Conclusions: We assembled the first high-altitude autopolyploid genome and achieved chromosome-level assembly within the subgenus Rhododendron. In addition, we propose a plausible high-altitude adaptation strategy for R. nivale subsp. boreale. This study provides valuable data for the exploration of alpine mountaintop adaptations and the correlation between extreme environments and species polyploidization.
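
For readers unfamiliar with the scaffold N50 statistic quoted in the Findings, the sketch below shows the standard definition: the length L such that scaffolds of length L or longer together cover at least half of the total assembly. The scaffold lengths used here are made-up illustrative values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (total 290, so half is 145).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why assemblies usually report it alongside completeness and accuracy measures such as BUSCO and QV.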

Citations: 0
LRTK: a platform agnostic toolkit for linked-read analysis of both human genome and metagenome.
IF 3.5 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae028
Chao Yang, Zhenmiao Zhang, Yufen Huang, Xuefeng Xie, Herui Liao, Jin Xiao, Werner Pieter Veldsman, Kejing Yin, Xiaodong Fang, Lu Zhang

Background: Linked-read sequencing technologies generate short reads with high base quality that contain extrapolative information on long-range DNA connectedness. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to one specific sequencing platform.

Findings: To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform-agnostic processing of linked-read sequencing data from both the human genome and metagenomes. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK can process multiple samples automatically and gives users the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK to linked reads from simulated, mock community, and real datasets for both the human genome and metagenomes. We showcased LRTK's ability to generate comparative performance results from preceding benchmark studies and to report these results in publication-ready HTML document plots.

Conclusions: LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the current gap in the field caused by platform-centric, genome-specific linked-read data analysis tools.

Citations: 0