Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae050
Elena Castillo-Lorenzo, Elinor Breman, Pablo Gómez Barreiro, Juan Viruel
Background: The economic importance of the globally distributed Brassicaceae family resides in the large diversity of crops within the family and the substantial variety of agronomic and functional traits they possess. We reviewed the current classifications of crop wild relatives (CWRs) in the Brassicaceae family with the aim of identifying new potential cross-compatible species from a total of 1,242 species using phylogenetic approaches.
Results: In general, cross-compatibility data between wild species and crops, as well as phenotype and genotype characterisation data, were available for major crops but very limited for minor crops, restricting the identification of new potential CWRs. Around 70% of wild Brassicaceae did not have genetic sequence data available in public repositories, and only 40% had published chromosome counts. Using phylogenetic distances, we propose 103 new potential CWRs for this family, which we recommend as priorities for cross-compatibility tests with crops and for phenotypic characterisation, including 71 newly identified CWRs for 10 minor crops. Of all the species used in this study, more than half had no records of being in ex situ conservation, and 80% either had not been assessed for their conservation status or were data deficient (IUCN Red List Assessments).
Conclusions: Great efforts are needed on ex situ conservation to have accessible material for characterising and evaluating the species for future breeding programmes. We identified the Mediterranean region as one key conservation area for wild Brassicaceae species, with great numbers of endemic and threatened species. Conservation assessments are urgently needed to evaluate most of these wild Brassicaceae.
Title: Current status of global conservation and characterisation of wild and cultivated Brassicaceae genetic resources. Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304946/pdf/
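The abstract above says new potential CWRs were proposed from phylogenetic distances but gives no procedure. The following is only a minimal sketch of one way such a shortlist could be drawn up, not the authors' pipeline: the function name `propose_cwrs`, the distance threshold, and all species and crop labels are hypothetical placeholders.

```python
def propose_cwrs(distances, known_cwrs, max_distance):
    """Shortlist wild species that sit close to a crop on the phylogeny.

    distances    -- {(wild_species, crop): phylogenetic distance} (hypothetical input)
    known_cwrs   -- set of (wild_species, crop) pairs already classified as CWRs
    max_distance -- assumed cut-off; the abstract does not state one
    """
    proposals = {}
    for (wild, crop), dist in distances.items():
        if dist <= max_distance and (wild, crop) not in known_cwrs:
            proposals.setdefault(crop, []).append((dist, wild))
    # Closest candidates first, so they can be prioritised for crossing tests.
    return {crop: [wild for _, wild in sorted(pairs)]
            for crop, pairs in proposals.items()}


# Toy data with placeholder names (not taken from the study).
toy_distances = {
    ("wild_A", "crop_1"): 0.02,
    ("wild_B", "crop_1"): 0.15,
    ("wild_C", "crop_1"): 0.01,
    ("wild_D", "crop_2"): 0.03,
}
toy_known = {("wild_C", "crop_1")}
shortlist = propose_cwrs(toy_distances, toy_known, max_distance=0.05)
print(shortlist)
```

Species already classified as CWRs are excluded so that the output contains only new candidates, which matches the abstract's emphasis on newly identified CWRs.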
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giad109
Zafran Hussain Shah, Marcel Müller, Wolfgang Hübner, Tung-Cheng Wang, Daniel Telman, Thomas Huser, Wolfram Schenck
Background: Convolutional neural network (CNN)-based methods have shown excellent performance in denoising and reconstruction of super-resolved structured illumination microscopy (SR-SIM) data. Therefore, CNN-based architectures have been the focus of existing studies. However, Swin Transformer, an alternative and recently proposed deep learning-based image restoration architecture, has not been fully investigated for denoising SR-SIM images. Furthermore, it has not been fully explored how well transfer learning strategies work for denoising SR-SIM images with different noise characteristics and recorded cell structures for these different types of deep learning-based methods. Currently, the scarcity of publicly available SR-SIM datasets limits the exploration of the performance and generalization capabilities of deep learning methods.
Results: In this work, we present SwinT-fairSIM, a novel method based on the Swin Transformer for restoring SR-SIM images with a low signal-to-noise ratio. The experimental results show that SwinT-fairSIM outperforms previous CNN-based denoising methods. Furthermore, as a second contribution, two types of transfer learning, namely direct transfer and fine-tuning, were benchmarked in combination with SwinT-fairSIM and CNN-based methods for denoising SR-SIM data. Direct transfer did not prove to be a viable strategy, but fine-tuning produced results comparable to conventional training from scratch while saving computational time and potentially reducing the amount of training data required. As a third contribution, we publish four datasets of raw SIM images and already reconstructed SR-SIM images. These datasets cover two different types of cell structures: tubulin filaments and vesicle structures. Different noise levels are available for the tubulin filaments.
Conclusion: The SwinT-fairSIM method is well suited for denoising SR-SIM images. By fine-tuning, already trained models can be easily adapted to different noise characteristics and cell structures. Furthermore, the provided datasets are structured in a way that the research community can readily use them for research on denoising, super-resolution, and transfer learning strategies.
Title: Evaluation of Swin Transformer and knowledge transfer for denoising of super-resolution structured illumination microscopy data. Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10787368/pdf/
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae043
Shuai Cao, Nunchanoke Sawettalake, Lisha Shen
Background: Lettuce, an important member of the Asteraceae family, is a globally cultivated cash vegetable crop. With a highly complex genome (∼2.5 Gb; 2n = 18) rich in repeat sequences, current lettuce reference genomes exhibit thousands of gaps, impeding a comprehensive understanding of the lettuce genome.
Findings: Here, we present a near-complete gapless reference genome for cutting lettuce with high transformability, using long-read PacBio HiFi and Nanopore sequencing data. In comparison with the stem lettuce genome, we identify 127,681 structural variations (SVs, present in 0.41 Gb of sequence), reflecting the divergence of leafy and stem lettuce. Interestingly, these SVs are related to transposons and DNA methylation states. Furthermore, we identify 4,612 whole-genome triplication genes exhibiting high expression levels associated with low DNA methylation levels and high N6-methyladenosine RNA modifications. DNA methylation changes are also associated with activation of genes involved in callus formation.
Conclusions: Our gapless lettuce genome assembly, an unprecedented achievement in the Asteraceae family, establishes a solid foundation for functional genomics, epigenomics, and crop breeding and sheds new light on understanding the complexity of gene regulation associated with the dynamics of DNA and RNA epigenetics in genome evolution.
Title: Gapless genome assembly and epigenetic profiles reveal gene regulation of whole-genome triplication in lettuce. Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11238431/pdf/
Background: As biological data increase, we need additional infrastructure to share them and promote interoperability. While major effort has been put into sharing data, relatively less emphasis is placed on sharing metadata. Yet, sharing metadata is also important and in some ways has a wider scope than sharing data themselves.
Results: Here, we present PEPhub, an approach to improve sharing and interoperability of biological metadata. PEPhub provides an API, natural-language search, and user-friendly web-based sharing and editing of sample metadata tables. We used PEPhub to process more than 100,000 published biological research projects and index them with fast semantic natural-language search. PEPhub thus provides a fast and user-friendly way to find existing biological research data or to share new data.
Availability: https://pephub.databio.org.
Title: PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata. Authors: Nathan J LeRoy, Oleksandr Khoroshevskyi, Aaron O'Brien, Rafał Stępień, Alip Arslan, Nathan C Sheffield. Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae033. Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11238423/pdf/
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae045
Yongshuang Xiao, Zhizhong Xiao, Lin Liu, Yuting Ma, Haixia Zhao, Yanduo Wu, Jinwei Huang, Pingrui Xu, Jing Liu, Jun Li
Background: The use of sex-specific molecular markers has become a prominent method in enhancing fish production and economic value, as well as providing a foundation for understanding the complex molecular mechanisms involved in fish sex determination. Over the past decades, research on male and female sex identification has predominantly employed molecular biology methodologies such as restriction fragment length polymorphism, random amplification of polymorphic DNA, simple sequence repeat, and amplified fragment length polymorphism. The emergence of high-throughput sequencing technologies, particularly Illumina, has led to the utilization of single nucleotide polymorphism and insertion/deletion variants as significant molecular markers for investigating sex identification in fish. The advancement of sex-controlled breeding encounters numerous challenges, including the inefficiency of current methods, intricate experimental protocols, high costs of development, elevated rates of false positives, marker instability, and cumbersome field-testing procedures. Nevertheless, the emergence and swift progress of PacBio high-throughput sequencing technology, characterized by its long-read output capabilities, offers novel opportunities to overcome these obstacles.
Findings: Utilizing male/female assembled genome information in conjunction with short-read sequencing data survey and long-read PacBio sequencing data, a catalog of large-segment (>100 bp) insertion/deletion genetic variants was generated through a genome-wide variant site-scanning approach with bidirectional comparisons. The sequence tagging sites were ranked based on the long-read depth of the insertion/deletion site, with markers exhibiting lower long-read depth being considered more effective for large-segment deletion variants. Subsequently, a catalog of bulk primers and simulated PCR for the male/female variant loci was developed, incorporating primer design for the target region and electronic PCR (e-PCR) technology. The Japanese parrotfish (Oplegnathus fasciatus), belonging to the Oplegnathidae family within the Centrarchiformes order, holds significant economic value as a rocky reef fish indigenous to East Asia. The criteria for rapid identification of male and female differences in Japanese parrotfish were established through agarose gel electrophoresis, which revealed 2 amplified bands for males and 1 amplified band for females. A high-throughput identification catalog of sex-specific markers was then constructed using this method, resulting in the identification of 3,639 (2,786 INS/853 DEL, ♀ as reference) and 3,672 (2,876 INS/833 DEL, ♂ as reference) markers in conjunction with 1,021 and 894 high-quality genetic sex identification markers, respectively. Sixteen differential loci were randomly chosen from the catalog for validation, with 11 of them meeting the criteria for male/female distinctions. The implementation of cost-effective and
Title: Innovative approach for high-throughput exploiting sex-specific markers in Japanese parrotfish Oplegnathus fasciatus. Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11258905/pdf/
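Two rules stated in the Findings above lend themselves to a short illustration: deletion markers are ranked so that lower long-read depth at the site comes first, and a gel result with 2 bands is read as male and 1 band as female. The sketch below encodes only those two rules. The `Marker` record, the field names, and the `min_gap_bp` band-resolution tolerance are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    """Hypothetical record for one candidate insertion/deletion marker."""
    name: str
    kind: str               # "INS" or "DEL"
    size_bp: int
    long_read_depth: float  # long-read depth at the variant site


def rank_deletions(markers, min_size_bp=100):
    """Keep large-segment (>100 bp) deletions, lowest long-read depth first."""
    candidates = [m for m in markers
                  if m.kind == "DEL" and m.size_bp > min_size_bp]
    return sorted(candidates, key=lambda m: m.long_read_depth)


def call_sex(band_sizes_bp, min_gap_bp=50):
    """Two resolvable bands -> male, one band -> female, otherwise undetermined.

    min_gap_bp is an assumed gel resolution: bands closer than this are
    treated as a single band.
    """
    resolved = []
    for size in sorted(band_sizes_bp):
        if not resolved or size - resolved[-1] >= min_gap_bp:
            resolved.append(size)
    return {1: "female", 2: "male"}.get(len(resolved), "undetermined")


# Toy markers with made-up values.
toy_markers = [
    Marker("m1", "DEL", 350, 4.0),
    Marker("m2", "DEL", 120, 0.5),
    Marker("m3", "INS", 400, 0.0),
    Marker("m4", "DEL", 80, 0.1),
]
ranked_names = [m.name for m in rank_deletions(toy_markers)]
print(ranked_names, call_sex([450, 620]), call_sex([620]))
```

Returning "undetermined" for anything other than one or two bands keeps failed or ambiguous amplifications from being silently assigned a sex.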
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giad107
Yujin Pu, Yang Zhou, Jun Liu, Haibin Zhang
Background: Chiridota heheva is a cosmopolitan holothurian well adapted to diverse deep-sea ecosystems, especially chemosynthetic environments. Besides high hydrostatic pressure and limited light, high concentrations of metal ions also represent harsh conditions in hydrothermal environments. Few holothurian species can live in such extreme conditions. Therefore, it is valuable to elucidate the adaptive genetic mechanisms of C. heheva in hydrothermal environments.
Findings: Herein, we report a high-quality reference genome assembly of C. heheva from the Kairei vent, which is the first chromosome-level genome of Apodida. The chromosome-level genome size was 1.43 Gb, with a scaffold N50 of 53.24 Mb and BUSCO completeness score of 94.5%. Contig sequences were clustered, ordered, and assembled into 19 natural chromosomes. Comparative genome analysis found that the expanded gene families and positively selected genes of C. heheva were involved in the DNA damage repair process. The expanded gene families and the unique genes contributed to maintaining iron homeostasis in an iron-enriched environment. The positively selected gene RFC2 with 10 positively selected sites played an essential role in DNA repair under extreme environments.
Conclusions: This first chromosome-level genome assembly of C. heheva reveals the hydrothermal adaptation of holothurians. As the first chromosome-level genome of the order Apodida, it will provide a resource for investigating the evolution of the class Holothuroidea.
Title: A high-quality chromosomal genome assembly of the sea cucumber Chiridota heheva and its hydrothermal adaptation. Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10764150/pdf/
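The Findings above report a scaffold N50 of 53.24 Mb. For readers unfamiliar with the statistic, N50 is the length of the scaffold at which the running total, taken from the longest scaffold downward, first reaches half of the total assembly length. A minimal sketch follows, using made-up scaffold lengths rather than the C. heheva assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly, lengths in Mb (total 300, half 150).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 80 + 70 = 150 reaches half the total, so N50 is 70
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract reports it alongside a BUSCO completeness score.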
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giad112
Nicolás Gaitán, Jorge Duitama
Background: Structural variants (SVs) are genomic polymorphisms defined by their length (>50 bp). The usual types of SVs are deletions, insertions, translocations, inversions, and copy number variants. SV detection and genotyping are fundamental given the role of SVs in phenomena such as phenotypic variation and evolutionary events. Thus, methods to identify SVs using long-read sequencing data have been recently developed.
Findings: We present an accurate and efficient algorithm to predict germline SVs from long-read sequencing data. The algorithm starts by collecting evidence (signatures) of SVs from read alignments. Then, signatures are clustered based on a Euclidean graph with coordinates calculated from lengths and genomic positions. Clustering is performed by the DBSCAN algorithm, which provides the advantage of delimiting clusters with high resolution. Clusters are transformed into SVs, and a Bayesian model genotypes each SV precisely based on its supporting evidence. This algorithm is integrated into the single sample variants detector of the Next Generation Sequencing Experience Platform, which facilitates the integration with other functionalities for genomics analysis. We performed multiple benchmark experiments, including simulation and real data, representing different genome profiles, sequencing technologies (PacBio HiFi, ONT), and read depths.
Conclusion: The results show that our approach outperformed state-of-the-art tools on germline SV calling and genotyping, especially at low depths, and in error-prone repetitive regions. We believe this work significantly contributes to the development of bioinformatic strategies to maximize the use of long-read sequencing technologies.
Title: A graph clustering algorithm for detection and genotyping of structural variants from long reads. Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10783151/pdf/
Pub Date: 2024-01-02. DOI: 10.1093/gigascience/giad120
Justin Sonneck, Yu Zhou, Jianxu Chen
Over the past decade, deep learning (DL) research in computer vision has grown rapidly, with many advances in DL-based image analysis methods for biomedical problems. In this work, we introduce MMV_Im2Im, a new open-source Python package for image-to-image transformation in bioimaging applications. MMV_Im2Im is built on a generic image-to-image transformation framework that can be used for a wide range of tasks, including semantic segmentation, instance segmentation, image restoration, and image generation. Our implementation takes advantage of state-of-the-art machine learning engineering techniques, allowing researchers to focus on their research without worrying about engineering details. We demonstrate the effectiveness of MMV_Im2Im on more than 10 different biomedical problems, showcasing its general potential and applicability. For computational biomedical researchers, MMV_Im2Im provides a starting point for developing new biomedical image analysis or machine learning algorithms: they can either reuse the code in this package or fork and extend it to facilitate the development of new methods. Experimental biomedical researchers can benefit from this work by gaining a comprehensive view of the image-to-image transformation concept through diverse examples and use cases. We hope this work can inspire the community on how DL-based image-to-image transformation can be integrated into the assay development process, enabling new biomedical studies that cannot be done with traditional experimental assays alone. To help researchers get started, we have provided source code, documentation, and tutorials for MMV_Im2Im at https://github.com/MMV-Lab/mmv_im2im under the MIT license.
MMV_Im2Im: an open-source microscopy machine vision toolbox for image-to-image transformation. Justin Sonneck, Yu Zhou, Jianxu Chen. GigaScience, vol. 13, 2024-01-02. DOI: 10.1093/gigascience/giad120. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10821710/pdf/
Background: Rhododendron nivale subsp. boreale Philipson et M. N. Philipson is an alpine woody species with ornamental qualities that serves as the predominant species in mountainous scrub habitats found at an altitude of ∼4,200 m. As a high-altitude woody polyploid, this species may serve as a model to understand how plants adapt to alpine environments. Despite its ecological significance, the lack of genomic resources has hindered a comprehensive understanding of its evolutionary and adaptive characteristics in high-altitude mountainous environments.
Findings: We sequenced and assembled the genome of R. nivale subsp. boreale, the first assembly for the subgenus Rhododendron and for a high-altitude woody flowering tetraploid, contributing an important genomic resource for alpine woody flora. The assembly included 52 pseudochromosomes (scaffold N50 = 42.93 Mb; BUSCO = 98.8%; QV = 45.51; S-AQI = 98.69), which belonged to 4 haplotypes and harbored 127,810 predicted protein-coding genes. Combined k-mer analysis, collinearity assessment, and phylogenetic investigation corroborated its autotetraploid identity. Comparative genomic analysis revealed that R. nivale subsp. boreale originated as a neopolyploid of R. nivale and underwent 2 rounds of ancient polyploidy events. Transcriptional expression analysis showed that differences in expression between alleles were common and randomly distributed in the genome. We identified expanded gene families and signatures of positive selection that are involved not only in adaptation to the mountaintop ecosystem (response to stress and developmental regulation) but also in autotetraploid reproduction (meiotic stabilization). Additionally, the expression levels of the group VII ethylene response factor transcription factors (ERF VIIs) were significantly higher than the mean global gene expression. We suspect that these changes have enabled the success of this species at high altitudes.
Conclusions: We assembled the first high-altitude autopolyploid genome and achieved a chromosome-level assembly within the subgenus Rhododendron. In addition, we propose a plausible high-altitude adaptation strategy for R. nivale subsp. boreale. This study provides valuable data for the exploration of alpine mountaintop adaptations and the correlation between extreme environments and species polyploidization.
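For readers unfamiliar with the assembly statistics quoted in the Findings, the sketch below shows how a scaffold N50 is computed and how a Phred-scaled consensus quality value (QV) maps to a per-base error probability. The scaffold lengths are made-up toy values, and the function names `n50` and `qv_to_error` are illustrative; only the QV of 45.51 comes from the abstract.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


def qv_to_error(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


# Toy scaffold lengths (arbitrary units), not taken from the study.
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)

# QV reported in the abstract; assumes the usual Phred scaling.
error_rate = qv_to_error(45.51)
```

In the toy example the total is 290, and the two longest scaffolds (80 + 70) already cover half of it, so the N50 is 70. Under the Phred assumption, a QV of 45.51 corresponds to roughly 3 errors per 100,000 bases.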
The first high-altitude autotetraploid haplotype-resolved genome assembled (Rhododendron nivale subsp. boreale) provides new insights into mountaintop adaptation. Zhen-Yu Lyu, Xiong-Li Zhou, Si-Qi Wang, Gao-Ming Yang, Wen-Guang Sun, Jie-Yu Zhang, Rui Zhang, Shi-Kang Shen. GigaScience, vol. 13, 2024-01-02. DOI: 10.1093/gigascience/giae052. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11304948/pdf/
Pub Date: 2024-01-02. DOI: 10.1093/gigascience/giae028
Chao Yang, Zhenmiao Zhang, Yufen Huang, Xuefeng Xie, Herui Liao, Jin Xiao, Werner Pieter Veldsman, Kejing Yin, Xiaodong Fang, Lu Zhang
Background: Linked-read sequencing technologies generate short reads with high base quality that carry extrapolative information on long-range DNA connectedness. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to 1 specific sequencing platform.
Findings: To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform-agnostic processing of linked-read sequencing data from both the human genome and metagenomes. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK can process multiple samples automatically and gives users the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK to linked reads from simulation, mock community, and real datasets for both the human genome and metagenomes. We showcased LRTK's ability to generate comparative performance results from preceding benchmark studies and to report these results in publication-ready HTML document plots.
Conclusions: LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the current gap in the field caused by platform-centric, genome-specific linked-read data analysis tools.
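Barcode sequencing error correction, one of the functions listed in the Findings, is commonly done by matching each observed barcode against a whitelist of valid barcodes and accepting a unique match at Hamming distance 1. The sketch below illustrates that general idea only. It is not LRTK's code or API: the function name `correct_barcode`, the whitelist contents, and the distance-1 rule are assumptions.

```python
def correct_barcode(barcode, whitelist):
    """Return the whitelisted barcode for an observed one, or None if unresolved.

    An exact match is returned as-is. Otherwise every single-base substitution
    is tried, and the result is accepted only when exactly one whitelisted
    barcode is reachable, so ambiguous barcodes are left uncorrected.
    """
    if barcode in whitelist:
        return barcode
    candidates = set()
    for i, base in enumerate(barcode):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = barcode[:i] + alt + barcode[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    return candidates.pop() if len(candidates) == 1 else None


# Hypothetical whitelist; real platforms ship lists with many more barcodes.
whitelist = {"ACGTACGT", "TTTTCCCC", "AAAAAAAA", "AAAAAAAT"}

exact = correct_barcode("TTTTCCCC", whitelist)      # already valid
fixed = correct_barcode("ACGTACGA", whitelist)      # one mismatch from ACGTACGT
ambiguous = correct_barcode("AAAAAAAC", whitelist)  # one mismatch from two entries
unknown = correct_barcode("GGGGGGGG", whitelist)    # no close match
```

Rejecting ambiguous barcodes trades a few lost reads for fewer reads assigned to the wrong DNA fragment. Production tools often also weigh base quality scores when choosing among candidates, which this sketch omits.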
LRTK: a platform agnostic toolkit for linked-read analysis of both human genome and metagenome. Chao Yang, Zhenmiao Zhang, Yufen Huang, Xuefeng Xie, Herui Liao, Jin Xiao, Werner Pieter Veldsman, Kejing Yin, Xiaodong Fang, Lu Zhang. GigaScience, vol. 13, 2024-01-02. DOI: 10.1093/gigascience/giae028. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11170215/pdf/