Pub Date: 2024-11-06 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.140
Marc A Gumangan, Zheyu Pan, Thomas P Lozito
The vast majority of gecko species can regenerate their tails, but a few geckos in the genera Correlophus, Uroplatus, and Nephrurus cannot regrow lost tails. Among these non-regenerative geckos, the crested gecko (Correlophus ciliatus) stands out for its ready availability, ease of care, high productivity, and hybridization potential. These features make C. ciliatus particularly well suited as a model for studying the genetic, molecular, and cellular mechanisms underlying the loss of tail regeneration. We report a contiguous genome assembly of C. ciliatus with a total size of 1.65 Gb, 152 scaffolds, an L50 of 6, and an N50 of 109 Mb. Repetitive sequences make up 40.41% of the genome, and a total of 30,780 genes were annotated. Our assembly of the crested gecko genome provides a valuable resource for future comparative genomic studies of non-regenerative and regenerative geckos and other squamate reptiles.
Findings: We report genome sequencing, assembly, and annotation for the crested gecko, Correlophus ciliatus.
Title: Chromosome-level genome assembly and annotation of the crested gecko, Correlophus ciliatus, a lizard incapable of tail regeneration | Journal: GigaByte (Hong Kong, China), 2024, gigabyte140 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11558660/pdf/
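The contiguity figures above follow the standard definitions of N50 and L50. Sort the scaffolds from longest to shortest and add up their lengths. The scaffold at which the running total first reaches half of the assembly length gives the N50 (its length) and the L50 (its rank). For this assembly, that means the six longest scaffolds together span at least half of the 1.65 Gb total, and the sixth of them is 109 Mb long. The minimal Python sketch below uses toy lengths, not the crested gecko scaffolds.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a set of contig or scaffold lengths.

    Sequences are sorted longest first and their lengths accumulated; the
    first sequence at which the running total reaches half of the assembly
    length gives N50 (its length) and L50 (its 1-based rank).
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, rank
    raise AssertionError("unreachable: the running total reaches the full length")


# Toy lengths (not real scaffolds): total 300, so half is 150.
print(n50_l50([100, 80, 60, 40, 20]))  # (80, 2)
```

Input order does not matter because the lengths are sorted first. The same function applies to contigs or scaffolds; only the input list changes.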
Pub Date: 2024-11-05 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.141
Peiyu Zong, Wenpeng Deng, Jian Liu, Jue Ruan
Rapid increases in sequencing read length call for increasingly efficient sequence alignment algorithms. The Needleman-Wunsch method introduced the foundational dynamic-programming matrix calculation for global alignment, which scores the alignment of two sequences over their entire length. However, this method is highly time-consuming, because the number of matrix cells grows with the product of the two sequence lengths. The proposed TSTA algorithm leverages both vector-level (SIMD) and thread-level parallelism to accelerate pairwise and multiple sequence alignments.
Availability and implementation: Source code is available at https://github.com/bxskdh/TSTA.
Title: TSTA: thread and SIMD-based trapezoidal pairwise/multiple sequence-alignment method | Journal: GigaByte (Hong Kong, China), 2024, gigabyte141 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11558659/pdf/
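The Needleman-Wunsch recurrence is easier to see in code. The sketch below is a plain, scalar Python version with a linear gap penalty. It is not TSTA's trapezoidal SIMD/thread algorithm, which the abstract does not detail, and the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions.

The cells are filled one anti-diagonal at a time. Each cell (i, j) with i + j = d depends only on diagonals d - 1 and d - 2. The cells within one diagonal are therefore independent, which is the kind of structure that vector-level and thread-level parallelism commonly exploit.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score (Needleman-Wunsch, linear gap penalty).

    Cells are filled anti-diagonal by anti-diagonal: every cell (i, j) with
    i + j == d depends only on diagonals d - 1 and d - 2, so the cells of one
    diagonal do not depend on each other.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score for aligning a[:i] with b[:j]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        H[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for d in range(2, n + m + 1):
        # interior cells on anti-diagonal d: 1 <= i <= n and 1 <= j = d - i <= m
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # a[i-1] against a gap
                H[i][j - 1] + gap,    # b[j-1] against a gap
            )
    return H[n][m]


print(nw_score("ACGT", "AGT"))  # 1: three matches and one gap
```

Both the time and the memory used here grow with len(a) × len(b), which is the cost the abstract describes as highly time-consuming.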
Pub Date: 2024-10-21 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.139
Paolo Cozzi, Arianna Manunza, Johanna Ramirez-Diaz, Valentina Tsartsianidou, Konstantinos Gkagkavouzis, Pablo Peraza, Anna Maria Johansson, Juan José Arranz, Fernando Freire, Szilvia Kusza, Filippo Biscarini, Lucy Peters, Gwenola Tosser-Klopp, Gabriel Ciappesoni, Alexandros Triantafyllidis, Rachel Rupp, Bertrand Servin, Alessandra Stella
Underutilized sheep and goat breeds can adapt to challenging environments thanks to their genetic makeup. Integrating publicly available genomic datasets with new data will facilitate genetic diversity analyses; however, this process is complicated by data discrepancies, such as outdated assembly versions or different data formats. Here, we present the SMARTER-database, a collection of tools and scripts to standardize genomic data and metadata, mainly from SNP arrays applied to small ruminant populations worldwide, with a focus on reproducibility. SMARTER-database harmonizes genotypes for about 12,000 sheep and 6,000 goats to a uniform coding and assembly version. Users can access the genotype data via File Transfer Protocol (FTP) and query the metadata through a web interface or their own scripts, enabling efficient filtering and selection of samples. These tools will let researchers focus on the crucial aspects of adaptation and contribute to livestock sustainability, leveraging the rich dataset provided by the SMARTER-database.
Availability and implementation: The code is available as open-source software under the MIT license at https://github.com/cnr-ibba/SMARTER-database.
Title: SMARTER-database: a tool to integrate SNP array datasets for sheep and goat breeds | Journal: GigaByte (Hong Kong, China), 2024, gigabyte139 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11519891/pdf/
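Harmonizing SNP-array genotypes to "a uniform coding" typically means translating calls from one allele-coding convention into another. The Python sketch below shows that idea in a simple form: it converts A/B-coded calls into nucleotides and complements them where a SNP is reported on the opposite strand.

This is a hypothetical illustration, not SMARTER-database's implementation. The function name, SNP identifiers, allele table and strand-flip table are all invented for the example.

```python
from typing import Dict, Tuple

# Complement table used when a SNP's alleles are reported on the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MISSING = "--"


def recode_ab(
    genotypes: Dict[str, str],
    snp_alleles: Dict[str, Tuple[str, str]],
    flip_strand: Dict[str, bool],
) -> Dict[str, str]:
    """Translate A/B-coded genotype calls into nucleotide calls (hypothetical helper).

    genotypes:   SNP id -> two-letter call such as "AB", or MISSING
    snp_alleles: SNP id -> (nucleotide for allele A, nucleotide for allele B)
    flip_strand: SNP id -> True if the nucleotides must be complemented to
                 match the target assembly's strand
    """
    recoded: Dict[str, str] = {}
    for snp, call in genotypes.items():
        if call == MISSING:
            recoded[snp] = MISSING  # missing calls stay missing
            continue
        a_base, b_base = snp_alleles[snp]
        lookup = {"A": a_base, "B": b_base}
        bases = "".join(lookup[allele] for allele in call)  # KeyError on unknown codes
        if flip_strand.get(snp, False):
            bases = bases.translate(COMPLEMENT)
        recoded[snp] = bases
    return recoded


# Made-up SNPs and allele tables, for illustration only.
print(recode_ab(
    {"snp1": "AB", "snp2": "BB", "snp3": "--"},
    {"snp1": ("A", "G"), "snp2": ("C", "T"), "snp3": ("A", "C")},
    {"snp2": True},
))  # {'snp1': 'AG', 'snp2': 'AA', 'snp3': '--'}
```

Moving coordinates to a newer assembly version is a separate step. This sketch does not attempt it.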
Pub Date: 2024-10-18 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.137
Locedie Mansueto, Tobias Kretzschmar, Ramil Mauleon, Graham J King
Global changes in cannabis legislation after decades of stringent regulation, together with heightened demand for its industrial and medicinal applications, have spurred recent genetic and genomics research. An international research community emerged and identified the need for a web portal that hosts cannabis-specific datasets, seamlessly integrates multiple data sources, and supports omics-type analyses, fostering information sharing. The Tripal platform was used to host public genome assemblies, gene annotations, quantitative trait loci and genetic maps, gene and protein expression data, and metabolic profiles, together with their sample attributes. Single nucleotide polymorphisms (SNPs) were called from public resequencing datasets against three reference genomes. Additional applications, such as SNP-Seek and MapManJS, were embedded into Tripal. A multi-omics data integration web-service Application Programming Interface (API), built on top of existing Tripal modules, returns generic tables of samples, properties and values. Use cases demonstrate the API's utility for various omics analyses, enabling researchers to perform multi-omics analyses efficiently.
Availability and implementation: The web portal can be accessed at www.icgrc.info.
Title: Building a community-driven bioinformatics platform to facilitate Cannabis sativa multi-omics research | Journal: GigaByte (Hong Kong, China), 2024, gigabyte137 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11515022/pdf/
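The API described above returns generic tables of samples, properties and values. Such long-format tables are usually pivoted into a sample-by-property matrix before downstream multi-omics analysis.

The Python sketch below assumes each returned row is a simple `(sample, property, value)` triple. The actual response format and field names of the portal's API are not described here, and the sample names and values are made up.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, str, str]  # (sample, property, value), as assumed here


def pivot_long_table(
    rows: Iterable[Row],
) -> Tuple[List[str], Dict[str, Dict[str, Optional[str]]]]:
    """Turn (sample, property, value) rows into a sample -> {property: value} matrix.

    Properties keep their first-seen order; a sample lacking a property gets None.
    If a (sample, property) pair repeats, the last value wins.
    """
    properties: Dict[str, None] = {}  # dict used as an insertion-ordered set
    by_sample: Dict[str, Dict[str, str]] = {}
    for sample, prop, value in rows:
        properties.setdefault(prop, None)
        by_sample.setdefault(sample, {})[prop] = value
    columns = list(properties)
    matrix = {s: {p: vals.get(p) for p in columns} for s, vals in by_sample.items()}
    return columns, matrix


# Toy rows with made-up sample names and values.
cols, matrix = pivot_long_table([
    ("sample_1", "plant_height_cm", "180"),
    ("sample_1", "leaf_count", "42"),
    ("sample_2", "plant_height_cm", "150"),
])
print(cols)                # ['plant_height_cm', 'leaf_count']
print(matrix["sample_2"])  # {'plant_height_cm': '150', 'leaf_count': None}
```

Values are kept as strings so that no numeric type is assumed. Conversion is left to the analysis that consumes the matrix.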
Pub Date: 2024-10-16 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.136
Vincent Noël, Marco Ruscone, Robyn Shuttleworth, Cicely K Macnamara
The extracellular matrix, composed of macromolecules such as collagen fibres, provides structural support to cells and acts as a barrier that metastatic cells degrade to spread beyond the primary tumour. While agent-based frameworks such as PhysiCell can simulate the spatial dynamics of tumour evolution, they only represent cells as circles (2D) or spheres (3D). To model the extracellular matrix as a network of fibres, we require a new type of agent represented by line segments (2D) or cylinders (3D). Here, we present PhysiMeSS, an add-on for PhysiCell that introduces a new agent type to describe fibres and their physical interactions with cells and other fibres. The PhysiMeSS implementation is available at https://github.com/PhysiMeSS/PhysiMeSS and in the official PhysiCell repository. We provide examples illustrating the capabilities of this framework. This tool may help tackle important biological questions, such as diseases linked to dysregulation of the extracellular matrix or the processes leading to cancer metastasis.
Title: PhysiMeSS - a new physiCell addon for extracellular matrix modelling | Journal: GigaByte (Hong Kong, China), 2024, gigabyte136 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500100/pdf/
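In 2D, PhysiMeSS represents fibres as line segments, while PhysiCell represents cells as circles. A basic geometric ingredient for any cell-fibre interaction is therefore the distance from a cell centre to a segment.

The Python sketch below computes that distance and a simple contact test. It is an illustration under our own assumptions, not PhysiMeSS code: PhysiCell itself is written in C++. The names `point_segment_distance`, `cell_touches_fibre` and `fibre_radius` are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b in 2D."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate fibre: both endpoints coincide
        return math.hypot(px - ax, py - ay)
    # Parameter of p's projection onto the infinite line, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    cx, cy = ax + t * dx, ay + t * dy  # closest point on the segment
    return math.hypot(px - cx, py - cy)


def cell_touches_fibre(
    centre: Point, cell_radius: float, a: Point, b: Point, fibre_radius: float = 0.0
) -> bool:
    """True if a circular cell overlaps or touches a fibre of the given thickness."""
    return point_segment_distance(centre, a, b) <= cell_radius + fibre_radius


print(point_segment_distance((0.0, 1.0), (-1.0, 0.0), (1.0, 0.0)))  # 1.0
print(cell_touches_fibre((3.0, 0.0), 1.5, (-1.0, 0.0), (1.0, 0.0)))  # False
```

Clamping `t` to [0, 1] is what distinguishes a finite fibre from an infinite line. A cell beyond a fibre's end is measured against the nearest endpoint.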
Pub Date: 2024-10-11 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.138
Saurabh Gupta, Ankur Sharma
Recent advances in next-generation sequencing (NGS) technologies have highlighted the need for versatile, cost-effective tools that can adapt to a rapidly evolving landscape. The emergence of numerous new sequencing platforms, each with its own sample preparation and sequencing requirements, underscores the importance of efficient barcode balancing for successful pooling and accurate demultiplexing of samples. Recently launched sequencing systems, marketed as more affordable alternatives to established platforms, further exemplify these challenges, especially when libraries originally prepared for one platform must be converted for another. In response, we introduce NucBalancer, a Shiny app for the optimal selection of barcode sequences. Although it was initially tailored to the nucleotide-composition challenges specific to G400 and T7 series sequencers, NucBalancer can accommodate the varied demands of these newer sequencing technologies. It is particularly useful in single-cell genomics, where it enables libraries such as those prepared for 10x technology to be adapted to other sequencers, including the G400 and T7 series. NucBalancer balances nucleotide composition and sample concentrations, reducing biases and improving the reliability of NGS data across platforms, and thus supports effective barcode balancing for sample pooling on any platform.
Availability and implementation: NucBalancer is implemented in R and is available at https://github.com/ersgupta/NucBalancer. Additionally, a Shiny interface is available at https://ersgupta.shinyapps.io/NucBalancer/.
Title: NucBalancer: streamlining barcode sequence selection for optimal sample pooling for sequencing | Journal: GigaByte (Hong Kong, China), 2024, gigabyte138 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11488490/pdf/
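Barcode balancing starts from a simple quantity: at each index position, the fraction of reads in the pool that carry each nucleotide. Each sample can be weighted by its intended share of the pooled reads, which is where sample concentration enters.

The Python sketch below computes those fractions and flags positions dominated by one base. It is not NucBalancer's R implementation, and it does not encode any G400- or T7-specific chemistry rules. The 0.5 cutoff and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

BASES = "ACGT"


def per_position_composition(
    indexes: Sequence[str], weights: Optional[Sequence[float]] = None
) -> List[Dict[str, float]]:
    """Weighted fraction of A, C, G and T at each index position across a pool.

    weights: each sample's intended share of the pooled reads (e.g. derived
    from its loading concentration); equal weights are used if omitted.
    """
    if not indexes:
        raise ValueError("empty pool")
    length = len(indexes[0])
    if any(len(seq) != length for seq in indexes):
        raise ValueError("all index sequences must have the same length")
    if weights is None:
        weights = [1.0] * len(indexes)
    if len(weights) != len(indexes):
        raise ValueError("one weight per index sequence is required")
    total = float(sum(weights))
    composition = []
    for pos in range(length):
        fractions = {base: 0.0 for base in BASES}
        for seq, weight in zip(indexes, weights):
            fractions[seq[pos].upper()] += weight / total  # KeyError for non-ACGT
        composition.append(fractions)
    return composition


def unbalanced_positions(
    indexes: Sequence[str],
    weights: Optional[Sequence[float]] = None,
    max_fraction: float = 0.5,
) -> List[int]:
    """Positions where a single base exceeds max_fraction (a hypothetical cutoff)."""
    return [
        pos
        for pos, fractions in enumerate(per_position_composition(indexes, weights))
        if max(fractions.values()) > max_fraction
    ]


pool = ["AAAA", "AAAC", "CCCG"]
print(unbalanced_positions(pool))                     # [0, 1, 2]
print(unbalanced_positions(pool, weights=[1, 1, 2]))  # []
```

The second call shows why concentrations matter. The same three barcodes become acceptable once the third sample contributes half of the reads.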
Pub Date: 2024-10-08 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.135
Locedie Mansueto, Kenneth L McNally, Tobias Kretzschmar, Ramil Mauleon
Growing interest in using Cannabis sativa for food, fiber, and medicine, together with recent regulatory changes, has spurred numerous genomic studies of this once-prohibited plant. Cannabis research uses next-generation sequencing technologies for genomics and transcriptomics. Other crops have genome portals that provide access to, and analysis of, genotyping data from diverse accessions, leading to the discovery of alleles for important traits; no such resource has existed for cannabis. The CannSeek web portal aims to fill this gap. Single nucleotide polymorphism (SNP) datasets were generated by identifying genome variants from public resequencing data and genome assemblies. The results and accompanying trait data are hosted in the CannSeek web application, built on the Rice SNP-Seek infrastructure with improvements that allow multiple reference genomes and provide a web-service Application Programming Interface (API). The tools built into the portal support phylogenetic analyses, varietal grouping and identification, and favorable haplotype discovery for cannabis accessions using public sequencing data.
Availability and implementation: The CannSeek portal is available at https://icgrc.info/cannseek and https://icgrc.info/genotype_viewer.
Title: CannSeek? Yes we Can! An open-source single nucleotide polymorphism database and analysis portal for Cannabis sativa | Journal: GigaByte (Hong Kong, China), 2024, gigabyte135 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480739/pdf/
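Phylogenetic analysis and varietal grouping from SNP data usually begin with a pairwise distance between accessions. The Python sketch below computes a simple identity-by-state (IBS) distance from genotypes coded as alternate-allele dosages (0, 1, 2, or `None` for a missing call). A matrix of such distances could then feed neighbour-joining or hierarchical clustering.

This is a generic illustration, not CannSeek's method. The accession names and genotypes are made up.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Genotypes = List[Optional[int]]  # alt-allele dosage per SNP: 0, 1, 2 or None (missing)


def ibs_distance(x: Genotypes, y: Genotypes) -> float:
    """Mean per-SNP allele difference |x - y| / 2 over SNPs called in both accessions."""
    if len(x) != len(y):
        raise ValueError("genotype vectors must cover the same SNPs")
    diffs = [abs(a - b) / 2 for a, b in zip(x, y) if a is not None and b is not None]
    if not diffs:
        raise ValueError("no SNPs called in both accessions")
    return sum(diffs) / len(diffs)


def distance_matrix(accessions: Dict[str, Genotypes]) -> Dict[Tuple[str, str], float]:
    """IBS distance for every unordered pair of accessions (names sorted)."""
    return {
        (p, q): ibs_distance(accessions[p], accessions[q])
        for p, q in combinations(sorted(accessions), 2)
    }


print(ibs_distance([0, 1, 2, None], [0, 2, 0, 1]))  # 0.5
print(distance_matrix({"acc_b": [0, 2, 1], "acc_a": [2, 2, None]}))  # {('acc_a', 'acc_b'): 0.5}
```

SNPs missing in either accession are skipped pair by pair. Distances between pairs can therefore rest on different numbers of sites.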
Pub Date: 2024-09-13 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.134
Marcel Nebenführ, David Prochotta, Alexander Ben Hamadou, Axel Janke, Charlotte Gerheim, Christian Betz, Carola Greve, Hanno Jörn Bolz
The time required for genome sequencing and de novo assembly depends on the interaction between laboratory work, sequencing capacity, and the bioinformatics workflow, and is often constrained by external sequencing services. Bringing together academic biodiversity institutes and a medical diagnostics company with extensive sequencing capabilities, we aimed to generate a high-quality mammalian de novo genome in minimal time. We present the first chromosome-level genome assembly of the Whippet, using PacBio long-read high-fidelity sequencing and reference-guided scaffolding. The final assembly has a contig N50 of 55 Mbp and a scaffold N50 of 65.7 Mbp. The total assembly length is 2.47 Gbp, of which 2.43 Gbp were scaffolded into 39 chromosome-length scaffolds. Annotation using mammalian genomes and transcriptome data yielded 28,383 transcripts with 90.9% complete BUSCO genes, and identified 36.5% repeat content. Sequencing, assembling, and scaffolding the chromosome-level genome of the Whippet took less than a week, adding another high-quality reference genome to those available for domestic dog breeds.
Title: High-speed whole-genome sequencing of a Whippet: Rapid chromosome-level assembly and annotation of an extremely fast dog's genome | Journal: GigaByte (Hong Kong, China), 2024, gigabyte134 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11418881/pdf/
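The two assembly sizes reported above give the share of the Whippet assembly placed in the 39 chromosome-length scaffolds. Both inputs are rounded to two decimals, so the result is approximate:

```latex
\frac{2.43\,\mathrm{Gbp}}{2.47\,\mathrm{Gbp}} \approx 0.984
```

That is, roughly 98.4% of the assembled sequence lies in chromosome-length scaffolds.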
Pub Date: 2024-08-31 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.132
Ann-Kathrin Dörr, Josefa Welling, Adrian Dörr, Jule Gosch, Hannah Möhlen, Ricarda Schmithausen, Jan Kehrmann, Folker Meyer, Ivana Kraiselburd
Background: Next-generation sequencing of microbial communities has become a standard technique. However, the computational analysis remains resource-intensive. With declining costs and growing adoption of sequencing-based methods across many scientific fields, validated, fully automated, reproducible and flexible pipelines are increasingly essential.
Results: We present RiboSnake, a validated, automated, reproducible QIIME2-based pipeline implemented in Snakemake for analysing 16S rRNA gene amplicon sequencing data. RiboSnake includes pre-packaged validated parameter sets optimized for different sample types, from environmental samples to patient data. The configuration packages can be easily adapted and shared, requiring minimal user input.
Conclusion: RiboSnake is a new alternative for researchers who use 16S rRNA gene amplicon sequencing and need a customizable, user-friendly pipeline for microbiome analyses with settings validated in vitro. By automating the analysis with validated parameters for diverse sample types, RiboSnake improves on existing methods. The workflow repository is available on GitHub (https://github.com/IKIM-Essen/RiboSnake).
Title: RiboSnake - a user-friendly, robust, reproducible, multipurpose and documentation-extensive pipeline for 16S rRNA gene microbiome analysis | Journal: GigaByte (Hong Kong, China), 2024, gigabyte132 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11448241/pdf/
Pub Date: 2024-08-29 | eCollection Date: 2024-01-01 | DOI: 10.46471/gigabyte.133
Jorge Buenabad-Chavez, Evelyn Greeves, James P J Chong, Emma Rand
Amazon Web Services (AWS) instances provide a convenient way to run training on complex 'omics data analysis workflows without requiring participants to install software packages or store large data volumes locally. However, efficiently managing dozens of instances is challenging for training providers. We present a set of Bash scripts that make it quick and easy to manage Linux AWS instances pre-configured with all the software analysis tools and data needed for a course, and accessible using encrypted login keys and optional domain names. Creating over 30 instances takes 10-15 minutes. A comprehensive online tutorial describes how to set up and use an AWS account and the scripts, and how to customise AWS instance templates with other software tools and data. We anticipate that others offering similar training may benefit from using the scripts regardless of the analyses being taught.
Title: Automated management of AWS instances for training | Journal: GigaByte (Hong Kong, China), 2024, gigabyte133 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11382607/pdf/