The common house sparrow, Passer domesticus, is a small bird belonging to the family Passeridae. Here, we provide high-quality whole-genome sequencing data and a genome assembly for the house sparrow. The final assembly was generated using a workflow that included Shovill, SPAdes, MaSuRCA, and BUSCO. The assembly consists of contigs spanning 268,193 bases and coalescing around a reference genome of 922 Mb. We applied rigorous statistical thresholds to assess coverage and found that the Passer genome shows considerable similarity to the Gallus gallus (chicken) and Taeniopygia guttata (zebra finch) genomes; we also provide functional annotations. This new annotated genome assembly will be a valuable resource for comparative and population genomic analyses of passerine, avian, and vertebrate evolution.
Title: Whole genome sequencing and assembly of the house sparrow, Passer domesticus. Authors: Vikas Kumar, Gopesh Sharma, Sankalp Sharma, Samvrutha Prasad, Shailesh Desai, Toral Vaishnani, Dalia Vishnudasan, Gopinathan Maheswaran, Kaomud Tyagi, Inderjeet Tyagi, Polavarapu B Kavi Kishor, Gyaneshwer Chaubey, Prashanth Suravajhala. Pub Date: 2025-07-21; DOI: 10.46471/gigabyte.161. GigaByte (Hong Kong, China), 2025, gigabyte161.
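Coverage checks usually start from the expected sequencing depth. The abstract does not state which thresholds were used, so the following is only a minimal sketch assuming the standard Lander-Waterman model: mean depth C = N * L / G for N reads of length L over a genome of size G, with an expected fraction 1 - exp(-C) of the genome covered at least once. Only the ~922 Mb genome size comes from the text; the read count and read length are hypothetical.

```python
import math


def expected_depth(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth C = N * L / G (Lander-Waterman)."""
    return num_reads * read_length / genome_size


def fraction_covered(depth: float) -> float:
    """Expected fraction of the genome covered by at least one read: 1 - exp(-C)."""
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    genome_size = 922_000_000  # ~922 Mb, the size reported in the abstract
    num_reads = 200_000_000    # hypothetical read count
    read_length = 150          # hypothetical short-read length in bp
    c = expected_depth(num_reads, read_length, genome_size)
    print(f"depth = {c:.1f}x, fraction covered = {fraction_covered(c):.6f}")
```

Real data deviate from this idealised Poisson model (GC bias, repeats, duplicates), which is why empirical coverage thresholds are applied on top of it.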
Pub Date: 2025-07-04; eCollection Date: 2025-01-01; DOI: 10.46471/gigabyte.160
Wudmir Y Rojas, Zargham Ahmad, Julia Jakiela, Helge Hecht, Jana Klánová, Elliott J Price
High-performance computing (HPC) environments are crucial for computational research, including quantum chemistry (QC), but pose challenges for non-expert users. Researchers with limited computational knowledge struggle to use domain-specific software and to access mass spectral prediction for in silico annotation. Here, we provide a robust workflow that leverages interoperable file formats for molecular structures to ensure integration across various QC tools. The quantum chemistry package for mass spectral predictions after electron ionization or collision-induced dissociation (QCxMS) has been integrated into the Galaxy platform, enabling automated analysis of fragmentation mechanisms. The extended tight binding quantum chemistry package (xtb), chosen for its balance between accuracy and computational efficiency, provides molecular geometry optimisation. A Docker image encapsulates the necessary software stack. We demonstrated the workflow for four molecules, highlighting the scalability and efficiency of our solution via runtime performance analysis. This work shows how non-HPC users can make these predictions easily, using advanced computational tools without needing in-depth expertise.
Title: Galaxy QCxMS for straightforward semi-empirical quantum mechanical EI-MS prediction. GigaByte (Hong Kong, China), 2025, gigabyte160.
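Interoperable structure formats are what let one tool's output feed the next step of the workflow. As a minimal sketch, assuming the plain XYZ format (an atom-count line, a comment line, then one "symbol x y z" line per atom, in ångström), which xtb accepts as input, the helpers below write a structure and read it back. The Atom class and function names are hypothetical and are not part of the Galaxy tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    symbol: str
    x: float
    y: float
    z: float


def to_xyz(atoms: List[Atom], comment: str = "") -> str:
    """Serialise atoms to XYZ text: count line, comment line, one line per atom."""
    lines = [str(len(atoms)), comment]
    for a in atoms:
        lines.append(f"{a.symbol:<2} {a.x:12.6f} {a.y:12.6f} {a.z:12.6f}")
    return "\n".join(lines) + "\n"


def from_xyz(text: str) -> List[Atom]:
    """Parse XYZ text back into atoms, checking the declared atom count."""
    lines = text.splitlines()
    count = int(lines[0].strip())
    atoms = []
    for line in lines[2:2 + count]:  # skip the count and comment lines
        symbol, x, y, z = line.split()[:4]
        atoms.append(Atom(symbol, float(x), float(y), float(z)))
    if len(atoms) != count:
        raise ValueError(f"expected {count} atoms, found {len(atoms)}")
    return atoms


if __name__ == "__main__":
    # Hypothetical water geometry, for illustration only.
    water = [Atom("O", 0.0, 0.0, 0.117),
             Atom("H", 0.0, 0.757, -0.467),
             Atom("H", 0.0, -0.757, -0.467)]
    text = to_xyz(water, "water")
    print(text)
    assert from_xyz(text) == water  # round trip preserves the structure
```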
Pub Date: 2025-06-24; eCollection Date: 2025-01-01; DOI: 10.46471/gigabyte.158
Kevin Stachelek, Bhavana Bhat, David Cobrinik
Chevreul is an open-source R Bioconductor package and interactive R Shiny app for processing and visualising single-cell RNA sequencing (scRNA-seq) data. Chevreul differs from other scRNA-seq analysis packages in its ease of use, its capacity to analyze full-length RNA sequencing data for exon coverage and transcript isoform inference, and its support for batch correction. Chevreul enables exploratory analyses of scRNA-seq data using Bioconductor SingleCellExperiment objects (or converted Seurat objects), including batch integration, quality control filtering, read count normalization and transformation, dimensionality reduction, clustering at a range of resolutions, and cluster marker gene identification. Processed data can be visualized in the R Shiny app. Gene or transcript expression can be visualized using PCA, t-SNE, UMAP, heatmaps, or violin plots; differential expression can be evaluated with several statistical tests. Chevreul also provides accessible tools for isoform-level analyses and alternative splicing detection. Chevreul empowers researchers without programming experience to analyze full-length scRNA-seq data.
Availability & implementation: Chevreul is implemented in R, and the R package and integrated Shiny application are freely available at https://github.com/cobriniklab/chevreul with constituent packages hosted on Bioconductor at https://bioconductor.org/packages/chevreulProcess, https://bioconductor.org/packages/chevreulPlot, and https://bioconductor.org/packages/chevreulShiny.
Title: Chevreul: an R Bioconductor package for exploratory analysis of full-length single cell sequencing. GigaByte (Hong Kong, China), 2025, gigabyte158.
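The abstract lists read count normalization and transformation among Chevreul's processing steps without naming the method. A common choice in scRNA-seq, used for example by Seurat's LogNormalize, is to divide each cell's counts by that cell's total, multiply by a scale factor (10,000 here), and apply a natural-log log1p. The pure-Python sketch below implements that convention; it is an illustration only and not Chevreul's API (Chevreul itself is an R package).

```python
import math
from typing import List


def log_normalize(counts: List[List[int]], scale: float = 10_000.0) -> List[List[float]]:
    """Library-size normalise a cells x genes count matrix, then apply log1p.

    Each cell's counts are divided by that cell's total, multiplied by
    `scale`, and transformed with log1p. Cells with zero total stay at zero.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


if __name__ == "__main__":
    # Hypothetical 2-cell x 3-gene count matrix.
    matrix = [[10, 0, 90], [1, 1, 2]]
    for row in log_normalize(matrix):
        print([round(v, 3) for v in row])
```

Normalising per cell removes differences in sequencing depth between cells, and the log transform compresses the long right tail of expression values before dimensionality reduction.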
Pub Date: 2025-06-20; eCollection Date: 2025-01-01; DOI: 10.46471/gigabyte.157
Roberto Márquez, Denis Jacob Machado, Reyhaneh Nouri, Kerry L Gendreau, Daniel Janies, Ralph A Saporito, Marcus R Kronforst, Taran Grant
Dendrobatid poison frogs have become well established as model systems in several fields of biology. Nevertheless, the development of molecular and genetic resources for these frogs has been hindered by their large, highly repetitive genomes, which have proven difficult to assemble. Here we present a draft assembly for Phyllobates terribilis (12.6 Gb), generated using a combination of sequencing platforms and bioinformatic approaches. As in other poison frog sequencing efforts, we recovered a highly fragmented assembly, likely due to the genome's large size and very high repeat content, which we estimated to be ≈88%. Despite the assembly's low contiguity, we were able to annotate multiple members of three gene sets of interest (voltage-gated sodium channels and the Notch and Wnt signaling pathways), demonstrating the usefulness of our assembly for the amphibian research community.
Title: A draft genome assembly for the dart-poison frog Phyllobates terribilis. GigaByte (Hong Kong, China), 2025, gigabyte157.
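The abstract does not say how the ≈88% repeat estimate was obtained. One common way to summarise repeat content, assumed here, is to count lowercase (soft-masked) bases in a repeat-masked assembly, such as the output of RepeatMasker with its -xsmall option. The sketch below does that for FASTA text and, as an additional assumption, excludes N gap characters from the denominator.

```python
from typing import Iterable


def repeat_fraction(fasta_lines: Iterable[str]) -> float:
    """Fraction of non-gap bases that are soft-masked (lowercase) in a FASTA.

    Assumes repeats are marked in lowercase; 'N'/'n' gap characters are
    excluded from both the numerator and the denominator.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip headers and blank lines
        for base in line:
            if base in "Nn":
                continue
            total += 1
            if base.islower():
                masked += 1
    return masked / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy contig: 6 of its 8 non-gap bases are soft-masked.
    toy = [">contig1", "ACgtacgt", "NNNN"]
    print(f"repeat content: {repeat_fraction(toy):.1%}")
```

For a 12.6 Gb genome the same loop would be run in a streaming fashion over the file rather than on an in-memory list, which this function already supports because it accepts any iterable of lines.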
Pub Date: 2025-05-27; eCollection Date: 2025-01-01; DOI: 10.46471/gigabyte.156
Marcel Nebenführ, David Prochotta, Maria A Nilsson, Menno J de Jong, Tunca D Yazici, Fabienne Langefeld, Malambo Muloongo, Helena Woköck, Jakob Jilg, Sina C Bender, Marvin M Zangl, Juan-Manuel Ortega Guatame, Kimberley Williams, Moritz Sonnewald, Axel Janke
Background: The lemon sole (Microstomus kitt) is a culinary fish from the family of righteye flounders (Pleuronectidae), inhabiting sandy, shallow offshore grounds of the North Sea, the western Baltic Sea, the English Channel, the waters around Great Britain and Ireland, the Bay of Biscay, and the coastal waters of Norway.
Findings: Here, we present a chromosome-level genome assembly of the lemon sole. We used PacBio HiFi sequencing on the PacBio Revio system to generate a highly complete and contiguous reference genome. The resulting assembly has a contig N50 of 17.2 Mbp and a scaffold N50 of 27.2 Mbp. The total assembly length is 628 Mbp, comprising 24 chromosome-length scaffolds. A BUSCO completeness of 99.7% indicates a high level of assembly completeness.
Conclusions: The chromosome-level genome assembly of the lemon sole provides a high-quality reference genome for future population-level genomic analyses of this commercially valuable, edible fish.
Title: Chromosome-level genome assembly of the lemon sole, Microstomus kitt (Pleuronectiformes: Pleuronectidae). GigaByte (Hong Kong, China), 2025, gigabyte156.
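Contig and scaffold N50 values summarise contiguity. Under the standard definition, the sequence lengths are sorted in decreasing order, and the N50 is the length at which the running sum first reaches half of the total assembly length. The following minimal sketch implements that definition; the input lengths are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding half the total
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Hypothetical scaffold lengths in bp.
    print(n50([100, 80, 50, 40, 30]))
```

The same function gives the contig N50 when fed contig lengths and the scaffold N50 when fed scaffold lengths, which is why the two reported values (17.2 and 27.2 Mbp) can differ for one assembly.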
Fishes of the genus Sinocyclocheilus, tetraploids endemic to the karst regions of Southwest China, are classified as second-class nationally protected species because of their fragile habitat. Limited high-quality genomic resources have hampered studies of their phylogenetic relationships and of the origin of their polyploidy. Here, we present a high-quality genome assembly of the most abundant Sinocyclocheilus species, the golden-line barbel (Sinocyclocheilus grahami), generated by integrating PacBio long-read and Hi-C sequencing. The resulting scaffold-level genome assembly is 1.6 Gb long, with a scaffold N50 of 30.7 Mb. We annotated 42,806 protein-coding genes. In addition, 93.1% of the assembled genome sequence (about 1.5 Gb) and 93.8% of the predicted genes were successfully anchored onto 48 chromosomes. Furthermore, we obtained chromosome-level genome assemblies for four other Sinocyclocheilus species (S. anophthalmus, S. maitianheensis, S. anshuiensis, and S. rhinocerous) based on homologous comparisons. These genomic resources will enable in-depth investigations of cave adaptation, improvement of economic value, and conservation of diverse Sinocyclocheilus fishes.
Title: Chromosome-level genome assemblies of five Sinocyclocheilus species. Authors: Chao Bian, Ruihan Li, Yuqian Ouyang, Junxing Yang, Xidong Mu, Qiong Shi. Pub Date: 2025-05-09; DOI: 10.46471/gigabyte.155. GigaByte (Hong Kong, China), 2025, gigabyte155.
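The anchoring figures in the abstract (93.1% of the sequence, about 1.5 of 1.6 Gb) are ratios of the bases placed on chromosomes to the total assembly length. The sketch below computes that ratio, assuming a hypothetical mapping from scaffold names to their length and assigned chromosome, with None for unplaced scaffolds.

```python
from typing import Dict, Optional, Tuple


def anchoring_rate(scaffolds: Dict[str, Tuple[int, Optional[str]]]) -> float:
    """Fraction of assembled bases on scaffolds assigned to a chromosome.

    `scaffolds` maps scaffold name -> (length in bp, chromosome name or None).
    """
    total = sum(length for length, _ in scaffolds.values())
    anchored = sum(length for length, chrom in scaffolds.values() if chrom is not None)
    return anchored / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy assembly: three anchored scaffolds and one unplaced.
    toy = {
        "s1": (500, "chr1"),
        "s2": (300, "chr2"),
        "s3": (150, "chr1"),
        "s4": (50, None),
    }
    print(f"{anchoring_rate(toy):.1%} anchored")
```

The gene-level figure (93.8% of predicted genes) follows the same pattern with each gene counted once instead of weighted by length.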
Current microbial sequencing relies on short-read platforms such as Illumina and DNBSEQ, which are cost-effective and accurate but often produce fragmented draft genomes. Here, we used CycloneSEQ for long-read sequencing of ATCC BAA-835, producing long reads with an average length of 11.6 kbp and an average quality score of 14.4. Hybrid assembly with short-read data resulted in an error rate of only 0.04 mismatches and 0.08 indels per 100 kbp compared with the reference genome. This method, validated across nine species, successfully assembled complete circular genomes. Compared with short reads alone, hybrid assembly significantly enhances genome completeness, because long reads fill gaps and allow multi-copy rRNA genes to be assembled accurately. Data subsampling showed that combining more than 500 Mbp of short-read data with 100 Mbp of long-read data yields high-quality circular assemblies. CycloneSEQ long reads improve the assembly of complete circular genomes from mixed microbial communities, although their base quality still needs improvement. Integrating DNBSEQ short reads improved accuracy, resulting in complete and accurate assemblies.
Title: Efficiently constructing complete genomes with CycloneSEQ to fill gaps in bacterial draft assemblies. Authors: Hewei Liang, Yuanqiang Zou, Mengmeng Wang, Tongyuan Hu, Haoyu Wang, Wenxin He, Yanmei Ju, Ruijin Guo, Junyi Chen, Fei Guo, Tao Zeng, Yuliang Dong, Yuning Zhang, Bo Wang, Chuanyu Liu, Xin Jin, Wenwei Zhang, Xun Xu, Liang Xiao. Pub Date: 2025-04-25; DOI: 10.46471/gigabyte.154. GigaByte (Hong Kong, China), 2025, gigabyte154.
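Two of the reported metrics are simple ratios, but the abstract does not define them. The sketch below assumes, first, that the average read quality is obtained by converting Phred scores to error probabilities, averaging them, and converting back (a common convention, since averaging Phred values directly overstates quality), and second, that error rates are event counts divided by aligned reference length and scaled to 100 kbp. All inputs are hypothetical.

```python
import math
from typing import Sequence


def mean_phred(qualities: Sequence[int]) -> float:
    """Average Phred quality via error probabilities: Q = -10*log10(mean(10^(-q/10)))."""
    if not qualities:
        raise ValueError("empty quality string")
    mean_error = sum(10 ** (-q / 10) for q in qualities) / len(qualities)
    return -10 * math.log10(mean_error)


def errors_per_100kbp(error_count: int, aligned_bases: int) -> float:
    """Normalise an error count (mismatches or indels) to events per 100 kbp."""
    return error_count * 100_000 / aligned_bases


if __name__ == "__main__":
    # Hypothetical per-base qualities for one read.
    print(round(mean_phred([10, 20, 30]), 2))
    # Hypothetical: 2 mismatches over a 5 Mbp alignment (about 0.04 per 100 kbp).
    print(errors_per_100kbp(2, 5_000_000))
```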
Pub Date: 2025-04-10; eCollection Date: 2025-01-01; DOI: 10.46471/gigabyte.153
Trinity Conn, Jill Ashey, Ross Cunning, Hollie M Putnam
Reef-building corals are integral ecosystem engineers of tropical reefs but face threats from climate change. Investigating the genetic, epigenetic, and environmental factors that influence their adaptation is critical. Genomic resources are essential for understanding coral biology and guiding conservation efforts. However, genomes of the coral genus Acropora are limited to highly studied species. Here, we present the assembly and annotation of the genome and DNA methylome of Acropora pulchra from Mo'orea, French Polynesia. Using long-read PacBio HiFi and Illumina RNA-seq data, we generated the most complete Acropora genome to date (BUSCO completeness of 96.7% for metazoan genes). The assembly size is 518 Mbp, with 174 scaffolds and a scaffold N50 of 17 Mbp. We predicted 40,518 protein-coding genes and identified 16.74% of the genome as repeats. DNA methylation in the CpG context is 14.6%. This assembly of the A. pulchra genome and DNA methylome will support studies of coastal corals in French Polynesia, aiding conservation and comparative studies of Acropora and cnidarians.
Title: Genome assembly and annotation of Acropora pulchra from Mo'orea French Polynesia. GigaByte (Hong Kong, China), 2025, gigabyte153.
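The abstract reports CpG methylation of 14.6% without giving the formula. One common summary, assumed here, is the weighted methylation level: methylated calls divided by all calls across CpG sites, optionally skipping sites below a minimum coverage. The site counts and the coverage threshold in this sketch are hypothetical.

```python
from typing import Iterable, Tuple


def weighted_methylation(sites: Iterable[Tuple[int, int]], min_coverage: int = 1) -> float:
    """Weighted CpG methylation level.

    `sites` yields (methylated_calls, unmethylated_calls) per CpG site; sites
    with fewer than `min_coverage` total calls are skipped.
    """
    methylated = 0
    covered = 0
    for meth, unmeth in sites:
        depth = meth + unmeth
        if depth < min_coverage:
            continue  # too few calls to trust this site
        methylated += meth
        covered += depth
    return methylated / covered if covered else 0.0


if __name__ == "__main__":
    # Hypothetical per-CpG call counts (methylated, unmethylated).
    cpgs = [(9, 1), (0, 10), (1, 19), (0, 2)]
    print(f"{weighted_methylation(cpgs, min_coverage=5):.1%}")
```

Weighting by call count means deeply covered sites contribute more; an alternative convention averages per-site methylation fractions, which can give a different percentage for the same data.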
Pub Date: 2025-03-07; eCollection Date: 2025-01-01; DOI: 10.46471/gigabyte.152
Niema Moshiri
The study of viral and bacterial species requires the ability to load and traverse ultra-large phylogenies with tens of millions of tips, but existing tree libraries struggle to scale to these sizes. We introduce CompactTree, a lightweight header-only C++ library with a user-friendly Python wrapper for traversing ultra-large trees that can be easily incorporated into other tools. We show that CompactTree is orders of magnitude faster and requires orders of magnitude less memory than existing tree packages. CompactTree is freely accessible as an open source project: https://github.com/niemasd/CompactTree.
Title: CompactTree: a lightweight header-only C++ library and Python wrapper for ultra-large phylogenetics. GigaByte (Hong Kong, China), 2025, gigabyte152.
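The abstract does not describe CompactTree's internals. As a hedged illustration of the general idea behind compact tree storage, the Python sketch below parses simple Newick text into parallel per-node lists indexed by integer node IDs, instead of linked node objects, and traverses them iteratively so that very deep trees do not hit recursion limits. The class and function names are hypothetical and are not CompactTree's API. The parser assumes well-formed input without quoted labels, comments, or whitespace.

```python
from typing import Iterator, List, Optional


class ArrayTree:
    """Tree stored as parallel lists indexed by integer node ID (root is 0)."""

    def __init__(self) -> None:
        self.parent: List[int] = []               # -1 for the root
        self.children: List[List[int]] = []
        self.label: List[str] = []
        self.length: List[Optional[float]] = []   # branch length, None if absent

    def add_node(self, parent: int) -> int:
        node = len(self.parent)
        self.parent.append(parent)
        self.children.append([])
        self.label.append("")
        self.length.append(None)
        if parent >= 0:
            self.children[parent].append(node)
        return node

    def preorder(self) -> Iterator[int]:
        """Iterative preorder traversal using an explicit stack."""
        stack = [0]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(self.children[node]))

    def leaves(self) -> List[int]:
        return [n for n in self.preorder() if not self.children[n]]


def parse_newick(text: str) -> ArrayTree:
    """Parse simple, well-formed Newick into an ArrayTree."""
    tree = ArrayTree()
    current = tree.add_node(-1)  # root
    s = text.strip()
    i, n = 0, len(s)
    while i < n:
        c = s[i]
        if c == "(":    # first child of the current node
            current = tree.add_node(current)
            i += 1
        elif c == ",":  # next sibling
            current = tree.add_node(tree.parent[current])
            i += 1
        elif c == ")":  # child list closed: return to the parent
            current = tree.parent[current]
            i += 1
        elif c == ";":
            break
        elif c == ":":  # branch length of the current node
            j = i + 1
            while j < n and s[j] not in ",();":
                j += 1
            tree.length[current] = float(s[i + 1:j])
            i = j
        else:           # node label
            j = i
            while j < n and s[j] not in ",():;":
                j += 1
            tree.label[current] = s[i:j]
            i = j
    return tree


if __name__ == "__main__":
    t = parse_newick("((A:1,B:2)C:3,D:4)E;")
    print([t.label[v] for v in t.leaves()])
```

In Python the per-node lists are still lists of objects, so the memory savings are modest; the point of the layout is that in C++ the same data maps onto contiguous vectors of integers and floats.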
Pub Date: 2025-03-03; eCollection Date: 2025-01-01; DOI: 10.46471/gigabyte.151
George T Hall
Biologists who want to analyse their single-cell transcriptomics datasets must install and use specialist software via the command line. This is often impractical for non-bioinformaticians. Whilst the popular CELLxGENE software provides an intuitive graphical interface to facilitate analysis outside the command line, its server-side installation and execution remain complex. A version that is easier to install and run would allow non-bioinformaticians to take advantage of this valuable tool without needing to use the command line. This work introduces Portable-CELLxGENE, a standalone distribution of CELLxGENE that can be installed via a graphical interface. It contains an easy-to-use extension of the CELLxGENE-Gateway Python package that allows the analysis of multiple datasets. This tool enables non-bioinformaticians to carry out simple analyses independently.
Availability and implementation: Versions of Portable-CELLxGENE for Windows and macOS, along with source code, are available at https://george-hall-ucl.github.io/Portable-CELLxGENE-Docs. It is licensed under the GNU General Public License v3.
Title: Portable-CELLxGENE: standalone executables of CELLxGENE for easy installation. GigaByte (Hong Kong, China), 2025, gigabyte151.