Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae085
Carolina Heloisa Souza-Borges, Ricardo Utsunomia, Alessandro M Varani, Marcela Uliano-Silva, Lieschen Valeria G Lira, Arno J Butzge, John F Gomez Agudelo, Shisley Manso, Milena V Freitas, Raquel B Ariede, Vito A Mastrochirico-Filho, Carolina Penaloza, Agustín Barria, Fábio Porto-Foresti, Fausto Foresti, Ricardo Hattori, Yann Guiguen, Ross D Houston, Diogo Teruo Hashimoto
Background: Megaleporinus macrocephalus (piauçu) is a Neotropical fish within Characoidei that presents a well-established heteromorphic ZZ/ZW sex determination system and thus constitutes a good model for studying W and Z chromosomes in fishes. We used PacBio reads and Hi-C to assemble a chromosome-level reference genome for M. macrocephalus. We generated family segregation information to construct a genetic map, pool sequencing of males and females to characterize its sex system, and RNA sequencing to highlight candidate genes for sex determination in M. macrocephalus.
Results: The reference genome of M. macrocephalus is 1,282,030,339 bp in length and has a contig N50 of 5.0 Mb and a scaffold N50 of 45.03 Mb. In the sex chromosome, based on patterns of recombination suppression, coverage, FST, and sex-specific SNPs, we distinguished three regions: a putative W-specific region that is highly differentiated, a region where Z and W still share some similarities and that is undergoing degeneration, and the PAR. The sex chromosome gene repertoire includes genes from the TGF-β family (amhr2, bmp7) and the Wnt/β-catenin pathway (wnt4, wnt7a), some of which are differentially expressed.
Conclusions: The chromosome-level genome of piauçu exhibits high quality, establishing a valuable resource for advancing research within the group. Our discoveries offer insights into the evolutionary dynamics of Z and W sex chromosomes in fish, emphasizing ongoing degenerative processes and indicating complex interactions between Z and W sequences in specific genomic regions. Notably, amhr2 and bmp7 are potential candidate genes for sex determination in M. macrocephalus.
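The contig and scaffold N50 values quoted in the Results are a standard contiguity summary: the length L such that sequences of length at least L together cover half or more of the assembly. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units, not data from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because scaffolds join contigs across gaps, the same function applied to scaffold lengths usually returns a larger value than for contig lengths, which matches the 5.0 Mb versus 45.03 Mb pattern reported above.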
Title: De novo assembly and characterization of a highly degenerated ZW sex chromosome in the fish Megaleporinus macrocephalus.
Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11590113/pdf/
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae087
Aishwarya Venkataramanan, Michael Kloster, Andrea Burfeid-Castellanos, Mimoza Dani, Ntambwe A S Mayombo, Danijela Vidakovic, Daniel Langenkämper, Mingkun Tan, Cedric Pradalier, Tim Nattkemper, Martin Laviale, Bánk Beszteri
Background: Diatoms are microalgae with finely ornamented microscopic silica shells. Their taxonomic identification by light microscopy is routinely used as part of community ecological research as well as ecological status assessment of aquatic ecosystems, and a need for digitalization of these methods has long been recognized. Alongside their high taxonomic and morphological diversity, several other factors make diatoms highly challenging for deep learning-based identification using light microscopy images. These include (i) an unusually high intraclass variability combined with small between-class differences, (ii) a rather different visual appearance of specimens depending on their orientation on the microscope slide, and (iii) the limited availability of diatom experts for accurate taxonomic annotation.
Findings: We present the largest diatom image dataset thus far, aimed at facilitating the application and benchmarking of innovative deep learning methods to the diatom identification problem on realistic research data, "UDE DIATOMS in the Wild 2024." The dataset contains 83,570 images of 611 diatom taxa, 101 of which are represented by at least 100 examples and 144 by at least 50 examples each. We showcase this dataset in 2 innovative analyses that address individual aspects of the above challenges using subclustering to deal with visually heterogeneous classes, out-of-distribution sample detection, and semi-supervised learning.
Conclusions: The problem of image-based identification of diatoms is both important for environmental research and challenging from the machine learning perspective. By making available the largest diatom image dataset so far, accompanied by innovative analyses, this contribution will help the scientific community address these points.
Title: "UDE DIATOMS in the Wild 2024": a new image dataset of freshwater diatoms for training deep learning models.
Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11604061/pdf/
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae092
Mohamed Salem, Rafet Al-Tobasei, Ali Ali, Liqi An, Ying Wang, Xuechen Bai, Ye Bi, Huaijun Zhou
Rainbow trout (RBT) has gained widespread attention as a biological model across various fields and has been rapidly adopted for aquaculture and recreational purposes on 6 continents. Despite significant efforts to develop genome sequences for RBT, the functional genomic basis of RBT's environmental, phenotypic, and evolutionary variations still requires epigenome reference annotations. This study has produced a comprehensive catalog and epigenome annotation tracks of RBT, detecting gene regulatory elements, including chromatin histone modifications, chromatin accessibility, and DNA methylation. By integrating chromatin immunoprecipitation sequencing, ATAC sequencing, Methyl Mini-seq, and RNA sequencing data, this new regulatory element catalog has helped to characterize the epigenome dynamics and its correlation with gene expression. The study has also identified potential causal variants and transcription factors regulating complex domestication phenotypic traits. This research also provides valuable insights into the epigenome's role in gene evolution and the mechanism of duplicate gene retention 100 million years after RBT whole-genome duplication and during re-diploidization. The newly developed epigenome annotation maps are among the first in fish and are expected to enhance the accuracy and efficiency of genomic studies and applications, including genome-wide association studies, causative variation identification, and genomic selection in RBT and fish comparative genomics.
Title: Functional annotation of regulatory elements in rainbow trout uncovers roles of the epigenome in genetic selection and genome evolution.
Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11629980/pdf/
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae024
Justin Chu, Jiazhen Rong, Xiaowen Feng, Heng Li
Background: Due to human error, sample swapping in large cohort studies with heterogeneous data types (e.g., a mix of Oxford Nanopore Technologies, Pacific Biosciences, and Illumina data) remains a common issue plaguing large-scale studies. At present, all sample swapping detection methods require alignment, positional sorting, and indexing of the data in order to compare similarity, steps that are costly and, if the data are only used for genome assembly, unnecessary. As studies include more samples and new sequencing data types, robust quality control tools will become increasingly important.
Findings: The similarity between samples can be determined using indexed k-mer sequence variants. To increase statistical power, we use coverage information on variant sites, calculating similarity using a likelihood ratio-based test. The per-sample error rate and coverage bias (i.e., missing sites) can also be estimated from this information. These estimates determine whether a spatially indexed principal component analysis (PCA)-based prescreening method can be used, which can greatly speed up analysis by avoiding exhaustive all-to-all comparisons.
Conclusions: Because this tool processes raw data, is faster than alignment, and can be used on very low-coverage data, it can save a large amount of computational resources in standard quality control (QC) pipelines. It is robust enough to be used on different sequencing data types, which is important in studies that leverage the strengths of different sequencing technologies. In addition to its primary use case of sample swap detection, this method also provides information useful in QC, such as error rate and coverage bias, as well as population-level PCA ancestry analysis visualization.
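The idea in the Findings, comparing samples through allele counts at known variant sites with a likelihood ratio, can be sketched as follows. This is only an illustration under simplifying assumptions and not ntsm's actual statistical model: it uses a plain binomial homogeneity test per site, and the function names and the input layout (a list of `(alt_count, total_count)` pairs per sample, in the same site order) are hypothetical.

```python
import math


def binom_loglik(alt, total, p):
    """Binomial log-likelihood of `alt` alternate-allele counts out of `total`.

    The combinatorial constant is dropped because it cancels in the ratio.
    Zero-count terms are skipped so that p == 0 or p == 1 never reaches log(0).
    """
    ref = total - alt
    ll = 0.0
    if alt:
        ll += alt * math.log(p)
    if ref:
        ll += ref * math.log(1.0 - p)
    return ll


def site_lr(a1, n1, a2, n2):
    """Likelihood ratio statistic at one site: separate vs. shared allele fraction."""
    pooled = (a1 + a2) / (n1 + n2)
    ll_separate = binom_loglik(a1, n1, a1 / n1) + binom_loglik(a2, n2, a2 / n2)
    ll_shared = binom_loglik(a1, n1, pooled) + binom_loglik(a2, n2, pooled)
    return 2.0 * (ll_separate - ll_shared)


def similarity_score(sample_a, sample_b):
    """Mean per-site statistic over sites covered in both samples (lower = more similar)."""
    stats = [
        site_lr(a1, n1, a2, n2)
        for (a1, n1), (a2, n2) in zip(sample_a, sample_b)
        if n1 > 0 and n2 > 0  # sites missing in either sample carry no information
    ]
    if not stats:
        raise ValueError("no sites covered in both samples")
    return sum(stats) / len(stats)


# Hypothetical low-coverage counts at three variant sites.
run_1 = [(0, 4), (2, 4), (4, 4)]
run_2_same_individual = [(0, 3), (1, 3), (3, 3)]
run_3_other_individual = [(3, 3), (0, 3), (1, 3)]

print(similarity_score(run_1, run_2_same_individual))
print(similarity_score(run_1, run_3_other_individual))
```

Averaging over shared sites keeps the score comparable between pairs with different numbers of covered sites, which matters at very low coverage where many sites are missing in one sample or the other.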
Title: ntsm: an alignment-free, ultra-low-coverage, sequencing technology agnostic, intraspecies sample comparison tool for sample swap detection.
Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11148594/pdf/
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae099
Rina Su, Hao Zhou, Wenhao Yang, Sorgog Moqir, Xiji Ritu, Lei Liu, Ying Shi, Ai Dong, Menghe Bayier, Yibu Letu, Xin Manxi, Hasi Chulu, Narenhua Nasenochir, He Meng, Muren Herrid
Background: Mongolian cattle, a unique breed indigenous to China, represent valuable genetic resources and serve as important sources of meat and milk. However, there is a lack of high-quality genomes in cattle, which limits biological research and breeding improvement.
Findings: In this study, we conducted whole-genome sequencing on a Mongolian bull. This effort yielded a 3.1 Gb Mongolian cattle genome sequence with a BUSCO completeness of 95.9%. The assembly achieved contig N50 and scaffold N50 values of 110.9 Mb each, with only 3 gaps identified across the entire genome. Additionally, we successfully assembled the Y chromosome among the 31 chromosomes. Notably, 3 chromosomes were identified as having telomeres at both ends. The annotation data include 54.31% repetitive sequences and 29,794 coding genes. Furthermore, a population genetic variation analysis was conducted on 332 individuals from 56 breeds, through which we identified variant loci and candidate genes potentially associated with the formation of marbling patterns in beef, predominantly located on chromosome 12.
Conclusions: This study produced a genome with high continuity, completeness, and accuracy, marking the first assembly and annotation of a near telomere-to-telomere genome in cattle. Based on this, we generated a variant database comprising 332 individuals. The assembly of the genome and the analysis of population variants provide significant insights into cattle evolution and enhance our understanding of breeding selection.
Title: Near telomere-to-telomere genome assembly of Mongolian cattle: implications for population genetic variation and beef quality.
Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11653892/pdf/
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae002
Carla L Archibald, David M Summers, Erin M Graham, Brett A Bryan
Background: Spatial information about the location and suitability of areas for native plant and animal species under different climate futures is an important input to land use and conservation planning and management. Australia, renowned for its abundant species diversity and endemism, often relies on modeled data to assess species distributions due to the country's vast size and the challenges associated with conducting on-ground surveys on such a large scale. The objective of this article is to develop habitat suitability maps for Australian flora and fauna under different climate futures.
Results: Using MaxEnt, we produced Australia-wide habitat suitability maps under RCP2.6-SSP1, RCP4.5-SSP2, RCP7.0-SSP3, and RCP8.5-SSP5 climate futures for 1,382 terrestrial vertebrates and 9,251 vascular plants at 5 km² resolution, available as open access. This represents 60% of all Australian mammal species, 77% of amphibian species, 50% of reptile species, 71% of bird species, and 44% of vascular plant species. We also include tabular data summarizing the total quality-weighted habitat area of species under different climate scenarios and time periods.
Conclusions: The spatial data supplied can help identify important and sensitive locations for species under various climate futures. Additionally, the supplied tabular data can provide insights into the impacts of climate change on biodiversity in Australia. These habitat suitability maps can be used as input data for landscape and conservation planning or species management, particularly under different climate change scenarios in Australia.
Title: Habitat suitability maps for Australian flora and fauna under CMIP6 climate scenarios.
Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10939329/pdf/
Pub Date: 2024-01-02; DOI: 10.1093/gigascience/giae004
Ye Xu, Ling Ma, Shanlin Liu, Yanxin Liang, Qiaoqiao Liu, Zhixin He, Li Tian, Yuange Duan, Wanzhi Cai, Hu Li, Fan Song
Background: Lice (Psocodea: Phthiraptera) are an important group of parasites that infect birds and mammals. It is believed that the ancestor of parasitic lice originated on an ancient avian host and that ancient mammals acquired these parasites via host-switching from birds. Here we present the first chromosome-level genome of Menopon gallinae in Amblycera (the earliest diverging lineage of parasitic lice). We explore the transition of louse host-switching from birds to mammals at the genomic level by identifying numerous idiosyncratic genomic variations.
Results: The assembled genome is 155 Mb in length, with a contig N50 of 27.42 Mb. Hi-C scaffolding assigned 97% of the bases to 5 chromosomes. The genome of M. gallinae retains a basal insect repertoire of 11,950 protein-coding genes. By comparing the genomes of lice to those of multiple representative insects in other orders, we discovered that gene families related to digestion, detoxification, and immunity are generally conserved between bird lice and mammal lice, while mammal lice have undergone a significant reduction in genes related to chemosensory systems and temperature. This suggests that mammal lice lost some of these genes while adapting to a new environment and temperature after host-switching. Furthermore, 7 genes related to hematophagy were positively selected in mammal lice, suggesting their involvement in hematophagous behavior.
Conclusions: Our high-quality genome of M. gallinae provides a valuable resource for comparative genomic research in Phthiraptera and facilitates further studies on adaptive evolution of host-switching within parasitic lice.
Title: Chromosome-level genome of the poultry shaft louse Menopon gallinae provides insight into the host-switching and adaptive evolution of parasitic lice.
Journal: GigaScience, vol. 13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10904027/pdf/
Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giae007
Filipi Miranda Soares, Luís Ferreira Pires, Maria Carolina Garcia, Yamine Bouzembrak, Lidio Coradin, Natalia Pirani Ghilardi-Lopes, Rubens Rangel Silva, Aline Martins de Carvalho, Benildes Coura Moreira Dos Santos Maculan, Sheina Koffler, Uiara Bandineli Montedo, Debora Pignatari Drucker, Raquel Santiago, Anand Gavai, Maria Clara Peres de Carvalho, Ana Carolina da Silva Lima, Hillary Dandara Elias Gabriel, Stephanie Gabriele Mendonça de França, Karoline Reis de Almeida, Bárbara Junqueira Dos Santos, Antonio Mauro Saraiva
Urbanization brings forth social challenges in emerging countries such as Brazil, encompassing food scarcity, health deterioration, air pollution, and biodiversity loss. Despite this, urban areas like the city of São Paulo still boast ample green spaces, offering opportunities for nature appreciation and conservation, enhancing city resilience and livability. Citizen science is a collaborative endeavor between professional scientists and nonprofessional scientists in scientific research that may help to understand the dynamics of urban ecosystems. We believe citizen science has the potential to promote human and nature connection in urban areas and provide useful data on urban biodiversity.
{"title":"Leveraging citizen science for monitoring urban forageable plants.","authors":"Filipi Miranda Soares, Luís Ferreira Pires, Maria Carolina Garcia, Yamine Bouzembrak, Lidio Coradin, Natalia Pirani Ghilardi-Lopes, Rubens Rangel Silva, Aline Martins de Carvalho, Benildes Coura Moreira Dos Santos Maculan, Sheina Koffler, Uiara Bandineli Montedo, Debora Pignatari Drucker, Raquel Santiago, Anand Gavai, Maria Clara Peres de Carvalho, Ana Carolina da Silva Lima, Hillary Dandara Elias Gabriel, Stephanie Gabriele Mendonça de França, Karoline Reis de Almeida, Bárbara Junqueira Dos Santos, Antonio Mauro Saraiva","doi":"10.1093/gigascience/giae007","DOIUrl":"10.1093/gigascience/giae007","url":null,"abstract":"<p>Urbanization brings forth social challenges in emerging countries such as Brazil, encompassing food scarcity, health deterioration, air pollution, and biodiversity loss. Despite this, urban areas like the city of São Paulo still boast ample green spaces, offering opportunities for nature appreciation and conservation, enhancing city resilience and livability. Citizen science is a collaborative endeavor between professional scientists and nonprofessional scientists in scientific research that may help to understand the dynamics of urban ecosystems. We believe citizen science has the potential to promote human and nature connection in urban areas and provide useful data on urban biodiversity.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10914215/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140039095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giae013
Anish M S Shrestha, Mark Edward M Gonzales, Phoebe Clare L Ong, Pierre Larmande, Hyun-Sook Lee, Ji-Ung Jeung, Ajay Kohli, Dmytro Chebotarov, Ramil P Mauleon, Jae-Sung Lee, Kenneth L McNally
Background: As the number of genome-wide association study (GWAS) and quantitative trait locus (QTL) mappings in rice continues to grow, so does the already long list of genomic loci associated with important agronomic traits. Typically, loci implicated by GWAS/QTL analysis contain tens to hundreds to thousands of single-nucleotide polymorphisms (SNPs)/genes, not all of which are causal and many of which are in noncoding regions. Unraveling the biological mechanisms that tie the GWAS regions and QTLs to the trait of interest is challenging, especially since it requires collating functional genomics information about the loci from multiple, disparate data sources.
Results: We present RicePilaf, a web app for post-GWAS/QTL analysis, that performs a slew of novel bioinformatics analyses to cross-reference GWAS results and QTL mappings with a host of publicly available rice databases. In particular, it integrates (i) pangenomic information from high-quality genome builds of multiple rice varieties, (ii) coexpression information from genome-scale coexpression networks, (iii) ontology and pathway information, (iv) regulatory information from rice transcription factor databases, (v) epigenomic information from multiple high-throughput epigenetic experiments, and (vi) text-mining information extracted from scientific abstracts linking genes and traits. We demonstrate the utility of RicePilaf by applying it to analyze GWAS peaks of preharvest sprouting and genes underlying yield-under-drought QTLs.
Conclusions: RicePilaf enables rice scientists and breeders to shed functional light on their GWAS regions and QTLs, and it provides them with a means to prioritize SNPs/genes for further experiments. The source code, a Docker image, and a demo version of RicePilaf are publicly available at https://github.com/bioinfodlsu/rice-pilaf.
{"title":"RicePilaf: a post-GWAS/QTL dashboard to integrate pangenomic, coexpression, regulatory, epigenomic, ontology, pathway, and text-mining information to provide functional insights into rice QTLs and GWAS loci.","authors":"Anish M S Shrestha, Mark Edward M Gonzales, Phoebe Clare L Ong, Pierre Larmande, Hyun-Sook Lee, Ji-Ung Jeung, Ajay Kohli, Dmytro Chebotarov, Ramil P Mauleon, Jae-Sung Lee, Kenneth L McNally","doi":"10.1093/gigascience/giae013","DOIUrl":"10.1093/gigascience/giae013","url":null,"abstract":"<p><strong>Background: </strong>As the number of genome-wide association study (GWAS) and quantitative trait locus (QTL) mappings in rice continues to grow, so does the already long list of genomic loci associated with important agronomic traits. Typically, loci implicated by GWAS/QTL analysis contain tens to hundreds to thousands of single-nucleotide polymorphisms (SNPs)/genes, not all of which are causal and many of which are in noncoding regions. Unraveling the biological mechanisms that tie the GWAS regions and QTLs to the trait of interest is challenging, especially since it requires collating functional genomics information about the loci from multiple, disparate data sources.</p><p><strong>Results: </strong>We present RicePilaf, a web app for post-GWAS/QTL analysis, that performs a slew of novel bioinformatics analyses to cross-reference GWAS results and QTL mappings with a host of publicly available rice databases. In particular, it integrates (i) pangenomic information from high-quality genome builds of multiple rice varieties, (ii) coexpression information from genome-scale coexpression networks, (iii) ontology and pathway information, (iv) regulatory information from rice transcription factor databases, (v) epigenomic information from multiple high-throughput epigenetic experiments, and (vi) text-mining information extracted from scientific abstracts linking genes and traits. We demonstrate the utility of RicePilaf by applying it to analyze GWAS peaks of preharvest sprouting and genes underlying yield-under-drought QTLs.</p><p><strong>Conclusions: </strong>RicePilaf enables rice scientists and breeders to shed functional light on their GWAS regions and QTLs, and it provides them with a means to prioritize SNPs/genes for further experiments. The source code, a Docker image, and a demo version of RicePilaf are publicly available at https://github.com/bioinfodlsu/rice-pilaf.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11148593/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141237423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-01-02 DOI: 10.1093/gigascience/giae022
Teresa Müller, Stefan Mautner, Pavankumar Videm, Florian Eggenhofer, Martin Raden, Rolf Backofen
Background: RNA-RNA interactions are key to a wide range of cellular functions. The detection of potential interactions helps to understand the underlying processes. However, potential interactions identified via in silico or experimental high-throughput methods can lack precision because of a high false-positive rate.
Results: We present CheRRI, the first tool to evaluate the biological relevance of putative RNA-RNA interaction sites. CheRRI filters candidates via a machine learning-based model trained on experimental RNA-RNA interactome data. Its unique setup combines interactome data and an established thermodynamic prediction tool to integrate experimental data with state-of-the-art computational models. Applying these data to an automated machine learning approach provides the opportunity to not only filter data for potential false positives but also tailor the underlying interaction site model to specific needs.
Conclusions: CheRRI is a stand-alone postprocessing tool to filter either predicted or experimentally identified potential RNA-RNA interactions on a genomic level to enhance the quality of interaction candidates. It is easy to install (via conda, pip packages), use (via Galaxy), and integrate into existing RNA-RNA interaction pipelines.
{"title":"CheRRI-Accurate classification of the biological relevance of putative RNA-RNA interaction sites.","authors":"Teresa Müller, Stefan Mautner, Pavankumar Videm, Florian Eggenhofer, Martin Raden, Rolf Backofen","doi":"10.1093/gigascience/giae022","DOIUrl":"10.1093/gigascience/giae022","url":null,"abstract":"<p><strong>Background: </strong>RNA-RNA interactions are key to a wide range of cellular functions. The detection of potential interactions helps to understand the underlying processes. However, potential interactions identified via in silico or experimental high-throughput methods can lack precision because of a high false-positive rate.</p><p><strong>Results: </strong>We present CheRRI, the first tool to evaluate the biological relevance of putative RNA-RNA interaction sites. CheRRI filters candidates via a machine learning-based model trained on experimental RNA-RNA interactome data. Its unique setup combines interactome data and an established thermodynamic prediction tool to integrate experimental data with state-of-the-art computational models. Applying these data to an automated machine learning approach provides the opportunity to not only filter data for potential false positives but also tailor the underlying interaction site model to specific needs.</p><p><strong>Conclusions: </strong>CheRRI is a stand-alone postprocessing tool to filter either predicted or experimentally identified potential RNA-RNA interactions on a genomic level to enhance the quality of interaction candidates. It is easy to install (via conda, pip packages), use (via Galaxy), and integrate into existing RNA-RNA interaction pipelines.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"13 ","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11152173/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141261603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}