Pub Date: 2024-02-20; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.109
Aleksandra Djordjevic, Junhua Li, Shuangsang Fang, Lei Cao, Marija Ivanovic
This paper introduces a new approach to cell clustering using the Variable Neighborhood Search (VNS) metaheuristic. The method clusters cells based on both gene expression and spatial coordinates. We first formulated this clustering task as an Integer Linear Programming minimization problem. We then introduced a novel model based on the VNS technique and demonstrated its efficacy in navigating the complexities of cell clustering. Notably, our method extends beyond conventional cell-type clustering to spatial domain clustering. This adaptability enables our algorithm to form clusters based on information from gene expression matrices and spatial coordinates. Our validation showed the superior performance of our method when compared to existing techniques. Our approach advances current clustering methodologies and can potentially be applied to several fields, from biomedical research to spatial data analysis.
{"title":"A novel variable neighborhood search approach for cell clustering for spatial transcriptomics.","authors":"Aleksandra Djordjevic, Junhua Li, Shuangsang Fang, Lei Cao, Marija Ivanovic","doi":"10.46471/gigabyte.109","DOIUrl":"10.46471/gigabyte.109","url":null,"abstract":"<p><p>This paper introduces a new approach to cell clustering using the Variable Neighborhood Search (VNS) metaheuristic. The purpose of this method is to cluster cells based on both gene expression and spatial coordinates. Initially, we confronted this clustering challenge as an Integer Linear Programming minimization problem. Our approach introduced a novel model based on the VNS technique, demonstrating the efficacy in navigating the complexities of cell clustering. Notably, our method extends beyond conventional cell-type clustering to spatial domain clustering. This adaptability enables our algorithm to orchestrate clusters based on information gleaned from gene expression matrices and spatial coordinates. Our validation showed the superior performance of our method when compared to existing techniques. Our approach advances current clustering methodologies and can potentially be applied to several fields, from biomedical research to spatial data analysis.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte109"},"PeriodicalIF":0.0,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10910296/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140029702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
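The abstract above does not give the authors' objective function or neighborhood structures, so the following is only a generic sketch of how a VNS loop for clustering can be organised, not the paper's model. It assumes a within-cluster sum-of-squares cost over concatenated expression and spatial features, a Lloyd-style local search, and a "shake" step that randomly relabels a growing fraction of cells. The names `vns_cluster`, `alpha`, `k_max`, and `n_rounds` are hypothetical.

```python
import numpy as np

def cost(features, labels, k):
    """Within-cluster sum of squared distances to each cluster centroid."""
    total = 0.0
    for c in range(k):
        members = features[labels == c]
        if len(members):
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total)

def local_search(features, labels, k, max_iter=50):
    """Lloyd-style descent: reassign cells to the nearest centroid until stable."""
    labels = labels.copy()
    dim = features.shape[1]
    for _ in range(max_iter):
        # An empty cluster gets an infinitely distant centroid, so it attracts no cell.
        centroids = np.stack([
            features[labels == c].mean(axis=0) if np.any(labels == c)
            else np.full(dim, np.inf)
            for c in range(k)
        ])
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def vns_cluster(expr, coords, k, alpha=1.0, k_max=3, n_rounds=30, seed=0):
    """Basic VNS: shake in neighborhood nb, descend, then move or widen."""
    rng = np.random.default_rng(seed)
    # Assumption: spatial coordinates are weighted by alpha and appended to expression.
    features = np.hstack([expr, alpha * coords])
    n = len(features)
    best = local_search(features, rng.integers(k, size=n), k)
    best_cost = cost(features, best, k)
    for _ in range(n_rounds):
        nb = 1
        while nb <= k_max:
            # Shaking: relabel a random subset whose size grows with nb.
            cand = best.copy()
            idx = rng.choice(n, size=max(1, nb * n // 20), replace=False)
            cand[idx] = rng.integers(k, size=len(idx))
            cand = local_search(features, cand, k)
            cand_cost = cost(features, cand, k)
            if cand_cost < best_cost - 1e-12:
                best, best_cost = cand, cand_cost
                nb = 1          # improvement: restart from the smallest neighborhood
            else:
                nb += 1         # no improvement: try a larger neighborhood
    return best, best_cost
```

Calling `vns_cluster(expr, coords, k)` with an `(n_cells, n_genes)` expression array and an `(n_cells, 2)` coordinate array returns a label vector and its cost. In this sketch, raising `alpha` pushes the result toward spatially contiguous domains, while lowering it pushes toward expression-driven cell types.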
As genomic sequencing technology continues to advance, it becomes increasingly important to perform joint analyses of multiple transcriptomics datasets. However, batch effects complicate dataset integration, for example when sequencing data are measured on different platforms or datasets are collected at different times. Here, we report the development of the BatchEval Pipeline, a workflow used to evaluate batch effects on dataset integration. The BatchEval Pipeline generates a comprehensive report, which consists of a series of HTML pages for assessment findings, including a main page, a raw dataset evaluation page, and several built-in methods evaluation pages. The main page shows basic information about the integrated datasets, a comprehensive batch effect score, and the most recommended method for removing batch effects from the current datasets. The remaining pages show evaluation details for the raw dataset and evaluation results for each built-in batch effect removal method after it has been applied. This comprehensive report enables researchers to accurately identify and remove batch effects, resulting in more reliable and meaningful biological insights from integrated datasets. In summary, the BatchEval Pipeline represents a significant advancement in batch effect evaluation, and is a valuable tool to improve the accuracy and reliability of the experimental results.
Availability & implementation: The source code of the BatchEval Pipeline is available at https://github.com/STOmics/BatchEval.
{"title":"BatchEval Pipeline: batch effect evaluation workflow for multiple datasets joint analysis.","authors":"Chao Zhang, Qiang Kang, Mei Li, Hongqing Xie, Shuangsang Fang, Xun Xu","doi":"10.46471/gigabyte.108","DOIUrl":"10.46471/gigabyte.108","url":null,"abstract":"<p><p>As genomic sequencing technology continues to advance, it becomes increasingly important to perform joint analyses of multiple datasets of transcriptomics. However, batch effect presents challenges for dataset integration, such as sequencing data measured on different platforms, and datasets collected at different times. Here, we report the development of BatchEval Pipeline, a batch effect workflow used to evaluate batch effect on dataset integration. The BatchEval Pipeline generates a comprehensive report, which consists of a series of HTML pages for assessment findings, including a main page, a raw dataset evaluation page, and several built-in methods evaluation pages. The main page exhibits basic information of the integrated datasets, a comprehensive score of batch effect, and the most recommended method for removing batch effect from the current datasets. The remaining pages exhibit evaluation details for the raw dataset, and evaluation results from the built-in batch effect removal methods after removing batch effect. This comprehensive report enables researchers to accurately identify and remove batch effects, resulting in more reliable and meaningful biological insights from integrated datasets. 
In summary, the BatchEval Pipeline represents a significant advancement in batch effect evaluation, and is a valuable tool to improve the accuracy and reliability of the experimental results.</p><p><strong>Availability & implementation: </strong>The source code of the BatchEval Pipeline is available at https://github.com/STOmics/BatchEval.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte108"},"PeriodicalIF":0.0,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10905258/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140023508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-20; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.110
Bohan Zhang, Mei Li, Qiang Kang, Zhonghan Deng, Hua Qin, Kui Su, Xiuwen Feng, Lichuan Chen, Huanlin Liu, Shuangsang Fang, Yong Zhang, Yuxiang Li, Susanne Brix, Xun Xu
In spatially resolved transcriptomics, Stereo-seq facilitates the analysis of large tissues at the single-cell level, offering subcellular resolution and centimeter-level field-of-view. Our previous work on StereoCell introduced a one-stop software using cell nuclei staining images and statistical methods to generate high-confidence single-cell spatial gene expression profiles for Stereo-seq data. With advancements allowing the acquisition of cell boundary information, such as cell membrane/wall staining images, we updated our software to a new version, STCellbin. Using cell nuclei staining images, STCellbin aligns cell membrane/wall staining images with spatial gene expression maps. Advanced cell segmentation ensures the detection of accurate cell boundaries, leading to more reliable single-cell spatial gene expression profiles. We verified that STCellbin can be applied to mouse liver (cell membranes) and Arabidopsis seed (cell walls) datasets, outperforming other methods. The improved capability of capturing single-cell gene expression profiles results in a deeper understanding of the contribution of single-cell phenotypes to tissue biology.
Availability & implementation: The source code of STCellbin is available at https://github.com/STOmics/STCellbin.
{"title":"Generating single-cell gene expression profiles for high-resolution spatial transcriptomics based on cell boundary images.","authors":"Bohan Zhang, Mei Li, Qiang Kang, Zhonghan Deng, Hua Qin, Kui Su, Xiuwen Feng, Lichuan Chen, Huanlin Liu, Shuangsang Fang, Yong Zhang, Yuxiang Li, Susanne Brix, Xun Xu","doi":"10.46471/gigabyte.110","DOIUrl":"10.46471/gigabyte.110","url":null,"abstract":"<p><p>In spatially resolved transcriptomics, Stereo-seq facilitates the analysis of large tissues at the single-cell level, offering subcellular resolution and centimeter-level field-of-view. Our previous work on StereoCell introduced a one-stop software using cell nuclei staining images and statistical methods to generate high-confidence single-cell spatial gene expression profiles for Stereo-seq data. With advancements allowing the acquisition of cell boundary information, such as cell membrane/wall staining images, we updated our software to a new version, STCellbin. Using cell nuclei staining images, STCellbin aligns cell membrane/wall staining images with spatial gene expression maps. Advanced cell segmentation ensures the detection of accurate cell boundaries, leading to more reliable single-cell spatial gene expression profiles. We verified that STCellbin can be applied to mouse liver (cell membranes) and <i>Arabidopsis</i> seed (cell walls) datasets, outperforming other methods. 
The improved capability of capturing single-cell gene expression profiles results in a deeper understanding of the contribution of single-cell phenotypes to tissue biology.</p><p><strong>Availability & implementation: </strong>The source code of STCellbin is available at https://github.com/STOmics/STCellbin.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte110"},"PeriodicalIF":0.0,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10905256/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140023510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The basic analysis steps of spatial transcriptomics require obtaining gene expression information from both space and cells. The existing tools for these analyses incur performance issues when dealing with large datasets. These issues involve computationally intensive spatial localization, RNA genome alignment, and excessive memory usage in large chip scenarios. These problems affect the applicability and efficiency of the analysis. Here, a high-performance and accurate spatial transcriptomics data analysis workflow, called Stereo-seq Analysis Workflow (SAW), was developed for the Stereo-seq technology developed at BGI. SAW includes mRNA spatial position reconstruction, genome alignment, gene expression matrix generation, and clustering. The workflow outputs files in a universal format for subsequent personalized analysis. The execution time for the entire analysis is ∼148 min with 1 GB reads 1 × 1 cm chip test data, 1.8 times faster than with an unoptimized workflow.
{"title":"SAW: an efficient and accurate data analysis workflow for Stereo-seq spatial transcriptomics.","authors":"Chun Gong, Shengkang Li, Leying Wang, Fuxiang Zhao, Shuangsang Fang, Dong Yuan, Zijian Zhao, Qiqi He, Mei Li, Weiqing Liu, Zhaoxun Li, Hongqing Xie, Sha Liao, Ao Chen, Yong Zhang, Yuxiang Li, Xun Xu","doi":"10.46471/gigabyte.111","DOIUrl":"10.46471/gigabyte.111","url":null,"abstract":"<p><p>The basic analysis steps of spatial transcriptomics require obtaining gene expression information from both space and cells. The existing tools for these analyses incur performance issues when dealing with large datasets. These issues involve computationally intensive spatial localization, RNA genome alignment, and excessive memory usage in large chip scenarios. These problems affect the applicability and efficiency of the analysis. Here, a high-performance and accurate spatial transcriptomics data analysis workflow, called Stereo-seq Analysis Workflow (SAW), was developed for the Stereo-seq technology developed at BGI. SAW includes mRNA spatial position reconstruction, genome alignment, gene expression matrix generation, and clustering. The workflow outputs files in a universal format for subsequent personalized analysis. 
The execution time for the entire analysis is ∼148 min with 1 GB reads 1 × 1 cm chip test data, 1.8 times faster than with an unoptimized workflow.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte111"},"PeriodicalIF":0.0,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10905255/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140023511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-25; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.106
Xiaotong Niu, Yakui Lv, Jin Chen, Yueheng Feng, Yilin Cui, Haorong Lu, Hui Liu
Trimeresurus albolabris, also known as the white-lipped pit viper or white-lipped tree viper, is a highly venomous snake distributed across Southeast Asia and the cause of many snakebite cases. In this study, we report the first whole genome assembly of T. albolabris obtained with next-generation sequencing from a specimen collected in Mengzi, Yunnan, China. After genome sequencing and assembly, the genome of this male T. albolabris individual was 1.51 Gb in length and included 38.42% repeat-element content. Using this genome, 21,695 genes were identified, and 99.17% of genes could be annotated using gene functional databases. Our genome assembly and annotation process was validated using a phylogenetic tree, which included six species and focused on single-copy genes of nuclear genomes. This research will contribute to future studies on Trimeresurus biology and the genetic basis of snake venom.
{"title":"The genome assembly and annotation of the white-lipped tree pit viper <i>Trimeresurus albolabris</i>.","authors":"Xiaotong Niu, Yakui Lv, Jin Chen, Yueheng Feng, Yilin Cui, Haorong Lu, Hui Liu","doi":"10.46471/gigabyte.106","DOIUrl":"10.46471/gigabyte.106","url":null,"abstract":"<p><p><i>Trimeresurus albolabris</i>, also known as the white-lipped pit viper or white-lipped tree viper, is a highly venomous snake distributed across Southeast Asia and the cause of many snakebite cases. In this study, we report the first whole genome assembly of <i>T. albolabris</i> obtained with next-generation sequencing from a specimen collected in Mengzi, Yunnan, China. After genome sequencing and assembly, the genome of this male <i>T. albolabris</i> individual was 1.51 Gb in length and included 38.42% repeat-element content. Using this genome, 21,695 genes were identified, and 99.17% of genes could be annotated using gene functional databases. Our genome assembly and annotation process was validated using a phylogenetic tree, which included six species and focused on single-copy genes of nuclear genomes. This research will contribute to future studies on <i>Trimeresurus</i> biology and the genetic basis of snake venom.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte106"},"PeriodicalIF":0.0,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10836062/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139682037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-11; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.105
Magnus Wolf, Bruno Lopes da Silva Ferrette, Raphael T F Coimbra, Menno de Jong, Marcel Nebenführ, David Prochotta, Yannis Schöneberg, Konstantin Zapf, Jessica Rosenbaum, Hannah A Mc Intyre, Julia Maier, Clara C S de Souza, Lucas M Gehlhaar, Melina J Werner, Henrik Oechler, Marie Wittekind, Moritz Sonnewald, Maria A Nilsson, Axel Janke, Sven Winter
The snake pipefish, Entelurus aequoreus (Linnaeus, 1758), is a northern Atlantic fish inhabiting open seagrass environments that recently expanded its distribution range. Here, we present a highly contiguous, near chromosome-scale genome of E. aequoreus. The final assembly spans 1.6 Gbp in 7,391 scaffolds, with a scaffold N50 of 62.3 Mbp and L50 of 12. The 28 largest scaffolds (>21 Mbp) span 89.7% of the assembly length. A BUSCO completeness score of 94.1% and a mapping rate above 98% suggest a high assembly completeness. Repetitive elements cover 74.93% of the genome, one of the highest proportions identified in vertebrates. Our demographic modeling identified a peak in population size during the last interglacial period, suggesting the species might benefit from warmer water conditions. Our updated snake pipefish assembly is essential for future analyses of the morphological and molecular changes unique to the Syngnathidae.
{"title":"Near chromosome-level and highly repetitive genome assembly of the snake pipefish <i>Entelurus aequoreus</i> (Syngnathiformes: Syngnathidae).","authors":"Magnus Wolf, Bruno Lopes da Silva Ferrette, Raphael T F Coimbra, Menno de Jong, Marcel Nebenführ, David Prochotta, Yannis Schöneberg, Konstantin Zapf, Jessica Rosenbaum, Hannah A Mc Intyre, Julia Maier, Clara C S de Souza, Lucas M Gehlhaar, Melina J Werner, Henrik Oechler, Marie Wittekind, Moritz Sonnewald, Maria A Nilsson, Axel Janke, Sven Winter","doi":"10.46471/gigabyte.105","DOIUrl":"10.46471/gigabyte.105","url":null,"abstract":"<p><p>The snake pipefish, <i>Entelurus aequoreus</i> (Linnaeus, 1758), is a northern Atlantic fish inhabiting open seagrass environments that recently expanded its distribution range. Here, we present a highly contiguous, near chromosome-scale genome of <i>E. aequoreus</i>. The final assembly spans 1.6 Gbp in 7,391 scaffolds, with a scaffold N50 of 62.3 Mbp and L50 of 12. The 28 largest scaffolds (>21 Mbp) span 89.7% of the assembly length. A BUSCO completeness score of 94.1% and a mapping rate above 98% suggest a high assembly completeness. Repetitive elements cover 74.93% of the genome, one of the highest proportions identified in vertebrates. Our demographic modeling identified a peak in population size during the last interglacial period, suggesting the species might benefit from warmer water conditions. 
Our updated snake pipefish assembly is essential for future analyses of the morphological and molecular changes unique to the Syngnathidae.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte105"},"PeriodicalIF":0.0,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10795108/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139492894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
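The snake pipefish abstract reports a scaffold N50 and L50. As a reminder of what those two statistics mean, here is a minimal sketch that computes them from a list of scaffold lengths. The function name `n50_l50` and the toy lengths are illustrative only and are not taken from the assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for an iterable of scaffold lengths.

    N50 is the length of the scaffold at which the running total, taken
    from longest to shortest, first reaches half of the assembly size.
    L50 is the number of scaffolds needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example: total length 300, so half is 150; 80 + 70 reaches it.
print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

Read this way, an L50 of 12 means that the 12 longest scaffolds together cover at least half of the assembly.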
Understanding the distribution of Anopheles species is essential for planning and implementing malaria control programmes. This study assessed the composition and distribution of cryptic species of the main malaria vector, the Anopheles gambiae complex, in different districts of Kinshasa. Anopheles were sampled using CDC light traps in the four Kinshasa districts between July 2021 and June 2022, and then morphologically identified. Equal proportions of Anopheles gambiae s.l. per site were subjected to polymerase chain reaction to identify the cryptic species of the Anopheles gambiae complex. Anopheles gambiae complex specimens were identified throughout Kinshasa. The average density significantly differed inside and outside households. Two species of this complex circulate in Kinshasa: Anopheles gambiae and Anopheles coluzzii. In all the study sites, Anopheles gambiae was the most widespread species. Our results provide an important basis for future studies on the ecology and dynamics of cryptic species of the Anopheles gambiae complex in Kinshasa.
{"title":"Species composition and distribution of the <i>Anopheles gambiae</i> complex circulating in Kinshasa.","authors":"Josue Zanga, Emery Metelo, Nono Mvuama, Victoire Nsabatien, Vanessa Mvudi, Degani Banzulu, Osée Mansiangi, Maxwel Bamba, Narcisse Basosila, Rodrigue Agossa, Roger Wumba","doi":"10.46471/gigabyte.104","DOIUrl":"10.46471/gigabyte.104","url":null,"abstract":"<p><p>Understanding the distribution of Anopheles species is essential for planning and implementing malaria control programmes. This study assessed the composition and distribution of cryptic species of the main malaria vector, the <i>Anopheles gambiae</i> complex, in different districts of Kinshasa. Anopheles were sampled using CDC light traps in the four Kinshasa districts between July 2021 and June 2022, and then morphologically identified. Equal proportions of <i>Anopheles gambiae</i> s.l. per site were subjected to polymerase chain reaction to identify the cryptic species of the <i>Anopheles gambiae</i> complex. <i>Anopheles gambiae</i> complex specimens were identified throughout Kinshasa. The average density significantly differed inside and outside households. Two species of this complex circulate in Kinshasa: <i>Anopheles gambiae</i> and <i>Anopheles coluzzii</i>. In all the study sites, <i>Anopheles gambiae</i> was the most widespread species. 
Our results provide an important basis for future studies on the ecology and dynamics of cryptic species of the <i>Anopheles gambiae</i> complex in Kinshasa.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2024 ","pages":"gigabyte104"},"PeriodicalIF":0.0,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10777374/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139426145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-11; eCollection Date: 2023-01-01; DOI: 10.46471/gigabyte.103
Danielle C Wrenn, Devin M Drown
Antimicrobial resistance (AMR) is a global public health threat. Environmental microbial communities act as reservoirs for AMR, containing genes associated with resistance, their precursors, and the selective pressures promoting their persistence. Genomic surveillance could provide insights into how these reservoirs change and impact public health. Enriching for AMR genomic signatures in complex microbial communities would strengthen surveillance efforts and reduce time-to-answer. Here, we tested the ability of nanopore sequencing and adaptive sampling to enrich for AMR genes in a mock community of environmental origin. Our setup implemented the MinION mk1B, an NVIDIA Jetson Xavier GPU, and Flongle flow cells. Using adaptive sampling, we observed consistent enrichment by composition. On average, adaptive sampling resulted in a target composition 4× higher than without adaptive sampling. Despite a decrease in total sequencing output, adaptive sampling increased target yield in most replicates. We also demonstrate enrichment in a diverse community using an environmental sample. This method enables rapid and flexible genomic surveillance.
{"title":"Nanopore adaptive sampling enriches for antimicrobial resistance genes in microbial communities.","authors":"Danielle C Wrenn, Devin M Drown","doi":"10.46471/gigabyte.103","DOIUrl":"10.46471/gigabyte.103","url":null,"abstract":"<p><p>Antimicrobial resistance (AMR) is a global public health threat. Environmental microbial communities act as reservoirs for AMR, containing genes associated with resistance, their precursors, and the selective pressures promoting their persistence. Genomic surveillance could provide insights into how these reservoirs change and impact public health. Enriching for AMR genomic signatures in complex microbial communities would strengthen surveillance efforts and reduce time-to-answer. Here, we tested the ability of nanopore sequencing and adaptive sampling to enrich for AMR genes in a mock community of environmental origin. Our setup implemented the MinION mk1B, an NVIDIA Jetson Xavier GPU, and Flongle flow cells. Using adaptive sampling, we observed consistent enrichment by composition. On average, adaptive sampling resulted in a target composition 4× higher than without adaptive sampling. Despite a decrease in total sequencing output, adaptive sampling increased target yield in most replicates. We also demonstrate enrichment in a diverse community using an environmental sample. 
This method enables rapid and flexible genomic surveillance.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte103"},"PeriodicalIF":0.0,"publicationDate":"2023-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10726737/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138814643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
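The adaptive sampling abstract distinguishes enrichment "by composition" from target yield. The sketch below shows how the two can diverge. It assumes composition means the fraction of sequenced bases that fall on target and yield means the absolute number of on-target bases; the abstract does not spell out these definitions, and the numbers are invented for illustration rather than taken from the study.

```python
def composition(target_bases, total_bases):
    """Fraction of all sequenced bases that map to the target set."""
    return target_bases / total_bases


def enrichment(adaptive, control):
    """Compare an adaptive-sampling run with a control run.

    Each argument is a (target_bases, total_bases) tuple.
    Returns (fold change in composition, fold change in target yield).
    """
    by_composition = composition(*adaptive) / composition(*control)
    by_yield = adaptive[0] / control[0]
    return by_composition, by_yield


# Hypothetical runs: adaptive sampling lowers total output (150 Mb vs 400 Mb)
# but still collects more on-target bases (3 Mb vs 2 Mb).
control_run = (2_000_000, 400_000_000)
adaptive_run = (3_000_000, 150_000_000)
fold_composition, fold_yield = enrichment(adaptive_run, control_run)
print(f"composition: {fold_composition:.1f}x, yield: {fold_yield:.1f}x")
```

In this toy case the on-target fraction rises about fourfold while the absolute target yield rises only 1.5-fold, which is why a drop in total sequencing output does not by itself rule out a gain in target yield.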
Pub Date: 2023-12-07; eCollection Date: 2023-01-01; DOI: 10.46471/gigabyte.101
Jiale Fan, Ruyi Huang, Diancheng Yang, Yanan Gong, Zhangbo Cui, Xinge Wang, Zicheng Su, Jing Yu, Yi Zhang, Tierui Zhang, Zhihao Jiang, Tianming Lan, He Wang, Song Huang
The king ratsnake (Elaphe carinata) of the genus Elaphe is a common large, non-venomous snake widely distributed in Southeast and East Asia. It is an economically important farmed species. Although non-venomous, the king ratsnake preys on venomous snakes, such as cobras and pit vipers. However, the immune and digestive mechanisms of the king ratsnake remain unclear. Despite the species' economic and research importance, genomic resources that would benefit toxicology, phylogeography, and immunogenetics studies are lacking. Here, we used single-tube long fragment read sequencing to generate the first contiguous genome of a king ratsnake from Huangshan City, Anhui province, China. The genome size is 1.56 GB with a scaffold N50 of 6.53M. The total length of the genome is approximately 621 Mb, while the repeat content is 42.26%. Additionally, we predicted 22,339 protein-coding genes, including 22,065 with functional annotations. Our genome is a potentially useful addition to those available for snakes.
{"title":"Genome assembly and annotation of the king ratsnake, <i>Elaphe carinata</i>.","authors":"Jiale Fan, Ruyi Huang, Diancheng Yang, Yanan Gong, Zhangbo Cui, Xinge Wang, Zicheng Su, Jing Yu, Yi Zhang, Tierui Zhang, Zhihao Jiang, Tianming Lan, He Wang, Song Huang","doi":"10.46471/gigabyte.101","DOIUrl":"https://doi.org/10.46471/gigabyte.101","url":null,"abstract":"<p><p>The king ratsnake (<i>Elaphe carinata</i>) of the genus Elaphe is a common large, non-venomous snake widely distributed in Southeast and East Asia. It is an economically important farmed species. As a non-venomous snake, the king ratsnake predates venomous snakes, such as cobras and pit vipers. However, the immune and digestive mechanisms of the king ratsnake remain unclear. Despite their economic and research importance, we lack genomic resources that would benefit toxicology, phylogeography, and immunogenetics studies. Here, we used single-tube long fragment read sequencing to generate the first contiguous genome of a king ratsnake from Huangshan City, Anhui province, China. The genome size is 1.56 GB with a scaffold N50 of 6.53M. The total length of the genome is approximately 621 Mb, while the repeat content is 42.26%. Additionally, we predicted 22,339 protein-coding genes, including 22,065 with functional annotations. 
Our genome is a potentially useful addition to those available for snakes.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte101"},"PeriodicalIF":0.0,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10719989/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138814642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Planorbidae comprises approximately 40 genera of freshwater gastropods, including roughly 250 species. Among the Planorbidae subfamilies, the significance of Planorbinae is due to its genus Biomphalaria, whose species are intermediate hosts of the trematode Schistosoma mansoni Sambon, 1907, which causes schistosomiasis in humans and animals. Here, we present the analysis of the dataset of Planorbidae housed in the Collection of Mollusks of the Oswaldo Cruz Institute, with a special focus on Biomphalaria species. This dataset includes 7,267 lots originating from 55 countries, representing 20 genera and 75 species collected from 1948 to 2023. Collections were performed in all regions of Brazil, comprising specimens from 26 states and the Federal District, particularly from the Southeast and Northeast. Within the dataset, Biomphalaria includes 3,926 lots of 31 species from 42 countries. These records will help improve our comprehension of schistosomiasis transmission dynamics and the geographic distributions of these medically important species.
{"title":"Sampling collections and metadata of planorbidae (Mollusca: Gastropoda) in Brazil: a comprehensive analysis of the Oswaldo Cruz Institute's Mollusk Collection from 1948 to 2023.","authors":"Silvana Carvalho Thiengo, Mariana Gomes Lima, Alexandre Bonfim Pinheiro da Silva, Raiany Thuler Nogueira, Flávia Cristina Dos Santos Rangel, Suzete Rodrigues Gomes","doi":"10.46471/gigabyte.102","DOIUrl":"https://doi.org/10.46471/gigabyte.102","url":null,"abstract":"<p><p>Planorbidae comprises approximately 40 genera of freshwater gastropods, including roughly 250 species. Among the Planorbidae subfamilies, the significance of Planorbinae is due to its genus <i>Biomphalaria</i>, whose species are intermediate hosts of the trematode <i>Schistosoma mansoni</i> Sambon, 1907, which causes schistosomiasis in humans and animals. Here, we present the analysis of the dataset of Planorbidae housed in the Collection of Mollusks of the Oswaldo Cruz Institute, with a special focus on <i>Biomphalaria</i> species. This dataset includes 7,267 lots originating from 55 countries, representing 20 genera and 75 species collected from 1948 to 2023. Collections were performed in all regions of Brazil, comprising specimens from 26 states and the Federal District, particularly from the Southeast and Northeast. Within the dataset, <i>Biomphalaria</i> includes 3,926 lots of 31 species from 42 countries. 
These records will help improve our comprehension of schistosomiasis transmission dynamics and the geographic distributions of these medically important species.</p>","PeriodicalId":73157,"journal":{"name":"GigaByte (Hong Kong, China)","volume":"2023 ","pages":"gigabyte102"},"PeriodicalIF":0.0,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10719988/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138814644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}