GigaByte (Hong Kong, China): latest articles

Chromosome-level genome assembly and annotation of the crested gecko, Correlophus ciliatus, a lizard incapable of tail regeneration.
Pub Date : 2024-11-06 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.140
Marc A Gumangan, Zheyu Pan, Thomas P Lozito

The vast majority of gecko species are capable of tail regeneration, but certain geckos of the Correlophus, Uroplatus, and Nephrurus genera are unable to regrow lost tails. Of these non-regenerative geckos, the crested gecko (Correlophus ciliatus) is distinguished by ready availability, ease of care, high productivity, and hybridization potential. These features make C. ciliatus particularly suited as a model for studying the genetic, molecular, and cellular mechanisms underlying loss of tail regeneration capabilities. We report a contiguous genome of C. ciliatus with a total size of 1.65 Gb, 152 scaffolds, L50 of 6, and N50 of 109 Mb. Repetitive content accounts for 40.41% of the genome, and a total of 30,780 genes were annotated. Our assembly of the crested gecko genome provides a valuable resource for future comparative genomic studies between non-regenerative and regenerative geckos and other squamate reptiles.

Findings: We report genome sequencing, assembly, and annotation for the crested gecko, Correlophus ciliatus.
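The abstract above summarizes contiguity with N50 and L50. As a rough illustration of how these two statistics are conventionally derived from a list of scaffold lengths, here is a minimal Python sketch. The function name `n50_l50` and the toy lengths are assumptions made for illustration; they are not taken from the paper or its data.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold or contig lengths.

    N50 is the length of the scaffold at which the running total, taken
    from longest to shortest, first reaches half of the assembly size.
    L50 is the number of scaffolds needed to reach that point.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy lengths (hypothetical, not from the crested gecko assembly).
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50_l50(toy_lengths))
```

Read this way, an L50 of 6 with an N50 of 109 Mb means the six longest scaffolds, each at least 109 Mb, together cover at least half of the 1.65 Gb assembly.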

Citations: 0
TSTA: thread and SIMD-based trapezoidal pairwise/multiple sequence-alignment method.
Pub Date : 2024-11-05 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.141
Peiyu Zong, Wenpeng Deng, Jian Liu, Jue Ruan

The rapid advancements in sequencing length necessitate the adoption of increasingly efficient sequence alignment algorithms. The Needleman-Wunsch method introduces the foundational dynamic-programming matrix calculation for global alignment, which evaluates the overall alignment of sequences. However, this method is known to be highly time-consuming. The proposed TSTA algorithm leverages both vector-level and thread-level parallelism to accelerate pairwise and multiple sequence alignments.

Availability and implementation: Source code is available at https://github.com/bxskdh/TSTA.
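For context on the baseline that the abstract calls time-consuming, the following is a minimal scalar sketch of the Needleman-Wunsch dynamic-programming recurrence for a global alignment score. It is not the TSTA implementation and uses no SIMD or threading. The scoring values (match 1, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b with a linear gap penalty.

    Only two rows of the dynamic-programming matrix are kept in memory,
    but every one of the len(a) * len(b) cells is still computed, which
    is where the quadratic running time comes from.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr[j] = max(diagonal, up, left)
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("AC", "AGC"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That independence is what vector-level and thread-level parallel schemes generally exploit.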

Citations: 0
SMARTER-database: a tool to integrate SNP array datasets for sheep and goat breeds.
Pub Date : 2024-10-21 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.139
Paolo Cozzi, Arianna Manunza, Johanna Ramirez-Diaz, Valentina Tsartsianidou, Konstantinos Gkagkavouzis, Pablo Peraza, Anna Maria Johansson, Juan José Arranz, Fernando Freire, Szilvia Kusza, Filippo Biscarini, Lucy Peters, Gwenola Tosser-Klopp, Gabriel Ciappesoni, Alexandros Triantafyllidis, Rachel Rupp, Bertrand Servin, Alessandra Stella

Underutilized sheep and goat breeds can adapt to challenging environments due to their genetics. Integrating publicly available genomic datasets with new data will facilitate genetic diversity analyses; however, this process is complicated by data discrepancies, such as outdated assembly versions or different data formats. Here, we present the SMARTER-database, a collection of tools and scripts to standardize genomic data and metadata, mainly from SNP chip arrays on global small ruminant populations, with a focus on reproducibility. SMARTER-database harmonizes genotypes for about 12,000 sheep and 6,000 goats to a uniform coding and assembly version. Users can access the genotype data via File Transfer Protocol and interact with the metadata through a web interface or using their custom scripts, enabling efficient filtering and selection of samples. These tools will empower researchers to focus on the crucial aspects of adaptation and contribute to livestock sustainability, leveraging the rich dataset provided by the SMARTER-database.

Availability and implementation: The code is available as open-source software under the MIT license at https://github.com/cnr-ibba/SMARTER-database.

Citations: 0
Building a community-driven bioinformatics platform to facilitate Cannabis sativa multi-omics research.
Pub Date : 2024-10-18 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.137
Locedie Mansueto, Tobias Kretzschmar, Ramil Mauleon, Graham J King

Global changes in cannabis legislation after decades of stringent regulation and heightened demand for its industrial and medicinal applications have spurred recent genetic and genomics research. An international research community emerged and identified the need for a web portal to host cannabis-specific datasets that seamlessly integrates multiple data sources and serves omics-type analyses, fostering information sharing. The Tripal platform was used to host public genome assemblies, gene annotations, quantitative trait loci and genetic maps, gene and protein expression data, metabolic profiles and their sample attributes. Single nucleotide polymorphisms were called using public resequencing datasets on three genomes. Additional applications, such as SNP-Seek and MapManJS, were embedded into Tripal. A multi-omics data integration web-service Application Programming Interface (API), developed on top of existing Tripal modules, returns generic tables of samples, properties and values. Use cases demonstrate the API's utility for various omics analyses, enabling researchers to perform multi-omics analyses efficiently.

Availability and implementation: The web portal can be accessed at www.icgrc.info.

Citations: 0
PhysiMeSS - a new physiCell addon for extracellular matrix modelling.
Pub Date : 2024-10-16 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.136
Vincent Noël, Marco Ruscone, Robyn Shuttleworth, Cicely K Macnamara

The extracellular matrix, composed of macromolecules like collagen fibres, provides structural support to cells and acts as a barrier that metastatic cells degrade to spread beyond the primary tumour. While agent-based frameworks, such as PhysiCell, can simulate the spatial dynamics of tumour evolution, they only implement cells as circles (2D) or spheres (3D). To model the extracellular matrix as a network of fibres, we require a new type of agent represented by line segments (2D) or cylinders (3D). Here, we present PhysiMeSS, an addon of PhysiCell, introducing a new agent type to describe fibres and their physical interactions with cells and other fibres. PhysiMeSS implementation is available at https://github.com/PhysiMeSS/PhysiMeSS and in the official PhysiCell repository. We provide examples describing the possibilities of this framework. This tool may help tackle important biological questions, such as diseases linked to dysregulation of the extracellular matrix or the processes leading to cancer metastasis.
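To make the geometric idea concrete, the sketch below shows one common way to test whether a circular cell touches a fibre modelled as a 2D line segment: project the cell centre onto the segment, clamp the projection to the endpoints, and compare the resulting distance with the cell radius. This is an illustrative Python sketch only. It is not PhysiMeSS or PhysiCell code, and the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment with endpoints a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate fibre: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped onto the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def cell_touches_fibre(centre, radius, a, b):
    """True if a circular cell overlaps a zero-thickness fibre segment."""
    return point_segment_distance(centre, a, b) <= radius


# A cell of radius 1.5 centred at (2, 0) against a fibre from (-1, 0) to (1, 0).
print(cell_touches_fibre((2.0, 0.0), 1.5, (-1.0, 0.0), (1.0, 0.0)))
```

The same clamped-projection approach carries over to 3D, where the distance from a sphere centre to a cylinder axis would be compared with the sum of the two radii.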

Citations: 0
NucBalancer: streamlining barcode sequence selection for optimal sample pooling for sequencing.
Pub Date : 2024-10-11 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.138
Saurabh Gupta, Ankur Sharma

Recent advancements in next-generation sequencing (NGS) technologies have brought to the forefront the necessity for versatile, cost-effective tools capable of adapting to a rapidly evolving landscape. The emergence of numerous new sequencing platforms, each with unique sample preparation and sequencing requirements, underscores the importance of efficient barcode balancing for successful pooling and accurate demultiplexing of samples. Recently launched sequencing systems claiming affordability comparable to or better than more established platforms further exemplify these challenges, especially when libraries originally prepared for one platform need conversion to another. In response to this dynamic environment, we introduce NucBalancer, a Shiny app developed for the optimal selection of barcode sequences. Although NucBalancer was initially tailored to meet the nucleotide composition challenges specific to G400 and T7 series sequencers, its utility broadens significantly to accommodate the varied demands of these new sequencing technologies. Its application is particularly crucial in single-cell genomics, enabling the adaptation of libraries, such as those prepared for 10x technology, to various sequencers including G400 and T7 series sequencers. NucBalancer efficiently balances nucleotide composition and sample concentrations, reducing biases and enhancing the reliability of NGS data across platforms. Its adaptability makes it invaluable for addressing sequencing challenges, ensuring effective barcode balancing for sample pooling on any platform.

Availability and implementation: NucBalancer is implemented in R and is available at https://github.com/ersgupta/NucBalancer. Additionally, a shiny interface is available at https://ersgupta.shinyapps.io/NucBalancer/.
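The notion of barcode balancing can be illustrated by tabulating, for each barcode position, the fraction of pooled barcodes carrying each nucleotide. The sketch below is written in Python for illustration only. NucBalancer itself is an R/Shiny tool, and this is not its code. The function names and the `min_fraction` threshold of 0.125 are assumptions, not values from the paper.

```python
from collections import Counter


def position_composition(barcodes):
    """Per-position fraction of A, C, G and T across equal-length barcodes."""
    if not barcodes:
        raise ValueError("at least one barcode is required")
    length = len(barcodes[0])
    if any(len(bc) != length for bc in barcodes):
        raise ValueError("all barcodes must have the same length")
    fractions = []
    for column in zip(*barcodes):
        counts = Counter(column)
        fractions.append(
            {base: counts.get(base, 0) / len(barcodes) for base in "ACGT"}
        )
    return fractions


def unbalanced_positions(barcodes, min_fraction=0.125):
    """Indices of positions where some base falls below min_fraction.

    The threshold is a hypothetical value chosen for this example.
    """
    return [
        index
        for index, fraction in enumerate(position_composition(barcodes))
        if min(fraction.values()) < min_fraction
    ]


pool = ["ACGT", "CGTA", "GTAC", "TACG"]
print(unbalanced_positions(pool))
```

A pool in which every position carries all four bases at similar frequencies returns no flagged positions. A pool dominated by one base at some position is flagged, which is the situation a barcode selection step would try to avoid. Weighting by sample concentration, which the abstract also mentions, is left out of this sketch.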

Citations: 0
CannSeek? Yes we Can! An open-source single nucleotide polymorphism database and analysis portal for Cannabis sativa.
Pub Date : 2024-10-08 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.135
Locedie Mansueto, Kenneth L McNally, Tobias Kretzschmar, Ramil Mauleon

A growing interest in Cannabis sativa uses for food, fiber, and medicine, and recent changes in regulations have spurred numerous genomic studies of this once-prohibited plant. Cannabis research uses Next Generation Sequencing technologies for genomics and transcriptomics. While other crops have genome portals enabling access and analysis of numerous genotyping data from diverse accessions, leading to the discovery of alleles for important traits, this is absent for cannabis. The CannSeek web portal aims to address this gap. Single nucleotide polymorphism datasets were generated by identifying genome variants from public resequencing data and genome assemblies. Results and accompanying trait data are hosted in the CannSeek web application, built using the Rice SNP-Seek infrastructure with improvements to allow multiple reference genomes and provide a web-service Application Programming Interface. The tools built into the portal allow phylogenetic analyses, varietal grouping and identifications, and favorable haplotype discovery for cannabis accessions using public sequencing data.

Availability and implementation: The CannSeek portal is available at https://icgrc.info/cannseek, https://icgrc.info/genotype_viewer.

Citations: 0
High-speed whole-genome sequencing of a Whippet: Rapid chromosome-level assembly and annotation of an extremely fast dog's genome.
Pub Date : 2024-09-13 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.134
Marcel Nebenführ, David Prochotta, Alexander Ben Hamadou, Axel Janke, Charlotte Gerheim, Christian Betz, Carola Greve, Hanno Jörn Bolz

The time required for genome sequencing and de novo assembly depends on the interaction between laboratory work, sequencing capacity, and the bioinformatics workflow, often constrained by external sequencing services. Bringing together academic biodiversity institutes and a medical diagnostics company with extensive sequencing capabilities, we aimed at generating a high-quality mammalian de novo genome in minimal time. We present the first chromosome-level genome assembly of the Whippet, using PacBio long-read high-fidelity sequencing and reference-guided scaffolding. The final assembly has a contig N50 of 55 Mbp and a scaffold N50 of 65.7 Mbp. The total assembly length is 2.47 Gbp, of which 2.43 Gbp were scaffolded into 39 chromosome-length scaffolds. Annotation using mammalian genomes and transcriptome data yielded 28,383 transcripts, 90.9% complete BUSCO genes, and identified 36.5% repeat content. Sequencing, assembling, and scaffolding the chromosome-level genome of the Whippet took less than a week, adding another high-quality reference genome to the available sequences of domestic dog breeds.

Citations: 0
RiboSnake - a user-friendly, robust, reproducible, multipurpose and documentation-extensive pipeline for 16S rRNA gene microbiome analysis.
Pub Date : 2024-08-31 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.132
Ann-Kathrin Dörr, Josefa Welling, Adrian Dörr, Jule Gosch, Hannah Möhlen, Ricarda Schmithausen, Jan Kehrmann, Folker Meyer, Ivana Kraiselburd

Background: Next-generation sequencing of microbial communities has become a standard technique. However, the computational analysis remains resource-intensive. With declining costs and growing adoption of sequencing-based methods, validated, fully automated, reproducible, and flexible pipelines are increasingly essential across scientific fields.

Results: We present RiboSnake, a validated, automated, reproducible QIIME2-based pipeline implemented in Snakemake for analysing 16S rRNA gene amplicon sequencing data. RiboSnake includes pre-packaged validated parameter sets optimized for different sample types, from environmental samples to patient data. The configuration packages can be easily adapted and shared, requiring minimal user input.

Conclusion: RiboSnake is a new alternative for researchers employing 16S rRNA gene amplicon sequencing and looking for a customizable and user-friendly pipeline for microbiome analyses with in vitro validated settings. By automating the analysis with validated parameters for diverse sample types, RiboSnake enhances existing methods significantly. The workflow repository can be found on GitHub (https://github.com/IKIM-Essen/RiboSnake).

Citations: 0
Automated management of AWS instances for training.
Pub Date : 2024-08-29 eCollection Date: 2024-01-01 DOI: 10.46471/gigabyte.133
Jorge Buenabad-Chavez, Evelyn Greeves, James P J Chong, Emma Rand

Amazon Web Services (AWS) instances provide a convenient way to run training on complex 'omics data analysis workflows without requiring participants to install software packages or store large data volumes locally. However, efficiently managing dozens of instances is challenging for training providers. We present a set of Bash scripts that make it quick and easy to manage Linux AWS instances pre-configured with all the software analysis tools and data needed for a course, and accessible using encrypted login keys and optional domain names. Creating over 30 instances takes 10-15 minutes. A comprehensive online tutorial describes how to set up and use an AWS account and the scripts, and how to customise AWS instance templates with other software tools and data. We anticipate that others offering similar training may benefit from using the scripts regardless of the analyses being taught.

Citations: 0