Pub Date: 2024-10-16; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.136
Vincent Noël, Marco Ruscone, Robyn Shuttleworth, Cicely K Macnamara
The extracellular matrix, composed of macromolecules like collagen fibres, provides structural support to cells and acts as a barrier that metastatic cells degrade to spread beyond the primary tumour. While agent-based frameworks, such as PhysiCell, can simulate the spatial dynamics of tumour evolution, they only implement cells as circles (2D) or spheres (3D). To model the extracellular matrix as a network of fibres, we require a new type of agent represented by line segments (2D) or cylinders (3D). Here, we present PhysiMeSS, an add-on for PhysiCell that introduces a new agent type to describe fibres and their physical interactions with cells and other fibres. The PhysiMeSS implementation is available at https://github.com/PhysiMeSS/PhysiMeSS and in the official PhysiCell repository. We provide examples that illustrate the possibilities of this framework. This tool may help tackle important biological questions, such as diseases linked to dysregulation of the extracellular matrix or the processes leading to cancer metastasis.
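To illustrate why a fibre needs a different geometric treatment than a circular cell, the sketch below tests contact between a 2D cell and a fibre modelled as a line segment. It is not taken from PhysiMeSS: the function names, the optional fibre radius, and the contact criterion (centre-to-segment distance no greater than the sum of the radii) are assumptions made for illustration only.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both end points coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the infinite line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)


def cell_touches_fibre(cell_centre, cell_radius, fibre_start, fibre_end,
                       fibre_radius=0.0):
    """Hypothetical contact test: circle (cell) against segment (fibre)."""
    distance = point_segment_distance(cell_centre, fibre_start, fibre_end)
    return distance <= cell_radius + fibre_radius
```

The clamping step is what distinguishes a segment from an infinite line: a cell beyond the end of a fibre is measured against the nearest end point rather than against the fibre's extension.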
Title: PhysiMeSS - a new physiCell addon for extracellular matrix modelling. Journal: GigaByte (Hong Kong, China), 2024, gigabyte136. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500100/pdf/
Pub Date: 2024-10-11; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.138
Saurabh Gupta, Ankur Sharma
Recent advancements in next-generation sequencing (NGS) technologies have brought to the forefront the necessity for versatile, cost-effective tools capable of adapting to a rapidly evolving landscape. The emergence of numerous new sequencing platforms, each with unique sample preparation and sequencing requirements, underscores the importance of efficient barcode balancing for successful pooling and accurate demultiplexing of samples. Recently launched sequencing systems that claim better affordability compared with more established platforms further exemplify these challenges, especially when libraries originally prepared for one platform need conversion to another. In response to this dynamic environment, we introduce NucBalancer, a Shiny app developed for the optimal selection of barcode sequences. While initially tailored to meet the nucleotide composition challenges specific to G400 and T7 series sequencers, NucBalancer's utility extends to the varied demands of these new sequencing technologies. Its application is particularly crucial in single-cell genomics, enabling the adaptation of libraries, such as those prepared for 10x technology, to various sequencers including the G400 and T7 series. NucBalancer efficiently balances nucleotide composition and sample concentrations, reducing biases and enhancing the reliability of NGS data across platforms. Its adaptability makes it invaluable for addressing sequencing challenges, ensuring effective barcode balancing for sample pooling on any platform.
Availability and implementation: NucBalancer is implemented in R and is available at https://github.com/ersgupta/NucBalancer. Additionally, a Shiny interface is available at https://ersgupta.shinyapps.io/NucBalancer/.
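The idea of barcode balancing can be made concrete with a small sketch that measures the per-position nucleotide composition of a candidate barcode pool. This is not NucBalancer's algorithm (the tool itself is written in R): the function names and the 12.5% minimum-fraction threshold are hypothetical choices made only to show what "balanced" could mean at each sequencing cycle.

```python
from collections import Counter


def position_composition(barcodes):
    """Return, for each position, the fraction of A/C/G/T across barcodes."""
    if not barcodes:
        return []
    length = len(barcodes[0])
    if any(len(barcode) != length for barcode in barcodes):
        raise ValueError("all barcodes must have the same length")
    fractions = []
    # zip(*barcodes) yields one column of bases per barcode position.
    for column in zip(*barcodes):
        counts = Counter(base.upper() for base in column)
        fractions.append(
            {base: counts.get(base, 0) / len(barcodes) for base in "ACGT"}
        )
    return fractions


def unbalanced_positions(barcodes, min_fraction=0.125):
    """Indices where some nucleotide falls below min_fraction.

    min_fraction is an illustrative threshold, not a platform specification.
    """
    return [
        index
        for index, fraction in enumerate(position_composition(barcodes))
        if min(fraction.values()) < min_fraction
    ]
```

A pool such as `["ACGT", "CGTA", "GTAC", "TACG"]` has every base at 25% in every position, so no position is flagged, whereas a pool of near-identical barcodes is flagged at every position.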
Title: NucBalancer: streamlining barcode sequence selection for optimal sample pooling for sequencing. Journal: GigaByte (Hong Kong, China), 2024, gigabyte138. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11488490/pdf/
Pub Date: 2024-10-08; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.135
Locedie Mansueto, Kenneth L McNally, Tobias Kretzschmar, Ramil Mauleon
A growing interest in Cannabis sativa uses for food, fiber, and medicine, and recent changes in regulations have spurred numerous genomic studies of this once-prohibited plant. Cannabis research uses Next Generation Sequencing technologies for genomics and transcriptomics. While other crops have genome portals enabling access to and analysis of extensive genotyping data from diverse accessions, leading to the discovery of alleles for important traits, no such portal exists for cannabis. The CannSeek web portal aims to address this gap. Single nucleotide polymorphism datasets were generated by identifying genome variants from public resequencing data and genome assemblies. Results and accompanying trait data are hosted in the CannSeek web application, built using the Rice SNP-Seek infrastructure with improvements to allow multiple reference genomes and provide a web-service Application Programming Interface. The tools built into the portal allow phylogenetic analyses, varietal grouping and identification, and favorable haplotype discovery for cannabis accessions using public sequencing data.
Availability and implementation: The CannSeek portal is available at https://icgrc.info/cannseek and https://icgrc.info/genotype_viewer.
Title: CannSeek? Yes we Can! An open-source single nucleotide polymorphism database and analysis portal for Cannabis sativa. Journal: GigaByte (Hong Kong, China), 2024, gigabyte135. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480739/pdf/
Pub Date: 2024-09-13; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.134
Marcel Nebenführ, David Prochotta, Alexander Ben Hamadou, Axel Janke, Charlotte Gerheim, Christian Betz, Carola Greve, Hanno Jörn Bolz
The time required for genome sequencing and de novo assembly depends on the interaction between laboratory work, sequencing capacity, and the bioinformatics workflow, and is often constrained by external sequencing services. Bringing together academic biodiversity institutes and a medical diagnostics company with extensive sequencing capabilities, we aimed to generate a high-quality mammalian de novo genome in minimal time. We present the first chromosome-level genome assembly of the Whippet, using PacBio long-read high-fidelity sequencing and reference-guided scaffolding. The final assembly has a contig N50 of 55 Mbp and a scaffold N50 of 65.7 Mbp. The total assembly length is 2.47 Gbp, of which 2.43 Gbp were scaffolded into 39 chromosome-length scaffolds. Annotation using mammalian genomes and transcriptome data yielded 28,383 transcripts, 90.9% complete BUSCO genes, and identified 36.5% repeat content. Sequencing, assembling, and scaffolding the chromosome-level genome of the Whippet took less than a week, adding another high-quality reference genome to the available sequences of domestic dog breeds.
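For readers unfamiliar with the N50 statistic quoted above, it is the length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembly length. The sketch below computes it from a plain list of lengths; the function name and the toy input are illustrative and are not part of the authors' workflow.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

For the toy lengths `[2, 3, 4, 5, 6]` the total is 20, and the two longest sequences (6 and 5) already cover 11, so the N50 is 5.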
Title: High-speed whole-genome sequencing of a Whippet: Rapid chromosome-level assembly and annotation of an extremely fast dog's genome. Journal: GigaByte (Hong Kong, China), 2024, gigabyte134. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11418881/pdf/
Pub Date: 2024-08-31; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.132
Ann-Kathrin Dörr, Josefa Welling, Adrian Dörr, Jule Gosch, Hannah Möhlen, Ricarda Schmithausen, Jan Kehrmann, Folker Meyer, Ivana Kraiselburd
Background: Next-generation sequencing for microbial communities has become a standard technique. However, the computational analysis remains resource-intensive. With declining costs and growing adoption of sequencing-based methods across scientific fields, validated, fully automated, reproducible and flexible pipelines are increasingly essential.
Results: We present RiboSnake, a validated, automated, reproducible QIIME2-based pipeline implemented in Snakemake for analysing 16S rRNA gene amplicon sequencing data. RiboSnake includes pre-packaged validated parameter sets optimized for different sample types, from environmental samples to patient data. The configuration packages can be easily adapted and shared, requiring minimal user input.
Conclusion: RiboSnake is a new alternative for researchers employing 16S rRNA gene amplicon sequencing and looking for a customizable and user-friendly pipeline for microbiome analyses with in vitro validated settings. By automating the analysis with validated parameters for diverse sample types, RiboSnake enhances existing methods significantly. The workflow repository can be found on GitHub (https://github.com/IKIM-Essen/RiboSnake).
Title: RiboSnake - a user-friendly, robust, reproducible, multipurpose and documentation-extensive pipeline for 16S rRNA gene microbiome analysis. Journal: GigaByte (Hong Kong, China), 2024, gigabyte132. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11448241/pdf/
Pub Date: 2024-08-29; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.133
Jorge Buenabad-Chavez, Evelyn Greeves, James P J Chong, Emma Rand
Amazon Web Services (AWS) instances provide a convenient way to run training on complex 'omics data analysis workflows without requiring participants to install software packages or store large data volumes locally. However, efficiently managing dozens of instances is challenging for training providers. We present a set of Bash scripts that make it quick and easy to manage Linux AWS instances pre-configured with all the software analysis tools and data needed for a course, and accessible using encrypted login keys and optional domain names. Creating over 30 instances takes 10-15 minutes. A comprehensive online tutorial describes how to set up and use an AWS account and the scripts, and how to customise AWS instance templates with other software tools and data. We anticipate that others offering similar training may benefit from using the scripts regardless of the analyses being taught.
Title: Automated management of AWS instances for training. Journal: GigaByte (Hong Kong, China), 2024, gigabyte133. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11382607/pdf/
Pub Date: 2024-07-18; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.130
Platalea minor, or black-faced spoonbill (Threskiornithidae), is a wading bird confined to coastal areas in East Asia. Due to habitat destruction, it was classified as globally endangered by the International Union for Conservation of Nature. However, the lack of genomic resources for this species hinders the understanding of its biology and diversity, and the development of conservation measures. Here, we report the first chromosomal-level genome assembly of P. minor using a combination of PacBio SMRT and Omni-C scaffolding technologies. The assembled genome (1.24 Gb) contains 95.33% of the sequences anchored to 31 pseudomolecules. The genome assembly has high sequence continuity with scaffold length N50 = 53 Mb. We predicted 18,780 protein-coding genes and measured high BUSCO score completeness (97.3%). Finally, we revealed 6,155,417 bi-allelic single nucleotide polymorphisms, accounting for ∼5% of the genome. This resource offers new opportunities for studying the black-faced spoonbill and developing conservation measures for this species.
Title: Chromosomal-level genome assembly and single-nucleotide polymorphism sites of black-faced spoonbill Platalea minor. Journal: GigaByte (Hong Kong, China), 2024, pp. 1-13. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11273517/pdf/
Kinship and pedigree, used for estimating inbreeding, heritability, selection, and gene flow, are useful for breeding and animal conservation. However, as the size of crossbred populations increases, errors in generation and parentage assignment on livestock farms become more frequent. Restriction-site-associated DNA sequencing is a cost-effective platform for single nucleotide polymorphism (SNP) discovery and genotyping. Here, we performed a kinship analysis and pedigree reconstruction for Angus and Xiangxi yellow cattle. A total of 975 cattle, including 923 offspring with 24 known sires and 28 known dams, were sampled and subjected to SNP discovery and genotyping. The identified SNP panel included 7,305 SNPs capturing the maximum difference between paternal and maternal genome information, allowing us to distinguish F1 from F2 generations with 90% accuracy. In conclusion, we provide a low-cost and efficient SNP panel for kinship analyses and the improvement of local genetic resources, which is valuable for breed improvement, local resource utilization, and conservation.
Title: Kinship analysis and pedigree reconstruction by RAD sequencing in cattle. Authors: Yiming Xu, Wanqiu Wang, Jiefeng Huang, Minjie Xu, Binhu Wang, Yingsong Wu, Yongzhong Xie, Jianbo Jian. Pub Date: 2024-07-18; DOI: 10.46471/gigabyte.131. Journal: GigaByte (Hong Kong, China), 2024, pp. 1-15. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11273509/pdf/
Pub Date: 2024-06-20; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.127
Renato Santos, Víctor Moreno-Torres, Ilduara Pintos, Octavio Corral, Carmen de Mendoza, Vicente Soriano, Manuel Corpas
Despite the advances in genetic marker identification associated with severe COVID-19, the full genetic characterisation of the disease remains elusive. This study explores imputation in low-coverage whole genome sequencing for a severe COVID-19 patient cohort. We generated a dataset of 79 imputed variant call format files using the GLIMPSE1 tool, each containing an average of 9.5 million single nucleotide variants. Validation revealed a high imputation accuracy (squared Pearson correlation ≈0.97) across sequencing platforms, showcasing GLIMPSE1's ability to confidently impute variants with minor allele frequencies as low as 2% in individuals with Spanish ancestry. We carried out a comprehensive analysis of the patient cohort, examining hospitalisation and intensive care utilisation, sex and age-based differences, and clinical phenotypes using a standardised set of medical terms developed to characterise severe COVID-19 symptoms. The methods and findings presented here can be leveraged for future genomic projects to gain vital insights into health challenges like COVID-19.
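The accuracy metric named above, the squared Pearson correlation, can be sketched as a comparison between imputed allele dosages and validation genotypes. The abstract does not say how the authors grouped variants or samples when computing it, so the function below is only a generic illustration with hypothetical inputs, not a reproduction of the study's validation.

```python
def squared_pearson(imputed, truth):
    """Squared Pearson correlation (r^2) between two equal-length sequences."""
    if len(imputed) != len(truth) or len(imputed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(imputed)
    mean_x = sum(imputed) / n
    mean_y = sum(truth) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(imputed, truth))
    var_x = sum((x - mean_x) ** 2 for x in imputed)
    var_y = sum((y - mean_y) ** 2 for y in truth)
    if var_x == 0 or var_y == 0:
        # r^2 is undefined when either sequence has no variation.
        raise ValueError("correlation is undefined for a constant sequence")
    return (cov * cov) / (var_x * var_y)
```

Here `imputed` might hold dosages between 0 and 2 and `truth` the genotypes (0, 1 or 2) from a higher-confidence assay; values close to 1 indicate that imputed dosages track the validation genotypes closely.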
Title: Low-coverage whole genome sequencing for a highly selective cohort of severe COVID-19 patients. Journal: GigaByte (Hong Kong, China), 2024, gigabyte127. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11211761/pdf/
Pub Date: 2024-06-19; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.128
Randy Heiland, Daniel Bergman, Blair Lyons, Grant Waldow, Julie Cass, Heber Lima da Rocha, Marco Ruscone, Vincent Noël, Paul Macklin
Defining a multicellular model can be challenging. There may be hundreds of parameters that specify the attributes and behaviors of objects. In the best case, the model will be defined using some format specification - a markup language - that will provide easy model sharing (and a minimal step toward reproducibility). PhysiCell is an open-source, physics-based multicellular simulation framework with an active and growing user community. It uses XML to define a model and, traditionally, users needed to manually edit the XML to modify the model. PhysiCell Studio is a tool to make this task easier. It provides a GUI that allows editing the XML model definition, including the creation and deletion of fundamental objects: cell types and substrates in the microenvironment. It also lets users build their model by defining initial conditions and biological rules, run simulations, and view results interactively. PhysiCell Studio has evolved over multiple workshops and academic courses in recent years, which has led to many improvements. Both desktop and cloud versions are available. Its design and development have benefited from an active undergraduate and graduate research program. Like PhysiCell, the Studio is open-source software and contributions from the community are encouraged.
Title: PhysiCell Studio: a graphical tool to make agent-based modeling more accessible. Journal: GigaByte (Hong Kong, China), 2024, gigabyte128. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11211762/pdf/