Pub Date: 2024-03-14; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.114
Lipsa Priyadarsinee, Esther Jamir, Selvaraman Nagamani, Hridoy Jyoti Mahanta, Nandan Kumar, Lijo John, Himakshi Sarma, Asheesh Kumar, Anamika Singh Gaur, Rosaleen Sahoo, S Vaikundamani, N Arul Murugan, U Deva Priyakumar, G P S Raghava, Prasad V Bharatam, Ramakrishnan Parthasarathi, V Subramanian, G Madhavi Sastry, G Narahari Sastry
The Molecular Property Diagnostic Suite (MPDS) was conceived and developed as an open-source, disease-specific web portal based on Galaxy. MPDSCOVID-19 was developed as a one-stop solution for COVID-19 drug discovery research. Galaxy platforms enable the creation of customized workflows that connect the various modules in the web server. The architecture of MPDSCOVID-19 uses Galaxy v22.04 features, ported to CentOS 7.8 and Python 3.7. After six years, MPDSCOVID-19 brings significant updates and several new tools. Tools developed by our group in Perl/Python and open-source tools are collated and integrated into MPDSCOVID-19 using XML scripts. Our MPDS suite aims to facilitate transparent and open innovation. This approach helps make the community more inclusive while promoting free access and participation in software development.
Availability & implementation: The MPDSCOVID-19 portal can be accessed at https://mpds.neist.res.in:8085/.
Title: Molecular Property Diagnostic Suite for COVID-19 (MPDSCOVID-19): an open-source disease-specific drug discovery portal.
Journal: GigaByte (Hong Kong, China), 2024, gigabyte114. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10958779/pdf/
Pub Date: 2024-03-07; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.113
Sami Hamdan, Shammi More, Leonard Sasse, Vera Komeyer, Kaustubh R Patil, Federico Raimondo
The fast-paced development of machine learning (ML) and its increasing adoption in research challenge researchers without extensive training in ML. In neuroscience, ML can help understand brain-behavior relationships, diagnose diseases and develop biomarkers using data from sources like magnetic resonance imaging and electroencephalography. Primarily, ML builds models to make accurate predictions on unseen data. Researchers evaluate models' performance and generalizability using techniques such as cross-validation (CV). However, choosing a CV scheme and evaluating an ML pipeline is challenging and, if done improperly, can lead to overestimated results and incorrect interpretations. Here, we created julearn, an open-source Python library allowing researchers to design and evaluate complex ML pipelines without encountering common pitfalls. We present the rationale behind julearn's design and its core features, and showcase three examples of previously published research projects. Julearn simplifies access to ML by providing an easy-to-use environment. With its design, unique features, simple interface, and practical documentation, it serves as a useful Python-based library for research projects.
Title: Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models.
Journal: GigaByte (Hong Kong, China), 2024, gigabyte113. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10940896/pdf/
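The leakage pitfall named in the abstract above can be illustrated without any ML library. The following is a minimal sketch and not julearn's API: the helper names `kfold_indices` and `standardize_fold` are hypothetical, and the sketch only shows the core rule that preprocessing statistics are estimated on the training folds and then applied unchanged to the held-out fold.

```python
from statistics import mean, pstdev


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples)
                     if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def standardize_fold(values, train_idx, test_idx):
    """Z-score one feature using training-fold statistics only.

    The held-out values never contribute to the mean or standard
    deviation, so no information leaks from test to train.
    """
    train = [values[i] for i in train_idx]
    mu = mean(train)
    sigma = pstdev(train) or 1.0  # guard against a constant feature
    scaled_train = [(v - mu) / sigma for v in train]
    scaled_test = [(values[i] - mu) / sigma for i in test_idx]
    return scaled_train, scaled_test


if __name__ == "__main__":
    feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    for train_idx, test_idx in kfold_indices(len(feature), 3):
        scaled_train, scaled_test = standardize_fold(feature, train_idx, test_idx)
        print(test_idx, [round(v, 2) for v in scaled_test])
```

Scaling the whole feature once before splitting would be the leaky variant, because the held-out samples would then influence the mean and standard deviation used during training.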
Pub Date: 2024-03-06; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.112
Yutang Chen, Roland Kölliker, Martin Mascher, Dario Copetti, Axel Himmelbach, Nils Stein, Bruno Studer
This work is an update and extension of the previously published article "Ultralong Oxford Nanopore Reads Enable the Development of a Reference-Grade Perennial Ryegrass Genome Assembly" by Frei et al. The published genome assembly of the doubled haploid perennial ryegrass (Lolium perenne L.) genotype Kyuss (Kyuss v1.0) marked a milestone for forage grass research and breeding. However, order and orientation errors may exist in the pseudo-chromosomes of Kyuss, since barley (Hordeum vulgare L.), which diverged 30 million years ago from perennial ryegrass, was used as the reference to scaffold Kyuss. To correct for structural errors possibly present in the published Kyuss assembly, we de novo assembled the genome again and generated 50-fold coverage high-throughput chromosome conformation capture (Hi-C) data to assist pseudo-chromosome construction. The resulting new chromosome-level assembly Kyuss v2.0 showed improved quality with high contiguity (contig N50 = 120 Mb), high completeness (total BUSCO score = 99%), high base-level accuracy (QV = 50), and correct pseudo-chromosome structure (validated by Hi-C contact map). This new assembly will serve as a better reference genome for Lolium spp. and greatly benefit the forage and turf grass research community.
Title: An improved chromosome-level genome assembly of perennial ryegrass (Lolium perenne L.).
Journal: GigaByte (Hong Kong, China), 2024, gigabyte112. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10940895/pdf/
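The base-level accuracy in the abstract above is reported as a quality value (QV). Assuming the usual Phred-scaled definition, QV = -10 · log10(p) with p the per-base error probability, the conversion can be sketched as follows. The function names are hypothetical, and the abstract does not state how the QV was estimated.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    rate = qv_to_error_rate(50)
    print(f"QV 50 -> about 1 error per {round(1 / rate):,} bases")
```

Under this definition, QV 50 corresponds to roughly one consensus error per 100,000 bases.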
Pub Date: 2024-02-21; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.107
Filipi Miranda Soares, Luís Ferreira Pires, Maria Carolina Garcia, Lidio Coradin, Natalia Pirani Ghilardi-Lopes, Rubens Rangel Silva, Aline Martins de Carvalho, Anand Gavai, Yamine Bouzembrak, Benildes Coura Moreira Dos Santos Maculan, Sheina Koffler, Uiara Bandineli Montedo, Debora Pignatari Drucker, Raquel Santiago, Maria Clara Peres de Carvalho, Ana Carolina da Silva Lima, Hillary Dandara Elias Gabriel, Stephanie Gabriele Mendonça de França, Karoline Reis de Almeida, Bárbara Junqueira Dos Santos, Antonio Mauro Saraiva
This paper presents two key data sets derived from the Pomar Urbano project. The first data set is a comprehensive catalog of edible fruit-bearing plant species, native or introduced to Brazil. The second data set, sourced from the iNaturalist platform, tracks the distribution and monitoring of these plants within urban landscapes across Brazil. The study includes data from the capitals of all 27 federative units of Brazil, focusing on the ten cities that contributed the most observations as of August 2023. The research emphasizes the significance of citizen science in urban biodiversity monitoring and its potential to contribute to various fields, including food and nutrition, creative industry, study of plant phenology, and machine learning applications. We expect the data sets presented in this paper to serve as resources for further studies in urban foraging, food security, cultural ecosystem services, and environmental sustainability.
Title: Citizen science data on urban forageable plants: a case study in Brazil.
Journal: GigaByte (Hong Kong, China), 2024, gigabyte107. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10905257/pdf/
Pub Date: 2024-02-20; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.109
Aleksandra Djordjevic, Junhua Li, Shuangsang Fang, Lei Cao, Marija Ivanovic
This paper introduces a new approach to cell clustering using the Variable Neighborhood Search (VNS) metaheuristic. The purpose of this method is to cluster cells based on both gene expression and spatial coordinates. Initially, we confronted this clustering challenge as an Integer Linear Programming minimization problem. Our approach introduced a novel model based on the VNS technique, demonstrating the efficacy in navigating the complexities of cell clustering. Notably, our method extends beyond conventional cell-type clustering to spatial domain clustering. This adaptability enables our algorithm to orchestrate clusters based on information gleaned from gene expression matrices and spatial coordinates. Our validation showed the superior performance of our method when compared to existing techniques. Our approach advances current clustering methodologies and can potentially be applied to several fields, from biomedical research to spatial data analysis.
Title: A novel variable neighborhood search approach for cell clustering for spatial transcriptomics.
Journal: GigaByte (Hong Kong, China), 2024, gigabyte109. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10910296/pdf/
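To make the idea of Variable Neighborhood Search for clustering concrete, here is a generic sketch. It is not the authors' model or their Integer Linear Programming formulation. The cost function (within-cluster sum of squared distances over concatenated expression and weighted spatial features), the `spatial_weight` parameter, and all function names are assumptions made for illustration.

```python
import random


def combine(expression, coords, spatial_weight=1.0):
    """Concatenate expression features with weighted spatial coordinates."""
    return [list(e) + [spatial_weight * c for c in xy]
            for e, xy in zip(expression, coords)]


def cost(points, labels, k):
    """Within-cluster sum of squared distances to each cluster centroid."""
    total = 0.0
    for cluster in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


def local_search(points, labels, k):
    """Descent: reassign single points for as long as the cost decreases."""
    labels = list(labels)
    best = cost(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            keep = labels[i]
            for cluster in range(k):
                if cluster == keep:
                    continue
                labels[i] = cluster
                value = cost(points, labels, k)
                if value < best - 1e-12:
                    best, keep, improved = value, cluster, True
            labels[i] = keep
    return labels, best


def vns(points, k, k_max=3, iterations=50, seed=0):
    """Basic VNS: shake `neighborhood` random labels, descend, accept if better."""
    rng = random.Random(seed)
    labels, best = local_search(points, [rng.randrange(k) for _ in points], k)
    for _ in range(iterations):
        neighborhood = 1
        while neighborhood <= k_max:
            shaken = list(labels)
            picks = rng.sample(range(len(points)), min(neighborhood, len(points)))
            for i in picks:
                shaken[i] = rng.randrange(k)
            candidate, value = local_search(points, shaken, k)
            if value < best - 1e-12:
                labels, best, neighborhood = candidate, value, 1
            else:
                neighborhood += 1
    return labels, best


if __name__ == "__main__":
    expression = [[0.0], [0.1], [5.0], [5.1]]
    coords = [[0, 0], [0, 1], [10, 10], [10, 11]]
    labels, value = vns(combine(expression, coords), k=2)
    print(labels, round(value, 3))
```

Raising `spatial_weight` pushes the result toward spatial-domain clusters, while a value of 0 reduces it to clustering on expression alone. That trade-off is a property of this sketch, not a claim about the published method.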
As genomic sequencing technology continues to advance, it becomes increasingly important to perform joint analyses of multiple transcriptomics datasets. However, batch effects complicate dataset integration, for example when sequencing data are measured on different platforms or datasets are collected at different times. Here, we report the development of the BatchEval Pipeline, a workflow for evaluating batch effects in dataset integration. The BatchEval Pipeline generates a comprehensive report consisting of a series of HTML pages of assessment findings: a main page, a raw dataset evaluation page, and several pages evaluating the built-in methods. The main page shows basic information on the integrated datasets, a comprehensive batch effect score, and the most recommended method for removing batch effects from the current datasets. The remaining pages show evaluation details for the raw dataset and the results of each built-in batch effect removal method after it has been applied. This comprehensive report enables researchers to accurately identify and remove batch effects, resulting in more reliable and meaningful biological insights from integrated datasets. In summary, the BatchEval Pipeline represents a significant advancement in batch effect evaluation and is a valuable tool for improving the accuracy and reliability of experimental results.
Availability & implementation: The source code of the BatchEval Pipeline is available at https://github.com/STOmics/BatchEval.
Title: BatchEval Pipeline: batch effect evaluation workflow for multiple datasets joint analysis.
Authors: Chao Zhang, Qiang Kang, Mei Li, Hongqing Xie, Shuangsang Fang, Xun Xu
Pub Date: 2024-02-20; DOI: 10.46471/gigabyte.108
Journal: GigaByte (Hong Kong, China), 2024, gigabyte108. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10905258/pdf/
Pub Date: 2024-02-20; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.110
Bohan Zhang, Mei Li, Qiang Kang, Zhonghan Deng, Hua Qin, Kui Su, Xiuwen Feng, Lichuan Chen, Huanlin Liu, Shuangsang Fang, Yong Zhang, Yuxiang Li, Susanne Brix, Xun Xu
In spatially resolved transcriptomics, Stereo-seq facilitates the analysis of large tissues at the single-cell level, offering subcellular resolution and centimeter-level field-of-view. Our previous work on StereoCell introduced a one-stop software using cell nuclei staining images and statistical methods to generate high-confidence single-cell spatial gene expression profiles for Stereo-seq data. With advancements allowing the acquisition of cell boundary information, such as cell membrane/wall staining images, we updated our software to a new version, STCellbin. Using cell nuclei staining images, STCellbin aligns cell membrane/wall staining images with spatial gene expression maps. Advanced cell segmentation ensures the detection of accurate cell boundaries, leading to more reliable single-cell spatial gene expression profiles. We verified that STCellbin can be applied to mouse liver (cell membranes) and Arabidopsis seed (cell walls) datasets, outperforming other methods. The improved capability of capturing single-cell gene expression profiles results in a deeper understanding of the contribution of single-cell phenotypes to tissue biology.
Availability & implementation: The source code of STCellbin is available at https://github.com/STOmics/STCellbin.
Title: Generating single-cell gene expression profiles for high-resolution spatial transcriptomics based on cell boundary images.
Journal: GigaByte (Hong Kong, China), 2024, gigabyte110. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10905256/pdf/
The basic analysis steps of spatial transcriptomics require obtaining gene expression information from both space and cells. The existing tools for these analyses incur performance issues when dealing with large datasets. These issues involve computationally intensive spatial localization, RNA genome alignment, and excessive memory usage in large chip scenarios. These problems affect the applicability and efficiency of the analysis. Here, a high-performance and accurate spatial transcriptomics data analysis workflow, called Stereo-seq Analysis Workflow (SAW), was developed for the Stereo-seq technology developed at BGI. SAW includes mRNA spatial position reconstruction, genome alignment, gene expression matrix generation, and clustering. The workflow outputs files in a universal format for subsequent personalized analysis. The execution time for the entire analysis is ∼148 min with 1 GB reads 1 × 1 cm chip test data, 1.8 times faster than with an unoptimized workflow.
Title: SAW: an efficient and accurate data analysis workflow for Stereo-seq spatial transcriptomics.
Authors: Chun Gong, Shengkang Li, Leying Wang, Fuxiang Zhao, Shuangsang Fang, Dong Yuan, Zijian Zhao, Qiqi He, Mei Li, Weiqing Liu, Zhaoxun Li, Hongqing Xie, Sha Liao, Ao Chen, Yong Zhang, Yuxiang Li, Xun Xu
Pub Date: 2024-02-20; DOI: 10.46471/gigabyte.111
Journal: GigaByte (Hong Kong, China), 2024, gigabyte111. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10905255/pdf/
Pub Date: 2024-01-25; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.106
Xiaotong Niu, Yakui Lv, Jin Chen, Yueheng Feng, Yilin Cui, Haorong Lu, Hui Liu
Trimeresurus albolabris, also known as the white-lipped pit viper or white-lipped tree viper, is a highly venomous snake distributed across Southeast Asia and the cause of many snakebite cases. In this study, we report the first whole genome assembly of T. albolabris obtained with next-generation sequencing from a specimen collected in Mengzi, Yunnan, China. After genome sequencing and assembly, the genome of this male T. albolabris individual was 1.51 Gb in length and included 38.42% repeat-element content. Using this genome, 21,695 genes were identified, and 99.17% of genes could be annotated using gene functional databases. Our genome assembly and annotation process was validated using a phylogenetic tree, which included six species and focused on single-copy genes of nuclear genomes. This research will contribute to future studies on Trimeresurus biology and the genetic basis of snake venom.
Title: The genome assembly and annotation of the white-lipped tree pit viper Trimeresurus albolabris.
Journal: GigaByte (Hong Kong, China), 2024, gigabyte106. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10836062/pdf/
Pub Date: 2024-01-11; eCollection Date: 2024-01-01; DOI: 10.46471/gigabyte.105
Magnus Wolf, Bruno Lopes da Silva Ferrette, Raphael T F Coimbra, Menno de Jong, Marcel Nebenführ, David Prochotta, Yannis Schöneberg, Konstantin Zapf, Jessica Rosenbaum, Hannah A Mc Intyre, Julia Maier, Clara C S de Souza, Lucas M Gehlhaar, Melina J Werner, Henrik Oechler, Marie Wittekind, Moritz Sonnewald, Maria A Nilsson, Axel Janke, Sven Winter
The snake pipefish, Entelurus aequoreus (Linnaeus, 1758), is a northern Atlantic fish inhabiting open seagrass environments that recently expanded its distribution range. Here, we present a highly contiguous, near chromosome-scale genome of E. aequoreus. The final assembly spans 1.6 Gbp in 7,391 scaffolds, with a scaffold N50 of 62.3 Mbp and L50 of 12. The 28 largest scaffolds (>21 Mbp) span 89.7% of the assembly length. A BUSCO completeness score of 94.1% and a mapping rate above 98% suggest a high assembly completeness. Repetitive elements cover 74.93% of the genome, one of the highest proportions identified in vertebrates. Our demographic modeling identified a peak in population size during the last interglacial period, suggesting the species might benefit from warmer water conditions. Our updated snake pipefish assembly is essential for future analyses of the morphological and molecular changes unique to the Syngnathidae.
Title: Near chromosome-level and highly repetitive genome assembly of the snake pipefish Entelurus aequoreus (Syngnathiformes: Syngnathidae).
Journal: GigaByte (Hong Kong, China), 2024, gigabyte105. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10795108/pdf/
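The scaffold N50 and L50 quoted in the abstract above follow the standard definitions: sort sequences from longest to shortest and accumulate lengths until at least half of the total assembly length is covered. N50 is the length of the sequence that crosses that threshold, and L50 is how many sequences were needed. A minimal sketch with toy lengths (the function name `n50_l50` and the example values are illustrative, not taken from the assembly):

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length, rank
    raise ValueError("lengths must contain at least one sequence")


if __name__ == "__main__":
    toy_scaffolds = [2, 3, 4, 5, 6, 10]
    n50, l50 = n50_l50(toy_scaffolds)
    print(f"N50 = {n50}, L50 = {l50}")
```

For the toy input the total is 30, and the two longest sequences (10 + 6 = 16) already cover half of it, so N50 is 6 and L50 is 2.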