
GigaScience: latest articles

PhageGE: an interactive web platform for exploratory analysis and visualization of bacteriophage genomes.
IF 3.5 · Zone 2, Biology · Q1 Multidisciplinary Sciences · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae074
Jinxin Zhao, Jiru Han, Yu-Wei Lin, Yan Zhu, Michael Aichem, Dimitar Garkov, Phillip J Bergen, Sue C Nang, Jian-Zhong Ye, Tieli Zhou, Tony Velkov, Jiangning Song, Falk Schreiber, Jian Li

Background: Antimicrobial resistance is a serious threat to global health. Due to the stagnant antibiotic discovery pipeline, bacteriophages (phages) have been proposed as an alternative therapy for the treatment of infections caused by multidrug-resistant pathogens. Genomic features play an important role in phage pharmacology. However, our knowledge of phage genomics is sparse, and the use of existing bioinformatic pipelines and tools requires considerable bioinformatic expertise. These challenges have substantially limited the clinical translation of phage therapy.

Findings: We have developed PhageGE (Phage Genome Explorer), a user-friendly graphical interface application for the interactive analysis of phage genomes. PhageGE enables users to perform key analyses, including phylogenetic analysis, visualization of phylogenetic trees, prediction of phage life cycle, and comparative analysis of phage genome annotations. The new R Shiny web server, PhageGE, integrates existing R packages and combines them with several newly developed functions to facilitate these analyses. Additionally, the web server provides interactive visualization capabilities and allows users to directly export publication-quality images.

Conclusions: PhageGE is a valuable tool that simplifies the analysis of phage genome data and may expedite the development and clinical translation of phage therapy. PhageGE is publicly available at https://jason-zhao.shinyapps.io/PhageGE_Update/.

Citations: 0
AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data.
IF 11.8 · Zone 2, Biology · Q1 Multidisciplinary Sciences · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae086
Jorge M Silva, Armando J Pinho, Diogo Pratas

Background: Most viral genome sequences generated during the latest pandemic have presented new challenges for computational analysis. Analyzing millions of viral genomes in multi-FASTA format is computationally demanding, especially when using alignment-based methods. Most existing methods are not designed to handle such large datasets, often requiring the analysis to be divided into smaller parts to obtain results using available computational resources.

Findings: We introduce AltaiR, a toolkit for analyzing multiple sequences in multi-FASTA format using exclusively alignment-free methodologies. AltaiR enables the identification of singularity and similarity patterns within sequences and computes static and temporal dynamics without restrictions on the number or size of input sequences. It automatically filters low-quality, biased, or deviant data. We demonstrate AltaiR's capabilities by analyzing more than 1.5 million full severe acute respiratory syndrome coronavirus 2 sequences, revealing interesting observations regarding viral genome characteristics over time, such as shifts in nucleotide composition, decreases in average Kolmogorov sequence complexity, and the evolution of the smallest sequences not found in the human host.
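
As a rough illustration of the kind of alignment-free, per-sequence statistics described above, the following Python sketch parses multi-FASTA text, computes nucleotide composition, and uses a compression ratio as a crude stand-in for sequence complexity. It is not AltaiR's implementation: the function names are hypothetical, and zlib is swapped in here as a generic compressor only because it ships with Python, so it gives at best a loose upper-bound proxy for Kolmogorov complexity.

```python
import zlib
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def nucleotide_composition(seq):
    """Return the fraction of A, C, G and T among the unambiguous bases."""
    counts = Counter(seq)
    total = sum(counts[base] for base in "ACGT")
    return {base: (counts[base] / total if total else 0.0) for base in "ACGT"}


def compression_ratio(seq):
    """Compressed size divided by raw size (illustrative complexity proxy).

    Assumption: zlib stands in for a dedicated DNA compressor.
    """
    raw = seq.encode("ascii")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)
```

Tracking the mean of `compression_ratio` per collection date would give a simple temporal profile; a repetitive sequence yields a lower ratio than a random one of the same length.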

Conclusions: AltaiR can identify temporal characteristics and trends in large numbers of sequences, making it ideal for scenarios involving endemic or epidemic outbreaks with vast amounts of available sequence data. Implemented in C with multithreading and methodological optimizations, AltaiR is computationally efficient, flexible, and dependency-free. It accepts any sequence in FASTA format, including amino acid sequences. The complete toolkit is freely available at https://github.com/cobilab/altair.

Citations: 0
Construction and analysis of telomere-to-telomere genomes for 2 sweet oranges: Longhuihong and Newhall (Citrus sinensis).
IF 11.8 · Zone 2, Biology · Q1 Multidisciplinary Sciences · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae084
Lin Hong, Xin-Dong Xu, Lei Yang, Min Wang, Shuang Li, Haijian Yang, Si-Ying Ye, Ling-Ling Chen, Jia-Ming Song

Background: Sweet orange (Citrus sinensis Osbeck) is a fruit crop of high nutritional value that is widely consumed around the world. However, its susceptibility to low-temperature stress limits its cultivation and production in regions prone to frost damage, severely impacting the sustainable development of the sweet orange industry. Therefore, developing cold-resistant sweet orange varieties is of great necessity. Traditional hybrid breeding methods are not feasible due to the polyembryonic phenomenon in sweet oranges, necessitating the enhancement of its germplasm through molecular breeding. High-quality reference genomes are valuable for studying crop resistance to biotic and abiotic stresses. However, the lack of genomic resources for cold-resistant sweet orange varieties has hindered the progress in developing such varieties and researching their molecular mechanisms of cold resistance.

Findings: This study integrated PacBio HiFi, ONT, Hi-C, and Illumina sequencing data to assemble telomere-to-telomere (T2T) reference genomes for the cold-resistant sweet orange mutant "Longhuihong" (Citrus sinensis [L.] Osb. cv. LHH) and its wild-type counterpart "Newhall" (C. sinensis [L.] Osb. cv. Newhall). Comprehensive evaluations based on multiple criteria revealed that both genomes exhibit high continuity, completeness, and accuracy. The genome sizes were 340.28 Mb and 346.33 Mb, with contig N50 of 39.31 Mb and 36.77 Mb, respectively. In total, 31,456 and 30,021 gene models were annotated in the respective genomes. Leveraging these assembled genomes, comparative genomics analyses were performed, elucidating the evolutionary history of the sweet orange genome. Moreover, the study identified 2,886 structural variants between the 2 genomes, with several SVs located in the upstream, downstream, or intronic regions of homologous genes known to be associated with cold resistance.
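
The contig N50 values quoted above follow the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal Python sketch of that calculation, with a hypothetical function name and toy lengths rather than the study's data:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are taken from longest to shortest until their cumulative
    length reaches half of the total; the last length taken is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy example: total length is 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

A contig N50 close to the size of a whole chromosome, as reported here, indicates that most chromosomes were assembled as single contigs.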

Conclusions: The study de novo assembled 2 T2T reference genomes of sweet orange varieties exhibiting different levels of cold tolerance. These genomes serve as valuable foundational resources for genomic research and molecular breeding aimed at enhancing cold tolerance in sweet oranges. Additionally, they expand the existing repository of reference genomes and sequencing data resources for C. sinensis. Moreover, these genomes provide a critical data foundation for comparative genomics analyses across different plant species.

Citations: 0
demuxSNP: supervised demultiplexing single-cell RNA sequencing using cell hashing and SNPs.
IF 11.8 · Zone 2, Biology · Q1 Multidisciplinary Sciences · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae090
Michael P Lynch, Yufei Wang, Shannan Ho Sui, Laurent Gatto, Aedin C Culhane

Background: Multiplexing single-cell RNA sequencing experiments reduces sequencing cost and facilitates larger-scale studies. However, factors such as cell hashing quality and class size imbalance impact demultiplexing algorithm performance, reducing cost-effectiveness.

Findings: We propose a supervised algorithm, demuxSNP, which leverages both cell hashing and genetic variation between individuals (single-nucleotide polymorphisms [SNPs]). demuxSNP addresses fundamental limitations in demultiplexing methods that use only one data modality. Some cells may be confidently demultiplexed using probabilistic hashing methods. demuxSNP uses these data to infer the genotype of singlet and doublet clusters and classifies cells assigned as negative, uncertain, or doublet using a nearest-neighbor approach adapted for missing data. We benchmarked demuxSNP against hashing, genotype-free SNP, and hybrid methods on simulated and real data from renal cell cancer. demuxSNP outperformed standalone hashing methods on a low-quality hashing data benchmark, improved overall classification accuracy, and allowed more high RNA quality cells to be recovered. Through varying simulated doublet rates, we showed that genotype-free SNP methods and hybrid methods that leverage them were impacted by class size imbalance and doublet rate. demuxSNP's supervised approach was more robust to doublet rate in experiments with class size imbalance.
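
The idea of a nearest-neighbor classifier adapted for missing data can be sketched in a few lines of Python. This is an illustration only, not the demuxSNP package's API (which is an R/Bioconductor package): the function names are hypothetical, SNP calls are assumed to be coded 1 (alternate allele seen), 0 (not seen), or None (no coverage), and the distance used here, the mismatch fraction over SNPs observed in both cells, is one plausible choice rather than the one the authors necessarily use.

```python
from collections import Counter


def masked_distance(a, b):
    """Mismatch fraction over SNPs observed (not None) in both cells."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 1.0  # assumption: no shared SNPs means maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def knn_predict(train_profiles, train_labels, query, k=3):
    """Label a query cell by majority vote among its k nearest training cells.

    Training cells play the role of cells confidently assigned by hashing.
    """
    ranked = sorted(
        zip(train_profiles, train_labels),
        key=lambda pair: masked_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: two donors, "A" and "B", with partly missing SNP calls.
train = [
    [1, 0, 1, None], [1, 0, None, 0], [1, 0, 1, 0],
    [0, 1, 0, 1], [0, 1, None, 1], [None, 1, 0, 1],
]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train, labels, [1, None, 1, 0]))  # A
```

Restricting the comparison to jointly observed SNPs avoids treating dropout as evidence of a genotype difference, which matters because single-cell SNP matrices are very sparse.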

Conclusions: demuxSNP uses hashing and SNP data to demultiplex datasets with low hashing quality where biological samples are genetically distinct. Unassigned or negative cells with high RNA quality are recovered, making more cells available for analysis. Data simulation and benchmarking pipelines as well as processed benchmarking data for 5-50% doublets are publicly available. demuxSNP is available as an R/Bioconductor package (https://doi.org/doi:10.18129/B9.bioc.demuxSNP).

Citations: 0
stMMR: accurate and robust spatial domain identification from spatially resolved transcriptomics with multimodal feature representation.
IF 11.8 · Zone 2, Biology · Q1 Multidisciplinary Sciences · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae089
Daoliang Zhang, Na Yu, Zhiyuan Yuan, Wenrui Li, Xue Sun, Qi Zou, Xiangyu Li, Zhiping Liu, Wei Zhang, Rui Gao

Background: Deciphering spatial domains using spatially resolved transcriptomics (SRT) is of great value for characterizing and understanding tissue architecture. However, the inherent heterogeneity and varying spatial resolutions present challenges in the joint analysis of multimodal SRT data.

Results: We introduce a multimodal geometric deep learning method, named stMMR, to effectively integrate gene expression, spatial location, and histological information for accurately identifying spatial domains from SRT data. stMMR uses graph convolutional networks and a self-attention module for deep embedding of features within each modality and incorporates similarity contrastive learning for integrating features across modalities.

Conclusions: Comprehensive benchmark analysis on various types of spatial data shows superior performance of stMMR in multiple analyses, including spatial domain identification, pseudo-spatiotemporal analysis, and domain-specific gene discovery. In chicken heart development, stMMR reconstructed the spatiotemporal lineage structures, indicating an accurate developmental sequence. In breast cancer and lung cancer, stMMR clearly delineated the tumor microenvironment and identified marker genes associated with diagnosis and prognosis. Overall, stMMR is capable of effectively utilizing the multimodal information of various SRT data to explore and characterize tissue architectures of homeostasis, development, and tumor.

Citations: 0
PlasGO: enhancing GO-based function prediction for plasmid-encoded proteins based on genetic structure.
IF 11.8 · Zone 2, Biology · Q1 Multidisciplinary Sciences · Pub Date: 2024-01-02 · DOI: 10.1093/gigascience/giae104
Yongxin Ji, Jiayu Shang, Jiaojiao Guan, Wei Zou, Herui Liao, Xubo Tang, Yanni Sun

Background: Plasmids, as mobile genetic elements, play a pivotal role in facilitating the transfer of traits, such as antimicrobial resistance, among the bacterial community. Annotating plasmid-encoded proteins with the widely used Gene Ontology (GO) vocabulary is a fundamental step in various tasks, including plasmid mobility classification. However, GO prediction for plasmid-encoded proteins faces 2 major challenges: the high diversity of functions and the limited availability of high-quality GO annotations.

Results: In this study, we introduce PlasGO, a tool that leverages a hierarchical architecture to predict GO terms for plasmid proteins. PlasGO utilizes a powerful protein language model to learn the local context within protein sentences and a BERT model to capture the global context within plasmid sentences. Additionally, PlasGO allows users to control the precision by incorporating a self-attention confidence weighting mechanism. We rigorously evaluated PlasGO and benchmarked it against 7 state-of-the-art tools in a series of experiments. The experimental results collectively demonstrate that PlasGO has achieved commendable performance. PlasGO significantly expanded the annotations of the plasmid-encoded protein database by assigning high-confidence GO terms to over 95% of previously unannotated proteins, showcasing impressive precision of 0.8229, 0.7941, and 0.8870 for the 3 GO categories, respectively, as measured on the novel protein test set.
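
The trade-off behind "controlling the precision" with a confidence score can be shown with a small Python sketch. It does not reproduce PlasGO's self-attention confidence weighting; it only assumes that each predicted GO term comes with some confidence value and a known correctness label on a test set, and the function name and toy numbers are hypothetical.

```python
def precision_at_threshold(predictions, threshold):
    """Precision and coverage of predictions kept at a confidence cutoff.

    predictions: list of (confidence, is_correct) pairs.
    Returns (precision, coverage); precision is None if nothing is kept.
    """
    kept = [correct for confidence, correct in predictions if confidence >= threshold]
    if not kept:
        return None, 0.0
    precision = sum(kept) / len(kept)
    coverage = len(kept) / len(predictions)
    return precision, coverage


# Toy predictions: raising the cutoff trades coverage for precision.
toy = [(0.95, True), (0.9, True), (0.8, False), (0.6, True), (0.3, False)]
print(precision_at_threshold(toy, 0.5))   # (0.75, 0.8)
print(precision_at_threshold(toy, 0.85))  # (1.0, 0.4)
```

Reporting precision alongside the fraction of proteins that still receive an annotation, as the abstract does, shows both sides of this trade-off.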

Conclusions: PlasGO, a hierarchical tool incorporating protein language models and BERT, significantly expanded plasmid protein annotations by predicting high-confidence GO terms. These annotations have been compiled into a database, which will serve as a valuable contribution to downstream plasmid analysis and research.

引用次数: 0
Genomic decoding of Theobroma grandiflorum (cupuassu) at chromosomal scale: evolutionary insights for horticultural innovation.
IF 3.5 Zone 2 Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae027
Rafael Moysés Alves, Vinicius A C de Abreu, Rafaely Pantoja Oliveira, João Victor Dos Anjos Almeida, Mauro de Medeiros de Oliveira, Saura R Silva, Alexandre R Paschoal, Sintia S de Almeida, Pedro A F de Souza, Jesus A Ferro, Vitor F O Miranda, Antonio Figueira, Douglas S Domingues, Alessandro M Varani

Background: Theobroma grandiflorum (Malvaceae), known as cupuassu, is a tree indigenous to the Amazon basin, valued for its large fruits and seed pulp, contributing notably to the Amazonian bioeconomy. The seed pulp is utilized in desserts and beverages, and its seed butter is used in cosmetics. Here, we present the sequenced telomere-to-telomere genome of cupuassu, disclosing its genomic structure, evolutionary features, and phylogenetic relationships within the Malvaceae family.

Findings: The cupuassu genome spans 423 Mb, encodes 31,381 genes distributed across 10 chromosomes, and exhibits approximately 65% gene synteny with the Theobroma cacao genome, reflecting a conserved evolutionary history, albeit punctuated with unique genomic variations. The main changes are marked by bursts of long-terminal repeat retrotransposons at postspecies divergence, retrocopied and singleton genes, and gene families displaying distinctive patterns of expansion and contraction. Furthermore, positively selected genes are evident, particularly among retained and dispersed tandem and proximal duplicated genes associated with general fruit and seed traits and defense mechanisms, supporting the hypothesis of potential episodes of subfunctionalization and neofunctionalization following duplication, as well as the impact of distinct domestication processes. These genomic variations may underpin the differences observed in fruit and seed morphology, ripening, and disease resistance between cupuassu and the other Malvaceae species.

Conclusions: The cupuassu genome offers a foundational resource for both breeding improvement and conservation biology, yielding insights into the evolution and diversity within the genus Theobroma.

Citations: 0
CAT: a computational anatomy toolbox for the analysis of structural MRI data.
IF 3.5 Zone 2 Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae049
Christian Gaser, Robert Dahnke, Paul M Thompson, Florian Kurth, Eileen Luders, The Alzheimer's Disease Neuroimaging Initiative

A large range of sophisticated brain image analysis tools have been developed by the neuroscience community, greatly advancing the field of human brain mapping. Here we introduce the Computational Anatomy Toolbox (CAT), a powerful suite of tools for brain morphometric analyses with an intuitive graphical user interface but also usable as a shell script. CAT is suitable for beginners, casual users, experts, and developers alike, providing a comprehensive set of analysis options, workflows, and integrated pipelines. The available analysis streams, illustrated on an example dataset, allow for voxel-based, surface-based, and region-based morphometric analyses. Notably, CAT incorporates multiple quality control options and covers the entire analysis workflow, including the preprocessing of cross-sectional and longitudinal data, statistical analysis, and the visualization of results. The overarching aim of this article is to provide a complete description and evaluation of CAT while offering a citable standard for the neuroscience community.

Citations: 0
Current status of global conservation and characterisation of wild and cultivated Brassicaceae genetic resources.
IF 3.5 Zone 2 Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giae050
Elena Castillo-Lorenzo, Elinor Breman, Pablo Gómez Barreiro, Juan Viruel

Background: The economic importance of the globally distributed Brassicaceae family resides in the large diversity of crops within the family and the substantial variety of agronomic and functional traits they possess. We reviewed the current classifications of crop wild relatives (CWRs) in the Brassicaceae family with the aim of identifying new potential cross-compatible species from a total of 1,242 species using phylogenetic approaches.

Results: In general, cross-compatibility data between wild species and crops, as well as phenotype and genotype characterisation data, were available for major crops but very limited for minor crops, restricting the identification of new potential CWRs. Around 70% of wild Brassicaceae did not have genetic sequence data available in public repositories, and only 40% had chromosome counts published. Using phylogenetic distances, we propose 103 new potential CWRs for this family, which we recommend as priorities for cross-compatibility tests with crops and for phenotypic characterisation, including 71 newly identified CWRs for 10 minor crops. From the total species used in this study, more than half had no records of being in ex situ conservation, and 80% were not assessed for their conservation status or were data deficient (IUCN Red List Assessments).

Conclusions: Great efforts are needed on ex situ conservation to have accessible material for characterising and evaluating the species for future breeding programmes. We identified the Mediterranean region as one key conservation area for wild Brassicaceae species, with great numbers of endemic and threatened species. Conservation assessments are urgently needed to evaluate most of these wild Brassicaceae.

Citations: 0
Evaluation of Swin Transformer and knowledge transfer for denoising of super-resolution structured illumination microscopy data.
IF 3.5 Zone 2 Biology Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2024-01-02 DOI: 10.1093/gigascience/giad109
Zafran Hussain Shah, Marcel Müller, Wolfgang Hübner, Tung-Cheng Wang, Daniel Telman, Thomas Huser, Wolfram Schenck

Background: Convolutional neural network (CNN)-based methods have shown excellent performance in denoising and reconstruction of super-resolved structured illumination microscopy (SR-SIM) data. Therefore, CNN-based architectures have been the focus of existing studies. However, Swin Transformer, an alternative and recently proposed deep learning-based image restoration architecture, has not been fully investigated for denoising SR-SIM images. Furthermore, it has not been fully explored how well transfer learning strategies work for denoising SR-SIM images with different noise characteristics and recorded cell structures for these different types of deep learning-based methods. Currently, the scarcity of publicly available SR-SIM datasets limits the exploration of the performance and generalization capabilities of deep learning methods.

Results: In this work, we present SwinT-fairSIM, a novel method based on the Swin Transformer for restoring SR-SIM images with a low signal-to-noise ratio. The experimental results show that SwinT-fairSIM outperforms previous CNN-based denoising methods. Furthermore, as a second contribution, two types of transfer learning, namely direct transfer and fine-tuning, were benchmarked in combination with SwinT-fairSIM and CNN-based methods for denoising SR-SIM data. Direct transfer did not prove to be a viable strategy, but fine-tuning produced results comparable to conventional training from scratch while saving computational time and potentially reducing the amount of training data required. As a third contribution, we publish four datasets of raw SIM images and already reconstructed SR-SIM images. These datasets cover two different types of cell structures, tubulin filaments and vesicle structures. Different noise levels are available for the tubulin filaments.

Conclusion: The SwinT-fairSIM method is well suited for denoising SR-SIM images. By fine-tuning, already trained models can be easily adapted to different noise characteristics and cell structures. Furthermore, the provided datasets are structured in a way that the research community can readily use them for research on denoising, super-resolution, and transfer learning strategies.

Citations: 0