
GigaScience: Latest Articles

SMIntegration: A Web Tool for Comprehensive Spatial Metabolomics and Transcriptomics Integrated Analysis and Visualization.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-24 | DOI: 10.1093/gigascience/giag033
Haoke Deng, Xiaolian Ning, Xun Lin, Liang Zong, Shanqiao Zheng, Yun Zhao, Jing Wang, Lingyun Chen, Jin Zi, Zhanlong Mei

Current tools for spatial omics analysis often face challenges in performing integrated transcriptomics and metabolomics analysis, in-depth biological interpretation, and user-friendly operation. To address this, we developed SMIntegration, the first web-based graphical platform designed specifically for integrated spatial metabolomics and transcriptomics analysis. Built with R/Shiny and deployed using Docker containerization, the platform provides a complete integration workflow, starting from pre-processed spatial features through to functional annotation. Its core functions include: (1) automated and interactive spatial registration; (2) cross-modal spatial pattern recognition; (3) flexible differential analysis of genes and mass features based on clustering results, user-defined regions, or cell type annotations; and (4) group-specific gene-metabolite network construction and interactive visualization. Using adjacent mouse brain coronal sections (Stereo-seq transcriptomics and AFADESI-MS metabolomics) as an example, SMIntegration successfully identified both the periaqueductal gray and subcommissural organ, which were missed by single-modality clustering. Cell type analysis revealed an association between astrocyte-enriched GABA metabolism and Slc6a11, while a comparison between the cornu ammonis region and the midbrain periaqueductal gray dissected glutamatergic and endogenous cannabinoid signaling pathway modules. With a zero-code interface, SMIntegration enables a wide range of researchers to deeply explore gene-metabolite interaction mechanisms within microenvironments during development, homeostasis, and disease.
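The abstract does not say how SMIntegration scores gene-metabolite associations when it builds its networks. As a minimal sketch, assuming both modalities have already been registered onto the same set of spots (rows aligned), one common approach is to compute Spearman correlations between every gene and every mass feature and keep pairs above a threshold as network edges. The function names, the 0.6 threshold and the toy data below are hypothetical illustrations, not the tool's API.

```python
import numpy as np


def average_ranks(values: np.ndarray) -> np.ndarray:
    """Rank a 1-D array (0-based), giving tied values the mean of their ranks."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0
        i = j + 1
    return ranks


def spearman_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Spearman rho between every column of a (spots x genes) and b (spots x features)."""
    ra = np.apply_along_axis(average_ranks, 0, a.astype(float))
    rb = np.apply_along_axis(average_ranks, 0, b.astype(float))
    ra -= ra.mean(axis=0)
    rb -= rb.mean(axis=0)
    denom = np.outer(np.sqrt((ra ** 2).sum(axis=0)), np.sqrt((rb ** 2).sum(axis=0)))
    with np.errstate(invalid="ignore", divide="ignore"):
        rho = (ra.T @ rb) / denom
    return np.nan_to_num(rho)  # constant features (zero rank variance) get rho = 0


def gene_metabolite_edges(genes, metabolites, gene_names, met_names, threshold=0.6):
    """Hypothetical helper: list (gene, feature, rho) pairs with |rho| >= threshold."""
    rho = spearman_matrix(np.asarray(genes), np.asarray(metabolites))
    rows, cols = np.nonzero(np.abs(rho) >= threshold)
    return [(gene_names[i], met_names[j], float(rho[i, j])) for i, j in zip(rows, cols)]


# Toy example: 5 registered spots, 3 genes, 1 mass feature
genes = np.array([[1, 5, 2], [2, 4, 2], [3, 3, 2], [4, 2, 2], [5, 1, 2]])
metabolites = np.array([[10], [20], [30], [40], [50]])
edges = gene_metabolite_edges(genes, metabolites, ["GeneA", "GeneB", "GeneC"], ["m/z_1"])
```

Here GeneA rises with the feature, GeneB falls with it and the constant GeneC is dropped, so only two edges survive. Group-specific networks would repeat this on the spots belonging to each cluster, region or cell type.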

Citations: 0
Sex Chromosome Turnover and Structural Genome Divergence Shapes Meiotic Outcomes in Hybridising Cobitis.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-24 | DOI: 10.1093/gigascience/giag031
Stephen A Schlebusch, Vladimir Trifonov, Zuzana Halenková, Marharyta Klianitskaya, Dmitrij Dedukh, Aurora Ruiz-Herrera, Lucia Álvarez-González, Gala Pujol, Eva Hřibová, Lucija Andjel, Oldřich Bartoš, Petr Pajer, Tomáš Tichopád, Daniel Kulik, Jan Kotusz, Marie Kaštánková Doležálková, Astrid Böhne, Anatolie Marta, Patrik Horna, Radka Reifová, Yann Guiguen, Heiner Kuhl, Jan Pačes, Karel Janko

Background: Hybridisation between divergent species can result in meiotic aberrations and the emergence of asexual reproduction. Yet, it remains poorly understood to what extent such outcomes arise from genome-wide incompatibilities versus more specific conflicts among individual chromosomes inherited from parental species, including their ability to pair during meiosis in hybrids. It is also unclear how interspecific hybrids cope with differences in sex determination systems, particularly in the context of increased ploidy. Addressing these questions requires high-quality, chromosome-level reference genomes of the parental species involved in hybrid formation.

Findings: Here, we present the first chromosome-level genome assemblies for three hybridising Cobitis species (C. elongatoides, C. taenia, and C. tanaitica), providing a comprehensive framework for investigating the genomic and cytogenetic basis of hybrid sterility and the transition to asexuality. By integrating genome scaffolding, male/female pooled sequencing (Pool-Seq), and molecular cytogenetics, we uncovered extensive structural variation among homologous chromosomes of the three species, despite overall karyotype conservation. Population-level analyses revealed that each species possesses distinct, non-homologous sex chromosomes, highlighting rapid sex chromosome turnover in this recently diverged lineage. Finally, we designed chromosome-specific painting probes and applied them to meiotic metaphase I spreads of diploid hybrids, revealing striking differences in the pairing success of orthologous chromosomes.

Conclusions: Our results demonstrate that individual orthologous chromosomes differ markedly in their ability to form bivalents during meiosis in hybrids, indicating that hybrid meiotic behaviour is shaped by chromosome-specific incompatibilities rather than uniform genome-wide failure. We also found that even closely related parental species possess distinct, non-homologous sex chromosomes, highlighting rapid turnover of sex determination systems in hybridising lineages. Together, these findings provide a high-resolution genomic and cytogenetic framework to explore how the architecture of inherited parental genomes influences sex-specific reproductive outcomes in hybrids (ranging from male sterility to the establishment of fertile, clonally reproducing female lineages) and how such asymmetries may contribute to the emergence of asexuality in vertebrates.

Citations: 0
MADRe: Strain-level Metagenomic Classification Through Assembly-Driven Database Reduction.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-23 | DOI: 10.1093/gigascience/giag030
Josipa Lipovac, Mile Šikić, Riccardo Vicedomini, Krešimir Križanović

Strain-level metagenomic classification is essential for understanding microbial diversity and functional potential, yet remains challenging, particularly when sample composition is unknown and reference databases are large and redundant. Here we present MADRe, a modular and scalable pipeline for long-read strain-level metagenomic classification based on Metagenome Assembly-Driven Database Reduction. Beyond system-level integration, MADRe introduces statistical strategies that leverage assembly-derived genomic context to guide database reduction and probabilistic read reassignment. Specifically, it combines long-read metagenome assembly, contig-to-reference reassignment using an expectation-maximization framework for reference reduction, and probabilistic read mapping reassignment on a reduced database to achieve sensitive and precise strain-level classification. We extensively evaluated MADRe on simulated datasets, mock communities, and a real anaerobic digester sludge metagenome. Across diverse similarity and coverage conditions, MADRe consistently improves precision by reducing false-positive strain detections. MADRe's design allows users to apply either the database reduction step or the read classification step individually. Using only the read classification step yields results on par with the other tools tested. MADRe is open source and publicly available at https://github.com/lbcb-sci/MADRe.
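The abstract names an expectation-maximization framework but does not specify it. The sketch below shows only the generic EM idea that such reassignment steps typically build on: each item (a read or a contig) has a likelihood of originating from each candidate reference; the E-step splits items across references in proportion to the current abundance estimates, and the M-step re-estimates abundances from those fractional assignments. References whose abundance collapses toward zero become candidates for removal from a reduced database. The likelihood matrix, the `min_abundance` cutoff and all function names are assumptions for illustration; this is not MADRe's implementation.

```python
import numpy as np


def em_abundance(likelihood: np.ndarray, max_iter: int = 1000, tol: float = 1e-10):
    """Estimate reference abundances from an items x references likelihood matrix.

    likelihood[i, j] is P(item i | reference j), 0 where item i does not map to j.
    Every row must contain at least one non-zero entry.
    """
    n_refs = likelihood.shape[1]
    theta = np.full(n_refs, 1.0 / n_refs)  # start from uniform abundances
    for _ in range(max_iter):
        # E-step: responsibility of reference j for item i, proportional to theta_j * L_ij
        weighted = likelihood * theta
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new abundance = average responsibility over all items
        new_theta = resp.mean(axis=0)
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break
    return theta, resp


def reduce_references(theta: np.ndarray, min_abundance: float = 1e-3) -> np.ndarray:
    """Indices of references kept in a hypothetical reduced database."""
    return np.nonzero(theta >= min_abundance)[0]


# Toy data: 6 items unique to ref 0, 2 unique to ref 1, 2 ambiguous between refs 0 and 2
L = np.array([[1, 0, 0]] * 6 + [[0, 1, 0]] * 2 + [[1, 0, 1]] * 2, dtype=float)
theta, resp = em_abundance(L)
kept = reduce_references(theta)
final_labels = resp.argmax(axis=1)  # hard assignment after EM
```

Reference 2 is supported only by ambiguous items that reference 0 explains equally well, so EM drives its abundance toward zero and it drops out of `kept`. This is the kind of false-positive strain call a reduce-then-reassign design aims to suppress.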

Citations: 0
Characterising a species-rich and understudied tropical insect fauna using DNA barcoding.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-17 | DOI: 10.1093/gigascience/giag028
David R Hemprich-Bennett, Ezekiel Donkor, Bernard Adams, Naana Afua Acquaah, Eva D Ofori, Samuel Anie-Amoah, Abigail Bailey, H Charles J Godfray, Owen T Lewis, Fred Aboagye-Antwi, Talya D Hackett

Background: West Africa has high biodiversity that is relatively understudied, especially for insects. Studies of West African arthropod diversity can therefore help address important questions regarding conservation, ecosystem services, and insecticide use and other species-control interventions in agriculture and disease management. We intensively sampled arthropods in Ghana using complementary trapping methods, generated DNA barcodes, and classified sequences by Barcode Index Numbers (BINs, a species proxy). Using this dataset, we investigate assemblage composition, temporal activity patterns, and the state of regional biodiversity sampling.

Results: Sequencing DNA from 95,996 individuals captured using Malaise, yellow pan, pitfall, Heath and Centre for Disease Control (CDC) traps, we identified 10,120 unique BINs. The rate of species accumulation did not approach an asymptote for any taxonomic group or trap type, indicating high biodiversity. The different trap types sampled different subsets of the local community, with greatest similarity between yellow pan and pitfall traps. More insects and species (BINs) were trapped during the day than at night. Our dataset shared more BINs in the Barcode of Life Database with South Africa than with any other country, although this predominantly reflects the limited sampling and DNA sequencing campaigns in Africa.

Conclusions: This study more than doubles the published BINs for West Africa, offering insights into the biodiversity of an ecologically important but understudied taxon and region. Using multiple trap types allowed a more complete assessment of the local arthropod assemblage. The public release of these data will support and stimulate further taxonomic and ecological work in the region.
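The asymptote argument can be made concrete with an individual-based species accumulation curve. Shuffle the sequenced individuals, count the distinct BINs after each additional individual, and average over permutations. If the curve is still climbing at the full sample size, sampling has not saturated the local fauna. This is a generic sketch with hypothetical function names and toy BIN labels; the abstract does not give the study's own rarefaction method or parameters.

```python
import random


def accumulation_curve(bins, n_perm=200, seed=1):
    """Mean number of distinct BINs after 1..n individuals, averaged over random orders.

    bins: one BIN label per sequenced individual.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(bins)
    for _ in range(n_perm):
        order = list(bins)
        rng.shuffle(order)
        seen = set()
        for k, label in enumerate(order):
            seen.add(label)
            totals[k] += len(seen)
    return [t / n_perm for t in totals]


def final_slope(curve, window=10):
    """New BINs gained per extra individual over the last `window` steps (needs len >= 2)."""
    window = min(window, len(curve) - 1)
    return (curve[-1] - curve[-1 - window]) / window


# Toy data: many singleton BINs keep the curve rising; two common BINs saturate quickly
rich = [f"BIN{i}" for i in range(50)] + ["BIN0"] * 10
poor = ["BIN_A"] * 30 + ["BIN_B"] * 30
curve_rich = accumulation_curve(rich)
curve_poor = accumulation_curve(poor)
```

A positive `final_slope` corresponds to the "no asymptote" situation reported for every taxonomic group and trap type. A slope of zero, as for the two-BIN toy set, indicates the sample has captured all the BINs present.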

Citations: 0
Multiplatform comparisons and annotation of structural variants highlight the utility of the T2T reference genome in human diagnostics.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-09 | DOI: 10.1093/gigascience/giag027
Jakub Savara, Tomas Novosad, Petr Gajdos, Anna Petrackova, Marek Behalek, Jirina Manakova, Filip Ctvrtlik, Jiri Minarik, Tomas Papajik, Eva Kriegova

Background: Structural variants (SVs) are increasingly recognised as key contributors to human diseases. However, our understanding of SVs in health and disease is limited, mainly due to their structural complexity and variable length in individuals as well as limitations inherent to the available genomic technologies and reference genome used.

Results: To systematically evaluate SVs across human whole-genome samples using the hg38/GRCh38 and gapless T2T-CHM13 references, we introduced an innovative multiplatform approach, LongReadChecker (LoReC), which advances SV comparison and annotation based on distance variance, intersection, gene overlap and the closest SV in the clinical database. We compared SV detection performance on public and in-house whole-genome datasets from short-read sequencing (SRS), available long-read sequencing (LRS) platforms and optical genome mapping (OGM). Most SVs detected by SRS were confirmed by LRS, but LRS identified twice as many SVs (25,000 SVs/genome) with greater read mapping accuracy. Our LoReC analysis further highlights the utility of the T2T-CHM13 reference in SV detection: compared with hg38/GRCh38, 20% more deletions and 20% fewer insertions were detected, an effect that was particularly evident in long-read datasets. Because 80% of the SVs detected by LRS/SRS are smaller than 0.5 kbp, they were not detected by OGM.

Conclusions: Our study revealed that introducing distance variance, intersection, gene overlap and the closest SV in the clinical database may help compare and annotate SVs in diagnostics. Our data showed that LRS together with T2T-CHM13 gapless sequences can improve the diagnostics of patients with human diseases when SRS fails to identify the cause.
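LoReC's matching rules are described only at a high level (distance variance, intersection, gene overlap, closest clinical SV). As a hedged illustration of the distance and intersection parts only, the sketch below treats two calls as the same SV when they share chromosome and type, their breakpoints lie within `max_dist` bp, and, for SVs with a span, their reciprocal overlap reaches `min_ro`. The thresholds, the `SV` record and all function names are assumptions, not LoReC's actual parameters or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record; coordinates are 0-based, end-exclusive."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV", "INS"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap length divided by the longer of the two spans (0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def same_sv(a: SV, b: SV, max_dist: int = 500, min_ro: float = 0.5) -> bool:
    """Assumed matching rule: same chrom and type, breakpoints within max_dist,
    and (for spanning SVs) reciprocal overlap >= min_ro."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    breakpoint_dist = max(abs(a.start - b.start), abs(a.end - b.end))
    if breakpoint_dist > max_dist:
        return False
    if a.svtype == "INS":  # insertions are point events: position check only
        return True
    return reciprocal_overlap(a, b) >= min_ro


def confirmed_fraction(calls_a, calls_b, **kwargs) -> float:
    """Fraction of calls in calls_a matched by at least one call in calls_b."""
    if not calls_a:
        return 0.0
    hits = sum(any(same_sv(a, b, **kwargs) for b in calls_b) for a in calls_a)
    return hits / len(calls_a)


# Toy calls: a short-read set checked against a long-read set
srs = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 1000, 1200, "DEL"), SV("chr2", 500, 501, "INS")]
lrs = [SV("chr1", 1100, 2050, "DEL"), SV("chr2", 800, 801, "INS")]
frac = confirmed_fraction(srs, lrs)
```

A figure such as "most SRS calls were confirmed by LRS" is this kind of `confirmed_fraction`. Its value depends strongly on `max_dist` and `min_ro`, which is why a tool that reports distance and intersection explicitly, rather than a single yes/no match, is useful when comparing platforms.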

Citations: 0
Patterns of aDNA Damage Through Time and Environments - lessons from herbarium specimens.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-05 | DOI: 10.1093/gigascience/giag026
Stefano Porrelli, Alice Fornasiero, Hong Phuong Le, Wenzhe Yin, Maria Navarrete Rodriguez, Nahed Mohammed, Axel Himmelbach, Andrew C Clarke, Nils Stein, Paul J Kersey, Rod A Wing, Rafal M Gutaker

Herbarium collections are a vast but underutilized resource for ancient DNA research, containing over 400 million specimens with detailed metadata and spanning centuries of global biodiversity. Understanding patterns of DNA preservation in natural collections is crucial for optimizing ancient DNA studies and informing future curation practices. We analysed genomic data for 573 herbarium specimens from six plant species from the genera Hordeum and Oryza collected from the Americas and Eurasia over 220 years. Using standardized laboratory protocols and shotgun sequencing, we quantified DNA degradation and elucidated factors that accelerate it. We find significant age-dependent DNA fragmentation rates, indicating temporal degradation processes not detected in prehistoric samples. In our analysis, DNA decay rates in herbarium specimens were almost eight times faster than in moa bones, reflecting fundamental differences in tissue composition and preservation environments. Environmental conditions at the time of specimen collection emerged as the major determinants of post-mortem damage rates, with the interaction term between temperature and genus being the dominant driver of cytosine deamination. We find no effect of sample storage on DNA damage and degradation. These findings provide insights into how climatic origin, preservation environment, taxonomic identity and age influence DNA preservation while highlighting opportunities for improving institutional preservation practices. Due to standardised preservation conditions, museum collections can provide better insights into DNA damage and degradation over time than archaeological and paleontological samples.
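The abstract compares DNA decay rates without stating the model behind them. A common way to express such rates is to estimate a per-bond damage fraction λ from each sample's fragment length distribution and then regress λ on sample age, so that the slope is a decay rate per site per year. The sketch below assumes fragment lengths above a minimum cutoff follow a shifted geometric distribution. The 35 bp cutoff, the function names and the toy values are illustrative assumptions, not the study's pipeline.

```python
import numpy as np


def damage_fraction(lengths, min_len=35):
    """MLE of the per-bond breakage probability lambda.

    Assumes lengths >= min_len follow a geometric distribution shifted by min_len,
    P(L = min_len + x) = (1 - lambda)**x * lambda, whose MLE is 1 / (1 + mean(x)).
    """
    x = np.asarray(lengths, dtype=float)
    x = x[x >= min_len] - min_len
    if x.size == 0:
        raise ValueError("no fragments at or above min_len")
    return 1.0 / (1.0 + x.mean())


def decay_rate(lambdas, ages_years):
    """OLS slope (lambda gained per year) and intercept of lambda against age."""
    slope, intercept = np.polyfit(np.asarray(ages_years, float), np.asarray(lambdas, float), deg=1)
    return slope, intercept


# Toy specimens: fragments get shorter with age
lam_recent = damage_fraction([44] * 100)  # mean excess 9 bp -> lambda = 0.1
lam_old = damage_fraction([39] * 100)     # mean excess 4 bp -> lambda = 0.2
k, b = decay_rate([lam_recent, lam_old], [50, 200])
```

Under this model, a statement like "decay was almost eight times faster than in moa bones" amounts to a ratio of fitted slopes. That comparison is only meaningful if both slopes were estimated with the same length model and cutoff.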

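The herbarium abstract above reports age-dependent fragmentation and a decay rate that can be compared with moa bones. That kind of comparison is usually framed with a simple exponential fragmentation model: each backbone bond has broken with probability λ, so fragment lengths are roughly geometric with mean 1/λ, and λ grows linearly with sample age at a per-site decay rate k. The sketch below is a simplified illustration under those assumptions. It ignores the truncation of short fragments introduced by library preparation and sequencing. The function names and the synthetic numbers are hypothetical and do not represent the authors' estimator or data.

```python
from statistics import mean

def damage_fraction(fragment_lengths):
    """Estimate lambda (fraction of broken bonds) as 1 / mean fragment length in bp.

    Assumes an untruncated geometric model, P(L = l) = (1 - lam)**(l - 1) * lam,
    whose mean is 1 / lam. Real libraries lose short fragments, which this ignores.
    """
    return 1.0 / mean(fragment_lengths)

def decay_rate(ages_years, lambdas):
    """Least-squares slope of lambda against sample age: the per-site decay rate k."""
    mx = mean(ages_years)
    my = mean(lambdas)
    sxx = sum((x - mx) ** 2 for x in ages_years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(ages_years, lambdas))
    return sxy / sxx

# Hypothetical sample whose reads have a mean length of 50 bp.
print(damage_fraction([40, 60]))

# Hypothetical synthetic series: lambda rises by 1e-5 per year from a 0.01 baseline.
ages = [50, 100, 150, 200]
lams = [0.01 + 1e-5 * a for a in ages]
print(f"k = {decay_rate(ages, lams):.2e} per site per year")
```

Comparing two collections then comes down to the ratio of their fitted k values. In practice, aDNA studies often estimate λ from the declining tail of the length distribution instead of the raw mean, which reduces the bias from lost short fragments. The mean-based estimator is used here only because it is brief.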
Citations: 0
ZNF331 Modulates Early Embryonic Transcription During Zygotic Genome Activation in Goat.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-05 | DOI: 10.1093/gigascience/giag025
Yingnan Yang, Jinhao Zhang, Xiaowei Chen, Haonan Chen, Dongxu Li, Yongjie Wan, Mingtian Deng, Feng Wang

Background: Zygotic genome activation (ZGA) is a pivotal process during early embryogenesis, marking the maternal-to-zygotic transition. ZGA is regulated by a variety of epigenetic and transcriptional factors. However, the transcriptional regulatory mechanisms underlying ZGA in livestock species remain largely unclear.

Results: By integrating ATAC-seq and RNA-seq, we characterized chromatin accessibility and transcriptional dynamics in goat embryos. Transcriptional inhibition with α-amanitin markedly reduced promoter accessibility and disrupted RNA polymerase II (Pol II)-mediated transcription. Motif enrichment analysis identified ZNF331 as a potential regulator with specific upregulation at the 8-cell stage. Functional knockdown of ZNF331 resulted in impaired embryonic development, reduced blastocyst formation, and widespread transcriptome alterations. Mechanistically, ZNF331 depletion caused abnormal elevation of Pol II Ser5 phosphorylation, excessive transcriptional activity, maternal mRNA retention, and excessive activation of zygotic genes.

Conclusion: Our study identifies ZNF331 as a critical regulator of goat ZGA, functioning through fine-tuning Pol II Ser5 phosphorylation to balance maternal transcript clearance and zygotic gene activation. These findings highlight the essential role of the ZNF331-Pol II axis in goat embryogenesis and suggest a potentially conserved mechanism across mammals.
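Motif enrichment of the kind mentioned in the Results is often scored with a one-sided hypergeometric test. The question is: given N background regions of which K carry a motif, how surprising is it that k of n foreground regions carry it? Foreground regions might be, for example, promoters whose accessibility changes at the 8-cell stage. The sketch below assumes that framing. The abstract does not say which test or tool the authors used, and the counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background regions, K: background regions carrying the motif,
    n: foreground regions, k: foreground regions carrying the motif.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def fold_enrichment(k, n, K, N):
    """Motif frequency in the foreground relative to its frequency in the background."""
    return (k / n) / (K / N)

# Hypothetical counts: 10 background promoters, 4 with the motif;
# all 3 foreground promoters carry it.
p = hypergeom_enrichment_p(k=3, n=3, K=4, N=10)
print(p, fold_enrichment(3, 3, 4, 10))
```

Exact integer binomials keep the tail probability exact for small inputs. For genome-scale counts, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same upper tail in floating point. When many motifs are tested, their p-values would still need multiple-testing correction.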

Citations: 0
Interactive analysis of single-cell trajectories in 3D space with Cell Journey.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-03 | DOI: 10.1093/gigascience/giag021
Damian Panas, Marcin Tabaka

The integration of high-throughput single-cell profiling technologies with RNA velocity analysis has enabled the reconstruction of dynamic cellular differentiation trajectories at unprecedented resolution. Despite these advances, current visualization techniques for RNA velocity are predominantly confined to two-dimensional representations, typically employing arrows or streamlines. While effective for depicting simple cellular trajectories, these approaches are insufficient for capturing the complex topologies of multipartite cellular transitions. This limitation highlights the need for three-dimensional visualization tools that can more accurately convey the structure and dynamics of velocity-inferred transitions in single-cell data. Here, we present Cell Journey, an interactive visualization platform developed specifically for three-dimensional analysis and representation of RNA velocity trajectories derived from single-cell datasets. The platform features an intuitive graphical interface supporting both unimodal and multimodal data, accepts multiple input formats, and provides extensive customization options for trajectory visualization. Cell Journey computes RNA velocity vector fields on a user-defined three-dimensional grid and constructs velocity trajectories using either Euler integration or the fourth-order Runge-Kutta method. The platform supports interactive exploration of cellular dynamics through visual elements such as streamlines, streamlets, cones, and volumetric plots. Furthermore, it allows users to investigate changes in feature activity along selected paths, facilitating deeper insights into cellular state transitions within complex multimodal single-cell datasets.
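Cell Journey is described as evaluating velocity on a 3D grid and tracing trajectories with either Euler or fourth-order Runge-Kutta (RK4) integration. The sketch below compares the two steppers on a velocity field built by Gaussian-kernel averaging of per-cell velocity vectors. The kernel smoothing, the bandwidth parameter, and all function names are assumptions made for illustration and are not Cell Journey's code. For brevity, the field is evaluated directly at query points rather than interpolated from a precomputed grid.

```python
import math

def kernel_velocity(cells, velocities, bandwidth):
    """Return f(p): Gaussian-weighted mean of cell velocities around point p (assumed smoother)."""
    def field(p):
        wsum, acc = 0.0, [0.0, 0.0, 0.0]
        for c, v in zip(cells, velocities):
            d2 = sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            w = math.exp(-d2 / (2 * bandwidth ** 2))
            wsum += w
            for i in range(3):
                acc[i] += w * v[i]
        # Far from every cell all weights underflow to 0; treat that as zero velocity.
        return tuple(a / wsum for a in acc) if wsum > 0 else (0.0, 0.0, 0.0)
    return field

def add(p, v, h):
    """Return p + h * v for 3D tuples."""
    return tuple(pi + h * vi for pi, vi in zip(p, v))

def euler_step(f, p, h):
    """Forward Euler: follow the velocity at the current point for one step."""
    return add(p, f(p), h)

def rk4_step(f, p, h):
    """Classical RK4: weighted average of four velocity samples across the step."""
    k1 = f(p)
    k2 = f(add(p, k1, h / 2))
    k3 = f(add(p, k2, h / 2))
    k4 = f(add(p, k3, h))
    slope = tuple((a + 2 * b + 2 * c + d) / 6 for a, b, c, d in zip(k1, k2, k3, k4))
    return add(p, slope, h)

def trace(f, start, h, n_steps, step=rk4_step):
    """Integrate a streamline from `start` and return every visited point."""
    path = [tuple(start)]
    for _ in range(n_steps):
        path.append(step(f, path[-1], h))
    return path

# Test field v = (-y, x, 0): exact streamlines are circles of constant radius.
rotation = lambda p: (-p[1], p[0], 0.0)
for stepper in (euler_step, rk4_step):
    end = trace(rotation, (1.0, 0.0, 0.0), 0.1, 10, stepper)[-1]
    print(stepper.__name__, math.hypot(end[0], end[1]))
```

On the rotation field, forward Euler spirals outward, with the radius growing by a factor of sqrt(1 + h²) per step. RK4 stays on the unit circle to within a very small error. This difference is the usual reason to prefer RK4 for long streamlines, at the cost of four field evaluations per step instead of one.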

Citations: 0
Toward total recall: Enhancing data FAIRness through AI-driven metadata standardization.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-03 | DOI: 10.1093/gigascience/giag019
Sowmya S Sundaram, Rafael S Gonçalves, Mark A Musen

Scientific metadata often suffer from incompleteness, inconsistency, and formatting errors, which hinder effective discovery and reuse of the associated datasets. We present a method that combines Generative Pre-trained Transformer 4 (GPT-4) with structured metadata templates from the Center for Expanded Data Annotation and Retrieval (CEDAR) knowledge base to automatically standardize metadata and ensure compliance with established standards. A CEDAR template specifies the expected fields of a metadata submission and their permissible values. Our standardization process uses CEDAR templates to guide GPT-4 in correcting and refining metadata entries in bulk. This substantially improves metadata retrieval performance, especially recall: the proportion of all relevant datasets that a search actually retrieves. Using the BioSample and Gene Expression Omnibus (GEO) repositories maintained by the National Center for Biotechnology Information (NCBI), we show that datasets whose metadata were altered by GPT-4 with CEDAR templates (GPT-4+CEDAR) are retrieved substantially better than two comparison sets: datasets with their original metadata, and datasets whose metadata were altered by GPT-4 using only data-dictionary guidance (GPT-4+DD). Average recall rises from 17.65% with the baseline raw metadata to 62.87% with GPT-4+CEDAR. We further evaluate the robustness of our approach by comparing GPT-4 with other large language models, including LLaMA-3 and MedLLaMA2, and find consistent performance advantages for GPT-4+CEDAR. These results underscore the potential of combining advanced language models with symbolic models of standardized metadata structures for more effective and reliable data retrieval, thus accelerating scientific discovery and data-driven research.
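A template that lists expected fields and their permissible values can be represented as a small lookup structure. Recall is then simply retrieved relevant records divided by all relevant records. In the sketch below, a rule-based normalizer stands in for GPT-4. It applies whitespace trimming, case folding and a hypothetical synonym table. The template, field names, synonyms, and records are all invented for illustration. They are not CEDAR's template format or the authors' data.

```python
# Hypothetical template: field name -> set of permissible values.
TEMPLATE = {
    "tissue": {"liver", "brain", "blood"},
    "sex": {"male", "female"},
}

# Hypothetical synonym table; in the described method, GPT-4 plays this role.
SYNONYMS = {"hepatic tissue": "liver", "whole blood": "blood", "m": "male", "f": "female"}

def standardize(record, template=TEMPLATE, synonyms=SYNONYMS):
    """Map each templated field to a permissible value, or None when nothing matches."""
    out = {}
    for field, allowed in template.items():
        raw = record.get(field)
        value = raw.strip().lower() if isinstance(raw, str) else None
        value = synonyms.get(value, value)
        out[field] = value if value in allowed else None
    return out

def query(records, field, value):
    """Exact-match search: the kind of lookup that misses non-standard spellings."""
    return [rid for rid, rec in records.items() if rec.get(field) == value]

def recall(retrieved_ids, relevant_ids):
    """Fraction of all relevant records that the query actually retrieved."""
    relevant = set(relevant_ids)
    return len(relevant & set(retrieved_ids)) / len(relevant) if relevant else 0.0

raw = {
    "S1": {"tissue": "liver", "sex": "M"},
    "S2": {"tissue": "Liver ", "sex": "female"},
    "S3": {"tissue": "hepatic tissue", "sex": "F"},
    "S4": {"tissue": "brain", "sex": "male"},
}
relevant = {"S1", "S2", "S3"}  # records that truly describe liver samples

clean = {rid: standardize(rec) for rid, rec in raw.items()}
print(recall(query(raw, "tissue", "liver"), relevant))    # raw metadata
print(recall(query(clean, "tissue", "liver"), relevant))  # standardized metadata
```

Unmatched values are mapped to `None` instead of being guessed. This keeps the normalizer conservative and leaves unresolved entries visible for review. A language model would replace the synonym table, while the template would still constrain what counts as a valid output.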

Citations: 0
Stereo-cell deciphers the spatial and functional heterogeneity of polyploid hepatocytes.
IF 11.8 | Zone 2, Biology | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2026-03-02 | DOI: 10.1093/gigascience/giag023
Yongqing Yang, Jiahui Luo, Yier Cai, Pengcheng Guo, Qiang Guo, Hong Wu, Longqi Liu, Shijie Hao

A characteristic feature of the liver is the presence of numerous polyploid hepatocytes. However, the functional distinctions among diploid, tetraploid, and octoploid hepatocytes remain poorly understood. In this study, we employed the spatially resolved single-cell sequencing technology Stereo-cell to dissect the transcriptomic and functional heterogeneity across hepatocyte ploidy subtypes. We detail the development of Stereo-cell Imaging-based ploidy Identification (SCIPI), a technical pipeline that integrates bright-field cell contour recognition, DAPI-based quantification of nuclear area and nucleus number, and UMI-barcoded single-cell transcriptomics. This approach enables precise identification of six core hepatocyte subtypes: mononucleated diploid (2n×1), mononucleated tetraploid (4n×1), binucleated tetraploid (2n×2), mononucleated octoploid (8n×1), binucleated octoploid (4n×2), and binucleated hexadecaploid (8n×2) hepatocytes. Single-cell transcriptomic analysis based on ploidy annotation revealed that gene expression levels scale positively with increasing ploidy and nucleus number. Metabolic pathway-associated genes were significantly upregulated in polyploid cells, suggesting that polyploidization enhances the metabolic activity of hepatocytes. Furthermore, the SCIPI strategy is broadly applicable to other polyploid tissues, offering a versatile framework for ploidy-resolved studies across diverse fields of biological research.
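SCIPI combines nucleus number with a DAPI-derived estimate of per-nucleus ploidy. This can be sketched as a small classifier that emits the six subtype labels listed above, where a label such as 4n×2 means two 4n nuclei, for a whole-cell content of 8n. The area-ratio cut-offs, the use of a diploid reference nuclear area, and the simplification of averaging the nuclei of a binucleated cell are all hypothetical. The same applies to the function names. The abstract does not give the pipeline's actual thresholds, and real images would need calibration.

```python
from statistics import mean

# Hypothetical cut-offs on nuclear area relative to a diploid reference nucleus:
# ratio < 1.3 -> 2n, ratio < 2.0 -> 4n, otherwise 8n. Illustrative only.
AREA_RATIO_CUTOFFS = [(1.3, 2), (2.0, 4)]
CORE_SUBTYPES = {"2n×1", "4n×1", "2n×2", "8n×1", "4n×2", "8n×2"}

def nuclear_ploidy(area_ratio):
    """Assign 2n, 4n, or 8n to a nucleus from its area relative to a diploid nucleus."""
    for cutoff, ploidy in AREA_RATIO_CUTOFFS:
        if area_ratio < cutoff:
            return ploidy
    return 8

def classify_hepatocyte(nuclear_areas, diploid_reference_area):
    """Label a cell '<per-nucleus ploidy>n×<nucleus count>', or 'other' if not a core subtype.

    Assumes all nuclei in one cell share a ploidy, so their areas are averaged.
    """
    if not nuclear_areas:
        return "other"
    ratio = mean(nuclear_areas) / diploid_reference_area
    label = f"{nuclear_ploidy(ratio)}n×{len(nuclear_areas)}"
    return label if label in CORE_SUBTYPES else "other"

def total_ploidy(label):
    """Whole-cell DNA content in units of n, e.g. '4n×2' (binucleated octoploid) -> 8."""
    per_nucleus, count = label.split("n×")
    return int(per_nucleus) * int(count)

# Hypothetical nuclear areas (arbitrary units) with a diploid reference area of 30.
for areas in ([30.0], [48.0], [31.0, 29.0], [75.0, 72.0]):
    label = classify_hepatocyte(areas, 30.0)
    print(areas, label, total_ploidy(label))
```

Keeping the per-nucleus ploidy and the nucleus count as separate parts of the label preserves the distinction the abstract draws between, for example, 4n×1 and 2n×2 cells. These cells have the same total DNA content but different nuclear organization.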

Citations: 0