GigaScience: Latest Articles

Improved reference assembly and core collection re-sequencing to facilitate exploration of important agronomical traits for the improvement of oilseed crop, Carthamus tinctorius L.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-12-11 DOI: 10.1093/gigascience/giaf151
Megha Sharma, Varun Bhardwaj, Praveen Kumar Oraon, Shivani Choudhary, Heena Ambreen, Rohit Nandan Shukla, Harsha Rayudu Jamedar, Ajitha Vijjeswarapu, Vandana Jaiswal, Palchamy Kadirvel, Arun Jagannath, Shailendra Goel

Background: Safflower (Carthamus tinctorius L.) is a drought-resilient oilseed crop. Besides producing edible oil rich in oleic and linoleic acid, it is also used in biofuels, cosmetics, colouring dyes, pharmaceuticals and nutraceuticals. Despite its significant economic uses, the availability of genetic and genomic resources in safflower is limited.

Results: We report an improved de novo genome assembly of safflower (Safflower_A2). A chromosome-level assembly of 1.15 Gb, with telomeres and centromeric repeats, was constructed using PacBio HiFi reads, optical maps, Illumina short reads, and Hi-C sequencing. Safflower_A2 shows better contiguity, completeness, and annotation quality than previous assemblies. The assembly was further validated with a single nucleotide polymorphism (SNP)-based linkage map. A genome-wide survey identified genes for comprehensive exploration of disease resistance in safflower. With the de novo genome assembly as a reference, we used resequencing data from a global core collection of 123 accessions to carry out an SNP-based genome-wide association study, which identified significant associations, and haplotypes of agronomic value, for several traits, including seed oil content. Resequencing data were also used for a pan-genome analysis, which provided critical insights into genome diversity and identified an additional ∼11,000 genes, along with their functional enrichment, that will be useful for region-specific breeding lines.

Conclusion: Our study provides insights into the genomic architecture of safflower by leveraging an improved genome assembly and annotation. Additionally, the high-density linkage map, marker-trait associations, and pan-genome developed in this study are valuable resources for breeding and crop improvement programs run by the global research community.
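
The abstract does not say which statistic underlies the "better contiguity" claim. N50 is a common choice for comparing assemblies, so the sketch below shows how it is computed. The function name and the toy length lists are illustrative assumptions, not values from the Safflower_A2 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Two hypothetical assemblies with the same total length (150 units).
toy_fragmented = [50, 40, 30, 20, 10]
toy_contiguous = [90, 30, 20, 5, 5]
print(n50(toy_fragmented), n50(toy_contiguous))
```

Both toy assemblies have the same total size, but the second concentrates more sequence in one long piece and so has the higher N50.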

Citations: 0
An Integrative Multi-Omics Random Forest Framework for Robust Biomarker Discovery.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-12-09 DOI: 10.1093/gigascience/giaf148
Wei Zhang, Hanchen Huang, Lily Wang, Brian D Lehmann, X Steven Chen

High-throughput technologies now produce a wide array of omics data, from genomic and transcriptomic profiles to epigenomic and proteomic measurements. Integrating multiple omics layers measured on the same samples can reveal cross-layer molecular hubs that single-layer analyses miss. We present an unsupervised, multivariate random forest (MRF) framework with an inverse minimal depth (IMD) importance to prioritize shared biomarkers across omics. In each forest, one layer serves as a multivariate response and the other as predictors; IMD summarizes how early a predictor (or response MSRV) appears across trees, yielding interpretable, cross-layer feature rankings. We provide three IMD-based selection strategies and introduce an optional IMD power transform to enhance sensitivity to interaction signals. In extensive simulations spanning linear, nonlinear, and interaction regimes, our method matches SPLS/CCA under linear settings and outperforms them as nonlinearity increases, while adapted univariate ensemble learners (RF, GBM, XGBoost) underperform in the multivariate, unsupervised context. Applied to TCGA BRCA and COAD, MRF-IMD identifies genes, CpGs, and miRNAs enriched for cancer-relevant pathways and yields more robust survival stratification than linear integrators with matched model sizes. In a TCGA pan-cancer analysis, MRF-IMD features achieve higher ARI than alternatives and recover coherent tumor-type clusters; in ADNI, the integrative signature improves dementia-progression stratification over a published methylation risk score. Our scalable, interpretable MRF-IMD framework advances reliable multi-omics biomarker discovery when nonlinear, cross-layer dependencies matter.
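
To make the idea of "how early a predictor appears across trees" concrete, here is a minimal sketch of a minimal-depth score. The abstract does not give the exact IMD formula, so the scoring rule used here (the average of 1 / (1 + minimal depth), with a score of 0 in trees that never split on the feature) is an assumption. The nested-tuple tree encoding and the feature names are also hypothetical.

```python
from collections import deque


def minimal_depths(tree):
    """Map each feature to the depth of its shallowest split in one tree.

    A tree is either None (a leaf) or a tuple (feature, left, right).
    """
    depths = {}
    queue = deque([(tree, 0)])
    while queue:
        node, depth = queue.popleft()
        if node is None:
            continue
        feature, left, right = node
        # Breadth-first order visits shallow nodes first, so the first
        # depth recorded for a feature is its minimal depth.
        depths.setdefault(feature, depth)
        queue.append((left, depth + 1))
        queue.append((right, depth + 1))
    return depths


def inverse_minimal_depth(forest, features):
    """Average 1 / (1 + minimal depth) over trees (assumed scoring rule)."""
    scores = {f: 0.0 for f in features}
    for tree in forest:
        depths = minimal_depths(tree)
        for f in features:
            if f in depths:
                scores[f] += 1.0 / (1 + depths[f])
    return {f: total / len(forest) for f, total in scores.items()}


# A hypothetical three-tree forest over three predictors.
forest = [
    ("geneA", ("cpg1", None, None), ("geneB", None, None)),
    ("geneA", ("geneB", None, None), None),
    ("cpg1", ("geneA", None, None), None),
]
ranking = inverse_minimal_depth(forest, ["geneA", "geneB", "cpg1"])
for feature in sorted(ranking, key=ranking.get, reverse=True):
    print(feature, round(ranking[feature], 3))
```

Features that split near the root in many trees receive scores close to 1, which is what makes this kind of ranking easy to read across omics layers.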

Citations: 0
A sulfatide-centered ultra-high resolution magnetic resonance MALDI imaging benchmark dataset for MS1-based lipid annotation tools.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-12-09 DOI: 10.1093/gigascience/giaf150
Lars Gruber, Stefan Schmidt, Thomas Enzlein, Carsten Hopf

Spatial 'omics techniques are indispensable for studying complex biological systems and for the discovery of spatial biomarkers. While several current matrix-assisted laser desorption/ionization (MALDI) mass spectrometry imaging (MSI) instruments are capable of localizing numerous metabolites at high spatial and spectral resolution, the majority of MSI data are acquired at the MS1 level only. Assigning molecular identities based on MS1 data presents significant analytical and computational challenges, as the inherent limitations of MS1 data preclude confident annotations beyond the sum formula level. To enable future advancements of computational lipid annotation tools, well-characterized benchmark (or ground-truth) datasets that go beyond synthetic data or data derived from mimetic tissue models are crucial. To this end, we provide two sulfatide-centered, biology-driven magnetic resonance MSI (MR-MSI) datasets at different mass resolving powers that characterize lipids in a mouse model of human metachromatic leukodystrophy. These data include an ultra-high-resolution (R ∼1,230,000) quantum cascade laser mid-infrared imaging-guided MR-MSI dataset that enables isotopic fine structure analysis and therefore substantially enhances the level of confidence. To highlight the usefulness of the data, we compared 118 manual sulfatide annotations with the number of decoy database-controlled sulfatide annotations performed in Metaspace (67 at FDR < 10%). Overall, our datasets can be used to benchmark annotation algorithms, validate spatial biomarker discovery pipelines, and serve as a reference for future studies that explore sulfatide metabolism and its spatial regulation.

Citations: 0
Translating short-form Python exercises to other programming languages using diverse prompting strategies.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-12-08 DOI: 10.1093/gigascience/giaf149
Stephen R Piccolo, Harlan P Stevens

With the increasing complexity and quantity of experimental and observational data, life scientists rely on programming to automate analyses, enhance reproducibility, and facilitate collaboration. Scripting languages like Python are often favored for their simplicity and flexibility, enabling researchers to focus primarily on high-level tasks. Compiled languages such as C++ and Rust offer greater efficiency, making them preferable for intensive or repeated computations. In educational settings, instructors may wish to teach both types of languages and thus may wish to translate content from one programming language to another. In research contexts, researchers may wish to implement their ideas in one language before translating the code to another. However, translating between programming languages requires significant effort, prompting our interest in using large language models (LLMs) for semi-automated code translation. This study explores the use of an LLM (GPT-4) to translate 559 short-form programming exercises from Python into C++, Rust, Julia, and JavaScript. We used three prompting strategies (instructions only, code only, or both combined) and compared the translated code's output against the Python code's output. Translation success differed considerably by prompting strategy, and at least one of the strategies tested was effective for nearly every exercise. The highest overall success rate occurred for Rust (99.5%), followed by JavaScript (98.9%), C++ (97.9%), and Julia (95.0%). Our findings demonstrate that LLMs can effectively translate small-scale programming exercises between languages, reducing the need for manual rewriting. To support education and research, we have manually translated all exercises that were not translated successfully through automation, and we have made our translations freely available.
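
The output comparison described above could be sketched as follows. This is not the authors' test harness: the function names, the 10-second timeout, and the choice to ignore leading and trailing whitespace are all assumptions made for illustration. Both commands in the usage example are Python one-liners standing in for a reference solution and a translated program.

```python
import subprocess
import sys


def run_and_capture(command, stdin_text="", timeout=10):
    """Run a command and return its stdout, or None if it fails or times out."""
    try:
        result = subprocess.run(
            command,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def outputs_match(reference_cmd, candidate_cmd, stdin_text=""):
    """Return True when both programs succeed and print the same text."""
    expected = run_and_capture(reference_cmd, stdin_text)
    actual = run_and_capture(candidate_cmd, stdin_text)
    if expected is None or actual is None:
        return False
    # Assumption: surrounding whitespace differences are not meaningful.
    return expected.strip() == actual.strip()


# Stand-ins: a "reference" program and a "translated" program.
reference = [sys.executable, "-c", "print(2 + 3)"]
translated = [sys.executable, "-c", "print(5)"]
print(outputs_match(reference, translated))
```

In practice the candidate command would invoke the compiled or interpreted translation (for example a Rust binary or a Julia script), and a stricter comparison may be appropriate for exercises where whitespace matters.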

Citations: 0
Multi-Omics and High-Spatial-Resolution Omics: Deciphering Complexity in Neurological Disorders.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-12-05 DOI: 10.1093/gigascience/giaf137
Xiuyun Liu, Fangfang Li, Marek Czosnyka, Zofia Czosnyka, Huijie Yu, Xiaoguang Tong, Yan Xing, Hongliang Li, Ke Pu, Keke Feng, Kuo Zhang, Meijun Pang, Dong Ming

Background: The world has witnessed a steady rise in neurological diseases, which represent a heterogeneous group of disorders characterized by complex pathogenesis involving disruptions at multiple molecular levels, including genomic, transcriptomic, proteomic, and metabolomic levels. These disorders, often caused by genetic mutations, metabolic imbalances, immune dysregulation, and environmental factors, pose significant challenges to global public health due to their high prevalence, mortality, and disability burden.

Results: The advent of high-throughput technologies, such as next-generation sequencing and mass spectrometry, has provided valuable insights into the underlying mechanisms of disease. In particular, the development of multi-omics and high-spatial-resolution omics technologies enables analysis of the interactions between multiple levels of biology and of the complex molecular networks and pathophysiological processes involved.

Conclusions: This review provides a comprehensive analysis of the latest advancements in multi-omics and high-spatial-resolution omics, with a focus on their applications in precision diagnostics, biomarker discovery, and therapeutic target identification in brain diseases. It also highlights current challenges in clinical implementation and discusses future directions, with artificial intelligence anticipated to significantly enhance clinical translation and diagnostic accuracy.

Citations: 0
Endometrial Whole Slide Images Dataset for Detection of malignancy in endometrial biopsies.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-12-05 DOI: 10.1093/gigascience/giaf147
Mahnaz Mohammadi, Christina Fell, Sarah Bell, Gareth Bryson, Sheeba Syed, Prakash Konanahalli, David Harris-Birtill, Ognjen Arandjelovic, Clare Orange, Prishma Shahi, In Hwa Um, James D Blackwood, David J Harrison

Background: Whole slide imaging (WSI) enables the digitisation of entire histological slides at high resolution, allowing pathologists and researchers to analyse tissue samples digitally rather than through traditional microscopy. This technology has become increasingly valuable in pathology for research, education, and clinical diagnostics. Endometrial biopsy is very common, often being undertaken to exclude non-cancerous disease. This means that most cases do not contain cancer, and the challenge is to accurately and efficiently exclude serious pathology rather than simply make a diagnosis of malignancy. A well-curated, expert-annotated, endometrial whole slide dataset covering a spread of cancer and non-cancer diagnoses will support machine learning applications in automated diagnosis, facilitate research into the pathology of endometrial cancer, and serve as an educational resource for medical professionals.

Results: We introduce a newly constructed, large-scale dataset of endometrial biopsies, comprising 2,909 whole slide images in iSyntax format, each accompanied by a corresponding annotation file in JSON format. Each whole slide image is labelled with a primary class label representing its final diagnosis and a sub-category label providing further details within that diagnostic class. These class labels are critical for machine learning applications, as they enable the development of AI models capable of distinguishing between different types of endometrial abnormalities, improving automated classification, and guiding clinical decision-making.

Conclusions: Constructing and curating a high-quality endometrial whole slide dataset requires significant effort to ensure accurate annotations, data integrity, and patient privacy protection. However, the availability of a well-annotated dataset with detailed class labels is crucial for advancing digital pathology. Such a resource can enhance diagnostic accuracy, support personalized treatment strategies, and ultimately improve outcomes for patients with endometrial cancer and other endometrial conditions.

Citations: 0
Cervical Whole Slide Images Dataset for Multi-class Classification.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-11-29 DOI: 10.1093/gigascience/giaf144
Mahnaz Mohammadi, Christina Fell, David Morrison, Sarah Bell, Gareth Bryson, Sheeba Syed, Prakash Konanahalli, David Harris-Birtill, Ognjen Arandjelovic, Clare Orange, Prishma Shahi, In Hwa Um, James D Blackwood, David J Harrison

The clinical pathway for prevention and treatment of cervical cancer depends on cytology followed by the assessment of biopsies, fragments of tissue removed for histological examination. This can be a significant workload and is an obvious exemplar for exploring triage based on machine learning analysis of slides. Limited access to large annotated datasets of human diseased tissue is a major obstacle to developing standards and algorithms that can assist diagnosis. We present a dataset comprising 2539 whole slide images of cervical biopsies, each annotated by several pathologists, with consensus agreed on the diagnosis and on individual features. Each whole slide image represents one slide per patient and is provided in iSyntax format, with manual annotations by pathologists in JSON format. Each whole slide image is assigned a category label, which is the final diagnosis of the image, and a subcategory label, which identifies the subcategory to which the image belongs. This dataset has been used to build a model that accurately predicts diagnosis, allowing the possibility of automatically triaging biopsies so that the most significant pathologies can be identified rapidly and those patients selected for immediate treatment. The level of annotation, at sub-slide level, and the number of cases are unique in public databases and should allow investigators to explore multiple aspects of computer vision relevant to human tissue diagnosis, with no limitation placed on access to the whole slide images.
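
As a rough illustration of the two-level labelling described in this abstract, the sketch below tallies slides by (category, subcategory). It is only a sketch under stated assumptions: the key names `slide_id`, `category`, and `subcategory` and the placeholder label values are hypothetical and are not taken from the dataset's actual JSON schema, which the abstract does not specify.

```python
import json
from collections import Counter

# Hypothetical per-slide records. The keys and label values are placeholders
# invented for illustration; the real annotation files may be structured differently.
example_annotations = [
    '{"slide_id": "s001", "category": "category_a", "subcategory": "sub_a1"}',
    '{"slide_id": "s002", "category": "category_a", "subcategory": "sub_a1"}',
    '{"slide_id": "s003", "category": "category_b", "subcategory": "sub_b1"}',
]


def count_labels(json_records):
    """Count slides per (category, subcategory) pair."""
    counts = Counter()
    for raw in json_records:
        record = json.loads(raw)
        counts[(record["category"], record["subcategory"])] += 1
    return counts


label_counts = count_labels(example_annotations)
for (category, subcategory), n in sorted(label_counts.items()):
    print(category, subcategory, n)
```

A tally like this is a common first check of class balance before training a multi-class classifier.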

Citations: 0
Giant chromosomes of a tiny plant - the complete telomere-to-telomere genome assembly of the simple thalloid liverwort Apopellia endiviifolia (Jungermanniopsida, Marchantiophyta).
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-11-29 DOI: 10.1093/gigascience/giaf145
Joanna Szablińska-Piernik, Paweł Sulima, Jakub Sawicki

Background: The liverwort A. endiviifolia, a dioicous, simple thalloid species, is notable for its cryptic diversity, habitat adaptability, and genomic innovation, and it represents a clade that is sister to all other Jungermanniopsida. These features make A. endiviifolia an essential model for exploring speciation mechanisms and the evolution of genome structures within liverworts.

Findings: We present the genome assembly of a haploid A. endiviifolia isolate with a total size of 2,914,960,273 bp and an N50 of 468,157,909 bp, demonstrating high completeness (99.2% BUSCO) and a high consensus quality (QV 47.6). The assembly consisted of nine chromosomes, which included 18 telomeres and nine centromeres (ranging from 1.9 to 5 Mbp in length). RNA-seq-based annotation identified 34,615 genes, predominantly protein-coding. The transposable elements (TEs) comprised 12.16% LTR elements and 57 Helitrons. Among the retroelements, the Copia and Gypsy superfamilies comprised 8.94% and 2.95% of the genome, respectively. The Ty3/Gypsy superfamily was found to be significantly enriched in centromeric regions. The average GC content ranged from 38.8% to 39.6%, and gene density varied between 5.52 and 9.78 genes per 500 kbp. Synteny analysis of related liverwort species revealed complex chromosomal relationships, indicating extensive genome rearrangements among species.

Conclusions: This study provides the first high-quality reference genome assembly of the haploid liverwort A. endiviifolia. The assembly and its annotation offer valuable resources for investigating liverwort evolution, centromere biology, and genome expansion in simple thalloid liverworts.
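
The N50 reported in the findings is a standard contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The sketch below computes it under that standard definition; the function name and toy lengths are illustrative assumptions and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length of the sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (made-up lengths): total = 300, half = 150.
# 80 + 70 = 150 reaches the threshold, so the N50 is 70.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70
```

For an assembly in which a few chromosome-scale sequences carry most of the bases, the N50 is close to the length of a typical chromosome.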

Citations: 0
Segmentation-Based Quality Control of Structural MRI using the CAT12 Toolbox.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-11-29 DOI: 10.1093/gigascience/giaf146
Robert Dahnke, Polona Kalc, Gabriel Ziegler, Julian Grosskreutz, Christian Gaser

Background: The processing and analysis of magnetic resonance images is highly dependent on the quality of the input data, and systematic differences in quality can consequently lead to loss of sensitivity or biased results. However, varying image properties due to different scanners and acquisition protocols, as well as subject-specific image interferences, such as motion artifacts, can be incorporated in the analysis. A reliable assessment of image quality is therefore essential to identify critical outliers that may bias results.

Findings: Here we present a quality assessment for structural (T1-weighted) images using tissue classification in the SPM/CAT12 ecosystem. We introduce multiple useful image quality measures, standardize them into quality scales and combine them into an integrated structural image quality rating to facilitate the interpretation and fast identification of outliers with (motion) artifacts. The reliability and robustness of the measures are evaluated using synthetic and real datasets. Our study results demonstrate that the proposed measures are robust to simulated segmentation problems and variables of interest such as cortical atrophy, age, sex, brain size and severe disease-related changes, and might facilitate the separation of motion artifacts based on within-protocol deviations.

Conclusion: The quality control framework is a simple but powerful tool for use in research and clinical settings.

Citations: 0
A high-quality chromosome-level genome assembly of the oligophagous fruit fly Bactrocera tsuneonis (Diptera: Tephritidae) and insights into its host specificity.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-11-20 DOI: 10.1093/gigascience/giaf143
Tengda Guo, Weisong Li, Yuan Zhang, Wenzhao Yang, Zhihong Li, Yujia Qin

Background: Bactrocera tsuneonis is a major pest of citrus, causing significant economic losses in fruit production. It exhibits a highly specialized host preference, primarily infesting citrus fruits. However, the genetic basis underlying its olfactory adaptation and host specificity remains largely unexplored. To elucidate the molecular mechanisms governing host selection in B. tsuneonis, we assembled a high-quality chromosome-level genome and performed comparative genomic, transcriptomic, and functional analyses of its chemosensory system.

Results: The genome of B. tsuneonis was assembled to a total size of 339 Mb, with a contig N50 of 11.21 Mb and a scaffold N50 of 59.93 Mb. Comparative genomic analysis revealed significant contractions in chemosensory-related gene families, particularly in odorant-binding proteins (OBPs) and odorant receptors (ORs), possibly reflecting adaptation to a narrow host range. Transcriptome analysis demonstrated that BtsuOBP83a and BtsuOBP83b were highly expressed in the antennae, and most ORs were predominantly expressed in the antennae. Functional assays confirmed that BtsuOBP83a selectively binds to two citrus volatiles, trans-nerolidol and piperitone, with strong affinity. Molecular docking and molecular dynamics simulations further revealed that BtsuOr7a-6 and BtsuOr7a-4 specifically interact with these volatiles, suggesting their role in host odor recognition.

Conclusions: Our high-quality genome of B. tsuneonis provides a valuable resource for genomic research and offers insights into the genetic basis of its olfactory adaptation and host specificity. The findings highlight key molecular mechanisms underlying host selection and provide potential targets for behavior-based pest management strategies.

Citations: 0