
GigaScience: Latest Articles

LinkML: An Open Data Modeling Framework.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-12-12 DOI: 10.1093/gigascience/giaf152
Sierra A T Moxon, Harold Solbrig, Nomi L Harris, Patrick Kalita, Mark A Miller, Sujay Patil, Kevin Schaper, Chris Bizon, J Harry Caufield, Silvano Cirujano Cuesta, Corey Cox, Frank Dekervel, Damion M Dooley, William D Duncan, Tim Fliss, Sarah Gehrke, Adam S L Graefe, Harshad Hegde, A J Ireland, Julius O B Jacobsen, Madan Krishnamurthy, Carlo Kroll, David Linke, Ryan Ly, Nicolas Matentzoglu, James A Overton, Jonny L Saunders, Deepak R Unni, Gaurav Vaidya, Wouter-Michiel A M Vierdag, Oliver Ruebel, Christopher G Chute, Matthew H Brush, Melissa A Haendel, Christopher J Mungall

Background: Scientific research relies on well-structured, standardized data; however, much of it is stored in formats such as free-text lab notebooks, non-standardized spreadsheets, or data repositories. This lack of structure challenges interoperability, making data integration, validation, and reuse difficult.

Findings: LinkML (Linked Data Modeling Language) is an open framework that simplifies the process of authoring, validating, and sharing data. LinkML can describe a range of data structures, from flat, list-based models to complex, interrelated, and normalized models that utilize polymorphism and compound inheritance. It offers an approachable syntax that is not tied to any one technical architecture and can be integrated seamlessly with many existing frameworks. The LinkML syntax provides a standard way to describe schemas, classes, and relationships, allowing modelers to build well-defined, stable, and optionally ontology-aligned data structures. Once defined, LinkML schemas may be imported into other LinkML schemas. These key features make LinkML an accessible platform for interdisciplinary collaboration and a reliable way to define and share data semantics.

Conclusions: LinkML helps reduce heterogeneity, complexity, and the proliferation of single-use data models while simultaneously enabling compliance with FAIR data standards. LinkML has seen increasing adoption in various fields, including biology, chemistry, biomedicine, microbiome research, finance, electrical engineering, transportation, and commercial software development. In short, LinkML makes implicit models explicitly computable and allows data to be standardized at its origin. LinkML documentation and code are available at linkml.io.
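The idea of classes that inherit slots from a parent and are then used to validate records can be illustrated with a minimal plain-Python sketch. This is not the LinkML library or its API: the class names, slot names, and the `effective_slots` and `validate` helpers are hypothetical, and the dictionary keys (`is_a`, `slots`, `required`, `range`) only loosely mirror LinkML vocabulary.

```python
from typing import Any

# Hypothetical toy schema, invented for illustration only.
# The keys loosely mirror LinkML terms; this is NOT LinkML syntax or its API.
SCHEMA = {
    "NamedThing": {
        "slots": {
            "id": {"required": True, "range": str},
            "name": {"required": False, "range": str},
        },
    },
    "Sample": {
        "is_a": "NamedThing",  # Sample inherits the slots of NamedThing
        "slots": {
            "depth_m": {"required": False, "range": float},
        },
    },
}


def effective_slots(class_name: str) -> dict[str, dict[str, Any]]:
    """Collect the slots of a class, walking up the is_a chain (parents first)."""
    definition = SCHEMA[class_name]
    parent = definition.get("is_a")
    slots = dict(effective_slots(parent)) if parent else {}
    slots.update(definition.get("slots", {}))
    return slots


def validate(record: dict[str, Any], class_name: str) -> list[str]:
    """Return a list of problems found in the record; an empty list means valid."""
    slots = effective_slots(class_name)
    problems = []
    for slot_name, rule in slots.items():
        if slot_name not in record:
            if rule["required"]:
                problems.append(f"missing required slot: {slot_name}")
            continue
        if not isinstance(record[slot_name], rule["range"]):
            problems.append(f"wrong type for slot: {slot_name}")
    for key in record:
        if key not in slots:
            problems.append(f"unknown slot: {key}")
    return problems
```

Calling `validate({"depth_m": 2.5}, "Sample")` reports the missing `id` slot, which `Sample` only has because it inherits it from `NamedThing`. Keeping the schema as data, separate from the checking code, is what allows one model to be shared and reused across tools.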

Citations: 0
An analysis of performance bottlenecks in MRI preprocessing.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giae098
Mathieu Dugré, Yohan Chatelain, Tristan Glatard

Magnetic resonance imaging (MRI) preprocessing is a critical step for neuroimaging analysis. However, the computational cost of MRI preprocessing pipelines is a major bottleneck for large cohort studies and some clinical applications. While high-performance computing and, more recently, deep learning have been adopted to accelerate the computations, these techniques require costly hardware and are not accessible to all researchers. Therefore, it is important to understand the performance bottlenecks of MRI preprocessing pipelines to improve their performance. Using the Intel VTune profiler, we characterized the bottlenecks of several commonly used MRI preprocessing pipelines from the Advanced Normalization Tools (ANTs), FMRIB Software Library, and FreeSurfer toolboxes. We found that a few functions accounted for most of the CPU time and that linear interpolation was the largest contributor. Data access was also a substantial bottleneck. We identified a bug in the Insight Segmentation and Registration Toolkit library that impacts the performance of the ANTs pipeline in single precision and a potential issue with the OpenMP scaling in FreeSurfer recon-all. Our results provide a reference for future efforts to optimize MRI preprocessing pipelines.
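To make the interpolation and data-access findings concrete, here is a minimal pure-Python sketch of linear interpolation in a 3D volume (trilinear interpolation). It is not the code of ANTs, ITK, FSL, or FreeSurfer; the function name `trilinear_sample` and the nested-list volume layout are assumptions made for illustration. It shows that every resampled point needs eight voxel reads and seven blends, which is one plausible reason such a routine can dominate both compute and memory traffic when applied to every voxel of an image.

```python
import math


def trilinear_sample(volume, x, y, z):
    """Sample a 3D volume (nested lists indexed [z][y][x]) at a fractional
    coordinate by interpolating linearly along x, then y, then z.

    Assumes 0 <= x, y, z <= size - 1 along the matching axis.
    """
    x0, y0, z0 = math.floor(x), math.floor(y), math.floor(z)
    # Upper neighbours, clamped so points on the far face stay in bounds.
    x1 = min(x0 + 1, len(volume[0][0]) - 1)
    y1 = min(y0 + 1, len(volume[0]) - 1)
    z1 = min(z0 + 1, len(volume) - 1)
    fx, fy, fz = x - x0, y - y0, z - z0

    def lerp(a, b, t):
        # One-dimensional linear blend between a and b.
        return a + (b - a) * t

    # Eight voxel reads per output sample, spread over two slices and two rows.
    c00 = lerp(volume[z0][y0][x0], volume[z0][y0][x1], fx)
    c10 = lerp(volume[z0][y1][x0], volume[z0][y1][x1], fx)
    c01 = lerp(volume[z1][y0][x0], volume[z1][y0][x1], fx)
    c11 = lerp(volume[z1][y1][x0], volume[z1][y1][x1], fx)
    # Blend the four x-interpolated values along y, then along z.
    c0 = lerp(c00, c10, fy)
    c1 = lerp(c01, c11, fy)
    return lerp(c0, c1, fz)
```

Because the eight neighbours sit in different rows and slices, they are generally not adjacent in memory, so the cost of a sample depends on the access pattern as well as on the arithmetic.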

Citations: 0
Similar, but not the same: multiomics comparison of human valve interstitial cells and osteoblast osteogenic differentiation expanded with an estimation of data-dependent and data-independent PASEF proteomics.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giae110
Arseniy Lobov, Polina Kuchur, Nadezhda Boyarskaya, Daria Perepletchikova, Ivan Taraskin, Andrei Ivashkin, Daria Kostina, Irina Khvorova, Vladimir Uspensky, Egor Repkin, Evgeny Denisov, Tatiana Gerashchenko, Rashid Tikhilov, Svetlana Bozhkova, Vitaly Karelkin, Chunli Wang, Kang Xu, Anna Malashicheva

Osteogenic differentiation is crucial in normal bone formation and pathological calcification, such as calcific aortic valve disease (CAVD). Understanding the proteomic and transcriptomic landscapes underlying this differentiation can unveil potential therapeutic targets for CAVD. In this study, we employed RNA sequencing transcriptomics and proteomics on a timsTOF Pro platform to explore the multiomics profiles of valve interstitial cells (VICs) and osteoblasts during osteogenic differentiation. For proteomics, we utilized 3 data acquisition/analysis techniques: data-dependent acquisition (DDA)-parallel accumulation serial fragmentation (PASEF), and data-independent acquisition (DIA)-PASEF with either a classic library-based search (DIA) or a machine learning-based library-free search (DIA-ML). Using RNA sequencing data as a biological reference, we compared these 3 analytical techniques in the context of actual biological experiments. We use this comprehensive dataset to reveal distinct proteomic and transcriptomic profiles between VICs and osteoblasts, highlighting specific biological processes in their osteogenic differentiation pathways. The study identified potential therapeutic targets specific to VIC osteogenic differentiation in CAVD, including the MAOA and ERK1/2 pathway. From a technical perspective, we found that DIA-based methods show an even greater advantage over DDA for more complex human primary cell cultures than was previously shown for HeLa samples. While the classic library-based DIA approach has proved to be a gold standard for shotgun proteomics research, DIA-ML offers significant advantages with a relatively minor compromise in data reliability, making it the method of choice for routine proteomics.

Citations: 0
How to select predictive models for decision-making or causal inference.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf016
Matthieu Doutreligne, Gaël Varoquaux

Background: We investigate which procedure selects the most trustworthy predictive model to explain the effect of an intervention and support decision-making.

Methods: We study a large variety of model selection procedures in practical settings: finite-sample settings, without a theoretical assumption of well-specified models. Beyond standard cross-validation or internal validation procedures, we also study elaborate causal risks. These build proxies of the causal error, using "nuisance" reweighting to compute it on the observed data. We evaluate whether empirically estimated nuisances, which are necessarily noisy, add noise to model selection, and we compare different metrics for causal model selection in an extensive empirical study based on a simulation and 3 health care datasets based on real covariates.

Results: Among all metrics, the mean squared error, classically used to evaluate predictive models, performs worst. Reweighting it with a propensity score does not bring much improvement in most cases. On average, the $R\text{-risk}$, which uses a model of the mean outcome and propensity scores as nuisances, leads to the best performance. Nuisance corrections are best estimated with flexible estimators such as a super learner.

Conclusions: When predictive models are used to explain the effect of an intervention, they must be evaluated with procedures different from those of standard predictive settings, using the $R\text{-risk}$ from causal inference.
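The three kinds of selection metric named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the R-risk is written in the residual-on-residual form common in the causal inference literature, E[((Y - m(X)) - (A - e(X)) * tau(X))^2], which is assumed here to match the paper's definition. The inputs `mu0` and `mu1` are a candidate model's predicted outcomes without and with the intervention, `m` is the mean-outcome nuisance, and `e` is the propensity score.

```python
from statistics import fmean


def factual_mse(y, a, mu0, mu1):
    """Mean squared error on observed outcomes only (the standard predictive metric)."""
    return fmean(
        (yi - (m1 if ai else m0)) ** 2
        for yi, ai, m0, m1 in zip(y, a, mu0, mu1)
    )


def ipw_mse(y, a, mu0, mu1, e):
    """Squared error reweighted by inverse propensity, 1/e if treated else 1/(1 - e)."""
    return fmean(
        (ai / ei + (1 - ai) / (1 - ei)) * (yi - (m1 if ai else m0)) ** 2
        for yi, ai, m0, m1, ei in zip(y, a, mu0, mu1, e)
    )


def r_risk(y, a, mu0, mu1, m, e):
    """R-risk: squared gap between the outcome residual (y - m) and the
    treatment residual (a - e) scaled by the candidate effect tau = mu1 - mu0."""
    return fmean(
        ((yi - mi) - (ai - ei) * (m1 - m0)) ** 2
        for yi, ai, m0, m1, mi, ei in zip(y, a, mu0, mu1, m, e)
    )
```

In practice the nuisances `m` and `e` are unknown and must themselves be estimated from data, which is the source of the extra noise the study evaluates. With noise-free outcomes and exact nuisances, a candidate that predicts the true effect gets an R-risk of zero, while a candidate that predicts no effect does not.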

Citations: 0
The haplotype-resolved T2T genome for Bauhinia × blakeana sheds light on the genetic basis of flower heterosis.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf044
Weixue Mu, Joshua Casey Darian, Wing-Kin Sung, Xing Guo, Tuo Yang, Mandy Wai Man Tang, Ziqiang Chen, Steve Kwan Hok Tong, Irene Wing Shan Chik, Robert L Davidson, Scott C Edmunds, Tong Wei, Stephen Kwok-Wing Tsui

Background: The Hong Kong orchid tree Bauhinia × blakeana Dunn has long been proposed to be a sterile interspecific hybrid exhibiting flower heterosis when compared to its likely parental species, Bauhinia purpurea L. and Bauhinia variegata L. Here, we report comparative genomic and transcriptomic analyses of the 3 Bauhinia species.

Findings: We generated chromosome-level assemblies for the parental species and applied a trio-binning approach to construct a haplotype-resolved telomere-to-telomere (T2T) genome for B. blakeana. Comparative chloroplast genome analysis confirmed B. purpurea as the maternal parent. Transcriptome profiling of flower tissues highlighted a closer resemblance of B. blakeana to its maternal parent. Differential gene expression analyses revealed distinct expression patterns among the 3 species, particularly in biosynthetic and metabolic processes. To investigate the genetic basis of flower heterosis observed in B. blakeana, we focused on gene expression patterns within pigment biosynthesis-related pathways. High-parent dominance and overdominance expression patterns were observed, particularly in genes associated with carotenoid biosynthesis. Additionally, allele-specific expression analysis revealed a balanced contribution of maternal and paternal alleles in shaping the gene expression patterns in B. blakeana.

Conclusions: Our study offers valuable insights into the genome architecture of hybrid B. blakeana, establishing a comprehensive genomic and transcriptomic resource for future functional genetics research within the Bauhinia genus. It also serves as a model for exploring the characteristics of hybrid species using T2T haplotype-resolved genomes, providing a novel approach to understanding genetic interactions and evolutionary mechanisms in complex genomes with high heterozygosity.

Citations: 0
Overture: an open-source genomics data platform.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf038
Mitchell Shiell, Rosi Bajari, Dusan Andric, Jon Eubank, Brandon F Chan, Anders J Richardsson, Azher Ali, Bashar Allabadi, Yelizar Alturmessov, Jared Baker, Ann Catton, Kim Cullion, Daniel DeMaria, Patrick Dos Santos, Henrich Feher, Francois Gerthoffert, Minh Ha, Robin A Haw, Atul Kachru, Alexandru Lepsa, Alexis Li, Rakesh N Mistry, Hardeep K Nahal-Bose, Aleksandra Pejovic, Samantha Rich, Leonardo Rivera, Ciarán Schütte, Edmund Su, Robert Tisma, Jaser Uddin, Chang Wang, Alex N Wilmer, Linda Xiang, Junjun Zhang, Lincoln D Stein, Vincent Ferretti, Mélanie Courtot, Christina K Yung

Background: Next-generation sequencing has created many new technological challenges in organizing and distributing genomics datasets, which now can routinely reach petabyte scales. Coupled with data-hungry artificial intelligence and machine learning applications, findable, accessible, interoperable, and reusable genomics datasets have never been more valuable. While major archives like the Genomic Data Commons, Sequence Read Archive, and European Genome-Phenome Archive have improved researchers' ability to share and reuse data, and general-purpose repositories such as Zenodo and Figshare provide valuable platforms for research data publication, the diversity of genomics research precludes any one-size-fits-all approach. In many cases, bespoke solutions are required, and despite funding agencies and journals increasingly mandating reusable data practices, researchers still lack the technical support needed to meet the multifaceted challenges of data reuse.

Findings: Overture bridges this gap by providing open-source software for building and deploying customizable genomics data platforms. Its architecture consists of modular microservices, each generalized and narrowly scoped, which combine to form complete data management systems. These systems enable researchers to organize, share, and explore their genomics data at any scale. Through Overture, researchers can connect their data to both humans and machines, fostering reproducibility and enabling new insights through controlled data sharing and reuse.

Conclusions: By making these tools freely available, we can accelerate the development of reliable genomic data management across the research community quickly, flexibly, and at multiple scales. Overture is an open-source project licensed under AGPLv3.0 with all source code publicly available from https://github.com/overture-stack and documentation on development, deployment, and usage available from www.overture.bio.

Citations: 0
Telomere-to-telomere genome assembly of Electrophorus electricus provides insights into the evolution of electric eels.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf024
Zan Qi, Qun Liu, Haorong Li, Yaolei Zhang, Ziwei Yu, Wenkai Luo, Kun Wang, Yuxin Zhang, Shoupeng Pan, Chao Wang, Hui Jiang, Qiang Qiu, Wen Wang, Guangyi Fan, Yongxin Li

Background: Electric eels evolved remarkable electric organs that enable them to instantaneously discharge hundreds of volts for predation, defense, and communication. However, the absence of a high-quality reference genome has severely constrained many aspects of electric eel research.

Results: Using high-depth, multiplatform sequencing data, we assembled the first telomere-to-telomere high-quality reference genome of Electrophorus electricus, which has a genome size of 833.43 Mb and comprises 26 chromosomes. Multiple evaluations, including N50 statistics (30.38 Mb), BUSCO scores (97.30%), and the mapping ratio of short-insert sequencing data (99.91%), demonstrate the high contiguity and completeness of the resulting assembly. Genome annotation predicted 396.63 Mb of repetitive sequences and 20,992 protein-coding genes. Evolutionary analyses indicate that Gymnotiformes, the order to which the electric eel belongs, is more closely related to Characiformes than to Siluriformes and diverged from Characiformes 95.00 million years ago. Pairwise sequentially Markovian coalescent analysis revealed a sharp decline in the population size of E. electricus over the past few hundred thousand years. In addition, many regulatory factors related to neurotransmitters and classical signaling pathways during embryonic development were significantly expanded, potentially contributing to the generation of high-voltage electricity.

Conclusions: This study not only provided the first high-quality telomere-to-telomere reference genome of E. electricus but also greatly enhanced our understanding of electric eels.

Citations: 0
New implementation of data standards for AI in oncology: Experience from the EuCanImage project.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giae101
Teresa García-Lezana, Maciej Bobowicz, Santiago Frid, Michael Rutherford, Mikel Recuero, Katrine Riklund, Aldar Cabrelles, Marlena Rygusik, Lauren Fromont, Roberto Francischello, Emanuele Neri, Salvador Capella, Arcadi Navarro, Fred Prior, Jonathan Bona, Pilar Nicolas, Martijn P A Starmans, Karim Lekadir, Jordi Rambla

Background: An unprecedented amount of personal health data, with the potential to revolutionize precision medicine, is generated at health care institutions worldwide. The exploitation of such data using artificial intelligence (AI) relies on the ability to combine heterogeneous, multicentric, multimodal, and multiparametric data, as well as thoughtful representation of knowledge and data availability. Despite these possibilities, significant methodological challenges and ethicolegal constraints still impede the real-world implementation of data models.

Technical details: EuCanImage is an international consortium that aims to develop AI algorithms for precision medicine in oncology and to enable secondary use of the data based on the necessary ethical approvals. The use of well-defined clinical data standards to allow interoperability was a central element of the initiative. The consortium focuses on 3 different cancer types and addresses 7 unmet clinical needs. We have conceived and implemented an innovative process to capture clinical data from hospitals, transform it to fit the newly developed EuCanImage data models, and then store the standardized data in permanent repositories. This new workflow combines recognized software (REDCap for data capture), data standards (FHIR for data structuring), and an existing repository (EGA for permanent data storage and sharing) with newly developed custom tools for data transformation and quality control (ETL pipeline, QC scripts) that fill the remaining gaps.

Conclusion: This article synthesizes our experience and procedures for health care data interoperability, standardization, and reproducibility.

Citations: 0
Best-practice guidance for Earth BioGenome Project sample collection and processing: progress and challenges in biodiverse reference genome creation.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf041
Mara K N Lawniczak, Kevin M Kocot, Jonas J Astrin, Mark Blaxter, Cibele G Sotero-Caio, Katharine B Barker, Anna K Childers, Jonathan Coddington, Paul Davis, Kerstin Howe, Warren E Johnson, Duane D McKenna, Jeremy G Wideman, Olga Vinnere Pettersson, Verena Ras, Bernardo F Santos

The Earth BioGenome Project has the extremely ambitious goal of generating, at scale, high-quality reference genomes across the entire Tree of Life. Currently in its first phase, the project is targeting family-level representatives and is progressing rapidly. Here we outline recommended standards and considerations in sample acquisition and processing for those involved in biodiverse reference genome creation. These standards and recommendations will evolve with advances in related processes. Additionally, we discuss the challenges raised by the ambitions for later phases of the project, highlighting topics related to sample collection and processing that require further development.

Citations: 0
Spatial integration of multi-omics data from serial sections using the novel Multi-Omics Imaging Integration Toolset.
IF 11.8 Zone 2 (Biology) Q1 MULTIDISCIPLINARY SCIENCES Pub Date: 2025-01-06 DOI: 10.1093/gigascience/giaf035
Maximilian Wess, Maria K Andersen, Elise Midtbust, Juan Carlos Cabellos Guillem, Trond Viset, Øystein Størkersen, Sebastian Krossa, Morten Beck Rye, May-Britt Tessem

Background: Truly understanding the cancer biology of heterogeneous tumors in precision medicine requires capturing the complexities of multiple omics levels and the spatial heterogeneity of cancer tissue. Techniques like mass spectrometry imaging (MSI) and spatial transcriptomics (ST) achieve this by spatially detecting metabolites and RNA but are often applied to different serial sections. To fully leverage the advantage of such multi-omics data, the individual measurements need to be integrated into a single dataset.

Results: We present the Multi-Omics Imaging Integration Toolset (MIIT), a Python framework for integrating spatially resolved multi-omics data. A key component of MIIT's integration is the registration of serial sections, for which we developed a nonrigid registration algorithm, GreedyFHist. We validated GreedyFHist on 244 images from fresh-frozen serial sections, achieving state-of-the-art performance. As a proof of concept, we used MIIT to integrate ST and MSI data from prostate tissue samples and assessed the correlation of an ST-derived gene signature for citrate-spermine secretion with metabolic measurements from MSI.

Conclusion: MIIT is a highly accurate, customizable, open-source framework for integrating spatial omics technologies performed on different serial sections.

Citations: 0