
GigaScience: Latest Articles

SpaceBF: Spatial coexpression analysis using Bayesian Fused approaches in spatial omics datasets.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-01-20, DOI: 10.1093/gigascience/giag006
Souvik Seal, Brian Neelon

Advances in spatial omics enable measurement of genes (spatial transcriptomics) and peptides, lipids, or N-glycans (mass spectrometry imaging) across thousands of locations within a tissue. While detecting spatially variable molecules is a well-studied problem, robust methods for identifying spatially varying co-expression between molecule pairs remain limited. We introduce SpaceBF, a Bayesian fused modeling framework that estimates co-expression at both local (location-specific) and global (tissue-wide) levels. SpaceBF enforces spatial smoothness via a fused horseshoe prior on the edges of a predefined spatial adjacency graph, allowing large, edge-specific differences to escape shrinkage while preserving overall structure. In extensive simulations, SpaceBF achieves higher specificity and power than commonly used methods that leverage geospatial metrics, including bivariate Moran's I and Lee's L. We also benchmark the proposed prior against standard alternatives, such as intrinsic conditional autoregressive (ICAR) and Matérn priors. Applied to spatial transcriptomics and proteomics datasets, SpaceBF reveals cancer-relevant molecular interactions and patterns of cell-cell communication (e.g., ligand-receptor signaling), demonstrating its utility for principled, uncertainty-aware co-expression analysis of spatial omics data.
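
The abstract names bivariate Moran's I as one of the geospatial baselines that SpaceBF is compared against. As a rough illustration of what such a global statistic measures, the sketch below computes one common formulation of it on a toy graph. It assumes binary, symmetric weights on an undirected adjacency graph; the function names and the `edges` representation are illustrative choices, not taken from the paper or from any SpaceBF code.

```python
from math import sqrt


def zscores(values):
    """Standardize values using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def bivariate_morans_i(x, y, edges):
    """Global bivariate Moran's I with binary symmetric weights.

    x, y  : expression of two molecules at the same n locations
    edges : undirected neighbor pairs (i, j), a hypothetical adjacency graph
    """
    zx, zy = zscores(x), zscores(y)
    cross = 0.0
    total_weight = 0.0
    for i, j in edges:
        # Each undirected edge contributes in both directions (w_ij = w_ji = 1).
        cross += zx[i] * zy[j] + zx[j] * zy[i]
        total_weight += 2.0
    return cross / total_weight


# Toy example: six locations on a line, each linked to its neighbor.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [-v for v in gene_a]  # perfectly anti-correlated partner
```

Because this statistic collapses the whole tissue into a single number, it cannot report where co-expression changes, which is the local (location-specific) question the abstract says SpaceBF is designed to answer.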

Citations: 0
A preregistered, open pipeline for early cerebral palsy risk assessment from Infant Videos.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-01-20, DOI: 10.1093/gigascience/giag003
Melanie Segado, Laura A Prosser, Andrea F Duncan, Michelle J Johnson, Konrad P Kording

Cerebral Palsy (CP), affecting approximately 1 in 500 children due to abnormal brain development, impacts movement control. Early risk assessment via the General Movements Assessment (GMA) at 3-4 months is highly predictive for CP but relies on trained clinicians. Machine-learning-based approaches for predicting GMA score from video have shown considerable promise, but typically rely on dataset-specific preprocessing, custom feature sets, and manually designed model pipelines, which make external benchmarking more difficult. This, combined with strict privacy constraints on sharing data, makes it challenging to train and evaluate models across datasets, which is important for assessing clinical utility. There is therefore a need to develop approaches that will work across different datasets to enable multi-site dataset aggregation and model training. To address this gap, we developed an end-to-end pipeline that uses off-the-shelf pose estimation, general-purpose feature extraction, and automated machine learning - none of which are tuned to a specific dataset. We applied this approach to a newly generated large dataset of 1053 infants (with approximately 10-12% positive class for adverse GMA outcome, drawn from a high-risk clinical cohort) within a preregistered study design. Model performance was evaluated on a strict "lock-box" test set, which remained untouched during any phase of model development or preprocessing optimization, and only used for evaluation once the final model and pipeline had been preregistered. The developed model achieved moderate predictive accuracy for clinician-assessed GMA scores (Area Under the Receiver Operating Characteristic Curve, ROC-AUC = 0.77; Area Under the Precision-Recall Curve, PR-AUC = 0.41). The moderate accuracy is noteworthy given the 10-12% positive class prevalence, and power-law scaling of ROC-AUC as a function of increasing dataset size. 
By releasing de-identified feature data and open-source code, and simplifying the training pipeline using AutoML, our work establishes essential groundwork for future robust, globally relevant CP screening tools suitable for low-resource settings.
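
To put the two reported metrics in context: a random ranker is expected to score about 0.5 on ROC-AUC, whereas its expected PR-AUC is roughly the positive-class prevalence, here about 0.10 to 0.12. A PR-AUC of 0.41 is therefore roughly 3.4 to 4.1 times that chance level. The sketch below shows how both quantities can be computed from labels and scores. It is a minimal illustration, not the authors' evaluation code: the function names and toy data are made up, and average precision is used as a common step-wise estimate of the area under the precision-recall curve.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data with 20% prevalence: positives are ranked 1st and 3rd out of 10.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
```

Reporting both metrics is useful under class imbalance because ROC-AUC does not depend on prevalence, while the precision-recall view does.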

Citations: 0
Generalizable machine learning models for rapid antimicrobial resistance prediction in unseen healthcare settings.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-01-19, DOI: 10.1093/gigascience/giaf156
Diane Duroux, Paul P Meyer, Giovanni Visoná, Niko Beerenwinkel

Background: The deployment of machine learning in clinical settings is often hindered by the limited generalizability of the models. Models that perform well during development tend to underperform in new environments, limiting their clinical utility. This issue affects models designed for the rapid identification of antimicrobial resistance, which is essential to guide treatment decisions. Traditional susceptibility tests can take up to three days, whereas integrating MALDI-TOF mass spectrometry with machine learning has the potential to reduce this to one day. However, model performance declines drastically in hospitals or time frames outside the training data.

Results: To improve robustness, we develop advanced feature representations using masked autoencoders (MAE) for MALDI-TOF spectra, and chemical language models and SELF-referencing embedded strings (SELFIES) for antimicrobials. Cross-validated on data from four medical institutions, our models demonstrate improved performance and stability. The MAE and SELFIES encodings increase the area under the precision-recall curve by 4% when evaluated on unseen time periods, while the MAE and Molformer language model encodings improve it by 10% when applied across different hospitals.

Conclusions: These results underscore the value of combining deep learning with chemical and spectral information to build generalizable, high-impact clinical AI.

Citations: 0
Dynamont: A comprehensive cross-species comparison of ONT segmentation tools.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-01-19, DOI: 10.1093/gigascience/giag005
Jannes Spangenberg, Christian Höner Zu Siederdissen, Winfried Goettsch, Lennart Köhler, Liz Maria Luke, Kai Papenfort, Manja Marz

Background: Oxford Nanopore Technologies (ONT) sequencing enables direct, long-read sequencing of DNA and RNA, preserving nucleotide modifications. During basecalling, deep neural networks translate raw nanopore signals into nucleotide sequences, internally segmenting the signal to align it with the corresponding bases. This is a challenging task due to uneven motor protein rotation, signal variability, low-quality reads, and the presence of nucleotide modifications. However, the signal-to-nucleotide assignment is critical for novel downstream signal analysis. Existing tools, such as Tombo Resquiggle, f5c Eventalign, f5c Resquiggle, and Uncalled4, operate after basecalling and rely on event-based segmentation and mapping approaches that often fail to align low-quality or modified reads and lack confidence estimates for segmentation accuracy.

Results: Here, we present a large-scale comparative study in which 5 segmentation tools, including our novel tool Dynamont, are applied to 16 ONT-sequenced data sets spanning different kingdoms of life. Overall, we segmented 160 000 reads and evaluated the tools' performance on a combination of 12 signal and downstream assembly metrics. Our study is accompanied by a comprehensive and extensible Supplement that summarizes all data sets, execution instructions, and evaluation results. We score the segmentation results using an aggregated metric score, created from all our analysed metrics.

Conclusions: No tool delivered the best results for all data sets. We recommend a careful choice and normalization of evaluation metrics to select the best segmentation tool as a critical step in the process of ONT signal segmentation. Across nearly all RNA data sets, Dynamont outperforms other segmentation tools in terms of aggregated metric scores. For DNA data sets, however, the performance is more variable, with mixed results observed across tools.
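
The abstract does not say how the aggregated metric score is built, only that metrics need careful choice and normalization. One simple way such an aggregate could be formed is sketched below, purely as an assumption: each metric is min-max scaled across tools, metrics where lower is better are flipped, and the scaled values are averaged with equal weight. The function name, the tool names, and the metric names are all hypothetical.

```python
def aggregate_scores(metrics, higher_is_better):
    """Average of min-max normalized metrics per tool (equal weights, an assumption).

    metrics          : {metric_name: {tool_name: raw_value}}
    higher_is_better : {metric_name: bool}
    """
    tools = list(next(iter(metrics.values())))
    totals = {tool: 0.0 for tool in tools}
    for name, per_tool in metrics.items():
        lo, hi = min(per_tool.values()), max(per_tool.values())
        span = hi - lo
        for tool in tools:
            # A metric on which all tools tie carries no ranking information.
            scaled = 0.5 if span == 0 else (per_tool[tool] - lo) / span
            if not higher_is_better[name]:
                scaled = 1.0 - scaled
            totals[tool] += scaled
    return {tool: totals[tool] / len(metrics) for tool in tools}


# Hypothetical numbers for two made-up tools and two made-up metrics.
example_metrics = {
    "aligned_fraction": {"tool_a": 1.0, "tool_b": 3.0},
    "segment_error": {"tool_a": 10.0, "tool_b": 5.0},
}
example_direction = {"aligned_fraction": True, "segment_error": False}
example_result = aggregate_scores(example_metrics, example_direction)
```

Min-max scaling is sensitive to outliers and to which tools are included, which is one reason the choice of normalization can change the ranking.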

Citations: 0
Cord blood DNA methylation and cell type composition are not significantly associated with severe preeclampsia, after cell type and clinical covariate adjustment.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-01-16, DOI: 10.1093/gigascience/giag002
Xiaotong Yang, Wenting Liu, Zhixin Mao, Yuheng Du, Cameron Lassiter, Fadhl M AlAkwaa, Paula A Benny, Lana X Garmire

Background: Preeclampsia is a severe pregnancy complication that threatens maternal and neonatal health and well-being. Previous studies on epigenome-wide association analysis (EWAS) of preeclampsia produced inconsistent results in cord blood tissues, and one possible explanation is their failure to rigorously adjust for cell proportions, gestational age, or other necessary variables.

Methods: Here, we calculated the DNA methylation change in cord blood from newborns affected by preeclampsia, using a multi-ethnic cohort from the Hawaii population (24 cases, 38 controls). We comprehensively adjusted for variables such as maternal age, body mass index (BMI), parity, and estimated the cell proportions. We also re-analyzed two previous datasets with adjustments to estimated cell proportions and conducted a pooled analysis by merging all three datasets together to increase the statistical power (58 cases, 71 controls). Lastly, we include idiopathic preterm (preterm delivery with no known reasons) cord blood samples (n=11) to disentangle the effect of severe preeclampsia and small gestational age.

Results: We showed that after adjusting cell type proportions and patient clinical characteristics, most of the so-called statistically significant CpG methylation changes associated with severe preeclampsia disappeared in our own data, two public datasets, and the pooled analysis combining all three datasets. This result still holds after including idiopathic preterm samples in the control group. Rather, we found that gestation progression is accompanied by statistically significant proportion changes in several cell types, such as granulocytes, nRBCs, CD8Ts, and B cells, which contribute to most DNA methylation differences between case and control groups. Preeclampsia has interactions on cell proportion changes in granulocytes, monocytes, and nRBCs.

Conclusions: In summary, our study shows that the previously reported differentially methylated patterns in cord blood are actually artifacts of not properly adjusting for cell type heterogeneity, gestational age, and clinical covariates. Severe preeclampsia is not associated with statistically significant DNA methylation changes, but rather with changes in cell proportions. This finding underscores the scientific rigor needed in EWAS.
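
The core statistical point, that an apparent case-control difference can vanish once a confounder is adjusted for, can be shown with a toy calculation. The sketch below is not the study's analysis pipeline. It uses made-up numbers and simple least squares, and obtains the adjusted effect by residualizing both variables on the covariate, which gives the same coefficient as a two-predictor regression. All names and values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Part of y that is not explained by a straight-line fit on x."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def adjusted_slope(exposure, outcome, covariate):
    """Effect of exposure on outcome after removing the covariate from both."""
    return slope(residuals(covariate, exposure), residuals(covariate, outcome))


# Made-up data: methylation depends only on a cell-type proportion,
# but cases tend to have a higher proportion of that cell type.
cell_prop = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
is_case = [0, 0, 0, 1, 0, 1, 1, 1]
methylation = [0.2 + 0.5 * c for c in cell_prop]

unadjusted = slope(is_case, methylation)                    # clearly positive
adjusted = adjusted_slope(is_case, methylation, cell_prop)  # essentially zero
```

In this toy setup the unadjusted case-control difference is driven entirely by cell composition, mirroring the kind of artifact the abstract describes.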

Citations: 0
Genome Assembly of Three Shrub Mangroves in the Genus Acanthus Reveals Two Polyploidy Events and Expansion of Genes Linked to Root Adaptation in Coastal Habitats.
IF 11.8, Zone 2 (Biology), Q1 MULTIDISCIPLINARY SCIENCES, Pub Date: 2026-01-02, DOI: 10.1093/gigascience/giaf162
Wanapinun Nawae, Chaiwat Naktang, Peeraphat Paenpong, Duangjai Sangsrakru, Thippawan Yoocha, Sonicha U-Thoomporn, Wasitthee Kongkachana, Poonsri Wanthongchai, Suchart Yamprasai, Chonlawit Samart, Sithichoke Tangphatsornruang, Wirulda Pootakham

The genomes of mangrove Acanthus species have not been reported, despite their ecological and medicinal importance. Using PacBio and Hi-C data, we generated a chromosome-scale genome assembly of the recently identified allotetraploid species Acanthus tetraploideus (2n = 96). The genomes of diploid progenitors, Acanthus ilicifolius and Acanthus ebracteatus (2n = 48), were assembled from stLFR data. We identified an Acanthus-specific whole-genome duplication (WGD) event that occurred ∼43 million years ago (Mya). Ancestral karyotype reconstruction revealed a shift in haploid chromosome number from 11 to 24 in the progenitors, following the WGD and subsequent chromosomal fission events. The hybridization that formed A. tetraploideus was estimated to have occurred 0.7-1.8 Mya. Phylogenomic and synteny analyses clearly showed that A. tetraploideus inherited subgenomes SG1 and SG2 from A. ilicifolius and A. ebracteatus, respectively. Gene structure and retention analyses revealed a smaller and more structurally flexible genome in A. ebracteatus and SG2 compared with A. ilicifolius and SG1. Gene family and machine learning analyses identified expansions in protein families related to Casparian strip formation, root development, and salt stress response. Several of these families were expanded in A. ilicifolius and SG1 but contracted in A. ebracteatus and SG2. These genomic patterns might have contributed to the establishment of A. tetraploideus within the habitat of A. ebracteatus. For all three species, population analysis revealed clear genetic divergence between samples from the eastern and western coasts of Thailand. This study provides valuable genomic resources and insights into the evolutionary adaptation of plants to intertidal environments.

Citations: 0
An evaluation of computational methods for reconstruction of human viral DNA genomes.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-12-26 | DOI: 10.1093/gigascience/giaf159
Maria J P Sousa, Mari Toppinen, Lari Pyöriä, Klaus Hedman, Antti Sajantila, Maria F Perdomo, Diogo Pratas

Background: The increasing availability of viral sequencing data has led to the emergence of many optimized viral genome reconstruction tools. Because the number of new tools is steadily increasing, it is difficult to identify functional, optimized tools that balance accuracy against computational resources, and to compare the features that each tool provides.

Results: In this paper, we surveyed open-source computational tools (including pipelines) used for human viral genome reconstruction, identifying specific characteristics, features, similarities, and dissimilarities between these tools. For quantitative comparison, we created an open-source reconstruction benchmark based on viral data. The benchmark was executed using both synthetic and real datasets. With the former, we evaluated the effects on the reconstruction process of using different human DNA viruses with simulated mutation rates, contamination and mitochondrial DNA inclusion, and various coverage depths. Each reconstruction program was also evaluated using real datasets, demonstrating its performance in real-life scenarios. The evaluation measures include the identity, a Normalized Compression Semi-Distance, and the Normalized Relative Compression between the genomes before and after reconstruction, as well as metrics regarding the length of the reconstructed genomes and the computational time and resources spent by each tool.

Conclusions: We provide a fully reproducible benchmark capable of evaluating currently available reconstruction programs. The benchmark is open-source and freely available at https://github.com/viromelab/HVRS. Additionally, based on the knowledge obtained from the systematic review and the benchmark, we provide some program recommendations for different reconstruction scenarios.

Citations: 0
Haplotype-resolved chromosome-level genome assemblies of four Diamesa species reveal the genetic basis of cold tolerance and high-altitude adaptations in arctic chironomids.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-12-22 | DOI: 10.1093/gigascience/giaf160
Sarah L F Martin, Renato La Torre, Bram Danneels, Ave Tooming-Klunderud, Morten Skage, Spyridon Kollias, Ole Kristian Tørresen, Mohsen Falahati Anbaran, Elisabeth Stur, Kjetill S Jakobsen, Michael D Martin, Torbjørn Ekrem

Background: Arctic and alpine insects experience extreme environmental stressors, yet the genomic basis of their adaptation is poorly understood. Diamesa midges (Diptera: Chironomidae) are cold-adapted insects inhabiting glacial and high-altitude freshwater ecosystems, but no chromosome-level genomes have been available to date.

Findings: We present the first haplotype-resolved, chromosome-level genomes for four Diamesa species (D. hyperborea, D. lindrothi, D. serratosioi and D. tonsa), assembled using PacBio HiFi sequencing and Hi-C scaffolding. The assemblies show high completeness and k-mer representation. Phylogenomic analyses place Diamesinae as sister to other Chironomidae except Podonominae, and comparisons suggest introgression between the distinct species D. hyperborea and D. tonsa. Comparative genomic analyses across 20 Diptera species identified significant gene family contractions in Diamesa related to oxygen transport and metabolism, consistent with adaptation to high-altitude, low-oxygen environments. Expansions were observed in histone-related and Toll-like receptor gene families, suggesting roles in chromatin remodeling and immune regulation under cold stress. A glucose dehydrogenase gene family was significantly expanded across all cold-adapted species studied, implicating it in cryoprotectant synthesis and oxidative stress mitigation. Diamesa exhibited the largest gene family contraction at any phylogenetic node, with limited overlap in expansions with other cold-adapted Diptera, indicating lineage-specific adaptation.

Conclusions: Our findings support the hypothesis that genome size condensation and selective gene family changes underpin survival in cold environments. These new genome assemblies provide a valuable resource for studying adaptation, speciation, and conservation in cold-specialist insects. Future integration of gene expression and population genomics will further clarify the evolutionary resilience of Diamesa in a warming world.

Citations: 0
Open RGB Imaging Workflow for Morphological and Morphometric Analysis of Fruits using Deep Learning: A Case Study on Almonds.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-12-19 | DOI: 10.1093/gigascience/giaf157
Jorge Mas-Gómez, Manuel Rubio, Federico Dicenta, Pedro José Martínez-García

Background: High-throughput phenotyping is addressing the current bottleneck in phenotyping within breeding programs. Imaging tools are becoming the primary resource for improving the efficiency of phenotyping processes and providing large datasets for genomic selection approaches. The advent of AI brings new advantages by enhancing phenotyping methods using imaging, making them more accessible to breeding programs. In this context, we have developed an open Python workflow for analyzing morphology, colour and morphometric traits using AI, which can be applied to fruits and other plant organs.

Results: The workflow was implemented in almond (Prunus dulcis (Mill.) D. A. Webb), a species where breeding efficiency is critical due to its long breeding cycle. Over 25,000 kernels, more than 20,000 nuts, and over 600 individuals were phenotyped, making this the largest morphological study conducted in almond so far. The best segmentation and reconstruction approaches achieved error rates below 1%. Weight and area variables enabled accurate estimation of kernel thickness, with a root mean squared error (RMSE) of 0.47. Fifty-five heritable morphological, morphometric and colour traits were identified, highlighting their potential as target traits in breeding programs.

Conclusion: The proposed workflow demonstrated robust performance across diverse datasets and was effective with limited training data for fine-tuning. Its compatibility with the output of AI-based labelling tools allows users to fully leverage the advantages of these technologies: reducing manual effort, accelerating dataset preparation, and streamlining the fine-tuning process of segmentation models. This flexibility enhances the scalability and practical applicability of the workflow in real-world phenotyping scenarios, especially in the context of breeding programs.
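
For readers unfamiliar with the error metric quoted in the Results (an RMSE of 0.47 for kernel thickness), the following is a minimal sketch of how a root mean squared error is generally computed. It is not taken from the authors' workflow; the function name `rmse` and the sample values are hypothetical, and the abstract does not state the unit of the reported figure.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences.

    Hypothetical helper for illustration; not part of the published workflow.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # Mean of the squared differences, then the square root.
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up thickness values, used only to show the call.
estimated = [6.1, 5.8, 6.4, 7.0]
measured = [6.0, 6.2, 6.1, 6.6]
print(round(rmse(estimated, measured), 3))
```

Because the squared differences are averaged before the square root is taken, a few large errors raise the RMSE more than many small ones do.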

Citations: 0
The genomes of five mantises provide insights into sex chromosome evolution and Mantodea phylogeny clarification.
IF 11.8 | Zone 2 (Biology) | Q1 MULTIDISCIPLINARY SCIENCES | Pub Date: 2025-12-18 | DOI: 10.1093/gigascience/giaf158
Hangwei Liu, Lihong Lei, Fan Jiang, Bo Zhang, Hengchao Wang, Yutong Zhang, Hanbo Zhao, Guirong Wang, Wei Fan

Background: Praying mantises, members of the order Mantodea, play important roles in agriculture, medicine, bionics, and entertainment. However, the scarcity of genomic resources has hindered extensive studies on mantis evolution and behaviour.

Results: Here, we present the chromosome-scale reference genomes of five mantis species: the European mantis (Mantis religiosa), Chinese mantis (Tenodera sinensis), triangle dead leaf mantis (Deroplatys truncata), orchid mantis (Hymenopus coronatus), and metallic mantis (Metallyticus violacea). The assembled genome sizes range from ∼2.3 to 4.2 Gb, with contig N50 sizes of 1-109 Mb and 85-99% of the sequence anchored to chromosomes. The number of annotated protein-coding genes ranges from 17,804 to 19,017, with BUSCO completeness of 96.7-98.4%. We found that transposable element expansion is the major force governing genome size in Mantodea, and suggest that translocations between the X chromosome and an autosome have occurred in the lineage of the family Mantidae. In addition, we found that the lineage of M. violacea has accumulated fewer substitutions than the lineages of other mantises. Furthermore, our genome-wide analyses showed that D. truncata is sister to H. coronatus rather than to M. religiosa and T. sinensis, which helps resolve the phylogenetic controversies surrounding the genus Deroplatys.

Conclusions: The high-quality genome assemblies of the five mantises provide a valuable resource for evolutionary studies of Mantodea and for the genetic improvement and breeding of beneficial biological control agents.
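
The contig N50 quoted in the Results is a standard assembly contiguity statistic: the length of the contig at which the cumulative length, counted from the longest contig downward, first reaches half of the total assembly length. The sketch below illustrates that general definition only; the function name `n50` and the contig lengths are hypothetical and are not drawn from the mantis assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Hypothetical helper illustrating the standard definition.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths (in Mb), used only to show the call.
contigs = [2, 3, 4, 5, 6, 10]
print(n50(contigs))  # prints 6: 10 + 6 = 16 covers at least half of the 30 Mb total
```

A higher N50 means that more of the assembly sits in long contigs, which is why it is reported alongside the fraction of sequence anchored to chromosomes.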

Citations: 0