
Bioinformatics (Oxford, England): latest articles

SAIGE-GPU: accelerating genome- and phenome-wide association studies using GPUs.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag032
Alex Rodriguez, Youngdae Kim, Tarak Nath Nandi, Karl Keat, Rachit Kumar, Mitchell Conery, Rohan Bhukar, Molei Liu, John Hessington, Ketan Maheshwari, Edmon Begoli, Georgia Tourassi, Sumitra Muralidhar, Pradeep Natarajan, Benjamin F Voight, Kelly Cho, John Michael Gaziano, Scott M Damrauer, Katherine P Liao, Wei Zhou, Jennifer E Huffman, Anurag Verma, Ravi K Madduri

Motivation: Genome-wide association studies (GWAS) at biobank scale are computationally intensive, especially for admixed populations requiring robust statistical models. SAIGE is a widely used method for generalized linear mixed-model GWAS but is limited by its CPU-based implementation, making phenome-wide association studies impractical for many research groups.

Results: We developed SAIGE-GPU, a GPU-accelerated version of SAIGE that replaces CPU-intensive matrix operations with GPU-optimized kernels. The core innovation is distributing genetic relationship matrix calculations across GPUs and communication layers. Applied to 2068 phenotypes from 635 969 participants in the Million Veteran Program, including diverse and admixed populations, SAIGE-GPU achieved a 5-fold speedup in mixed model fitting on supercomputing infrastructure and cloud platforms. We further optimized the variant association testing step through multi-core and multi-trait parallelization. Deployed on Google Cloud Platform and Azure, the method provided substantial cost and time savings.
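To make the idea of splitting genetic relationship matrix (GRM) work across devices concrete, here is a minimal pure-Python sketch. It assumes the GRM is K = Z Zᵀ / M for a standardized genotype matrix Z with M markers, and that each "device" handles one chunk of markers before the partial results are summed. The function names and the chunking scheme are illustrative assumptions; they are not SAIGE-GPU's API, kernels, or communication layer.

```python
# Illustrative sketch only: marker chunks stand in for GPUs.
# Assumes every marker is polymorphic (non-zero variance).

def standardize(genotypes):
    """Center and scale each marker; rows are samples, columns are markers.

    Returns a list of standardized marker columns.
    """
    n_samples = len(genotypes)
    n_markers = len(genotypes[0])
    columns = []
    for j in range(n_markers):
        col = [genotypes[i][j] for i in range(n_samples)]
        mean = sum(col) / n_samples
        sd = (sum((g - mean) ** 2 for g in col) / n_samples) ** 0.5
        columns.append([(g - mean) / sd for g in col])
    return columns


def partial_grm_matvec(marker_columns, v):
    """Unnormalized contribution of one marker chunk to Z Z^T v."""
    out = [0.0] * len(v)
    for col in marker_columns:
        dot = sum(z * x for z, x in zip(col, v))  # z_j^T v
        for i, z in enumerate(col):
            out[i] += z * dot                     # z_j * (z_j^T v)
    return out


def grm_matvec(marker_columns, v, n_chunks):
    """Compute K v = Z Z^T v / M by summing per-chunk partial products."""
    n_markers = len(marker_columns)
    size = -(-n_markers // n_chunks)  # ceiling division
    chunks = [marker_columns[s:s + size] for s in range(0, n_markers, size)]
    partials = [partial_grm_matvec(chunk, v) for chunk in chunks]  # one per "device"
    return [sum(p[i] for p in partials) / n_markers for i in range(len(v))]
```

Because the product is a sum over markers, the chunked result matches the single-chunk result up to floating-point rounding, which is what makes this kind of split possible in principle.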

Availability and implementation: Source code and binaries are available for download at https://github.com/saigegit/SAIGE/tree/SAIGE-GPU-1.3.3. A code snapshot is archived at Zenodo for reproducibility (DOI: 10.5281/zenodo.17642591). SAIGE-GPU is available in a containerized format for use across HPC and cloud environments; it is implemented in R/C++ and runs on Linux systems.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
From genes to trajectories: mapping genetic influences on Huntington's disease progression.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag072
Sanjoy Dey, Zhaonan Sun, John Warner, Eileen Koski, Elif Eyigoz, Swati Sathe, Cristina Sampaio, Jianying Hu

Motivation: There are many diseases with established genetic factors, such as Huntington's disease (HD), that are characterized by variable rates of progression. However, beyond the contribution of the known genetic factors - in this case the Huntingtin (HTT) gene - the impact of the full human genome on the natural progression of such diseases throughout a patient's life remains largely unknown. The increased availability of genome-wide association (GWA) data in HD gene expansion carriers (HDGECs), combined with clinical assessment scores for the same set of patients, has provided an opportunity to assess the potentially broader genetic impact on the natural progression of HD.

Results: We present a genetics-driven, probabilistic disease progression model designed to identify and investigate the ways in which a range of genetic factors affect the natural progression of HD. When applied to a clinico-genomic HD dataset, our model identified several single nucleotide polymorphisms (SNPs) with previously unreported effects on disease progression that act at distinct stages and with varying magnitudes. This discovery may shed light on the potential mechanistic impact of previously unidentified genes on HD that may have implications for clinical management. As increasing amounts of GWA data become available more generally, we anticipate that this modeling framework will be broadly applicable to other diseases with strong genetic components.

Availability and implementation: The source code for IHDPM is available at https://github.com/BiomedSciAI/IHDPM.

Citations: 0
IgPose: a generative data-augmented pipeline for robust immunoglobulin-antigen binding prediction.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag076
Tien-Cuong Bui, Injae Chung, Wonjun Lee, Junsu Ko, Juyong Lee

Motivation: Predicting immunoglobulin-antigen (Ig-Ag) binding remains a significant challenge due to the paucity of experimentally resolved complexes and the limited accuracy of de novo Ig structure prediction.

Results: We introduce IgPose, a generalizable framework for Ig-Ag pose identification and scoring, built on a generative data-augmentation pipeline. To mitigate data scarcity, we constructed the Structural Immunoglobulin Decoy Database (SIDD), a comprehensive repository of high-fidelity synthetic decoys. IgPose integrates equivariant graph neural networks, ESM-2 embeddings, and gated recurrent units to synergistically capture both geometric and evolutionary features. We implemented interface-focused k-hop sampling with biologically guided pooling to enhance generalization across diverse interfaces. The framework comprises two sub-networks, IgPoseClassifier for binding pose discrimination and IgPoseScore for DockQ score estimation, and achieves robust performance on curated internal test sets and the CASP-16 benchmark compared to physics and deep learning baselines. IgPose serves as a versatile computational tool for high-throughput antibody discovery pipelines by providing accurate pose filtering and ranking.
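As a rough illustration of what interface-focused k-hop sampling could look like, the sketch below runs a breadth-first search from a set of seed nodes and keeps every node within k edges. The residue-graph representation, the node labels, and the function name are assumptions made for this example; the abstract does not specify how IgPose builds its graph or how the biologically guided pooling works, and neither is reproduced here.

```python
from collections import deque


def k_hop_nodes(adjacency, seeds, k):
    """Return {node: hop distance} for all nodes within k edges of any seed.

    adjacency: dict mapping a node to an iterable of neighbouring nodes
               (hypothetical residue-contact graph).
    seeds:     nodes treated as the interface (hop distance 0).
    """
    dist = {s: 0 for s in seeds}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical chain of residues; "H:100" plays the role of an interface residue.
graph = {
    "H:100": ["H:101"],
    "H:101": ["H:100", "H:102"],
    "H:102": ["H:101", "H:103"],
    "H:103": ["H:102"],
}
subgraph = k_hop_nodes(graph, ["H:100"], k=2)
```

Restricting the model's input to such a neighbourhood keeps the focus on the binding interface rather than the whole complex.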

Availability and implementation: IgPose is available on GitHub (https://github.com/arontier/igpose).

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Joint modeling of longitudinal biomarker and survival outcomes with the presence of competing risk in the nested case-control studies with application to the TEDDY microbiome dataset.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag038
Yanan Zhao, Ting-Fang Lee, Boyan Zhou, Chan Wang, Ann Marie Schmidt, Mengling Liu, Huilin Li, Jiyuan Hu

Motivation: Large-scale prospective cohort studies collect longitudinal biospecimens alongside time-to-event outcomes to investigate biomarker dynamics in relation to disease risk. The nested case-control (NCC) design provides a cost-effective alternative to full cohort biomarker studies while preserving statistical efficiency. Despite advances in joint modeling for longitudinal and time-to-event outcomes, few approaches address the unique challenges posed by NCC sampling, non-normally distributed biomarkers, and competing survival outcomes.

Results: Motivated by the TEDDY study, we propose "JM-NCC", a joint modeling framework designed for NCC studies with competing events. It integrates a generalized linear mixed-effects model for potentially non-normally distributed biomarkers with a cause-specific hazard model for competing risks. Two estimation methods are developed: fJM-NCC leverages NCC sub-cohort longitudinal biomarker data together with full-cohort survival and clinical metadata, while wJM-NCC uses only NCC sub-cohort data. Both simulation studies and an application to the TEDDY microbiome dataset demonstrate the robustness and efficiency of the proposed methods.

Availability and implementation: Software is available at https://github.com/Zhaoyn-oss/JMNCC and archived on Zenodo at https://zenodo.org/records/18199759 (DOI: 10.5281/zenodo.18199759).

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
scMix: learning temporal dynamics of gene expression under irregular time intervals.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag080
Shangjin Han, Dongsup Kim

Motivation: Understanding temporal gene expression is fundamental in the study of cellular development and differentiation. In practice, temporal single-cell datasets tend to contain only a limited number of measured time points, which are often unevenly spaced, resulting in irregular intervals between observations due to experimental constraints. Existing methods typically address these intervals by sequentially predicting one time point after another, yet lack mechanisms to explicitly model time intervals, leading to error accumulation.

Results: In this work, we introduce scMix, a language-model-based framework for predicting single-cell gene expression, which enables prediction from multiple historical time points. We build scMix on the Receptance Weighted Key Value (RWKV) architecture and use its time decay mechanism to model temporal dependencies. Moreover, scMix introduces a delta-time mechanism that allows the model to bypass unmeasured time points, reducing error accumulation and improving robustness. In addition, we incorporate a trend regularization strategy to enhance the temporal coherence of predicted gene expression trajectories. scMix demonstrates state-of-the-art performance in predicting gene expression at unmeasured time points, surpassing existing methods, and also achieves outstanding results on downstream tasks.
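The intuition behind weighting history by elapsed time, rather than by step index, can be shown with a small sketch. It assumes a single scalar decay rate and a plain exponentially weighted average over past measurements; this is a simplification loosely inspired by time-decay weighting and is not scMix's published formulation or the RWKV recurrence. The function name and parameters are hypothetical.

```python
import math


def decayed_summary(times, values, target_time, decay):
    """Weighted average of past values with weights exp(-decay * elapsed time).

    times:       measurement times, possibly unevenly spaced
    values:      one expression value per measurement time
    target_time: time at which a prediction is wanted (>= all measurement times)
    decay:       non-negative decay rate; 0 weights all time points equally
    """
    weights = [math.exp(-decay * (target_time - t)) for t in times]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Irregular sampling: the gap between the last two measurements is three
# times the first gap, and the target lies two time units after the last one.
summary = decayed_summary(times=[0.0, 1.0, 4.0], values=[1.0, 2.0, 6.0],
                          target_time=6.0, decay=0.5)
```

Because the weights depend on real time gaps, the target can sit any distance from the last measurement without predicting the intermediate, unmeasured points one after another.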

Availability and implementation: The code used for this study is available at https://doi.org/10.5281/zenodo.18287184.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
SeOMLR: one-step multi-view latent representation with self-weighted ensemble learning for multi-omics cancer subtyping.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag074
Wenjing Song, Yesen Sun, Le Ou-Yang

Motivation: Accurate cancer subtyping is critically important for cancer treatment due to significant molecular heterogeneity. While existing methods with multi-omics integration have achieved some success in cancer subtype identification by leveraging the rich information provided by multi-omics data, most approaches remain limited by an overemphasis on cross-omics consistency at the expense of intra-omics specificity. Furthermore, a two-step scheme is often adopted to extract cluster structure from a consistency matrix or a continuous indicator matrix by k-means, which inevitably leads to information loss and unstable clusters.

Results: To overcome these issues, we propose seOMLR, a one-step multi-view latent representation method with self-weighted ensemble learning for cancer subtyping. Using relaxed exclusivity constraints and consistency regularization terms, seOMLR exploits the specificity and consistency of multi-omics data by building a sparse low-rank self-representation framework. Simultaneously, a self-weighted ensemble strategy is introduced to adaptively incorporate prior subtyping information from other methods, indirectly promoting specificity and consistency learning. Moreover, the discrete clustering structure is subsequently extracted via spectral rotation to avoid information loss and cluster instability. Through joint iterative optimization of fusion and clustering, seOMLR enhances subtyping accuracy. Experiments on both simulated datasets and eight real multi-omics cancer datasets from TCGA demonstrate that seOMLR outperforms competing methods, achieving efficient multi-omics data fusion and providing computational framework support for cancer subtyping research.

Availability and implementation: Supplementary data are available at Bioinformatics online.

Citations: 0
SummArIzeR: simplifying cross-database enrichment result clustering and annotation via large language models.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag102
Marie Brinkmann, Michael Bonelli, Anela Tosevska

Motivation: Enrichment analysis across multiple databases often results in a high level of redundancy due to overlapping terms, complicating the interpretation of biological data. To address this, we developed SummArIzeR, an R package to cluster and annotate enrichment results across multiple databases, enabling fast, intuitive interpretation and comparison across multiple conditions. SummArIzeR clusters enrichment results based on shared genes, calculates a pooled P-value for each cluster, and facilitates cluster annotation using large language models. It further allows an easily interpretable visualization of the results.
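The two computational steps described here, grouping terms by shared genes and pooling P-values per cluster, can be sketched as follows. The abstract does not say which similarity measure, clustering algorithm, or pooling method SummArIzeR uses, so this example assumes Jaccard similarity with single-linkage grouping and Fisher's method. The sketch is written in Python for illustration only (the package itself is in R), and all function names are hypothetical. Gene sets are assumed to be non-empty, and Fisher's method assumes independent P-values, which overlapping terms only approximate.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two non-empty gene collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def cluster_terms(term_genes, threshold):
    """Group terms whose gene sets reach the Jaccard threshold (single linkage)."""
    terms = list(term_genes)
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, s in enumerate(terms):
        for t in terms[i + 1:]:
            if jaccard(term_genes[s], term_genes[t]) >= threshold:
                parent[find(s)] = find(t)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(members) for members in clusters.values())


def fisher_pooled_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!, so no statistics library is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical enrichment terms from different databases.
term_genes = {
    "GO:inflammatory response": ["TNF", "IL6", "IL1B"],
    "KEGG:cytokine signaling": ["TNF", "IL6", "CXCL8"],
    "GO:p53 signaling": ["TP53", "MDM2"],
}
clusters = cluster_terms(term_genes, threshold=0.5)
```

With this toy input the two cytokine-related terms share two of four genes and fall into one cluster, while the p53 term stays on its own.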

Results: Compared to existing tools, SummArIzeR provides unbiased and fast cluster annotation using large language models. We demonstrate that SummArIzeR achieves clustering comparable to manual curation while offering superior grouping based on shared underlying genes.

Availability and implementation: The SummArIzeR package is available as an open-source R package, with a comprehensive user manual provided in its GitHub repository: https://github.com/bonellilab/SummArIzeR.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Mamba6mA: a Mamba-based DNA N6-methyladenine site prediction model.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag060
Qi Zhao, Zhen Zhang, Tingwei Chen, Qian Mao, Haoxuan Shi, Jingjing Chen, Zheng Zhao, Xiaoya Fan

Motivation: N6-methyladenine (6mA) is an important epigenetic modification of DNA that regulates biological processes such as gene expression, transcription, replication, DNA repair, and the cell cycle without altering the DNA sequence. It also plays a key role in many diseases, including cancer and autoimmune diseases. Although experimental approaches such as SMRT sequencing and methylated DNA immunoprecipitation can identify 6mA sites, they suffer from drawbacks including suboptimal sequencing quality, low signal-to-noise ratios, high costs, and time-consuming procedures. In recent years, deep learning approaches have demonstrated significant advantages in predicting 6mA sites; however, their generalization ability still requires further improvement.

Results: Inspired by the state space model Mamba, we propose a novel model for 6mA site prediction, named Mamba6mA. In Mamba6mA, we design position-specific linear layers to replace traditional convolutional layers and thereby facilitate the capture of position-specific information. Meanwhile, we construct a multi-scale feature extraction module that integrates features captured by sliding windows of different scales and feeds them into the classifier for prediction. Experimental results show that Mamba6mA achieves the best Matthews correlation coefficient (MCC) on 9 out of 11 species datasets, surpassing existing state-of-the-art models. Ablation studies confirm that the position-specific linear layers and the multi-scale fusion module contribute MCC performance gains of 2.36% and 2.31%, respectively. Feature visualization analysis further reveals that the model effectively captures sequence patterns upstream and downstream of 6mA sites, providing a new technical approach for studying epigenetic modification mechanisms.

Availability and implementation: The source code for Mamba6mA is available at: https://github.com/XploreAI-Lab/Mamba6mA.
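
To make the idea of a position-specific linear layer concrete, the following pure-Python sketch applies a separate weight matrix at each sequence position, in contrast to a convolution, which reuses one kernel everywhere. It is an illustration only and is not taken from the Mamba6mA repository: the function names, the one-hot encoding, the toy weights, and the output dimension are all assumptions.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def linear(vec, weight, bias):
    """Dense layer for one vector: weight has shape (len(vec), len(bias))."""
    return [
        sum(v * w_row[j] for v, w_row in zip(vec, weight)) + bias[j]
        for j in range(len(bias))
    ]


def position_specific_linear(x, weights, biases):
    """Apply a different (weight, bias) pair at every sequence position.

    x:       list of L feature vectors
    weights: list of L matrices, one per position (hypothetical toy values)
    biases:  list of L bias vectors, one per position
    """
    return [linear(x[l], weights[l], biases[l]) for l in range(len(x))]


# Toy example: the same base 'A' at three positions, one output feature.
x = one_hot("AAA")
weights = [[[float(l + 1)], [0.0], [0.0], [0.0]] for l in range(3)]
biases = [[0.0] for _ in range(3)]
out = position_specific_linear(x, weights, biases)
```

Because each position owns its parameters, the identical input base yields a different output at each position, which a weight-shared convolution kernel of width one could not do.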

Contact: Xiaoya Fan (xiaoyafan@dlut.edu.cn), Zheng Zhao (zhaozheng@dlmu.edu.cn).

Supplementary information: Supplementary information is available at Bioinformatics online.
Citations: 0
PScnv: personalized self-normalizing CNV detection with a hierarchical multi-phase framework.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag099
Xuwen Wang, Zhili Chang, Wansheng Lv, Akhatov Akmal, Xamidov Munis, Xunbiao Liu, Shenjie Wang, Xiaoyan Zhu, Chong Du, Shuqun Zhang, Jiayin Wang

Motivation: Accurate detection of copy number variations (CNVs) from targeted panel sequencing remains challenging due to limited genomic coverage and pronounced sample-specific biases. Existing normalization strategies, including baseline-cohort, matched-control, and single-sample approaches, often struggle to balance noise suppression with adaptability, leading to inconsistent performance across heterogeneous samples.

Results: We present PScnv, a personalized self-normalizing framework for robust CNV detection from panel sequencing data. PScnv integrates a pre-built panel-of-normals (PoN) with sample-intrinsic stable chromosomes through ridge-regression normalization to generate individualized log2 ratio profiles with reduced systematic variation. CNVs are then identified using a hierarchical multi-phase segmentation pipeline incorporating z-score pre-partitioning, kernel-based correction, and circular binary segmentation. In 139 clinical tumor samples with orthogonal FISH validation at MET, ERBB2, and MTAP, PScnv showed improved accuracy and robustness over existing methods that do not require patient-matched normal samples, provided that a pre-built PoN cohort is available.

Availability: Source code is available for academic use at https://github.com/lvws/PScnv.
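
As a rough illustration of the normalization idea in this abstract, the sketch below computes per-bin log2 ratios against a panel-of-normals median and then scores each bin against bins assumed to be copy-neutral. It is a deliberately simplified stand-in: it uses a plain median reference rather than PScnv's ridge-regression normalization, omits kernel-based correction and circular binary segmentation, and every function name, depth value, and the choice of stable bins is hypothetical.

```python
import math
from statistics import mean, median, stdev


def log2_ratios(sample_depth, pon_depths):
    """Per-bin log2(sample / reference), with the reference taken as the
    per-bin median across the panel of normals (a simplification)."""
    reference = [median(col) for col in zip(*pon_depths)]
    return [math.log2(s / r) for s, r in zip(sample_depth, reference)]


def z_scores(ratios, stable_idx):
    """Score every bin against the bins assumed to be copy-neutral."""
    baseline = [ratios[i] for i in stable_idx]
    mu, sd = mean(baseline), stdev(baseline)
    return [(r - mu) / sd for r in ratios]


# Hypothetical read depths for six bins; the last two mimic an amplification.
sample_depth = [100, 104, 96, 100, 400, 410]
pon_depths = [
    [100, 100, 100, 100, 100, 100],
    [110, 110, 110, 110, 110, 110],
    [90, 90, 90, 90, 90, 90],
]
stable_idx = [0, 1, 2, 3]  # bins assumed stable in this sample

ratios = log2_ratios(sample_depth, pon_depths)
z = z_scores(ratios, stable_idx)
candidate_bins = [i for i, value in enumerate(z) if abs(value) > 3]
```

Bins with large absolute z-scores would then be handed to a segmentation step; here the two high-depth bins stand out while the stable bins do not.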

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Topological model selection: a case-study in tumour-induced angiogenesis.
IF 5.4 Pub Date : 2026-02-28 DOI: 10.1093/bioinformatics/btag065
Robert A McDonald, Helen M Byrne, Heather A Harrington, Thomas Thorne, Bernadette J Stolz

Motivation: Comparing mathematical models offers a means to evaluate competing scientific theories. However, exact methods of model calibration are not applicable to many probabilistic models which simulate high-dimensional spatio-temporal data. Approximate Bayesian Computation is a widely used method for parameter inference and model selection in such scenarios, and it may be combined with Topological Data Analysis to study models which simulate data with fine spatial structure.

Results: We develop a flexible pipeline for parameter inference and model selection in spatio-temporal models. Our pipeline identifies topological summary statistics which quantify spatio-temporal data and uses them to approximate parameter and model posterior distributions. We validate our pipeline on models of tumour-induced angiogenesis, inferring four parameters in three established models and identifying the correct model in synthetic test-cases.

Availability and implementation: Simulation code for all models, data analyses, parameter inference and model selection is available online at https://github.com/rmcdomaths/tms/ and archived at https://doi.org/10.5281/zenodo.17392787.
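
Approximate Bayesian Computation, as mentioned in this abstract, can be sketched in a few lines: draw parameters from the prior, simulate data, reduce both simulated and observed data to summary statistics, and keep the draws whose summaries lie closest to the observed one. The example below is a generic toy version and not the authors' pipeline: the Gaussian simulator stands in for an angiogenesis model, the sample mean stands in for a topological summary statistic, and the prior range, simulation count, and acceptance count are arbitrary assumptions.

```python
import random
from statistics import mean


def simulate(theta, n, rng):
    """Toy simulator (placeholder for a spatio-temporal model)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Placeholder summary statistic; the paper uses topological summaries."""
    return mean(data)


def abc_rejection(observed, prior_low, prior_high, n_sims, n_keep, rng):
    """Keep the n_keep prior draws whose simulated summary is closest to
    the observed summary (quantile-based ABC rejection)."""
    s_obs = summary(observed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(prior_low, prior_high)
        simulated = simulate(theta, len(observed), rng)
        draws.append((abs(summary(simulated) - s_obs), theta))
    draws.sort()
    return [theta for _, theta in draws[:n_keep]]


rng = random.Random(0)  # fixed seed so the run is repeatable
observed = simulate(3.0, 50, rng)  # synthetic "data" with true theta = 3
posterior = abc_rejection(observed, 0.0, 10.0, 2000, 50, rng)
posterior_mean = mean(posterior)
```

The accepted draws approximate the posterior over the parameter. Model selection follows the same pattern if each draw also samples a model index and the accepted indices are tallied.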

Citations: 0