
Bioinformatics (Oxford, England): latest articles

Spider: a flexible and unified framework for simulating spatial transcriptomics data.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf562
Jiyuan Yang, Nana Wei, Yang Qu, Congcong Hu, Weiwei Zhang, Lin Liu, Hua-Jun Wu, Xiaoqi Zheng

Motivation: Spatial transcriptomics (ST) technologies provide valuable insights into cellular heterogeneity by simultaneously acquiring gene expression profiles and cellular location information. However, the limited diversity and accuracy of "gold standard" datasets have hindered effective and fair benchmarking of the rapidly growing number of ST analysis tools.

Results: To address this issue, we propose Spider, a flexible and comprehensive framework for simulating ST data without requiring real ST data as a reference. By characterizing spatial patterns with cell type proportions and a transition matrix between adjacent cells, Spider can produce more realistic and diverse simulated data and offers greater modeling flexibility than existing simulation methods. Additionally, Spider provides interactive features for customizing the spatial domain, such as zone segmentation and integration of histology imaging data. Benchmark analyses demonstrate that Spider outperforms other simulation tools in preserving the spatial characteristics of real ST data and in facilitating the evaluation of downstream analysis methods. Spider is implemented in Python and available at https://github.com/YANG-ERA/Spider.

Availability and implementation: All code and simulated ST data in this paper are publicly available at https://github.com/YANG-ERA/Spider.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
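The Spider abstract says spatial patterns are characterized by cell type proportions and a transition matrix between adjacent cells. The sketch below shows one way to compute those two statistics from a labelled grid. It is an illustration under assumptions (a square grid, 4-neighbour adjacency, integer type labels, and a hypothetical `spatial_summary` helper), not Spider's API or its simulation procedure.

```python
import numpy as np

def spatial_summary(labels, n_types):
    """Summarize a 2D grid of integer cell-type labels by (i) cell-type
    proportions and (ii) a row-normalized matrix of adjacent-cell transitions.
    Hypothetical helper for illustration; not part of Spider."""
    labels = np.asarray(labels, dtype=int)
    proportions = np.bincount(labels.ravel(), minlength=n_types) / labels.size

    counts = np.zeros((n_types, n_types))
    # Horizontal and vertical neighbour pairs (4-neighbourhood), counted in
    # both directions so each adjacency contributes symmetrically.
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        np.add.at(counts, (a.ravel(), b.ravel()), 1)
        np.add.at(counts, (b.ravel(), a.ravel()), 1)

    # Row i: probability that a neighbour of a type-i cell has type j.
    row_sums = counts.sum(axis=1, keepdims=True)
    transition = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return proportions, transition

# A checkerboard: half of the cells are each type, and every neighbour differs.
checkerboard = np.indices((4, 4)).sum(axis=0) % 2
props, trans = spatial_summary(checkerboard, n_types=2)
```

A simulator in this spirit would then generate layouts whose proportions and neighbour transitions match user-chosen targets; how Spider does that is described in the paper, not here.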
CholeraSeq: a comprehensive genomic pipeline for cholera surveillance and near real-time outbreak investigation.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf665
Massimiliano S Tagliamonte, Abhinav Sharma, Alberto Riva, Monika Moir, Marco Salemi, Cheryl Baxter, Tulio de Oliveira, Carla N Mavian, Eduan Wilkinson

Summary: Next-generation sequencing is widely deployed in cholera-endemic regions, yet an end-to-end reproducible pipeline that unifies read QC, filtering, reference mapping, variant calling/annotation, recombination screening, extraction of parsimony-informative sites/variant codons, and phylogenetic inference for downstream phylodynamic and epidemiological analyses has been lacking, slowing outbreak investigation and public health response. CholeraSeq is a high-throughput genomics pipeline for cholera genomic surveillance. It ingests consensus genomes, short-read sequence data, and draft assemblies, and scales seamlessly from local to cloud environments. To accelerate placement of new outbreak strains in their epidemiological context, we provide a curated, ready-to-use core genome alignment compiled from public data, enabling flexible, fast integration of new samples for outbreak investigations.

Availability and implementation: CholeraSeq is freely available on the GitHub platform https://github.com/CERI-KRISP/CholeraSeq. CholeraSeq is implemented in Nextflow with a modular design building upon the nf-core community standards.

Supplementary information: A ready-to-use reference core alignment and associated metadata are available at https://doi.org/10.5281/zenodo.16909942.
Citations: 0
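One step listed in the CholeraSeq summary is extraction of parsimony-informative sites. Under the standard definition, a column is parsimony-informative when at least two different character states each occur in at least two sequences. The function below is a minimal stand-alone sketch of that rule; its name and the choice of ignored characters (`-`, `N`, `?`) are assumptions, and it is not CholeraSeq's implementation, which runs inside a Nextflow pipeline.

```python
from collections import Counter

def parsimony_informative_sites(alignment, ignore="-N?"):
    """Return 0-based column indices of parsimony-informative sites.

    A site is informative when at least two different character states
    each occur in at least two sequences. Gaps and ambiguous characters
    listed in `ignore` are skipped (an assumption for this sketch).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned (equal length)")
    informative = []
    for col in range(length):
        states = Counter(seq[col].upper() for seq in alignment
                         if seq[col].upper() not in ignore)
        # Count how many states are shared by two or more sequences.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative.append(col)
    return informative

aln = ["AAGT", "AAGT", "ATGT", "ATCC"]
print(parsimony_informative_sites(aln))  # [1]
```

Columns 2 and 3 vary, but only through a single sequence (a singleton), so they carry no grouping information for parsimony and are excluded.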
Building multiscale Markov state models by systematic mapping of temporal communities.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf585
Nir Nitskansky, Kessem Clein, Barak Raveh

Motivation: Biomolecules undergo dynamic transitions among metastable states to carry out their biological functions. Markov State Models (MSMs) effectively capture these metastable states and transitions at a defined temporal scale. However, biomolecular dynamics typically span multiple temporal scales, ranging from fast atomic vibrations to slower conformational changes and folding events.

Results: We introduce multiscale Markov State Models (mMSMs), which capture biomolecular dynamics across multiple temporal resolutions simultaneously via a hierarchy of MSMs, and mMSM-explore, an unsupervised algorithm for generating mMSMs through multiscale adaptive sampling with on-the-fly identification of temporally metastable states. We benchmark our method on a toy system with nested energy minima; on alanine dipeptide, first with and then without assuming prior knowledge of its two reaction coordinates; and finally, on a fast-folding 35-residue miniprotein, where we map folding pathways across scales. We demonstrate efficient mapping of energy landscapes, correct representation of multiscale hierarchies and transition states, accurate inference of stationary probabilities and transition kinetics, as well as de novo identification of underlying slow, intermediate, and fast reaction coordinates. mMSMs reveal how dynamic processes at different scales contribute collectively to the functional mechanisms of biomolecular machines.

Availability and implementation: Python code and instructions are available at https://github.com/ravehlab/mMSM.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
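The mMSM abstract builds on ordinary Markov State Models, which estimate a transition matrix at one lag time and infer stationary probabilities from it. The sketch below shows that basic estimation from a discretized trajectory, plus a stationary-weighted lumping of microstates into macrostates as a simple picture of one coarser level in a hierarchy. The function names are hypothetical, detailed balance is imposed by symmetrizing counts (an assumption), and this is not the mMSM-explore algorithm, which builds the hierarchy through adaptive sampling.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag=1):
    """Estimate a reversible MSM transition matrix and its stationary
    distribution from a discretized trajectory at a given lag time."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1)
    counts = counts + counts.T              # symmetrize: enforces detailed balance
    visits = counts.sum(axis=1)
    if np.any(visits == 0):
        raise ValueError("every state must be visited at this lag")
    T = counts / visits[:, None]
    pi = visits / visits.sum()              # stationary for symmetrized counts
    return T, pi

def coarse_grain(T, pi, membership, n_macro):
    """Lump microstates into macrostates, weighting rows by stationary probability."""
    M = np.zeros((len(pi), n_macro))
    M[np.arange(len(pi)), membership] = 1   # one-hot microstate -> macrostate map
    pi_macro = pi @ M
    T_macro = (M.T @ (pi[:, None] * T) @ M) / pi_macro[:, None]
    return T_macro, pi_macro

dtraj = [0, 1, 0, 1, 1, 0, 2, 3, 3, 2, 3, 2, 0, 1]
T, pi = estimate_msm(dtraj, n_states=4, lag=1)
T2, pi2 = coarse_grain(T, pi, membership=[0, 0, 1, 1], n_macro=2)
```

Repeating the estimate at several lag times, or lumping at several granularities, gives a crude multi-resolution view; the paper's method instead identifies temporally metastable communities on the fly.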
Reconstructing and comparing signal transduction networks from single-cell protein quantification data.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf675
Tim Stohn, Roderick A P M van Eijl, Klaas W Mulder, Lodewyk F A Wessels, Evert Bosdriesz

Motivation: Signal transduction networks regulate many essential biological processes and are frequently aberrant in diseases such as cancer. A mechanistic understanding of such networks, and of how they differ between cell populations, is essential for designing effective treatment strategies. Typically, such networks are reconstructed computationally from systematic perturbation experiments followed by quantification of signaling protein activity. Recent technological advances now allow the activity of many (signaling) proteins to be quantified simultaneously in single cells. This makes it feasible to reconstruct or quantify signaling networks without performing systematic perturbations.

Results: Here, we introduce single-cell modular response analysis (scMRA) and single-cell comparative network reconstruction (scCNR) to derive signal transduction networks by exploiting the heterogeneity of single-cell (phospho-)protein measurements. The methods treat stochastic variation in total protein abundances as natural perturbation experiments, whose effects propagate through the network and hence facilitate the reconstruction and quantification of the underlying signaling network. scCNR reconstructs cell population-specific networks, in which cells from different populations share the same underlying topology but interaction strengths can differ between populations. We extensively validated scMRA and scCNR on simulated data and applied them to unpublished (phospho-)protein measurements of EGFR-inhibitor-treated keratinocytes to recover signaling differences downstream of EGFR. scCNR will help unravel mechanistic signaling differences between cell populations and subsequently guide the development of well-informed treatment strategies.

Availability and implementation: The code used for scCNR in this study has been deposited on Zenodo (https://doi.org/10.5281/zenodo.17600937) and is also available as a Python module at https://github.com/ibivu/scmra. Additionally, data and code to reproduce all figures are available at https://github.com/tstohn/scmra_analysis.

Citations: 0
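scMRA extends modular response analysis (MRA). In classic MRA, the local response matrix r (direct interaction strengths, with r_ii = -1) is recovered from a global response matrix R measured after perturbing each module, via r = -[dg(R^-1)]^-1 R^-1. The sketch below shows only that classic computation on a toy three-module cascade. It is not the scMRA estimator, which, per the abstract, uses natural cell-to-cell variation in total protein instead of explicit perturbations; the function name and toy values are hypothetical.

```python
import numpy as np

def local_response_from_global(R):
    """Classic MRA: recover the local response matrix r (with r_ii = -1)
    from a global response matrix R via r = -[dg(R^-1)]^-1 R^-1."""
    R_inv = np.linalg.inv(R)
    # Dividing row i by the i-th diagonal entry applies [dg(R^-1)]^-1.
    return -R_inv / np.diag(R_inv)[:, None]

# Toy cascade (hypothetical values): module 0 activates 1, module 1 activates 2.
r_true = np.array([[-1.0, 0.0, 0.0],
                   [0.8, -1.0, 0.0],
                   [0.0, 0.5, -1.0]])
# Global responses to perturbations of unknown strength s_j (one per column).
s = np.array([0.3, 1.2, 2.0])
R = -np.linalg.inv(r_true) @ np.diag(s)
r_est = local_response_from_global(R)
```

The recovered r does not depend on the perturbation strengths s, which is why MRA only needs the pattern of global responses, not calibrated perturbations.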
PanForest: predicting genes in genomes using random forests.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btag005
Alan J S Beavan, Maria Rosa Domingo-Sananes, James O McInerney

Motivation: The presence or absence of some genes in a genome can influence whether other genes are likely to be present or absent. Understanding these gene co-occurrence and avoidance patterns reveals fundamental principles of genome organization, with applications ranging from evolutionary reconstruction to rational design of synthetic genomes.

Results: PanForest, presented here, uses random forest classifiers to predict the presence and absence of genes in genomes from the set of other genes present. Performance statistics output by PanForest reveal how predictable each gene's presence or absence is, based on the presence or absence of other genes in the genome. Further, PanForest produces statistics indicating the importance of each gene in predicting the presence or absence of each other gene. The PanForest software can run serially or in parallel, thereby facilitating the analysis of pangenomes at Network of Life scale. A pangenome of 12 741 accessory genes in 1000 Escherichia coli genomes was analysed in around 5 h using eight processors. To demonstrate PanForest's utility, we present a case study and show that certain genes associated with resistance to antimicrobial drugs reliably predict the presence or absence of other genes associated with resistance to the same drug. Further, we highlight several associations between those genes and others not known to be associated with antimicrobial resistance (AMR), or associated with resistance to other drugs. We envisage PanForest's use in studies from multiple disciplines concerning the dynamics of gene distributions in pangenomes, ranging from biomedical science and synthetic biology to molecular ecology.

Availability and implementation: The software is freely available, with a full manual, at www.github.com/alanbeavan/PanForest (DOI: https://doi.org/10.5281/zenodo.17865482).

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
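The PanForest abstract describes predicting each gene's presence or absence from all other genes with a random forest, then reporting predictability and per-gene importances. The sketch below reproduces that idea for a single target gene with scikit-learn's `RandomForestClassifier` on synthetic data. The helper `predict_gene`, the simple head/tail train-test split, and the synthetic matrix are assumptions for illustration; PanForest's own command-line interface and output files are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def predict_gene(pa, target, n_trees=200, train_frac=0.7, seed=0):
    """Predict presence/absence of gene `target` from all other genes.

    pa: genomes x genes matrix of 0/1 presence calls (hypothetical input).
    Returns held-out accuracy, the indices of the predictor genes, and
    their impurity-based importances.
    """
    y = pa[:, target]
    others = np.array([g for g in range(pa.shape[1]) if g != target])
    X = pa[:, others]
    n_train = int(train_frac * len(y))      # assumes rows are already in random order
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(X[:n_train], y[:n_train])
    accuracy = clf.score(X[n_train:], y[n_train:])
    return accuracy, others, clf.feature_importances_

# Synthetic pangenome: 300 genomes, 8 genes; gene 7 always co-occurs with gene 2.
rng = np.random.default_rng(0)
pa = rng.integers(0, 2, size=(300, 8))
pa[:, 7] = pa[:, 2]
acc, others, imp = predict_gene(pa, target=7)
best_predictor = others[np.argmax(imp)]     # expected to be gene 2
```

Looping this over every gene, ideally with shuffling or cross-validation, yields a gene-by-gene importance matrix of the kind the abstract describes.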
PLM-eXplain: divide and conquer the protein embedding space.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf631
Jan van Eck, Dea Gogishvili, Wilson Silva, Sanne Abeln

Motivation: Protein language models (PLMs) have revolutionized computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. Bridging this gap requires approaches that maintain predictive performance while providing interpretable explanations of model behaviour.

Results: We present PLM-eXplain (PLM-X), an explainable adapter layer that bridges this gap by factoring PLM embeddings into two complementary components: an interpretable subspace based on established biochemical features, and a residual subspace that retains predictive, non-interpretable information. Using embeddings from ESM2 and ProtBert, PLM-X incorporates well-established properties, including secondary structure and hydropathy, while maintaining high predictive performance. We demonstrate the effectiveness of our approach across three biologically relevant classification tasks: extracellular vesicle association, transmembrane helix prediction, and aggregation propensity prediction. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalizable solution for enhancing PLM interpretability across various downstream applications.

Availability and implementation: Source code and models are available at https://github.com/AIT4LIFE-UU/PLM-eXplain/.

Supplementary information: Additional data are available in the online supplementary material.
Citations: 0
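PLM-X factors PLM embeddings into an interpretable subspace tied to known biochemical features and a residual subspace holding the remaining information. The sketch below is a purely linear analogue of that split, not the paper's learned adapter layer. It fits least-squares directions that predict the known features from the embeddings, takes an orthonormal basis of those directions as the "interpretable" subspace, and keeps the orthogonal remainder as the residual. The function name, the linear model, and the random data are all assumptions.

```python
import numpy as np

def split_embedding(E, F):
    """Split embeddings E (n x d) into a part linearly aligned with known
    features F (n x k) and an orthogonal residual part. Linear sketch only."""
    W, *_ = np.linalg.lstsq(E, F, rcond=None)   # d x k directions predicting F
    Q, _ = np.linalg.qr(W)                      # orthonormal basis of their span
    E_interp = E @ Q @ Q.T                      # projection onto interpretable subspace
    E_resid = E - E_interp                      # component orthogonal to it
    return E_interp, E_resid, Q

# Random stand-ins for embeddings and two interpretable features (e.g. hydropathy).
rng = np.random.default_rng(1)
E = rng.normal(size=(100, 32))
F = E @ rng.normal(size=(32, 2)) + 0.1 * rng.normal(size=(100, 2))
E_interp, E_resid, Q = split_embedding(E, F)
```

By construction the two parts sum back to the original embedding, and the residual carries no linear signal along the feature directions, which mirrors the "interpretable plus residual" decomposition at a much simpler level.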
isoSeQL: comparing long-read isoforms across multiple datasets.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf680
Christine S Liu, Jerold Chun

Motivation: Long-read sequencing has made RNA isoform detection and characterization more accessible. While several bioinformatics tools have been developed to examine the data generated by these approaches, a major challenge in the field has been comparing isoform profiles across several samples.

Results: We developed isoSeQL, a tool for compiling long-read transcriptomic data, identifying common and unique isoforms across multiple samples, and extracting and visualizing various metrics. isoSeQL will augment approaches that utilize long-read sequencing to discover novel isoforms and to examine how isoforms vary across different experimental and biological conditions and cell types. We demonstrate how to use isoSeQL with publicly available datasets.

Availability and implementation: isoSeQL is available on GitHub (https://github.com/christine-liu/isoSeQL) and Zenodo (https://doi.org/10.5281/zenodo.15717809).

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Efficient algorithms for simulating sequences along a phylogenetic tree.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf686
Elya Wygoda, Asher Moshe, Nimrod Serok, Edo Dotan, Noa Ecker, Naiel Jabareen, Omer Israeli, Itsik Pe'er, Tal Pupko

Motivation: Sequence simulations along phylogenetic trees play an important role in numerous molecular evolution studies, for example in benchmarking algorithms for ancestral sequence reconstruction, multiple sequence alignment, and phylogeny inference. They are also used in phylogenetic model-selection tasks, including the inference of selective forces. Recently, approaches based on Approximate Bayesian Computation (ABC) have been developed to infer the parameters of complex evolutionary models; these approaches rely on generating massive amounts of simulated data. For all these applications, computationally efficient sequence simulators are essential.
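
ABC avoids computing likelihoods by relying on repeated simulation. It draws parameters from a prior, simulates data for each draw, and keeps the draws whose summary statistics lie closest to the observed ones. Every draw therefore costs one full simulation, which is why simulator speed dominates the total cost. The sketch below shows minimal rejection ABC on a deliberately simple toy indel model. The model, its single rate parameter, the choice of leaf-length variance as the summary statistic, and all function names are hypothetical. None of them describe SpartaABC's model, summary statistics or acceptance scheme.

```python
import random
import statistics

def leaf_lengths(rate, n_leaves, root_len, max_indel, rng):
    """Toy model (hypothetical): star tree with unit branch lengths; insertions
    and deletions each occur at `rate` events per unit time, sizes uniform."""
    lengths = []
    for _ in range(n_leaves):
        length, t = root_len, rng.expovariate(2 * rate)  # total indel rate = 2 * rate
        while t < 1.0:
            size = rng.randint(1, max_indel)
            length = length + size if rng.random() < 0.5 else max(0, length - size)
            t += rng.expovariate(2 * rate)
        lengths.append(length)
    return lengths

def rejection_abc(observed, n_sims=1000, accept_frac=0.05, prior=(0.1, 20.0),
                  root_len=500, max_indel=10, seed=1):
    """Rejection ABC for the indel rate; summary statistic = variance of leaf lengths."""
    rng = random.Random(seed)
    s_obs = statistics.pvariance(observed)
    scored = []
    for _ in range(n_sims):
        rate = rng.uniform(*prior)  # candidate drawn from the uniform prior
        sim = leaf_lengths(rate, len(observed), root_len, max_indel, rng)
        scored.append((abs(statistics.pvariance(sim) - s_obs), rate))
    scored.sort()  # closest simulations first
    n_keep = max(1, int(n_sims * accept_frac))
    return [rate for _, rate in scored[:n_keep]]  # approximate posterior sample
```

The accepted rates approximate the posterior distribution. Their mean or quantiles serve as the estimate.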

Results: In this study, we investigate fast algorithms for simulating sequences along a phylogenetic tree, focusing on accelerating the rate-limiting step of the simulation process: handling insertion and deletion (indel) events. We demonstrate that data structures that efficiently store indel events along a tree can substantially accelerate the simulation compared with a naive approach. To illustrate the utility of this efficient simulator, we integrated it into an ABC-based algorithm for inferring indel model parameters and applied it to study indel dynamics within Chiroptera (bats).
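
The abstract does not say which data structures the authors use. The sketch below illustrates one simple option and is an assumption, not the evo-sim implementation. Each sequence is stored as a list of segments, and each segment points to a run of residues in the root sequence or in an inserted fragment. An indel then splits or removes a few segments instead of copying every residue. Several details are hypothetical simplifications: the rates are per sequence per unit branch length, indel lengths are uniform, and all function names are invented for this example.

```python
import random

# A sequence is a list of segments (origin, start, length): `length` residues taken
# from position `start` of `origin`, where origin is "root" or an insertion event id.

def seq_len(segs):
    return sum(n for _, _, n in segs)

def _split(segs, pos):
    """Make pos a segment boundary; return the index of the segment starting at pos."""
    offset = 0
    for i, (origin, start, n) in enumerate(segs):
        if offset == pos:
            return i
        if offset + n > pos:
            cut = pos - offset
            segs[i:i + 1] = [(origin, start, cut), (origin, start + cut, n - cut)]
            return i + 1
        offset += n
    return len(segs)  # pos is the end of the sequence

def insert(segs, pos, length, event_id):
    segs.insert(_split(segs, pos), (event_id, 0, length))

def delete(segs, pos, length):
    i = _split(segs, pos)
    j = _split(segs, pos + length)  # segments before index i are unaffected
    del segs[i:j]

def materialize(segs, sources):
    """Expand segments into residues; sources maps each origin to its string."""
    return "".join(sources[o][s:s + n] for o, s, n in segs)

def evolve(parent, branch_length, ins_rate, del_rate, max_indel, rng, counter):
    segs = list(parent)  # copy: sibling branches must not share the parent's list
    total = ins_rate + del_rate
    t = rng.expovariate(total)  # waiting time to the next indel event
    while t < branch_length:
        n = seq_len(segs)
        size = rng.randint(1, max_indel)
        if rng.random() < ins_rate / total:
            counter[0] += 1
            insert(segs, rng.randint(0, n), size, f"ins{counter[0]}")
        elif n > 0:
            pos = rng.randint(0, n - 1)
            delete(segs, pos, min(size, n - pos))
        t += rng.expovariate(total)
    return segs

def simulate(tree, root_len, ins_rate=0.5, del_rate=0.5, max_indel=5, seed=0):
    """tree = (name, [(child, branch_length), ...]); returns {leaf_name: segments}."""
    rng, counter, leaves = random.Random(seed), [0], {}
    def walk(node, segs):
        name, children = node
        if not children:
            leaves[name] = segs
        for child, bl in children:
            walk(child, evolve(segs, bl, ins_rate, del_rate, max_indel, rng, counter))
    walk(tree, [("root", 0, root_len)])
    return leaves
```

Every leaf records where each of its residues came from, so the true alignment of the leaves can be read off the segment origins. Locating a position here still requires a linear scan over segments. A faster simulator would need a quicker lookup, for example a balanced tree keyed by cumulative segment length.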

Availability and implementation: The source code for the different simulation algorithms, alongside the data used, is available at: https://github.com/nimrodSerokTAU/evo-sim. The simulator has also been integrated into SpartaABC, a website for the inference of indel parameters, accessible at: https://spartaabc.tau.ac.il/.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Cleanifier: contamination removal from microbial sequences using spaced seeds of a human pangenome index.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf632
Jens Zentgraf, Johanna Elena Schmitz, Sven Rahmann

Motivation: When working with DNA data from human-derived microbiomes, the first step is to remove human contamination, for two reasons. First, many countries have strict privacy and data-protection guidelines for human sequence data, so microbiome data that partly contain human sequences cannot easily be processed further or published. Second, human contamination may cause problems in downstream analyses such as metagenomic binning or genome assembly. Fast and accurate removal of human contamination is therefore critical for large-scale metagenomics projects.

Results: We introduce Cleanifier, a fast, memory-frugal, alignment-free tool for detecting and removing human contamination based on gapped k-mers (spaced seeds). Cleanifier uses a pangenome index of known human gapped k-mers, and users can also create and use alternative references. Reads are classified and filtered according to their gapped k-mer content. Cleanifier supports two filtering modes: one queries all gapped k-mers of a read, and the other queries only a sample of them. A comparison with other state-of-the-art tools shows that, in sampling mode, Cleanifier is the fastest method while achieving comparable accuracy. When a probabilistic Cuckoo filter stores the complete k-mer set, Cleanifier's memory requirements are similar to those of methods that use a sampled minimizer index. At the same time, Cleanifier is more flexible, because it can apply different sampling methods to the same index.
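
The sketch below illustrates classification by gapped k-mer content and is not Cleanifier's implementation. It extracts gapped k-mers through a spaced-seed mask and checks them against an index of human k-mers. A read is flagged as human when a sufficient fraction of its queried k-mers hits the index. The mask, the 0.5 threshold, the every-`step`-th sampling rule and the function names are hypothetical. A plain Python set stands in for the Cuckoo filter. The mask is chosen to be symmetric. As a result, the gapped k-mer of a reverse-complemented window equals the reverse complement of the original gapped k-mer, and canonical k-mers match reads from either strand.

```python
MASK = "1101011"  # hypothetical symmetric mask: '1' = position used, '0' = gap
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(s):
    return s.translate(COMP)[::-1]

def spaced_kmers(seq, mask=MASK, step=1):
    """Yield canonical gapped k-mers of seq at every `step`-th window start."""
    care = [i for i, c in enumerate(mask) if c == "1"]
    for start in range(0, len(seq) - len(mask) + 1, step):
        kmer = "".join(seq[start + i] for i in care)
        yield min(kmer, revcomp(kmer))  # strand-independent representative

def build_index(references, mask=MASK):
    # A set stands in for the probabilistic Cuckoo filter used by the tool.
    index = set()
    for ref in references:
        index.update(spaced_kmers(ref, mask))
    return index

def is_host(read, index, mask=MASK, threshold=0.5, step=1):
    """Flag a read as human if enough of its queried gapped k-mers hit the index;
    step > 1 queries only a sample of the k-mers."""
    kmers = list(spaced_kmers(read, mask, step))
    if not kmers:
        return False  # read shorter than the mask
    hits = sum(k in index for k in kmers)
    return hits / len(kmers) >= threshold

def filter_reads(reads, index, **kwargs):
    """Keep only reads not classified as human."""
    return [r for r in reads if not is_host(r, index, **kwargs)]
```

With `step=1` every gapped k-mer of the read is queried, which corresponds to the all-k-mers mode. A larger `step` queries only a subsample, trading some sensitivity for speed, in the spirit of the sampling mode.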

Availability and implementation: Cleanifier is available via GitLab (https://gitlab.com/rahmannlab/cleanifier), PyPI (https://pypi.org/project/cleanifier/), and Bioconda (https://anaconda.org/bioconda/cleanifier). The pre-computed human pangenome index is available at Zenodo (https://doi.org/10.5281/zenodo.15639519).

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
cuteSV-OL: a real-time structural variation detection framework for nanopore sequencing devices.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf668
Weimin Guo, Yadong Liu, Yadong Wang, Tao Jiang

Summary: Nanopore sequencing technology enables real-time sequencing and is widely used in rapid-detection applications. However, existing structural variant (SV) detection tools typically separate computation from sequencing, which limits their timeliness in clinical settings. To address this, we introduce cuteSV-OL, a framework for real-time SV discovery that can be embedded within nanopore sequencing instruments to analyze data concurrently with its generation. cuteSV-OL also includes a real-time SV detection rate evaluation module that allows users to terminate sequencing early when appropriate, reducing time and cost. Experimental results show that, on a standard desktop computer, cuteSV-OL performs real-time analysis during sequencing and completes SV calling within minutes after sequencing ends, with performance comparable to offline methods. This approach has the potential to enhance rapid clinical diagnostics.
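
The abstract does not explain how cuteSV-OL evaluates its detection rate or decides when to stop. The sketch below is a hypothetical stand-in that only illustrates the general idea of online SV calling with a saturation-based stopping rule. It works through four steps:

- It takes alignments as they arrive in batches, each given as a (chromosome, reference start, CIGAR) tuple.
- It extracts deletion and insertion signatures of at least 50 bp, a common SV size cutoff.
- It accumulates read support per fixed genomic bin.
- It suggests stopping once no new calls have appeared for `patience` batches.

The bin size, support threshold, patience value and class name are all assumptions.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def sv_signatures(chrom, ref_start, cigar, min_len=50):
    """Yield (chrom, svtype, ref_pos, length) for each large I/D CIGAR operation."""
    pos = ref_start
    for n, op in CIGAR_OP.findall(cigar):
        n = int(n)
        if op in ("D", "I") and n >= min_len:
            yield chrom, ("DEL" if op == "D" else "INS"), pos, n
        if op in ("M", "D", "N", "=", "X"):  # operations that consume the reference
            pos += n

class OnlineSVCaller:
    """Hypothetical streaming caller: read support per (chrom, svtype, bin)."""

    def __init__(self, min_support=2, bin_size=500, patience=3):
        self.support = defaultdict(int)
        self.min_support, self.bin_size, self.patience = min_support, bin_size, patience
        self.history = []  # number of calls after each batch

    def add_batch(self, alignments):
        for chrom, start, cigar in alignments:
            # Count each read at most once per candidate locus.
            keys = {(c, t, p // self.bin_size)
                    for c, t, p, _ in sv_signatures(chrom, start, cigar)}
            for key in keys:
                self.support[key] += 1
        self.history.append(len(self.calls()))

    def calls(self):
        return sorted(k for k, v in self.support.items() if v >= self.min_support)

    def should_stop(self):
        """Saturation rule: no new calls over the last `patience` batches."""
        if len(self.history) <= self.patience:
            return False
        return self.history[-1] == self.history[-1 - self.patience]
```

Fixed bins are the simplest possible clustering and can split one event that falls across a bin boundary. A real caller would cluster signatures by position and size instead.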

Availability and implementation: cuteSV-OL is released under the MIT license and is available at https://github.com/gwmHIT/cuteSV-OL. It can also be installed via Bioconda or accessed through https://doi.org/10.5281/zenodo.17777436.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0