Bioinformatics (Oxford, England): latest articles

NanoSSL: attention mechanism-based self-supervised learning method for protein identification using nanopores.
IF: 5.4 | Pub Date: 2026-01-02 | DOI: 10.1093/bioinformatics/btaf657
Yong Xie, Jindong Li, Ziyan Zhang, Bin Meng, Shuaijian Dai, Yuchen Zhou, Eamonn Kennedy, Niandong Jiao, Haobin Chen, Zhuxin Dong

Motivation: Nanopores are cutting-edge interdisciplinary tools that can analyze biomolecules at the single-molecule level for many applications, e.g. DNA sequencing. Efforts are underway to extend nanopores to proteomics, including the development of machine learning algorithms for protein sequencing and identification. However, single-molecule data are intrinsically noisy and hard to process. Moreover, the development and performance of machine learning methods for nanopores are hampered by data scarcity. Self-supervised learning is an emerging method that may yield advantages in nanopore scenarios.

Results: We propose and experimentally validate Nanopore analysis using Self-Supervised Learning (NanoSSL), a generative self-supervised learning framework based on attention mechanisms for the identification of protein signals from nanopores. Using a two-step approach consisting of self-supervised pre-training and supervised fine-tuning, NanoSSL learns useful feature representations from empirical data to facilitate downstream classification tasks. Inspired by the concept of fragmentation in conventional protein sequencing technologies, during pre-training each translocation event is split into multiple non-overlapping fragments of equal size, some of which are randomly masked and reconstructed using a masked autoencoder. Learning the feature representations of the reconstructed nanopore events facilitates molecular identification in fine-tuning. In this study, we retested a publicly available nanopore multiplexed protein sensing dataset for model iteration, and subsequently measured the Alzheimer's disease biomarker Aβ1-42 using homemade solid-state nanopores. Empirical results indicated that NanoSSL achieved unprecedented performance across four metrics (accuracy, precision, recall, and F1 score) when classifying two Aβ1-42 mutants, E22G and G37R. The self-supervised learning and attention mechanism were verified as the source of the performance gains.
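The fragmentation and masking step described above can be illustrated with a minimal sketch. This is not the NanoSSL implementation: the function names, the fragment size, the 50% mask ratio, and the choice to drop trailing samples are all assumptions made for illustration, and the masked autoencoder itself is omitted.

```python
import random

def split_into_fragments(event, fragment_size):
    """Split a 1-D signal into equal-size, non-overlapping fragments.

    Trailing samples that do not fill a whole fragment are dropped
    (an assumption; padding would be an equally valid choice).
    """
    n_fragments = len(event) // fragment_size
    return [event[i * fragment_size:(i + 1) * fragment_size]
            for i in range(n_fragments)]

def random_mask(n_fragments, mask_ratio, rng):
    """Return the sorted indices of fragments to hide during pre-training."""
    n_masked = int(round(n_fragments * mask_ratio))
    return sorted(rng.sample(range(n_fragments), n_masked))

rng = random.Random(0)
# Toy "translocation event": 103 samples of Gaussian noise (hypothetical data).
event = [rng.gauss(0.0, 1.0) for _ in range(103)]
fragments = split_into_fragments(event, fragment_size=10)
masked = random_mask(len(fragments), mask_ratio=0.5, rng=rng)
visible = [i for i in range(len(fragments)) if i not in masked]
```

In a masked-autoencoder setup, only the `visible` fragments would be passed to the encoder, and the reconstruction loss would be computed on the `masked` ones.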

Availability and implementation: The main program is available at https://doi.org/10.5281/zenodo.17172822.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777981/pdf/

BPSS: a Nextflow pipeline for Bacterial Peptide Sequence Selection to detect protein diversity.
IF: 5.4 | Pub Date: 2026-01-02 | DOI: 10.1093/bioinformatics/btaf677
Sylvère Bastien, Pauline François, Sara Moussadeq, Jérôme Lemoine, Karen Moreau, François Vandenesch

Motivation: Sequence variability can be extremely high, particularly in bacteria due to the rapid accumulation of mutations linked to their high replication rate and environmental selection pressure, which often favors diversifying selection. For most species, there are no automated, computationally efficient tools available for constructing a nonredundant database covering the allelic variability of target proteins.

Results: We have thus developed Bacterial Peptide Sequence Selection, a Nextflow pipeline to define a minimal list of peptide sequences for detecting all variants of a protein of interest.
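The abstract does not say how the minimal peptide list is computed. One standard way to approximate such a list is greedy set cover, sketched below under that assumption. The function name, the toy peptide strings, and the reading of "detecting all variants" as "every variant contains at least one selected peptide" are hypothetical and are not taken from BPSS.

```python
def greedy_peptide_selection(variant_peptides):
    """Pick peptides until every variant contains at least one of them.

    variant_peptides maps a variant name to the set of candidate
    peptides found in that variant. Greedy set cover gives a small
    list, not a guaranteed minimum.
    """
    # Invert the mapping: peptide -> set of variants that contain it.
    covers = {}
    for variant, peptides in variant_peptides.items():
        for peptide in peptides:
            covers.setdefault(peptide, set()).add(variant)

    uncovered = set(variant_peptides)
    selected = []
    while uncovered:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(covers),
                   key=lambda p: len(covers[p] & uncovered),
                   default=None)
        if best is None or not covers[best] & uncovered:
            raise ValueError("some variants contain no candidate peptide")
        selected.append(best)
        uncovered -= covers[best]
    return selected

# Toy input with made-up peptide names.
variants = {
    "v1": {"PEPA", "PEPB"},
    "v2": {"PEPB", "PEPC"},
    "v3": {"PEPC", "PEPD"},
    "v4": {"PEPD"},
}
selected = greedy_peptide_selection(variants)
```

On this toy input, two peptides are enough to cover all four variants.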

Availability and implementation: All the code and containers used are freely available on GitLab at https://gitbio.ens-lyon.fr/ciri/stapath/bpss or on Zenodo (10.5281/zenodo.16894981) under the GPLv3 open-source license, and on DockerHub at https://hub.docker.com/u/stapath.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12797209/pdf/

MegaPlantTF: a machine learning framework for comprehensive identification and classification of plant transcription factors.
IF: 5.4 | Pub Date: 2026-01-02 | DOI: 10.1093/bioinformatics/btaf678
Genereux Akotenou, Asmaa H Hassan, Morad M Mokhtar, Achraf El Allali

Motivation: Understanding the role of transcription factors (TFs) in plants is essential for the study of gene regulation and various biological processes. However, both TF detection and classification remain challenging due to the great diversity and complexity of these proteins. Conventional approaches, such as BLAST, often suffer from high computational complexity and limited performance on less common TF families.

Results: We introduce MegaPlantTF, the first comprehensive machine learning and deep learning framework for the prediction (TF versus non-TF) and classification (family-level) of plant TFs. Our method employs k-mer-based protein representations and a two-stage architecture combining a deep feed-forward neural network with a stacking ensemble classifier. To ensure robust performance assessment, we report micro-, macro-, and weighted-average performance metrics, providing a holistic evaluation of both frequent and underrepresented TF families. Additionally, we employ threshold-based evaluation to calibrate confidence in TF detection. The results show that MegaPlantTF achieves strong accuracy and precision, particularly with a k-mer size of 3 and a classification threshold of 0.5, and maintains stable performance even under stringent thresholds. In addition to the standard cross-validation tests, a use case study on Sorghum bicolor confirms that our method performs strongly in the genome-wide analysis, making it highly suitable for large-scale TF identification and classification tasks. MegaPlantTF represents a novel contribution by integrating k-mer encoding, binary family-specific classifiers, and a two-stage stacking ensemble into a unified, reproducible framework for large-scale plant TF identification and classification.
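As an illustration of a k-mer-based protein representation with k = 3, the sketch below counts overlapping 3-mers and turns them into a fixed-length frequency vector. It is a generic encoding written for this summary, not the MegaPlantTF code. The function names, the 20-letter alphabet, the frequency normalization, and the toy sequence are assumptions.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a protein sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def kmer_frequency_vector(sequence, k=3):
    """Return k-mer frequencies over the full 20**k vocabulary.

    k-mers containing non-standard residues are ignored (an assumption).
    """
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]

toy_sequence = "MKKLMKK"  # made-up sequence, not a real transcription factor
counts = kmer_counts(toy_sequence)
vector = kmer_frequency_vector(toy_sequence)
```

With k = 3 the vector has 20^3 = 8000 entries, which is small enough to feed directly into a feed-forward network.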

Availability and implementation: MegaPlantTF is freely accessible through a public web server available at https://bioinformatics.um6p.ma/MegaPlantTF. The complete source code, including pretrained models and example datasets, is available at https://github.com/Bioinformatics-UM6P/MegaPlantTF.

Contact and supplementary information: Supplementary data are available online. Correspondence should be addressed to the authors by email or by opening an issue on the MegaPlantTF GitHub page.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12803907/pdf/

Hi-Enhancer: a two-stage framework for prediction and localization of enhancers based on Blending-KAN and Stacking-Auto models.
IF: 5.4 | Pub Date: 2026-01-02 | DOI: 10.1093/bioinformatics/btaf441
Aimin Li, Haotian Zhou, Rong Fei, Juntao Zou, Xiguo Yuan, Yajun Liu, Saurav Mallik, Xinhong Hei, Lei Wang

Motivation: Gene expression plays a crucial role in cell function, and enhancers can regulate gene expression precisely. Therefore, accurate prediction of enhancers is particularly critical. However, existing prediction methods have low accuracy or rely on fixed multiple epigenetic signals, which may not always be available.

Results: We propose a two-stage framework that accurately predicts enhancers by flexibly combining multiple epigenetic signals. In the first stage, we designed a Blending-KAN model, which integrates the results of various base classifiers and employs Kolmogorov-Arnold Networks (KAN) as a meta-classifier to predict enhancers based on flexible combinations of multiple epigenetic signals. In the second stage, we developed a Stacking-Auto model, which extracts sequence features using DNABERT-2 and locates the enhancers based on the Stacking strategy and the AutoGluon framework. The accuracy of the Blending-KAN model reached 99.69 ± 0.11% when five epigenetic signals were used. In cross-cell line prediction, the accuracy was greater than or equal to 93.72%. With added Gaussian noise, the model still maintained an accuracy of 98.74 ± 0.03%. In the second stage, the accuracy of the Stacking-Auto model was 80.50%, better than 17 existing methods. The results show that our models can be flexibly used to predict and locate enhancers using a combination of multiple epigenetic signals.

Availability and implementation: The source code is available at https://github.com/emanlee/Hi-Enhancer and https://doi.org/10.6084/m9.figshare.29262158.v1.

Supplementary information: Supplementary data are available at Bioinformatics online.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758598/pdf/

Building multiscale Markov state models by systematic mapping of temporal communities.
IF: 5.4 | Pub Date: 2026-01-02 | DOI: 10.1093/bioinformatics/btaf585
Nir Nitskansky, Kessem Clein, Barak Raveh

Motivation: Biomolecules undergo dynamic transitions among metastable states to carry out their biological functions. Markov State Models (MSMs) effectively capture these metastable states and transitions at a defined temporal scale. However, biomolecular dynamics typically span multiple temporal scales, ranging from fast atomic vibrations to slower conformational changes and folding events.

Results: We introduce multiscale Markov State Models (mMSMs), which capture biomolecular dynamics across multiple temporal resolutions simultaneously via a hierarchy of MSMs, and mMSM-explore, an unsupervised algorithm for generating mMSMs through multiscale adaptive sampling with on-the-fly identification of temporally metastable states. We benchmark our method on a toy system with nested energy minima; on alanine dipeptide, first with and then without assuming prior knowledge of its two reaction coordinates; and finally, on a fast-folding 35-residue miniprotein, where we map folding pathways across scales. We demonstrate efficient mapping of energy landscapes, correct representation of multiscale hierarchies and transition states, accurate inference of stationary probabilities and transition kinetics, as well as de novo identification of underlying slow, intermediate, and fast reaction coordinates. mMSMs reveal how dynamic processes at different scales contribute collectively to the functional mechanisms of biomolecular machines.
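For readers unfamiliar with the single-scale building block, the sketch below estimates an ordinary Markov State Model from a discretized trajectory and computes its stationary distribution by power iteration. It does not implement mMSMs or mMSM-explore. The function names and the toy two-state trajectory are assumptions made for illustration.

```python
def transition_matrix(trajectory, n_states, lag=1):
    """Row-normalized transition counts at a given lag time (lag >= 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(trajectory[:-lag], trajectory[lag:]):
        counts[a][b] += 1
    matrix = []
    for state, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            raise ValueError(f"state {state} has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix

def stationary_distribution(matrix, n_iter=1000):
    """Power iteration; assumes an irreducible, aperiodic chain."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi

# Toy trajectory hopping between two metastable states (made-up data).
trajectory = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]
T = transition_matrix(trajectory, n_states=2)
pi = stationary_distribution(T)
```

A multiscale model in the sense of the abstract would arrange several such models in a hierarchy, one per temporal resolution. That hierarchy is not sketched here.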

Availability and implementation: Python code and instructions are available at https://github.com/ravehlab/mMSM.

Supplementary information: Supplementary data are available at Bioinformatics online.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12797069/pdf/

Reconstructing and comparing signal transduction networks from single-cell protein quantification data.
IF: 5.4 | Pub Date: 2026-01-02 | DOI: 10.1093/bioinformatics/btaf675
Tim Stohn, Roderick A P M van Eijl, Klaas W Mulder, Lodewyk F A Wessels, Evert Bosdriesz

Motivation: Signal transduction networks regulate many essential biological processes and are frequently aberrant in diseases such as cancer. A mechanistic understanding of such networks, and how they differ between cell populations, is essential to design effective treatment strategies. Typically, such networks are computationally reconstructed based on systematic perturbation experiments, followed by quantification of signaling protein activity. Recent technological advances now allow for the quantification of the activity of many (signaling) proteins simultaneously in single cells. This makes it feasible to reconstruct or quantify signaling networks without performing systematic perturbations.

Results: Here, we introduce single-cell modular response analysis (scMRA) and single-cell comparative network reconstruction (scCNR) to derive signal transduction networks by exploiting the heterogeneity of single-cell (phospho-)protein measurements. The methods treat stochastic variation in total protein abundances as natural perturbation experiments, whose effects propagate through the network and hence facilitate the reconstruction and quantification of the underlying signaling network. scCNR reconstructs cell population-specific networks, where cells from different populations have the same underlying topology, but the interaction strengths can differ between populations. We extensively validated scMRA and scCNR on simulated data, and applied them to unpublished (phospho-)protein measurements of EGFR-inhibitor-treated keratinocytes to recover signaling differences downstream of EGFR. scCNR will help to unravel the mechanistic signaling differences between cell populations, and will subsequently guide the development of well-informed treatment strategies.

Availability and implementation: The code used for scCNR in this study has been deposited on Zenodo (https://doi.org/10.5281/zenodo.17600937) and is also available as a Python module at https://github.com/ibivu/scmra. Additionally, data and code to reproduce all figures are available at https://github.com/tstohn/scmra_analysis.

动机:信号转导网络调节了许多基本的生物过程,在癌症等疾病中经常发生畸变。对这种网络的机制理解,以及它们在细胞群之间的差异,对于设计有效的治疗策略至关重要。通常,这样的网络是基于系统扰动实验的计算重建,然后是信号蛋白活性的量化。最近的技术进步现在允许在单个细胞中同时定量许多(信号)蛋白的活性。这使得在不进行系统扰动的情况下重建或量化信号网络成为可能。结果:在这里,我们引入单细胞模块化响应分析(scMRA)和单细胞比较网络重建(scCNR),通过利用单细胞(磷-)蛋白测量的异质性来推导信号转导网络。该方法将总蛋白丰度的随机变化视为自然扰动实验,其影响通过网络传播,从而促进了潜在信号网络的重建和量化。scCNR重建细胞群体特异性网络,其中来自不同群体的细胞具有相同的底层拓扑结构,但群体之间的相互作用强度可能不同。我们在模拟数据上广泛验证了scMRA和scCNR,并将其应用于未发表的EGFR抑制剂处理的角质形成细胞的(磷-)蛋白测量数据,以恢复EGFR下游的信号差异。scCNR将有助于揭示细胞群之间信号传导的机制差异,并将随后指导良好的治疗策略的发展。可用性和实现:本研究中用于scCNR的代码已经存放在Zenodo https://doi.org/10.5281/zenodo.17600937上,也可以在https://github.com/ibivu/scmra上作为python模块获得。此外,复制所有数字的代码可在https://github.com/tstohn/scmra_analysis上获得。
{"title":"Reconstructing and comparing signal transduction networks from single-cell protein quantification data.","authors":"Tim Stohn, Roderick A P M van Eijl, Klaas W Mulder, Lodewyk F A Wessels, Evert Bosdriesz","doi":"10.1093/bioinformatics/btaf675","DOIUrl":"10.1093/bioinformatics/btaf675","url":null,"abstract":"<p><strong>Motivation: </strong>Signal transduction networks regulate many essential biological processes and are frequently aberrated in diseases such as cancer. A mechanistic understanding of such networks, and how they differ between cell populations, is essential to design effective treatment strategies. Typically, such networks are computationally reconstructed based on systematic perturbation experiments, followed by quantification of signaling protein activity. Recent technological advances now allow for the quantification of the activity of many (signaling) proteins simultaneously in single cells. This makes it feasible to reconstruct or quantify signaling networks without performing systematic perturbations.</p><p><strong>Results: </strong>Here, we introduce single-cell modular response analysis (scMRA) and single-cell comparative network reconstruction (scCNR) to derive signal transduction networks by exploiting the heterogeneity of single-cell (phospho-)protein measurements. The methods treat stochastic variation in total protein abundances as natural perturbation experiments, whose effects propagate through the network and hence facilitate the reconstruction and quantification of the underlying signaling network. scCNR reconstructs cell population-specific networks, where cells from different populations have the same underlying topology, but the interaction strengths can differ between populations. We extensively validated scMRA and scCNR on simulated data, and applied it to unpublished data of (phospho-)protein measurements of EGFR-inhibitor-treated keratinocytes to recover signaling differences downstream of EGFR. 
scCNR will help to unravel the mechanistic signaling differences between cell populations, and will subsequently guide the development of well-informed treatment strategies.</p><p><strong>Availability and implementation: </strong>The code used for scCNR in this study has been deposited on Zenodo https://doi.org/10.5281/zenodo.17600937 and is also available as a Python module at https://github.com/ibivu/scmra. Additionally, data and code to reproduce all figures is available at https://github.com/tstohn/scmra_analysis.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4,"publicationDate":"2026-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12797212/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145822381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Spider: a flexible and unified framework for simulating spatial transcriptomics data.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf562
Jiyuan Yang, Nana Wei, Yang Qu, Congcong Hu, Weiwei Zhang, Lin Liu, Hua-Jun Wu, Xiaoqi Zheng

Motivation: Spatial transcriptomics (ST) technologies provide valuable insights into cellular heterogeneity by simultaneously acquiring both gene expression profiles and cellular location information. However, the limited diversity and accuracy of "gold standard" datasets hinder the effectiveness and fairness of benchmarking the rapidly growing set of ST analysis tools.

Results: To address this issue, we propose Spider, a flexible and comprehensive framework for simulating ST data without requiring real ST data as a reference. By characterizing spatial patterns using cell type proportions and a transition matrix between adjacent cells, Spider can produce more realistic and diverse simulated data and offers greater modeling flexibility than existing simulation methods. Additionally, Spider provides interactive features for customizing the spatial domain, such as zone segmentation and integration of histology imaging data. Benchmark analyses demonstrate that Spider outperforms other simulation tools in preserving the spatial characteristics of real ST data and facilitating the evaluation of downstream analysis methods. Spider is implemented in Python and available at https://github.com/YANG-ERA/Spider.

Availability and implementation: All code and simulated ST data in this paper are publicly available at https://github.com/YANG-ERA/Spider.
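
The two summary statistics named in the Results paragraph, cell type proportions and a transition matrix between adjacent cells, can be illustrated with a minimal sketch. This is not Spider's code or API: the function names, the rectangular grid of labels, and the 4-neighbour definition of adjacency are all assumptions made for illustration, and real ST data would typically define neighbours from spot or cell coordinates.

```python
from collections import Counter

# Hypothetical helpers for illustration only; not part of the Spider package.

def proportions(grid):
    """Fraction of cells of each type in a grid of cell-type labels."""
    counts = Counter(label for row in grid for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def transition_matrix(grid):
    """Row-normalised frequency of neighbour types for each cell type.

    Adjacency is assumed to be 4-neighbour (left/right/up/down) on the grid.
    Returns a dict mapping (cell type, neighbour type) to a probability.
    """
    pair_counts = Counter()
    n_rows = len(grid)
    for r, row in enumerate(grid):
        for c, a in enumerate(row):
            # Visit each adjacent pair once (right and down neighbours)
            # and count it in both directions.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < len(grid[rr]):
                    b = grid[rr][cc]
                    pair_counts[(a, b)] += 1
                    pair_counts[(b, a)] += 1
    row_totals = Counter()
    for (a, _), n in pair_counts.items():
        row_totals[a] += n
    return {(a, b): n / row_totals[a] for (a, b), n in pair_counts.items()}

if __name__ == "__main__":
    toy = [["A", "A", "B"],
           ["A", "B", "B"]]
    print(proportions(toy))
    print(transition_matrix(toy))
```

A simulator in this spirit would take such proportions and transition probabilities as inputs and generate a label layout consistent with them; how Spider does that is described in the paper, not here.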

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
CholeraSeq: a comprehensive genomic pipeline for cholera surveillance and near real-time outbreak investigation.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf665
Massimiliano S Tagliamonte, Abhinav Sharma, Alberto Riva, Monika Moir, Marco Salemi, Cheryl Baxter, Tulio de Oliveira, Carla N Mavian, Eduan Wilkinson

Summary: Next-generation sequencing is widely deployed in cholera-endemic regions, yet an end-to-end reproducible pipeline has been lacking that unifies read QC, filtering, reference mapping, variant calling/annotation, recombination screening, extraction of parsimony-informative sites/variant codons, and phylogenetic inference for downstream phylodynamic and epidemiological analyses. This gap slows outbreak investigation and public health response. CholeraSeq is a high-throughput genomics pipeline for cholera genomic surveillance. It ingests consensus genomes, short-read sequence data, and draft assemblies, and scales seamlessly from local to cloud environments. To accelerate epidemiological context placement of new outbreak strains, we provide a curated, ready-to-use core genome alignment compiled from public data, enabling flexible, fast integration of new samples for outbreak investigations.

Availability and implementation: CholeraSeq is freely available on GitHub at https://github.com/CERI-KRISP/CholeraSeq. CholeraSeq is implemented in Nextflow with a modular design that builds upon the nf-core community standards.

Supplementary information: The ready-to-use reference core alignment and associated metadata are available at https://doi.org/10.5281/zenodo.16909942.
Citations: 0
Cleanifier: contamination removal from microbial sequences using spaced seeds of a human pangenome index.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf632
Jens Zentgraf, Johanna Elena Schmitz, Sven Rahmann

Motivation: When working with DNA data of human-derived microbiomes, the first step is to remove human contamination, for two reasons. First, many countries have strict privacy and data protection guidelines for human sequence data, so microbiome data containing partly human data cannot be easily further processed or published. Second, human contamination may cause problems in downstream analysis, such as metagenomic binning or genome assembly. For large-scale metagenomics projects, fast and accurate removal of human contamination is therefore critical.

Results: We introduce Cleanifier, a fast and memory-frugal alignment-free tool for detecting and removing human contamination based on gapped k-mers, or spaced seeds. Cleanifier uses a pangenome index of known human gapped k-mers, and the creation and use of alternative references is also possible. Reads are classified and filtered according to their gapped k-mer content. Cleanifier supports two filtering modes: one that queries all gapped k-mers and one that queries only a sample of them. A comparison of Cleanifier with other state-of-the-art tools shows that the sampling mode makes Cleanifier the fastest method with comparable accuracy. When using a probabilistic Cuckoo filter to store the complete k-mer set, Cleanifier has similar memory requirements to methods that use a sampled minimizer index. At the same time, Cleanifier is more flexible, because it can use different sampling methods on the same index.

Availability and implementation: Cleanifier is available via GitLab (https://gitlab.com/rahmannlab/cleanifier), PyPI (https://pypi.org/project/cleanifier/), and Bioconda (https://anaconda.org/bioconda/cleanifier). The pre-computed human pangenome index is available at Zenodo (https://doi.org/10.5281/zenodo.15639519).
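
The general idea of classifying reads by their gapped k-mer content can be sketched as follows. This is a toy illustration under stated assumptions, not Cleanifier's implementation or command-line interface: the mask string, the function names, the plain Python set used as the index (instead of a Cuckoo filter), the hit-fraction threshold, and the fixed-stride sampling are all hypothetical, and reverse complements are ignored for brevity.

```python
# Hypothetical helpers for illustration only; not part of the cleanifier package.

def gapped_kmers(seq, mask):
    """Yield one gapped k-mer per window of seq.

    mask is a string such as "##_##": '#' marks positions that are kept,
    any other character marks a position that is ignored (a gap).
    """
    keep = [i for i, ch in enumerate(mask) if ch == "#"]
    width = len(mask)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        yield "".join(window[i] for i in keep)

def build_index(references, mask):
    """Collect all gapped k-mers of the reference sequences in a set."""
    index = set()
    for ref in references:
        index.update(gapped_kmers(ref, mask))
    return index

def hit_fraction(read, index, mask, step=1):
    """Fraction of queried gapped k-mers of the read found in the index.

    step=1 queries every gapped k-mer; step>1 queries only every
    step-th one, a simple stand-in for a sampling mode.
    """
    kmers = list(gapped_kmers(read, mask))[::step]
    if not kmers:
        return 0.0  # read shorter than the mask: nothing to query
    return sum(k in index for k in kmers) / len(kmers)

def is_contamination(read, index, mask, threshold=0.5, step=1):
    """Flag a read when enough of its gapped k-mers occur in the index."""
    return hit_fraction(read, index, mask, step) >= threshold

if __name__ == "__main__":
    mask = "##_##"
    index = build_index(["ACGTACGTAC"], mask)
    # A mismatch at the gap position does not prevent a hit.
    print(is_contamination("ACCTA", index, mask))
    print(is_contamination("GGGGGGGG", index, mask))
```

Because the gap position is ignored, a read that differs from the reference only at that position still produces a matching gapped k-mer, which is the usual motivation for spaced seeds over contiguous k-mers.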

Supplementary information: Supplementary data are available online.
Citations: 0
cuteSV-OL: a real-time structural variation detection framework for nanopore sequencing devices.
IF 5.4 Pub Date : 2026-01-02 DOI: 10.1093/bioinformatics/btaf668
Weimin Guo, Yadong Liu, Yadong Wang, Tao Jiang

Summary: Nanopore sequencing technology enables real-time sequencing and is widely used in rapid detection applications. However, in clinical scenarios, existing structural variant (SV) detection tools typically separate sequencing from computation, limiting their timeliness for clinical applications. To address this, we introduce cuteSV-OL, a novel framework designed for real-time SV discovery, which can be embedded within nanopore sequencing instruments to analyze data concurrently with its generation. Additionally, cuteSV-OL features a real-time SV detection rate evaluation module, allowing users to terminate sequencing early when appropriate, thereby reducing time and cost. Experimental results show that on a standard desktop computer, cuteSV-OL can perform real-time analysis during sequencing and complete SV calling within minutes after sequencing ends, achieving performance comparable to offline methods. This approach has the potential to enhance rapid clinical diagnostics.

Availability and implementation: cuteSV-OL is released under the MIT license and is available at https://github.com/gwmHIT/cuteSV-OL. It can also be installed via Bioconda or accessed through https://doi.org/10.5281/zenodo.17777436.

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0